1.1 A top-down analysis of how an application reads a file on NAND flash
Let's start with a simple example of how a user application opens a Yaffs2 file and reads from it:
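#include <fcntl.h>
#include <unistd.h>

int main (int argc, char* argv[])
{
	char buffer[4096];
	ssize_t bytes_read;
	off_t offset = 0;	/* total number of bytes read */
	if (argc < 2)
		return 1;
	/* Open the file for reading. */
	int fd = open (argv[1], O_RDONLY);
	if (fd < 0)
		return 1;
	do {
		bytes_read = read (fd, buffer, sizeof (buffer));
		if (bytes_read > 0)
			offset += bytes_read;
	} while (bytes_read > 0);	/* 0 = end of file, -1 = error */
	close (fd);
	return 0;
}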
1.1.1 The full story behind int fd = open (argv[1], O_RDONLY)
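When an application opens a file, the first half of the process is exactly the same no matter what kind of file it is (device file, FIFO, extN file, FAT file, Yaffs2 file, proc file, sysfs file, and so on); we already analyzed that part in 4.3. Since our example opens a regular Yaffs2 file, we take the yaffs_FillInodeFromObject function as our entry point: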
static void yaffs_FillInodeFromObject(struct inode *inode, yaffs_Object * obj)
{
/* ... */
case S_IFREG: /* file */
inode->i_op = &yaffs_file_inode_operations;
inode->i_fop = &yaffs_file_operations;
inode->i_mapping->a_ops = &yaffs_file_address_operations;
break;
/* ... */
}
where the two operation tables are:
static struct file_operations yaffs_file_operations = {
.read = do_sync_read,
.write = do_sync_write,
.aio_read = generic_file_aio_read,
.aio_write = generic_file_aio_write,
.mmap = generic_file_mmap,
.flush = yaffs_file_flush,
.fsync = yaffs_sync_object,
.splice_read = generic_file_splice_read,
.splice_write = generic_file_splice_write,
};
static struct address_space_operations yaffs_file_address_operations = {
.readpage = yaffs_readpage,
.writepage = yaffs_writepage,
.prepare_write = yaffs_prepare_write,
.commit_write = yaffs_commit_write,
};
At this point the file operations and address_space_operations for this file are in place, and we can move on to reading.
1.1.1.1 The concept of the address space
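When the Linux kernel reads or writes the data of a physical file on disk or flash, it maintains a page cache to improve performance, especially when several processes access the same file or one process accesses it repeatedly. This works somewhat like a hardware cache:
◆ To read a page of data, the kernel first looks it up in the page cache. If the page is there, no flash access is needed; if not, the kernel allocates a page frame, reads the data from flash into it, and adds the page to the page cache.
◆ Writes work similarly, but there are generally two policies for when data written into the page cache reaches flash: write it through to flash synchronously, or defer the write. Different file systems make different choices; below we analyze what Yaffs2 does.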
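The two bullets above can be sketched as a tiny user-space model. It is an illustration under simplifying assumptions only: pages are 16 bytes, the "flash" is a static array, the cache is a plain array indexed by page number instead of the radix tree used by struct address_space, and all toy_* names are hypothetical. It shows a cache hit avoiding a flash access, a miss allocating a page frame, and the two write policies (write-through versus deferred write with a dirty flag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16	/* tiny pages keep the example small */
#define TOY_NR_PAGES 8		/* the simulated flash holds 8 pages */

/* Simulated flash, plus a counter of how often it is read. */
static char toy_flash[TOY_NR_PAGES][TOY_PAGE_SIZE];
static int toy_flash_reads;

struct toy_page {
	char data[TOY_PAGE_SIZE];
	bool dirty;	/* modified in the cache but not yet on flash */
};

/* toy_cache[i] == NULL means page i is not cached. A plain array
 * replaces the radix tree (page_tree) of the real address_space. */
static struct toy_page *toy_cache[TOY_NR_PAGES];

/* Read path: a hit returns the cached page; a miss allocates a page
 * frame, fills it from flash and adds it to the cache. */
static struct toy_page *toy_find_or_read_page(unsigned long index)
{
	assert(index < TOY_NR_PAGES);
	if (toy_cache[index])
		return toy_cache[index];	/* hit: no flash access */
	struct toy_page *p = calloc(1, sizeof(*p));
	assert(p != NULL);
	memcpy(p->data, toy_flash[index], TOY_PAGE_SIZE);
	toy_flash_reads++;
	toy_cache[index] = p;
	return p;
}

/* Write path: update the cached copy, then either write it through
 * to flash right away or only mark it dirty (deferred write). */
static void toy_write_byte(unsigned long offset, char c, bool write_through)
{
	unsigned long index = offset / TOY_PAGE_SIZE;	/* which page */
	struct toy_page *p = toy_find_or_read_page(index);

	p->data[offset % TOY_PAGE_SIZE] = c;		/* where in the page */
	if (write_through)
		memcpy(toy_flash[index], p->data, TOY_PAGE_SIZE);
	else
		p->dirty = true;
}

/* Deferred write: push every dirty page back to flash later on. */
static void toy_writeback(void)
{
	for (unsigned long i = 0; i < TOY_NR_PAGES; i++) {
		if (toy_cache[i] && toy_cache[i]->dirty) {
			memcpy(toy_flash[i], toy_cache[i]->data, TOY_PAGE_SIZE);
			toy_cache[i]->dirty = false;
		}
	}
}
```

With this model, reading page 1 twice touches flash only once, and a deferred write leaves flash unchanged until toy_writeback runs.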
The page cache of a file is managed by its address space, whose structure is defined as:
struct address_space {
struct inode *host; /* owner: inode, block_device */
struct radix_tree_root page_tree; /* radix tree of all pages */
rwlock_t tree_lock; /* and rwlock protecting it */
unsigned int i_mmap_writable;/* count VM_SHARED mappings */
struct prio_tree_root i_mmap; /* tree of private and shared mappings */
struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
spinlock_t i_mmap_lock; /* protect tree, count, list */
unsigned int truncate_count; /* Cover race condition with truncate */
unsigned long nrpages; /* number of total pages */
pgoff_t writeback_index;/* writeback starts here */
const struct address_space_operations *a_ops; /* methods for operating on these page-cache pages */
unsigned long flags; /* error bits/gfp mask */
struct backing_dev_info *backing_dev_info; /* device readahead, etc */
spinlock_t private_lock; /* for use by the address_space */
struct list_head private_list; /* ditto */
struct address_space *assoc_mapping; /* ditto */
} __attribute__((aligned(sizeof(long))));
In struct inode, inode->i_mapping points to this structure. The field we care about most is a_ops, whose type is defined as follows (only the members relevant here are listed):
struct address_space_operations {
int (*writepage)(struct page *page, struct writeback_control *wbc);
int (*readpage)(struct file *, struct page *);
/* ... */
int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
/* ... */
};
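Generic page-cache code reaches the file system only through these pointers, so optional methods may simply be left NULL; yaffs_file_address_operations, for example, does not provide readpages. The sketch below uses hypothetical toy_* types to mimic how the kernel's read-ahead helper read_pages prefers a batched readpages method and otherwise falls back to one readpage call per page. It simply ignores per-page errors, since read-ahead is only a hint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of struct address_space_operations. */
struct toy_aops {
	int (*readpage)(unsigned long index);
	int (*readpages)(unsigned long start, unsigned nr_pages); /* optional */
};

struct toy_mapping {
	const struct toy_aops *a_ops;
};

/* Shape of read_pages(): use the batched method if the file system
 * has one, otherwise call readpage once for every page. */
static int toy_read_pages(struct toy_mapping *m, unsigned long start,
			  unsigned nr_pages)
{
	assert(m->a_ops->readpage != NULL);	/* mandatory in this sketch */
	if (m->a_ops->readpages)
		return m->a_ops->readpages(start, nr_pages);
	for (unsigned i = 0; i < nr_pages; i++)
		m->a_ops->readpage(start + i);	/* errors ignored here */
	return 0;
}

/* A yaffs-like table: readpage only, readpages left NULL. */
static int toy_readpage_calls;

static int toy_readpage(unsigned long index)
{
	(void)index;
	toy_readpage_calls++;
	return 0;
}

static const struct toy_aops toy_yaffs_like_aops = {
	.readpage = toy_readpage,
};
```

With the yaffs-like table, a 32-page read-ahead turns into 32 separate readpage calls, which is why each step of the read path below starts at yaffs_readpage.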
1.1.2 The full story behind bytes_read = read (fd, buffer, sizeof (buffer))
When the application calls read():
1. sys_read ==> vfs_read ==> ret = file->f_op->read(file, buf, count, pos), i.e. do_sync_read.
2. do_sync_read ==> ret = filp->f_op->aio_read(&kiocb, &iov, 1, kiocb.ki_pos); i.e. generic_file_aio_read.
3. generic_file_aio_read ==> do_generic_file_read(filp,ppos,&desc,file_read_actor); ==> do_generic_mapping_read:
   a) It first checks whether the requested data is already in the page cache; if so, it copies it to user space (via the file_read_actor function).
   b) If not, it starts read-ahead; see do_generic_mapping_read ==> page_cache_sync_readahead ==> ondemand_readahead ==> __do_page_cache_readahead ==> read_pages. Because yaffs_file_address_operations does not define readpages, read-ahead is actually carried out by calling readpage once per page.
   c) If the page is in the page cache but its contents are no longer up to date, it must be read again: error = mapping->a_ops->readpage(filp, page);
   In every case a page is read by a_ops->readpage, i.e. yaffs_readpage. From here on we are inside the Yaffs2 file system.
4. yaffs_readpage ==> yaffs_readpage_unlock ==> yaffs_readpage_nolock ==> yaffs_ReadDataFromFile ==> yaffs_ReadChunkDataFromObject ==> yaffs_ReadChunkWithTagsFromNAND ==> result = dev->readChunkWithTagsFromNAND(dev, realignedChunkInNAND, buffer, tags);. In yaffs_internal_read_super, dev->readChunkWithTagsFromNAND = nandmtd2_ReadChunkWithTagsFromNAND;
5. nandmtd2_ReadChunkWithTagsFromNAND ==> retval = mtd->read(mtd, addr, dev->nDataBytesPerChunk, &dummy, data);. In our U3 NAND flash device driver, ad6900_nand_probe ==> nand_scan ==> nand_scan_tail sets mtd->read = nand_read; so at this point the code has reached the MTD layer. (Yaffs2 calls the MTD device directly; it does not go through the mtdblock block device.)
6. nand_read ==> nand_do_read_ops ==> ret = chip->ecc.read_page(mtd, chip, bufpoi); As above, ad6900_nand_probe ==> nand_scan ==> nand_scan_tail sets chip->ecc.read_page = nand_read_page_swecc;
7. nand_read_page_swecc ==> chip->ecc.read_page_raw(mtd, chip, buf); and then performs the software ECC check. As above, ad6900_nand_probe ==> nand_scan ==> nand_scan_tail sets chip->ecc.read_page_raw = nand_read_page_raw;
8. nand_read_page_raw ==> chip->read_buf(mtd, buf, mtd->writesize); the chip is initialized in ad6900_nand_probe ==> ad6900_nand_init_chip with chip->read_buf = ad6900_nand_read_buf;
9. ad6900_nand_read_buf is the AD6900-specific NAND flash read routine that we have to implement ourselves; this is where the actual hardware is accessed.
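The layering in steps 4 to 9 is a chain of function pointers installed once at initialization time (yaffs_internal_read_super, nand_scan_tail, ad6900_nand_init_chip) and then simply followed on every read. The user-space sketch below models that chain with hypothetical toy_* structures. Its "ECC" is a single XOR byte kept in a fake OOB area, a deliberate stand-in for the kernel's software Hamming ECC, and the "hardware" is an in-memory array rather than AD6900 registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_WRITESIZE 8		/* bytes per NAND page in this toy */
#define TOY_NPAGES 4

/* Simulated NAND: data area plus one "OOB" byte per page. */
static uint8_t toy_nand_data[TOY_NPAGES][TOY_WRITESIZE];
static uint8_t toy_nand_oob[TOY_NPAGES];
static int toy_cur_page;	/* page selected by the last "read command" */

struct toy_chip;
struct toy_ecc_ctrl {
	int (*read_page)(struct toy_chip *chip, uint8_t *buf);
	int (*read_page_raw)(struct toy_chip *chip, uint8_t *buf);
};
struct toy_chip {
	void (*read_buf)(uint8_t *buf, int len);
	struct toy_ecc_ctrl ecc;
	uint8_t oob;		/* OOB byte read together with the page */
};
struct toy_mtd {
	struct toy_chip *chip;
	int (*read)(struct toy_mtd *mtd, int page, uint8_t *buf);
};

/* Role of ad6900_nand_read_buf: move bytes out of the "hardware". */
static void toy_hw_read_buf(uint8_t *buf, int len)
{
	memcpy(buf, toy_nand_data[toy_cur_page], (size_t)len);
}

/* Stand-in ECC: XOR of all data bytes (NOT the kernel's Hamming code). */
static uint8_t toy_ecc_calc(const uint8_t *buf)
{
	uint8_t x = 0;

	for (int i = 0; i < TOY_WRITESIZE; i++)
		x ^= buf[i];
	return x;
}

/* Role of nand_read_page_raw: data area, then the OOB byte. */
static int toy_read_page_raw(struct toy_chip *chip, uint8_t *buf)
{
	chip->read_buf(buf, TOY_WRITESIZE);
	chip->oob = toy_nand_oob[toy_cur_page];
	return 0;
}

/* Role of nand_read_page_swecc: raw read, then software ECC check. */
static int toy_read_page_swecc(struct toy_chip *chip, uint8_t *buf)
{
	chip->ecc.read_page_raw(chip, buf);
	return toy_ecc_calc(buf) == chip->oob ? 0 : -1;	/* -1: ECC error */
}

/* Role of nand_read ==> nand_do_read_ops, one whole page at a time. */
static int toy_nand_read(struct toy_mtd *mtd, int page, uint8_t *buf)
{
	assert(page >= 0 && page < TOY_NPAGES);
	toy_cur_page = page;	/* a real driver issues READ commands here */
	return mtd->chip->ecc.read_page(mtd->chip, buf);
}

/* Role of probe + nand_scan_tail: install the method pointers once. */
static void toy_probe(struct toy_mtd *mtd, struct toy_chip *chip)
{
	chip->read_buf = toy_hw_read_buf;
	chip->ecc.read_page_raw = toy_read_page_raw;
	chip->ecc.read_page = toy_read_page_swecc;
	mtd->chip = chip;
	mtd->read = toy_nand_read;
}

/* Role of nandmtd2_ReadChunkWithTagsFromNAND: sees only mtd->read. */
static int toy_read_chunk(struct toy_mtd *mtd, int chunk, uint8_t *data)
{
	return mtd->read(mtd, chunk, data);
}
```

Only toy_hw_read_buf would have to change to port this toy to different "hardware", which mirrors why ad6900_nand_read_buf is the one routine the driver author must supply.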
Discussion:
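1. Yaffs2 does not support O_DIRECT. Why? Interested readers can trace it through the code themselves. (See __dentry_open: an open with O_DIRECT fails with EINVAL when the file's a_ops provides neither direct_IO nor get_xip_page, and yaffs_file_address_operations implements neither. That is because Yaffs2 files are essentially written straight to flash anyway. The one exception, covered in discussion item 4 later, only writes into Yaffs2's own internal cache, not the page cache, and O_DIRECT as defined by the VFS refers to bypassing the page cache.)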
2. Why read ahead at all? What does it gain us, and why can it actually hurt in certain situations?
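3. Can read-ahead be turned off? One way is to disable it globally via menuconfig, but that is a very bad idea. The other is to configure a single file with posix_fadvise (the corresponding syscall is sys_fadvise64_64, syscall number 270; readers can trace it themselves): POSIX_FADV_RANDOM turns read-ahead off for that file, POSIX_FADV_SEQUENTIAL doubles the read-ahead window, and POSIX_FADV_NORMAL restores the default. In Linux 2.6.23 the default window is 32 pages. (Why 32? Can you work it out from the Linux kernel source? Hint: yaffs_lookup ==> yaffs_get_inode ==> iget ==> iget_locked ==> get_new_inode_fast ==> alloc_inode ==> mapping->backing_dev_info = &default_backing_dev_info;)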
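The per-file route can be sketched in user space as follows. posix_fadvise and the POSIX_FADV_* constants are standard; the helper name open_no_readahead is hypothetical. In 2.6-era mm/fadvise.c, POSIX_FADV_RANDOM sets the open file's read-ahead window (f_ra.ra_pages) to 0, so only this file descriptor is affected. Note that posix_fadvise returns an error number directly instead of setting errno.

```c
#define _POSIX_C_SOURCE 200809L	/* expose posix_fadvise in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open a file read-only with read-ahead disabled
 * for this open file only. Returns the fd, or -1 on failure. */
static int open_no_readahead(const char *path)
{
	assert(path != NULL);
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* offset 0 and len 0 mean "the whole file". */
	if (posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Calling posix_fadvise(fd, 0, 0, POSIX_FADV_NORMAL) on the same descriptor later restores the default window without touching any other file.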
1.2 A top-down analysis of how an application writes a file on NAND flash
When the application calls write():
1. sys_write ==> vfs_write ==> ret = file->f_op->write(file, buf, count, pos); i.e. do_sync_write.
2. do_sync_write ==> ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos); i.e. generic_file_aio_write.
3. generic_file_aio_write ==> __generic_file_aio_write_nolock ==> generic_file_buffered_write:
   a) Check whether the page to be written is already in the page cache of the file's inode; if not, allocate a page and add it to the page cache.
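   b) status = a_ops->prepare_write(file, page, offset, offset+bytes); i.e. yaffs_prepare_write. If the page is not up to date and the data to be written does not cover the whole page, the page must first be read from flash with yaffs_readpage_nolock (see the previous section). The reason is that the page stays in the page cache after the write: without that read, the part of the page outside [offset, offset+bytes) would not match what is on flash, and later reads served from the cache would return it.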
   c) Copy the data to be written from user space into the page.
   d) a_ops->commit_write(file, page, offset, offset+bytes); i.e. yaffs_commit_write.
   e) yaffs_commit_write ==> yaffs_file_write ==> yaffs_WriteDataToFile ==> yaffs_WriteChunkDataToObject ==> yaffs_WriteNewChunkWithTagsToNAND ==> yaffs_WriteChunkWithTagsToNAND ==> dev->writeChunkWithTagsToNAND. In yaffs_internal_read_super, dev->writeChunkWithTagsToNAND = nandmtd2_WriteChunkWithTagsToNAND;
   f) nandmtd2_WriteChunkWithTagsToNAND ==> ... From here on the path mirrors the read path, so we do not analyze it further; completing it is left to the reader.
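The prepare_write condition from step b) can be sketched in user space. Everything here is a hypothetical model: a single 16-byte page, a static array standing in for flash, and toy_* functions playing the roles of yaffs_readpage_nolock, yaffs_prepare_write and the copy/commit steps c) and d). The first write in the test shows what the read protects against: without it, the bytes around the written range would be zeros rather than the flash contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

struct toy_page {
	char data[TOY_PAGE_SIZE];
	bool uptodate;	/* whole page matches what is on flash */
};

static int toy_flash_reads;
/* +1 only to hold the string literal's terminating NUL. */
static char toy_flash_page[TOY_PAGE_SIZE + 1] = "0123456789abcdef";

/* Role of yaffs_readpage_nolock: fill the whole page from flash. */
static int toy_readpage(struct toy_page *pg)
{
	memcpy(pg->data, toy_flash_page, TOY_PAGE_SIZE);
	pg->uptodate = true;
	toy_flash_reads++;
	return 0;
}

/* The condition from step b): a partial write into a page that is
 * not up to date needs the old contents first. */
static int toy_prepare_write(struct toy_page *pg, unsigned from, unsigned to)
{
	assert(from <= to && to <= TOY_PAGE_SIZE);
	if (!pg->uptodate && (from != 0 || to < TOY_PAGE_SIZE))
		return toy_readpage(pg);
	return 0;
}

/* Steps b) to d) in miniature: prepare, copy the new bytes in, then
 * "commit" (this sketch writes only the new bytes back to flash). */
static void toy_buffered_write(struct toy_page *pg, unsigned from,
			       const char *src, unsigned len)
{
	if (toy_prepare_write(pg, from, from + len) != 0)
		return;
	memcpy(pg->data + from, src, len);
	if (from == 0 && len == TOY_PAGE_SIZE)
		pg->uptodate = true;	/* a full overwrite is valid as-is */
	memcpy(toy_flash_page + from, pg->data + from, len);
}
```

A full-page write skips the flash read entirely, which is why the condition checks both the up-to-date flag and whether the write covers the whole page.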