
Revisiting uclinux-2008r1 (bf561) kernel memory region management (3): zone initialization

Published 2019-07-13 16:57

Author: 快乐虾 (http://blog.csdn.net/lights_joy/, lights@hb165.com)

This article applies to: ADI bf561 DSP, 优视 BF561EVB development board, uclinux-2008r1-rc8 (ported to vdsp5), Visual DSP++ 5.0.

Reposting is welcome, but please keep the author information.

1.1.1 zone initialization

After the page counts and page descriptors in pglist_data have been initialized, the kernel moves on to initializing the available zones. In practice, only the ZONE_DMA region is actually used.

1.1.1.1 free_area_init_core

This function, too, lives in mm/page_alloc.c; its code is as follows:

/*
 * Set up the zone data structures:
 *   - mark all pages reserved
 *   - mark all memory queues empty
 *   - clear the memory bitmaps
 */
static void __meminit free_area_init_core(struct pglist_data *pgdat,
		unsigned long *zones_size, unsigned long *zholes_size)
{
	enum zone_type j;
	int nid = pgdat->node_id;
	unsigned long zone_start_pfn = pgdat->node_start_pfn;
	int ret;

	// an empty statement; does nothing
	pgdat_resize_init(pgdat);
	pgdat->nr_zones = 0;

	// initialize the kswapd_wait list, with spinlock support
	init_waitqueue_head(&pgdat->kswapd_wait);
	pgdat->kswapd_max_order = 0;

	// of the MAX_NR_ZONES zones, only ZONE_DMA is actually used
	for (j = 0; j < MAX_NR_ZONES; j++) {
		struct zone *zone = pgdat->node_zones + j;
		unsigned long size, realsize, memmap_pages;

		// size = realsize = number of SDRAM pages; for 64MB SDRAM
		// (limited to 60MB) this is 0x3bff
		size = zone_spanned_pages_in_node(nid, j, zones_size);
		realsize = size - zone_absent_pages_in_node(nid, j,
								zholes_size);

		/*
		 * Adjust realsize so that it accounts for how much memory
		 * is used by this zone for memmap. This affects the watermark
		 * and per-cpu initialisations
		 */
		memmap_pages = (size * sizeof(struct page)) >> PAGE_SHIFT;
		if (realsize >= memmap_pages) {
			realsize -= memmap_pages;
			printk(KERN_DEBUG
				"  %s zone: %lu pages used for memmap\n",
				zone_names[j], memmap_pages);
		} else
			printk(KERN_WARNING
				"  %s zone: %lu pages exceeds realsize %lu\n",
				zone_names[j], memmap_pages, realsize);

		/* Account for reserved pages */
		// dma_reserve can be passed in by the boot loader; here it is 0
		if (j == 0 && realsize > dma_reserve) {
			realsize -= dma_reserve;
			printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
					zone_names[0], dma_reserve);
		}

		// is_highmem_idx is always 0 here
		if (!is_highmem_idx(j))
			nr_kernel_pages += realsize;
		nr_all_pages += realsize;

		zone->spanned_pages = size;
		zone->present_pages = realsize;
		zone->name = zone_names[j];
		spin_lock_init(&zone->lock);
		spin_lock_init(&zone->lru_lock);
		zone_seqlock_init(zone);	// an empty statement
		zone->zone_pgdat = pgdat;

		zone->prev_priority = DEF_PRIORITY;

		zone_pcp_init(zone);
		INIT_LIST_HEAD(&zone->active_list);
		INIT_LIST_HEAD(&zone->inactive_list);
		zone->nr_scan_active = 0;
		zone->nr_scan_inactive = 0;
		zap_zone_vm_stats(zone);	// clears the vm_stat members
		atomic_set(&zone->reclaim_in_progress, 0);
		if (!size)
			continue;

		ret = init_currently_empty_zone(zone, zone_start_pfn,
						size, MEMMAP_EARLY);
		BUG_ON(ret);
		zone_start_pfn += size;
	}
}

When execution reaches this point, pgdat->node_id is 0, and pgdat->node_start_pfn is also 0.
From the code above, nr_kernel_pages and nr_all_pages both count the usable pages, covering the memory range from 0 to 60MB but excluding the pages occupied by the page array. For 64MB SDRAM (actually limited to 60MB) with MTD disabled, their value is 0x3b6a.

The code also shows that zone->spanned_pages and zone->present_pages both describe the number of usable SDRAM pages, but present_pages is spanned_pages minus the pages occupied by the page array. For 64MB SDRAM with MTD disabled, memory is actually limited to 60MB: spanned_pages is 0x3bff, while present_pages is 0x3b6a.

1.1.1.2 init_currently_empty_zone

This function is called for every zone whose size is non-zero; in practice the kernel only calls it for ZONE_DMA. It is located in mm/page_alloc.c:

__meminit int init_currently_empty_zone(struct zone *zone,
					unsigned long zone_start_pfn,
					unsigned long size,
					enum memmap_context context)
{
	struct pglist_data *pgdat = zone->zone_pgdat;
	int ret;
	ret = zone_wait_table_init(zone, size);
	if (ret)
		return ret;
	pgdat->nr_zones = zone_idx(zone) + 1;

	zone->zone_start_pfn = zone_start_pfn;

	memmap_init(size, pgdat->node_id, zone_idx(zone), zone_start_pfn);

	zone_init_free_lists(pgdat, zone, zone->spanned_pages);

	return 0;
}

When this function is called, zone_start_pfn is 0, size is the number of pages in the whole SDRAM region (0x3bff for 64MB memory, actually limited to 60MB), and context is MEMMAP_EARLY.

zone_wait_table_init computes the values of the wait_table-related members of the zone.

memmap_init sets initial values for every page structure.

zone_init_free_lists initializes the free_area members used by the buddy algorithm.

1.1.1.3 zone_wait_table_init

The kernel's own comment already explains the three wait_table-related members of struct zone quite clearly:

	/*
	 * wait_table		-- the array holding the hash table
	 * wait_table_hash_nr_entries	-- the size of the hash table array
	 * wait_table_bits	-- wait_table_size == (1 << wait_table_bits)
	 *
	 * The purpose of all these is to keep track of the people
	 * waiting for a page to become available and make them
	 * runnable again when possible. The trouble is that this
	 * consumes a lot of space, especially when so few things
	 * wait on pages at a given time. So instead of using
	 * per-page waitqueues, we use a waitqueue hash table.
	 *
	 * The bucket discipline is to sleep on the same queue when
	 * colliding and wake all in that wait queue when removing.
	 * When something wakes, it must check to be sure its page is
	 * truly available, a la thundering herd. The cost of a
	 * collision is great, but given the expected load of the
	 * table, they should be so rare as to be outweighed by the
	 * benefits from the saved space.
	 *
	 * __wait_on_page_locked() and unlock_page() in mm/filemap.c, are the
	 * primary users of these fields, and in mm/page_alloc.c
	 * free_area_init_core() performs the initialization of them.
	 */

Here is how they are initialized:

static noinline __init_refok
int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages)
{
	int i;
	struct pglist_data *pgdat = zone->zone_pgdat;
	size_t alloc_size;

	/*
	 * The per-page waitqueue mechanism uses hashed waitqueues
	 * per zone.
	 */
	zone->wait_table_hash_nr_entries =
		 wait_table_hash_nr_entries(zone_size_pages);
	zone->wait_table_bits =
		wait_table_bits(zone->wait_table_hash_nr_entries);
	alloc_size = zone->wait_table_hash_nr_entries
					* sizeof(wait_queue_head_t);

	if (system_state == SYSTEM_BOOTING) {
		zone->wait_table = (wait_queue_head_t *)
			alloc_bootmem_node(pgdat, alloc_size);
	} else {
		/*
		 * This case means that a zone whose size was 0 gets new memory
		 * via memory hot-add.
		 * But it may be the case that a new node was hot-added.  In
		 * this case vmalloc() will not be able to use this new node's
		 * memory - this wait_table must be initialized to use this new
		 * node itself as well.
		 * To use this new node's memory, further consideration will be
		 * necessary.
		 */
		zone->wait_table = (wait_queue_head_t *)vmalloc(alloc_size);
	}
	if (!zone->wait_table)
		return -ENOMEM;

	for (i = 0; i < zone->wait_table_hash_nr_entries; ++i)
		init_waitqueue_head(zone->wait_table + i);

	return 0;
}

The interesting part here is the sizing of the hash table, done by wait_table_hash_nr_entries:

/*
 * Helper functions to size the waitqueue hash table.
 * Essentially these want to choose hash table sizes sufficiently
 * large so that collisions trying to wait on pages are rare.
 * But in fact, the number of active page waitqueues on typical
 * systems is ridiculously low, less than 200. So this is even
 * conservative, even though it seems large.
 *
 * The constant PAGES_PER_WAITQUEUE specifies the ratio of pages to
 * waitqueues, i.e. the size of the waitq table given the number of pages.
 */
#define PAGES_PER_WAITQUEUE	256

static inline unsigned long wait_table_hash_nr_entries(unsigned long pages)
{
	unsigned long size = 1;

	pages /= PAGES_PER_WAITQUEUE;

	while (size < pages)
		size <<= 1;

	/*
	 * Once we have dozens or even hundreds of threads sleeping
	 * on IO we've got bigger problems than wait queue collision.
	 * Limit the size of the wait table to a reasonable size.
	 */
	size = min(size, 4096UL);

	return max(size, 4UL);
}

Here pages is the total number of pages in the memory region; for 64MB memory (limited to 60MB), it is 0x3bff, and the function returns 0x40. So zone->wait_table_hash_nr_entries will be 0x40, and zone->wait_table_bits will be 6.

1.1.1.4 zone_init_free_lists

void zone_init_free_lists(struct pglist_data *pgdat, struct zone *zone,
				unsigned long size)
{
	int order;
	for (order = 0; order < MAX_ORDER; order++) {
		INIT_LIST_HEAD(&zone->free_area[order].free_list);
		zone->free_area[order].nr_free = 0;
	}
}

#define MAX_ORDER 11

In the buddy algorithm, free pages are kept in 11 block lists, holding blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 contiguous pages respectively. These lists are represented in struct zone by:

	/*
	 * free areas of different sizes
	 */
	spinlock_t		lock;
	struct free_area	free_area[MAX_ORDER];

and this function does nothing more than initialize that member.
