Booting ARM Linux SMP on MPCore
See also
Running ARM Linux on a software model
Credits
Author
Charly Bechara [c_becharahotmail.com]
Using facilities kindly provided by
NXP
It is important to understand what happens from the moment the power button is switched on until the command shell appears with all 4 CPU cores running. The boot process of an embedded Linux kernel differs from that of a PC, mainly because the environment settings and the available hardware change from one platform to another. For example, an embedded system doesn’t have a hard disk or a PC BIOS, but includes a boot monitor and flash memories. So basically, the main difference between each architecture’s boot process lies in the application used to find and load the kernel. Once the kernel is in memory, the same sequence of events occurs on all CPU architectures, with some functionality overridden by code specific to each of them.
The Linux boot process can be represented in 3 stages as shown in Figure 1:
When the system is powered on, the Boot Monitor code executes from a predefined address in the NOR flash memory (0x00000000). The Boot Monitor initializes the PB11MPCore hardware peripherals and then launches the real bootloader, U-Boot, if an automatic script is provided; otherwise the user runs U-Boot manually by entering the appropriate command in the Boot Monitor command shell. U-Boot initializes the main memory and copies the compressed Linux kernel image (uImage), which is located on the on-board NOR flash memory, MMC, CompactFlash or a host PC, to the main memory to be executed by the ARM11 MPCore, after passing some initialization parameters to the kernel. The Linux kernel image then decompresses itself, starts initializing its data structures, creates some user processes, boots all the CPU cores and finally runs the command shell environment in user space.
This was a brief introduction to the whole boot process. In the next sections, we will explain each stage in detail and highlight the Linux source code that executes the corresponding stage.
1 System startup (Boot Monitor)
When the system is powered on or reset, all CPUs of the ARM11 MPCore fetch their next instruction from the reset vector address into their PC register. In our case, this is the first address in the NOR flash memory (0x00000000), where the Boot Monitor program resides. Only CPU0 continues to execute the Boot Monitor code; the secondary CPUs (CPU1, CPU2 and CPU3) execute a WFI instruction inside a loop that checks the value of the SYS_FLAGS register. The secondary CPUs start executing meaningful code during the Linux kernel boot process, which is explained in detail later in the ARM Linux section.
The Boot Monitor is the standard ARM application that runs when the system is booted and is built with the ARM platform library.
On reset, the Boot Monitor performs the following actions:
- Executes the main code on CPU0 and the WFI instruction on the secondary CPUs
- Initializes the memory controllers and configures the main board peripherals
- Sets up a stack in memory
- Copies itself to the main memory (DRAM)
- Resets the boot memory remapping
- Remaps and redirects the C library I/O routines depending on the settings of the switches on the front panel of the PB11MPCore (output: UART0 or LCD; input: UART0 or keyboard)
- Runs a boot script automatically if one exists in the NOR flash memory and the corresponding switch on the front panel of the PB11MPCore is ON; otherwise, the Boot Monitor command shell prompt is displayed
So basically, the Boot Monitor application shipped with the board is similar to the BIOS in a PC. It has limited functionality and cannot boot a Linux kernel image, so another bootloader, U-Boot, is needed to complete the boot process. The U-Boot code is cross-compiled for the ARM platform and flashed to the NOR flash memory. The final step is to launch the U-Boot image from the Boot Monitor command line, either with a script or manually by entering the appropriate command.
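The holding loop that the secondary CPUs sit in can be pictured with a small host-side model. This is only a sketch under assumptions: the board_regs_t structure, the convention that a non-zero SYS_FLAGS value is a published start address, and the wake-up bound are all made up for illustration and are not taken from the Boot Monitor source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the board register block: sys_flags == 0 is
 * assumed to mean "no start address has been published yet". */
typedef struct {
    volatile uint32_t sys_flags;
} board_regs_t;

/* Stand-in for the WFI instruction. On real hardware the core would sleep
 * here until an interrupt arrives; this model only counts the calls. */
static unsigned wfi_count;

static void wait_for_interrupt(void)
{
    wfi_count++;
}

/* Holding loop of one secondary CPU: check SYS_FLAGS, sleep, check again.
 * max_wakeups bounds the loop so the sketch terminates on a host; the
 * real loop has no such limit. Returns the published address, or 0. */
static uint32_t secondary_hold(board_regs_t *regs, unsigned max_wakeups)
{
    assert(regs != NULL);
    for (unsigned i = 0; i < max_wakeups; i++) {
        uint32_t entry = regs->sys_flags;
        if (entry != 0)
            return entry;   /* real code would branch to this address */
        wait_for_interrupt();
    }
    return 0;
}
```

As long as sys_flags stays zero the CPU keeps going back to sleep; once CPU0 stores an address there, the next wake-up leaves the loop.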
2 Bootloader (U-Boot)
When the bootloader is called by the Boot Monitor, it is located in the NOR flash memory without access to system RAM, because the memory controller is not yet initialized the way U-Boot expects. So how does U-Boot move itself from the flash memory to the main memory?
In order to get the C environment working properly and run the initialization code, U-Boot needs to allocate a minimal stack. On the ARM11 MPCore, this is done in a locked part of the L1 data cache. In this way, the cache is used as temporary data storage to initialize U-Boot before the SDRAM controller is set up. U-Boot then initializes the ARM11 MPCore, its caches and the SCU. Next, all available memory banks are mapped using a preliminary mapping, and a simple memory test is run to determine the size of the SDRAM banks. Finally, the bootloader installs itself at the upper end of the SDRAM area and allocates memory for use by malloc() and for the global board info data. The exception vector code is copied to low memory, and the final stack is then set up.
At this stage, the second-stage bootloader U-Boot is in main memory and a C environment is set up. The bootloader is ready to launch the Linux kernel image from a pre-specified location after passing some boot parameters to it. In addition, it initializes a serial or video console for the kernel. Finally, it calls the kernel image by jumping directly to the ‘start’ label in the arch/arm/boot/compressed/head.S assembly file, which is the start of the Linux kernel decompressor.
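Before U-Boot launches an image it has to trust it, so it may help to see how the uImage wrapper can be validated. The sketch below assumes the legacy 64-byte U-Boot header layout (big-endian fields, magic number 0x27051956, header CRC computed with its own field zeroed). The names uimage_info_t, uimage_check and crc32_calc are invented for this example and are not U-Boot functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UIMAGE_MAGIC   0x27051956u  /* magic number of the legacy format */
#define UIMAGE_HDR_LEN 64u          /* legacy header size in bytes */

/* The few header fields this sketch extracts. */
typedef struct {
    uint32_t size;   /* length of the data portion in bytes */
    uint32_t load;   /* address the data should be loaded at */
    uint32_t entry;  /* entry point address */
} uimage_info_t;

/* Header fields are stored big-endian. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320): slow but table-free. */
static uint32_t crc32_calc(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 0 if the magic and both checksums match, a negative code otherwise. */
static int uimage_check(const uint8_t *img, size_t len, uimage_info_t *out)
{
    uint8_t hdr[UIMAGE_HDR_LEN];

    assert(img != NULL && out != NULL);
    if (len < UIMAGE_HDR_LEN)
        return -1;                          /* too short to hold a header */
    memcpy(hdr, img, UIMAGE_HDR_LEN);
    if (be32(hdr + 0) != UIMAGE_MAGIC)
        return -2;                          /* wrong magic number */
    uint32_t stored_hcrc = be32(hdr + 4);
    memset(hdr + 4, 0, 4);                  /* CRC field is zero while hashing */
    if (crc32_calc(hdr, UIMAGE_HDR_LEN) != stored_hcrc)
        return -3;                          /* corrupted header */
    out->size  = be32(hdr + 12);
    out->load  = be32(hdr + 16);
    out->entry = be32(hdr + 20);
    if (out->size > len - UIMAGE_HDR_LEN)
        return -4;                          /* data portion is truncated */
    if (crc32_calc(img + UIMAGE_HDR_LEN, out->size) != be32(hdr + 24))
        return -5;                          /* corrupted data */
    return 0;
}
```

A loader built along these lines would refuse to jump into an image for which uimage_check() returns anything other than 0.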
The bootloader [or boot monitor/loader combination - Peter Pearse] can perform many functions; however, a minimal set of requirements must always be met:
- Configure the system's main memory:
The Linux kernel has no knowledge of the setup or configuration of the RAM within a system. It is the task of the bootloader to find and initialize, in a machine-dependent manner, all the RAM that the kernel will use for volatile data storage, and then to pass the physical memory layout to the kernel using the ATAG_MEM parameter, which is explained later.
- Load the kernel image at the correct memory address:
The ‘uImage’ encapsulates a compressed Linux kernel image with header information that is marked by a special magic number, and a data portion. Both the header and the data are protected against corruption by a CRC32 checksum. In the data field, the start and end offsets of the image are stored; they are used to determine the length of the compressed image in order to know how much memory must be allocated. The ARM Linux kernel expects to be loaded at address 0x7fc0 in main memory (the 64-byte legacy uImage header then ends at 0x8000).
- Initialize a console:
Since a serial console is essential on all platforms to allow communication with the target and early kernel debugging, the bootloader should initialize and enable one serial port on the target. It then passes the relevant console parameter option to the kernel to inform it of the already enabled port.
- Initialize the boot parameters to pass to the kernel:
The bootloader must pass parameters to the kernel in the form of tags that describe the setup it has performed, the size and shape of memory in the system and, optionally, numerous other values as described in Table 1:
Tag name          Description
ATAG_NONE         Empty tag used to end list
ATAG_CORE         First tag used to start list
ATAG_MEM          Describes a physical area of memory
ATAG_VIDEOTEXT    Describes a VGA text display
ATAG_RAMDISK      Describes how the ramdisk will be used in kernel
ATAG_INITRD2      Describes where the compressed ramdisk image is placed in memory
ATAG_SERIAL       64 bit board serial number
ATAG_REVISION     32 bit board revision number
ATAG_VIDEOLFB     Initial values for vesafb-type framebuffers
ATAG_CMDLINE      Command line to pass to kernel
- Obtain the ARM Linux machine type:
The bootloader should provide the machine type of the ARM system, which is a simple unique number that identifies the platform. It can be hard-coded in the source code since it is predefined, or read from some board registry. The machine type number can be fetched from the ARM-Linux project website.
- Enter the kernel with the appropriate register values:
Finally, and before starting execution of the Linux kernel image, the ARM11 MPCore registers must be set in an appropriate way:
- Supervisor (SVC) mode
- IRQ and FIQ interrupts disabled
- MMU off (no translation of memory addresses is required)
- Data cache off
- Instruction cache may be either on or off
- CPU register0 = 0
- CPU register1 = ARM Linux machine type
- CPU register2 = physical address of the parameter list
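- In other words, U-Boot places these parameters in r0, r1 and r2 respectively, which is why the last function U-Boot calls is theKernel(0, S3C2440_MATHINE_TYPE, DRAM_TAGS_START); in the disassembly, following the APCS calling convention, r0, r1 and r2 are loaded with the corresponding values.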
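To make the tagged list concrete, here is a minimal sketch that builds an ATAG list in an ordinary buffer and walks it again the way a parser would. The tag IDs and the header layout (size in 32-bit words, then tag ID) follow the ARM Linux tagged-list convention as far as I know it. The function names, the 4096-byte page size, the RAM size and the command line used in the usage note are illustrative assumptions, not values from the PB11MPCore port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATAG_NONE    0x00000000u
#define ATAG_CORE    0x54410001u
#define ATAG_MEM     0x54410002u
#define ATAG_CMDLINE 0x54410009u

/* Shape of the kernel entry call: r0 = 0, r1 = machine type, r2 = physical
 * address of the tag list. Shown for reference only; never called here. */
typedef void (*kernel_entry_t)(uint32_t zero, uint32_t machine_type,
                               uint32_t atags_phys);

/* Builds CORE, MEM, CMDLINE and NONE tags in buf.
 * Returns the number of 32-bit words used, or 0 if buf is too small. */
static size_t build_atags(uint32_t *buf, size_t cap_words,
                          uint32_t ram_start, uint32_t ram_size,
                          const char *cmdline)
{
    assert(buf != NULL && cmdline != NULL);

    size_t cmd_bytes = strlen(cmdline) + 1;      /* include the NUL */
    size_t cmd_words = (cmd_bytes + 3) / 4;      /* round up to whole words */
    size_t need = 5 + 4 + (2 + cmd_words) + 2;
    uint32_t *p = buf;

    if (need > cap_words)
        return 0;

    /* ATAG_CORE must be the first tag in the list. */
    p[0] = 5; p[1] = ATAG_CORE;
    p[2] = 0;        /* flags */
    p[3] = 4096;     /* page size (assumed) */
    p[4] = 0;        /* root device */
    p += p[0];

    /* One ATAG_MEM per physical memory bank. */
    p[0] = 4; p[1] = ATAG_MEM;
    p[2] = ram_size;
    p[3] = ram_start;
    p += p[0];

    /* Command line, zero-padded to a word boundary. */
    p[0] = (uint32_t)(2 + cmd_words); p[1] = ATAG_CMDLINE;
    memset(&p[2], 0, cmd_words * 4);
    memcpy(&p[2], cmdline, cmd_bytes);
    p += p[0];

    /* ATAG_NONE terminates the list; its size field is 0. */
    p[0] = 0; p[1] = ATAG_NONE;
    p += 2;

    return (size_t)(p - buf);
}

/* Walks the list by hopping over each tag's size until ATAG_NONE. */
static const uint32_t *find_atag(const uint32_t *list, uint32_t tag)
{
    for (const uint32_t *t = list; t[0] != 0; t += t[0])
        if (t[1] == tag)
            return t;
    return NULL;
}
```

For example, build_atags(buf, 64, 0x00000000, 0x08000000, "console=ttyAMA0") describes a single hypothetical 128 MB bank starting at address 0, and find_atag(buf, ATAG_MEM) then returns a pointer to that bank's tag. On real hardware the physical address of buf is what would be passed in r2.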
3 ARM Linux
As mentioned earlier, the bootloader jumps to the compressed kernel image code and passes some initialization parameters in the form of ATAGs. The beginning of the compressed Linux kernel image is the ‘start’ label in the arch/arm/boot/compressed/head.S assembly file. From this point, the boot process comprises 3 main stages. First, the kernel decompresses itself. Then the processor-dependent (ARM11 MPCore) kernel code executes, which initializes the CPU and memory. Finally, the processor-independent kernel code executes, which starts up the ARM Linux SMP kernel by booting all the ARM11 cores and initializes all the kernel components and data structures.
The flowchart in Figure 2 summarizes the boot process of the ARM Linux kernel:
In the Linux SMP environment, CPU0 is responsible for initializing all resources, just as in a uniprocessor environment. Once configured, access to a resource is tightly controlled using synchronization mechanisms such as spinlocks. CPU0 configures the boot page translation so that secondary cores boot from a dedicated section of Linux rather than the default reset vector. When secondary cores boot the same Linux image, they enter Linux at a specific location, so they initialize only the resources specific to their own core (caches, MMU), do not reinitialize resources that have already been configured, and then execute the idle process with PID 0.
A step-by-step walkthrough of the Linux kernel boot process is provided below:
This appendix provides a walkthrough of the Linux kernel boot process for ARM-based systems, specifically the ARM11 MPCore, by highlighting the kernel source code that executes each step. The boot process comprises 3 main stages:
a) Image decompression:
- U-Boot jumps to the ‘start’ label in arch/arm/boot/compressed/head.S
- The parameters passed by U-Boot in r1 (CPU architecture ID) and r2 (ATAG parameter list pointer) are saved
- Turn off interrupts, execute architecture-dependent code, and adjust addresses if the kernel is relocated
- The C environment is now sufficiently set up
- Turn on the cache memory again by calling the cache_on procedure, which walks through the proc_types list and finds the corresponding ARM architecture. For the ARM11 MPCore (ARM v6), the __armv4_mmu_cache_on, __armv4_mmu_cache_off and __armv6_mmu_cache_flush procedures are called to turn on, turn off and flush the cache memory to RAM respectively
- Assign the appropriate values to the registers and stack pointer, i.e. r4 = kernel physical start address, sp = decompressor code
- Check whether the decompressed image will overwrite the compressed image and jump to the appropriate routine
- Call the decompressor routine decompress_kernel(), which is located in arch/arm/boot/compressed/misc.c. decompress_kernel() displays the “Uncompressing Linux...” message on the output terminal, calls the gunzip() function, and then displays the “ done, booting the kernel” message
- Flush the cache memory contents to RAM using __armv6_mmu_cache_flush
- Turn off the cache using __armv4_mmu_cache_off, because the kernel initialization routines expect the cache memory to be off at the beginning
- Jump to the start of the kernel in RAM, whose address is stored in the r4 register. The kernel start address is specific to each platform architecture. For the PB11MPCore, it is stored in arch/arm/mach-realview/Makefile.boot in the zreladdr-y variable (zreladdr-y := 0x00008000)
b) Processor dependent (ARM) specific kernel code:
The kernel startup entry point is the stext procedure in the arch/arm/kernel/head.S file, to which the decompressor jumps after turning off the MMU and cache memory and setting the appropriate registers. At this stage, the following sequence of events takes place in stext (arch/arm/kernel/head.S):
- Switch to Supervisor protected mode and disable all interrupts
- Look up the processor type using the __lookup_processor_type procedure defined in arch/arm/kernel/head-common.S. This returns a pointer to a proc_info_list (defined in arch/arm/include/asm/procinfo.h) filled with CPU-specific code in arch/arm/mm/proc-v6.S:__v6_proc_info for the ARM11 MPCore
- Look up the machine type using the __lookup_machine_type procedure defined in arch/arm/kernel/head-common.S. This returns a pointer to a machine_desc struct defined for the PB11MPCore
- Create the page tables using the __create_page_tables procedure, which sets up the barest amount of page tables required to get the kernel running; in other words, to map in the kernel code
- Jump to the __v6_setup procedure in arch/arm/mm/proc-v6.S, which initializes the TLB, cache and MMU state of CPU0
- Enable the MMU using the __enable_mmu procedure, which sets up some configuration bits and then calls __turn_mmu_on (arch/arm/kernel/head.S)
- In __turn_mmu_on, the appropriate control registers are set and then it jumps to __switch_data, which executes the first procedure __mmap_switched (arch/arm/kernel/head-common.S)
- In the __mmap_switched procedure, the data segment is copied to RAM and the BSS segment is cleared. Finally, it jumps to the start_kernel() routine in init/main.c, where the Linux kernel starts
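The processor type lookup in the steps above is essentially a table walk with a value/mask comparison against the CPU ID register. The C sketch below shows that idea only. The real code is assembly, and the table entries, names and CPU ID values used here are assumptions chosen for illustration rather than copies of the kernel's tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a proc_info_list entry. */
typedef struct {
    uint32_t cpu_val;    /* expected value after masking */
    uint32_t cpu_mask;   /* which CPU ID bits are significant */
    const char *name;    /* illustrative label only */
} proc_info_t;

/* Hypothetical table; the real entries also carry setup functions. */
static const proc_info_t proc_table[] = {
    { 0x00069000u, 0x000ff000u, "example-v5" },
    { 0x0007b000u, 0x0007f000u, "example-v6" },
};

/* Returns the first entry whose masked ID matches, or NULL if the
 * processor is unknown (the kernel would stop booting in that case). */
static const proc_info_t *lookup_processor_type(uint32_t cpu_id)
{
    size_t n = sizeof proc_table / sizeof proc_table[0];

    for (size_t i = 0; i < n; i++)
        if ((cpu_id & proc_table[i].cpu_mask) == proc_table[i].cpu_val)
            return &proc_table[i];
    return NULL;
}
```

With these example entries, an ID such as 0x410FB022 (assumed here to resemble an ARM11 MPCore main ID value) selects the second entry, because the mask ignores every bit except the ones that identify the architecture family.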
c) Processor independent kernel code
From this stage on, the sequence of events in the Linux kernel boot process is common to all hardware architectures. Some functions are still hardware dependent, however, and they override the architecture-independent implementation. We will concentrate mainly on how the SMP part of Linux boots and how the CPUs in the ARM11 MPCore are initialized.
In start_kernel(): (init/main.c)
- Disable the interrupts on CPU0 using local_irq_disable() (include/linux/irqflags.h)
- Lock the kernel using lock_kernel() to prevent it from being interrupted or preempted by high-priority interrupts (include/linux/smp_lock.h)
- Initialize the kernel tick control using tick_init() (kernel/time/tick-common.c)
- Activate the first processor (CPU0) using boot_cpu_init() (init/main.c)
- Initialize the memory subsystem using page_address_init() (mm/highmem.c)
- Display the kernel version on the console using printk(linux_banner) (init/version.c)
- Set up architecture-specific subsystems such as memory, I/O, processors, etc. using setup_arch(&command_line). The command_line is the parameter list passed by U-Boot when calling the kernel. (arch/arm/kernel/setup.c)
- In the setup_arch(&command_line) function, architecture-dependent code is executed. For the ARM11 MPCore, smp_init_cpus() is called, which initializes the CPU map. It is at this stage that the kernel learns that there are 4 cores in the ARM11 MPCore. (arch/arm/mach-realview/platsmp.c)
- Initialize one processor (CPU0 in this case) using cpu_init(), which dumps the cache information, initializes SMP-specific information, and sets up the per-CPU stacks (arch/arm/kernel/setup.c)
- Set up a multiprocessing environment using setup_per_cpu_areas(). This function determines the amount of memory a single CPU requires, then allocates and initializes the memory for each CPU (4 CPUs). This way, each CPU has its own region in which to place its data. (init/main.c)
- Allow the booting processor (CPU0) to access its own storage data already initialized using smp_prepare_boot_cpu() (arch/arm/kernel/smp.c)
- Setup the Linux scheduler using sched_init() (kernel/sched.c)
- Initialize a runqueue for each of the 4 CPUs with its corresponding data (kernel/sched.c)
- Fork an idle thread for CPU0 using init_idle(current, smp_processor_id()) (kernel/sched.c)
- Initialize the memory zones such as DMA, normal and high memory using build_all_zonelists() (mm/page_alloc.c)
- Parse the arguments passed to the Linux kernel using parse_early_param() (init/main.c) and parse_args() (kernel/params.c)
- Initialize the interrupt table, the GIC and the trap exception vectors using init_IRQ() (arch/arm/kernel/irq.c) and trap_init() (arch/arm/kernel/traps.c). Also assign the processor affinity for each interrupt.
- Prepare the boot CPU (CPU0) to accept notifications from tasklets using softirq_init() (kernel/softirq.c)
- Initialize and run the system timer using time_init() (arch/arm/kernel/time.c)
- Enable the local interrupts on CPU0 using local_irq_enable() (include/linux/irqflags.h)
- Initialize the console terminal using console_init() (drivers/char/tty_io.c)
- Find the total number of free pages in all memory zones using mem_init() (arch/arm/mm/init.c)
- Initialize the slab allocator using kmem_cache_init() (mm/slab.c)
- Determine the speed of the CPU clock in BogoMIPS using calibrate_delay() (init/calibrate.c)
- Initialize the kernel internal components such as page tables, slab caches, VFS, buffers, signal queues, the maximum number of threads and processes, etc.
- Initialize the proc/ filesystem using proc_root_init() (fs/proc/root.c)
- Call rest_init(), which will create Process 1
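The setup_per_cpu_areas() step in the list above can be pictured with a small user-space sketch: one template of per-CPU data is replicated once per CPU, and each CPU reaches its own copy through an offset. The structure, the function names and the alignment parameter below are all hypothetical; the kernel's real implementation works on a linker section rather than a C struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 4

/* Hypothetical set of per-CPU variables. */
typedef struct {
    unsigned long ticks;
    int runqueue_len;
} percpu_data_t;

static const percpu_data_t percpu_template = { 0, 0 };
static unsigned char *percpu_base;   /* start of all per-CPU areas */
static size_t percpu_unit;           /* bytes reserved for each CPU */

/* Determines the size one CPU needs (rounded up to 'align', which is
 * assumed to be a power of two such as a cache-line size), allocates
 * room for every CPU and initializes each copy from the template.
 * Returns 0 on success, -1 if the allocation fails. */
static int setup_per_cpu_areas_sketch(size_t align)
{
    assert(align != 0);
    percpu_unit = (sizeof(percpu_data_t) + align - 1) / align * align;
    percpu_base = malloc(percpu_unit * NR_CPUS);
    if (percpu_base == NULL)
        return -1;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        memcpy(percpu_base + (size_t)cpu * percpu_unit,
               &percpu_template, sizeof percpu_template);
    return 0;
}

/* Returns the private copy that belongs to the given CPU. */
static percpu_data_t *per_cpu_ptr_sketch(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS && percpu_base != NULL);
    return (percpu_data_t *)(percpu_base + (size_t)cpu * percpu_unit);
}
```

Because every CPU only touches its own copy, no locking is needed for this data, and rounding each area up to a cache-line multiple keeps two CPUs from sharing a line.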
In rest_init(): (init/main.c)
- Create the init process, also called Process 1, using kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND)
- Create the kernel thread daemon, which is the parent of all kernel threads and has PID 2, using pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES) (kernel/kthread.c)
- Release the kernel lock that was taken at the beginning of start_kernel() using unlock_kernel() (include/linux/smp_lock.h)
- Call schedule() to start running the scheduler (kernel/sched.c)
- Execute the CPU idle thread on CPU0 using cpu_idle(). This thread yields CPU0 to the scheduler and is returned to when the scheduler has no other pending process to run on CPU0. The CPU idle thread tries to conserve power and keep overall latency low (arch/arm/kernel/process.c)
In kernel_init(): (init/main.c)
- Start preparing the SMP environment by calling smp_prepare_cpus() (arch/arm/mach-realview/platsmp.c)
  - Enable the local timer of the current processor, which is CPU0, using local_timer_setup(cpu) (arch/arm/mach-realview/localtimer.c)
  - Move data corresponding to CPU0 to its own storage using smp_store_cpu_info(cpu) (arch/arm/kernel/smp.c)
  - Initialize the present CPU map, which describes the set of CPUs actually populated at the present time, using cpu_set(i, cpu_present_map). This informs the kernel that there are 4 CPUs.
  - Initialize the Snoop Control Unit using scu_enable() (arch/arm/mach-realview/platsmp.c)
  - Call the poke_milo() function, which takes care of booting the secondary processors (arch/arm/mach-realview/platsmp.c)
    - poke_milo() triggers the other CPUs to execute the realview_secondary_startup procedure by clearing the lower 2 bits of the SYS_FLAGSCLR register and writing the physical address of the realview_secondary_startup procedure to SYS_FLAGSSET (arch/arm/mach-realview/headsmp.S)
    - In the realview_secondary_startup procedure, the secondary CPUs wait for a synchronization signal from the kernel (running on CPU0) indicating that they are ready to be initialized. When all the processors are ready, they are initialized using the secondary_startup procedure (arch/arm/mach-realview/headsmp.S)
    - The secondary_startup procedure performs operations similar to those of the stext procedure when CPU0 was booted: (arch/arm/mach-realview/headsmp.S)
      - Switch to Supervisor protected mode and disable all interrupts
      - Look up the processor type using the __lookup_processor_type procedure defined in arch/arm/kernel/head-common.S. This returns a pointer to a proc_info_list defined in arch/arm/mm/proc-v6.S for the ARM11 MPCore
      - Use the page tables supplied by __cpu_up for each of the CPUs (explained later in the cpu_up function)
      - Jump to the __v6_setup procedure in arch/arm/mm/proc-v6.S, which initializes the TLB, cache and MMU state of the corresponding secondary CPU
      - Enable the MMU using the __enable_mmu procedure, which sets up some configuration bits and then calls __turn_mmu_on (arch/arm/kernel/head.S)
      - In __turn_mmu_on, the appropriate control registers are set and then it jumps to __secondary_data, which executes the __secondary_switched procedure (arch/arm/kernel/head.S)
      - The __secondary_switched procedure jumps to the secondary_start_kernel routine in arch/arm/kernel/smp.c after setting the stack pointer to a thread structure allocated by the cpu_up function running on CPU0 (explained later)
      - secondary_start_kernel (arch/arm/kernel/smp.c) is the official start of the kernel for the secondary CPUs. It is considered a kernel thread running on the corresponding CPU (see previous step). In this thread, further initialization is done, such as:
        - Initialize the CPU using cpu_init(), which dumps the cache information, initializes SMP-specific information, and sets up the per-CPU stacks (arch/arm/kernel/setup.c)
        - Synchronize with the boot thread on CPU0 and enable some interrupts, such as the timer IRQ, in the corresponding CPU interface of the Distributed Interrupt Controller using the platform_secondary_init(cpu) function (arch/arm/mach-realview/platsmp.c)
        - Enable the local interrupts using local_irq_enable() and local_fiq_enable() (include/linux/irqflags.h)
        - Set up the local timer of the corresponding CPU using local_timer_setup(cpu) (arch/arm/mach-realview/localtimer.c)
        - Determine the speed of the CPU clock in BogoMIPS using calibrate_delay() (init/calibrate.c)
        - Move data corresponding to CPUx to its own storage using smp_store_cpu_info(cpu) (arch/arm/kernel/smp.c)
        - Execute the idle thread (which can also be called process 0) on the corresponding secondary CPU using cpu_idle(), which yields CPUx to the scheduler and is returned to when the scheduler has no other pending process to run on CPUx (arch/arm/kernel/process.c)
- Call smp_init() (init/main.c)
  - Boot every offline CPU, namely CPU1, CPU2 and CPU3, using cpu_up(cpu): (arch/arm/kernel/smp.c)
    - Create a new idle process manually using fork_idle(cpu) and assign it to the data structure of the corresponding CPU
    - Allocate initial page tables to allow the secondary CPU to enable the MMU safely using pgd_alloc()
    - Inform the secondary CPU where to find its stack and page tables
    - Boot the secondary CPU using boot_secondary(cpu, idle): (arch/arm/mach-realview/platsmp.c)
      - Synchronize between the boot processor (CPU0) and the secondary processor using the locking mechanism spin_lock(&boot_lock);
      - Inform the secondary processor that it can start booting its part of the kernel
      - Wake the secondary core up using smp_cross_call(mask_cpu), which sends a soft interrupt (include/asm-arm/mach-realview/smp.h)
      - Wait for the secondary core to finish the booting and calibration done by the secondary_start_kernel function (explained before)
    - Repeat this process for every secondary CPU
  - Display the kernel message “SMP: Total of 4 processors activated (334.02 BogoMIPS)” on the console using smp_cpus_done(max_cpus) (arch/arm/kernel/smp.c)
- Call sched_init_smp() (kernel/sched.c)
  - Build the scheduler domains using arch_init_sched_domains(&cpu_online_map), which sets the topology of the multicore (kernel/sched.c)
  - Check how many online CPUs exist and adjust the scheduler granularity value appropriately using sched_init_granularity() (kernel/sched.c)
- The do_basic_setup() function initializes the driver model using driver_init() (drivers/base/init.c), the sysctl interface, the network socket interface, and work queue support using init_workqueues(). Finally, it calls do_initcalls(), which initializes the built-in device driver routines (init/main.c)
- Call init_post() (init/main.c)
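The smp_init() and cpu_up() steps above boil down to a loop over CPU bitmaps: every CPU that is present but not yet online is brought up until the allowed maximum is reached. The sketch below models that loop in plain C. The cpu_maps_t structure, the callback type and fake_boot_secondary() are stand-ins invented for this example; the real kernel uses cpumask helpers and the platform's boot_secondary().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4

/* One bit per CPU in each map. */
typedef struct {
    uint32_t possible;   /* CPUs that could ever exist */
    uint32_t present;    /* CPUs actually populated */
    uint32_t online;     /* CPUs currently running the kernel */
} cpu_maps_t;

/* Stand-in for the platform hook: returns 0 if the CPU came up. */
typedef int (*boot_secondary_fn)(unsigned cpu);

/* Test double: every CPU boots except fake_fail_cpu (none by default). */
static unsigned fake_fail_cpu = NR_CPUS;

static int fake_boot_secondary(unsigned cpu)
{
    return cpu == fake_fail_cpu ? -1 : 0;
}

static unsigned num_online(const cpu_maps_t *m)
{
    unsigned n = 0;
    for (unsigned cpu = 0; cpu < NR_CPUS; cpu++)
        n += (m->online >> cpu) & 1u;
    return n;
}

/* Brings one CPU online; it is marked online only if the boot succeeded. */
static int cpu_up_sketch(cpu_maps_t *m, unsigned cpu, boot_secondary_fn boot)
{
    uint32_t bit = 1u << cpu;

    if (!(m->present & bit) || (m->online & bit))
        return -1;               /* absent, or already running */
    if (boot(cpu) != 0)
        return -2;               /* the secondary never checked in */
    m->online |= bit;
    return 0;
}

/* Boots present-but-offline CPUs until max_cpus are online.
 * Returns the number of CPUs online afterwards. */
static unsigned smp_init_sketch(cpu_maps_t *m, unsigned max_cpus,
                                boot_secondary_fn boot)
{
    assert(m != NULL && boot != NULL);
    for (unsigned cpu = 0; cpu < NR_CPUS; cpu++) {
        if (num_online(m) >= max_cpus)
            break;
        if ((m->present & (1u << cpu)) && !(m->online & (1u << cpu)))
            (void)cpu_up_sketch(m, cpu, boot);
    }
    return num_online(m);
}
```

Starting from a map where only CPU0 is online, smp_init_sketch() with max_cpus set to 4 ends with all four bits set, which corresponds to the "Total of 4 processors activated" message. A CPU whose boot hook fails is simply left offline and the loop moves on to the next one.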
In init_post() (init/main.c):
This is where we switch to user mode by trying the following processes in sequence:
run_init_process("/sbin/init");
run_init_process("/etc/init");
run_init_process("/bin/init");
run_init_process("/bin/sh");
The /sbin/init process executes and displays a lot of messages on the console; finally, it transfers control to the console and stays alive.
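The sequence of run_init_process() calls works as a fallback chain: each call only returns if the program could not be started, so the next candidate is tried. A minimal sketch of that pattern follows. The exec_fn callback and fake_exec() are hypothetical test doubles, and returning 0 to model a successful start is a simplification, since a real exec does not return on success.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for an exec-style call: 0 models "program started",
 * non-zero models "could not be executed". */
typedef int (*exec_fn)(const char *path);

static const char *const init_candidates[] = {
    "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
};

/* Test double: pretends that only fake_present exists on the root fs. */
static const char *fake_present;

static int fake_exec(const char *path)
{
    return (fake_present != NULL && strcmp(path, fake_present) == 0) ? 0 : -1;
}

/* Tries each candidate in order. Returns the index of the one that
 * started, or -1 if none did (the kernel would panic at that point). */
static int init_post_sketch(exec_fn try_exec)
{
    size_t n = sizeof init_candidates / sizeof init_candidates[0];

    assert(try_exec != NULL);
    for (size_t i = 0; i < n; i++)
        if (try_exec(init_candidates[i]) == 0)
            return (int)i;
    return -1;
}
```

On a root filesystem that only provides a shell, the first three attempts fail and the chain ends at /bin/sh.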
VOILA!
Possible problems
Code overwrite
Code started on the "idle" cores (i.e. those not running the kernel loader) during the boot process, and running in volatile memory, may be corrupted by the kernel before it starts those cores running kernel code.
This has been observed on some models.
Workaround
- Extend the memory used for initrd
- Start the idle code there
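In fact, from the A9 onwards, U-Boot should also set up and maintain the MMU 4K table. The A9 has a write buffer, and if a peripheral IP is not configured as device memory, the value you write actually stays in the cache instead of being written to the register immediately. So the mmu_table should be built while still in U-Boot, and its value handed to the P15 (CP15) coprocessor.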