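When developing for embedded Linux you will often run into a segmentation fault (SIGSEGV). It is usually caused by accessing memory through a bad pointer. You can track such errors down by enabling the kernel's fault messages and then using gdb to map the faulting address back to the source code.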
1. Enable the kernel's fault messages:
In the MIPS kernel, the following block in do_page_fault() (line 133 of arch/mips/mm/fault.c) is compiled out with #if 0:
#if 0
printk("do_page_fault() #2: sending SIGSEGV to %s for "
"invalid %s\n%0*lx (epc == %0*lx, ra == %0*lx)\n",
tsk->comm,
write ? "write access to" : "read access from",
field, address,
field, (unsigned long) regs->cp0_epc,
field, (unsigned long) regs->regs[31]);
#endif
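When a segmentation fault occurs, this code prints the name of the faulting process, the faulting address, the exception program counter (epc) and the return address (ra). Enable it (change #if 0 to #if 1) to get the address of the instruction that triggered the fault. You can also add calls that dump the registers and the kernel stack, for example: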
#if 1
printk("do_page_fault() #2: sending SIGSEGV to %s for "
"invalid %s\n%0*lx (epc == %0*lx, ra == %0*lx)\n",
tsk->comm,
write ? "write access to" : "read access from",
field, address,
field, (unsigned long) regs->cp0_epc,
field, (unsigned long) regs->regs[31]);
show_registers(regs);
dump_stack();
#endif
2. Locate the code that caused the fault
The resulting error output:
do_page_fault() #2: sending SIGSEGV to myprog for invalid read access from
00000004 (epc == 00479628, ra == 00482228)
.......
Lo : 00000001
epc : 00479628 0x479628 Not tainted
ra : 00482228 0x482228
Status: 0000f413 USER EXL IE
Cause : 00800008
BadVA : 00000004
PrId : 00019374
Modules linked in: pppoe ppp_async ppp_deflate ppp_mppe pppox ppp_generic slhc
Process myprog (pid: 7487, threadinfo=81096000, task=811135b8)
Stack : 2ab4f360 00000001 100055a0 004d0000 1000b320 004d0000 100055a0 00482228
10036248 004d0000 2ab4f360 00000001 1000b320 004d0000 004cac44 004825f0
00000000 004ba948 004bebe0 004ccce8 004cac44 000000a8 100055a4 0043886c
1000b320 0040d8d0 00000000 0040d8d0 00000000 10003494 0040ddfc 0040dd18
004b2250 004b2278 004b2284 0040d83c 1000b320 00000000 1000b320 7fcdff80
...
Call Trace:
Code: 00000000 1090000a 00000000 <8c820004> 8c830000 8f998078 ac430000 0320f809 ac620004
Call Trace:
[<8006dc04>] do_page_fault+0x214/0x430
[<8006dbfc>] do_page_fault+0x20c/0x430
[<800a881c>] handle_IRQ_event+0x70/0xfc
[<800a89e4>] __do_IRQ+0x13c/0x158
[<8006e328>] tlb_do_page_fault_0+0xf8/0x100
[<8006130c>] ar7100_interrupt_receive+0xec/0x100
Segmentation fault
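In the log, the faulting instruction address (the epc register) is 0x00479628 and the return address (the ra register) is 0x00482228. Use gdb to map these addresses to source lines. Note that myprog must be compiled with -g -ggdb; otherwise it contains no debug information.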
#mipseb-linux-uclibc-gdb myprog
(gdb) l *(0x00479628)
0x479628 is in DropAll (../../../include/util.h:107).
102 *
103 * This is only for internal list manipulation where we know
104 * the prev/next entries already!
105 */
106 static inline void __list_del(struct list_head *prev, struct list_head *next)
107 {
108 next->prev = prev;
109 prev->next = next;
110 }
111
(gdb) l *(0x00482228)
0x482228 is in FW_CleanEnv (firewall.c:325).
320 {
321 g_apstFwHashTable[i] = NULL;
322 }
323
324 /* ?3?¢ */
325 DropAll(&g_lLanDev);
326 DropAll(&g_lWanDev);
327 }
328
329
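So the DropAll(&g_lLanDev) call on line 325 of FW_CleanEnv() triggered the segmentation fault. More precisely, the fault occurred in __list_del(), the inline function at util.h:107 that DropAll() uses.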