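An Oops is the message the Linux kernel prints when it hits a fatal error. On the system used here, the message is logged to /var/log/messages.
The following was tested on SUSE 11 SP2.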
1. Prepare the test program
oopsdemo.c:
#include <linux/module.h>      /* module-related macros */
#include <linux/init.h>        /* module_init() and module_exit() */
#include <linux/moduleparam.h> /* module_param() */

MODULE_AUTHOR("Garfield");
MODULE_LICENSE("GPL");

static int __init init_oopsdemo(void)
{
	/* Deliberately write through a NULL pointer to trigger the Oops. */
	*((int *)0x00) = 0x19760817;
	return 0;
}
module_init(init_oopsdemo);

static void __exit cleanup_oopsdemo(void)
{
}
module_exit(cleanup_oopsdemo);
Makefile:
obj-m := oopsdemo.o
modules-objs := oopsdemo.o
KDIR := /lib/modules/`uname -r`/build
PWD := $(shell pwd)
default:
	make -C $(KDIR) M=$(PWD) modules
clean:
	rm -rf *.o .*.cmd *.ko *.mod.c *.order *.symvers .tmp_versions
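Build: run make, which produces oopsdemo.ko.
2. Load the module
insmod oopsdemo.ko
Killed
(insmod prints "Killed" because the kernel kills the process that triggered the Oops.)
Related module commands: modprobe, insmod, rmmod, lsmod.
3. Check /var/log/messages
Jul 8 13:13:21 linux-200 kernel: [ 176.694468] BUG: unable to handle kernel NULL pointer dereference at (null)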
Jul 8 13:13:21 linux-200 kernel: [ 176.694476] IP: [<ffffffffa004b000>] 0xffffffffa004afff
Jul 8 13:13:21 linux-200 kernel: [ 176.694510] PGD 1c587067 PUD 1c5aa067 PMD 0
Jul 8 13:13:21 linux-200 kernel: [ 176.694517] Oops: 0002 [#1] SMP
Jul 8 13:13:21 linux-200 kernel: [ 176.694532] CPU 0
Jul 8 13:13:21 linux-200 kernel: [ 176.694535] Modules linked in: oopsdemo(N+) snd_pcm_oss snd_mixer_oss snd_seq_midi snd_seq_midi_event snd_seq edd mperf microcode fuse loop dm_mod ipv6 snd_ens1371 gameport snd_rawmidi snd_seq_device snd_ac97_codec ac97_bus snd_pcm snd_timer snd ipv6_lib ppdev soundcore parport_pc snd_page_alloc shpchp sr_mod parport floppy intel_agp rtc_cmos sg cdrom e1000 vmw_balloon(X) i2c_piix4 pciehp i2c_core pci_hotplug pcspkr intel_gtt button container ac mptctl ext3 jbd mbcache uhci_hcd sd_mod crc_t10dif ehci_hcd processor thermal_sys hwmon usbcore usb_common scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_generic ata_piix libata mptspi mptscsih mptbase scsi_transport_spi scsi_mod
Jul 8 13:13:21 linux-200 kernel: [ 176.694569] Supported: Yes
Jul 8 13:13:21 linux-200 kernel: [ 176.694571]
Jul 8 13:13:21 linux-200 kernel: [ 176.694575] Pid: 3256, comm: insmod Tainted: G NX 3.0.13-0.27-default #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
Jul 8 13:13:21 linux-200 kernel: [ 176.694581] RIP: 0010:[<ffffffffa004b000>] [<ffffffffa004b000>] 0xffffffffa004afff
Jul 8 13:13:21 linux-200 kernel: [ 176.694587] RSP: 0018:ffff88001c589f30 EFLAGS: 00010246
Jul 8 13:13:21 linux-200 kernel: [ 176.694590] RAX: ffff88001c589fd8 RBX: ffffffffa042a000 RCX: 0000000000000000
Jul 8 13:13:21 linux-200 kernel: [ 176.694592] RDX: 0000000000000670 RSI: 0000000000000004 RDI: ffffffffa004b000
Jul 8 13:13:21 linux-200 kernel: [ 176.694594] RBP: 0000000000015d39 R08: 0000000000000004 R09: ffffffff81bdfb40
Jul 8 13:13:21 linux-200 kernel: [ 176.694599] R10: 0000000000000000 R11: ffffffff81034d00 R12: ffffffffa004b000
Jul 8 13:13:21 linux-200 kernel: [ 176.694602] R13: 0000000000000000 R14: 00007ffff83ad80b R15: 0000000000603030
Jul 8 13:13:21 linux-200 kernel: [ 176.694614] FS: 00007f149df2b700(0000) GS:ffff88001f400000(0000) knlGS:0000000000000000
Jul 8 13:13:21 linux-200 kernel: [ 176.694617] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 8 13:13:21 linux-200 kernel: [ 176.694619] CR2: 0000000000000000 CR3: 000000001c541000 CR4: 00000000000006f0
Jul 8 13:13:21 linux-200 kernel: [ 176.694641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 8 13:13:21 linux-200 kernel: [ 176.694671] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 8 13:13:21 linux-200 kernel: [ 176.694674] Process insmod (pid: 3256, threadinfo ffff88001c588000, task ffff88001c592680)
Jul 8 13:13:21 linux-200 kernel: [ 176.694676] Stack:
Jul 8 13:13:21 linux-200 kernel: [ 176.694677] ffffffff810001cb 0000000000603030 ffffffffa042a000 0000000000015d39
Jul 8 13:13:21 linux-200 kernel: [ 176.694681] 0000000000603030 0000000000603010 ffffffff810993bd 0000000000015d39
Jul 8 13:13:21 linux-200 kernel: [ 176.694684] 0000000000020000 0000000000020000 ffffffff81449692 0000000000000206
Jul 8 13:13:21 linux-200 kernel: [ 176.694687] Call Trace:
Jul 8 13:13:21 linux-200 kernel: [ 176.694721] [<ffffffff810001cb>] do_one_initcall+0x3b/0x180
Jul 8 13:13:21 linux-200 kernel: [ 176.694745] [<ffffffff810993bd>] sys_init_module+0xcd/0x240
Jul 8 13:13:21 linux-200 kernel: [ 176.694765] [<ffffffff81449692>] system_call_fastpath+0x16/0x1b
Jul 8 13:13:21 linux-200 kernel: [ 176.695784] DWARF2 unwinder stuck at system_call_fastpath+0x16/0x1b
Jul 8 13:13:21 linux-200 kernel: [ 176.695788]
Jul 8 13:13:21 linux-200 kernel: [ 176.695789] Leftover inexact backtrace:
Jul 8 13:13:21 linux-200 kernel: [ 176.695790]
Jul 8 13:13:21 linux-200 kernel: [ 176.695792] Code: <c7> 04 25 00 00 00 00 17 08 76 19 31 c0 c3 00 00 00 00 00 00 00 00
Jul 8 13:13:21 linux-200 kernel: [ 176.695808] RIP [<ffffffffa004b000>] 0xffffffffa004afff
Jul 8 13:13:21 linux-200 kernel: [ 176.695814] RSP <ffff88001c589f30>
Jul 8 13:13:21 linux-200 kernel: [ 176.695816] CR2: 0000000000000000
Jul 8 13:13:21 linux-200 kernel: [ 176.695823] ---[ end trace 1f7408a306659cda ]---
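Analysis:
BUG: unable to handle kernel NULL pointer dereference at (null) // what went wrong
RIP: 0010:[<ffffffffa004b000>] [<ffffffffa004b000>] 0xffffffffa004afff // where the error occurred (the RIP register)
PGD 1c587067 PUD 1c5aa067 PMD 0 // page-table walk for the faulting address; PMD 0 means nothing is mapped there. The address itself is in CR2, which is 0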
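Because the KALLSYMS option is not enabled, no further information is available; if more detail is needed, rebuild the kernel with KALLSYMS turned on. (Note that the call trace above does resolve core kernel symbols such as do_one_initcall, so on this kernel the missing name affects only the address inside the module itself.)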
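The test above was run in VMware. On a physical machine rather than a virtual machine, the server can no longer be reached remotely once the oops occurs.
Sometimes a server becomes inaccessible and only answers ping, and after a reboot there is no oops information in messages. This situation is harder to debug; options are debugging over the serial port with minicom, or using the SysRq key.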