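Using the code from the previous article, I changed the compiler options in the Makefile to see how they affect the generated code. The original disassembly and Makefile are as follows: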
led_elf: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: e3a00453 mov r0, #1392508928 ; 0x53000000
4: e3a01000 mov r1, #0 ; 0x0
8: e5801000 str r1, [r0]
c: e3a0da01 mov sp, #4096 ; 0x1000
10: eb000000 bl 18 <main>
00000014 :
14: eafffffe b 14
00000018 <main>:
18: e1a0c00d mov ip, sp
1c: e92dd800 stmdb sp!, {fp, ip, lr, pc}
20: e24cb004 sub fp, ip, #4 ; 0x4
24: e3a03456 mov r3, #1442840576 ; 0x56000000
28: e2833050 add r3, r3, #80 ; 0x50
2c: e3a02c01 mov r2, #256 ; 0x100
30: e5832000 str r2, [r3]
34: e3a03456 mov r3, #1442840576 ; 0x56000000
38: e2833054 add r3, r3, #84 ; 0x54
3c: e3a02000 mov r2, #0 ; 0x0
40: e5832000 str r2, [r3]
44: e3a03000 mov r3, #0 ; 0x0
48: e1a00003 mov r0, r3
4c: e89da800 ldmia sp, {fp, sp, pc}
led.bin: startup.S led.c
	arm-linux-gcc -g -c -o startup.o startup.S
	arm-linux-gcc -g -c -o led.o led.c
	arm-linux-ld -Ttext 0x00000000 -g startup.o led.o -o led_elf
	arm-linux-objcopy -O binary -S led_elf led.bin
	arm-linux-objdump -D -m arm led_elf > led.dis
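The article does not reproduce led.c itself. As a hedged sketch, here is C that is consistent with the unoptimized main listed at 0x18: two 32-bit stores to 0x56000050 and 0x56000054, then a zero return. The register names GPFCON/GPFDAT (S3C2410/S3C2440 port F), the function name led_main and the base-pointer parameter are my assumptions; the parameter exists only so the function can also be exercised on a host against an ordinary array.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets seen in the listing: 0x56000000 + 0x50 and 0x56000000 + 0x54.
 * Naming them GPFCON/GPFDAT assumes an S3C2410/S3C2440 board; the article
 * itself does not name the registers. */
#define GPIO_BASE_ADDR 0x56000000u
#define GPFCON_OFF     0x50u
#define GPFDAT_OFF     0x54u

/* Hypothetical reconstruction of main() in led.c. On the board the caller
 * would pass (volatile uint32_t *)GPIO_BASE_ADDR; on a host, any array
 * large enough to cover offset 0x54. */
static int led_main(volatile uint32_t *gpio)
{
    /* 0x100 puts 01 in bits [9:8] of the control register: pin 4 is an output */
    gpio[GPFCON_OFF / 4] = 0x100;
    /* Drive the data bits low; assumed to switch the LED on */
    gpio[GPFDAT_OFF / 4] = 0;
    /* Corresponds to "mov r3, #0 / mov r0, r3" at 0x44..0x48 in the O0 listing */
    return 0;
}
```

Keeping the register block behind a `volatile` pointer is what forces the compiler to emit both stores at every optimization level, which is why they survive in all the listings that follow.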
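Next, I removed the -g option and set the optimization level to O1 (the Makefile uses -O, which GCC treats the same as -O1; the -g left on the arm-linux-ld line is accepted but ignored by GNU ld). The disassembly and Makefile are as follows: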
led_elf: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: e3a00453 mov r0, #1392508928 ; 0x53000000
4: e3a01000 mov r1, #0 ; 0x0
8: e5801000 str r1, [r0]
c: e3a0da01 mov sp, #4096 ; 0x1000
10: eb000000 bl 18 <main>
00000014 :
14: eafffffe b 14
00000018 <main>:
18: e3a03456 mov r3, #1442840576 ; 0x56000000
1c: e2833050 add r3, r3, #80 ; 0x50
20: e3a02c01 mov r2, #256 ; 0x100
24: e4032050 str r2, [r3], #-80
28: e2833054 add r3, r3, #84 ; 0x54
2c: e3a00000 mov r0, #0 ; 0x0
30: e5830000 str r0, [r3]
34: e1a0f00e mov pc, lr
led.bin: startup.S led.c
	arm-linux-gcc -O -c -o startup.o startup.S
	arm-linux-gcc -O -c -o led.o led.c
	arm-linux-ld -Ttext 0x00000000 -g startup.o led.o -o led_elf
	arm-linux-objcopy -O binary -S led_elf led.bin
	arm-linux-objdump -D -m arm led_elf > led.dis
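You can clearly see that optimization makes the code smaller: the last instruction of main moves from 0x4c to 0x34, so main shrinks from 14 instructions to 8. Because main has no local variables, calls no other function and uses only the scratch registers r0-r3, the compiler decides there is nothing to save: the O1 version skips pushing fp, ip and lr and setting up a stack frame on entry, and simply returns with mov pc, lr. The body is tightened too: the post-indexed store str r2, [r3], #-80 writes 0x100 to 0x56000050 and then steps r3 back to 0x56000000, so the base address does not have to be loaded a second time. (The _start code is identical in every listing, because the -O options do not change the hand-written assembly in startup.S.) Let's try again with -O2 and see what the optimized code looks like.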
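The 0x4c and 0x34 figures are the addresses of main's last instruction in the listings, not byte counts. Converting them is simple arithmetic, sketched below under two assumptions taken from the setup: every ARM instruction is 4 bytes, and .text is linked at 0 (-Ttext 0x00000000). The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-width ARM (not Thumb) instructions */
enum { ARM_INSN_BYTES = 4 };

/* Size of .text when it starts at address 0: the section ends
 * 4 bytes after the last listed instruction. */
static uint32_t text_size(uint32_t last_insn_addr)
{
    return last_insn_addr + ARM_INSN_BYTES;
}

/* Number of instructions from first_addr to last_addr, both inclusive. */
static uint32_t insn_count(uint32_t first_addr, uint32_t last_addr)
{
    return (last_addr - first_addr) / ARM_INSN_BYTES + 1;
}
```

Applied to main, which always starts at 0x18, insn_count(0x18, last) gives 14, 8 and 7 instructions for the O0, O1 and O2 listings, and text_size turns the whole O0 and O1 images into 0x50 and 0x38 bytes.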
led_elf: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: e3a00453 mov r0, #1392508928 ; 0x53000000
4: e3a01000 mov r1, #0 ; 0x0
8: e5801000 str r1, [r0]
c: e3a0da01 mov sp, #4096 ; 0x1000
10: eb000000 bl 18 <main>
00000014 :
14: eafffffe b 14
00000018 <main>:
18: e3a02000 mov r2, #0 ; 0x0
1c: e3a01456 mov r1, #1442840576 ; 0x56000000
20: e3a03c01 mov r3, #256 ; 0x100
24: e1a00002 mov r0, r2
28: e5813050 str r3, [r1, #80]
2c: e5812054 str r2, [r1, #84]
30: e1a0f00e mov pc, lr
led.bin: startup.S led.c
	arm-linux-gcc -O2 -c -o startup.o startup.S
	arm-linux-gcc -O2 -c -o led.o led.c
	arm-linux-ld -Ttext 0x00000000 -g startup.o led.o -o led_elf
	arm-linux-objcopy -O binary -S led_elf led.bin
	arm-linux-objdump -D -m arm led_elf > led.dis
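With -O2, the last instruction moves from 0x34 at O1 to 0x30, one instruction fewer. The base address 0x56000000 is loaded into r1 once and both registers are written with immediate-offset stores ([r1, #80] and [r1, #84]), and the zero already in r2 is reused as the return value (mov r0, r2). Next, let's see what -O3 does.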
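To see why the O1 and O2 sequences of main are equivalent, here is a small host-side model of the two ARM store forms they use. It is only a sketch: the store_log type and the str_offset/str_post helpers are hypothetical, and they model the address arithmetic of each store, not a real CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Records every store as (address, value) */
typedef struct { uint32_t addr[4]; uint32_t val[4]; int n; } store_log;

/* str rd, [rn, #off]  offset addressing: address = rn + off, rn unchanged */
static void str_offset(store_log *log, uint32_t rd, uint32_t rn, int32_t off)
{
    log->addr[log->n] = rn + (uint32_t)off;  /* wraps mod 2^32 like the CPU */
    log->val[log->n++] = rd;
}

/* str rd, [rn], #off  post-indexed: store at rn, then write rn + off back */
static void str_post(store_log *log, uint32_t rd, uint32_t *rn, int32_t off)
{
    log->addr[log->n] = *rn;
    log->val[log->n++] = rd;
    *rn += (uint32_t)off;
}

/* O1 main, 0x18..0x30 */
static store_log run_o1(void)
{
    store_log log = {0};
    uint32_t r3 = 0x56000000u;       /* mov r3, #0x56000000 */
    r3 += 80;                        /* add r3, r3, #80     */
    uint32_t r2 = 0x100;             /* mov r2, #256        */
    str_post(&log, r2, &r3, -80);    /* str r2, [r3], #-80  */
    r3 += 84;                        /* add r3, r3, #84     */
    uint32_t r0 = 0;                 /* mov r0, #0          */
    str_offset(&log, r0, r3, 0);     /* str r0, [r3]        */
    return log;
}

/* O2 main, 0x18..0x2c; "mov r0, r2" only sets the return value, so it is not modelled */
static store_log run_o2(void)
{
    store_log log = {0};
    uint32_t r2 = 0;                 /* mov r2, #0          */
    uint32_t r1 = 0x56000000u;       /* mov r1, #0x56000000 */
    uint32_t r3 = 0x100;             /* mov r3, #256        */
    str_offset(&log, r3, r1, 80);    /* str r3, [r1, #80]   */
    str_offset(&log, r2, r1, 84);    /* str r2, [r1, #84]   */
    return log;
}
```

Both runs produce the same two stores (0x100 to 0x56000050, then 0 to 0x56000054). The O2 form gets there without the write-back and the extra add, because each store carries its own offset from an unchanged base.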
led_elf: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: e3a00453 mov r0, #1392508928 ; 0x53000000
4: e3a01000 mov r1, #0 ; 0x0
8: e5801000 str r1, [r0]
c: e3a0da01 mov sp, #4096 ; 0x1000
10: eb000000 bl 18 <main>
00000014 :
14: eafffffe b 14
00000018 <main>:
18: e3a02000 mov r2, #0 ; 0x0
1c: e3a01456 mov r1, #1442840576 ; 0x56000000
20: e3a03c01 mov r3, #256 ; 0x100
24: e1a00002 mov r0, r2
28: e5813050 str r3, [r1, #80]
2c: e5812054 str r2, [r1, #84]
30: e1a0f00e mov pc, lr
led.bin: startup.S led.c
	arm-linux-gcc -O3 -c -o startup.o startup.S
	arm-linux-gcc -O3 -c -o led.o led.c
	arm-linux-ld -Ttext 0x00000000 -g startup.o led.o -o led_elf
	arm-linux-objcopy -O binary -S led_elf led.bin
	arm-linux-objdump -D -m arm led_elf > led.dis
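Looking at the code, it is identical to the O2 result, which suggests there is essentially nothing left to optimize in a function this small: the extra passes that -O3 enables (such as more aggressive function inlining) have nothing to work on here. Finally, let's try the last option, -O0 (no optimization), and see what the code looks like.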
led_elf: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: e3a00453 mov r0, #1392508928 ; 0x53000000
4: e3a01000 mov r1, #0 ; 0x0
8: e5801000 str r1, [r0]
c: e3a0da01 mov sp, #4096 ; 0x1000
10: eb000000 bl 18 <main>
00000014 :
14: eafffffe b 14
00000018 <main>:
18: e1a0c00d mov ip, sp
1c: e92dd800 stmdb sp!, {fp, ip, lr, pc}
20: e24cb004 sub fp, ip, #4 ; 0x4
24: e3a03456 mov r3, #1442840576 ; 0x56000000
28: e2833050 add r3, r3, #80 ; 0x50
2c: e3a02c01 mov r2, #256 ; 0x100
30: e5832000 str r2, [r3]
34: e3a03456 mov r3, #1442840576 ; 0x56000000
38: e2833054 add r3, r3, #84 ; 0x54
3c: e3a02000 mov r2, #0 ; 0x0
40: e5832000 str r2, [r3]
44: e3a03000 mov r3, #0 ; 0x0
48: e1a00003 mov r0, r3
4c: e89da800 ldmia sp, {fp, sp, pc}
led.bin: startup.S led.c
	arm-linux-gcc -O0 -c -o startup.o startup.S
	arm-linux-gcc -O0 -c -o led.o led.c
	arm-linux-ld -Ttext 0x00000000 -g startup.o led.o -o led_elf
	arm-linux-objcopy -O binary -S led_elf led.bin
	arm-linux-objdump -D -m arm led_elf > led.dis
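We find that the -O0 code is identical to the first build, which used no optimization option at all. Since that first build also had -g, this also shows that, at least here, -g only adds debugging information and does not change the generated instructions. From the disassembly above I draw the following rough conclusions: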
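1. With no optimization option, the default is no optimization (-O0).
2. -O1 mainly removed the function-call overhead (main no longer saves registers or builds a stack frame) and made only small changes to the body itself, such as reusing the base address; the code still follows the source closely and is easy to debug (this is my own experience and may not be accurate, but I'll go with it for now).
3. -O2 also rearranges and combines the instruction sequence. GCC still accepts -g together with -O2, but debugging becomes hard, because the optimized instructions no longer correspond one-to-one with the source lines and it is difficult to tell which instruction belongs to which line.
4. -O3 applies the most aggressive optimization and is just as hard to debug; for this program it produced exactly the same code as -O2.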
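Point 1 can also be checked from inside a program. GCC predefines the macro `__OPTIMIZE__` in every optimizing compilation and `__OPTIMIZE_SIZE__` when optimizing for size, and defines neither at -O0. A minimal sketch (the enum and the function name build_opt_kind are hypothetical):

```c
#include <assert.h>

enum opt_kind { OPT_NONE, OPT_SPEED, OPT_SIZE };

/* Reports how this translation unit was compiled, using GCC's predefined macros */
static enum opt_kind build_opt_kind(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return OPT_SIZE;    /* -Os */
#elif defined(__OPTIMIZE__)
    return OPT_SPEED;   /* -O1, -O2, -O3 and other optimizing levels */
#else
    return OPT_NONE;    /* -O0, which is also what you get with no -O option */
#endif
}
```

Compiled with a command line like the first Makefile's (only -g, no -O), it returns OPT_NONE; with -O, -O2 or -O3 it returns OPT_SPEED.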