Embedded Linux Programming: A First Look at ARM, Optimization Levels

Published 2019-07-13 04:41

Using the code from the previous article, I changed the Makefile options to see how they affect the generated code. The original disassembly and Makefile are shown below.

led_elf:     file format elf32-littlearm

Disassembly of section .text:

00000000 <_start>:
   0:   e3a00453    mov   r0, #1392508928 ; 0x53000000
   4:   e3a01000    mov   r1, #0 ; 0x0
   8:   e5801000    str   r1, [r0]
   c:   e3a0da01    mov   sp, #4096 ; 0x1000
  10:   eb000000    bl    18 <main>
00000014 :
  14:   eafffffe    b     14

00000018 <main>:
  18:   e1a0c00d    mov   ip, sp
  1c:   e92dd800    stmdb sp!, {fp, ip, lr, pc}
  20:   e24cb004    sub   fp, ip, #4 ; 0x4
  24:   e3a03456    mov   r3, #1442840576 ; 0x56000000
  28:   e2833050    add   r3, r3, #80 ; 0x50
  2c:   e3a02c01    mov   r2, #256 ; 0x100
  30:   e5832000    str   r2, [r3]
  34:   e3a03456    mov   r3, #1442840576 ; 0x56000000
  38:   e2833054    add   r3, r3, #84 ; 0x54
  3c:   e3a02000    mov   r2, #0 ; 0x0
  40:   e5832000    str   r2, [r3]
  44:   e3a03000    mov   r3, #0 ; 0x0
  48:   e1a00003    mov   r0, r3
  4c:   e89da800    ldmia sp, {fp, sp, pc}

led.bin: startup.S led.c
	arm-linux-gcc -g -c -o startup.o startup.S
	arm-linux-gcc -g -c -o led.o led.c
	arm-linux-ld -Ttext 0x00000000 -g startup.o led.o -o led_elf
	arm-linux-objcopy -O binary -S led_elf led.bin
	arm-linux-objdump -D -m arm led_elf > led.dis

Next I removed the -g option and set the optimization level to O1 (the -O flag used in the Makefile is equivalent to -O1). The disassembly and Makefile are as follows.

led_elf:     file format elf32-littlearm

Disassembly of section .text:

00000000 <_start>:
   0:   e3a00453    mov   r0, #1392508928 ; 0x53000000
   4:   e3a01000    mov   r1, #0 ; 0x0
   8:   e5801000    str   r1, [r0]
   c:   e3a0da01    mov   sp, #4096 ; 0x1000
  10:   eb000000    bl    18 <main>
00000014 :
  14:   eafffffe    b     14

00000018 <main>:
  18:   e3a03456    mov   r3, #1442840576 ; 0x56000000
  1c:   e2833050    add   r3, r3, #80 ; 0x50
  20:   e3a02c01    mov   r2, #256 ; 0x100
  24:   e4032050    str   r2, [r3], #-80
  28:   e2833054    add   r3, r3, #84 ; 0x54
  2c:   e3a00000    mov   r0, #0 ; 0x0
  30:   e5830000    str   r0, [r3]
  34:   e1a0f00e    mov   pc, lr

led.bin: startup.S led.c
	arm-linux-gcc -O -c -o startup.o startup.S
	arm-linux-gcc -O -c -o led.o led.c
	arm-linux-ld -Ttext 0x00000000 -g startup.o led.o -o led_elf
	arm-linux-objcopy -O binary -S led_elf led.bin
	arm-linux-objdump -D -m arm led_elf > led.dis

The effect is easy to see: the address of the last instruction drops from 0x4c to 0x34, so the code shrinks from 0x50 to 0x38 bytes, six instructions fewer. The compiler sees that main has no local variables and uses no registers that have to be preserved, so it skips pushing registers and setting up the stack frame on entry, and returns directly with mov pc, lr. These unnecessary operations are optimized away. Let's try the O2 option next and see what the optimized code looks like.

led_elf:     file format elf32-littlearm

Disassembly of section .text:

00000000 <_start>:
   0:   e3a00453    mov   r0, #1392508928 ; 0x53000000
   4:   e3a01000    mov   r1, #0 ; 0x0
   8:   e5801000    str   r1, [r0]
   c:   e3a0da01    mov   sp, #4096 ; 0x1000
  10:   eb000000    bl    18 <main>
00000014 :
  14:   eafffffe    b     14

00000018 <main>:
  18:   e3a02000    mov   r2, #0 ; 0x0
  1c:   e3a01456    mov   r1, #1442840576 ; 0x56000000
  20:   e3a03c01    mov   r3, #256 ; 0x100
  24:   e1a00002    mov   r0, r2
  28:   e5813050    str   r3, [r1, #80]
  2c:   e5812054    str   r2, [r1, #84]
  30:   e1a0f00e    mov   pc, lr

led.bin: startup.S led.c
	arm-linux-gcc -O2 -c -o startup.o startup.S
	arm-linux-gcc -O2 -c -o led.o led.c
	arm-linux-ld -Ttext 0x00000000 -g startup.o led.o -o led_elf
	arm-linux-objcopy -O binary -S led_elf led.bin
	arm-linux-objdump -D -m arm led_elf > led.dis

With O2, the address of the last instruction drops from 0x34 (O1) to 0x30, which saves one more instruction: the base address 0x56000000 is loaded once and both stores use an immediate offset from it. Now let's see what the O3 option does.

led_elf:     file format elf32-littlearm

Disassembly of section .text:

00000000 <_start>:
   0:   e3a00453    mov   r0, #1392508928 ; 0x53000000
   4:   e3a01000    mov   r1, #0 ; 0x0
   8:   e5801000    str   r1, [r0]
   c:   e3a0da01    mov   sp, #4096 ; 0x1000
  10:   eb000000    bl    18 <main>
00000014 :
  14:   eafffffe    b     14

00000018 <main>:
  18:   e3a02000    mov   r2, #0 ; 0x0
  1c:   e3a01456    mov   r1, #1442840576 ; 0x56000000
  20:   e3a03c01    mov   r3, #256 ; 0x100
  24:   e1a00002    mov   r0, r2
  28:   e5813050    str   r3, [r1, #80]
  2c:   e5812054    str   r2, [r1, #84]
  30:   e1a0f00e    mov   pc, lr

led.bin: startup.S led.c
	arm-linux-gcc -O3 -c -o startup.o startup.S
	arm-linux-gcc -O3 -c -o led.o led.c
	arm-linux-ld -Ttext 0x00000000 -g startup.o led.o -o led_elf
	arm-linux-objcopy -O binary -S led_elf led.bin
	arm-linux-objdump -D -m arm led_elf > led.dis

The code is identical to the O2 result, which suggests there is essentially nothing left to optimize in a program this small. Finally, let's try the last option, O0 (no optimization), and look at the code.

led_elf:     file format elf32-littlearm

Disassembly of section .text:

00000000 <_start>:
   0:   e3a00453    mov   r0, #1392508928 ; 0x53000000
   4:   e3a01000    mov   r1, #0 ; 0x0
   8:   e5801000    str   r1, [r0]
   c:   e3a0da01    mov   sp, #4096 ; 0x1000
  10:   eb000000    bl    18 <main>
00000014 :
  14:   eafffffe    b     14

00000018 <main>:
  18:   e1a0c00d    mov   ip, sp
  1c:   e92dd800    stmdb sp!, {fp, ip, lr, pc}
  20:   e24cb004    sub   fp, ip, #4 ; 0x4
  24:   e3a03456    mov   r3, #1442840576 ; 0x56000000
  28:   e2833050    add   r3, r3, #80 ; 0x50
  2c:   e3a02c01    mov   r2, #256 ; 0x100
  30:   e5832000    str   r2, [r3]
  34:   e3a03456    mov   r3, #1442840576 ; 0x56000000
  38:   e2833054    add   r3, r3, #84 ; 0x54
  3c:   e3a02000    mov   r2, #0 ; 0x0
  40:   e5832000    str   r2, [r3]
  44:   e3a03000    mov   r3, #0 ; 0x0
  48:   e1a00003    mov   r0, r3
  4c:   e89da800    ldmia sp, {fp, sp, pc}

led.bin: startup.S led.c
	arm-linux-gcc -O0 -c -o startup.o startup.S
	arm-linux-gcc -O0 -c -o led.o led.c
	arm-linux-ld -Ttext 0x00000000 -g startup.o led.o -o led_elf
	arm-linux-objcopy -O binary -S led_elf led.bin
	arm-linux-objdump -D -m arm led_elf > led.dis

The code built with -O0 is identical to the code built with no optimization option at all. From the disassembly analysis above, I drew roughly the following conclusions:
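1. With no optimization option, the default is no optimization (the same as -O0).
2. At O1, only the function call overhead (saving registers and setting up the stack frame) was optimized here; the code that does the actual work is largely left alone, and it can still be debugged (a rule of thumb from this one experiment, not necessarily accurate; I will assume it for now).
3. At O2, the instruction sequence itself is reorganized, which makes source-level debugging impractical (the optimized instructions no longer map one-to-one onto the source lines, so a person cannot easily tell them apart).
4. O3 is the highest optimization level for the instruction sequence; for this program the output is identical to O2, and it is just as hard to debug.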
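
The led.c source is not reproduced in this post, so the following is only a minimal sketch of what the disassembly implies, not the original file. It assumes the two registers are written through volatile pointers; the name led_on and its gpio_base parameter are hypothetical, introduced so the function can also run on a host against an ordinary array. The values and offsets (0x100 to base + 0x50, then 0 to base + 0x54) are read off the listings above.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets of the two registers, taken from the disassembly:
 * the stores target 0x56000000 + 0x50 and 0x56000000 + 0x54. */
#define CON_OFFSET 0x50u
#define DAT_OFFSET 0x54u

/* Hypothetical stand-in for main() in led.c. Taking the base address
 * as a parameter is an assumption of this sketch, made so the code
 * can be exercised on a host with a plain array. */
int led_on(volatile uint32_t *gpio_base)
{
    assert(gpio_base != 0);

    /* Because the accesses are volatile, the compiler has to emit
     * both stores, in this order, at every optimization level. */
    gpio_base[CON_OFFSET / 4] = 0x100u; /* 0x100 -> base + 0x50 */
    gpio_base[DAT_OFFSET / 4] = 0u;     /* 0     -> base + 0x54 */
    return 0;
}
```

On the target, the call would be led_on((volatile uint32_t *)0x56000000). This matches what the listings show: every optimization level keeps exactly two str instructions, and the levels differ only in how the address is computed and whether a stack frame is set up. On a host, pass an array of at least 0x58 bytes (22 words).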