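A shared library built with dynamic relocation (without -fpic) works on x86 and, in restricted cases, on x64, where the linker typically rejects non-PIC objects in a shared library unless they are built with, for example, -mcmodel=large. Once loaded, such code carries no run-time performance penalty. The cost is paid at load time: the dynamic linker has to scan the whole object and patch every reference to relocated data and functions with the correct address. Also, because the .text section is patched, its pages can no longer be shared with other processes that use the same library (the .data section is copied for each process in any case).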
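PIC (position-independent code) lets the code of a shared library be reused across processes. It introduces the GOT (Global Offset Table), which records the actual virtual address of each data item or function; function calls additionally go through the PLT (Procedure Linkage Table).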
data = [GOT[data-index]]
function = call PLT -> jmp GOT[function-index]
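The two lookups above can be modeled in plain C. This is only a sketch of the idea, not the real ELF layout: struct got, resolve, load_data, plt_call, counter and add_one are names made up for illustration, and a real PLT stub is a few machine instructions rather than a C function.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*unary_fn)(int);

/* Hypothetical model of a per-process GOT: slots the loader fills in. */
struct got {
    int     *data_slot;   /* GOT[data-index]     */
    unary_fn func_slot;   /* GOT[function-index] */
};

/* "Loader": resolve the symbols once and store their real addresses. */
void resolve(struct got *g, int *data, unary_fn fn) {
    g->data_slot = data;
    g->func_slot = fn;
}

/* data = [GOT[data-index]]: one load for the address, one for the value. */
int load_data(const struct got *g) {
    assert(g->data_slot != NULL);
    return *g->data_slot;
}

/* function = call PLT -> jmp GOT[function-index]: an extra indirect jump. */
int plt_call(const struct got *g, int arg) {
    assert(g->func_slot != NULL);
    return g->func_slot(arg);
}

/* Symbols "exported" by the library in this model. */
int counter = 41;
int add_one(int x) { return x + 1; }
```

After resolve(&g, &counter, add_one), load_data(&g) reads counter through the table and plt_call(&g, x) reaches add_one through it. The library code only ever touches the table, so the code itself can stay identical in every process while only the table contents differ.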
This lengthens the path length: the extra indirect jump brings more memory references and more register-allocation pressure, especially on x86, which has only about 6 general-purpose registers available most of the time.
Another piece of bad news: 32-bit x86 cannot reference the EIP register directly, so there is no way to read a data value with a [base-address + offset] form that uses EIP as the base. Before PIC can be used, the code has to fetch the value of EIP with a trick:
call TMPLABEL
TMPLABEL:
pop ebx
The ebx register is then tied up holding that address, unless you choose to regenerate the EIP value every time it is needed.
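Why a base address plus a fixed offset is enough can be sketched in C. The struct module_image layout, VALUE_OFFSET and read_value are hypothetical stand-ins: the start of the image plays the role of TMPLABEL, and VALUE_OFFSET plays the role of the link-time constant that gets added to ebx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical module image: code and data at fixed relative offsets. */
struct module_image {
    unsigned char code[16];  /* stands in for instructions; starts at the anchor */
    int           value;     /* data the code wants to read */
};

/* Link-time constant: distance from the anchor to the data. */
enum { VALUE_OFFSET = offsetof(struct module_image, value) };

/* Rough equivalent of: mov eax, [ebx + VALUE_OFFSET], with ebx = anchor. */
int read_value(const unsigned char *anchor) {
    int v;
    assert(anchor != NULL);
    /* memcpy keeps the access free of alignment and aliasing problems. */
    memcpy(&v, anchor + VALUE_OFFSET, sizeof v);
    return v;
}
```

If the same image is copied to two different addresses, read_value returns the same value from each copy when given that copy's own start address. Only the anchor changes with the load address; the offset never does, which is why the code needs no patching.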
Things become more interesting on x64. First, the RIP value (the 64-bit EIP) can be referenced directly, which makes PIC data references and function calls easy. But a PC-relative call is limited to a signed 32-bit offset (±2GB), as in a near call. A call whose target is more than 2GB away would need the whole 64-bit address encoded in the call instruction, which is not possible, so it is implemented by first moving the imm64 value into a register and calling through that register, or by calling indirectly through a memory location that holds the 64-bit address. The reason is that the only instruction able to encode a whole 64-bit address is mov (written movabs in AT&T syntax).
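The ±2GB limit is just signed 32-bit arithmetic on the distance between the call site and the target. A minimal sketch, assuming the usual 5-byte encoding of call rel32 (opcode E8 plus a 4-byte displacement measured from the end of the instruction); fits_rel32 and CALL_REL32_LEN are hypothetical names, not part of any toolchain:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed length of a near `call rel32`: 1 opcode byte + 4 displacement bytes. */
enum { CALL_REL32_LEN = 5 };

/* Returns true if a `call rel32` located at `site` can reach `target`.
   The displacement is relative to the address of the next instruction.
   The unsigned subtraction wraps, and the cast relies on two's complement,
   which holds on the x86-64 targets discussed here. */
bool fits_rel32(uint64_t site, uint64_t target) {
    int64_t disp = (int64_t)(target - (site + CALL_REL32_LEN));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

A call from 0x400000 to 0x401000 passes the check, while a call from 0x400000 to an address near 0x7f0000000000 does not and would have to go through a register or a memory slot holding the full 64-bit address.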
So GCC provides three code models, selected with the -mcmodel flag:
-mcmodel=small : supports only code and data within a ±2GB relative range, so everything uses RIP-relative references
-mcmodel=large : supports references beyond 2GB, using imm64 references
-mcmodel=medium: references within 2GB use RIP-relative addressing; those beyond 2GB use imm64 references (in GCC this applies to large data objects, while code stays within 2GB)
As far as we know, the performance difference between non-PIC and PIC code on x64 is small.
ARMv7 and ARMv8
The good news for ARMv7 and ARMv8: both can use the PC directly in relative form for data references and function calls. The bad news: the reachable distance is limited.
instruction
data
function call
ARMv7
ldr