
32-bit and 64-bit, single precision and double precision

Published 2019-07-13 20:20

What’s the difference between a single precision and double precision floating point operation?
  • float: representable magnitudes range roughly from $10^{-38}$ to $10^{38}$
    • $\exp(-10^2) \approx 10^{-44}$, which underflows in float
  • double: representable magnitudes range roughly from $10^{-308}$ to $10^{308}$
    • $\exp(-10^3) \approx 10^{-434}$, which underflows in double
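
As a rough check of the two underflow examples above, the following Python sketch rounds exp(-100) to single precision with the standard `struct` module and evaluates exp(-1000) in double precision (Python's native `float`). The helper `to_float32` and the constant `FLOAT32_MIN_NORMAL` are names made up for this illustration.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Smallest positive *normal* single-precision value, about 1.18e-38.
FLOAT32_MIN_NORMAL = 2.0 ** -126

small32 = to_float32(math.exp(-100))   # exp(-10^2), on the order of 1e-44
small64 = math.exp(-1000)              # exp(-10^3), on the order of 1e-434

# In single precision the value lies below the normal range.
print(small32, small32 < FLOAT32_MIN_NORMAL)
# In double precision exp(-1000) is too small to represent at all.
print(small64)
```

Note that in single precision the result is a subnormal number rather than exactly zero, so underflow here means a loss of precision; in double precision exp(-1000) is flushed all the way to zero.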

0. 64-bit CPU

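When a CPU is described as 64-bit, this usually means that its general purpose registers are 64 bits wide and that its memory addresses are 64 bits wide. It says nothing about whether the arithmetic it executes is single or double precision.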
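
One way to see that the two notions are independent is to print the native pointer size next to the sizes of the C `float` and `double` types. This is only a sketch using the standard `struct` module; it reports the sizes for the running Python build, so a 32-bit interpreter on a 64-bit machine would show a 4-byte pointer.

```python
import struct

# Sizes in bytes as seen by the running interpreter (native C types).
pointer_bytes = struct.calcsize("P")   # void*: 8 on a typical 64-bit build
float_bytes = struct.calcsize("f")     # C float, single precision
double_bytes = struct.calcsize("d")    # C double, double precision

print(f"pointer: {pointer_bytes * 8} bits")
print(f"float:   {float_bytes * 8} bits")
print(f"double:  {double_bytes * 8} bits")
```

On common platforms the float and double sizes stay at 32 and 64 bits regardless of whether the pointer size is 32 or 64 bits.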

1. Single precision

S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
0 1      8 9                    31
  • The first bit (bit 0) is the sign bit, S;
  • the next 8 bits (bits 1 to 8) are the exponent, E;
  • the last 23 bits (bits 9 to 31) are the fraction, F;
  • E=0, F=0, S=1 => -0
  • E=0, F=0, S=0 => 0
0 00000000 00000000000000000000000 = 0            E=0, F=0, S=0 => 0
1 00000000 00000000000000000000000 = -0           E=0, F=0, S=1 => -0
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity
0 11111111 00000100000000000000000 = NaN          E=255, F nonzero
1 11111111 00100010001001010101010 = NaN          E=255, F nonzero
0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5      (binary 1.101 => 1+0.5+0.125 = 1.625)
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5
0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)
0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001 = 2**(-149)   (smallest positive value)
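
The decoding rules used in the examples above can be sketched in Python as follows. The function names `decode_float32` and `float32_from_bits` are made up for this illustration; the second one decodes the same pattern through the standard `struct` module so the hand-written rules can be cross-checked.

```python
import math
import struct

def decode_float32(bits: str) -> float:
    """Decode a pattern such as '0 10000001 10100000000000000000000'."""
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected 32 binary digits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:9], 2)      # E, 8 bits
    fraction = int(bits[9:], 2)       # F, 23 bits, read as an integer
    if exponent == 255:               # E all ones: Infinity or NaN
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0:                 # E = 0: zero or subnormal, no implicit leading 1
        return sign * fraction * 2.0 ** (-126 - 23)
    # Normal number: implicit leading 1 and an exponent bias of 127.
    return sign * (1.0 + fraction / 2.0 ** 23) * 2.0 ** (exponent - 127)

def float32_from_bits(bits: str) -> float:
    """Reference decoding of the same pattern via the struct module."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]

pattern = "0 10000001 10100000000000000000000"
print(decode_float32(pattern), float32_from_bits(pattern))
```

Both calls should agree for every row of the list above, including the subnormal rows where the leading 1 is not implied.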

2. Double precision

S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 1        11 12                                                63
  • 1 bit for the sign, S;
  • 11 bits for the exponent, E;
  • 52 bits for the fraction, F.
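
To make the 1/11/52 split concrete, here is a minimal Python sketch that extracts the three fields from a double. The function name `split_double` is made up for this illustration. The exponent bias of 1023 mentioned in the comments comes from the IEEE 754 double format and is not stated in the text above.

```python
import struct

def split_double(x: float):
    """Return the (S, E, F) fields of the 64-bit pattern of x as integers."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = raw >> 63                    # S: 1 bit
    exponent = (raw >> 52) & 0x7FF      # E: 11 bits, stored with a bias of 1023
    fraction = raw & ((1 << 52) - 1)    # F: 52 bits
    return sign, exponent, fraction

s, e, f = split_double(6.5)
# Print the fields in the same S / E / F layout as the diagram.
print(s, f"{e:011b}", f"{f:052b}")
```

For 6.5 the fraction field starts with the same binary digits 101 as in the single-precision example; only the field widths and the exponent bias differ.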