9.3 Floating Point Performance

Note. This section contains data so antiquated that we are not sure it has much relevance to modern hardware configurations. It is included for the sake of completeness and as a historical curiosity.

Table 2: Floating Point Benchmarks

| Operation | With 80387, 2,000k repetitions (seconds) | With 80387, average time ($\mu$sec) | No 80387, 200k repetitions (seconds) | No 80387, average time ($\mu$sec) |
|---|---|---|---|---|
| Subtract | 25.9 | 13.0 | 35.3 | 177 |
| Multiply | 28.4 | 14.2 | 44.3 | 222 |
| Divide | 33.6 | 16.8 | 50.9 | 255 |
| Inner product¹ | 40.3 | 20.2 | 61.9 | 310 |
| Scalar multiply² | 30.2 | 15.1 | 45.0 | 225 |

¹ sum += a[cursor[i]]*y[cursor[j]]

² a[cursor[i]] *= scalar

A variety of floating point operations were monitored under MS-DOS version 3.30 on a 16 MHz IBM PS/2 Model 70 with a 16 MHz 80387 coprocessor and a 27 msec, 60 Mbyte fixed disk drive. The 32 bit operations available on the 80386 were not used. Table 2 catalogs the time requirements of the simple arithmetic operations, inner product accumulation, and multiplication of a vector by a scalar. All benchmarks were performed using double precision real numbers. Each test consisted of a loop that was restarted after every 500 operations; for example, the 200k repetition figures include the overhead of starting and stopping a loop 400 times. With this testing scheme, all loop counters and array indices were maintained in registers.
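The testing scheme described above can be sketched roughly as follows. This is a minimal illustration, not the original benchmark source: the names `BLOCK`, `inner_product`, `scalar_multiply`, `restart_count`, `average_usec`, and `time_multiply` are hypothetical, the use of the standard `clock` function is an assumption, and the two kernels simply transcribe the footnotes of Table 2 (with `i` and `j` assumed to advance together).

```c
#include <time.h>

#define BLOCK 500   /* operations between loop restarts, per the text */

/* Footnote 1 kernel: sum += a[cursor[i]] * y[cursor[j]].
   Assumption: i and j advance together over one cursor array. */
double inner_product(const double *a, const double *y,
                     const int *cursor, int n)
{
    double sum = 0.0;
    int i, j;
    for (i = 0, j = 0; i < n; i++, j++)
        sum += a[cursor[i]] * y[cursor[j]];
    return sum;
}

/* Footnote 2 kernel: a[cursor[i]] *= scalar. */
void scalar_multiply(double *a, const int *cursor, int n, double scalar)
{
    int i;
    for (i = 0; i < n; i++)
        a[cursor[i]] *= scalar;
}

/* Number of times the inner loop is started for a repetition count. */
long restart_count(long repetitions)
{
    return repetitions / BLOCK;
}

/* Average time per operation in microseconds. */
double average_usec(double seconds, long repetitions)
{
    return seconds * 1.0e6 / (double)repetitions;
}

/* Time double precision multiplies, restarting the inner loop every
   BLOCK operations. Returns elapsed CPU seconds; *sink receives the
   last product so the work has an observable result. */
double time_multiply(long repetitions, double *sink)
{
    static double a[BLOCK], y[BLOCK];
    volatile double product = 0.0;   /* volatile: keep the multiply */
    long block, blocks = restart_count(repetitions);
    int i;
    clock_t start, stop;

    for (i = 0; i < BLOCK; i++) {
        a[i] = 1.0 + i;
        y[i] = 0.5;
    }

    start = clock();
    for (block = 0; block < blocks; block++)
        for (i = 0; i < BLOCK; i++)
            product = a[i] * y[i];
    stop = clock();

    *sink = product;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

With these definitions, `restart_count(200000L)` gives the 400 loop restarts mentioned in the text, and `average_usec(25.9, 2000000L)` is about 12.95, which rounds to the 13.0 $\mu$sec reported for subtraction. On modern hardware the measured times will be far too small for `clock` to resolve at these repetition counts.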

The measurements in Table 3 provide a similar analysis of math functions in the Microsoft C Version 5.1 math library. These benchmarks were conducted with a single loop whose counter was a long integer.
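A single-loop benchmark of this kind might look like the sketch below. It is an assumption-laden illustration rather than the original test program: `time_math_function` and `baseline` are hypothetical names, the argument passed to the function under test is arbitrary (the text does not say which arguments were used), and timing relies on the standard `clock` function.

```c
#include <time.h>

/* Hypothetical do-nothing function with the same signature as the
   one-argument math library routines; handy for exercising the
   harness without linking the math library. */
double baseline(double x)
{
    return x;
}

/* Time `repetitions` calls of f(x) using one loop whose counter is a
   long integer, as described for the Table 3 benchmarks. Returns
   elapsed CPU seconds; *sink receives the last result. */
double time_math_function(double (*f)(double), double x,
                          long repetitions, double *sink)
{
    volatile double result = 0.0;   /* volatile: keep every call */
    long i;
    clock_t start, stop;

    start = clock();
    for (i = 0; i < repetitions; i++)
        result = f(x);
    stop = clock();

    *sink = result;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

After including `<math.h>`, a call such as `time_math_function(sin, 0.5, 300000L, &sink)` would time 300k evaluations of `sin`; dividing the returned seconds by the repetition count and scaling by $10^6$ yields an average in $\mu$sec comparable in form to the table entries.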

Table 3: Math Library Benchmarks

| Operation | With 80387, 300k repetitions (seconds) | With 80387, average time ($\mu$sec) | No 80387, 10k repetitions (seconds) | No 80387, average time ($\mu$sec) |
|---|---|---|---|---|
| acos | 36.2 | 121 | 30.5 | 3,050 |
| asin | 35.1 | 117 | 29.9 | 2,990 |
| atan | 26.0 | 87 | 23.0 | 2,300 |
| cos | 37.7 | 126 | 25.3 | 2,530 |
| sin | 37.0 | 123 | 24.7 | 2,470 |
| tan | 31.7 | 106 | 19.2 | 1,920 |
| log | 25.4 | 85 | 18.5 | 1,850 |
| sqrt | 16.5 | 55 | 5.7 | 570 |
| pow | 51.4 | 171 | 38.6 | 3,860 |
| j0¹ | 235.1 | 784 | 60.7 | 6,070 |
| j6 | 662.0² | 2,207 | 176.3 | 17,603 |
| y0³ | 510.0² | 1,700 | 146.4 | 14,640 |

Differences in the loop overheads of Table 2 and Table 3 are accounted for by the differences in loop counter implementation described above. The 3 $\mu$sec overhead of the Table 3 benchmarks reflects the time required to increment a long integer and test the termination condition (which also involved a long integer comparison). The 1.3 $\mu$sec overhead of the Table 2 benchmarks reflects the time required to increment a register and test the termination condition (which involved a register comparison).
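One way to estimate such loop overheads is to time empty loops that use each style of counter, as in the sketch below. This is an illustrative assumption, not the procedure used for the original measurements: `time_long_counter`, `time_register_counter`, and `overhead_usec` are hypothetical names, `volatile` stands in for a counter held in memory, and the `register` keyword is only a hint that a modern optimizing compiler may ignore (it may also remove the empty inner loop entirely).

```c
#include <time.h>

#define BLOCK 500   /* iterations per loop restart, as for Table 2 */

/* Empty loop driven by one long integer counter (the Table 3 scheme).
   volatile keeps the counter in memory so the loop is not removed. */
double time_long_counter(long repetitions, long *iterations)
{
    volatile long i;
    clock_t start, stop;

    start = clock();
    for (i = 0; i < repetitions; i++)
        ;
    stop = clock();

    *iterations = i;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Empty loop restarted every BLOCK iterations with a counter the
   compiler is asked to keep in a register (the Table 2 scheme). */
double time_register_counter(long repetitions, long *iterations)
{
    register int i;
    int block, blocks = (int)(repetitions / BLOCK);
    long total = 0;
    clock_t start, stop;

    start = clock();
    for (block = 0; block < blocks; block++) {
        for (i = 0; i < BLOCK; i++)
            ;
        total += i;   /* i equals BLOCK when the inner loop exits */
    }
    stop = clock();

    *iterations = total;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Per-iteration overhead in microseconds. */
double overhead_usec(double seconds, long iterations)
{
    return seconds * 1.0e6 / (double)iterations;
}
```

Passing the elapsed seconds and iteration count from either timing function to `overhead_usec` gives a per-iteration figure of the same kind as the 3 $\mu$sec and 1.3 $\mu$sec values quoted above, although the numbers obtained on current processors will be orders of magnitude smaller.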