What is faster: double or float? For a long time, I’ve simply been using floats. I thought they were faster and smaller than doubles… and they are also the obvious choice in graphics programming.

But what about doubles? Are they that bad? It seems that the answer is not that obvious!

## The Tests

Here’s my test scenario:

• Allocate `ARR_SIZE` numbers
• Initialize elements with a simple pattern
• Compute some value, use different arithmetic operations

``````
// test float:
float *floatArray = (float *)malloc(ARR_SIZE * sizeof(float));
START_TIME();
// initialize elements with a simple pattern
for (int i = 0; i < ARR_SIZE; ++i)
{
    floatArray[i] = (float)(i*i)/100.0f;
}

// for each element: sum the first NUM_ITER elements (doubled), then take the square root
for (int i = 0; i < ARR_SIZE; ++i)
{
    float temp = 0.0f;
    for (int j = 0; j < NUM_ITER; ++j)
    {
        temp += floatArray[j]*2.0f;
    }
    temp = sqrtf(temp);
    floatArray[i] = temp;
}
END_TIME();

free(floatArray);
``````

And the double code:

``````
// test double:
double *doubleArray = (double *)malloc(ARR_SIZE * sizeof(double));
START_TIME();
// initialize elements with a simple pattern
for (int i = 0; i < ARR_SIZE; ++i)
{
    doubleArray[i] = (double)(i*i)/100.0;
}

// for each element: sum the first NUM_ITER elements (doubled), then take the square root
for (int i = 0; i < ARR_SIZE; ++i)
{
    double temp = 0.0;
    for (int j = 0; j < NUM_ITER; ++j)
    {
        temp += doubleArray[j]*2.0;
    }
    temp = sqrt(temp);
    doubleArray[i] = temp;
}

END_TIME();

free(doubleArray);
``````

## The Results

``````
Core 2 Duo T7300 @ 2.0 GHz
Visual Studio 2008, Release, /Ox, /fp:precise

processing float: 308 msec
processing double: 92 msec

Release, /Ox, /fp:precise, /arch:SSE2

processing float: 307 msec
processing double: 96 msec

Release, /Ox, /fp:fast, /arch:SSE2

processing float: 111 msec
processing double: 93 msec
``````

Wow… what a huge difference between `/fp:precise` and `/fp:fast` for floats! Enabling `/arch:SSE2` on its own changed almost nothing. And moreover, with `/fp:precise` the double version is more than three times faster than the single-precision one! Worth considering… and worth more, and proper, testing!
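
To put numbers on that, here is a tiny helper that turns a pair of timings into a speed ratio. The function name `speedup` is mine, introduced only for illustration; the inputs are the timings from the results listing.

```c
/* How many times faster the double run was than the float run.
   Both arguments are elapsed times in msec; double_msec must be non-zero. */
double speedup(double float_msec, double double_msec)
{
    return float_msec / double_msec;
}
```

With the measurements above, `speedup(308, 92)` is about 3.35 for the default build, `speedup(307, 96)` is about 3.20 with `/arch:SSE2`, and `speedup(111, 93)` is about 1.19 with `/fp:fast`.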

## The Reason

The main problem: conversions.

Below is the asm code generated by VS (Release, /Ox, /fp:precise, /arch:SSE2):

``````
// for float
; 35 : for (int j = 0; j < NUM_ITER; ++j)
; 36 : {
; 37 : temp += floatArray[j]*2.0f;
movss xmm3, DWORD PTR [eax-8]
cvtps2pd xmm3, xmm3
cvtss2sd xmm1, xmm1
mulsd xmm3, xmm1
xorps xmm1, xmm1
cvtpd2ps xmm1, xmm3
movss xmm3, DWORD PTR [eax-4]
...
``````

And for `double`:

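``````
// for double
; 59 : for (int j = 0; j < NUM_ITER; ++j)
; 60 : {
; 61 : temp += doubleArray[j]*2.0;
movsd xmm3, QWORD PTR [eax-16]
mulsd xmm3,
``````

The listing for floats is longer because of the `cvtps2pd` and `cvtss2sd` instructions, which convert a single-precision floating point value into a double-precision one… and `cvtpd2ps`, which goes the reverse way. In other words, with `/fp:precise` the compiler appears to do the float arithmetic in double precision (note the `mulsd`), so every element is widened before the multiplication and the result is narrowed again afterwards. The double version skips all of that.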
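
To make the extra work visible at the source level, here is a rough sketch of the float loop written twice: once as plain single-precision code, and once with the widening and narrowing spelled out as casts. This is my reading of the listing, not a reconstruction of what the compiler does internally, and the function names are hypothetical.

```c
#include <stddef.h>

/* Pure single-precision arithmetic: no conversions required. */
float sum_twice_float(const float *data, size_t n)
{
    float temp = 0.0f;
    for (size_t j = 0; j < n; ++j)
        temp += data[j] * 2.0f;
    return temp;
}

/* The same loop with the conversions written out explicitly:
   widen to double, compute in double, narrow back to float. */
float sum_twice_widened(const float *data, size_t n)
{
    float temp = 0.0f;
    for (size_t j = 0; j < n; ++j) {
        double wide = (double)data[j];           /* float -> double */
        double acc  = (double)temp + wide * 2.0; /* arithmetic in double */
        temp = (float)acc;                       /* double -> float */
    }
    return temp;
}
```

Every iteration of the second version pays for two widenings and one narrowing on top of the multiply and add, which is the kind of overhead the float listing shows.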