64 versus 32 bit processors

As manufacturers like Intel and AMD work to bring 64 bit microprocessors into the mainstream, the question of what makes a 64 bit processor 64 bit, and what implications the "bittedness" has on application performance, becomes increasingly relevant.

Strictly speaking, a processor is said to be an "N bit processor" if the size of its integer registers is N bits. This normally means that the processor can use no more than N bits in a memory address (since addresses are usually manipulated as integers in modern processors), although there is certainly no requirement that an N bit processor implement N bit addresses.

What are the benefits of a 64 bit processor? The chief benefit is that a lot of limits are removed transparently. In a pure 64 bit environment the user doesn't have to think about compiling with the right flags or working around pointer size limits using library calls and opaque data types. Large processes and big memories work in a straightforward manner with a minimum of fuss. Programs that need to use more than 2 or 4 GB of memory can usually do that with a simple recompile. Codes that perform many operations on integer values larger than 2^32 _may_ also run faster on a 64 bit processor (see below). However...

Some myths:

- A 64 bit processor is faster than a 32 bit processor

This may be true, but not on account of the increased number of bits. The size of the integer registers does not normally have a large impact on performance. Integer codes that work with large numbers could benefit from the increased register size, but since memory latency is the most important performance limiter on modern processors, it will in many cases mask much of the potential gain. If the addresses in the 64 bit processor are larger than those in the 32 bit processor, the 64 bit processor may actually turn out to be slightly slower, since the larger addresses eat up a larger portion of the caches and memory bandwidth.
Since a 64 bit version of a processor family is in most cases developed later than the 32 bit version, it often benefits from a lot of other changes as well, such as larger caches, improved memory systems and higher clock frequencies. These changes can cause a great performance difference that marketing departments are fond of attributing to the "bittedness" of the processor.

- A 32 bit processor can only use 2^32 bytes (4GB) of RAM

It is true that the maximum size of an address in a program on a 32 bit processor is usually 32 bits. However, it is possible for the processor to extend the address by other means to access a larger physical memory. Some members of the 32 bit Intel IA32 family of processors (modern Pentiums) can access 2^36 bytes (64GB) of physical memory using ESMA (Extended Server Memory Architecture [2]). A virtual address is still limited to 32 bits, but 4 extra bits can be tacked onto it during translation to a physical address. This means that each process can transparently access at most 4GB at once through a pointer, but the system as a whole can use more, and a process can keep more data in memory with the help of the OS. This requires the code to explicitly support this model.

- A 64 bit system can use 2^64 bytes (16EB) of RAM

As previously stated, a 64 bit processor may have addresses that use fewer than 64 bits, and usually doesn't actually have memory interfaces that support the full 64 bit address range. Many systems also reserve parts of the address range for various uses, so some bits of the address range are lost there. For the time being the number of address bits actually supported is usually "large enough" anyway, often in the Petabyte range.

- The precision of a floating point value depends on the "bittedness" of the processor

The question of the width of the floating point registers is basically orthogonal to that of the width of the integer registers.
Practically all modern commercial processors with a floating point unit conform to the IEEE 754 standard, which specifies that a single precision floating point value is represented with 32 bits and a double precision floating point value with 64 bits. For instance, the 32 bit POWER2 processor has 64 bit wide floating point registers, and the 32 bit IA32 family of processors actually has 80 bit wide floating point registers.

Some processors incorporate SIMD (Single Instruction, Multiple Data) extensions, where single precision (32 bit) floating point values can be packed into 64 or 128 bit registers and operated on in parallel. Because they operate on several values in parallel, programs that can use these extensions can improve their floating point performance quite a bit. Examples of such extensions are SSE/SSE2 in Intel processors, 3DNow! in AMD Athlon processors and AltiVec in PowerPC.

- My PlayStation2 has a 128 bit processor

The processor in the PlayStation2, the Emotion Engine, consists of a MIPS III RISC core and two vector units [3]. The MIPS III is a 64 bit architecture with 64 bit integer registers. The vector units each have 16 16 bit integer registers and 32 128 bit floating point registers. So, ignoring the marketing value of being able to claim "128 bit power", the PlayStation2 uses a 64 bit processor.

- I need a 64 bit processor in order to use 64 bit wide integer types

A compiler can well support integer types that are wider than the architectural integer registers by using more than one register to store them. For instance, the "long long" type in GCC under Linux on an IA32 processor is 64 bits. Using these types is typically slower than using the "native" types, since the compiler has to generate code to load/store more registers and stitch the results together.
- The maximum size of a file is tied to the "bittedness" of the processor

The maximum size of a file on a Unix system is defined by the maximum size of the type off_t. This type can be 64 bits even on a 32 bit processor. A common solution is to have a 32 bit off_t by default and a 64 bit off64_t type that can be used if necessary. Most modern filesystems support a 64 bit off_t on 32 bit systems. Programs often have to define a constant (such as setting _FILE_OFFSET_BITS to 64 in Linux) or be compiled with some special flag in order to use the larger version of off_t.

Potential points of contention

The definition of 64 bit processors comes from John Mashey [1], someone who knows a lot more about it than I do. Feel free to argue against it, but know that you must argue your point more effectively than Mashey. There are machines out there that put the lie to what I've written above. Older Cray machines didn't support IEEE 754 floating point, for instance. I'd be happy to note any other deviances, but bear in mind that I've purposefully glossed some things over to keep it short(ish) and not obscure the point too much.

Contact

Corrections, improvement suggestions and questions welcome. pek@pdc.kth.se

References

[1] John Mashey's comp.arch post about "bittedness": http://www.pdc.kth.se/~pek/bittedness.Mashery.txt
[2] Extended Memory Access on IA-32 Platforms: http://www.intel.com/idf/us/fall2002/presentations/DES124PS.pdf
[3] Masaaki Oka, Masakazu Suzuoki. Designing and Programming the Emotion Engine. IEEE Micro, Vol. 19, No. 6, pp. 20-28

20030923