64 versus 32 bit processors

As manufacturers like Intel and AMD work to bring 64 bit microprocessors into the mainstream, the question of what makes a 64 bit processor 64 bit, and what implications the "bittedness" has on application performance, becomes increasingly relevant.

Strictly speaking, a processor is said to be an "N bit processor" if the size of its integer registers is N bits. This normally means that the processor can use no more than N bits in a memory address (since addresses are usually manipulated as integers in modern processors), although there is certainly no requirement that an N bit processor implement N bit addresses.

What are the benefits of a 64 bit processor? The chief benefit is that a lot of limits are removed transparently. In a pure 64 bit environment the user doesn't have to think about compiling with the right flags or working around pointer size limits using library calls and opaque data types. Large processes and big memories work in a straightforward manner with a minimum of fuss. Programs that need to use more than 2 or 4 GB of memory can usually do that with a simple recompile. Codes that perform many operations on integer values larger than 2^32 _may_ also run faster on a 64 bit processor (see below). However...

Some myths:

- A 64 bit processor is faster than a 32 bit processor

This may be true, but not on account of the increased number of bits. The size of the integer registers does not normally have a large impact on performance. Integer codes that work with large numbers could benefit from the increased register size, but since memory latency is the most important performance limiter on modern processors, it will in many cases mask much of the potential gain. If the addresses in the 64 bit processor are larger than those in the 32 bit processor, the 64 bit processor may actually turn out to be slightly slower, since the larger addresses eat up a larger portion of the caches and memory bandwidth.
Since a 64 bit version of a processor family is in most cases developed later than the 32 bit version, it often benefits from a lot of other changes as well, such as larger caches, improved memory systems and higher clock frequencies. These changes can cause a great performance difference that marketing departments are fond of attributing to the "bittedness" of the processor.

- A 32 bit processor can only use 2^32 bytes (4GB) of RAM

It is true that the maximum size of an address in a program on a 32 bit processor is usually 32 bits. However, it is possible for the processor to extend the address by other means to access a larger physical memory. Some members of the 32 bit Intel IA32 family of processors (modern Pentiums) can access 2^36 bytes (64GB) of physical memory using ESMA (Extended Server Memory Architecture [2]). A virtual address is still limited to 32 bits, but 4 extra bits can be tacked onto it during translation to a physical address. This means that each process can transparently access at most 4GB at once through a pointer, but the system as a whole can use more, and a process can keep more data in memory with the help of the OS. This requires the code to explicitly support this model.

- A 64 bit system can use 2^64 bytes (16EB) of RAM

As previously stated, a 64 bit processor may have addresses that use fewer than 64 bits, and usually doesn't actually have memory interfaces that support the full 64 bit address range. Many systems also reserve parts of the address range for various uses, so some bits of the address range are lost there. For the time being the number of address bits actually supported is usually "large enough" anyway, often in the Petabyte range.

- The precision of a floating point value depends on the "bittedness" of the processor

The question of the width of the floating point registers is basically orthogonal to that of the width of the integer registers.
Practically all modern commercial processors with a floating point unit conform to the IEEE 754 standard, which specifies that a single precision floating point value is represented with 32 bits and a double precision floating point value with 64 bits. For instance, the 32 bit POWER2 processor has 64 bit wide floating point registers, and the 32 bit IA32 family of processors actually has 80 bit wide floating point registers.

Some processors incorporate SIMD (Single Instruction, Multiple Data) extensions, where single precision (32 bit) floating point values can be packed into 64 or 128 bit registers and operated on in parallel. Because they operate on several values in parallel, programs that can use these extensions can improve their floating point performance quite a bit. Examples of such extensions are SSE/SSE2 in Intel processors, 3DNow! in AMD Athlon processors and AltiVec in PowerPC.

- My PlayStation2 has a 128 bit processor

The processor in the PlayStation2, the Emotion Engine, consists of a MIPS III RISC core and two vector units [3]. The MIPS III is a 64 bit architecture with 64 bit integer registers. The vector units each have 16 16 bit integer registers and 32 128 bit floating point registers. So, ignoring the marketing value of being able to claim "128 bit power", the PlayStation2 uses a 64 bit processor.

- I need a 64 bit processor in order to use 64 bit wide integer types

A compiler can well support integer types that are wider than the architectural integer registers by using more than one register to store them. For instance, the "long long" type in GCC under Linux on an IA32 processor is 64 bits. Using these types is typically slower than using the "native" types, since the compiler has to generate code to load/store more registers and stitch the results together.
- The maximum size of a file is tied to the "bittedness" of the processor

The maximum size of a file on a Unix system is defined by the maximum size of the type off_t. This type can be 64 bits even on a 32 bit processor. A common solution is to have a 32 bit off_t by default and a 64 bit off64_t type that can be used if necessary. Most modern filesystems support a 64 bit off_t on 32 bit systems. Programs often have to define a constant (such as setting _FILE_OFFSET_BITS to 64 in Linux) or be compiled with some special flag in order to use the larger version of off_t.

Potential points of contention

The definition of 64 bit processors comes from John Mashey [1], someone who knows a lot more about it than I do. Feel free to argue against it, but know that you must argue your point more effectively than Mashey. There are machines out there that put the lie to what I've written above. Older Cray machines didn't support IEEE 754 floating point, for instance. I'd be happy to note any other deviances, but bear in mind that I've purposefully glossed some things over to keep it short(ish) and not obscure the point too much.

Contact

Corrections, improvement suggestions and questions welcome. pek@pdc.kth.se

References

[1] John Mashey's comp.arch post about "bittedness": http://www.pdc.kth.se/~pek/bittedness.Mashery.txt
[2] Extended Memory Access on IA-32 Platforms: http://www.intel.com/idf/us/fall2002/presentations/DES124PS.pdf
[3] Masaaki Oka, Masakazu Suzuoki. Designing and Programming the Emotion Engine. IEEE Micro, Vol. 19, No. 6, pp. 20-28

20030923