The Capacity to Lie: Lying About Capacity

Computers use the binary number system to manage and process numbers.  For this reason they favor quantities that are powers of 2, and work best with processors and memory chips built on powers of 2, which is why you usually see numbers like 8, 16, 32, etc. associated with electronic devices and not “strange” numbers like 10, 21, 43, etc.  Computer systems have therefore traditionally used 1024 (2¹⁰) as the base for digit grouping when reporting storage usage in kilobytes, megabytes, gigabytes, etc., as opposed to the Western tradition of using 1000 (10³), e.g. $10K = ten thousand dollars.
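To see how the two conventions diverge, here's a small Python sketch (my own illustration, not anything standardized) that reduces a raw byte count to a human-friendly figure using either base:

```python
def simplify(byte_count, base):
    """Reduce a raw byte count to a readable figure using the given grouping base."""
    units = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes"]
    value = float(byte_count)
    for unit in units:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    return f"{value:.2f} petabytes"

# The same byte count "simplifies" differently under each convention:
print(simplify(8_589_934_592, 1024))  # 8.00 gigabytes (binary grouping)
print(simplify(8_589_934_592, 1000))  # 8.59 gigabytes (decimal grouping)
```

The same 8,589,934,592 bytes are exactly 8 gigabytes under binary grouping but an awkward 8.59 under decimal grouping, which is exactly the gap the rest of this post is about.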

When it comes to a physical storage medium, however, capacity does not have to be a power of 2; it can be whatever number of storage positions will physically fit, the only real requirement being that the computer processor can count that high.

Manufacturers are notorious for cutting corners in manufacturing wherever possible and then hyping their products up to be more than what they really are.  Take for example the Amazon Kindle.  Amazon used 1000-based digit grouping as the basis of its “8 gigabyte” claim and aimed for a physical arrangement that would guarantee at least 8,000,000,000 bytes of storage (1000*1000*1000*8).  This is over half a billion bytes short of the 8,589,934,592 bytes (1024*1024*1024*8) actually expected from “8 gigabytes.”

Not surprisingly, most manufacturers do the same thing with optical discs, memory cards, and hard drives, as this deceptive practice has become an industry standard.  They were truthful back in the days when CD-ROMs were the de facto storage medium, when a “650-megabyte” disc really could hold 681,574,400 bytes and a “700-megabyte” disc 734,003,200 bytes, but then they realized they could get away with being cheap and lazy by using a different counting system.  I first noticed this in the late 1990s when I realized that my “ZIP100” disks did not hold a full 100 megabytes as advertised[1].  In those days every byte counted, as removable storage media weren't cheap and filled up way too quickly, so that really pissed me off.

Here's the breakdown of some common capacity claims versus what you actually get:

| When the manufacturer claims... | You're actually getting just over... | When you should be getting... | Shorting you approximately... | Actual yield | Disparity |
| --- | --- | --- | --- | --- | --- |
| 100 megabytes (ZIP100[2]) | 100,431,872 bytes | 104,857,600 bytes | 4,425,728 bytes | ≈95.78 megabytes | 4.22% |
| 650 megabytes (CD-R) | 681,574,400 bytes | 681,574,400 bytes | None | 650+ megabytes | < 0% |
| 700 megabytes (CD-R) | 734,003,200 bytes | 734,003,200 bytes | None | 700+ megabytes | < 0% |
| 4 gigabytes | 4,000,000,000 bytes | 4,294,967,296 bytes | 294,967,296 bytes | ≈3.73 gigabytes | 6.87% |
| 4.7 gigabytes (DVD-R) | 4,700,000,000 bytes | 5,046,586,573 bytes | 346,586,573 bytes | ≈4.38 gigabytes | 6.87% |
| 8 gigabytes | 8,000,000,000 bytes | 8,589,934,592 bytes | 589,934,592 bytes | ≈7.45 gigabytes | 6.87% |
| 16 gigabytes | 16,000,000,000 bytes | 17,179,869,184 bytes | 1,179,869,184 bytes | ≈14.90 gigabytes | 6.87% |
| 25 gigabytes (BD-R) | 25,000,000,000 bytes | 26,843,545,600 bytes | 1,843,545,600 bytes | ≈23.28 gigabytes | 6.87% |
| 32 gigabytes | 32,000,000,000 bytes | 34,359,738,368 bytes | 2,359,738,368 bytes | ≈29.80 gigabytes | 6.87% |
| 64 gigabytes | 64,000,000,000 bytes | 68,719,476,736 bytes | 4,719,476,736 bytes | ≈59.60 gigabytes | 6.87% |
| 128 gigabytes | 128,000,000,000 bytes | 137,438,953,472 bytes | 9,438,953,472 bytes | ≈119.21 gigabytes | 6.87% |
| 256 gigabytes | 256,000,000,000 bytes | 274,877,906,944 bytes | 18,877,906,944 bytes | ≈238.42 gigabytes | 6.87% |
| 512 gigabytes | 512,000,000,000 bytes | 549,755,813,888 bytes | 37,755,813,888 bytes | ≈476.84 gigabytes | 6.87% |
| 1 terabyte | 1,000,000,000,000 bytes | 1,099,511,627,776 bytes | 99,511,627,776 bytes | ≈0.91 terabytes | 9.05% |
| 2 terabytes | 2,000,000,000,000 bytes | 2,199,023,255,552 bytes | 199,023,255,552 bytes | ≈1.82 terabytes | 9.05% |
| 4 terabytes | 4,000,000,000,000 bytes | 4,398,046,511,104 bytes | 398,046,511,104 bytes | ≈3.64 terabytes | 9.05% |
| 8 terabytes | 8,000,000,000,000 bytes | 8,796,093,022,208 bytes | 796,093,022,208 bytes | ≈7.28 terabytes | 9.05% |
| 1 petabyte[3] | 1,000,000,000,000,000 bytes | 1,125,899,906,842,624 bytes | 125,899,906,842,624 bytes | ≈0.89 petabytes | 11.18% |

As you can see, the lower the capacity, the more trivial the disparity is, but as the claimed capacity rises, the disparity increases significantly, leaping an additional two-plus percent with each higher digit group.
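That leap follows directly from the arithmetic: each step up the unit ladder multiplies the reported figure by another factor of 1000/1024, so a unit of order n (1 = kilo, 2 = mega, and so on) is shorted by 1 − (1000/1024)ⁿ.  A quick Python check reproduces the table's percentages:

```python
# Disparity per unit order n: 1 - (1000/1024)**n
# n = 1 kilo, 2 mega, 3 giga, 4 tera, 5 peta
for n, prefix in enumerate(["kilo", "mega", "giga", "tera", "peta"], start=1):
    disparity = 1 - (1000 / 1024) ** n
    print(f"{prefix}bytes: shorted by {disparity:.2%}")
# kilobytes: 2.34%, megabytes: 4.63%, gigabytes: 6.87%,
# terabytes: 9.05%, petabytes: 11.18%
```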

Using 1000 for digit grouping may seem more logical, given the convenience of not having to do math when simplifying a byte count into another measurement, but I prefer binary's 1024.  A very small part of the reason is that when I buy a “4‑terabyte hard drive,” I feel like I'm getting four trillion bytes of storage space plus a huge bonus (until I remember that “4 terabytes” is a lie).

I prefer 1024 mainly because I'm a programmer, and a lot of my numeric considerations over the years have had to be in powers of 2.  Hard drives may be getting affordably larger while memory gets cheaper and processors faster, so that anybody can write bad code that's “good enough” to accomplish the targeted task, but not too long ago good programmers had to pay close attention to how much memory their programs used, how much space was required for storing data, and what the limitations of a selected datatype were.  Determining how many kilobytes or megabytes need to be allocated is best expressed with 1024-based digit grouping.

Take for example the 8-bit color system used by the game Duke Nukem 3D.  All of the game's graphics were mapped to a 256-color palette (2⁸ = 256 indices), so that each pixel in an image required only 1 byte of storage instead of 3 (or 4 when padding the bytes out to be more processor-efficient)[4].  While this limited the game to less than 1% of the possible color spectrum, it afforded the advantage of easy palette swapping, whereby the same 256-color palette could be remapped for a single on-screen graphic, for example to shift all of an alien's blue-colored clothing to green or orange.  Palette swapping was also used to simulate shading and transparency via look-up tables of palette remappings, which was much faster than trying to calculate the color changes in real time (crucial given the typical speeds of common processors back then).
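The idea is easy to sketch.  This is not the game's actual code (Build-engine internals are in C), just a minimal Python illustration of indexed color: pixels store 1-byte palette indices, and swapping the palette recolors the image without touching the pixel data.

```python
# Hypothetical palettes for illustration: 256 RGB triples each.
base_palette = [(i, i, i) for i in range(256)]            # grayscale ramp
green_shift  = [(0, g, 0) for (_, g, _) in base_palette]  # remapped toward green

image = bytes([0, 64, 128, 255])  # four pixels, one palette index per byte

def render(pixels, palette):
    """Resolve each 1-byte palette index to its RGB triple."""
    return [palette[p] for p in pixels]

print(render(image, base_palette))  # grayscale rendering
print(render(image, green_shift))   # same pixel bytes, now green-tinted
```

Because only the 256-entry palette changes, a full recolor costs 256 table entries instead of rewriting every pixel in the image.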

Each palette-swap table needed 1 byte per palette index, so 256 bytes per table, with a maximum of 256 tables, totaling 65,536 bytes that needed to be allocated:  Exactly 64 kilobytes digit-grouped by 1024, or approximately 65.54 kilobytes digit-grouped by 1000.  The same numbers applied to the transparency tables.  There were 32 shade tables, so 32 × 256 = 8,192 bytes:  Exactly 8 kilobytes digit-grouped by 1024, or approximately 8.19 kilobytes digit-grouped by 1000.

As you can see, using 1000-based digit grouping when programming is clunky and imprecise, whereas with 1024-based grouping I can accurately say that I need 136 kilobytes for Duke Nukem 3D's color look-up tables without any ambiguity about exactly how many bytes are needed.
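The tally from the figures above works out cleanly only in 1024 grouping:

```python
# Byte counts for the look-up tables described above.
palette_swaps = 256 * 256  # up to 256 swap tables, 256 one-byte entries each
transparency  = 256 * 256  # same dimensions as the palette-swap tables
shade_tables  = 32 * 256   # 32 shade tables of 256 one-byte entries

total = palette_swaps + transparency + shade_tables
print(total)         # 139264 bytes
print(total / 1024)  # exactly 136.0 kilobytes under 1024 grouping
print(total / 1000)  # a clunky 139.264 "kilobytes" under 1000 grouping
```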

To make these lies about storage capacity even less obvious, some operating systems now use 1000-digit-grouping when reporting how much space has been used or is available.

This is regrettably just another broken function of our modern society that probably won't ever get fixed; we'll just have to accept that we're not getting the storage capacities we're paying for.

  1. I'm not sure if Iomega, the ZIP Drive's manufacturer, started this trend, but it came as no surprise given the way product quality plummeted in the late 1990s and how the company mistreated its employees and nearly burned itself to the ground through mismanagement.
  2. I've listed the so-called “100-megabyte” ZIP100 since I mentioned it earlier.
  3. I've listed a 1-petabyte medium for reference even though such a thing is not yet available to general consumers, to show what numbers we can expect.
  4. Exactly 3 bytes (24 bits) are required to represent any one of the 16,777,216 colors that a computer can display, but sometimes 4 bytes (32 bits) are used, either to pad the data out to a power of 2 or to define effects such as transparency or a reference to a dynamically specified system color.