Data Storage Solutions

The Problem

At the turn of the century most average computer users probably could have fit their most important data on a very small harddrive[1] (such as the 3GB harddrive I had in the 1990s), so moving it around and making backups wasn't too much of an issue.  Now, with the proliferation of high-quality media and personal digital electronics, most people have accumulated several gigabytes, if not terabytes, of “important data.”

When a digital camera or smartphone fills up with photos, it's common practice to unload them to the harddrive of a personal computer.  This seems like the perfect solution since common, affordable harddrives already have very large capacities, and those capacities continue to increase very quickly.  However, harddrives fail.  Frequently, and unexpectedly.

There's the tendency to expect harddrives to only fail if an owner is rough with his computer, for example, if he drops his laptop on the sidewalk or in the bathtub, or keeps his tower on a wobbly table and bumps it too hard.  I somehow always thought if I was careful it wouldn't happen to me, but it eventually did in July of 2015, and the computer was only a few years old:  My harddrive's table of contents somehow went corrupt, and the operating system could no longer find any files beyond the boot partition.  Luckily with some $50 software my brother was able to recover probably 99% of my “important data,” but if I wanted a more thorough recovery, which some people must resort to, I would have had to pay a specialist anywhere from $500 to $2,000!

To complicate matters further, our increasingly larger harddrives are more complex and therefore more prone to failure.  There's even speculation that the manufacturing quality has gone down significantly, which contributes to — you guessed it — more failure!  Just look at all the super-old, operational computers you can get off of eBay or from thrift stores versus how quickly today's machines crap out.  My brother's Atari computer can still play games, and George R.R. Martin still uses a slow, old machine running DOS to compose his latest best-sellers.

Clearly we need to back up our data against these eventual failures.

Solutions

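Thankfully there are several affordable solutions available, solutions that are leaps and bounds better than floppy disks[2].  We'll explore the following solutions:

  • Secondary internal harddrives
  • External harddrives
  • Removable media
  • Cloud storage
  • “Burned” discs
  • M-Disc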

Other solutions which I won't go into much depth on are RAID and backup software.

If you've never heard of RAID (and I don't mean the kind that kills roaches) this may not be the solution for you.  Simply put, the different types of RAID configure (usually) multiple harddrives to duplicate and/or spread out data while behaving as a single storage unit, so that if one or more drives fail, data loss is minimized.  It's not the typical solution for home users who don't build their own computers.

Depending on the quantity and nature of your data, automatic backup software may or may not be useful to you.  If you just want to back up digital photos, for example, manual methods may be sufficient.  In any case, backup software usually integrates with one or more of the following solutions.

Secondary Internal Harddrives

Having at least two harddrives is always a good idea when possible:  Install your operating system on the primary drive, then reserve the other for important non-program data (e.g. photos and documents).  The reason for this is that the second drive is expected to be accessed less frequently and therefore receive less wear, while nothing grinds on a harddrive like an operating system.

This still isn't an ideal backup solution as the secondary drive is still being accessed on a regular basis, and is therefore still vulnerable to wear and general failure.

By the way, if you back up data from one harddrive partition[3] to another partition on the same harddrive, it won't do any good against harddrive failure, because a failed harddrive with partitions is still a failed harddrive.

External Harddrives

External harddrives are an excellent solution for medium-term backup:

  • They're very large in capacity.
  • They're very affordable (if you get HDD[4]; SSD[5] is very expensive).
  • They're typically accessed very infrequently compared to an internal harddrive and therefore don't wear out as quickly.
  • They can be easily used to manage data as if it resides on your internal system (albeit access is a bit slower than with an internal harddrive).
  • They are portable and can be conveniently moved between computers.

I say “medium-term” because external harddrives are identical to internal harddrives, the only real difference being that they come in conveniently portable enclosures with USB connectors.  They're expected to last a long time, but still aren't permanent enough.  That is, we hope they'll last a long time, but any harddrive is still susceptible to unexpected failure, which may be contributed to by any of the following:

  • Wear through normal usage.
  • Rough handling (e.g. dropping and bumping).
  • Power surges or power loss.
  • Exposure to weather.
  • Dust build-up.
  • Software error.
  • Manufacturing defects.
  • Poor manufacturing quality.
  • Bit rot[6].

According to everything I've read and heard from other people, the most reliable HDD manufacturers are, in this order, HGST, Toshiba, Western Digital, and Seagate.  However, to put into perspective how they perform relative to each other, here's a chart from 2016.  It may be a few years old, but as of this writing (2019) it seems to still be relevant[7].

HGST has been rated far better than Western Digital, but Western Digital bought out HGST in 2012 and has transitioned the HGST brand to another line of Western Digital products, so it yet remains to be seen whether the quality of Western Digital will improve or if the quality of HGST's successor will be pulled down by Western Digital.

In any case, when purchasing an HDD, I currently would recommend getting one of these brands and avoiding others whose reputations you aren't sure of.

According to one study, the four best brands for SSDs are Corsair, Intel, Kingston, and Samsung, but SSD is one of the most rapidly changing technologies, so it would definitely be wise to always research before buying one.

As I mentioned earlier, harddrives are getting big.  For example, you can currently get an 8TB HDD for about $200, which is an excellent value.  Older computer systems that run on 32 bits may have trouble utilizing them, however; a system that numbers disk sectors with 32-bit values can't count high enough to address more than 2TB of space[8].
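The arithmetic behind that 2TB ceiling is easy to check for yourself.  Here's a minimal sketch in Python, assuming the traditional 512-byte sector and a 32-bit sector number (the function and constant names are my own, purely for illustration):

```python
# Assumptions: sectors are 512 bytes each, and the system identifies
# each sector with a 32-bit number (0 through 2**32 - 1).
SECTOR_SIZE_BYTES = 512
ADDRESS_BITS = 32


def max_addressable_bytes(address_bits=ADDRESS_BITS, sector_size=SECTOR_SIZE_BYTES):
    """Return the largest capacity reachable with the given sector address width."""
    sector_count = 2 ** address_bits  # how many distinct sector numbers exist
    return sector_count * sector_size


limit = max_addressable_bytes()
print(f"{limit:,} bytes")       # 2,199,023,255,552 bytes
print(limit / 2 ** 40, "TiB")   # 2.0 TiB
```

Anything on the disk beyond that last sector number simply can't be named by such a system, which is why the extra space goes unused.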

Harddrive Tips

  • If your computer is a 32-bit system, the largest harddrive it can likely handle is 2TB.  You can still connect multiple 2TB harddrives to one computer for more storage space, provided you have enough available connection ports.
  • Don't let your operating system “compress this drive to save disk space.”  With this option enabled, the system will automatically compress[9] all files when not in use.  Today's harddrives are so large that any benefit achieved from the extra storage space is outweighed by the reduction in performance, as files must be decompressed each time they're read and recompressed each time they're written.  Worst of all, in the event of a disk failure, recovery software will likely be unable to identify and recover files, as one of the natural side effects of compression is obfuscation.  (User-compressed files, such as ZIP archives, stand a better chance of being recovered as compression details are a part of the compressed archive file.)
  • Should you use disk encryption?  It depends on
    1. How much risk do you think you're at for having your computer stolen? and
    2. How critical a violation would it be if your harddrive fell into “the wrong hands”?
    The problem is that, again, in the event of disk failure, recovery software will have a very difficult time identifying and recovering files that are encrypted.  Unless you know the decryption key, you probably won't recover anything.  If your computer contains trade secrets that could put you out of business if leaked, a total data loss might be preferable to a thief being able to view your data.  If you're careful to not store financial or embarrassing data on your computer, easier recovery may be more important to you than making sure a thief can't view your files.
    • If you feel you must encrypt some important files, only enable encryption on specific locations, instead of the entire drive.  Also, I've read that third-party solutions are usually more reliable for recovering data than solutions built into the operating system.
    • Also for consideration:  A log-in password on an unencrypted computer may prevent others from logging into your computer, but this can be bypassed by removing the harddrive from the computer and plugging it into another computer as a secondary drive.
  • These days, many external harddrives that come in enclosures are equipped with chipsets that automatically encrypt data because some idiot designers decided that the drive enclosures need them.  My brother had a failed external drive professionally recovered, and a hefty portion of the fee was time spent decrypting the drive contents, even though my brother had never configured any active encryption.  If you want to avoid forced encryption like this, check your new drive's specifications (or use OEM drives with a drive dock).
  • Don't leave an external harddrive turned on constantly.  Time spent powered on uses up an HDD's lifespan.  When you think you won't need it for a long while, use the “Safely remove” command to disconnect, then switch off the drive (or unplug its power cord if it doesn't have a switch).
  • Never, ever stand an external harddrive vertically!  No matter how cool it may look propped up on the stand that came with it, a harddrive should always lie flat (unless the base is wider than the stand is tall).  Upright rectangles have a high propensity to fall over when bumped, and fall hard.  Knocked-over harddrives usually go bad!
  • The larger a drive is, the greater the potential loss in the event of failure.  (If a failed drive is recoverable, recovery time and hassle significantly increase with size!)  Consider limiting “drives for important storage” to about 2TB, especially if you want the drive to be compatible with 32-bit operating systems.  Reserve that monstrously large 8TB drive for less important data like temporary files for video projects, or a home media server whose original resources are backed up.
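The compression tip above rests on the idea that compression works by finding repeating patterns.  As a rough illustration only, here's a sketch using Python's standard zlib module (not whatever algorithm your operating system actually uses) to compare repetitive text against pattern-free random bytes.  The sample data is made up for the demonstration:

```python
import os
import zlib

# Made-up sample data: one highly repetitive text, one block of random bytes.
repetitive = b"the quick brown fox jumps over the lazy dog. " * 200
random_like = os.urandom(len(repetitive))

for label, data in (("repetitive text", repetitive), ("random bytes", random_like)):
    packed = zlib.compress(data)
    # Decompression must restore the exact original bytes.
    assert zlib.decompress(packed) == data
    print(f"{label}: {len(data)} bytes -> {len(packed)} bytes")
```

The repetitive text shrinks dramatically while the random bytes don't shrink at all, and in both cases the stored bytes no longer resemble the original file, which is the obfuscation that trips up recovery software.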

Removable Media

“Removable media” can refer to any storage medium that can be removed from a computer without having to disassemble the machine, but for this section I'd like to narrow the discussion down to just the likes of memory cards and USB thumb/flash/stick drives.

Memory cards and thumb drives use one or another type of “flash memory” and are the fantastic successors to magnetic media as they are impervious to magnetic fields, run much faster, are much more compact, and have no moving parts that can “crash” into each other.  They do have some unique drawbacks however:

  • Memory blocks can only be written to a limited number of times before they're unusable.  The consensus is that this typically works out to a device lifespan of about 5 years.  That figure is with constant rewriting however, so in practice the lifespan should be a bit longer.
  • X-rays can erase data[10], and only in recent years has this issue started to be addressed.
  • With some types of flash memory, repeatedly reading the same area of memory can “disturb” and corrupt neighboring data in the same area.  Some devices compensate for this.
  • Some types of flash memory must be periodically refreshed with an electrical charge as the memory cells can leak their charges (a form of bit rot).  Today's flash devices have a typical memory retention age of 1 year at room temperature[11].
  • I'm mostly just nitpicking on this point, but these media are so small that it's too easy to lose them.

One common practice with memory cards is to fill them up with photos or other important data then toss them into a desk drawer or a shoebox and then forget about them for long periods of time.  That's not very wise given the above point about charge leaks.

Below are just two of several photos that went bad for whatever reason while sitting for a long time on the memory card of my first digital camera.  Some random bits went bad, not rendering the file unreadable, but causing blocks of pixels to shift and change color due to the alteration of the stored information.  (Luckily that wasn't my wedding.)

Photos ruined by just a few bits gone bad[12].
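A file damaged this way keeps the same name and size, so nothing looks wrong until you open it.  One way to notice the damage earlier is to record a checksum while the file is still good and compare it later.  Here's a minimal sketch, assuming SHA-256 from Python's standard hashlib module; the sample bytes and helper names are stand-ins of my own, not a real photo or any particular tool:

```python
import hashlib


def sha256_of(data):
    """Return the SHA-256 checksum of a block of bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data, byte_index, bit_index):
    """Return a copy of the data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit_index
    return bytes(damaged)


original = bytes(range(256)) * 4  # stand-in for a photo file's contents
damaged = flip_bit(original, byte_index=100, bit_index=3)

print(len(original) == len(damaged))              # True: the size is unchanged
print(sha256_of(original) == sha256_of(damaged))  # False: the checksum gives it away
```

A checksum only tells you that something changed, not how to fix it, so it's a complement to keeping backups rather than a replacement.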

Memory Card and Thumb Drive Tips

  • Thumb drives are great for a frequently worked-on paper or project that you need to take between home and school or work, but if you plan on taking a long break, it would be wise to create a backup copy in case of bit rot or loss of the thumb drive.
  • To reduce the likelihood of losing a camera's memory card, buy a card with a very large capacity so that it doesn't have to be frequently switched out for an empty spare.
  • Frequently move photos from your digital camera or other mobile device to your preferred backup medium (or temporarily to a harddrive), especially just before going on a vacation, so that you don't need to carry a spare memory card and worry about losing a full one after switching it out.
  • Thumb drives are available in a broad range of styles, ranging from extremely compact to bulky novelty designs.  Avoid anything that you think looks like it was made with cheap materials, and be sure to get one that securely encloses the connector to keep out dust and water.  (Well-enclosed thumb drives have actually survived accidental trips through a washing machine and dryer!)
  • On your memory cards, thumb drives, and mobile devices, put a text file containing contact information (e.g. name, e-mail address, telephone number, mailing address, whichever you're comfortable with) in the root directory.  This will make it easier for a Good Samaritan to return a lost and found item to you.  Name the file “@owner.txt” so that alphabetic sorting places it near the top of the file listing.  (This is of course useless if you've secured your device with a password or encryption.)
    • With a digital camera or other mobile device which can display photos, take a picture of a sheet of paper displaying your contact information to help the less technically-skilled Good Samaritan:  “If found, please contact Jenny at 867-5309.”
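If you have several cards and drives to label, the “@owner.txt” tip above can be scripted.  Here's a minimal sketch in Python; the function name is my own, and a temporary folder stands in for the drive so the example runs anywhere:

```python
import tempfile
from pathlib import Path


def write_owner_file(drive_root, contact_lines):
    """Create @owner.txt in the given root folder and return its path."""
    owner_file = Path(drive_root) / "@owner.txt"
    owner_file.write_text("\n".join(contact_lines) + "\n", encoding="utf-8")
    return owner_file


# A temporary folder stands in for the drive here.  On a real system you
# would pass the drive's root folder instead (its location is an assumption
# that depends on your operating system).
demo_root = tempfile.mkdtemp()
owner_file = write_owner_file(demo_root, ["If found, please contact Jenny at 867-5309."])
print(owner_file.read_text(encoding="utf-8"))

# "@" sorts ahead of letters, so the file lands at the top of this listing.
print(sorted(["DCIM", "notes.txt", "@owner.txt"]))  # ['@owner.txt', 'DCIM', 'notes.txt']
```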

Cloud Storage

“Cloud storage,” also known as file hosting, consists of uploading your files to a remote server and paying somebody else to store them for you.  This can be a very inexpensive solution, and in some cases you can even get free storage for relatively small amounts of data.  Any cloud service worth its salt should implement plenty of redundancy to prevent your data from being lost in the event of a disk failure.  The cloud does have caveats, however:

  • No matter how inexpensive cloud storage may be, you have to keep on paying for it forever if you want your data there forever.
    • If recurring payments get interrupted, services might freeze access to the data or simply delete it.  Some are more generous (indefinitely allowing downloading but not uploading) while others will delete your data if payment isn't received within as few as 7 days of notification of a missed payment.
    • Some free services delete files after a certain period of account inactivity.
  • Having to download or upload files across the internet every time you view or modify them can be slow, especially if networks are congested (or throttled).
  • Network systems can go down, temporarily locking you out of your data.  Amazon has had major service blackouts in which it appears no data was lost, but dozens of services were down for lengthy periods of time[13].
  • A proper cloud service should have a good plan in place to back up your data, but there's still always a small chance of this system failing completely.  Your data is still being stored on just a bunch of harddrives at some far away data center.
  • There is always the possibility of hackers (or disgruntled or nosy employees) gaining unauthorized access to your data stored on a remote server.  They may simply view personal data, or vandalize otherwise important data.
    • Some servers encrypt your data such that, short of years of continuous brute-force processing, no unauthorized people can access your data.  This is great—so long as you don't lose your decryption key.
  • If your computer gets infected with ransomware, your cloud drive might automatically synchronize with the files encrypted on your local computer, as happened in one case.

“Burned” Discs

Burned discs[14], e.g. CDs, DVDs, Blu-rays, etc. seem to be the most permanent solution.  They're impervious to magnetic fields, data cannot be accidentally erased as it's permanently fixed, and the discs are even for the most part waterproof.  Their only weakness seems to be scratches and fingerprints.

When I started burning CDs in the '90s I thought they were permanent, but they're not, as disc-burning works by burning a layer of organic dyes which can decompose over time.  If poorly manufactured, air can reach the shiny metal layer and cause oxidation, or chemical reactions can occur between the adhesive and the other materials.

How long a disc will last depends of course on material quality and how it's handled, but depending on where you look, articles will claim that burned media have a lifespan of anywhere from 2 to 200 years, although the anecdotal consensus seems to be about 5-15.  Luckily I haven't yet encountered a burned CD that went bad just because it was old, but my brother every now and then has a burned DVD go bad on him.

It seems like we're screwed, as nothing we can do has any significant permanence.  So what are we to do?  Continuously copy our ever-increasing hoard of data to new media like a paranoid madman?  Nobody's got the time or money for that!

Luckily we still have one more option available (and I don't mean a paper-based solution).

M-Disc

There's a new type of storage medium called “M-Disc.”  Like other optical discs, it works by using a laser to burn information into a storage layer sandwiched between two layers of plastic, but instead of using organic dyes, it uses a rock-like substance.  Essentially, you're etching your data into stone!  Generally speaking, the only things that destroy stone are erosion by wind and water, and subduction into the Earth's mantle.

With tough polycarbonate serving as the plastic sandwich, M-Discs are resistant to typical abuse and are rated to last up to 1,000 years[15].  Even if they actually have a shelf life of only 100 years, that's leaps and bounds beyond everything else we have now, as that's enough time that I won't have to worry about losing my data to bit rot, and my children and grandchildren will have plenty of time to copy my family photos to a new M-Disc or whatever better medium is invented in their lifetime.

Another great thing about M-Discs is that they're backwards compatible with older hardware.  Well, mostly backwards compatible.  You can get M-Discs in DVD and Blu-ray formats (no CD because manufacturers decided that the cost of production versus the low capacity wasn't worthwhile), and similar to how CD-Rs were mostly backwards compatible with old CD-players, some hardware may have difficulty reading the new discs.  In my experience, I haven't had any problems reading M-Discs on my old USB Blu-ray drive or my Blu-ray player.

Now the major downside of M-Discs:  They're expensive.  Luckily the price has come down quite a bit in the few years since their debut, but they still cost anywhere from $1 to $20 per disc, depending on the disc's capacity and quality.  Thankfully the lower-capacity (and more common) formats are at the lower end of the price spectrum, but they're still expensive enough that you don't want to use them for a bunch of temporary copies or utilize a small percentage of their capacity.

The brand that I currently place my confidence in is Verbatim.  At current prices, a 25-pack of 4.7GB DVD M-Discs costs about $36, working out to be $1.44 per disc, while a 25-pack of 25GB Blu-ray M-Discs costs $51, working out to be $2.04 per disc.  While the Blu-rays are more expensive than DVDs, they work out to be more economical as the DVDs cost roughly 31 cents per gigabyte, while Blu-rays cost only 8 cents per gigabyte.  (When you get into the multi-layer formats, however, the costs skyrocket!)
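Prices change, so here's a small Python sketch of the same arithmetic that you can rerun with whatever the packs cost when you're shopping.  The pack prices are the ones quoted above; the function names are my own:

```python
def cost_per_disc(pack_price, discs_per_pack):
    """Price of a single disc in dollars."""
    return pack_price / discs_per_pack


def cost_per_gb(pack_price, discs_per_pack, gb_per_disc):
    """Price of one gigabyte of storage in dollars."""
    return cost_per_disc(pack_price, discs_per_pack) / gb_per_disc


# (pack price in dollars, discs per pack, capacity per disc in GB)
packs = {
    "DVD M-Disc (4.7GB)": (36.00, 25, 4.7),
    "Blu-ray M-Disc (25GB)": (51.00, 25, 25.0),
}

for name, (price, count, capacity) in packs.items():
    per_disc = cost_per_disc(price, count)
    cents_per_gb = cost_per_gb(price, count, capacity) * 100
    print(f"{name}: ${per_disc:.2f} per disc, {cents_per_gb:.0f} cents per GB")
```

Run as-is, this reports $1.44 per disc and 31 cents per GB for the DVDs, and $2.04 per disc and 8 cents per GB for the Blu-rays.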

As for other brands of M-Disc, there's Millenniata, which reviews imply is a superior brand, but it's twice as expensive as Verbatim which is also an excellent brand.  There are other, much cheaper brands, but reviews report that they're unreliable.

When it comes to archive longevity, M-Disc currently can't be beaten, and I use it for archiving my data.

  1. Harddrive:  The harddrive or hard disk drive (HDD) is the primary storage device of a computer.  It is typically non-removable, and is called “hard” because it uses solid metal disks, albeit with moving parts, to store data (as opposed to the floppy plastic disks of yesterday's removable media).  Computers may alternatively use a solid state drive (SSD, a drive with no moving parts) as the primary storage device, which may colloquially be referred to as a “flash harddrive.”  Since this article is not intended to focus on the differences between HDD and SSD, and both fulfill the same role, I will for simplicity use “harddrive” to generally refer to both.
  2. Floppy disks were so named because they were very flexible.  The 3½-inch diskette had a hard plastic enclosure, but it was still a floppy disk inside.
  3. Disk partitioning is a method of logically dividing a single harddrive into separately managed areas.  Each area will show up in the file manager as an independent device, even though this is physically not the case.  (Here's how to determine if your harddrive has been partitioned.)
  4. A Hard Disk Drive (HDD) is a storage medium using solid metal disks, and other moving parts.
  5. A Solid State Drive (SSD) is a storage medium with no moving parts.
  6. “Bit rot” occurs when stored data “just goes bad” without the storage medium necessarily failing.  Depending on the storage medium, this can be caused by electron “leaks,” exposure to extreme temperatures, radiation, or magnetic fields, or even by the storage medium literally decaying (e.g. paper punch cards).
  7. A good source for comparing HDD dependability is Backblaze's quarterly statistics.
  8. 32-bit counting and the 2TB ceiling:  The largest value a 32-bit number can represent is 2^32-1, or 4,294,967,295, for a total of 4,294,967,296 values (0 through 4,294,967,295 inclusive).  Harddrives are divided into sectors which traditionally have contained 512 bytes each, so this is the sector size usually managed by 32-bit systems.  512 bytes times 4,294,967,296 sectors equals 2,199,023,255,552 bytes, or 2 terabytes.
  9. Data compression is achieved by identifying repeating patterns in a file, then rewriting the file so that its contents are defined by a table of patterns.  This results in a smaller file, especially if the file is mostly text, as human languages are full of repetitive patterns.
  10. See Flash Memory:  X-ray effects.
  11. See “Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery,” p. 10.
  12. In some cases, a damaged digital photo can be repaired by manually inspecting and editing the binary data, but it's a very involved process which most people would rather avoid.  ImpulseAdventure has a great article about it, though, and software that can assist in such an endeavor.
  13. Some instances of AWS outages:
  14. “Disk” or “disc”?  Prior to the recent advent of flash memory, spinning disks have traditionally been the most convenient method of storing computer data, in that they can be searched and accessed very quickly, as opposed to other contemporary methods like magnetic tape.  One family of disks consists of optical discs, e.g. CDs and DVDs.  “Optical” because they work by reflecting a laser off of a shiny surface into an optical reader, and “disc” simply because the industry decided to use this spelling to differentiate optical discs from non-optical disks.
  15. See Millenniata's “What is M-Disc?” page.