RAID

RAID is a way to use disks to increase the reliability of your computer. RAID can also increase speed. There are several types of RAID, each requiring a minimum number of disks and offering a different level of reliability and speed. Let's talk about reliability first, then find out which RAID is the fastest for a given level of reliability.

There are several types of RAID, each identified by a number called a level. RAID 1 and RAID 5 are the two you would normally use. There is no correlation between the number and speed or reliability. RAID 10 is not better than RAID 5 because RAID 10 is just RAID 1 plus RAID 0 and RAID 0 is not RAID.

A quick history

1973. IBM invents the Winchester disk drive, the grandparent of RAID because it made inexpensive disks possible.

Mid 1970s. As a kid fooling around with computers, I looked at buying two Winchester disks to make a mirrored disk array for reliability but they were $5000 each ($20000 in today's money). The result would have been RAD, Redundant Array of Disks. Floppy disk drives were cheap so I developed automated replication and backup to floppy disks to make recovery from a disk failure easy. I also created memory based cache, using separate hardware, to speed up the one disk. Today all disks have their own cache memory built in.

IBM and other companies worked with arrays of expensive disks for reliability or speed. IBM eventually receives a patent for the invention of RAID 5.

1978. Al Shugart and Finis F. Conner start Seagate Technology to make Winchester disks inexpensive. PC owners across the world suddenly had access to disk drives for hundreds of dollars instead of thousands of dollars.

Early 1980s. Those new disks were inexpensive and small enough to fit several in a PC case. You could get lots of programs to replicate and back up to a spare disk but they all worked in different ways and recovery was hit and miss. I pioneered some techniques to make PCs work faster with multiple disks, mostly based on techniques I worked on for mainframes. Today there are a lot of similar techniques built into disk hardware including command reordering.

1985. Apple finally let their Mac customers buy the same inexpensive disks used on PCs.

1987. Randy Katz at University of California, Berkeley, is using a Mac with one of those almost inexpensive disks (Nothing from Apple is inexpensive). David A. Patterson spots the disk and talks with Katz about building an array of disks for performance.

Together with Garth Gibson, Pete Chen, Ed Lee, Ann Chervenak, and Ethan Miller, Patterson and Katz develop the idea of RAID storage then publish a paper named A Case for Redundant Arrays of Inexpensive Disks. The new RAID approach worked because of the magic word Inexpensive.

A lot of companies jumped on the RAID bandwagon but deliberately made RAID more expensive than conventional disks because they could sell the reliability factor created by Redundant. Patterson agrees with the money people to change the definition of RAID to Redundant Arrays of Independent Disks. They should have called it RAED.

The marketing people then produced the idea of RAID 0 which is RAID without the R. If there was truth in advertising, the advertising for RAID 0 would say Now your computer can have AIDs.

PC owners could replace their manual disk arrays with a standard approach that was quickly made available in a choice of inexpensive software versions, expensive preconfigured hardware versions, and expensive versions including dedicated hardware.

Hybrid software RAID

Eventually hybrid versions arrived and confused everyone by selling software based RAID as hardware. You pay a lot of money for a cheap disk adaptor that does almost nothing different to a standard cheap disk adaptor. The hardware includes a software driver that does the actual work in software RAID. The Marketing people pretend you are getting hardware RAID instead of software RAID. Marketing people love it when they can charge more for hardware by advertising fake features.

The only use for hybrid RAID is in Windows where Microsoft deliberately and artificially stop you using software RAID in the non server versions of Windows. You buy the hybrid adaptor because it adds software RAID to the desktop version of Windows. Life would be far easier if Microsoft sold a professional version of Windows with RAID included. (They do sell a Professional version of Windows but that gets you only NTFS and the basic security that should be in every version of Windows from the start.)

Versions

RAID 0

RAID 0 is not RAID. The R in RAID means redundant and there is nothing redundant in RAID 0. Never use RAID 0. There is a separate page describing the agony of using RAID 0 instead of real RAID.

RAID 1

RAID 1 uses one disk to store data and a spare disk to store a mirror image of the first disk for redundancy. When a single disk breaks, you can replace the broken disk and recover the broken disk from the data on the other disk. RAID 1 is a good fit for workstations with two disks. Some notebook computers let you use two disks in a RAID 1 array for reliable operation out in the field. There is a separate page describing all the benefits of using RAID 1.

RAID 5

RAID 5 uses two or more disks to store data and adds a spare disk for redundancy. When a single disk breaks, you can replace the broken disk and recover the broken disk from the data on the other disks. You need a minimum of three disks to have that spare disk and in a typical server you could have seven or nine disks. Dedicated RAID servers might have 30 disks in the one box.

A four disk case could use RAID 5 with three data disks and one spare. The three data disks, if they are 500 GigaBytes each, would produce 1500 GB or 1.5 TeraBytes. This compares to using two RAID 1 arrays to get 1000 GB (1 TB).

Recovery of RAID 5 can take a long time because one block, or stripe, of data on the new disk is reconstructed from a block/stripe of data from each of the other disks. In one system there are 32 disks in a RAID 6 array. That means 30 data disks and two spares. When you replace a disk, each stripe on that disk is reconstructed from 30 stripes, one from each of the other disks.
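The reconstruction described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the spare holds simple XOR parity, which is the usual way RAID 5 is implemented. The block contents and the function name xor_blocks are made up for the example, and the second spare in RAID 6 uses a different calculation that is not shown here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]

# The spare block for this stripe is the XOR of the data blocks.
parity = xor_blocks(data)

# Pretend the disk holding data[1] broke. Rebuild its block from
# the matching block on every surviving disk, spare included.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Every surviving disk has to be read to rebuild one block, which is why recovery time grows with the number of disks in the array.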

RAID 6

RAID 6 is RAID 5 with an extra spare disk. RAID 6 can survive two disks breaking at the same time. RAID 5 is vulnerable to disk failure from the time the first disk fails to the time the disk is replaced and all the data is recreated. When you set up a new server with all new identical disks, the disks will tend to fail at similar times, perhaps during a power surge or on a hot day. RAID 6 gives you a chance to recover the first failed disk before the next disk fails.

Our four disk example could use RAID 6 with two data disks and two spare disks. The two data disks, at 500 GigaBytes each, would produce 1000 GB (1 TeraByte) which is exactly the same as using two RAID 1 arrays.
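The capacity arithmetic for the four disk example can be checked with a short Python sketch. It assumes identical disks, RAID 1 built as mirrored pairs, one spare for RAID 5, and two spares for RAID 6. The function name usable_gb is invented for the illustration.

```python
def usable_gb(level, disks, disk_gb):
    """Return usable capacity in GB for identical disks at a given RAID level."""
    if level == 1:
        # RAID 1 as mirrored pairs: half the disks hold copies.
        if disks < 2 or disks % 2:
            raise ValueError("RAID 1 pairs need an even number of disks")
        return (disks // 2) * disk_gb
    if level == 5:
        # One disk worth of space goes to redundancy.
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) * disk_gb
    if level == 6:
        # Two disks worth of space go to redundancy.
        if disks < 4:
            raise ValueError("RAID 6 needs at least four disks")
        return (disks - 2) * disk_gb
    raise ValueError("only RAID 1, 5, and 6 are covered in this sketch")


for level in (1, 5, 6):
    print("RAID", level, usable_gb(level, disks=4, disk_gb=500), "GB")
```

With four 500 GB disks this gives 1000 GB for two RAID 1 pairs, 1500 GB for RAID 5, and 1000 GB for RAID 6, matching the figures in the text.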

RAID 6 write performance will be slightly slower due to the extra disk but not noticeably slower than RAID 5. Read performance will be better because there is an extra disk to handle reads.

Think of a company adding a new server every month. Some disks have 3 year guarantees and will start to fail predictably at 3 years. Your servers might have 7 disks each. At the end of 3 years you have 36 servers and a total of 252 disks. Keeping track of all those disks is a problem. With RAID 5 you would look at cycling the older servers out of service, replacing the disks, then reusing the servers. With RAID 6 you can just pull out an old disk and plug in a new disk, safe with the knowledge that no matter what happens with the disk replacement, there is still an active spare to protect your data. Over a week, you can replace all the old disks without taking the server out of service.

RAID 10

RAID 10 is RAID 1 plus RAID 0 (which is not RAID!) in a combination that is difficult to recover. There is almost always a better combination than RAID 10, RAID 01, or RAID 0.

You might use RAID 10 if you have four small disks, want some redundancy from mirrored disks then want one large disk to handle video editing. You really have to work hard at testing recovery because there are two layers of RAID instead of one and there are many more combinations of failures.

RAID 01

RAID 01 is RAID 0 (which is not RAID!) plus RAID 1 in a combination that is difficult to recover. There is almost always a better combination than RAID 01 or RAID 10 or RAID 0. See the similar RAID 10 for comments about the pain created by combinations including RAID 0.

You might consider RAID 01 if you already have several disks in a RAID 0 array and want to mirror them for reliability. One broken disk will destroy the RAID 0 array. After replacement of the disk, you have to recreate the whole RAID 0 array from the mirror.

Hardware or Software?

Dedicated hardware RAID might be faster than software equivalents but it is not often the case. Good dedicated RAID hardware is expensive and if you are not going to use it to maximum capacity, hardware RAID is more expensive than adding software RAID to your existing computer.

Dedicated storage servers use standard computers with standard RAID software and alter the design only to add more disks, giving you improved speed from using more disks in parallel but then losing the speed through the connection back to your computer. Use dedicated storage servers for sheer size or for sharing the storage across several computers.

Dedicated RAID cards in your computer suffer from small processors and lack of memory. Modern computer processors are faster than the processors in hardware RAID devices. You have to pay a lot of money to get a dedicated RAID controller that is faster than using RAID software in your operating system. In a recent comparison, a fast hardware RAID card was $800 and contained a processor equivalent to about 10 percent of the speed of one core in the main computer processor. For about an extra $50, the computer could have a faster main processor with 10 percent extra speed in each of the four cores, giving a total processing power increase four times greater than the power of the hardware RAID card. Which would you choose: hardware RAID for $800 with the processing power only available for RAID, or $50 for four times as much extra processing power with all that extra power available for everything else when not used for RAID?
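The comparison above reduces to dollars per unit of extra processing power. Here is a rough Python sketch using only the figures quoted in the text, with power measured as a percentage of one main processor core. The variable names are invented for the illustration.

```python
# Figures from the comparison in the text.
card_cost = 800          # dollars for the hardware RAID card
card_power = 10          # percent of one core, usable only for RAID

upgrade_cost = 50        # dollars for the faster main processor
upgrade_power = 4 * 10   # 10 percent extra on each of four cores

# Dollars paid for each percent of one core gained.
card_price_per_percent = card_cost / card_power
upgrade_price_per_percent = upgrade_cost / upgrade_power

print(upgrade_power / card_power)                          # times more power
print(card_price_per_percent, upgrade_price_per_percent)   # dollars per percent
```

On these figures the processor upgrade delivers four times the power at a small fraction of the price per unit of power.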

Software RAID is standard, easy, and reliable in Microsoft NT and their versions of Windows based on NT but you have to use NTFS for maximum benefit, NTFS is only in the Pro versions of Windows, and RAID is only in their server versions of Windows. The server version of Windows used to be cheap and now it is expensive. There are RAID add ons for the non server versions of Windows and the add ons vary by a huge amount in ease of use, reliability, and how they operate when you try to recover.

Linux includes RAID for free. See Linux workstation disk RAID 1 for an example. RAID used to be a pain to build in Linux and now the installation programs are starting to make RAID easy. Recovery is still a random event because there is little documentation on recovery, a lot of the documentation is out of date, the documentation often tells you that you should have used a different RAID configuration, and most of the comments in support forums tell you you should have used hardware RAID, not software RAID.

Use RAID in Linux and practice recovery before you rely on RAID.

Reliability

RAID 1 and 5 are easy to implement and should be easy to recover. Restarting from a failure should be as simple as stopping the computer, removing the broken disk, and restarting the computer. After you buy the replacement disk, you should be able to stop the computer, insert the replacement disk, start the computer, then issue the instruction to rebuild the RAID array. If you have hot swapping, you should not have to stop the computer to remove or insert a disk. RAID 6 should be the same as RAID 5.

RAID 1, when properly implemented, leaves each disk unchanged from the way it is when used without RAID. You can switch RAID 1 off at any time and proceed with one disk. You can remove one disk, insert the disk in a new computer, then start the new computer as a clone of the old computer. I used that exact technique to clone hundreds of computers and to perform hundreds of upgrades to new computers.

Many RAID software and hardware suppliers insist on damaging the format of the original disk. When you remove one disk to start a new computer, you find the disk will not work by itself. It might work if you reproduce an exact hardware copy of all RAID related hardware and use exactly the same RAID related software. Clearly the creeps supplying the RAID are trying to lock you into buying their hardware and software.

To make matters worse, you quite often find the proprietary RAID hardware and software systems will not work if you upgrade the hardware or software at the same time. You might have to expose your credit card to the pirates on that big online auction service to buy hardware exactly the same as the hardware in your old computer. If you are lucky with the online auction, the hardware will work and it might be the same as the advertised model but it could have different incompatible firmware.

It does not matter what RAID you use, outside of the old NT software RAID, practice recovery before you need it. Test recovery of one lost disk and test recovery to a different machine.

Backup

Think about backing up some small disks to Blu-ray disks. Your main disk might be 200 gigabytes. Drop the unneeded temporary files and empty space to get perhaps 100 GB for your backup. Leave out all the operating system files, the 327 megabytes from OpenOffice, your clipart collection you never use, and you might be down to less than the 50 GB capacity of a dual layer Blu-ray disk. Backup is easy.

Suppose your computer currently has four disks, each of 200 GB, and each backed up to a Blu-ray disk. Take those four disks and join them together into one RAID array. You have more than 50 GB to back up. You need a special backup program to gather all the data together then split it up across several Blu-ray disks. The backup is one big one instead of several small ones. Restoring from backup is one long process instead of several small processes. If one backup disk fails, you have to throw out the whole backup and go to an older backup.
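A quick Python sketch shows how the backup changes shape. It uses the 50 GB dual layer capacity quoted above and assumes, purely for illustration, that each of the four disks trims down to 45 GB of data worth keeping. The function name discs_needed is made up.

```python
import math

DISC_GB = 50  # dual layer Blu-ray capacity quoted in the text


def discs_needed(backup_gb, disc_gb=DISC_GB):
    """Number of backup discs needed to hold one backup set."""
    return math.ceil(backup_gb / disc_gb)


per_disk_gb = 45  # hypothetical trimmed backup size for one 200 GB disk

# Four separate disks: four independent one-disc backups.
print(discs_needed(per_disk_gb))

# The same data in one RAID array: a single backup spanning several discs.
print(discs_needed(4 * per_disk_gb))
```

The number of discs is the same either way. The difference is that the array produces one backup set spanning four discs, so losing any one disc spoils the whole set instead of one quarter of your data.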

RAID could tip you over a limit from your current backup process to performing something more complicated and requiring different hardware. You need to plan the new approach and test recovery.

Of course you would have the same problem if you decided to ignore RAID and replace all your current small disks with one of the new 2 terabyte disks.

Disk size

RAID works best when all the disks are the same size and speed. You can use some versions of RAID to connect together disks of different sizes and that might be useful when adding extra disks to a relatively new computer. You might have a computer with four slots for disks and two slots occupied by 1 terabyte disks from a current range of disks. You are now editing video and need a lot of space. 1.5 terabyte disks are now at a good price because of the introduction of 2 terabyte disks. You might buy the 1.5 terabyte disks, use RAID 1 to mirror them, then add RAID 0 to create one big RAID 10 array.

If the disks have a significant size difference then they will have a significant speed difference and your RAID array will drag along at the speed of the slowest disk. Combining disks with significant speed differences is a real waste. If you have a lot of old disks to use and they are in two different speed groups, create two arrays, one for each speed. If the disks are in more than two speed groups, retire the oldest slowest disks. Those old slow disks are going to break soon so retire them before they embarrass themselves.
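The slowest disk rule is easy to see in a small Python sketch. The disk speeds are hypothetical numbers chosen for illustration, and treating the array as running at the speed of its slowest member is a simplification of what a real array does.

```python
def array_speed(disk_speeds):
    """Rough model: the array drags along at the speed of its slowest disk."""
    return min(disk_speeds)


fast_disks = [150, 150]   # hypothetical speeds in MB per second
slow_disks = [80, 80]

# One mixed array: the fast disks are held back by the slow ones.
print(array_speed(fast_disks + slow_disks))

# Two arrays, one for each speed group: the fast disks keep their speed.
print(array_speed(fast_disks), array_speed(slow_disks))
```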

Performance

RAID 5, and therefore RAID 6, create a performance overhead for calculating the spread of data over the different disks. No other RAID has significant CPU processing overheads.

The RAID 5 overhead works out at about 25 percent for the CPU component of disk write operations and a trivial amount during read operations to compare the data arriving from the different disks. The disk activity might be 4 percent of your overall CPU activity and increasing the activity by 25 percent still leaves the activity at only 5 percent of your total CPU usage. Clearly spending $800 to buy a dedicated RAID controller for a $399 PC is not a good investment.
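The overhead arithmetic can be written out in Python. The figures are the ones quoted above, and the variable names are invented for the illustration.

```python
disk_cpu_percent = 4         # disk activity as a share of total CPU usage
raid5_overhead_percent = 25  # extra CPU work RAID 5 adds to that disk activity

# Scale the disk share up by the RAID 5 overhead.
with_raid5 = disk_cpu_percent * (1 + raid5_overhead_percent / 100)

print(with_raid5)                     # share of total CPU with RAID 5
print(with_raid5 - disk_cpu_percent)  # extra share of total CPU
```

The extra cost is about one percent of total CPU, which is the whole prize a dedicated controller is competing for in this example.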

Rendering animation into movies is all CPU with very little disk processing but editing large videos can be mostly disk processing. A slight change in your computer usage might be enough to look at dedicated RAID hardware for RAID 5.

The most expensive part of reading and writing to RAID is handled by multibyte processing extensions in modern processors. Modern graphics processors provide faster versions of the same calculations. A small processor speed upgrade might be far more effective than a dedicated RAID hardware device.

There is little difference between hardware and software RAID in a modern computer. Using an old RAID controller in a modern computer is usually slower than using software RAID by itself.

RAID 5 is faster than all the other RAIDs but requires at least three disks. Small desktop computers and notebooks are often limited to two disks, limiting you to RAID 1. RAID 6 is more reliable than RAID 5 but in a small configuration, say four disks, there is little difference in speed and RAID 5 will provide more storage. In a server with a lot more disks, say 9 disks, you will not see a performance difference between RAID 5 and RAID 6, you will not see much difference in capacity, but you will see a real difference in recovery time if two disks break at the same time.

Think about the lots of disks example again. The server was probably set up with a collection of new disks that will all age at the same rate and break at similar times. A slight malfunction in the air conditioning or a voltage overrun can burn out multiple disks. When you have a large array in a server, the replacement of a single disk might take only a few minutes but rebuilding the data on that disk might take several hours in the background. During those several hours, any one of the other disks could break from age, heat stress, or continue to degrade from that voltage spike. Now think about the server down for a day while you recover several terabytes from those old DVD backups. Switching from RAID 5 to RAID 6 will improve the performance of a recovery from a two disk failure.

Conclusion

Choose RAID for reliability and easy recovery from the common single disk failure. You also get speed as a side benefit. RAID can make backup a huge task and recovery more complicated for anything other than a single disk failure. Practice all types of recovery before depending on RAID.