Wednesday, March 3, 2010

Taking a closer look: RAID-5. How does it measure up?

In my last post, I explored the Holy Grail of Home Network Data Storage.  Now, let's take a look at some specific solutions and see how they measure up.

First, I want to take a generic look at RAID, since most of the other solutions I will be discussing are descended from RAID.

What is RAID?

RAID is an acronym that stands for "Redundant Array of Inexpensive Disks".  The idea behind RAID is to use multiple hard drives to store data in a way that either increases speed, or increases reliability... or both at the same time.
RAID is actually more of a concept that includes many different sub-categories.  There are many different "RAID levels" such as RAID-0 which is designed purely for speed, RAID-1 which is designed purely for data protection, and RAID-5 which, with enough physical hard drives, combines both speed AND data protection.
For our discussion about the Holy Grail of storage, I will focus mainly on RAID-5 because RAID-0 is sorely lacking in the RELIABLE department, and RAID-1 fails to impress in the CHEAP factor.
RAID-5 uses a combination of "striping" and "parity" to increase both the speed and the reliability of data.

Understanding Striping

Striping is a way of spreading data across multiple drives to increase speed.  Here's one way to think of it:  Suppose you have a house full of furniture, and you need to move across town.  You have a truck that can hold a certain amount of furniture.  Assuming your packers and loaders are VERY fast, the speed limit in town really limits how fast you can move your stuff from one location to another.  You can greatly increase the speed at which you can move your stuff if you add more trucks!  If you have 3 trucks moving the furniture, the job gets done MUCH faster.
Hard drives are kinda similar to this idea.  The most limiting factor of a hard drive is the speed at which it can read data off the disk and pump that data through its controller to the network card, or wherever it's needed.  With today's hard drives, it really boils down to how tightly packed the data is on the surface of the disk, and how fast the disk is spinning.  When you use multiple physical hard drives and spread the data between them, you can get more data moved faster, just like using more trucks.
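If it helps to see the truck idea as code, here is a minimal Python sketch of striping.  The function names, the 4-byte chunk size and the sample data are all made up for illustration; a real controller works on much larger chunks and does this in hardware or in the driver.

```python
def stripe(data, num_drives, chunk_size=4):
    """Deal fixed-size chunks of data out to the drives, round-robin.

    chunk_size is a hypothetical value picked to keep the example readable.
    """
    drives = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        drives[chunk_index % num_drives].append(data[offset:offset + chunk_size])
    return drives


def unstripe(drives):
    """Read the chunks back in the same round-robin order they were dealt."""
    chunks = []
    rows = max(len(drive) for drive in drives)
    for row in range(rows):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


furniture = b"sofa,bed,desk,lamp,rug,"
trucks = stripe(furniture, num_drives=3)

# Each "truck" carries only part of the load...
for number, load in enumerate(trucks):
    print("drive", number, load)

# ...and putting the loads back together gives the original data.
assert unstripe(trucks) == furniture
```

Since each drive only holds about a third of the chunks, three drives can each fetch their share at the same time, which is where the speed comes from.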

Understanding Parity

Parity is a way of adding extra bits to existing data to create a way to rebuild missing bits.  It can be hard to wrap your brain around how it actually works, but I can make it simple.  Consider the following exercise.
Imagine we have 3 bytes of data, and each byte is stored on its own hard drive.  Bytes, as you know, are made up of 8 bits.
For each ROW, add up the bits.  If the result is an odd number, ADD 1 to the row.  If the result is an even number, add zero.  The goal is to make sure every row adds up to an even number.  For example, the first row is 0+0+0 which adds up to 0, which is an even number, so we write down a 0.  The fifth row is 1+1+1, which adds up to 3, an odd number.  We add a 1 to bring it up to 4, which is an even number.
Now, imagine one of the hard drives bites the dust.  We've just lost all 8 of the bits on that hard drive.  But, with Parity, we can "REBUILD" the missing data.
Now, imagine adding a new empty hard drive, and doing the same exercise again - for each row, add up the bits and either add 1 or add nothing to make sure each row adds up to an even number.
As you can see from this simple exercise, we can very easily determine the missing data by using the parity along with the remaining data to figure out what was missing.  It works the same way in a computer, except that the CPU uses the XOR operator, which does the "add up the bits and check for odd or even" step for a whole row in one go.
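Here is the same exercise as a small Python sketch.  The three byte values are made up for illustration (they are not the ones from my worksheet), and `xor_blocks` is just a helper name I picked.  The same function is used both to create the parity and to rebuild a lost drive, which is the neat part.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR the bytes at each position across all of the given blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three data drives, one byte each (hypothetical example values).
drive_1 = bytes([0b00001010])
drive_2 = bytes([0b00001100])
drive_3 = bytes([0b00001111])

# The parity drive makes every bit position add up to an even number.
parity = xor_blocks([drive_1, drive_2, drive_3])
print(format(parity[0], "08b"))   # 00001001

# Drive 2 bites the dust.  XOR everything that is left to get it back.
rebuilt = xor_blocks([drive_1, drive_3, parity])
print(format(rebuilt[0], "08b"))  # 00001100
assert rebuilt == drive_2
```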

RAID-5 - how does it measure up?

Ok, now let's take a look at using RAID-5 as a storage solution for a home network.

Expandable

RAID-5 is NOT very good at being expandable.  The problem with RAID-5 is that while you can use many hard drives, they must all be the same SIZE, and they must all be together from the start.  If you have hard drives of multiple sizes, the array can only use as much space on each drive as the smallest drive.  Any extra space on the other drives is wasted.  Also, you cannot add another hard drive to a RAID-5 array without rebuilding the array.  This usually means you need to take all of the data out of the array (backup), then rebuild it from scratch including the new drive, then move all the data back into the array.
There are certain hardware RAID controller cards that actually can handle adding drives to an array, but these tend to cost a lot more, which goes against the CHEAP factor, plus it's often quite a complex task to actually make the change since much of it must be done by hand.

Reliable

RAID-5 is very good at being reliable.  In an array with multiple drives, you can lose one drive completely, and still not lose any of your data.  You can add a replacement drive back into the array, and the controller hardware or software can rebuild the missing data using the parity.  As a bonus, if you need to access your data while a drive is missing from the array, the controller can perform the parity calculations on-the-fly to create the missing data and serve it to you, even without it existing on a physical drive.
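To picture what "on-the-fly" means, here is a toy Python model of an array that keeps answering reads with one drive missing.  Everything in it (the `TinyArray` class, its method names, the sample blocks) is hypothetical, and it keeps all of the parity on one dedicated drive to stay short, which is a simplification of how a real RAID-5 controller lays things out.

```python
from functools import reduce
from operator import xor


class TinyArray:
    """Toy array: one block per data drive, plus one parity block."""

    def __init__(self, data_blocks):
        parity = bytes(reduce(xor, col) for col in zip(*data_blocks))
        self.drives = list(data_blocks) + [parity]

    def fail(self, index):
        """Simulate a dead drive by forgetting its contents."""
        self.drives[index] = None

    def read(self, index):
        """Return a drive's block, recomputing it if the drive is gone."""
        block = self.drives[index]
        if block is not None:
            return block
        survivors = [d for d in self.drives if d is not None]
        if len(survivors) < len(self.drives) - 1:
            raise IOError("more than one drive missing: data is lost")
        # XOR of every surviving block equals the missing block.
        return bytes(reduce(xor, col) for col in zip(*survivors))


array = TinyArray([b"home", b"data", b"safe"])
array.fail(1)
print(array.read(1))  # b'data', served without the drive that held it
```

Nothing gets written back in this sketch; the missing block is recalculated every time it is read, which is why an array with a dead drive still works but has more work to do per read.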

Fast

RAID-5 tends to be pretty good at being fast.  At least, the potential is there.  It really depends on the speed of the controller that is managing the hard drives.  If the controller has dedicated hardware circuitry to handle the calculations for parity, then a RAID-5 system can be incredibly fast, and the drives working together can easily be faster than any single drive working alone - while still being safe.
 
Ok, let's stop for a moment.  RAID-5 isn't really a "system", it is a concept that would be used in building a system.  If we want to use RAID-5 to provide data storage on a home network, we would still have to use it as a feature on a server, or a stand-alone Network Attached Storage device.  The choice we make here determines how the next several factors play out for a network storage device.
 

Secure

RAID-5 itself has no security.  It's the operating system of the machine running the RAID array that determines the security.  You could build a Windows Server 2008 machine with some AWESOME security, and control access to every single file on the system.  Or, you could buy a cheap NAS enclosure with really badly implemented (or no) security.

Low Power

Again, how much power a system uses depends on the whole of its parts, but thinking specifically of just the RAID-5 array, one important thing to remember is that when you access data on the array, ALL of the hard drives in the array must be spinning.  In fact, I would not expect the hard drives of a RAID-5 array to ever spin down.  There is potential for a low-power system if you select the right drives and other hardware.

Quiet

Likewise, how quiet a system is depends on the whole of its parts, but taking into account the number of hard drives, and the fact that they must spin continuously, you have to consider fans to keep them cool.  The potential is there for a quiet system, depending on the case and the fans.

Cheap

RAID-5 has great potential here, with one catch.  You see, the problem with RAID-5 is that it requires at least 3 hard drives just to bring it into existence, and if you want to eventually have more drives, you really should start with those from the beginning.  So, there is an initial start-up cost to deal with.
The Cheap factor also includes Efficiency, and this is where RAID-5 shines.  You can get full data protection from any 1 hard drive crash for the cost of one of the drives.  Every drive beyond that one adds its full capacity as usable space.  If you have 3 500 GB drives, you lose 500 GB to safety, and can use 1000 GB.  If you add another 500 GB drive (at the start), you get 1500 GB of usable space.  As you consider starting with even more drives, the cost per usable gig of storage goes down.
There are a few other factors to consider - again, the cost of the system depends on the whole of its parts.  And to do RAID-5 really well, you should get a dedicated RAID controller card to connect the drives to the system.  Those pretty much start at roughly $300 from what I've seen.
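The space math is simple enough to put in a few lines of Python.  This is only a sketch of the rule of thumb described here (one drive's worth of space goes to parity, and every drive only counts as much as the smallest one); the function name is mine, and a real controller will report slightly less because of formatting overhead.

```python
def raid5_usable_gb(drive_sizes_gb):
    """Usable space in a RAID-5 array built from the given drive sizes."""
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    smallest = min(drive_sizes_gb)  # larger drives are cut down to this size
    return (len(drive_sizes_gb) - 1) * smallest  # one drive's worth is parity


for drives in ([500, 500, 500], [500, 500, 500, 500], [500, 500, 1000]):
    usable = raid5_usable_gb(drives)
    share = usable / sum(drives)
    print(drives, "->", usable, "GB usable,", format(share, ".0%"), "of what you paid for")
```

The last case is the mismatched-drive situation: the 1000 GB drive only contributes 500 GB, so half of it is wasted.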

Simple

Here's where it all falls apart.  You see, if we consider RAID-5 as a stand-alone concept, then we must assume that you know what to do with it.  You will need to build a system to house it, or you'll need to go buy a box of some kind to build it in.  I'm not talking about a turn-key system; I will explore those in detail in my next few posts.
A well-built RAID-5 system would be simple to maintain, but very complex to build initially.  My wife could not build one, but if the machine had hot-swap bays, she could easily swap out a dead hard drive.  Then again, that all boils down to - how much are we willing to spend on it to start with?
Bottom line here is this: I can’t just turn to my wife and say “RAID-5. GO!” and expect her to know WTF I’m even talking about.

Other Thoughts

As with all of the "systems" I'm planning to discuss, there will always be some additional considerations to think about.  These are the little gotchas that you need to consider outside of the other factors mentioned above.
First, I will take the opportunity again to point out that RAID is NOT the same as backup.  If you have a healthy RAID array, and you (or someone you love (or a virus!)) accidentally delete a file, there is no protection from that at all.  The RAID array will happily delete the file, just exactly like a single ordinary hard drive would.  If you overwrite a file, it's gone.  BACKUP YOUR DATA.
If you lose one hard drive from an array, the array is then unprotected until you replace the missing drive and let it rebuild.  During this time, if you were to lose another hard drive, you would lose all of the data on the array.  The fact that RAID-5 continues to run with a failed or missing hard drive increases the temptation to USE the system in that state, which in turn increases the risk that something could go wrong during that time.  If you want ultimate safety, you would shut the system down when it loses one drive.
In a RAID-5 system, you really need the whole system up and running to access the data.  If you had to take the system apart, you would not be able to get anything out of the drives by hand because the files are stretched across all of the drives, and only the chipset of the RAID controller knows exactly how to reconstruct the data.  If you use a rare RAID controller card, and it dies instead of one of the hard drives, you may lose all of your data if you can't replace the controller card.  One might consider deliberately using a controller with a common chipset in case replacement is ever necessary.
 

Coming up next:  Drobo!
