ATA disks are cheaper than SCSI disks. RAID cards are now available for ATA disks. An ATA RAID package is half the price of the equivalent SCSI RAID package. Is the ATA performance as good?
The disk manufacturers release their latest high speed disk drives as SCSI drives first because they can charge a higher price for the SCSI version. The higher price means big profits. Take a $500 ATA disk, add a $20 SCSI chip and you have a $1,000 SCSI drive.
A top of the line ATA disk might cost $300 to build with a $10 ATA controller chip. Add $3 to cover replacing faulty drives under warranty, $10 for superb packaging, $5 for television advertising, and endless other little items, then tax. You now have a $500 drive that supplies the manufacturer a small profit per drive and sells by the million.
When you buy the same speed drive as SCSI, the only differences are the slightly more expensive controller chip and the smaller volume of sales. The small sales volume means the manufacturer has to make $100 profit per drive instead of $5.
Now look at the latest high performance drives. The manufacturer might spend $10 million inventing a minor speed improvement. The same manufacturer might then spend $100 million upgrading a factory to produce the higher speed drives. All that cost has to be recovered in the first year of sales and usually from the SCSI drives sold for servers. The manufacturer adds $500 per drive and hopes to sell 200,000 before their competition releases a similar drive.
If you want the absolute maximum speed in a single drive, you have to buy SCSI drives and put up with the price. RAID lets you bypass the price by sucking the same performance out of cheaper drives.
What Do You Do With Your Disks?
When you edit video streams you need one fast input stream and a second fast stream for output. You could use two SCSI disks and a SCSI controller card or you could buy a six channel ATA RAID card, six ATA disks, and configure the disks as two RAID 5 arrays. The ATA RAID configuration will be about the same speed as the SCSI hardware and have three times the capacity.
How do you get three times the space? The largest easy to buy SCSI disk stores 73 Gigabytes. The equivalent ATA disk is available as 80 GB. For very little extra money you can buy a 120 GB ATA drive. 160 GB and 250 GB ATA drives are available but are still at a premium price so this comparison will use the 120 GB.
Three 120 GB disks set up as RAID 5 will give you 240 GB of storage. Compare that to one 73 GB SCSI disk for the same price.
When you look at sheer speed, the fastest SCSI disks rotate at 15,000 rpm while the common fast ATA disks rotate at about half that speed, 7,200 rpm. When you put together three ATA disks in RAID 5, you get the equivalent of twice the rotation speed. Three 7,200 rpm ATA drives become the equivalent of a single 15,000 rpm drive.
Pushing six disks in to your computer will overload the space and cooling unless you buy a full height or double width case with extra fans. The six drives will also overload standard power supplies so you need to jump up from a 300 watt power supply to perhaps 500 watts. Drives use extra power when starting up so choose a power supply with plenty of power.
Open magazine published a comparison of ATA and SCSI then drew what I consider to be flawed conclusions. The magazine used different configurations for ATA and SCSI. They mixed operating systems, file systems, and all sorts of other things that make their comparisons of little use. Overall they praised ATA RAID as cost effective despite the differences. What they did was mix up a few concepts that will make the article useless for most people buying disks.
Most magazine articles get the topic of ATA RAID wrong because the authors have little experience with the full range of possibilities presented by ATA RAID. Your approach to ATA RAID is different to SCSI RAID just as the best way to use RAID is different to the best way of using the disks available before RAID.
The R in RAID stands for redundant. RAID 0 has no redundancy so should not be called RAID. RAID 1 has redundant disks, works with only 2 disks, but wastes half the disk space on redundancy. RAID 5 works with three or more disks and wastes one disk on redundancy. If you want your computer to keep going when one disk fails, you need RAID 1 or higher.
The A in RAID stands for Array. RAID controllers present the disks as arrays to your computer. Usually the array emulates a SCSI disk. A RAID array controller presenting two lots of three disks would make each of the two arrays look like one SCSI disk. Low cost RAID controllers let you define one or two arrays and that is all you need. Low cost RAID controllers also make you place each disk in a single array while expensive RAID controllers let you split disks across more than one array. I can tell you from experience, using one disk in multiple arrays makes recovery a nightmare.
The expensive controllers let you mix disks of different sizes and use every GB on every disk. The first time you have to rebuild a system, you will kick yourself for using a disk in more than one array just to get a few extra GB. A disk failure means rebuilding an array and your whole system slows down or stops for the whole rebuild. If the broken disk is part of all your arrays, you have to rebuild all your arrays.
Your set of six disks in a single 600 GB array means rebuilding a 600 GB array. If you split the six disks in to two arrays of three disks, your rebuild will have to process only 240 GB of data to rebuild an array after a single disk failure. Rebuild time is a good reason to use several small arrays.
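The rebuild comparison above can be put into rough numbers. The sketch below is only an illustration: it uses the 120 GB disks from the earlier example, measures the rebuild by the size of the array as this article does, and assumes the rebuild runs at a steady 40 MBps. That rate is my assumption for the sake of the arithmetic, as is counting 1 GB as 1000 MB. A real controller that is also serving normal requests will probably be slower.

```python
# Sketch only: how much array has to be rebuilt after a single disk failure.
DISK_GB = 120            # disk size used in the article's example
ASSUMED_MB_PER_S = 40    # assumption: rebuild runs at one disk's sustained rate


def raid5_usable_gb(disks, disk_gb=DISK_GB):
    """Usable space of a RAID 5 array: one disk's worth goes to redundancy."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_gb


def rebuild_minutes(array_gb, mb_per_s=ASSUMED_MB_PER_S):
    """Rough rebuild time in minutes, counting 1 GB as 1000 MB."""
    return array_gb * 1000 / mb_per_s / 60


if __name__ == "__main__":
    layouts = (("one array of six disks", 6),
               ("one of two three-disk arrays", 3))
    for label, disks in layouts:
        size = raid5_usable_gb(disks)
        print(f"{label}: rebuild {size} GB, "
              f"roughly {rebuild_minutes(size):.0f} minutes")
```

Whatever rate you plug in, the ratio stays the same: the single big array takes two and a half times as long to rebuild as one of the small arrays.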
Another reason for small arrays is the cost of a backup tape drive. Look at ways to back up 600 GB then look at ways to back up 240 GB. Think about the time it takes to restore data from tape to disk, something which is often several times slower than creating the backup tape. A single huge array can be a real disadvantage.
The I in RAID stands for inexpensive. There is no point using RAID if you then have to pay top dollar for the disks. Look for RAID with several cheap ATA disks instead of a small number of premium price SCSI disks.
The D in RAID stands for Disks. RAID needs more than one disk to work. Notebook computers have just one disk which means you cannot run RAID. There are a very small number of heavy notebooks where you can replace the CD drive with a second disk or a second battery. That lets you use RAID but the second disk uses up so much power, you will need the second battery. Where does the second battery go if the second slot contains the second disk? I hope future notebook computers will get better batteries and two disks so we can use RAID on the road.
Use Same Size Disks
RAID creates an array based on the size of the smallest disk. Make all the disks the same size. If you mix an 80 GB disk with a 120 GB disk, RAID will work as if you have two disks each of 80 GB.
A previous example used three disks in a RAID 5 array. Three 120 GB disks make a RAID 5 array of 240 GB. When you have six 120 GB disks, you can set them up as two arrays of 240 GB or one giant RAID 5 array of 600 GB. You get 600 GB in a single array because only one of the six disks is wasted on redundancy.
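The capacity rules above fit in a few lines of code. This is a minimal sketch, not the behaviour of any particular controller: the function name usable_gb is made up for the illustration, and it covers only the two-disk RAID 1 and the RAID 5 cases described in this article.

```python
# Sketch only: usable space when every disk counts as the smallest disk.
def usable_gb(level, disk_sizes_gb):
    """Usable array size in GB for RAID 1 (two disks) or RAID 5 (three or more)."""
    smallest = min(disk_sizes_gb)   # the array is built on the smallest disk
    count = len(disk_sizes_gb)
    if level == 1:
        if count != 2:
            raise ValueError("this sketch treats RAID 1 as a two-disk mirror")
        return smallest                 # half the raw space goes to the mirror
    if level == 5:
        if count < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (count - 1) * smallest   # one disk's worth goes to redundancy
    raise ValueError("only RAID 1 and RAID 5 are covered here")


if __name__ == "__main__":
    print(usable_gb(1, [80, 120]))         # mixed sizes act like two 80 GB disks
    print(usable_gb(5, [120, 120, 120]))   # three disks
    print(usable_gb(5, [120] * 6))         # six disks in one array
```

The mixed pair shows why same size disks matter: 40 GB of the larger disk is simply not used.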
Dozens of Considerations
There are dozens of other considerations when using RAID. They apply to both ATA RAID and SCSI RAID. I will expand on this article from time to time so the technically minded can explore the subject further. Send me your questions using the form below.
One question not answered in the Open magazine article was the speed of the RAID array controllers used in their tests. The controller is a computer sitting between you and your data. If the controller is too slow, the controller will limit your access to your data. Few RAID array manufacturers supply the information needed to decide if their product is fast enough for a specific configuration. I invite manufacturers to send me RAID controllers so I can test them and recommend them for the most appropriate configurations.
RAID 1 uses disk mirroring, a process that requires little processing power. I am typing this page on a computer using the world's cheapest Promise brand ATA RAID controller. The two disks are configured as RAID 1 and they provide read access exactly twice as fast as a single disk. The RAID controller does not limit their speed.
The Open magazine article reported slow ATA RAID response when used in RAID 5. RAID 5 requires extra computing power. Their slow RAID 5 result might have been caused by the RAID controller failing to perform the calculations as fast as the disks could supply data. My cheap RAID controller does not have to perform the calculations so would perform at the same speed as the most expensive RAID controller if both were set up with the same disks and configured for RAID 1. The most expensive RAID controller might be faster than my cheap RAID controller if the same disks were used in a RAID 5 array.
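To give a feel for the calculations involved, RAID 5 redundancy is commonly computed as a byte by byte XOR across the data blocks in a stripe. The sketch below is a general illustration of that idea, not the code of any controller mentioned here, and the block contents are made-up sample data. Mirroring needs no such arithmetic because the second disk just receives a copy.

```python
# Sketch only: XOR parity of the kind RAID 5 relies on.
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of several equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical data blocks, one per data disk in a stripe.
data = [b"disk", b"ATA!", b"RAID"]

# The controller has to compute this for every stripe it writes.
parity = xor_blocks(data)

# If the disk holding data[1] fails, XOR of the survivors and the parity
# gives the lost block back.
rebuilt = xor_blocks([data[0], data[2], parity])

if __name__ == "__main__":
    print(rebuilt == data[1])
```

Every write means touching parity as well as data, which is the extra work that a slow controller might struggle to keep up with.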
Modern disks supply data at around 40 megabytes per second (MBps). Your computer's PCI interface accepts data at 133 MBps. If you put together four disks in a single RAID 5 array, your computer would receive data at 120 MBps, near the 133 MBps limit. Five disks in a RAID 5 array would supply data at 160 MBps, which is a waste in normal computers. Our previous example of six disks in one big array would try to push 200 MBps through the 133 MBps PCI bottleneck. What do you do when you want to go faster?
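The arithmetic behind those figures is simple enough to sketch. The code below assumes, as the numbers above do, that a RAID 5 array streams at 40 MBps for every disk except one, and that the bus delivers whichever is smaller, the array rate or the bus limit. The function names are made up for the illustration.

```python
# Sketch only: RAID 5 streaming rate against the PCI limit.
DISK_MB_PER_S = 40    # per-disk rate quoted in the article
PCI_MB_PER_S = 133    # standard PCI limit quoted in the article


def raid5_stream_mb_per_s(disks, disk_rate=DISK_MB_PER_S):
    """Rate the array could supply: every disk but one carries data."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_rate


def delivered_mb_per_s(disks, bus_limit=PCI_MB_PER_S):
    """Rate that actually reaches the computer through the bus."""
    return min(raid5_stream_mb_per_s(disks), bus_limit)


if __name__ == "__main__":
    for disks in (4, 5, 6):
        supplied = raid5_stream_mb_per_s(disks)
        delivered = delivered_mb_per_s(disks)
        print(f"{disks} disks: array {supplied} MBps, "
              f"delivered {delivered} MBps, wasted {supplied - delivered} MBps")
```

Passing a larger bus_limit shows how much of the wasted rate a faster bus would recover.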
Some computers are built on a motherboard with PCI 64. PCI 64 is twice as fast as the standard PCI. Use a PCI 64 RAID controller in a PCI 64 motherboard if you want maximum performance.
Some computers have motherboards with two sets of PCI. You can then use two RAID controllers with two arrays of four or five disks each to get maximum speed.
Disks and RAID controllers have memory to store data from disk during the quiet times so you will have faster access during busy times. When you request the first block of data from disk, the disk can deliver the first block of data then read the second block of data in to its RAM cache before you ask for the second block. You will get faster performance from disks if they have a couple of Megabytes of RAM installed but do not pay a premium price for large disk caches. Save the money and spend the money on extra main memory for your computer. The best place for memory is near your processor, not the disk.
RAID controllers with 64 MB of RAM are great for accessing disk arrays of 60 GB. The optimal memory for a RAID controller is between 1% and 0.1% of the array size. Memory is better utilised if taken out of the RAID controller and placed next to the computer's central processor. In practice, a desktop workstation has just one RAID controller and the memory in the RAID controller performs the same work as the main memory on the motherboard. In servers with several RAID controllers, buy the minimum memory for the RAID controllers and spend the money on the main processor memory.
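The 0.1% to 1% rule of thumb above is easy to apply to your own array. The sketch below is just that rule turned into arithmetic, counting 1 GB as 1000 MB, which is my simplification. The function name is made up for the illustration.

```python
# Sketch only: the article's 0.1% to 1% rule of thumb for controller memory.
def controller_cache_range_mb(array_gb):
    """Return (low, high) controller memory in MB for an array of array_gb GB."""
    array_mb = array_gb * 1000   # simplification: 1 GB counted as 1000 MB
    low = array_mb // 1000       # 0.1% of the array size
    high = array_mb // 100       # 1% of the array size
    return low, high


if __name__ == "__main__":
    for array_gb in (60, 240, 600):
        low, high = controller_cache_range_mb(array_gb)
        print(f"{array_gb} GB array: {low} MB to {high} MB of controller memory")
```

For the 60 GB array the low end comes out at 60 MB, which matches the 64 MB controller mentioned above.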
Modern disks and RAID controllers transfer data at 100 MBps or 133 MBps across the ATA cable from the disk to the controller. You need a special cable to get that speed so make sure your new RAID controller comes with matching cables.
You will only get that speed if there is one disk per cable. My cheap ATA RAID card suggested using four disks connected by two cables to the one controller. I placed two disks across two cables for maximum speed. Why worry about two disks on one cable if the disk can only supply 40 MBps and the cable can transfer 100 MBps? The disk can supply 40 MBps when continuously transferring large files. If your disk has a 2 MB cache, then a large file is anything larger than 2 MB. When you look at smaller files, the disk can preread the file in to the disk cache at 40 MBps then transfer from the cache to the controller at 100 MBps, a speed that would swamp a cable if there were two disks on the same cable.
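The cable argument comes down to comparing what the disks can ask for with what the cable can carry. The sketch below uses the 40 MBps and 100 MBps figures from this article and assumes the worst case, where every disk on the cable transfers at the same moment. The function names are made up for the illustration.

```python
# Sketch only: does a shared ATA cable keep up with the disks on it?
CABLE_MB_PER_S = 100       # ATA 100 cable rate quoted in the article
SUSTAINED_MB_PER_S = 40    # disk reading large files from the platters
BURST_MB_PER_S = 100       # disk sending a small file from its cache


def cable_demand(disks_on_cable, per_disk_mb_per_s):
    """Worst case demand: every disk on the cable transfers at once."""
    return disks_on_cable * per_disk_mb_per_s


def is_swamped(disks_on_cable, per_disk_mb_per_s, cable_mb_per_s=CABLE_MB_PER_S):
    """True when the disks ask for more than the cable can carry."""
    return cable_demand(disks_on_cable, per_disk_mb_per_s) > cable_mb_per_s


if __name__ == "__main__":
    for disks in (1, 2):
        for label, rate in (("sustained", SUSTAINED_MB_PER_S),
                            ("cache burst", BURST_MB_PER_S)):
            print(f"{disks} disk(s), {label}: "
                  f"demand {cable_demand(disks, rate)} MBps, "
                  f"swamped: {is_swamped(disks, rate)}")
```

Two disks streaming large files fit on one cable. Two disks bursting from cache do not, and that is the case one disk per cable protects against.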
Disk cache tends to work best for frequently read small files when the disk is the only disk on the cable. A good operating system will cache frequently read small files in memory so the effect of disk cache is reduced. When you limit your RAID controller to one disk per cable, you might get the benefit only a few percent of the time but that few percent will be when your computer is most busy.
The advantage of one disk per cable is so great for busy computers that the next generation of ATA cable allows only one disk per cable. The current ATA 100 and ATA 133 wide flat cables are replaced by SATA (Serial ATA). SATA cables look similar to USB cables and work in a similar way to USB 2.0. The SATA cables work over a short distance at 150 MBps. The next generation of SATA will work at 300 MBps. SATA is designed for connections inside a case.
Among other advantages, the thin SATA cable will not block air flow. Current ATA and SCSI cables block air flow within computer cases and the result is often a disk which fails due to heat. SATA will reduce the air circulation problem.
If you have two disks in a RAID 1 array, heat is probably not a problem. If you have six disks on six cables then heat is a problem and the six cables will block air circulation. Adopt SATA quickly so we can scrap the ancient cables blocking the cool air within our computers.
There is one other factor you might consider when looking at RAID. Some operating systems have RAID software built in. NT Server and Linux have RAID available. Solaris has RAID available as an optional extra. RAID software can use up to 30% of your processing power. Which is cheaper for you, buying a cheap RAID controller card or buying a processor that is 30% faster?
Usually you cannot upgrade an old processor without buying a new motherboard. If you are buying a new motherboard, you can often get the motherboard with built in hardware RAID for a few extra dollars. Take the hardware solution. If you do not like the hardware RAID, you can always turn it off and use a software version.
Note that some hardware RAID chips still perform a large slice of the work in the software driver, which means you are still using up your main processor. There are also hardware assisted RAID controllers where most of the work is performed in the controller and only a small amount remains in your main processor.
What Do You Do When the RAID Card Fails?
One more item you need to check. RAID controller cards do fail. Can you access your disk without the controller? When I remove my cheap RAID controller card, I can plug either of the two disks in as a working single disk. I can do that because I use RAID 1 and the RAID controller maintains a one to one relationship between the physical disk and the array image loaded on to the disk. You cannot remove the RAID card with RAID 5 as the underlying disks are useless by themselves. With RAID 1 and a good RAID controller, you can remove the RAID and continue working. With RAID 5, you always need a RAID card.
There are some RAID cards where the individual disks from a RAID 1 array will not work by themselves. I have had this problem with Intel and Silicon Image RAID but not with Promise.
The average desktop PC is configured for four disks and CD drives. Low cost RAID cards have two ATA cables, two SATA cables, or four SATA cables which limits your configuration options. SCSI lets you use many more disks but your PC's power supply will not support that many disks. ATA and SATA disks are so much larger than SCSI disks that a few SATA disks give you a disk array far larger than you could build in the same computer if you used SCSI.
There is no RAID for one disk so back up frequently in case the disk fails.
With two disks, use RAID 1 for reliability and reads that are twice as fast. Disks are so cheap that consuming a whole disk for reliability is cheap compared to the stress of recovering from a failure.
You could place three disks in a single RAID 5 array for speed and for reliability but you would have to back up the whole array. You could also place two disks in a RAID 1 array and use the third disk as work space. Place only temporary files on the third disk. The RAID 1 approach is more flexible but you lose speed on the work files. The RAID 5 approach is faster overall and you could divide the array up using partitions so that you have to back up only a smaller space.
With four disks, use two disks in RAID 1 for reliability and reads that are twice as fast. Use the other two disks in a separate RAID 1 array for fast work files. You then only have to back up the first array. Placing all four disks in a single RAID 5 array is easier and slightly faster but you have to back up everything.
I use four disks and have lots of reference material that I could reload from CD or download from the Internet. I placed all the reference material on one disk that does not have to be backed up. I also have one disk reserved for work files including collections of files assembled for placement on DVDs. That leaves just two disks in a RAID 1 array and most of that array is empty.
Five disks is the largest number of disks that start up reliably in the average medium size PC. Modern disks use less power but modern processors use more power which means you could need a larger power supply with this many disks. Use two disks in a RAID 1 array for the system disk. Use three disks in a RAID 5 array for all your data and work files.
Hardware RAID is better than software RAID and is extremely cheap when you use ATA disks. When SATA switches to 300 MBps, all RAID will use SATA disks.