2011/11/10

Surprises reading up on RAID and Disk Storage

Researching the history of Disk Storage and RAID since Patterson et al's 1988 paper has given me some surprises. Strictly personal viewpoint, YMMV.
  1. Aerodynamic drag of (disk) platters is ∝ ω³ r⁵  (RPM^3 * radius^5)
    • If you double the RPM of a drive, spindle drive power consumption rises eight-fold (2³). All that power is put into moving the air, which, in a closed system, heats it.
      Ergo, 15K drives run hot!
    • If you halve the platter radius, spindle drive power consumption drops 32-fold (2⁵). This is why 2½ inch drives use under 5W (and can be powered by the USB bus).
    There's a 2003 paper by Gurumurthi et al on using this effect to dynamically vary drive RPM and save power. The same author in 2008 suggests disks would benefit from 2 or 4 sets of heads/actuators: either to increase streaming rate and reduce seek time, or to reduce RPM while maintaining seek times.

    The Dynamic RPM paper seems to be the genesis of the current lines of "Green" drives. Western Digital quote the RPM of these as "IntelliPower", but class them as 7,200RPM drives. Access time just got harder to predict.

    This is also the reason that 2.5" and 3.5" 15K drives use the same size platters (see the sketch below).
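    A minimal sketch of the scaling (Python; the drive parameters are illustrative values of my choosing), using the corrected relation from the comments below, P ∝ N × RPM³ × r⁵:

        # Relative spindle power: P ~ N * rpm^3 * r^5.
        # Constants of proportionality divide out when comparing drives.
        def rel_power(platters, rpm, diameter_inches):
            r = diameter_inches / 2.0
            return platters * rpm**3 * r**5

        # Same 15K spindle speed, 3.5" platters vs 2.5" platters:
        print(rel_power(1, 15000, 3.5) / rel_power(1, 15000, 2.5))  # ~5.4x the power
        # Double the RPM at fixed geometry:
        print(rel_power(1, 15000, 2.5) / rel_power(1, 7500, 2.5))   # 8x (2^3)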

  2. In the 1988 RAID paper, Patterson et al compare 3 different drives and invent the term "SLED" - Single Large Expensive Disk - to describe the IBM mainframe drives of the time.
    Name                 Capacity  Drive Size                    Mb per Rack Unit  Platter Size  Power  Specific Power
    IBM 3380             7.5Gb     whole cabinet                 180Mb/RU          14"           6.6kW  0.9 W/Mb
    Fujitsu Super Eagle  600Mb     6RU, 610mm deep               60Mb/RU           10.5"         600W   1.0 W/Mb
    Conner CP-3100       100Mb     4in x 1.63in, 150-250mm deep  350Mb/RU          3.5"          6-10W  0.1 W/Mb

    Two smaller surprises: all these drives had 30,000-50,000 hour MTBF ratings, and the two non-SLED drives were both SCSI, capable of 7 devices per bus.
    8 or 9 3.5" drives could be fitted vertically in 3RU, or horizontally, 4 per RU.
    Because of the SCSI bus 7-device limit, and the need for 'check disks' in RAID, a natural organisation would be 7-active+1-spare in 2RU.
  3. 2½ inch drives aren't all the same thickness! The standard footprint is ~70mm × 100mm, with thicknesses from 7mm to 15mm:
    • 9.5mm thick drives are currently 'standard' for laptops (2 platters)
    • 7mm drives, single platter, are used by many netbooks.
    • 7mm is also a common form-factor for SSDs
    • "Enterprise" drives can be 12.5mm (½ inch) or 15mm (more common)
    • Upshot is, drives ain't drives. You probably can't put a high-spec Enterprise drive into your laptop.
  4. IBM invented the disk drive (RAMAC) in 1956: 50 platters of 100Kb each (= 5Mb), with a single head assembly moved to the selected platter.
    IBM introduced its last SLED line, the 3390, in 1989. The last version, the "Model 9" at 34Gb, was introduced in 1993. The last production date is not listed by IBM.
    IBM introduced the 9341/9345 Disk Array, a 3390 "compatible", in 1991.
    When Chen, Patterson et al published their follow-up RAID paper in 1994, they'd already spawned a whole industry and caused the demise of the SLED.
    IBM sold its Disk Storage division to Hitachi in 2003 after creating the field and leading it for 4 decades.
  5. RAID-6 was initially named "RAID P+Q" in the 1994 Chen, Patterson et al paper.
    The two parity blocks must be calculated differently to support any two drive failures, they aren't simply two copies of "XOR".
    Coming up with alternate parity schemes, the 'Q', is tricky - they can be computationally intensive.
    Meaning RAID-6 is not only the slowest type of RAID because of extra disk accesses (notionally, 3 physical writes per logical-block update), but it also consumes the most CPU resource; a sketch of one 'Q' scheme follows below.
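    For the curious, here's a minimal sketch of one common 'Q' scheme: parity over the Galois field GF(2⁸) with generator 2, the approach described in H. Peter Anvin's "The mathematics of RAID-6" and used by the Linux md driver. The function names and structure here are mine, purely illustrative:

        def gf_mul(a, b):
            # Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
            p = 0
            for _ in range(8):
                if b & 1:
                    p ^= a
                b >>= 1
                carry = a & 0x80
                a = (a << 1) & 0xFF
                if carry:
                    a ^= 0x1D
            return p

        def pq_parity(data_blocks):
            # P is plain XOR; Q weights block i by 2^i in GF(2^8), so any
            # two lost blocks give two independent equations to solve.
            length = len(data_blocks[0])
            P, Q = bytearray(length), bytearray(length)
            for i, block in enumerate(data_blocks):
                weight = 1
                for _ in range(i):
                    weight = gf_mul(weight, 2)
                for j in range(length):
                    P[j] ^= block[j]
                    Q[j] ^= gf_mul(weight, block[j])
            return P, Q

    The per-byte gf_mul in the inner loop is exactly the extra CPU cost referred to above (real implementations use lookup tables or SIMD to soften it).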
  6. IBM didn't invent the Compact-Flash format "microdrive", but did lead its development and adoption. The most curious use was the 4Gb microdrive in the Apple iPod mini.
    In 2000, the largest Compact Flash was the 1Gb microdrive.
    By 2006, Hitachi, after acquiring the business from IBM in 2003, had increased the capacity to 8Gb, its last evolution.
    According to Wikipedia, by 2009 development of 1.3", 1" and 0.85" drives had been abandoned by all manufacturers.
  7. Leventhal in 2009 pointed out that if capacities keep doubling every 2 years, then by 2020 (5 doublings, or ×32) RAID will need to adopt triple-parity (he suggests "RAID-7").
    What I found disturbing is that both the 1993 RAID-5 and the 2009 RAID-6 calculations put the probability of a successful RAID rebuild after a single drive failure at 99.2%.

    I find an almost 1% chance of a RAID rebuild failing rather disturbing.
    No wonder Google invented its own way of providing Data Protection! The sketch below shows where numbers like this come from.
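    Roughly, the rebuild succeeds only if every remaining bit is read without hitting an unrecoverable read error. A back-of-envelope sketch; the drive size and UER here are my illustrative assumptions, not the figures used in the papers below:

        import math

        def p_rebuild_ok(surviving_drives, drive_bytes, bits_per_error=1e14):
            # Poisson approximation of (1 - 1/UER)^bits_read.
            bits_read = surviving_drives * drive_bytes * 8
            return math.exp(-bits_read / bits_per_error)

        # RAID-5 example: 7 surviving 1Tb drives, consumer UER of 1 error in 10^14 bits:
        print(p_rebuild_ok(7, 1e12))  # ~0.57 -- far worse than 99.2%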
  8. The UER (Unrecoverable Error Rate) quoted for SSDs is "1 sector in 10^15".
    We know that flash memory is organised as blocks, typically 64KB, so how can they lose only a single sector? Or do they really mean "lose 128 sectors every 10^17 reads"? The quick arithmetic below suggests the two readings average out to nearly the same rate.
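    The comparison, assuming 512-byte sectors and 64KB blocks:

        SECTORS_PER_BLOCK = 64 * 1024 // 512  # 128 sectors per 64KB flash block

        rate_a = 1 / 1e15                     # (a) 1 sector lost per 10^15 bits read
        rate_b = SECTORS_PER_BLOCK / 1e17     # (b) 128 sectors lost per 10^17 bits read
        print(rate_a, rate_b)                 # 1e-15 vs 1.28e-15 sectors per bit

    The average loss rates are almost identical; the difference is whether errors arrive one sector at a time or 128 sectors at once.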
  9. Disk specs now have "load/unload cycles" quoted (60,000-600,000).
    Disk platters these days have a plastic unload ramp at the edge of the disk, and the drive will retract the heads there after a period of inactivity.
    Linux servers with domestic SATA drives apparently have a reputation for exceeding their load/unload cycle ratings. Cycle counts are reported by S.M.A.R.T., if you're concerned (see the sketch below).
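    A minimal way to check, assuming smartmontools is installed and your drive is /dev/sda (adjust the device path to suit; Load_Cycle_Count is S.M.A.R.T. attribute 193):

        import subprocess

        # Print the Load_Cycle_Count line from the drive's S.M.A.R.T. attributes.
        # smartctl usually needs root privileges.
        out = subprocess.check_output(["smartctl", "-A", "/dev/sda"], text=True)
        for line in out.splitlines():
            if "Load_Cycle_Count" in line:
                print(line)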
  10. Rebuild times of current RAID sets range from 5 hours to over 24 hours.
    Part of this is due to the large size of "groups", ~50. In 1988, Patterson et al expected 10-20 drives per group.
    As well, the time-to-scan a single drive has risen from ~100 seconds to ~6,000 seconds (rough arithmetic below).
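    The scan times fall straight out of capacity ÷ sustained transfer rate; the transfer rates below are my rough assumptions:

        def scan_seconds(capacity_mb, mb_per_sec):
            # Best-case sequential scan of an entire drive.
            return capacity_mb / mb_per_sec

        print(scan_seconds(100, 1.0))       # 1988 Conner CP-3100: 100Mb at ~1Mb/s -> ~100s
        print(scan_seconds(600000, 100.0))  # a 600Gb drive at ~100Mb/s -> ~6,000s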
  11. One of the related problems with disks is archiving data. Drives have a 3-5 year service life.
    A vendor claims to have a writeable DVD-variant with a "1,000 year life".
    They use a carbon-layer (also called "synthetic stone") instead of a dye layer.
    There is also speculation that flash-memory used as 'write-once' might be a good archival medium. Keep those flash drives!
Update 11-Nov-2011:

Something new I learnt last night:
 The 1.8" disk format is very much alive and well.
 They're used in mobile appliances.
 I wonder if we'll see them "move up" into laptops, desktops or servers?
 Already I've seen a 2.5" SSD which is a 1.8" module in a carrier...

Another factoid:
 For the last 2 years, HP has only shipped servers with 2.5" internal drives.

Apple led the desktop world twice in this fashion:
  Macs skipped 5.25" floppies, only ever using 3.5".
  Macs removed floppy drives well before PCs.

Does the 'Air' w/o optical drive count too?
The Mac Classic used SCSI devices, which seemed like a very good idea at the time, but SCSI wasn't great for consumer-level devices and Apple has since gone to SATA.
Apple did invent FireWire (IEEE 1394, a.k.a. Sony's "i.LINK"), which took off in the video market, and I believe Apple still supports it on most devices.


Articles


"Triple-Parity RAID and Beyond", ACM Queue
 Adam Leventhal (SUN), December 17, 2009

"Calculating Mean Time To Data Loss (and probability of silent data corruption)"
Jeff Whitehead, Zetta, June 10, 2009

"A Better RAID Strategy for High Capacity Drives in Mainframe Storage"  [PDF],
ORACLE Corporation, Sept 2010.

"Comparison Test: Storage Vendor Drive Rebuild Times and Application Performance Implications"
Dennis Martin,  Feb 18, 2009

"Considerations for RAID-6 Availability and Format/Rebuild Performance on the DS5000" [PDF]
IBM, March 2010.

"Your Useable Capacity May Vary ..."
Chuck Hollis, EMC Corp, August 28, 2008.


"Five ways to control RAID rebuild times" [requires login. Only intro read]
George Crump. July, 2011 ???
 In a recent test we conducted, a RAID 5 array with five 500 GB SATA drives took approximately 24 hours to rebuild. 
 With nine 500 GB drives and almost the exact same data set, it took fewer than eight hours.
"DRPM: Dynamic Speed Control for Power Management in Server Class Disks",  Gurumurthi, Sivasubramaniam,  Kandemir, Franke, 2003, International Symposium on Computer Architecture (ISCA).

"Intra-Disk Parallelism: An Idea Whose Time Has Come", Sankar, Gurumurthi, Mircea R. Stan, ISCA, 2008.

4 comments:

Neil Gunther said...

Very interesting, but your formula in Part 1 is a bit misleading. It's not an expression for "Aerodynamic drag" which is a frictional force.

In fact, that's the 1st factor on the RHS of the expression, viz., the Bernoulli term that goes like v^3. Expressed in terms of angular velocity (ω): v^3 = ω^3 r^3. But that's at any point of the surface of the platter. We have to multiply by 2x the area of the platter (produces an r^2 factor) for an overall factor of r^5 on the RHS.

And since force * v = P (power), the LHS has the dimensions of power: rate at which energy is lost due to rotational aero-drag.

It must also scale with the number of platters (N) on a common spindle.

The correct expression, up to constants of proportionality is therefore:

P ~ N * ω^3 * r^5

steve jenkin said...

I stand corrected. What I didn't say first time around: I used the result that you'd derived from First Principles for me when I asked "Why the Fifth Power?". Many thanks for both the original paper and this correction.

Neil Gunther said...

What's really important is this.

You can't measure mechanical aero-drag on the disk platters, but you can measure electrical power loss as I*V (amps*volts) and that's what makes the formula very useful for "green" capacity planning.

Since one would usually be comparing one type of storage with another, the constants of proportionality become irrelevant b/c they simply divide out in any ratios.

Neil Gunther said...

The exponents are relatively large (nonlinear) and that means:

A. There can be potentially huge wins from a green standpoint. Data center planners can't afford not to pay attention.

B. The disk diameter and RPM values can be read off the respective vendor's data sheet to give a good estimate of power consumption. Everything else is in the noise.

C. Not to mention that the number of spindles in SANs, NAS, JBoDs, RAID, etc., is yet another (linear) multiplier.