Storage Basics Part 4 - or The Dead Parity Sketch

In parts one through three of this series on the basics of storage we covered some of the fundamentals of the subject, including why storage is important, and made some headway into the terminology. In this post we look at some of the acronyms and initialisms.

RAID

RAID has changed its meaning over time from Redundant Array of Inexpensive Disks to Redundant Array of Independent Disks, but it's acceptable either way. RAID takes a few forms (known as levels) that virtualize the underlying storage hardware to increase performance and protection for the data residing on the logical storage device that is presented. The levels include:

RAID-0 (striping), where the data is spread across multiple disks in uniform stripes to increase performance
RAID-1 (mirroring) where the same data resides on multiple disks and can be used to increase read performance and provide redundancy for disk failure
RAID-2, 3 and 4, which deal with spindle synchronization and a single dedicated parity disk (and we don’t hear of these so much now)
RAID-5 remains popular as it provides a level of redundancy and performance through the distribution of parity across a set of disks allowing for a failure of a disk in a group
RAID-6 which allows for 2 disk failures in a group by doubling up on the parity
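
As a rough sketch of what those levels mean for capacity and protection, the Python below estimates usable space and the number of disk failures a single group is guaranteed to survive. The function name raid_summary and its parameters are my own invention for illustration. It assumes equal-sized disks, treats RAID-1 as an n-way mirror, and ignores hot spares and any vendor-specific overheads.

```python
def raid_summary(level, disks, disk_tb):
    """Return (usable_tb, failures_tolerated) for one RAID group.

    Hypothetical helper: assumes equal-sized disks and ignores
    hot spares, metadata and vendor-specific overheads.
    """
    minimum = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID-{level} needs at least {minimum[level]} disks")

    if level == 0:
        # Striping only: all capacity is usable, no redundancy.
        return disks * disk_tb, 0
    if level == 1:
        # Mirroring: every disk holds the same data.
        return disk_tb, disks - 1
    if level == 5:
        # One disk's worth of capacity is given over to distributed parity.
        return (disks - 1) * disk_tb, 1
    # RAID-6: two disks' worth of capacity is given over to parity.
    return (disks - 2) * disk_tb, 2


for level in (0, 1, 5, 6):
    usable, tolerated = raid_summary(level, disks=4, disk_tb=2.0)
    print(f"RAID-{level}: {usable} TB usable, survives {tolerated} disk failure(s)")
```

The trade-off is the point here: every step up in protection is paid for in usable capacity.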

For the mathematically minded folk out there, most of the parity calculations are done using the XOR operator on the data (“This is an XOR Parity” – come on, some of you love this Python sketch. No? Just me then). RAID-6 additionally introduces further parity calculations which, depending on the implementation approach, can impact write performance considerably. Some of these levels can be combined, such as RAID 10 (a striped mirror) and RAID 0+1 (a mirrored stripe); understanding the difference is a homework task. If you are interested in knowing a bit more, there are some good descriptions on the Wikipedia page.
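
For anyone who wants to see the XOR trick in action, here is a minimal Python sketch (the other kind of Python sketch). It is not how any particular array implements parity, and the helper name xor_blocks and the sample data are made up for illustration. It shows single parity only: XOR the data blocks to get the parity block, then XOR the survivors with the parity to rebuild a lost block.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "disks" holding one block each (illustrative data only).
data = [b"DEAD", b"PARI", b"TY!!"]
parity = xor_blocks(data)

# Pretend the second disk has failed: rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

This works because XOR is its own inverse: XORing the parity with every block that survived cancels them out and leaves only the missing one. It also hints at why a second failure is fatal with single parity, since there is then more than one unknown.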

IOPS

IOPS means ‘Input/Output Operations Per Second’; longhand, it’s the number of read and write operations per second that a device (individually or as a logical aggregate) can perform. This is one of a number of metrics that can be combined to understand the relative performance of an element of a storage subsystem; on its own it can be used unwisely to overstate, or competitively understate, the performance of a storage system. See Disraeli’s lies, damned lies and statistics.
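
To illustrate why an IOPS figure on its own says little, the sketch below combines it with one other metric, the size of each IO. Choosing block size as the second metric is my assumption, the function name is hypothetical, and the numbers are purely illustrative.

```python
def throughput_mib_per_s(iops, block_size_kib):
    """Throughput implied by an IOPS figure at a given IO size."""
    return iops * block_size_kib / 1024


# The same headline IOPS number at two different IO sizes.
small = throughput_mib_per_s(iops=10_000, block_size_kib=4)
large = throughput_mib_per_s(iops=10_000, block_size_kib=64)

print(f"10,000 IOPS at 4 KiB  = {small} MiB/s")
print(f"10,000 IOPS at 64 KiB = {large} MiB/s")
```

The same "10,000 IOPS" describes very different amounts of data moved, which is why it pays to ask what IO size (and read/write mix) sits behind a quoted figure.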

SSD (or Flash)

Solid State Disk - very high performance storage with no moving parts and extremely rapid data access. Capacity tends to be lower than traditional disks and cost per GB is currently higher, although we are likely to see a convergence point on cost per GB with traditional disks in the consumer space in the next couple of years. Cost per IOP can be attractive for appropriate workloads, and the sweet spot is random IO workloads.
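
The cost per GB versus cost per IOP point can be sketched with a little arithmetic. Every figure below is invented purely to show the shape of the comparison and is not real pricing or real device performance. The Drive class and its fields are hypothetical too.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    price: float        # purchase price, any currency
    capacity_gb: float
    iops: float

    def cost_per_gb(self):
        return self.price / self.capacity_gb

    def cost_per_iop(self):
        return self.price / self.iops


# Made-up figures for illustration only.
hdd = Drive("hypothetical spinning disk", price=100.0, capacity_gb=2000.0, iops=150.0)
ssd = Drive("hypothetical SSD", price=200.0, capacity_gb=250.0, iops=50_000.0)

for drive in (hdd, ssd):
    print(f"{drive.name}: {drive.cost_per_gb():.3f} per GB, "
          f"{drive.cost_per_iop():.4f} per IOP")
```

With numbers shaped like these, the SSD loses heavily on cost per GB and wins heavily on cost per IOP, so the right answer depends on whether the workload is bound by capacity or by IO.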

Interfaces

Interestingly, one of the areas that doesn’t make things easy for the newcomer is the way that connectivity interfaces define the class of storage in the industry, not always accurately – in the same way that if you turn up for a party in a Ferrari, it doesn’t mean you can dance. It’s all about making sure that you know the limitations (or excesses) and that you cater for them in your design.

FC and SAS*

Fibre Channel and Serial Attached SCSI - these are enterprise-class interfaces supporting a wide variety of performance-enhancing techniques and instrumentation. They are still associated mostly with traditional spinning disk, providing high reliability and intelligent features to enhance enterprise performance.

SATA

Serial Advanced Technology Attachment - a lower-end interface; the drives tend to be capacity centric, with lower performance and a lower price point. However, don’t assume that this is the wrong choice just because more expensive solutions are out there - look at your application stack and determine how data will be de-staged to the storage.

NL-SAS

Near Line Serial Attached SCSI - a bit of a mix: a high-end interface giving some enterprise features, paired with a high-capacity, lower performance drive. This allows capacity centric drives to coexist in the same storage array as higher performance devices. Sounds good, but does it always make sense? We will come back to the dynamics of charging models in a later post, where, just like any business, there are fixed and variable costs.

Next Time…

The focus moves to the group of three-letter acronyms that tell us how we are going to move data to and from our storage.

*(SAS is a double-nested acronym no less; 5 points to anyone who can send me any IT-related triple-nested acronyms – as if this subject is not complicated enough for the beginner!)

Don’t forget to comment, get involved and get in touch using the boxes below and reach out direct @glennaugustus