Hybrid HDD + SSD RAID1

What is hybrid RAID1?

In general, hybrid RAID1 is RAID1 that mirrors data on two different storage technologies. Here we are talking about an HDD and an SSD. (Or more devices if you want more than 2-way RAID1. Why would you want, e.g., 3-way RAID1? Simple: if one disk fails, you still have redundancy, the same reason as for using RAID6. And sooner or later, a disk will fail.)

Why do it?

HDDs and SSDs have different characteristics. With a hybrid solution, you get some of the advantages of both. Let me briefly state the main characteristics of HDDs and SSDs. (+) is something positive, (o) is neutral and (-) is a disadvantage.

HDD

(+) Proven technology, characteristics well understood
(+) Cheap
(+) Unlimited overwrites
(+) Small sectors
(o) Reasonable reliability, good data endurance
(o) Reasonable linear access speed
(-) Sensitive to mechanical shock and vibration
(-) High access latency

SSD

(-) Unproven technology
(-) Expensive
(-) Limited overwrites (wear-leveling helps)
(-) Large sectors (erase blocks of 100kB and larger)
(-) Unknown reliability, unknown data endurance (but lower than HDDs)
(+) Good linear speed
(+) Not sensitive to mechanical shock and vibration
(+) Very low access latency

Why are small sectors an advantage? Simple: better small-file performance, and writing to one file cannot corrupt other files. With large sectors, a write can corrupt files you did not write to, because the large physical sector may be shared with the file you did write.

Why do I classify reliability and especially data endurance as unknown? Simple: SSDs with MLC (Multi-Level Cell) flash and wear-leveling have not been on the market long enough to know. Do not trust vendor marketing material. If you look at what vendors actually promise, you will notice that they give very few hard guarantees. Data endurance in particular will be low. The best professional flash media are rated for 10 years, while magnetic media can reach 50 years. In a running system, you can increase endurance by regular complete reads; see the section about maintenance below.

Anyway, what you get if you combine an SSD and an HDD in a RAID1 is that all reads can come from the SSD at SSD speed, while you still have the redundancy of RAID1 without buying a second, expensive SSD. Writes will still be at HDD speed, but writes are typically a lot rarer than reads. In addition, writes can be buffered, while reads cannot.

The second thing you get is two different storage technologies, and chances are good that things that kill one will leave the other intact. For example, heat is much more likely to kill an HDD, while a massive amount of small writes is much more likely to kill an SSD.

How to do it with Linux Software RAID

The trick is to create the RAID1 array and mark the HDD(s) as "write-mostly" during creation. This causes the kernel to do (slow) reads from the HDD only when they are really needed; all other reads go to the SSD. This option was originally added for mirroring over a slow network link, but it works equally well for concentrating reads on an SSD.

Here is how to do it. Let us assume you want to mirror an HDD partition sdb6 and an SSD partition sdc6 as md1. (Substitute full-disk devices if needed; you can mix partitions and full disks.) The respective call to mdadm is:

mdadm --metadata=0.90 --create -n 2 -l 1 /dev/md1 /dev/sdc6 -W /dev/sdb6

The same for a 3 disk RAID1 would look like this (sda6 is another HDD partition):

mdadm --metadata=0.90 --create -n 3 -l 1 /dev/md1 /dev/sdc6 -W /dev/sdb6 -W /dev/sda6
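If you create arrays from a script, the command lines above can be assembled programmatically. The following is a minimal Python sketch; the helper name `mdadm_create_cmd` is made up for illustration, and it only builds the argument list without running mdadm. It lists the SSD first because, according to the mdadm man page, `-W`/`--write-mostly` flags the devices listed after it, so an SSD placed after a `-W` would end up write-mostly too. Repeating `-W` in front of each HDD, as in the examples above, is harmless.

```python
import shlex

def mdadm_create_cmd(md_dev, ssd_parts, hdd_parts, metadata="0.90"):
    """Build an mdadm --create call for a hybrid RAID1 (hypothetical helper).

    SSD components come first; every HDD component is preceded by -W
    (--write-mostly), in the same style as the examples in the text.
    """
    if not ssd_parts:
        raise ValueError("need at least one SSD component to read from")
    n = len(ssd_parts) + len(hdd_parts)
    cmd = ["mdadm", f"--metadata={metadata}", "--create",
           "-n", str(n), "-l", "1", md_dev]
    cmd.extend(ssd_parts)            # read-preferred components
    for part in hdd_parts:
        cmd.extend(["-W", part])     # write-mostly components
    return cmd

# Assemble the 3-disk example and show it as a shell command line.
print(shlex.join(mdadm_create_cmd("/dev/md1", ["/dev/sdc6"],
                                  ["/dev/sdb6", "/dev/sda6"])))
```

To actually create the array, the list can be passed to `subprocess.run(cmd, check=True)` as root.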

Note that I specify the old 0.90 superblock format. The reason is that the "new" formats are broken, as they do not support kernel-level autodetection. RAID array assembly is the job of the RAID controller, and that is the kernel. Why anybody thought it would be acceptable to require some userspace script to do it is beyond me. There are other problems with the "improved" formats, and unless you need them because they offer more disks per array, I recommend staying away from them.

After you have created the array, /proc/mdstat should show a "(W)" after the HDD components. Here is an example from my set-up, where sdb6 and sdc6 are HDD partitions and sdd1 is an SSD partition (i.e. a 3-way RAID1):

cat /proc/mdstat 
Personalities : [linear] [raid1] [raid6] [raid5] [raid4] 
...
md6 : active raid1 sdc6[0](W) sdb6[1](W) sdd1[2]
      62508800 blocks [3/3] [UUU]
...
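To check the flags from a script rather than by eye, the component list of each array can be pulled out of /proc/mdstat. A rough Python sketch, assuming the line format shown above (`name[slot]` followed by optional flags such as `(W)` for write-mostly or `(F)` for faulty); the function names `parse_mdstat` and `read_members` are hypothetical, and this is not a complete parser for every variant of the file.

```python
import re

# One component token such as "sdc6[0](W)", "sdd1[2]" or "sde1[3](W)(F)".
COMPONENT = re.compile(r"(\S+?)\[(\d+)\]((?:\([A-Z]\))*)")

def parse_mdstat(text):
    """Return {array: [(component, slot, flags), ...]} from /proc/mdstat text."""
    arrays = {}
    for line in text.splitlines():
        # Array lines look like "md6 : active raid1 sdc6[0](W) ...".
        if not line.startswith("md") or " : " not in line:
            continue
        name, rest = line.split(" : ", 1)
        arrays[name.strip()] = [
            (comp, int(slot), set(re.findall(r"\(([A-Z])\)", flags)))
            for comp, slot, flags in COMPONENT.findall(rest)
        ]
    return arrays

def read_members(members):
    """Components that normal reads go to: neither write-mostly nor faulty."""
    return [comp for comp, _slot, flags in members if not flags & {"W", "F"}]
```

For a hybrid array, `read_members(parse_mdstat(open("/proc/mdstat").read())["md6"])` should list only the SSD component(s); anything else means a write-mostly flag is missing.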

Observed read speeds are the same as for an SSD alone. Write speeds are comparable to an HDD-only RAID1, but only once the kernel runs out of memory to buffer the writes. Since writes can often be buffered, a hybrid array is overall a lot faster than one with only HDDs.

How to do it for an existing RAID1

You can enable "write-mostly" for a RAID component in the following way:

echo writemostly >  /sys/block/md6/md/dev-sdc6/state

and disable it this way:

echo -writemostly >  /sys/block/md6/md/dev-sdc6/state
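The two echo commands above can also be issued from Python, e.g. inside a larger maintenance script. A minimal sketch; `set_write_mostly` is a hypothetical helper name, and the `sysfs_block` parameter exists only so the function can be pointed at a scratch directory for testing. Writing to the real sysfs files requires root.

```python
from pathlib import Path

def set_write_mostly(md, component, enable=True, sysfs_block="/sys/block"):
    """Set or clear write-mostly on one component, e.g. ("md6", "sdc6").

    Same effect as: echo [-]writemostly > /sys/block/<md>/md/dev-<component>/state
    """
    state = Path(sysfs_block) / md / "md" / f"dev-{component}" / "state"
    if not state.exists():
        raise FileNotFoundError(f"{component} is not a component of {md}: {state}")
    state.write_text("writemostly\n" if enable else "-writemostly\n")
    return state
```

`set_write_mostly("md6", "sdc6")` corresponds to the first echo, `set_write_mostly("md6", "sdc6", enable=False)` to the second.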

If for some reason you cannot set a component of a RAID1 to "write-mostly" this way, you can kick it from the array and re-add it with the write-mostly flag active. This temporarily lowers your redundancy level (in a 2-way RAID1, to none at all), so making a backup before doing this is recommended.

To set /dev/sdc6 from the last example to "write-mostly" would work as follows:

To kick, first set it to "faulty":

mdadm --fail /dev/md6 /dev/sdc6

Then kick it:

mdadm --remove /dev/md6 /dev/sdc6

Then add it again:

mdadm --add /dev/md6 --write-mostly /dev/sdc6

Wait for the RAID1 resync to complete, and /dev/sdc6 will now only be read when needed.
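The fail/remove/add sequence above, scripted as a minimal Python sketch. The helper names are hypothetical. `check=True` makes it stop at the first mdadm call that fails, and instead of watching /proc/mdstat it waits until the component's sysfs `state` file lists `in_sync` again, which the kernel's md documentation describes as a fully in-sync member. By default it only prints the commands (`dry_run=True`).

```python
import subprocess
import time
from pathlib import Path

def readd_write_mostly_cmds(md_dev, component):
    """The three mdadm calls: mark faulty, remove, re-add as write-mostly."""
    return [
        ["mdadm", "--fail", md_dev, component],
        ["mdadm", "--remove", md_dev, component],
        ["mdadm", "--add", md_dev, "--write-mostly", component],
    ]

def wait_in_sync(md, component, poll_interval=30, sysfs_block="/sys/block"):
    """Block until the component is a fully in-sync member again."""
    state = Path(sysfs_block) / md / "md" / f"dev-{component}" / "state"
    while "in_sync" not in state.read_text().strip().split(","):
        time.sleep(poll_interval)

def readd_write_mostly(md, component, dry_run=True):
    """E.g. readd_write_mostly("md6", "sdc6", dry_run=False); needs root."""
    for cmd in readd_write_mostly_cmds(f"/dev/{md}", f"/dev/{component}"):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    if not dry_run:
        wait_in_sync(md, component)
```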

Maintenance

There are two aspects to storage maintenance with RAID: RAID maintenance and storage device maintenance.

Both have the goal of detecting problems early, while there is still a chance to correct them, and of notifying you in time when it looks like manual intervention is needed. Still, keep in mind that RAID is not backup. It only covers some of the areas a backup covers, not all of them. For example, user error and malware are not covered by RAID. Neither is your computer being hit by lightning. You do need a backup in addition. What RAID gives you is a lower probability of needing that backup, so the restore process can be allowed to take more effort, which makes the backup cheaper. Or you just have the hassle far less often.

RAID consistency checks

I recommend running a RAID consistency check every 7 to 15 days. The way to run it is a bit obscure. Basically, you first read "/sys/block/mdx/md/mismatch_cnt" (substitute your md device for "mdx") to make sure it is zero. Then you write the string "check" to "/sys/block/mdx/md/sync_action" (replace "mdx" as before) and wait until reading that file no longer returns "check". Then you read the mismatch count again and make sure it is still zero.
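The procedure above, written out as a minimal Python sketch. This is not the author's md_check.py, just an illustration of the same steps; the function names are hypothetical, `sysfs_block` and `sleep` are parameters only so it can be exercised against a scratch directory, and refusing to start while `sync_action` is not `idle` is an extra guard not mentioned above.

```python
import time
from pathlib import Path

def md_consistency_check(md, poll_interval=60, sysfs_block="/sys/block", sleep=time.sleep):
    """Run a consistency check on `md` (e.g. "md6").

    Returns (mismatches_before, mismatches_after); both should be 0.
    """
    md_dir = Path(sysfs_block) / md / "md"
    action = md_dir / "sync_action"
    mismatch = md_dir / "mismatch_cnt"

    # Extra guard: do not interfere with a resync or recovery already running.
    current = action.read_text().strip()
    if current != "idle":
        raise RuntimeError(f"{md}: sync_action is {current!r}, not starting a check")

    before = int(mismatch.read_text())
    action.write_text("check\n")
    while action.read_text().strip() == "check":  # still running
        sleep(poll_interval)
    after = int(mismatch.read_text())
    return before, after

def report(md, before, after):
    """Print only on problems, so cron sends mail only when needed."""
    if before or after:
        print(f"{md}: mismatch_cnt was {before} before and {after} after the check")
```

From cron, something like `report("md6", *md_consistency_check("md6"))` would do; it stays silent when both counts are zero.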

Here is a Python script I wrote that does this, md_check.py. Just adjust the configured device at the start and run it from cron like this:

# check array with SSD
33 6 1,15 * * /root/sys_tools/mdadm/md_check.py

This script can be used for other RAID arrays as well, not just for RAID1 or hybrid arrays. I run it twice a month from cron for each of my RAID arrays (I currently have 8), with a maximum of one check per day.
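With several arrays, "twice a month, at most one check per day" is easy to get wrong by hand. A small Python sketch that generates crontab lines: array number i is checked on day i and day i + 14, which keeps all days distinct for up to 14 arrays and stays within day 28, so every month including February gets both checks. It assumes a variant of the check script that takes the md device as its first argument; that argument is hypothetical, since the script described above has the device configured at the start.

```python
def cron_lines(arrays, script="/root/sys_tools/mdadm/md_check.py", minute=33, hour=6):
    """Crontab lines checking each array twice a month, at most one check per day."""
    if len(arrays) > 14:
        raise ValueError("more than 14 arrays do not fit with one check per day")
    lines = []
    for i, md in enumerate(arrays):
        first = i + 1          # days 1..14
        second = first + 14    # days 15..28 exist in every month
        # Hypothetical: the script is assumed to accept the array as an argument.
        lines.append(f"{minute} {hour} {first},{second} * * {script} {md}")
    return lines

for line in cron_lines([f"md{n}" for n in range(8)]):
    print(line)
```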

Make sure cron can send email to you or you will not be notified in case of errors. That would make the check basically worthless.

It is also possible that your distribution already does this check automatically. Debian does, but only once a month and with a pretty convoluted script that only works sometimes. It also seems to lack any meaningful reporting, which makes the check worthless. In my experience, the only reporting that works is a check that sends email to an address that is read regularly. For faster alerting, use a mailbox system that notifies you via text message, or send the email to your mobile phone in the first place. Forget about anything else, it just does not work. Email is the base mechanism to use. It has the added advantage that you can either send it directly or just print a message to stdout when called from cron, and cron will send the email for you. That is also one reason any real sysadmin makes reliable email sending a top priority.
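Both reporting routes can be covered by one small helper. A minimal Python sketch using the standard library's `email.message` and `smtplib`; the function names are hypothetical, and `send_alert` assumes a mail server accepting connections on port 25 of `host` (typically the local MTA).

```python
import smtplib
from email.message import EmailMessage

def build_alert(subject, body, to_addr, from_addr="root@localhost"):
    """Create a plain-text alert mail."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(body)
    return msg

def send_alert(msg, host="localhost"):
    """Hand the mail to an MTA assumed to listen on host:25."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)

def report(problem, to_addr=None):
    """Under cron, printing is enough: cron mails any output of the job."""
    if to_addr is None:
        print(problem)
    else:
        send_alert(build_alert("RAID/disk problem", problem, to_addr))
```

Printing is the simpler choice when the script only ever runs from cron; sending directly is useful when it also runs from somewhere whose output nobody reads.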

SMART selftests

The second thing that should be done regularly is a full device read test. I do it every 14 days. For HDDs, you can run a long SMART selftest, e.g. with smartd or manually from cron as well. Make sure you have smartd configured and working to catch errors! smartd also needs to be able to send email to you, otherwise all monitoring is basically worthless.
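With smartd, the long self-test schedule goes into smartd.conf via the `-s` directive; smartd matches its regular expression against a `T/MM/DD/d/HH` string (test type, month, day of month, day of week, hour). The following Python sketch only composes such a line (hypothetical helper `smartd_line`) and checks a few sample timestamps against the pattern with Python's `re`, which behaves like POSIX extended regex for a pattern this simple. Days 1 and 15 of each month approximate "every 14 days"; `-a` enables smartd's default monitoring and `-m` sets the mail recipient.

```python
import re

def smartd_line(device, mail_to, days=(1, 15), hour=3):
    """smartd.conf line: monitor `device`, mail problems to `mail_to`,
    run a long (L) self-test at `hour` o'clock on the given days of the month."""
    day_alt = "|".join(f"{d:02d}" for d in days)
    return f"{device} -a -m {mail_to} -s L/../({day_alt})/./{hour:02d}"

def runs_long_test(line, stamp):
    """Would the -s pattern in `line` fire for a T/MM/DD/d/HH stamp?"""
    pattern = line.split(" -s ", 1)[1]
    return re.fullmatch(pattern, stamp) is not None

line = smartd_line("/dev/sdb", "root")
print(line)  # /dev/sdb -a -m root -s L/../(01|15)/./03
```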

For SSDs, the problem is that not all support SMART or long SMART selftests. If yours does, do the same as for the HDDs. Otherwise hope that read errors will show up during the RAID consistency checks. The RAID consistency check will read all component devices in full, but will not notice if sectors are slow to read or extensive error correction was needed. SMART attributes will show that and smartd will notice and notify you.
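For an SSD without long self-tests, a plain full read is a fallback that at least touches every sector and can flag regions that are slow to read, which the RAID check does not report. A minimal Python sketch; the function name and the 1-second threshold per 1 MiB chunk are arbitrary assumptions. It needs read access to the device node (usually root), asks the kernel to drop cached pages first so the reads actually reach the device (a hint, not a guarantee), and does not replace SMART attribute monitoring where that is available.

```python
import os
import time

def full_read(path, chunk_size=1 << 20, slow_seconds=1.0):
    """Read `path` (e.g. "/dev/sdd") completely.

    Returns (bytes_read, errors, slow): errors is a list of (offset, errno),
    slow a list of (offset, seconds) for chunks slower than slow_seconds.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # device size for block devices
        if hasattr(os, "posix_fadvise"):
            # Hint: drop cached pages so the reads go to the device itself.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        offset, total, errors, slow = 0, 0, [], []
        while offset < size:
            length = min(chunk_size, size - offset)
            start = time.monotonic()
            try:
                data = os.pread(fd, length, offset)
            except OSError as exc:
                errors.append((offset, exc.errno))
                offset += length  # skip the unreadable chunk
                continue
            elapsed = time.monotonic() - start
            if not data:  # unexpected end of device
                break
            if elapsed > slow_seconds:
                slow.append((offset, round(elapsed, 3)))
            total += len(data)
            offset += len(data)
        return total, errors, slow
    finally:
        os.close(fd)
```

Run from cron, a wrapper would print any entries in `errors` or `slow` so that cron mails them to you.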

Arno Wagner