SPDCA: RAID Arrays

So, I built a new computer. Records show I started writing an upgrade log, detailing for whoever cares my learnings/progress in getting a new PC built. However, from the looks of it, I abandoned it to play Diablo 3. Because I had a new computer. And it could play games. And then I decided I needed to get a new phone. It's like I don't own my technology, but rather, it owns me.

Anyhow. As is always the case with building a PC, there are always things. I say things, and anybody who has had to slap a computer together from component parts will have an idea what I'm talking about. No matter how well worn the path is that you are following, how popular or standardized your methods, there will always be something that causes you to sink an incredible amount of time. It's the nature of the game, and in some cases, the journey is more rewarding than the destination, but more often than not, your things will be annoying as shit.

I'll spare you the details of my first two weeks with the computer. Describing the tweaks one does to a workstation to make it "right" is like telling somebody about an incredibly affecting, emotional dream you had, the kind you wake up from a changed man/woman, but in reality and in its retelling it is actually boring as hell.

I maintain, however, that mine has a sort of cautionary tale element to it. Sort of like the Rime of the Ancient Mariner... but about drive redundancy.

Anyhow. One of my things was getting a RAID mirror set up on my computer. I have an SSD for my boot drive, and two HDDs that I wanted to run as a 2-disk mirror, both for performance and for some redundancy in the event one disk dies before the other (I have backups aside from this; in this instance, the redundancy is about reducing downtime).

When I first set up my RAID array, I went through my motherboard's built-in hardware RAID setup. This setup is ideal, as the operating system does not have to care about the RAID setup, the maintenance of the array, or really anything. The RAID just works at a low level on the system, and I can potentially have multiple operating systems with partitions on the RAID array, and everyone just plays nicely.

This, however, failed miserably. Absolutely miserably. No matter what I would do, whenever I enabled the RAID driver Windows would refuse to boot. I would disable the driver, and it would boot back up happily. I thought to myself: Well, damn. I have a buggy RAID controller in my motherboard. Buuut, everything else seems to be working okay... I'll try out a software RAID configuration.

In a software RAID configuration, the operating system controls the disks according to its own rules, sort of "manually" maintaining the RAID array by reading/writing to both disks of its own accord (and not relying on a lower-level function to do it for the operating system).
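If it helps to see that in code, here is a toy sketch of what a software mirror boils down to. It is Python, it is purely illustrative, and the names (`MirroredVolume`, `write`, `read`, `in_sync`) are made up for this post. A real implementation works on actual block devices and has to deal with failures, caching, and rebuilds, none of which appear here.

```python
class MirroredVolume:
    """Toy RAID 1: every write goes to both disks, a read needs only one."""

    def __init__(self, block_count, block_size=4):
        blank = bytes(block_size)
        self.block_size = block_size
        # Two in-memory "disks", each a list of fixed-size blocks.
        self.disks = [[blank] * block_count, [blank] * block_count]

    def write(self, block, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        # The software layer does the duplication itself.
        for disk in self.disks:
            disk[block] = data

    def read(self, block, prefer=0):
        # Either disk can serve the read, since both hold the same data.
        return self.disks[prefer][block]

    def in_sync(self):
        return self.disks[0] == self.disks[1]


volume = MirroredVolume(block_count=8)
volume.write(3, b"save")
print(volume.read(3, prefer=1))  # b'save'
```

The point is only that the duplication lives in the operating system's code path, rather than in a controller underneath it.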

Trying that, I found out that Windows 7 Home Premium does not support this kind of RAID, and that to support it I would need to upgrade to a version of Windows that cost twice as much. Fuck that, said I, and I tried to go back to getting the motherboard hardware RAID working.

A few dozen iterations of knob-twiddling and subsequent frustration later, I was ready to give up. No amount of BIOS, firmware, or driver upgrades would get Windows to boot into the proper mode.

However, magic happened. Just as you are guaranteed that things will happen during PC building, you are also guaranteed that, at some point, magic will happen. This is when, despite all of your efforts at troubleshooting, something starts working, and you have no idea why. For someone who operates on the assumption of a knowable universe, this kind of magic is infuriating. And also worrying, for the reason that all magic eventually ends.

The magic was that the RAID array started to work. Magically. My RAID array appeared as one drive, and I could hear the clicking of two disks, not just one. I thought to myself, Maybe I didn't give it enough time? After all, RAID arrays take some time to build. Maybe the underlying hardware wasn't "ready," and just didn't know how to tell me.

Then two months passed. I got complacent. I started doing real work, where it mattered whether I lost what was on the drives. I did the thing I should not have done. I relied on the magic.

And tonight, the magic ended. I was playing Borderlands, and Windows reared its ugly head, minimizing everything I was shooting at. Windows claimed "You no longer have a genuine copy of Windows! Please buy a genuine copy of Windows!" To that I also said Fuck that, given that I can see my legitimate registration key sitting on my shelf. I investigated. Then investigated some more. It's been a number of years since I ran Windows at home, after all. There are bound to be odd things I haven't seen since Windows XP. Plus, I know that Windows ties all sorts of things in its "registration" process to the underlying hardware, and between recent driver and Windows updates, maybe something had changed in its consideration of that. Either way, I decided I would attempt a restart.

After restarting, my RAID array came up as two drives. This is bad. This is very bad. Once a disk goes into a RAID array, it should not come out unless it has failed. If a drive that was previously RAID comes out of the RAID, and an operating system like Windows sees it, and mounts it, you have basically destroyed your RAID (because you can almost guarantee that Windows did two slightly different things to both drives, making them no longer identical mirrors of each other).
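To make the "two slightly different things" problem concrete, here is a small Python sketch. The block contents and the helper name `diverged_blocks` are invented for illustration. Real drives diverge in filesystem metadata and logs, not in tidy four-byte blocks.

```python
def diverged_blocks(disk_a, disk_b):
    """Return the indexes of blocks that differ between two mirror halves."""
    if len(disk_a) != len(disk_b):
        raise ValueError("mirror halves must have the same number of blocks")
    return [i for i, (a, b) in enumerate(zip(disk_a, disk_b)) if a != b]


# The two halves of the mirror start out identical.
disk_a = [b"boot", b"docs", b"game", b"free"]
disk_b = list(disk_a)

# Mounted as two separate drives, each one gets its own unrelated write.
disk_a[3] = b"logA"
disk_b[1] = b"idxB"

stale = diverged_blocks(disk_a, disk_b)
print(stale)  # [1, 3]
```

Once that list is non-empty, neither disk is a faithful copy of the other, and nothing in the data itself says which side is "right". Something has to pick a source drive and rebuild the other from it.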

I investigated. I investigated some more. I restarted a bunch. I started going down the same hardware RAID path I had followed back in September, and ended up in the same fail-booting loop I was in before. I was distraught.

My saving grace was that I turned off the "Automatically Restart After Fail" option. You can do this by pressing F8 before you get to the "Windows loading..." screens and choosing it from the boot options menu that comes up. This means that even if you Blue-Screen-Of-Death, the machine won't immediately restart and clear the message on the screen. The message that I saw was about as helpful as any I'd seen: a bunch of hexadecimal codes that mean nothing to a human being. I happened to take a picture of that screen with my phone, and moved on.

I started getting desperate. I started researching to see if there was some known problem with my motherboard, or if there were newer versions of anything available from the vendors (looking at Intel, rather than MSI, in hopes that an update to a BIOS/firmware/driver might help).

In my desperation, I went here (www.intel.com). And while I normally don't pay a whole lot of attention to the admonitions on driver download pages (as they are mostly the vendors disclaiming all legal liability for their product), I happened to notice a familiar hex code: 0x0000007b

Don't ask me why I recognized that. It doesn't mean anything to me. It's like any other number. But it stuck in my mental craw, and forced me to read what it said:

If your RAID controller is not enabled, enabling the RAID controller is not recommended or supported when a SATA hard drive is the boot drive. Enabling the RAID controller might cause an immediate blue screen with the error code, 0x0000007b, followed by a reboot. To enable RAID, reinstall the operating system.

A memory stirred again. There was something strange I found with my motherboard. My motherboard has a strange design to it, in that almost every human-configurable component has a redundancy to it. For instance, it has two onboard BIOS memories that you can switch between at will. They do this, because the more you fuck with something, the more likely you will wedge something. In this case, it is certainly nice to be able to restore to a previous known good state rather than having to start over from the beginning.

I had noticed when initially putting my stuff together that my motherboard had not one but two SATA controllers (this is the hardware that interacts directly with your hard drives, and can do interesting things with them, like RAID mirroring). SATA ports 1-6 go to an onboard Intel controller. SATA ports 7-8 go to an ASMedia controller. At the time, I didn't understand the reason for this. Maybe if a controller failed, the other one could be used? No, because they have different numbers of ports, and you wouldn't be able to boot with the same configuration... (shrugs)

It was something I had initially discarded as one of the board's redundancies. But, reading that information from Intel, now I see why. Intel's controller, if configured for RAID, will completely shit the bed if you end up trying to boot your computer from a SATA slot that is not configured in a RAID array (but is still connected to your Intel controller).

Because I boot from an SSD, and because that SSD is not part of any RAID array, the Intel controller threw an unrecoverable hardware error whenever Windows would try to load the RAID driver. Why does the Intel controller do this? Why would they not just provide a nice error message when this happens? Or at least a warning when you try? I don't know.

Figuring this out, I switched my SSD over to SATA port 7, running it through the ASMedia SATA controller. I booted it up, and everything was just fine. My drives, still incredibly wedged in terms of RAID mirroring, at least still had their data on them. Even though Intel sort of screwed me when it came to underlying hardware configuration, they did provide a nice tool to migrate existing data from a known drive at the same time as you create a RAID mirror from it. Currently, it is chugging away, and is supposedly at 7% completion on the rebuild.
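For the record, here is the rule as I now understand it, written down as a small Python check. This is my simplified model of the behavior described on Intel's page, not Intel's actual logic. The function names are made up, and the port numbers for the old setup (SSD on port 1, mirror on ports 2 and 3) are hypothetical stand-ins, since all that matters is that the SSD used to sit on one of ports 1-6.

```python
# Port-to-controller layout on this board: 1-6 are Intel, 7-8 are ASMedia.
CONTROLLER_PORTS = {"intel": range(1, 7), "asmedia": range(7, 9)}


def controller_for(port):
    """Return the name of the controller that a SATA port is wired to."""
    for name, ports in CONTROLLER_PORTS.items():
        if port in ports:
            return name
    raise ValueError(f"no such SATA port: {port}")


def boot_would_fail(boot_port, raid_controller, array_ports):
    """True if the boot drive is on the RAID-mode controller but outside the array."""
    on_raid_controller = controller_for(boot_port) == raid_controller
    return on_raid_controller and boot_port not in array_ports


array_ports = {2, 3}  # hypothetical: the two mirrored HDDs
before = boot_would_fail(1, "intel", array_ports)  # SSD on an Intel port
after = boot_would_fail(7, "intel", array_ports)   # SSD moved to ASMedia
print(before, after)  # True False
```

A check like this is roughly the warning I wish the BIOS had given me in September.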

So, my question now is: what magic has been going on these last 2 months? If both drives had come up, but the first drive looked completely different from the second, I would guess that I was simply operating outside of RAID and I didn't notice. However, the two drives were within ~5MB of each other in usage, which suggests they were nearly identical at the time they came out of the magical RAID.

This is something I will probably never know. It might bug me a little, and for a little while. But now I can sleep better, knowing that magic died a little tonight, and that the discrete, knowable universe inched a little bit forward.

#6858, posted at 2014-12-15 04:14:37 in Cognitive Surplus