The Digital Rag
Real World Information in a Virtual World
Wednesday, February 08 2012 @ 04:56 AM PST

A Tale of Woe - Playing Hardware God and Losing Data

System Administration Tidbits

The bottom line was that I could only get 4 drives working in the new workstation, and I needed more storage than that would give me. So I re-purposed the internal drives as a RAID0 array (striped, no redundancy but FAST!) for video processing, and put the other drives, plus a few more, into a spare box I had, turning it into a network file server.
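To make "striped, no redundancy" concrete: RAID0 deals consecutive chunks of data out to the drives in turn, so large reads and writes keep every drive busy at once, but any sizeable file depends on every drive surviving. Here is a minimal Python sketch of that round-robin layout, assuming equal-size disks and ignoring chunk size and on-disk metadata. The function names are hypothetical, and this is a simplified model rather than how Linux md actually implements it.

```python
from typing import List, Tuple


def raid0_locate(logical_chunk: int, num_disks: int) -> Tuple[int, int]:
    """Return (disk index, chunk offset on that disk) for a logical chunk
    in a simple round-robin RAID0 stripe over equal-size disks."""
    if num_disks < 2:
        raise ValueError("a stripe needs at least two disks")
    if logical_chunk < 0:
        raise ValueError("chunk numbers start at 0")
    # Consecutive chunks go to consecutive disks, wrapping around.
    return logical_chunk % num_disks, logical_chunk // num_disks


def chunks_on_disk(total_chunks: int, num_disks: int, disk: int) -> List[int]:
    """Logical chunks stored on one disk, i.e. what is lost if that disk dies."""
    return [c for c in range(total_chunks) if raid0_locate(c, num_disks)[0] == disk]
```

For example, with four disks, chunks_on_disk(8, 4, 2) gives [2, 6]: lose disk 2 and every fourth chunk of the array goes with it. With, say, 64 KiB chunks, any contiguous file larger than 256 KiB has pieces on all four drives.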

Using Linux to build an NFS box is quick and easy. Many off-the-shelf units are exactly that, though they don't tell you so: Linux plus a bunch of hard drives, with the standard Linux software RAID making them work together. The NFS box (called NFS1 on my network) got a copy of my home directory and major files from PACDAT in late December last year, and happily took over storing both my files and my wife's. I left the originals on PACDAT while I worked on moving other tasks off it, in preparation for shutting it down and re-purposing it for something else.
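Linux software RAID (md) reports the state of each array in /proc/mdstat, where a status such as [4/3] [UU_U] means one of four members is missing. Below is a minimal Python sketch that flags degraded arrays from that text. It takes the text as a string so it can be tested; the sample layout in the test is illustrative and the function name is hypothetical. For real alerting, mdadm's own --monitor mode is the usual tool. Note that this only sees drives md has already marked as failed, so it may not catch drives that are merely slow.

```python
import re
from typing import List

# An array's first line, e.g. "md0 : active raid5 sdd1[3] sdb1[1] sda1[0]".
ARRAY_LINE = re.compile(r"^(md\d+)\s*:")
# The status pair on the following line, e.g. "[4/3] [UU_U]".
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> List[str]:
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        name = ARRAY_LINE.match(line)
        if name:
            current = name.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            total, active = int(status.group(1)), int(status.group(2))
            # Fewer active members than expected, or a "_" gap in the map.
            if active < total or "_" in status.group(3):
                degraded.append(current)
            current = None  # only the first status line belongs to this array
    return degraded
```

On a live system you would pass it the contents of /proc/mdstat, for example from a cron job.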

I started working with my new workstation, pulling in video from the archives and editing it down to pieces we could use on the web and sell, but I ran into more problems with it. It simply turned out to be completely unreliable. I'd be working away and poof! The system would log me out and drop me back at the login screen, closing all the tasks I had running, some of which take days to finish.

I tried all manner of changes - BIOS tuning, re-seating the hardware, various incantations and potions suggested by others on the net, all to no avail. I got the system to the point where it would last for days and even weeks at a time - but just when I'd think things had finally sorted themselves out, it would reboot or lock up - sometimes when I was not even working at it.

I'd just about reached the point of throwing in the towel and putting something else together, or even going back to my older (and very stable) machine for the time being, when we had a crashing summer lightning storm about two weeks ago.

The Gods threw lightning bolts all around us, but none were all that close. I got up and watched, counting the seconds between flash and thunder (roughly 5 seconds per mile), and none came within a mile. The problem is that power lines can carry a surge for well over a mile, and it seems one such jolt hit our house, and specifically my NFS server.
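The flash-to-thunder count works because the light arrives essentially instantly while the sound travels at roughly 343 m/s (dry air at about 20 °C), which comes to about 4.7 seconds per mile, hence the usual "5 seconds per mile" rule of thumb. A small Python sketch of the arithmetic, with the speed of sound taken as an assumed constant:

```python
# Assumed constant: the speed of sound varies with temperature and humidity.
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate, dry air at about 20 °C
METRES_PER_MILE = 1609.344


def strike_distance_miles(delay_s: float) -> float:
    """Distance to a lightning strike from the flash-to-thunder delay."""
    return delay_s * SPEED_OF_SOUND_M_PER_S / METRES_PER_MILE


def seconds_per_mile() -> float:
    """How many seconds of delay correspond to one mile."""
    return METRES_PER_MILE / SPEED_OF_SOUND_M_PER_S
```

For instance, strike_distance_miles(10) comes out at a little over 2 miles.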

Now, the NFS server was on a UPS, an older one with a new battery in it. It had been on a newer UPS in the basement for most of its short life, but Shirley's workstation was a lot older and slower than even my old PACDAT. The NFS box had a hyper-threading CPU and 2 GB of RAM (compared to the 2 GHz P4 with 768 MB in her workstation), so I moved it up by her desk and plugged it into the old UPS, vowing to buy a new one the next time I was near a store. Getting her up and running was trivial, since her home directory was already on that machine and the motherboard's built-in video was more than capable of handling e-mail and word processing.

Warning #1 - Older-style UPSes (uninterruptible power supplies) give little, if any, protection from power surges

But this UPS was a relay type: the relay switched the batteries in when the power went out, but otherwise the unit simply passed line current through with no filtering. Online (double-conversion) UPS units run the load (the computer) from the battery side all the time and use line power to keep the battery charged, so there is no switch-over delay if the power goes out, and the charging circuit provides very good line filtering.

The result was that at some point in the night my NFS system took some sort of surge that caused two of the 8 drives in it to stop working correctly. They didn't fail outright, at least not immediately.

The first sign that something was wrong was that reading files was V E R Y slow, a snail's pace compared to normal. The screens took a long time to wake from sleep, then switching from one virtual desktop to another took longer than usual. At first I thought the problem was my workstation, not the file server, and I spent a fruitless hour or so investigating that.

Finally I looked at the NFS server and, using a drive utility (hdparm), discovered that two of the drives, the two on one drive controller, had dropped down to the lowest "UDMA" transfer mode.
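For reference, hdparm -i /dev/sdX prints the transfer modes a drive supports and marks the one in use with an asterisk, e.g. "UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5". Here is a minimal Python sketch that parses that line and flags a drive running below its best listed mode. The sample output is illustrative (the exact format can vary between hdparm versions and drivers) and the function names are hypothetical. A lower mode can also be legitimate, for example when the cable or controller can't go faster, so treat a hit as a hint rather than a diagnosis.

```python
import re
from typing import Optional, Tuple


def udma_status(hdparm_output: str) -> Optional[Tuple[Optional[int], int]]:
    """Return (active UDMA mode, highest listed UDMA mode) parsed from
    `hdparm -i` text, or None if there is no usable "UDMA modes:" line."""
    for line in hdparm_output.splitlines():
        if "UDMA modes:" not in line:
            continue
        listed = [int(n) for n in re.findall(r"udma(\d+)", line)]
        if not listed:
            return None
        active = re.search(r"\*udma(\d+)", line)  # '*' marks the mode in use
        return (int(active.group(1)) if active else None, max(listed))
    return None


def looks_downgraded(hdparm_output: str) -> bool:
    """True if the active UDMA mode is below the best mode the drive lists."""
    status = udma_status(hdparm_output)
    if status is None or status[0] is None:
        return False
    active, best = status
    return active < best
```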

I swapped out the drive controller for a spare, with no improvement.

I started copying files as fast as I could to another system (the workstation, which had lots of space even if it was not "redundant"), but was only a small way into them when the NFS system simply died.

OK - hmmm. Probably the power supply, but I don't have a spare that is large enough, so it's off to the store to pick one up.

The new power supply didn't help. It appears the damage included the motherboard or something on it, so leave that for another day and go back to the store.

On the way there I got to thinking that this might be the opportunity I needed to create a new (and hopefully reliable) workstation to replace the old-new one that was still not reliable. For not too much more than a basic motherboard/cpu/ram setup I'd be able to shuffle the old machines down a notch and get my work back in order.

I should explain that I have 5 monitors on my desk, 4 on the main workstation and 1 on the old one, and that I really wanted all 5 (and maybe eventually more) on one workstation. I've written about my aspirations for my previous "new" workstation (called VIDEO, by the way) in "Trees hate computers... and why I have so many monitors", but it comes down to the fact that I'm far more efficient with lots of screen real estate.

For various reasons I decided I wanted a machine that would use ATI video cards, and since I'd once seen a board with 4 PCI Express (PCI-E) slots on it at the store, I thought I'd want something similar, just in case I decided I wanted more than 6 screens (the cards I'd been looking at can each drive 2 XVGA screens).

An ASUS motherboard with an ATI chipset, an AMD 64-bit quad-core CPU, 4 GB of RAM, 3 ATI video cards and a pair of 1 terabyte drives - hey, it's only money.

Back home, I put the system together and tried to boot from the NFS box's drives. No problem booting: the Linux kernel booted on the AMD 64 even though the previous CPU was 32-bit. But the RAID array would not come up, and nothing I did would help, not even trying to force it to assemble without a "rebuild/resync". There was enough damage on the two drives that the file system was toast.
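Why couldn't the array be rescued? It was a RAID5 (see Warning #2 below), which stores one parity block per stripe: the XOR of the data blocks. XOR lets you recompute any one missing block from the others, but with two members damaged there are two unknowns and only one parity equation. Here is a toy Python sketch of a single stripe. Real RAID5 rotates the parity block across the drives and works on fixed-size chunks, and the names here are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data: List[bytes]) -> List[bytes]:
    """Data blocks followed by their parity block (one stripe, toy layout)."""
    return data + [xor_blocks(data)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Fill in at most one missing block (None) from the surviving blocks."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(
            f"{len(missing)} blocks missing; single parity can rebuild only one")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks (data + parity) equals the missing one.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired
```

The same single-parity limit is why a RAID5 that is already running degraded is so exposed: one more drive problem and there is nothing left to rebuild from.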

OK - install the pair of Seagate terabyte drives and throw out the two bad drives (OK, pack them up and send them in for warranty repair/replacement), then set up the remaining drives as a smaller RAID in the new machine and the two terabyte drives as a RAID1 for my main files. Install Fedora 8: Fedora 9 is out, but when I went looking for drivers for the video cards, none were available for it yet.

Everything came up fine, but with all 5 monitors connected there were video artifacts. Again, nothing I could find on the net fixed the problem. I finally deep-sixed the ATI cards in favour of 3 new Nvidia ones, and all is well. As I write this the system has been up for 4 days and has not even hiccupped. Here's hoping :)

The only really annoying thing is that I've lost about 6 months' worth of "working" files, along with a small number of photos I'd taken. My e-mail and documents are stored on my laptop and backed up nightly whenever it's at home.

Warning #2 - RAID (Redundant Array of Inexpensive Disks) is no substitute for backups

I didn't back up these files; I relied on my RAID5, expecting to eventually (too late, it turned out) put together another machine to handle all the backing up. My current backup server is too small to take my huge files on top of the customer and laptop backups it already holds. In fact, most of the files I really miss are not all that large: they're things like the settings for browsers and various programs, cache data, cookies, password files, note files and such, the things that accumulate in the "hidden" files. Beyond that, I lost some work-in-progress files, but I mostly have the originals, just not the edits. I've now set up a backup process to capture at least these system files.
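Backing up just the hidden files doesn't take much. Here is a minimal Python sketch of one way to do it, an illustration rather than the backup process mentioned above: it copies every dot-file and dot-directory from a home directory into a new timestamped folder. The function name and paths are hypothetical; a real job would more likely use rsync or a dedicated backup tool, and would put the copy on a different machine.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import List, Optional, Tuple


def backup_dotfiles(home: Path, dest_root: Path,
                    stamp: Optional[str] = None) -> Tuple[Path, List]:
    """Copy hidden files/dirs from home into dest_root/<stamp>/.
    Returns the destination and a list of (src, dst, reason) copy errors."""
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / stamp
    dest.mkdir(parents=True)  # raises if this snapshot already exists
    dest_resolved = dest.resolve()
    errors: List = []
    for entry in sorted(Path(home).iterdir()):
        if not entry.name.startswith("."):
            continue  # only the "hidden" settings files and directories
        resolved = entry.resolve()
        if resolved == dest_resolved or resolved in dest_resolved.parents:
            continue  # never copy the backup folder into itself
        target = dest / entry.name
        try:
            if entry.is_symlink():
                shutil.copy2(entry, target, follow_symlinks=False)
            elif entry.is_dir():
                shutil.copytree(entry, target, symlinks=True)
            else:
                shutil.copy2(entry, target)
        except shutil.Error as err:  # copytree reports per-file failures at the end
            errors.extend(err.args[0])
        except OSError as err:  # e.g. sockets or unreadable files
            errors.append((str(entry), str(target), str(err)))
    return dest, errors
```

Returning the error list instead of stopping at the first failure means one odd file (a socket left by a running program, say) doesn't cost you the rest of the backup.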

I also lost my photos. This caused me a bit of anguish, but then a thought hit me: since I had taken only a fairly small number, maybe I could "undelete" them from the flash cards and recover some, if not all. A little utility called Photorec did exactly what I needed.
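Undeleting from a flash card works because deleting a file on the card's FAT file system mostly just marks its directory entry and clusters as free; the image data stays put until something overwrites it. PhotoRec ignores the file system and scans the raw data for known file signatures, a technique called file carving. Below is a deliberately naive Python sketch of the idea for JPEGs, which begin with the bytes FF D8 FF and end with FF D9. It assumes each photo is stored contiguously and stops at the first end marker, so, for example, a photo with an embedded EXIF thumbnail would be cut short. PhotoRec itself is far more thorough.

```python
from typing import List

JPEG_START = b"\xff\xd8\xff"  # SOI marker followed by the next marker's 0xFF
JPEG_END = b"\xff\xd9"        # EOI marker


def carve_jpegs(raw: bytes) -> List[bytes]:
    """Naively carve contiguous JPEGs out of a raw disk image."""
    found = []
    pos = 0
    while True:
        start = raw.find(JPEG_START, pos)
        if start == -1:
            break
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # start marker with no end: truncated or overwritten
        found.append(raw[start:end + len(JPEG_END)])
        pos = end + len(JPEG_END)
    return found
```

To try it on a card, you would first copy the card to an image file (for example with dd) and pass that file's bytes in, writing each carved result out as its own .jpg.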



3 - The PC may be a fairly mature technology overall, but enough is still changing at the hardware level that configuring what you want (or what you believe should be possible) can be frustrating and painful

4 - Open source (Linux) and web search (Google, etc.) combine to make even really esoteric systems doable

Note: work in progress - check back for additions - richard

Resources:

  • http://forums.fedoraforum.org/showthread.php?t=155503&page=1&pp=10 - downgrading the Fedora 9 X11 server to the Fedora 8 version and installing fglrx and tools
  • http://www.cgsecurity.org/ - Photorec and TestDisk
