Want more geeky details on what happened at Microsoft/Danger? The short of it is that the SAN took a nose-dive and, in the process, took out the parity drives that could have been used to rebuild the data. A total of 800TB of data was lost. There was an off-site tape backup, a reasonable backup measure, but with the way the SAN died the entire RAID array needed rebuilding, and 800TB is a lot of data to restore. There are more details on the server moves in the info below.
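For a sense of scale, here's a back-of-envelope on the tape restore. The drive type and throughput are our assumptions for illustration (LTO-4 was the current generation in 2009, rated around 120 MB/s native), not details from the source, and real restores carry seek/mount overhead on top:

```python
# Rough restore-time math for 800TB off tape.
# Assumed figures (NOT from the source): LTO-4 drives at ~120 MB/s
# native throughput each, streaming continuously with zero overhead.
TOTAL_BYTES = 800 * 10**12   # 800 TB
DRIVE_RATE = 120 * 10**6     # bytes/sec per assumed LTO-4 drive

def restore_days(num_drives: int) -> float:
    """Days to stream TOTAL_BYTES using num_drives tape drives in parallel."""
    seconds = TOTAL_BYTES / (DRIVE_RATE * num_drives)
    return seconds / 86400

print(f"1 drive:   {restore_days(1):.0f} days")   # ~77 days
print(f"16 drives: {restore_days(16):.1f} days")  # ~4.8 days
```

Even under those generous assumptions, a single drive would take months; hitting a "Wednesday at the earliest" target implies a lot of drives running in parallel around the clock.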
The following is reportedly from “someone close to the action”:
“Here’s the actual scoop, from someone involved in the recovery:
Danger, purchased by Microsoft, was moved into a Verizon Business datacenter in Kent, WA a short while ago. While this had to do with the MS assimilation, it was done as a one for one move from Danger to a DC that MS uses heavily. (MS didn’t re-write, port, migrate to winblows, etc.) The backend service uses a variety of hardware, load balancers, firewalls, web and application servers, and an EMC SAN (Storage Area Network, think huge drive array connected with fiber.)
Well last Tuesday, the EMC SAN took a dump on itself. What I mean by that is the backplane let the magic blue smoke out. Usually, in the heavy iron class of datacenter products like an EMC SAN, this means you fail over to the redundant backplane and life continues on. Not this time folks. In the process of dying, it took out the parity drives. What does that mean? It means the fancy RAID lost its ability to actually be a RAID. How much data got eaten by this mega-oops? 800TB. Why wasn’t it backed up? It was, to offsite tape, like it’s supposed to be. But when the array is toast, you can’t just start copying shit back.
Apparently EMC has been on site since Tuesday, but didn’t actually inform Danger/MS that their data is in the crapper until Friday afternoon. On top of that, EMC has done nothing to bring in replacement equipment between Tuesday and Friday. (In the Enterprise support world, that’s fucking retarded, multi-million dollar support contracts are that expensive for a reason.)
So what’s being done? Well the good news is that the complex was slated to be migrated into the Verizon Business cloud services (not MS’s cloud per se, but it’s MS’s effort), and as part of that migration a newer, shinier SAN array was in the process of being implemented. But space isn’t ready for it on the datacenter floor, and you can’t just toss the EMC array and drop this one in its place; it’s from a different vendor and takes 2 racks instead of one. This means it’s being shoehorned into a different part of the datacenter than was originally planned, one that doesn’t have the necessary 3 phase power installed. So there’s a bit of work to be done. Not to mention the restoral of 800TB of backup data from offsite tape.
Time to restoral? Looking like Wednesday at the earliest with techs working all weekend.”
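For readers wondering why losing the parity drives is so fatal: RAID 4/5-style arrays XOR data blocks together to produce a parity block, which can regenerate any one missing block in a stripe, but not two. A minimal sketch of the idea (a generic illustration, not EMC's actual on-disk layout):

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together byte by byte (RAID-style parity)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Three data "drives" plus one parity drive, as in a RAID 4/5 stripe.
d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d1, d2, d3)

# Lose any one data drive: XOR the survivors with parity to rebuild it.
rebuilt = xor_blocks(d2, d3, parity)
assert rebuilt == d1

# Lose a data drive AND the parity drive, and there is nothing left to
# XOR against -- the stripe is gone. That is the failure mode described
# above: the dying backplane took the parity drives down with it.
```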
Sounds like they know what they’re talking about, but since we haven’t been able to confirm this directly ourselves, we’re keeping it labeled as a rumor.
UPDATE (2009-10-15 01:02 PST): We’ve confirmed that Danger does indeed have servers in a Verizon Business Data Center, however it appears to be one in California, NOT Kent, WA. If you want to confirm, do a traceroute on one of Danger’s web proxies and you’ll find it ends up at danger-gw.customer.alter.net (188.8.131.52), an IP owned by Verizon (MCI) that appears to be around the San Jose/Santa Clara area. It’s possible (although unlikely) that the web proxy servers are kept separate from the user data servers though.
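If you'd rather script that check than eyeball it, here's a small sketch that pulls the last responding hop out of traceroute output. The transcript below is an abbreviated illustration with made-up intermediate hops, not a live capture:

```python
import re

def last_hop(traceroute_output: str) -> str:
    """Return the hostname of the final responding hop in traceroute output."""
    hops = []
    for line in traceroute_output.splitlines():
        # Typical hop line: " 3  host.example.net (198.51.100.1)  31.4 ms ..."
        m = re.match(r"\s*\d+\s+(\S+)\s+\(([\d.]+)\)", line)
        if m:
            hops.append(m.group(1))
    return hops[-1] if hops else ""

# Illustrative transcript (hops 1-2 are placeholders, not real routers):
sample = """traceroute to danger.com (188.8.131.52), 30 hops max
 1  gateway (192.168.1.1)  1.2 ms
 2  core1.example.net (203.0.113.5)  9.8 ms
 3  danger-gw.customer.alter.net (188.8.131.52)  31.4 ms
"""
print(last_hop(sample))  # → danger-gw.customer.alter.net
```

Run `traceroute` against one of Danger's web proxies yourself and feed the output in; if the final hop resolves under alter.net, you're terminating on Verizon (MCI) infrastructure.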