A nearly 20-hour crash of the FAA’s NOTAM database last week occurred because of a drive failure that took place “in the middle of updating the information on the hard drive,” which in turn “screwed up the database,” Barry Davis, manager of aeronautical information management for the FAA, told ComputerWorld.com. According to the FAA, the box in question was a Sun Microsystems Inc. server that was nearing the end of its life expectancy. Its failure put controllers to work disseminating NOTAM information to pilots.

Davis’ team already had replacement equipment on hand but had not yet installed it. Because of that, the hardware recovery portion of the fix “was quite simple, we just put the boxes in,” said Davis. Unfortunately, when they did that, they moved a data error over to the backup system, corrupting it and causing the system to run slowly, with performance that appeared to be deteriorating. In the end, the latest information had to be pulled from the corrupted database, re-imported into the new database and resynchronized with all the subsystems. Davis’ team then put the system back online and stuck around into the evening to make sure there were no more surprises.
The NOTAM Database Crash: What Happened
Key Takeaways:
- A nearly 20-hour FAA NOTAM database outage was caused by a drive failure on an aging server during a data update, which corrupted the database.
- Although replacement hardware was available, the initial recovery attempt failed because the corrupted data was transferred to the backup system, causing further complications and slow performance.
- The system was finally restored by extracting, re-importing, and resynchronizing the latest information from the corrupted database onto new equipment.