Seemingly out of nowhere, the world came to an end for a full day until power was restored.
What was the reason?
I don’t know the particulars, but it is usually the same story. An update to one piece of software was incompatible with another. A line of code got through that didn’t work as planned. Some third-party software stopped communicating with the main server. A cascade of effects followed that messed up everyone’s lives.
This is becoming more and more common. Two days earlier, all the servers hosted by Amazon went down. This wrecked countless social sites, financial services, airline reservations, and banks. Whole industries ground to a halt, and for exactly the same reasons.
Everything about the digital age works great until it doesn’t. Neither you nor I can fix it when it does break. We likely don’t know anyone who can. We are all vulnerable. Our hands are tied. We can only sit and wait for the administrators of the services to kick the machines, tug on the wires, revert the code, reboot this and that, and otherwise somehow find out what’s wrong.
I was a server administrator in the early days, so I have had my share of outages. You are staring at black screens, and the mind begins to wander. The problem could trace to one of dozens of possible causes. Or maybe hundreds or thousands. Or millions.
The only way to get through it is to be very calm and rational. It’s a process of elimination and that requires disciplined logic. And quiet.
Meanwhile, everyone around you is screaming for a fix. The boss is freaking out. Emails and phone calls are flying everywhere. Online communications are blowing up. It’s like the world is ending. And yet you have to be the clear-headed one, even as the heat is on. When you find out the real issue, it seems perfectly obvious. Then people wonder what took you so long.
Here is the real problem we face. The whole of civilization embraced digital technology as the new shiny object. It seemed like the thing to do. Mostly it worked. But there has been a tremendous neglect of basic principles of engineering such as creating redundancy. When things go wrong, you need a backup plan. In principle, code architects know this. In practice, creating redundancy is the most neglected feature of the digital age, because doing so does not pay the bills.
The key problem is that the people most affected by breakage are powerless to do anything about it. We are completely at the mercy of the digital masters. And with artificial intelligence (AI) it is even worse. The digital masters are themselves machines that absolutely no one fully understands. When AI starts to break, we are going to rue the day that everyone and his dog threw themselves into this without so much as a thought.
1 comment:
Tonight we had a storm and lost power for 5 hours. Except for having to make coffee on the gas stove instead of in the kettle, it mattered little. After a week or so, washing may become a problem. It is a trust issue: do you trust tech and government to be there when things go bad?