
Views from the Prairie

October 12

Designing Around Failure

Most of our efforts in life are about striving for perfection. We want to make perfect products, perfect corporations, and perfect computer software. None of that is possible. A growing number of people suggest that we may do better by designing for failure.

One place where failure is almost guaranteed is the design of computer systems for financial services. Services such as credit card processing are so complex that there will always be bugs in the software. Thus, there will always be ways for the "bad guys" to get into the system and steal money. And we are seeing the results: year after year, a major processor discovers that it has been "hacked" and has lost either money or account information.

So far, most efforts have focused on finding the holes and patching them. We have the PCI standards, and major efforts are being made to get all merchants to adhere to them. However, every year a new way to break into the processing systems is found, and the "standard" is changed again. The result is frustrated merchants: a merchant spends heavily to get PCI certified and is broken into anyway.

A recent Transaction Trends article estimated that our current security is less than 70% effective. Getting the level of security we want would require spending many times what we spend today.

Security technicians often point to stronger user authentication, through better passwords or other harder-to-defeat methods. However, in most of the big breaches, the bad guys bypassed those locks entirely and reached the data by other routes.

The alternative is to plan around failure. Successful human systems (such as democracy) are all designed around failure. Part of the Internet's success is that it was designed to accept and cope with failure.

Steven Bellovin, the Federal Trade Commission's new chief technologist and a Columbia University professor, says that we need to move to an architecture designed around the realization that security breaches will happen. If we plan for our computer systems to be broken into, we will change how data is stored, designing systems so that when a break-in does happen, the thieves don't get everything they are looking for.
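One way to picture this kind of design is tokenization, where the systems most exposed to attack never hold full card numbers. The sketch below is a minimal illustration under stated assumptions, not Bellovin's proposal or any product's API: `TokenVault` and `MerchantDatabase` are hypothetical names, and the vault is assumed to live on a separate, hardened system. A thief who breaks into the merchant side finds only random tokens and last-four digits.

```python
import secrets


class TokenVault:
    """Hypothetical vault, assumed to run on a separate, hardened system."""

    def __init__(self) -> None:
        # Maps random tokens back to the real card numbers.
        self._store: dict[str, str] = {}

    def tokenize(self, card_number: str) -> str:
        # A random token carries no information about the card number itself.
        token = secrets.token_hex(16)
        self._store[token] = card_number
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can turn a token back into a card number.
        return self._store[token]


class MerchantDatabase:
    """Hypothetical exposed side: keeps tokens and last four digits, never full numbers."""

    def __init__(self) -> None:
        self.orders: list[dict[str, str]] = []

    def record_order(self, order_id: str, token: str, last_four: str) -> None:
        self.orders.append({"order_id": order_id, "token": token, "last_four": last_four})


vault = TokenVault()
merchant_db = MerchantDatabase()

card = "4111111111111111"  # a widely used test card number, not a real account
token = vault.tokenize(card)
merchant_db.record_order("A-1001", token, card[-4:])

# If merchant_db is breached, the thief gets tokens, not card numbers.
assert all(card not in order.values() for order in merchant_db.orders)
```

The design choice is to accept that the merchant database may be breached and to make that breach far less valuable; the vault becomes the one place that needs the heaviest protection.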

There are a few examples of designing around failure. When Amazon had a massive outage, Netflix and SmugMug stayed up even though both relied heavily on Amazon; both services had planned for failure. Netflix even runs a process on its system that kills computer services at random times, just to verify that the system still functions without them. Netflix is not perfect, though; some failures have taken it down for a while.
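The random-kill idea (Netflix's tool for this is known as Chaos Monkey) can be sketched as a small simulation. This is a toy model under stated assumptions, not Netflix's code: the `Cluster` class, the service names, and the replica counts are all hypothetical. Each round kills one randomly chosen replica and then checks whether every service still has at least one left.

```python
import random


class Cluster:
    """Toy model: each service runs several replicas, any one of which can serve requests."""

    def __init__(self, services: dict[str, int]) -> None:
        # Number of healthy replicas per (hypothetical) service name.
        self.healthy = dict(services)

    def kill_random_replica(self, rng: random.Random) -> str:
        # Pick a service that still has a replica and terminate one of them.
        candidates = [name for name, count in self.healthy.items() if count > 0]
        victim = rng.choice(candidates)
        self.healthy[victim] -= 1
        return victim

    def is_functioning(self) -> bool:
        # The system works only if every service has at least one healthy replica.
        return all(count > 0 for count in self.healthy.values())


def chaos_round(cluster: Cluster, rng: random.Random) -> tuple[str, bool]:
    """Kill one random replica, then report whether the system still works."""
    victim = cluster.kill_random_replica(rng)
    return victim, cluster.is_functioning()


rng = random.Random(42)  # fixed seed so a test run can be reproduced
cluster = Cluster({"login": 3, "catalog": 3, "playback": 3})

for _ in range(2):
    victim, ok = chaos_round(cluster, rng)
    print(f"killed a {victim} replica; system functioning: {ok}")
```

With three replicas per service, two random kills can never take this system down. A service with a single replica would fail on its first hit, which is exactly the kind of weakness such tests are meant to expose before a real outage does.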

Still, planning for failure and testing for it produces more robust systems.

Energy Costs

Every time energy costs become volatile, Americans change their energy usage. We did it after the 1970s energy crisis. We did it in the 1860s, when we were running out of whale oil and southern pine turpentine. Each time, numerous entrepreneurs created new industries to supply the energy needed and to redesign services to use less of it.

Today, one rising cost is the energy consumed by computer networks. Current estimates put the Internet's consumption at 1% of the country's total energy usage. The energy cost of running a computer is often as high as its hardware cost, which is significant. The large data centers run by Google, Amazon, and others are working to measure what their energy needs really are so they can cut the cost of providing their services.

Part of the problem is that computers are rarely in use all the time. Many are used only occasionally, yet they are always on and using energy. This is especially true at smaller companies. There is even a term for Windows computers that never sleep: "Windows Insomnia". The more computers a company has, the more it can benefit from managing how well those computers "sleep".
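The payoff from better sleep management is simple arithmetic that scales with the number of machines. The sketch below compares computers that never sleep with computers that sleep whenever they are idle. Every input (wattage, hours of use, electricity price, fleet size) is a hypothetical placeholder to be replaced with a company's own figures, not a measured value.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def annual_energy_cost(watts_on: float, watts_sleep: float,
                       hours_in_use_per_day: float, price_per_kwh: float) -> float:
    """Yearly electricity cost of one computer that sleeps whenever it is not in use."""
    hours_on = hours_in_use_per_day * DAYS_PER_YEAR
    hours_asleep = (HOURS_PER_DAY - hours_in_use_per_day) * DAYS_PER_YEAR
    kwh = (watts_on * hours_on + watts_sleep * hours_asleep) / 1000  # watt-hours to kWh
    return kwh * price_per_kwh


def fleet_savings(computers: int, watts_on: float, watts_sleep: float,
                  hours_in_use_per_day: float, price_per_kwh: float) -> float:
    """Yearly savings from letting a fleet sleep when idle instead of never sleeping."""
    never_sleeps = annual_energy_cost(watts_on, watts_sleep, HOURS_PER_DAY, price_per_kwh)
    sleeps_when_idle = annual_energy_cost(watts_on, watts_sleep, hours_in_use_per_day,
                                          price_per_kwh)
    return computers * (never_sleeps - sleeps_when_idle)


# Hypothetical placeholder inputs: 50 PCs drawing 100 W when on and 5 W asleep,
# used 8 hours a day, with electricity at $0.10 per kWh.
savings = fleet_savings(computers=50, watts_on=100, watts_sleep=5,
                        hours_in_use_per_day=8, price_per_kwh=0.10)
print(f"Estimated yearly savings: ${savings:,.2f}")
```

Because the savings are computed per machine, they grow in direct proportion to the size of the fleet, which is why larger companies stand to gain the most.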

In datacenters, significant energy goes to cooling. Newer designs manage heat far better, so some datacenters in climates cooler than Texas's do not even need air conditioning. It turns out that modern computers rarely need the deep cooling that the huge mainframes of the past required; many servers operate well at 80°F instead of 65°F.

There are many ways to save money on computer systems, and energy cost is an often-overlooked one.

Risky World

Web pages carry more and more ads, and as a result page load times keep getting longer. Video ads are the worst. People are increasingly unwilling to wait for a page to load fully. Trying too hard to make money from a page is reaching its limit.


This newsletter is posted here as well as sent via mail and email. If you wish to receive updates, please sign up above.
