Engineering for the Threat of Natural Disasters
March 1, 2007 Volume 37 Issue 1

Building a Resilient Organization

Author: Yossi Sheffi

Governments must understand the resiliency and risk management strategies of private-sector enterprises.

The 9/11 attack, the SARS epidemic, Hurricane Katrina, and scores of other disruptions have made companies more aware of the need for active risk management. Governments in the West have also realized that more than 85 percent of the infrastructure in Western countries is owned and/or operated by the private sector. At the very least, this means that governments need to understand the resiliency and risk management strategies of private-sector enterprises. This article, based on ideas described in my recent book, The Resilient Enterprise: Overcoming Vulnerability for Competitive Advantage (MIT Press, 2005), outlines some of the ways companies prepare for inevitable large disruptions. Most of the lessons are also relevant to government, nonprofit, and other types of organizations.

A Fire in the Desert
On Friday night, March 17, 2000, in Albuquerque, New Mexico, a bolt of lightning struck a factory of Philips NV, the Dutch electronics conglomerate, causing the furnace in Fabricator No. 22 to catch fire. At the time, this did not seem to be a major event. The automatic sprinkler systems were activated, and Philips-trained staffers put out the fire in less than 10 minutes. Thus, by the time firefighters from Albuquerque arrived, they had nothing to do but verify that the plant was safe. What the firefighters did not realize was that the location of the fire had been one of the cleanest places on earth, a semiconductor fabrication plant, or “fab,” for making special chips for cell phones. The fire had damaged two of the four clean rooms.

Philips immediately notified the plant’s two largest customers, Ericsson LM and Nokia Corporation, both of which were in the process of launching a new generation of cell phones based on chips produced in the Albuquerque plant. In its original message, Philips estimated a one-week delay in the supply of chips. Nokia was not unduly perturbed by the news but, just to be sure, placed the affected chip on a “special watch” that called for daily discussions between Nokia and Philips engineers. Nokia discovered very quickly, however, that it would take several months to bring the Albuquerque plant back to full production, a delay that threatened to make the company miss the launch of its new-generation cell phones. At that point, Nokia sprang into action on two fronts. First, it pressed Philips to find alternative supply from its other fabs around the world, even though this would mean outsourcing some of Philips’ own production. Second, Nokia looked for alternative suppliers around the world and paid them extra for quick setup and testing and for expedited production.

Ericsson, Philips’ other major customer, basically ignored Philips’ original message, reasoning that one-week delays in supply chains are routine and that it had enough stock to cover the gap. In sharp contrast to Nokia, however, Ericsson did not detect the problem fast enough, and by the time the magnitude of the shortage became apparent, the worldwide supply of the special chips had been sewn up by Nokia. At the end of 2000, Ericsson announced a staggering 16.2 billion kronor (U.S. $2.34 billion) loss in its mobile phone division, which the company blamed on a slew of component shortages (Latour, 2001). About a year after the fire, Ericsson retreated from the handset production market. In April 2001, Ericsson signed a deal with Sony to create a joint venture to design, manufacture, and market handsets. The new company, Sony-Ericsson, would be owned 50–50 by the two companies.1 Thus one of Nokia’s major competitors effectively exited handset manufacturing. Within six months of the fire, Nokia’s year-over-year share of the handset market increased from 27 to 30 percent.

Although both Ericsson and Nokia had been hit by the same disruption, one recovered while the other had to give up significant parts of its business. How could this happen? Why did the same disruption cause one large, sophisticated company to exit the market and the other to increase its market share?

This example illustrates many lessons of resiliency. Most important, risk management is not a company issue that can be handled “within the four walls.” It is a supply chain management issue. Every company is a “citizen” of its supply chain: it depends on networks of suppliers, logistics providers, brokers, port operators, and many other facilitators to get parts to plants and distribute products to customers. A serious business interruption can happen not only when one of the company’s own facilities, distribution channels, or workforce is disrupted, but also when an element in its supply chain, its ecosystem, is disrupted. Repairing the damage to the Philips plant cost about $40 million, mostly covered by insurance. The damage to Ericsson, the loss of its handset manufacturing business, was orders of magnitude larger.

The Nature of Risk
Most corporations today approach risk management in one of two ways: (1) based on models and numbers or (2) based on subjective beliefs about the future. If there is reason to believe that the patterns of the past will be repeated, a company can use data, probability distributions, and models to forecast future patterns. These forecasts can then be used as a basis for strategies to address expected variations in future outcomes. However, other outcomes, known as high-impact, low-probability (HILP) events, are difficult to forecast because they are outside the company’s past experience. For the same reason, HILP events can have a significant impact on an enterprise.

Most companies first classify possible risks along two axes (Figure 1—see PDF version for figures). One axis denotes the likelihood of a particular disruption; the other axis denotes the impact (or severity) of this disruption once it hits. Figure 1 shows a simple example of this kind of risk classification. The space in which threats are placed can then be divided into four quadrants, as depicted in Figure 2. Rare, insignificant events are placed in the lower left-hand quadrant and are not of concern. Events with high probability and light consequences are also of little concern because data, statistical distributions, and models provide ample warnings and tools to address them. These so-called “firefighting” events that operations managers deal with all the time are placed in the upper left quadrant.

Even high-probability, high-impact events (upper right quadrant) are not of particular concern, because special groups in each company have processes in place for dealing with them. For example, BP suffers substantial losses every time a hurricane moves through the Gulf of Mexico. Deep-water platforms have to be battened down and evacuated, and platforms are often damaged and have to be repaired at very high cost. However, because hurricanes in the Gulf of Mexico are an annual phenomenon, BP has a well-developed process for dealing with them.

HILP events, however, such as the sinking of the Titanic, shown in the lower right quadrant, are qualitatively different. These are events that companies, or governments, have not imagined and are not prepared for and that, therefore, can have devastating consequences. Examples include the 1984 disaster in the Union Carbide plant in Bhopal, India, the 1986 Chernobyl nuclear accident, the 2003 SARS outbreak, the 9/11 terrorist attack, and Hurricane Katrina in 2005.
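The four-quadrant classification can be sketched in a few lines of code. This is only an illustration: the 0-to-1 likelihood and impact scores, the 0.5 cut-offs, and the example events and their scores are hypothetical assumptions, not values from the article or its figures.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: float  # hypothetical 0-1 score: 0 = rare, 1 = frequent
    impact: float      # hypothetical 0-1 score: 0 = light, 1 = severe


def quadrant(threat: Threat, likelihood_cut: float = 0.5, impact_cut: float = 0.5) -> str:
    """Place a threat in one of the four quadrants of the likelihood/impact grid.

    The cut-off values are illustrative assumptions; each organization would set its own.
    """
    frequent = threat.likelihood >= likelihood_cut
    severe = threat.impact >= impact_cut
    if frequent and severe:
        return "high-probability, high-impact"  # upper right: established processes exist
    if frequent:
        return "firefighting"                   # upper left: routine operational issues
    if severe:
        return "HILP"                           # lower right: the qualitatively different case
    return "not of concern"                     # lower left: rare and insignificant


# Hypothetical scores, for illustration only.
threats = [
    Threat("late inbound truck", likelihood=0.95, impact=0.05),
    Threat("Gulf hurricane for an offshore oil producer", likelihood=0.9, impact=0.7),
    Threat("fire at a sole supplier's fab", likelihood=0.02, impact=0.9),
    Threat("brief office power flicker", likelihood=0.1, impact=0.02),
]

for t in threats:
    print(f"{t.name}: {quadrant(t)}")
```

In practice the hard part is not the classification logic but agreeing on the scores, and HILP events are precisely the ones whose likelihood is hardest to estimate.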
Note that the expected damage (i.e., the product of probability and consequences) is not a good measure of risk! Frequent small disruptions have little in common with rare, high-impact disruptions, even though their expected values may be similar. The former are dealt with by operations managers in the course of their jobs, while the latter can devastate an enterprise. And disruptions with the highest expected value (high-probability, high-impact events) should not be the focus of most attention from risk managers because organizations are most likely ready for them and have processes in place to detect them and deal with their consequences.
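A small worked example shows why expected damage hides the difference between the two kinds of disruption. The probabilities and loss amounts below are hypothetical, chosen only so that the two expected values come out equal.

```python
def expected_annual_loss(annual_probability: float, loss_if_it_hits: float) -> float:
    """Expected damage = probability x consequence (the measure the text warns against)."""
    return annual_probability * loss_if_it_hits


# Hypothetical disruptions, chosen only so that their expected values match.
frequent_small = (0.5, 2_000_000)      # 50% chance per year of a $2 million loss
rare_large = (0.001, 1_000_000_000)    # 0.1% chance per year of a $1 billion loss

ev_small = expected_annual_loss(*frequent_small)
ev_large = expected_annual_loss(*rare_large)

print(f"expected loss, frequent/small: ${ev_small:,.0f} per year")
print(f"expected loss, rare/large:     ${ev_large:,.0f} per year")
# Same expected value, yet a single occurrence of the rare event is 500 times larger.
print(f"ratio of single-event losses: {rare_large[1] / frequent_small[1]:.0f}x")
```

Both lines report $1,000,000 per year, but the first loss is absorbed by operations managers while the second could exceed what an enterprise can survive.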

Government Responses to High-Impact, Low-Probability Events
In many cases, HILP events cause fear and confusion, and governments may feel compelled to act quickly, even before all of the necessary information is available. Unfortunately, hasty government actions often exacerbate problems.

After the September 11 terrorist attacks, for example, the U.S. government closed U.S. borders and shut down all flights in and out of the country. These measures had immediate impacts on many supply lines. Ford Motor Company had to idle several assembly lines intermittently because trucks loaded with components coming in from Canada and Mexico were delayed. As a result, Ford’s fourth-quarter output in 2001 was down 13 percent compared to its production plan (Andel, 2002). In response to the 2001 outbreak of foot and mouth disease, the British government not only slaughtered 6.5 million cows, pigs, and sheep, but also closed the countryside to tourists. Damage to the tourism industry turned out to be significantly greater than damage to the agricultural industry—and it was clearly caused by the government’s actions.

Thus enterprise managers must consider the consequences of possible government actions as part of any disruption scenario. For example, if a container explodes in a U.S. port, the government is likely to close all ports, thus causing significant economic damage.

The High Likelihood of Low-Probability Events
On Thursday, May 8, 2003, a powerful tornado hit the General Motors (GM) assembly plant in Oklahoma City, causing extensive damage and a second-quarter charge of $140 million to $200 million. Although the probability that a specific disruption will hit a given element in a company’s supply chain during a particular week is negligible, the probability that a major disruption of some type will take place somewhere in GM’s vast supply chain sometime during a given year is significant. Thus an enterprise like GM must be resilient.
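The arithmetic behind this point can be sketched as the probability of at least one disruption over many independent trials, 1 − (1 − p)^(nodes × weeks). The per-node weekly probability and the number of nodes are hypothetical, and the independence assumption is a simplification, since real disruptions are neither independent nor equally likely everywhere.

```python
def prob_at_least_one(p_per_node_week: float, nodes: int, weeks: int = 52) -> float:
    """Chance that at least one disruption hits somewhere in the network during the period.

    Assumes every node-week is an independent trial with the same small probability.
    """
    trials = nodes * weeks
    return 1.0 - (1.0 - p_per_node_week) ** trials


# Hypothetical inputs: a 1-in-100,000 weekly chance per node, 2,000 supply chain nodes.
p_single = 1e-5
p_network_year = prob_at_least_one(p_single, nodes=2_000)

print(f"one node, one week: {p_single:.5%}")
print(f"whole network, one year: {p_network_year:.0%}")  # roughly 65%
```

A probability that is negligible for any one facility in any one week thus becomes a better-than-even chance once it is summed over a large network and a full year.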

Resilience is, literally, the ability of a material to return to its former shape after a deformation. Similarly, a resilient enterprise is an organization that can “bounce back” to its pre-disruption level of manufacturing, service to customers, or any other relevant performance metric. Enterprises can build in resiliency in two ways: through redundancy and through flexibility. Regardless of the general strategy, however, early detection of a disruption and the right corporate culture are major determinants of resilience.

Early Detection
Among many counterterrorism professionals, the real nightmare scenario is not a nuclear explosion or a “dirty” bomb, but an attack that goes unrecognized until it is too late. For example, the first symptoms of exposure to a lethal biological agent may not be evident for weeks, after which the disease might spread very quickly. Therefore, the U.S. Centers for Disease Control and Prevention monitors, on a daily basis, geographical clusters of respiratory infections and of small rashes accompanied by fever, symptoms that may signal a bioterror attack in progress (Gerberding, 2004).
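As a generic illustration of turning a daily stream of reports into an early warning, the sketch below flags days whose count exceeds the trailing-window average by k standard deviations. This is not the CDC’s actual method; the window length, the threshold k, the floor on the spread, and the daily counts for a single hypothetical region are all assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(daily_counts: list[int], window: int = 7, k: float = 3.0) -> list[int]:
    """Return indices of days whose count exceeds the trailing mean by k standard deviations.

    A floor of 1.0 on the spread keeps very quiet baselines from flagging trivial blips.
    """
    flagged = []
    for day in range(window, len(daily_counts)):
        history = daily_counts[day - window:day]
        baseline = mean(history)
        spread = max(stdev(history), 1.0)
        if daily_counts[day] > baseline + k * spread:
            flagged.append(day)
    return flagged


# Hypothetical daily counts of fever-with-rash reports in one region.
counts = [12, 10, 11, 13, 12, 11, 10, 12, 11, 25, 31]
print(flag_anomalies(counts))
```

The two jumps at the end of the series are flagged, while ordinary day-to-day variation is not; as the next paragraphs note, a flag is only useful if the organization can process and act on it.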

At the time of this writing, the World Health Organization and local health authorities are spending significant resources to detect the onset of avian flu. If the virus mutates so that it can be transmitted among humans, it could be even deadlier than the 1918 Spanish flu, which reportedly killed 30 to 60 million people worldwide. The best defense against pandemics is early detection and quarantine until antiviral drugs and vaccines against the active flu strain can be developed.

In many cases, early detection of a disruption means not only that an organization receives warning signs, but also that it can process, understand, and act on those signals. A clear failure of organizational response took place during Hurricane Katrina. The city of New Orleans started the evacuation too late, the state of Louisiana called in Pentagon resources too late, and the Federal Emergency Management Agency provided a meager response at best. And this was a disaster the country was warned about days in advance.

Redundancy in Supply Chains
An enterprise can be resilient if it creates redundancies throughout its supply chain: low capacity utilization, extra inventory, multiple suppliers for the same part, and so on. For example, the U.S. Postal Service (USPS) withstood the 2001 anthrax attacks very well, even though several major facilities had to be shut down. Over the preceding two decades, as fax, e-mail, and online payments reduced the volume of mail, USPS had not reduced its capacity at the same pace. Thus it had built-in redundancy that proved useful in that situation. However, very few commercial enterprises can afford redundant capacity that can be activated in case of a disruption.

The most common type of redundancy is a safety stock of parts and finished products, which most companies maintain to protect themselves against “normal,” day-to-day fluctuations in the global flow of commerce. However, maintaining safety stock as protection against HILP events is very expensive because a lot of inventory would have to be kept for a long time. Keeping a large inventory is expensive for two reasons: (1) it has to be maintained, financed, warehoused, and attended to, even as the value of the product may decrease while it sits in inventory; and (2) excess inventory leads to sloppy operations: if there is a defective part in the manufacturing process or a defective product ready to ship, it is easy for managers to “take one from the inventory pile” rather than immediately take the time to investigate and correct the problem. As the Toyota Production System has shown, fixing problems at the source is an essential part of a superior business model.
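The first of these costs can be made concrete with a simple carrying-cost calculation: inventory value (daily usage × days of cover × unit cost) times an annual carrying rate. The usage, unit cost, 25 percent carrying rate, and 5-day versus 90-day buffers below are hypothetical; the 90-day case stands in for a months-long outage like the one at the Albuquerque fab.

```python
def annual_carrying_cost(daily_usage: float, days_of_cover: float,
                         unit_cost: float, carrying_rate: float) -> float:
    """Yearly cost of holding enough stock to cover `days_of_cover` days of usage.

    carrying_rate is an assumed annual cost of holding inventory (financing,
    warehousing, handling, obsolescence) as a fraction of its value.
    """
    inventory_value = daily_usage * days_of_cover * unit_cost
    return inventory_value * carrying_rate


# Hypothetical part: 1,000 units/day at $50 each, 25% annual carrying rate.
routine = annual_carrying_cost(1_000, days_of_cover=5, unit_cost=50, carrying_rate=0.25)
hilp = annual_carrying_cost(1_000, days_of_cover=90, unit_cost=50, carrying_rate=0.25)

print(f"5-day buffer for routine fluctuations: ${routine:,.0f} per year")
print(f"90-day buffer for a months-long outage: ${hilp:,.0f} per year")
```

The larger buffer costs 18 times as much, and that cost recurs every year whether or not the rare disruption ever happens. The sketch also ignores the second, harder-to-quantify cost: the operational sloppiness that large inventories encourage.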

In the last two decades, leading manufacturers, distributors, and retailers worldwide have made tremendous strides in developing “lean,” tightly coupled, efficient supply chains that can react to disruptions quickly based on advanced information technology applications, electronic data interchange standards, and finely honed processes. Nevertheless, the lack of built-in redundancy makes even these supply chains vulnerable to disruptions. At the same time, no company can afford significant redundancies, which are likely to reduce competitiveness in the marketplace.

If companies cannot afford to “fatten” their operations with redundancies, they must build in flexibility. Unlike redundancy, increasing supply-chain flexibility can help a company not only withstand significant disruptions but also respond to demand fluctuations, thus increasing its competitiveness. The notion of flexibility is based on interchangeability—the ability to interchange elements in a supply network quickly.

Standardized Facilities and Processes
Intel plants around the world are identical, following the company’s Copy Exact! philosophy. When the SARS epidemic hit Southeast Asia, Intel was able to move production from its Indonesian plant to other plants with relative ease. Similarly, when a severe ice storm shut down the main UPS air hub in Louisville, Kentucky, in 1986, making it impossible for workers to reach the facility, the company kept operating by flying in workers from other parts of its system. Because UPS uses standard terminal designs and processes throughout its vast system, the new workers were able to keep the Louisville hub operating.

Interchangeable Parts and Products
If the same parts are used in different products, the inventory of these parts is less susceptible to changes in the demand for those products. For example, if a product with a given part cannot be manufactured because of an unrelated problem, the part can still be used in other products and does not have to be discarded or held in stock for a long time. Following this logic, Intel has reduced a mix of 2,000 different types of resistors, capacitors, and diodes to only 35 types (Anderson, 2004). For the same reason, Southwest Airlines uses only one type of airplane—the Boeing 737. Airlines are always subject to disruptions from bad weather, crew shortages, airport congestion, and so on. However, because every Southwest crew can fly every company aircraft, Southwest has the flexibility to respond to disruptions faster than other airlines.
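A standard risk-pooling calculation, which is not taken from the article, shows why common parts need less protective stock. Assuming independent product demands (so that variances, not standard deviations, add), one shared part needs safety stock proportional to the square root of the summed variances, rather than the sum of each product’s own buffer. The service factor z and the demand variability figures are hypothetical.

```python
import math


def safety_stock_separate(z: float, sigmas: list[float]) -> float:
    """Total safety stock when each product uses its own dedicated part."""
    return sum(z * s for s in sigmas)


def safety_stock_pooled(z: float, sigmas: list[float]) -> float:
    """Safety stock when all products share one common part.

    Assumes independent product demands, so the variances add.
    """
    return z * math.sqrt(sum(s * s for s in sigmas))


z = 1.65                                # assumed service factor (about 95% under a normal model)
sigmas = [100.0, 100.0, 100.0, 100.0]   # hypothetical weekly demand std dev for four products

separate = safety_stock_separate(z, sigmas)
pooled = safety_stock_pooled(z, sigmas)
print(f"dedicated parts: {separate:.0f} units of safety stock")
print(f"common part:     {pooled:.0f} units of safety stock")
```

With four identical, independent products the common part needs half the safety stock (1/√4). The benefit shrinks if product demands move together, which is one reason this is a sketch rather than a sizing rule.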

Concurrent Processes
Overlapping sequential processes can not only speed up the recovery phase after a disruption, but can also lead to improved market responses. Lucent Technologies achieves concurrency through a single supply-chain organization that spans multiple company functions, including engineering and sales. By aligning these activities under the supply-chain umbrella, the company can view operational areas concurrently and quickly assess their status in an emergency. In addition, the company’s responses to emergencies are faster and more efficient because people in different organizational units are accustomed to working together.

Designing products and processes for late value addition and late customization offers another layer of flexibility. By keeping products in a semifinished form, a company can move its products from surplus to deficit areas. This strategy also increases fill rates and improves customer service without increasing inventory carrying costs because products can be completed to meet a particular customer’s needs. Italian clothing manufacturer and retailer Benetton, for example, redesigned its manufacturing processes so that products that are subject to extreme variability in the demand for color are produced as generic, undyed items that can be finished when customer preferences can be determined, sometimes even after orders are placed.

Alignment of Procurement Strategy with Supplier Relationships
In response to 9/11 and other disruptions, some observers advised companies to maintain multiple suppliers for essential parts. However, there may be very good business reasons for having a single supplier, even for some critical parts: (1) a single supplier is more likely to allow access to its innovation because it is less worried about “seepage” of its intellectual property to a competing supplier; (2) the fixed per-supplier cost of procurement is minimized; (3) the company can concentrate its buying power, possibly leading to lower purchase costs; and (4) the company becomes a more significant customer of the supplier, thereby getting more attention.

But when using a single supplier, an enterprise puts all of its eggs in one proverbial basket. To manage the related risk, the enterprise must commit to a deep relationship with that supplier: it must understand in detail, and continuously monitor, the supplier’s strategy, its financial condition, and its own suppliers. This strategy is shown in the top left-hand quadrant of Figure 3. If a company decides not to incur the cost of developing deep relationships with suppliers, it will be less knowledgeable about its trading partners and, therefore, less likely to be forewarned of supply problems. In this case, the enterprise must spread its risk by maintaining a network of suppliers (lower right quadrant in Figure 3).

Each company must choose the approach that aligns its supplier relationships with its procurement strategy. For example, when Land Rover’s sole supplier of chassis for the Discovery vehicle went bankrupt unexpectedly in December 2001, the company almost lost its business. Because of inadequate monitoring, Land Rover was totally unprepared for the bankruptcy and eventually had to pay off some of the supplier’s debt. This is the dangerous situation shown in the lower left quadrant of Figure 3. The remaining option, maintaining close relationships with many suppliers (upper right quadrant), may simply be too expensive to be practical.

By developing close, collaborative relationships with trading partners, companies can become allies during a crisis. Toyota, for example, recovered very quickly, with the help of dozens of suppliers, from a fire that gutted the sole plant of its main P-valve supplier in February 1997 (Nishiguchi and Beaudet, 1998). In another case, loyal customers enabled bond trader Cantor Fitzgerald to recover after it lost more than a third of its employees and its headquarters on 9/11. Collaborative relationships can also be crucial to companies responding to fluctuations in demand, which may require that the entire supply chain ramp production up or down.

Corporate Culture
The factor that clearly distinguishes companies that bounce back from disruptions quickly, and even profit from them, is their corporate culture. Corporate culture is difficult to define and even more difficult to change. But as the success of the quality movement in the 1980s showed, cultural change can become “everybody’s” issue, rather than the exclusive domain of experts or vested interests. Resilient organizations, such as Nokia, Toyota, UPS, Dell, and the U.S. Coast Guard, may not appear to have much in common, but a closer look shows that they have several common traits.

Continuous Communication among Informed Employees
Resilient companies communicate obsessively, keeping all managers aware of strategic goals, tactical factors, and the day-by-day, even minute-by-minute, pulse of the business. Dell employees, for example, have continuous access to product manufacturing and shipment information, as well as to the company’s overall status. Thus, when disruptions occur, employees can react based on up-to-date knowledge, even if the normal lines of communication have broken down.

Distributed Power
In addition to continuous communications and informed employees, resilient organizations empower teams and individuals to take drastic action when necessary, without waiting for the usual approvals. Toyota assembly-line workers can halt production by pulling an alarm cord, which brings in a team of engineers to fix the problem. The Coast Guard moved assets into Louisiana before Katrina hit and was operating life-saving, round-the-clock missions without specific instructions from the U.S. Department of Homeland Security or even from its national headquarters in Washington, D.C. The Coast Guard was guided by an operating principle called “On Scene Initiative,” which essentially empowers local commanders (USCG, 2006). In all of these organizations, individuals who take action are celebrated when they are right but not punished when they are wrong.

Passion for Work
Successful companies engender a sense of the “greater good” in their employees. As a Southwest Airlines executive explained, “The important thing is to take the bricklayer and make him understand that he’s building a home, not just laying bricks.” Similarly, navy sailors do not think of their job as driving big ships, but rather as defending freedom.

Conditioning for Disruptions
Through frequent and continuous “small” operational interruptions, resilient organizations are conditioned to respond innovatively and flexibly when HILP events occur. UPS operations, for example, are subject to adverse weather conditions, traffic congestion, road closures, and many other delays, so the company’s recovery processes are tested daily.

Companies with relatively predictable environments can inject uncertainty for training purposes. A special Intel team, for example, routinely visits plants and introduces simulated disruptions, such as the failure of a critical supplier. The team runs the plant through a complete drill of finding and qualifying alternative suppliers, arranging transportation, changing production schedules, and so on, just to guard against managerial complacency.

A resilient organization is not only “hardened” to withstand disruptions of all kinds, but is also more competitive on a day-to-day basis. Supply disruptions create shortages, much as sudden spikes in demand do, and resilient enterprises can react to such supply/demand imbalances ahead of their competitors. Furthermore, resilient enterprises can treat disruptions as opportunities rather than problems. When large-scale disruptions affect a whole industry or an entire region, resilient enterprises are likely to bounce back ahead of their competition, winning market share and increasing customer loyalty.

References
Andel, T. 2002. Material handling in troubled times: will logistics save the day? Material Handling & Warehousing 57(13): 22.
Anderson, D. 2004. Build-to-Order and Mass Customization: The Ultimate Supply Chain and Lean Manufacturing Strategy for Low-Cost On-Demand Production without Forecasts or Inventory. Cambria, Calif.: CIM Press.
Gerberding, J.L. 2004. Statement by Julie L. Gerberding, director of the Centers for Disease Control and Prevention, in testimony before the Subcommittee on Labor, Health, and Human Services, Education, and Related Agencies, Committee on Appropriations, United States House of Representatives, April 28, 2004.
Latour, A. 2001. Trial by Fire: A Blaze in Albuquerque Sets Off Major Crisis for Cell-Phone Giants. Wall Street Journal, January 29, 2001, p. A1.
Nishiguchi, T., and A. Beaudet. 1998. The Toyota Group and the Aisin Fire. Sloan Management Review 40(1): 49–59.
Sheffi, Y. 2005. The Resilient Enterprise: Overcoming Vulnerability for Competitive Advantage. Cambridge, Mass.: MIT Press.
USCG (U.S. Coast Guard). 2002. Principles of Coast Guard Operations. Available online at:
Williams, M. 2001. Sony, Ericsson eye cell phone joint venture. SCI-TECH, April 20, 2001.

1 On January 26, 2001, Ericsson announced that it would no longer manufacture handsets and would outsource all of its handset manufacturing to Flextronics Inc. The Sony joint venture was announced three months later (Williams, 2001).

About the Author: Yossi Sheffi is a professor of engineering systems and of civil and environmental engineering at the Massachusetts Institute of Technology. He heads the MIT Center for Transportation and Logistics.