Summer Bridge on Engineering the Energy Transition
June 26, 2023, Volume 53, Issue 2

This issue explores the energy transition needed to address the mounting threats of climate change. The articles are an excellent resource to help inform meaningful decisions and steps for energy-related contributions to reduce carbon emissions.

The Electric Grid and Severe Resiliency Events
Wednesday, June 7, 2023
Authors: Thomas J. Overbye, Katherine R. Davis, and Adam B. Birchfield

As the power grid becomes both more essential and more vulnerable, new approaches are needed to ensure its resiliency.

Large-scale electric grids worldwide are in a time of rapid transition due to a variety of changes, including the addition of large amounts of renewable and distributed resources, the electrification of transportation, the need for more energy storage, increasing use of advanced technology for monitoring and control, smarter distribution systems, and sophisticated electricity markets. It is an exciting time, and one with many engineering challenges.

While the future could be quite bright, this time of great transformation is also a time of potential peril. Societies around the world are increasingly dependent on a reliable, nearly ubiquitous supply of electricity. The impact of the loss of a portion of a large-scale electric grid ranges from minor inconvenience when the outage is brief and limited in scope to potentially catastrophic when it covers a large region for a long duration.

Grid Reliability and Resiliency

In 2010 the North American Electric Reliability Corporation (NERC) and the US Department of Energy (DOE) used the term high-impact, low-frequency (HILF) events to denote risks that could cause long-term, widespread blackouts (NERC and DOE 2010); HILFs may also be called Dark Sky or Black Swan events (e.g., Paté-Cornell 2012). Recognizing that such events ultimately affect grid resiliency, here we use the term severe resiliency events (SREs).
“Keeping the lights on” involves designing and operating electric grids[1] with the goal of achieving two related but different concepts: reliability and resiliency (Kezunovic and Overbye 2018). For large-scale grids, reliability has two core concepts: (1) adequacy (enough electricity supply) and (2) operating reliability (the ability of the high-voltage grid to withstand contingencies such as the loss of a transmission line) (NASEM 2017). Reliability mostly concerns smaller, more routine events, with the goal of keeping all, or almost all, of the grid intact. Resiliency is also about keeping the lights on, but is more pertinent to this article’s focus on more severe events.

In this article the most germane definition of resiliency is from the North American Transmission Forum (NATF 2022): “The ability of the system and its components (both equipment and human) to 1) prepare for, 2) anticipate, 3) absorb, 4) adapt to, and 5) recover from non-routine disruptions, including…[HILF] events, in a reasonable amount of time.”

An event’s magnitude; the scale, location, and duration of grid exposure to the event; and other factors all determine the impacts of the event. Impacts and thus the desired system response are based on the power system’s electrical characteristics, which inform exactly what must be done to prepare for, anticipate, absorb, adapt to, and recover from such events.

In this article we explain severe resiliency events and provide some guidance on how their risks can be reduced and their impacts mitigated.

Severe Resiliency Events

SREs combine large size and long duration with potentially catastrophic societal impacts.
They can occur initially in the electricity grid and then spread to other sectors, start in another sector and spread to the electricity grid, or simultaneously affect both (Bose and Overbye 2021). They include events that cause grids to have cascading failures, such as what happened in the North American Eastern Interconnection on August 14, 2003, when localized problems in Ohio resulted in a blackout affecting 50 million people in eight states and southeastern Canada (USCPSOTF 2004).

Types of Threat Events

The 2010 NERC-DOE report considered four types of HILFs: (1) cyber or physical coordinated attacks, (2) pandemics, (3) geomagnetic disturbances (GMDs), and (4) high-altitude electromagnetic pulses (HEMPs) caused by the detonation of a nuclear weapon in or above the atmosphere. While any of these could involve catastrophic scenarios, they exist on a frequency and severity continuum, with the more common occurrences often classified as reliability events. For example, vandalism at a few transformers in a single electrical substation, causing thousands to lose electricity for a few days, is a reliability event, whereas a large-scale coordinated attack that disables large portions of an interconnected power system for weeks or even months, affecting millions, is an SRE. The COVID-19 crisis is a resiliency example akin to the NERC pandemic scenario, in which effects on the electricity grid workforce make it increasingly challenging to continue operating the transmission grid and could result in blackouts. Thus SREs, and associated risk reduction and mitigation measures, need to be considered on a reliability-resiliency continuum.

Other SRE classes include severe weather, earthquakes, major operational errors, volcanic events, tsunamis, and wildfires (NASEM 2017). A recent SRE was winter storm Uri in Texas in February 2021, which came close to blacking out all of the Electric Reliability Council of Texas system (FERC and NERC 2021).
An increase in the frequency and virulence of SREs and the potential involvement of external systems and infrastructure are also notable features.

The US Cybersecurity and Infrastructure Security Agency (in the Department of Homeland Security) defines 16 critical infrastructure sectors whose assets, systems, and networks are so vital that their destruction or loss would devastate national security and welfare (CISA 2023). Electric energy is the uniting factor among all 16 sectors.

Natural vs. Human-Induced Events

SRE risk reduction and mitigation require consideration of the nature of the event and its relative risk. For example, approaches to protect against one class of events could be quite different than for other classes, and some (e.g., earthquakes, hurricanes) are prevalent in some areas but not others. A key distinction is between natural and human-induced events.

Natural events, like GMDs, severe weather, and earthquakes, have underlying causes that generally cannot be prevented. Resiliency efforts for natural events involve predicting and preparing, taking steps to reduce impacts on infrastructure, limiting the scale and cascade of impacts, and expediting repair.

Human-induced events may be unintentional or intentional. Unintentional events due to lack of training or flaws in system design or operation may produce cascading, broad-range impacts. The predictability of these events is quite low, since known flaws are (presumably) corrected. Much like hidden bugs in software, the potential for these events could be hiding in many aspects of the system, particularly as components and control schemes get faster-paced and more complex. Efforts to enhance resiliency in this category mainly focus on preventing them from happening through detailed reviews of system design and operational practices, including personnel training.
Unintended problems may be more likely to occur at the boundaries or interdependencies of different subsystems, which are individually robust but have hidden failure modes when combined with a larger system. For example, during winter storm Uri in February 2021 (FERC and NERC 2021) electric outages compounded existing problems at natural gas processing facilities and increased the shortage of electric generation, an effect that could have been reduced with proper coordination of critical load designations.

Intentional human-induced events are the work of malicious actors seeking to cause disruption to the grid. Resiliency to these types of events may be the most challenging because they are due to an active intelligent effort to maximize the impact and duration of an event, perhaps timing it when society is particularly vulnerable (e.g., during a cold weather event). The assets and subsystems affected are not arbitrary and are meant to cause significant disruption. Unfortunately, public discussion of grid vulnerabilities to such disturbances may help an adversary better plan attacks, and unlike cyber vulnerabilities that may be rapidly patched, some grid vulnerabilities (e.g., to cyberphysical coordinated attacks, HEMPs) are not easily rectified.

Prepare, Anticipate, Absorb, Adapt, Recover

Engineered critical infrastructure systems are built from, and depend on, interdependent systems of systems, with computational, physical, and human components. In direct or indirect ways, they all depend on power and energy. Hence, protecting against SREs of any origin to avoid operational impact requires new approaches that cross traditional silos for careful design and implementation of solutions.
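The interdependence described above (for example, electric outages during winter storm Uri compounding problems at natural gas facilities, which in turn deepened the generation shortage) can be illustrated with a toy dependency graph. The following is a minimal sketch, not a grid model: the sector names, the dependency edges, and the function `propagate_outage` are hypothetical and chosen only to show how a failure in one subsystem can spread through mutual dependencies until no further sectors are affected.

```python
from collections import deque

# Hypothetical dependency map: each key lists the sectors it needs to operate.
# The names and edges are illustrative assumptions, not data from the article.
DEPENDS_ON = {
    "electric_generation": ["natural_gas_processing"],
    "natural_gas_processing": ["electric_delivery"],
    "electric_delivery": ["electric_generation"],
    "water_treatment": ["electric_delivery"],
    "communications": ["electric_delivery"],
    "road_transport": [],
}


def propagate_outage(depends_on, initial_failures):
    """Return the set of sectors that end up failed.

    A sector fails if any sector it depends on has failed. This is a
    deliberately pessimistic rule with no backup supply or partial service.
    """
    # Invert the map: for each sector, which sectors rely on it?
    dependents = {name: [] for name in depends_on}
    for sector, needs in depends_on.items():
        for needed in needs:
            dependents[needed].append(sector)

    # Breadth-first spread of failures through the inverted map.
    failed = set(initial_failures)
    queue = deque(initial_failures)
    while queue:
        current = queue.popleft()
        for sector in dependents[current]:
            if sector not in failed:
                failed.add(sector)
                queue.append(sector)
    return failed


if __name__ == "__main__":
    result = propagate_outage(DEPENDS_ON, ["natural_gas_processing"])
    print(sorted(result))
```

In this toy map, a failure of natural gas processing alone spreads around the gas-electric loop and reaches every sector except road transport. A realistic study would need capacities, backup supplies, critical load designations, and time dynamics, all of which this sketch ignores.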
The goal of enhancing electric grid resiliency is to minimize, in a cost-conscious manner, the likelihood of long-duration blackouts, reduce their magnitude, and recover as quickly as possible. Using the NATF (2022) approach, coupled with the feedback component from NASEM (2017), grid resilience involves the following: (1) prepare as much as possible through both long- and short-term planning, (2) anticipate what is happening before and during the event through situational awareness, (3) design the grid (including its associated control and cybersystems) to be robust and able to absorb shocks, (4) adapt as needed during the event, (5) recover as quickly as possible, and (6) learn from what occurred and improve (figure 1). The effectiveness of all six steps depends on the first: what is done to prepare well before an SRE occurs.

Simulation and Assessment

Simulations and assessments contribute to resilient system design. For example, could different grid architectures reduce the impacts of certain points of failure (e.g., critical substations) (Nagpal et al. 2022)? Simulation results can be used to determine needed procedures to address potential events. Not every event can be fully anticipated or mitigated, but realistic plans must be developed beforehand. Scenario development and operational planning require a wide range of research; even in the aggregate, such research is almost always significantly less expensive than even one of the events it seeks to mitigate.

To know how to respond to events, an initial assessment is crucial to identify and predict events and their impacts. The value of the assessment is enhanced with high-fidelity models and corroborating data, and learning from the data (and experience) when models are inadequate or absent. Most assessment simulations are inherently interdisciplinary, particularly in efforts to accurately represent events such as earthquakes or hurricanes.

For more common events, such as hurricanes, the risks are well known.
But to some extent each event class has its own characteristics and relative risks, and requires its own mitigation strategies (Veeramany et al. 2016a). For instance, recent work discusses HEMP impacts and mitigation (EPRI 2019) and illustrates how models can break down during HEMP simulations (Overbye et al. 2022a).

For many SREs, however, the risks are not precisely known, although there are some useful commonalities in ways to improve their simulation and assessment. For instance, the development of better approaches to simulation can reduce convergence issues in simulation software. But simulations can be challenging because some types of events may not have occurred in a particular region (or at all), and even within a particular event class there can be significant variability.

An ongoing challenge in efforts to improve electric grid resiliency is the availability of grid models and data for research. Models of the actual grid are, of course, available to engineers in the electric utility sector and can be used in many SRE simulations. But development of advanced simulation tools, for example, needs to be done by researchers. Because of security concerns stemming from the September 11, 2001, terrorist attacks on the United States, agencies such as the Federal Energy Regulatory Commission have designated much electric grid information useful for SRE analysis as critical energy infrastructure information (CEII), meaning that it cannot be freely shared (FERC 2001).

To address this problem, over the last several years geographically based synthetic grids have emerged (NASEM 2016). These fictional grids are free from CEII classification and designed to mimic the complexity of actual large-scale electric grids, with appropriate geographic coordinates so they can be coupled to other infrastructures and SREs (Birchfield et al. 2017; Xu et al. 2018).
This is a useful compromise to provide realistic complexity to develop and test simulation tools without disclosing CEII-sensitive data. Geographically based synthetic grids are useful in efforts to determine earthquake risk (Veeramany et al. 2016b), and a combination of real and synthetic grids has been used to study an AC interconnection of the North American East and West grids (Overbye et al. 2022b). Figure 2 shows a detailed synthetic grid denoting different nominal transmission line voltages for the contiguous United States.

Protection, Control, and Reinforcement

A key aspect of SRE mitigation is to avoid cascading blackout scenarios, in which localized events can rapidly affect an entire interconnection (Dobson et al. 2007; Schäfer et al. 2018). While some disturbance phenomena propagate at nearly the speed of light (e.g., traveling waves from faults on a transmission line), most do so on much slower time scales because of grid interactions with the electromechanical coupling of rotating inertia. Protection and control systems also affect the way disturbances propagate. So, although grids are subject to many disturbances, most are quickly isolated, resulting in little or no loss of load.

The challenge in assessing SREs is to ensure that even when subjected to a large disturbance, the bulk grid remains intact. One way to achieve this is intentional islanding, in which a grid is quickly broken up into a number of subgrids operating independently (Biswas et al. 2020; Senroy et al. 2006). Such an approach could allow continued operation in parts of the grid, enabling faster recovery.

Enhancing the grid is also about infrastructure reinforcement, with designs that improve the system’s ability to absorb disturbances.
Some effective enhancements are expensive, as is the case with replacement of wood transmission towers with more wind-resistant steel or concrete, undergrounding of transmission lines, or implementation of stronger standards for distribution structures (e.g., National Electrical Safety Code Rule 250C or 250D; Jurgemeyer and Miller 2014). Inherent resiliency in the design of the grid lays the foundation for operational resiliency (i.e., withstanding an SRE in real time without significant degradation).

The ability to respond at any stage (before, during, or after an event) requires good situational awareness (Endsley 1995) of the grid and related control systems, and assessment of hazards. This certainly applies to electric grid SRE simulations of unusual operating conditions (Overbye et al. 2021).

Modeling, situational awareness, and response, the three pillars of power system resiliency, work together to support the grid’s operational and infrastructural resiliency. Modeling involves testbed simulation of the system with its threats and defenses. Situational awareness requires vulnerability and risk analyses, monitoring, inference, and detection. Response includes mitigations, defense, outreach, and training. There is a direct link between the first two and effective response.

Studies of SREs must also consider how to measure and quantify risk avoidance. This is important because it is difficult to quantify something that hasn’t yet happened, and therefore difficult to justify investment for protection and defense against it. It is more straightforward to quantify the cost impacts of historical events.

Defense against large-scale cyber disruptions has been a key driver of research in this area (NASEM 2020; for discussion of specific needs, see Gunduz and Das 2020, Sun et al. 2018).
A coordinated approach for next-generation energy management systems begins planning before an event and carries the model and associated data through the entire analysis cycle, that is, before, during, and after an event (Sahu et al. 2023). This is known as event lifecycle security. For example, at the early stage, the goal could be to improve cyberphysical situational awareness, which is based on the model and preventive risk analysis. Next, monitoring and verification combine different data sources (including cyber and physical) to identify or infer system vulnerabilities. Then the models and data are combined to support online preventive cyberphysical risk analysis with current state information to understand how expected system behavior matches observations. Last, these analyses provide recommendations for response and mitigation, for use in next-generation energy management systems.

A next-generation energy management system based on the three pillars of power resiliency would facilitate new capabilities for online control actions that couple cyber and physical domains. The integrity and security of the data flow pipeline are crucial for grid resiliency, so such next-generation energy management systems will track and secure the grid cyberphysical critical infrastructure from monitoring to analysis to control.

Conclusion

Since the creation of the first grids in the 1880s, electricity has played an indispensable role in the development of modern societies. This transformation continues, with rapidly increasing use of renewables, expansion of computing power and artificial intelligence, and massive integration of consumer-based grid edge technologies, including the electrification of transportation. With the new opportunities provided by this transformation, there are also challenges in how to define and respond to severe resiliency events.
The emergence of new technologies for incorporation in the grid may suggest optimism for the future, but they also require defense against a variety of SREs. A change in paradigm is needed, as are changes to traditionally designed solutions. Detailed modeling of SREs using realistic electric grid models is essential, with consideration of the end-to-end lifecycle of each event.

References

Birchfield AB, Xu T, Gegner KM, Shetye KS, Overbye TJ. 2017. Grid structural characteristics as validation criteria for synthetic networks. IEEE Transactions on Power Systems 32:3258–65.
Biswas S, Bernabeu E, Picarelli D. 2020. Proactive islanding of the power grid to mitigate high-impact low-frequency events. 2020 IEEE Power & Energy Society Innovative Smart Grid Technologies Conf, Feb 17–20.
Bose A, Overbye TJ. 2021. Electricity transmission system research and development: Grid operations. In: Transmission Innovation Symposium: Modernizing the US Electrical Grid, May 19–20. US Department of Energy.
CISA [US Cybersecurity & Infrastructure Security Agency]. 2023. Critical Infrastructure Sectors.
Dobson I, Carreras BA, Lynch VE, Newman DE. 2007. Complex systems analysis of series of blackouts: Cascading failure, critical points, and self-organization. Chaos 17:026103.
Endsley MR. 1995. Toward a theory of situation awareness in dynamic systems. Human Factors 37:32–64.
EPRI [Electric Power Research Institute]. 2019. High-Altitude Electromagnetic Pulse and the Bulk Power System: Potential Impacts and Mitigation Strategies (EPRI 3002014979).
FERC [US Federal Energy Regulatory Commission]. 2001. Treatment of Previously Public Documents (Docket No. PL02-1-000). www.ferc.gov/sites/default/files/2020-05/PL02-1-000.pdf.
FERC, NERC [North American Electric Reliability Corporation]. 2021. The February 2021 Cold Weather Outages in Texas and the South Central United States.
Gunduz MZ, Das R. 2020. Cyber-security on smart grid: Threats and potential solutions. Computer Networks 169:107094.
Jurgemeyer MF, Miller BM. 2014. NESC wind and ice load effects on wood distribution pole design. IEEE Transactions on Industry Applications 50(5):3004–10.
Kezunovic M, Overbye TJ. 2018. Off the beaten path: Resiliency and associated risk. IEEE Power and Energy 16(2):26–35.
Nagpal SV, Nair GG, Parise F, Anderson CL. 2022. Designing robust networks of coupled phase-oscillators with applications to the high voltage electric grid. IEEE Transactions on Control of Network Systems.
NASEM [National Academies of Sciences, Engineering, and Medicine]. 2016. Analytic Research Foundations for the Next-Generation Electric Grid. National Academies Press.
NASEM. 2017. Enhancing the Resilience of the Nation’s Electricity System. National Academies Press.
NASEM. 2020. Communications, Cyber Resilience, and the Future of the US Electric Power System: Proceedings of a Workshop. National Academies Press.
NATF [North American Transmission Forum]. 2022. Understanding the Definition of Resilience.
NERC [North American Electric Reliability Corporation]. 2012. Severe Impact Resilience: Considerations and Recommendation.
NERC, DOE. 2010. High-Impact, Low Frequency Event Risk to the North American Bulk Power System.
Overbye TJ, Shetye KS, Wert JL, Trinh W, Birchfield AB. 2021. Techniques for maintaining situational awareness during large-scale electric grid simulations. Power & Energy Conf, Apr 1–2, Champaign IL.
Overbye TJ, Snodgrass J, Birchfield AB, Stevens M. 2022a. Towards developing implementable high altitude electromagnetic pulse E3 mitigation strategies for large-scale electric grids. IEEE Texas Power & Energy Conf, Feb 28–Mar 1, College Station.
Overbye TJ, Shetye KS, Wert JL, Li H, Cathey C, Scribner H. 2022b. Stability considerations for a synchronous interconnection of the North American Eastern and Western electric grids. Proceedings, 55th Hawaii Internatl Conf on System Sciences, Jan 4–7.
Paté-Cornell E. 2012. On “black swans” and “perfect storms”: Risk analysis and management when statistics are not enough. Risk Analysis 32(11):1823–33.
Sahu A, Davis K, Huang H, Umunnakwe A, Zonouz S, Goulart A. 2023. Design of next-generation cyber-physical energy management systems: Monitoring to mitigation. IEEE Power & Energy 10:151–63.
Schäfer B, Witthaut D, Timme M, Latora V. 2018. Dynamically induced cascading failures in power grids. Nature Communications 9:1975.
Senroy N, Heydt GT, Vittal V. 2006. Decision tree assisted controlled islanding. IEEE Transactions on Power Systems 21(4):1790–97.
Sun C-C, Hahn A, Liu C-C. 2018. Cyber security of a power grid: State-of-the-art. Internatl Journal of Electrical Power & Energy Systems 99:45–56.
USCPSOTF [US-Canada Power System Outage Task Force]. 2004. Final Report on the August 14, 2003 Blackout in the United States and Canada: Causes and Recommendations.
Veeramany A, Unwin S, Coles GA, Dagle JE, Millard DW, Yao J, Glantz CS, Gourisetti SNG. 2016a. Framework for modeling high-impact, low-frequency power grid events to support risk-informed decisions. Internatl Journal of Disaster Risk Reduction 18:125–37.
Veeramany A, Coles GA, Unwin SD, Nguyen TB, Dagle JE. 2016b. Trial Implementation of the High-Impact, Low-Frequency Power Grid Event Risk Framework to Support Informed Decision-Making (PNNL-25667). Pacific Northwest National Laboratory.
Xu T, Birchfield AB, Overbye TJ. 2018. Modeling, tuning, and validating system dynamics in synthetic electric grids. IEEE Transactions on Power Systems 33(6):6501–09.

[1] The term grid encompasses both the equipment used to deliver electricity and the many associated components such as control and cyber systems.

About the Authors: Thomas Overbye (NAE) is a professor and Katherine Davis and Adam Birchfield are assistant professors, Department of Electrical & Computer Engineering, Texas A&M University. Overbye is also director, Smart Grid Center.