Winter Bridge on Frontiers of Engineering
December 18, 2019, Volume 49, Issue 4

The winter issue of The Bridge is focused on the 2019 Frontiers of Engineering symposium.

Influencing Interactions between Human Drivers and Autonomous Vehicles
Wednesday, December 18, 2019
Author: Dorsa Sadigh

Society is rapidly advancing toward autonomous systems that interact and collaborate with humans: semiautonomous vehicles interacting with drivers and pedestrians, medical robots used in collaboration with doctors, or service robots interacting with their users in smart homes. A key aspect of safe and seamless interaction between autonomous systems and humans is the way robots such as autonomous cars can influence humans’ actions in one-on-one or group settings.

This influence is usually overlooked by the autonomous driving industry, where the common assumption is that humans act as external disturbances like moving obstacles, or that automation can always help societies without actually considering how humans may be impacted.

Humans are not simply a disturbance to be avoided, and they do not always easily adapt to the proliferation of automation in their lives. Humans are intelligent agents with approximately rational strategies who can be influenced and act in novel ways when interacting with other autonomous and intelligent agents.

In this paper I discuss a unifying framework for influencing interactions in autonomous driving, that is, actions of autonomous vehicles (AVs) that can positively influence human-driven vehicles in large-scale or vehicle-to-vehicle (V2V) interactions. Influencing such interactions can be a significant contributor to the safe and reliable integration of autonomous vehicles.

Influencing Interactions at Vehicle Level

We have designed a novel framework for understanding the interaction between autonomous and human-driven vehicles.
We model this interaction as a dynamical system, where the state of the environment evolves based on the actions of the two vehicles at each time step:

$$x^{t+1} = f(x^t, u_A^t, u_H^t)$$

Here, $x^t$ denotes the state of the environment, computed from the sensor values at each time step, including the coordinates, velocity, and heading of each vehicle in the interaction, and the road and lane boundaries. The set of actions of each vehicle, $u_A^t$ for the autonomous car and $u_H^t$ for the human-driven car, includes steering angle and acceleration.

Our key insight is that the actions of autonomous cars can influence the behavior of human-driven cars on the same road. This can be seen when, for example, a car tries to change lanes: it starts nudging into the destination lane, influencing the cars in that lane to slow down. Similarly, the actions of an autonomous car can result in a human driver changing lanes, slowing down, or speeding up.

Our approach to planning for AV influencing interactions has a few fundamental components. We developed imitation learning techniques to build predictive models of human driving behavior, and designed interaction-aware controllers that model the interaction between a human and a robot as a two-player leader-follower game. Leveraging optimization-based and game-theoretic techniques, our work produces robot policies that influence human behavior toward safer outcomes in V2V interaction with autonomous cars (Sadigh et al. 2016a,b, 2018).

Human Driver Models

Imitation learning attempts to learn models of humans by imitating a human expert’s demonstrations to enable robots to act in similar ways. Here, we leverage similar techniques, modeling each human driver as an agent who approximately optimizes her own objective, referred to as a reward function (e.g., a driver’s preferences about avoiding collisions or keeping distance from road boundaries):

$$u_H^* = \arg\max_{u_H} R_H(x, u_H, u_R)$$

where $u_R$ denotes the action of the autonomous car (the robot), written $u_A$ in the dynamics above. We assume that

$$R_H(x, u_H, u_R) = w \cdot \phi(x, u_H, u_R)$$

represents the human’s underlying reward function, that is, a linear combination of a set of hand-coded features $\phi(x, u_H, u_R)$ with weights $w$. In the driving setting, these features can include distances between the AV and the edges of the road, lane boundaries, or other cars, as well as the velocity and direction of those cars. We collect training data in a driving simulator and use them in the form of demonstrations or preferences to learn the parameters $w$ of the reward function using techniques such as maximum entropy inverse reinforcement learning or active preference-based learning of reward functions (Basu et al. 2019; Bıyık and Sadigh 2018; Palan et al. 2019; Sadigh et al. 2016a,b, 2017, 2018).

Planning for Interaction-Aware Controllers

Once we have a predictive human driving model, we can plan for autonomous cars that better interact with humans by being “mindful” of how their actions influence humans. We consider a setting where the autonomous car optimizes for its own reward function:

$$u_R^* = \arg\max_{u_R} R_R(x, u_R, u_H^*)$$

Here, the robot’s reward function directly depends on and influences $u_H^*$, the learned and predicted human behavior (called the human policy). In game-theoretic terms, this interaction model is a two-player game between a human-driven and an autonomous car. The actions of the autonomous car influence those of the human-driven car, and vice versa. To plan efficiently for autonomous vehicles, we approximately solve this interaction game as a Stackelberg (leader-follower) game.

Our work results in influencing actions by the autonomous vehicle that are more assertive, more efficient, and in many settings safer. Some of these trajectories are shown in figure 1. Our user studies suggest that autonomous cars that are programmed to be aware of their interactions with humans can achieve tasks such as lane changing or coordinating at intersections safely and efficiently (Sadigh et al. 2016a,b, 2018).
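
The leader-follower structure described above can be illustrated with a small one-step sketch. It is not the controller from the cited work, which plans continuous trajectories over a horizon; it only shows the nesting of the two optimizations. Everything concrete here is an assumption made for illustration: the one-dimensional state (gap, robot speed, human speed), the discrete acceleration grid, the three hand-coded features, the weights w, and the robot's reward.

```python
DT = 1.0                                  # time step in seconds (assumed)
ACTIONS = [-2.0, -1.0, 0.0, 1.0, 2.0]     # candidate accelerations, a hypothetical grid


def step(x, u_r, u_h):
    """Toy dynamics f(x, u_R, u_H): the robot drives ahead of the human in one lane."""
    gap, v_r, v_h = x
    v_r_next = v_r + u_r * DT
    v_h_next = v_h + u_h * DT
    gap_next = gap + (v_r_next - v_h_next) * DT
    return (gap_next, v_r_next, v_h_next)


def phi(x, u_h, u_r):
    """Hand-coded features phi(x, u_H, u_R); all three are illustrative choices."""
    gap_next, _, v_h_next = step(x, u_r, u_h)
    return (
        -(v_h_next - 30.0) ** 2,       # stay close to a desired speed
        -1.0 / max(gap_next, 0.1),     # keep distance from the car ahead
        -u_h ** 2,                     # avoid harsh acceleration
    )


def human_reward(w, x, u_h, u_r):
    """R_H = w . phi, a linear combination of the features."""
    return sum(w_i * f_i for w_i, f_i in zip(w, phi(x, u_h, u_r)))


def human_best_response(w, x, u_r):
    """Follower: u_H* = argmax over u_H of R_H(x, u_H, u_R)."""
    return max(ACTIONS, key=lambda u_h: human_reward(w, x, u_h, u_r))


def robot_reward(x, u_r, u_h):
    """Hypothetical R_R: track a target speed, keep a safe gap, limit effort."""
    gap_next, v_r_next, _ = step(x, u_r, u_h)
    return -(v_r_next - 28.0) ** 2 - 50.0 / max(gap_next, 0.1) - 0.1 * u_r ** 2


def plan_leader(w, x):
    """Leader: pick u_R maximizing R_R(x, u_R, u_H*(u_R)) by enumeration."""
    best = None
    for u_r in ACTIONS:
        u_h = human_best_response(w, x, u_r)   # predicted human reaction
        value = robot_reward(x, u_r, u_h)
        if best is None or value > best[2]:
            best = (u_r, u_h, value)
    return best
```

Calling `plan_leader((1.0, 20.0, 0.5), (15.0, 25.0, 30.0))` returns the robot action, the predicted human response, and the robot's reward for that pair. The key design point is that the robot evaluates each of its candidate actions against the human's predicted reaction to that action, rather than against a fixed forecast of the human's motion.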
Figure 1

Influencing Interactions at the Global Level

Influencing interactions at the vehicle level can be observed in many driving settings (such as changing lanes, merging, or exiting from a highway) and has substantial effects on the larger traffic system (Bıyık et al. 2018, 2019; Fisac et al. 2019; Lazar et al. 2018; Stefansson et al. 2019). For instance, the presence of a large number of autonomous vehicles on roads can influence the state of traffic (such as congestion, delay, or flow) and hence human drivers’ routing choices.

We now discuss the challenges arising in mixed-autonomy traffic settings where a large number of autonomous and human-driven vehicles interact.

Equilibria in Mixed-Autonomy Traffic: Altruistic Autonomy

Traffic congestion has large economic and social costs. The introduction of autonomous vehicles may reduce congestion both by increasing network throughput and by enabling a social planner to incentivize AV users to take longer routes that can alleviate congestion on more direct roads.

To formalize the effects of altruistic autonomy on roads shared by human drivers and autonomous vehicles, we developed a model of road congestion based on a fundamental diagram of traffic (showing the relation between traffic flux [vehicles/hr] and traffic density [vehicles/km]). We considered a network of parallel roads and created algorithms that compute optimal equilibria that are robust to additional unforeseen demand.

Our results show that even with arbitrarily small altruism, total latency can be unboundedly better than without altruism, and that the best selfish equilibrium can be similarly better than the worst selfish equilibrium. We validate our theoretical results through microscopic traffic simulations and show that average latency decreases by a factor of 4 from the worst-case selfish equilibrium to the optimal equilibrium when autonomous vehicles are altruistic (Bıyık et al. 2018).
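
The intuition behind altruistic routing can be seen in a toy two-road network. The sketch below is not the fundamental-diagram model of the cited work; it swaps in the classic Pigou-style example, in which total demand is normalized to 1, the short road's latency equals the share of traffic using it, and the long road has a constant latency of 1. These latency functions and the grid search are assumptions chosen only to keep the arithmetic transparent.

```python
def latency_short(flow: float) -> float:
    """Short road (assumed): travel time grows with the share of traffic on it."""
    return flow


def latency_long(flow: float) -> float:
    """Long road (assumed): constant travel time regardless of load."""
    return 1.0


def total_latency(altruistic_share: float) -> float:
    """Total latency when a share of traffic (altruistic AVs) takes the long road.

    The remaining selfish traffic all uses the short road: its latency equals
    its flow, which is at most 1, so it never exceeds the long road's latency.
    """
    flow_long = altruistic_share
    flow_short = 1.0 - altruistic_share
    return flow_short * latency_short(flow_short) + flow_long * latency_long(flow_long)


def best_altruistic_share(steps: int = 100) -> float:
    """Grid search for the altruistic share that minimizes total latency."""
    shares = [i / steps for i in range(steps + 1)]
    return min(shares, key=total_latency)


# total_latency(0.0) -> 1.0   (purely selfish routing)
# total_latency(0.5) -> 0.75  (half the traffic rerouted to the long road)
```

In this toy network the selfish equilibrium puts all traffic on the short road, and rerouting half of it lowers total latency from 1.0 to 0.75. The improvement here is bounded; the unbounded gaps reported above depend on the congestion model used in the cited work, which this sketch does not reproduce.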
Humans’ Routing Choice Models

When users of a road network choose their routes selfishly, the resulting traffic configuration may become very inefficient. Because of this, we consider how to influence human routing decisions so as to decrease congestion on these roads.

We consider a network of parallel roads with two modes of transportation: (i) human drivers who will choose the quickest route available to them, and (ii) a ride-hailing service that provides users with an array of AV ride options, each with a different price. We designed a pricing scheme for the autonomous vehicles such that when autonomous service users choose from their options and human drivers selfishly choose their routes, road use is optimized and transit delay minimized.

To do so, we formalized a model of how autonomous service users trade off price against delay when choosing between routes. We developed a preference-based algorithm (similar to our work in learning reward functions discussed above) to learn users’ preferences and used a vehicle flow model related to the fundamental diagram of traffic. Based on these, we formulated a planning optimization whose objective is reduced congestion, and we demonstrated the benefit of the proposed routing and learning scheme (Bıyık et al. 2019).

Dynamic Routing in Mixed-Autonomy Traffic

We are developing a social planner by studying a dynamic routing game in which the route choices of autonomous vehicles can be controlled and the human drivers react selfishly and dynamically to the AV actions. As the problem is prohibitively large, we use deep reinforcement learning to develop a policy for controlling the autonomous vehicles. This policy influences human drivers to route themselves in a way that minimizes congestion on the network (figure 2).
Figure 2

To gauge the effectiveness of our learned policies, we established theoretical results characterizing equilibria on a network of parallel roads and empirically compared the learned policy results with the best possible equilibria. We found that, in the absence of these policies, high demands and network perturbations result in heavy congestion, whereas using the policy greatly decreases travel times by minimizing congestion.

Summary

We have described our work in planning for influencing interactions in autonomous driving at two levels: (i) vehicle-to-vehicle interaction, in which an autonomous car influences human-driven cars for safer and more efficient driving behavior; and (ii) global-level interaction, in which a large number of autonomous and human-driven vehicles interact in the same traffic network. We design routing decisions for autonomous vehicles that influence humans’ routing choices in order to decrease the total delay of the traffic network for a more desirable societal objective.

Autonomous systems are weaving their way into daily life as robots and the Internet of Things move into homes and smart cities become a reality. Our long-term goal is to develop a theory for modeling and designing the effects of automation and robotics on human decision making, and this work is a first step toward developing efficient robotics algorithms that lead to safe and transparent autonomous systems as they interact with and influence humans and society.

Acknowledgment

The work discussed is a summary of technical work done in collaboration with Anca Dragan, Erdem Bıyık, Daniel Lazar, Ramtin Pedarsani, S. Shankar Sastry, and Sanjit Seshia.

References

Basu C, Bıyık E, He Z, Singhal M, Sadigh D. 2019. Active learning of reward dynamics from hierarchical queries. Proceedings, IEEE/RSJ Internatl Conf Intelligent Robots and Systems, Nov 4–8, Macau, China.

Bıyık E, Sadigh D. 2018. Batch active preference-based learning of reward functions. Proceedings, 2nd Conf Robot Learning, Oct 29–31, Zürich.

Bıyık E, Lazar DA, Pedarsani R, Sadigh D. 2018. Altruistic autonomy: Beating congestion on shared roads. Proceedings, 13th Internatl Workshop on Algorithmic Foundations of Robotics, Dec 9–11, Mérida.

Bıyık E, Lazar DA, Sadigh D, Pedarsani R. 2019. The green choice: Learning and influencing human decisions on shared roads. arXiv:1904.02209v2.

Fisac JF, Bronstein E, Stefansson E, Sadigh D, Sastry SS, Dragan AD. 2019. Hierarchical game-theoretic planning for autonomous vehicles. Internatl Conf Robotics and Automation, May 20–24, Montréal.

Lazar DA, Chandrasekher K, Pedarsani R, Sadigh D. 2018. Maximizing road capacity using cars that influence people. Proceedings, 57th IEEE Conf Decision and Control, Dec 17–19, Miami Beach.

Palan M, Landolfi NC, Shevchuk G, Sadigh D. 2019. Learning reward functions by integrating human demonstrations and preferences. Proceedings, Robotics: Science and Systems, Jun 12–17, Corvallis.

Sadigh D, Sastry S, Seshia SA, Dragan AD. 2016a. Planning for autonomous cars that leverage effects on human actions. Proceedings, Robotics: Science and Systems, Jun 18–22, Ann Arbor.

Sadigh D, Sastry SS, Seshia SA, Dragan A. 2016b. Information gathering actions over human internal state. Proceedings, IEEE/RSJ Internatl Conf Intelligent Robots and Systems, Oct 9–14, Daejeon, Korea.

Sadigh D, Dragan AD, Sastry S, Seshia SA. 2017. Active preference-based learning of reward functions. Proceedings, Robotics: Science and Systems, Jul 12–16, Cambridge MA.

Sadigh D, Landolfi N, Sastry SS, Seshia SA, Dragan AD. 2018. Planning for cars that coordinate with people: Leveraging effects on human actions for planning and active information gathering over human internal state. Autonomous Robots 42:1405–26.

Stefansson E, Fisac JF, Sadigh D, Sastry SS, Johansson KH. 2019. Human-robot interaction for truck platooning using hierarchical dynamic games. European Control Conf, Jun 25–28, Naples.

Imitation learning is a set of algorithms that train a robot policy to make decisions based on a collected set of expert demonstrations.

This can be done through pricing schemes or latency management. If, for example, there are two highways to the same destination and one is shorter than the other, drivers will likely select the shorter one, increasing traffic on that route. If a few autonomous cars choose the longer highway, their latency will be lower than that of the congested route, and their choice will also reduce latency on the shorter road.

About the Author: Dorsa Sadigh is an assistant professor in the Computer Science and Electrical Engineering Departments at Stanford University.