Author: James B. Bassingthwaighte
Engineering tools and techniques can be used to advance health care.
In an overview of the state of engineering in the new millennium, Wm. A. Wulf (2000), president of the NAE, introduced the concept of "macroethical" behavior, that is, behavior that increases the intellectual pressure "to do the right thing" for the long-term improvement of society. Examples abound: the development of maintainable energy resources, the preservation of a healthy environment, the avoidance of ecological disasters, and universal education. The macroethical issue addressed in this article focuses on using engineering to advance health care, minimizing risks and maximizing benefits. I believe we have a duty to "think as hard as we can" (in Wulf’s words) to plan for the future by using engineering strategies wisely and responsibly.
Current risk/benefit analyses of potential new drug therapies are not adequate to the task. In general, current analyses are based on the inference that a drug acts on a single protein, usually an enzyme or a transporter; efficacy and side effects are determined later by observation. We need a great leap forward that will enable us to make "knowledgeable" calculations of risks and benefits. We need information on which decisions can be based. In the United States, new technologies, such as gene insertion, stem-cell infusion, and new pharmaceuticals, are within the purview of regulatory agencies, such as the Food and Drug Administration, rather than scientific funding agencies. The mission of regulatory agencies is to protect us against speculative or risky advances. These agencies depend heavily on large, expensive clinical trials in which some human subjects are put at risk in the interest of protecting others. For novel interventions, which offer great possibilities but little evidence for predicting success and a high risk of failure, harm, or damage, we must find another way to move ahead, but with minimal risk. In other words, we must find ways that enable us to follow our intuition and insight by maximizing our ability to predict risks and benefits.
Informatics and Information Flow
The problem in medicine and biology is that much relevant information is either irretrievable or undiscovered. Even a complete human genome cannot define human function; it is only a guide to the possible ingredients. The gene products, the proteins, are much more numerous than the genes themselves. To get an idea of the magnitude of the problem, consider that yeast has about three proteins per gene, and humans have about 10 proteins per gene. Pretranslational selection from different parts of the DNA sequence, the post-translational slicing out of parts of the protein, the splicing of two or more proteins together, and the combining of groups of proteins into functional, assembly-line-like complexes all contribute to the variety of the products of gene expression. Even a completely identified proteome, which is still beyond the scientific horizon, will be like a list of the types of parts of a jumbo jet with no indication of how many should be used or where they go. The concentration of each protein, the balance between its synthesis and decay in each cell type, is governed by environment, by behavior, and by the dynamic relationships among proteins, substrates, ionic composition, energy balance, and so on, and thus cannot be predicted on the basis of the genome.
The identity and descriptions of proteins, their locations, and concentrations in various cell types under various conditions and the kinetics of the reactions in which each is involved would fill a huge database. The protein data banks (e.g., PDB, 2002; Swissprot, 2002) are giant stepping stones that provide amino acid sequences and many protein structures but little data about function. Newer enzyme databases (e.g., WIT, 2002) are oriented toward providing kinetic or functional information, but they cover single-cell species better than mammals (Reich and Sel’kov, 1981).
The Combinatorial Dilemma
Sorting out the genome will leave us with a huge number of proteins to think about. Estimates of the number of genes have come down to about half the earlier figures of 60,000 to 100,000; because new genes are still being found, 50,000 is a reasonable estimate. The level of complexity in mammalian protein expression far exceeds that of C. elegans, which has 19,536 genes and 952 cells. Humans might have only two or three times as many genes, but probably have a much higher ratio of proteins per gene. Assuming 10 proteins per gene, we have on the order of a half million proteins in widely varied abundance, and each protein has several possible states. If a protein in a given state interacts with only five other proteins (e.g., exchanging substrates with neighbors in a pathway or modifying the kinetics of others in a signaling sequence), then it may "connect" to any other protein through only a few links, a kind of "six degrees of separation" among proteins. Moreover, cells contain not just proteins, but also substrates and metabolites, and they are influenced by their environments. Given the possible permutations and combinations of linkages, and the many-fold more possibilities in the dynamics of their interactions, the combinatorial explosion would appear to preclude prediction.
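The back-of-the-envelope arithmetic behind this "six degrees of separation" picture can be made explicit, using the figures from the text (50,000 genes, roughly 10 proteins per gene, about five interaction partners per protein state):

```python
import math

# Figures from the text.
genes = 50_000
proteins_per_gene = 10
partners = 5                      # interaction partners per protein state

n_proteins = genes * proteins_per_gene   # ~half a million proteins

# In an idealized network where each protein links to 5 others, the number
# of proteins reachable in k steps grows roughly like 5**k, so the chain
# length needed to span the whole proteome is about log(N)/log(5):
links_needed = math.ceil(math.log(n_proteins) / math.log(partners))

print(n_proteins)    # 500000
print(links_needed)  # 9 -- a handful of links connects any two proteins
```

The idealization (a uniform, tree-like network) is of course too simple for real interactomes, but it shows why short paths, rather than isolation, are the generic expectation at this scale.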
The complexity I’ve briefly described provides a basis for functionality that cannot be predicted from knowledge about each of the components. "Emergent" behavior is the result of interactions among proteins, subcellular systems, and aggregates of cells, tissues, organs, and systems within an organism. Physiological systems are highly nonlinear, higher-order systems, and their dynamics are often chaotic (Bassingthwaighte et al., 1994; Goldbeter, 1996). Chaotic systems are predictable only over the short term, but they have a limited operating range. Even when Bernard (1865, 1927) defined the stability of the "milieu intérieur," he meant a mildly fluctuating state rather than a stagnant "homeostasis." Biological systems are "homeodynamic"; they fluctuate, but under control, and they are neither "static" nor "randomly varying."
By "complexity" we imply that, even if we knew all of the proteins and all of the rate constants for their reactions, we could not predict the long-range outcome of an intervention that targeted only one protein. However, behavior can be predicted for the short term. Side effects show up later. Because proteins are building blocks (or nodes) on pathways and because each protein reacts with substrates or other proteins of a limited variety, there are road maps of reactions, the charts of biochemical reactions. The known stoichiometry of reactions (the numbers of moieties combining to form products) is the basis of flux/balance analysis for estimating product formation in bacterial cultures (Schilling et al., 2000). However, because evidence on thermodynamics is generally lacking, we cannot limit the predicted range of responses (Beard et al., 2002). Our ability to predict responses is handicapped by missing reactions, unidentified proteins, and the paucity of information about reaction rates (or traffic flow), controllers of the enzymes or transporters, and how proteins are arrayed spatially. Molecular biologists are learning how gene expression is regulated to adapt over long periods of time. But "thinking hard" will require much more data, at the single protein level and at a succession of higher levels.
This vast array of information must be linked into a consistent whole. The databases must be well curated and easily accessible, and they must provide a substrate for behaviorally realistic models of physiological systems. The arguments for building large databases to capture biological data are fairly new (Dao et al., 2000; Weissig and Bourne, 1999). Federal funds support genomic and proteomic databases, but not databases of higher level physiological information. Organ and systems data acquired over the past century have not been collected in databases and are poorly indexed in the print literature. Providing searchable texts of articles online will help but will not be a substitute for organized databases. The Visible Human Project, the National Library of Medicine’s effort to preserve anatomic information (NLM, 2002), is a part of the morphome (which we define as providing anatomic and morphometric information), analogous to the genome and the proteome.
The genome, proteome, and morphome all concern structure, which is necessary but not sufficient for explaining function (the physiome). We need to know about the dynamics, kinetics, and functioning of those structures and how they interact. Physiology and pathophysiology concern processes, not fixed states. We need more than statistical descriptions of associations among physiological variables; we need models that include mechanisms and distinguish mere association from cause and effect.
All of this information must then be captured in a comprehensive, consistent conceptual framework, that is, a model of the system that conveys understanding, and for this we will need to use engineering approaches. Understanding complicated systems, modeling them, and learning the tricks for reducing their complexity to attain computability are all in the engineering domain, and bioengineering-trained investigators will be the integrators of the future.
Of course, all models are incomplete. They come in a variety of forms, such as sketches of concepts, diagrams of relationships, schemas of interactions, mathematical models defined by sets of equations, and computational models (from analytical mathematical solutions or from numerical solutions to differential or algebraic equations). The behavior of a well-developed, well-documented computer model can give us some insight into the behavior of the real system.
The Macroethical Imperative
Although we cannot predict the outcomes of drug therapy with certainty, we must go ahead. Despite the risk, designers of pharmaceuticals to alleviate AIDS or Alzheimer’s disease, developers of stem cells modified to cure diabetes, and producers of materials for the prolonged, controlled release of drugs all have an obligation to move forward into the unknown. Every new bit of information reveals our ignorance of other information, and the maze of possibilities is impossible to fathom with the unaided mind. Computational tools for large-scale models are being developed and are eagerly awaited by biologists. Computers, even big, multi-CPU parallel machines, are still too slow to be much good as "mind expanders." We need computers that can answer our "what ifs" in the time it takes us to think of the next question. Only then will we be able to critique efficiently the behavior of the models.
We must do our utmost to predict well, not just the direct results of a proposed intervention, but also the secondary and long-term effects. Thus databasing; the development, archiving, and dissemination of simple and complex systems models; and the evaluation (and rejection or improvement) of data and models are all part of the moral imperative. They are the tools necessary for thinking in depth about the problems that accompany, or are created by, interventions in human systems or ecosystems.
The Physiome and the Physiome Project
A physiome can be defined as the quantitative description of the functional state of an organism. A quantitative model is a way of removing contradictions among observations and concepts and creating a consistent, reproducible representation of a system. Like the genome, the physiome can be defined for each species and for each individual within the species. The composite and integrated system behavior of the living organism is described quantitatively in hierarchical sets of mathematical models defining the behavior of the system. The models will be linked to databases of information from a multitude of studies. Without data, there is nothing to model; and without models, there is no source of deep predictive understanding.
The Physiome Project provides one response to the macroethical imperative to minimize risk while advancing medical science and therapy (Bassingthwaighte, 1995; Bassingthwaighte et al., 1991). The project is an effort to define the physiome, through databasing and modeling, of individual species, from bacteria to man. The project began with collaborations among groups of scientists in a few fields and is developing spontaneously as a multinational collaborative effort (Bassingthwaighte, 2000). Investigators first defined goals and then proceeded to put pieces together into impressive edifices (e.g., Hunter and Smaill, 1988; McCulloch et al., 1998; Noble and Rudy, 2001; Popel et al., 1999; Rudy, 2001; Winslow et al., 2000). Via iteration with new experimentation, models can remove contradictions and demonstrate emergent properties. These models are part of the tool kit for the "reverse engineering" of biology. The scale of the models, like the scale of models for weather prediction, presents computational grand challenges.
The Physiome Project is not likely to result in a virtual human being as a single computational entity. Instead, small models linked together will form large integrative systems for analyzing data. There is a growing appreciation of the importance, indeed the necessity, of modeling for analysis and for prediction in biological systems as much as in physical and chemical systems.
The hierarchical nature of biological systems is being used as a guide to the development of hierarchies of models. Models at the molecular level can be based on biophysics, chemistry, energetics, and molecular dynamics, but it is obviously not practical to use molecular dynamics in describing the fluxes through sets of biochemical pathways, just as it is not practical to use the full set of biochemical reactions when describing force-velocity relationships in muscle, or to use the details of myofilament crossbridge reactions when describing limb movement and athletic performance. One cannot build a truck out of quarks.
Biological models can be defined at many hierarchical levels from gene to protein to cell to organ to intact organism. Practical models comprised of sets of linked component models, each somewhat simplified, represent one level of the hierarchy. The strategy is to avoid computing the details of underlying events and to capture, at the higher level, the essence of their dynamic behavior. But monohierarchical models are not necessarily built to adapt to changes in conditions. Handling transients is like using adjustable time steps in systems of stiff equations, but more complicated; the lower level model must be used to correct the higher level representation. Once we have very good models that extend from gene regulation to the functions of the organism, they can be used to predict the short-term and long-term efficacy and side effects of various therapies.
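The strategy of capturing lower-level dynamics in a simplified higher-level representation can be sketched for a single enzymatic step: the lower-level model is the full mass-action mechanism E + S <-> ES -> E + P, and the higher-level representation keeps only the two lumped Michaelis-Menten parameters derived from it. All rate constants are assumed for illustration.

```python
# Lower level: mass-action mechanism E + S <-> ES -> E + P.
# Rate constants are illustrative assumptions, not from the article.
k_on, k_off, k_cat = 100.0, 1.0, 10.0   # 1/(mM*s), 1/s, 1/s
E_total = 0.01                          # total enzyme, mM

# Higher level: the reduced representation keeps two lumped parameters.
Km = (k_off + k_cat) / k_on             # mM
Vmax = k_cat * E_total                  # mM/s

def v_reduced(S):
    """Higher-level rate law, as used in pathway-scale models."""
    return Vmax * S / (Km + S)

def v_detailed(S0, dt=1e-5, t_end=0.05):
    """Integrate the full mass-action system (forward Euler) and measure
    the product-formation rate after the fast binding transient."""
    E, S, ES, P = E_total, S0, 0.0, 0.0
    steps = int(t_end / dt)
    P_mid = 0.0
    for i in range(steps):
        bind, unbind, cat = k_on * E * S, k_off * ES, k_cat * ES
        E += (unbind + cat - bind) * dt
        S += (unbind - bind) * dt
        ES += (bind - unbind - cat) * dt
        P += cat * dt
        if i == steps // 2:
            P_mid = P
    return (P - P_mid) / (t_end / 2)    # mean rate over the second half

# The reduced law reproduces the detailed model's behavior at this level:
print(v_reduced(1.0), v_detailed(1.0))
```

Here the lumped parameters are derived once; in the adaptive, multiscale setting described above, the lower-level model would be re-run whenever conditions change, to re-estimate the higher-level parameters, which is the correction step the transient-handling analogy refers to.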
Like genomic information, biological information and models should be put in the public domain. Parochial attitudes and financial interests should not be allowed to interfere with open access to knowledge and the scientific developments dependent on that knowledge. Obviously, success will require collaborative efforts; no individual investigator or group can make this happen.
The Physiome Project has its own risks and benefits. It is expensive to build, maintain, and revise immense databases. Improving the quality of information and learning how to provide good databases will be more difficult than sequencing the genome. One obvious benefit will be better therapies, achieved by averting catastrophic errors (e.g., by predicting in advance effects like those of thalidomide on embryonic development) and by reducing the costs of bringing new drugs to market. This will take several years. As a society, we have decided that it is unethical to take large risks with human subjects; and we know that offsetting these risks will require national and international investment. Therefore, undertaking large-scale systems bioengineering, such as the Physiome Project, is a macroethical imperative.
This work was supported by the National Simulation Resource for Circulatory Transport and Exchange (Grant RR1243 from NIH, the National Center for Research Resources). See http://nsr.bioeng.washington.edu to download simulation systems (XSIM and JSIM) and transport models.
Bassingthwaighte, J.B. 1995. Toward modeling the human physiome. Pp. 331-339 in Molecular and Subcellular Cardiology: Effects on Structure and Function, S. Sideman and R. Beyar, eds. New York: Plenum Press. Vol. 382 of Advances in Experimental Medicine and Biology.
Bassingthwaighte, J.B. 2000. Strategies for the Physiome Project. Annals of Biomedical Engineering 28: 1043-1058.
Bassingthwaighte, J.B., R. Friesner, B. Honig, C.F. Starmer, and V.Z. Marmarelis. 1991. Modeling and simulation. Pp. 1-15 in NIH/NCRR Workshop on Technologies for the Future: Biomedical Computing for Visualization, Modeling and Decision Support. Bethesda, Md.: National Institutes of Health.
Bassingthwaighte, J.B., L.S. Liebovitch, and B.J. West. 1994. Fractal Physiology. New York: Oxford University Press.
Beard, D.A., S. Liang, and H. Qian. 2002. Energy balance for analysis of complex metabolic networks. Biophysical Journal 83: 79-86.
Bernard, C. 1865. Introduction à l’étude de la médecine expérimentale, par M. Claude Bernard. Paris: J.B. Baillière et fils. (See also the translation: Bernard, C. 1927. An Introduction to the Study of Experimental Medicine. New York: Macmillan Company.)
Dao, N., P.J. McCormick, and C.F. Dewey, Jr. 2000. The human physiome as an information environment. Annals of Biomedical Engineering 28: 1032-1042.
Goldbeter, A. 1996. Biochemical Oscillations and Cellular Rhythms. Cambridge, U.K.: Cambridge University Press.
Hunter, P.J., and B.H. Smaill. 1988. The analysis of cardiac function: a continuum approach. Progress in Biophysics and Molecular Biology 52: 101-164. Available online at: http://www.bioeng.auckland.ac.nz/physiome/physiome.php.
McCulloch, A.D., J.B. Bassingthwaighte, P. Hunter, and D. Noble, editors. 1998. Computational biology of the heart: from structure to function. Progress in Biophysics and Molecular Biology 69(2-3): 153-155. Available online at: http://cardiome.ucsd.edu.
NLM (National Library of Medicine). 2002. The Visible Human.
Noble, D., and Y. Rudy. 2001. Models of cardiac ventricular action potentials: iterative interaction between experiment and simulation. Philosophical Transactions of the Royal Society of London 359A(1783): 1127-1142. Available online at: http://noble.physiol.ox.ac.uk/People/DNoble/.
PDB (Protein Data Bank). 2002. Available online at: http://www.rcsb.org/pdb/.
Popel, A.S., A.R. Pries, and D.W. Slaaf. 1999. Microcirculation physiome project. Journal of Vascular Research 36: 253-255.
Reich, J.G., and E.E. Sel’kov. 1981. Energy Metabolism of the Cell: A Theoretical Treatise. London: Academic Press.
Rudy, Y. 2001. The cardiac ventricular action potential. Pp. 531-547 in Handbook of Physiology, Section 2: The Cardiovascular System. Vol. 1: The Heart, edited by E. Page, H.A. Fozzard, and R.J. Solaro. New York: Oxford University Press. Available online at: http://www.cwru.edu/med/CBRTC/rudy.htm.
Schilling, C.H., D. Letscher, and B.O. Palsson. 2000. Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective. Journal of Theoretical Biology 203: 229-248.
Swissprot. 2002. Available online at: http://www.expasy.ch/sprot/.
Weissig, H., and P.E. Bourne. 1999. An analysis of the protein data bank in search of temporal and global trends. Bioinformatics 15: 807-831.
Winslow, R.L., D.F. Scollan, A. Holmes, C.K. Yung, J. Zhang, and M.S. Jafri. 2000. Electrophysiological modeling of cardiac ventricular function: from cell to organ. Annual Review of Biomedical Engineering 2: 119-155.
Wulf, W.A. 2000. Great achievements and grand challenges. The Bridge 30(3-4): 5-10.