In This Issue
Fall Issue of The Bridge on Agriculture and Information Technology
September 15, 2011 | Volume 41, Issue 3

How Smart-IT Systems are Revolutionizing Agriculture

Monday, September 12, 2011

Authors: Jason K. Bull, Andrew W. Davis, and Paul W. Skroch

Rapid advances in biotechnology, breeding, and agronomics require equally sophisticated information systems.

Corn production in the United States has doubled in the last 40 years (Figure 1), primarily as a result of improved cultivars that exploit agronomic advances in soil management and increased planting density (Hallauer et al., 1988). Monsanto, together with other key industry partners, is committed to helping farmers double the yields of corn, soy, and cotton again by 2030 through a mix of advances in biotechnology traits, breeding, and agronomics.1

Figure 1

This commitment is based on the success of the current generation of genetically improved crops that have changed the face of agriculture by increasing yield, reducing yield variability, reducing the use of pesticide and insecticide, and raising the profit per acre for farmers (Park et al., 2011). Approximately 95 percent of soybeans and 75 percent of corn grown in the United States have been improved through biotechnology. More than 95 percent of soybeans in Argentina and 50 percent in Brazil have also been improved.2

When given a choice, farmers have consistently adopted biotechnology-improved products because of the advantages listed above. These products, the result of sophisticated technological breakthroughs in genomics, breeding, and agronomy, have changed agricultural practices and expectations. For example, insect-resistant crops have reduced the use of pesticides globally compared with conventional crops (Carpenter, 2010).

Agricultural companies continue to develop seed products with a variety of traits and characteristics adapted to specific environmental regions. The next generation of products promises further improvements in yield and stress traits (e.g., improved drought tolerance), better use of nitrogen fertilizer by the plant, and broader spectrum insect protection. Breakthrough advances in breeding methodologies (e.g., molecular markers and sequencing) promise higher yielding cultivars, and targeting the resultant seed products to optimal management practices promises additional yield gains (Figure 2).

Figure 2
 
Rapid advances in these complex scientific areas require equally sophisticated information systems. In this article we discuss the changing role of information technology (IT) systems in the evolution of research on seeds and traits as well as on the future of agriculture. We also discuss how systems are evolving to accelerate decisions and opportunities for cross-industry partnerships to develop the next generation of “smart IT” systems in agriculture.

Information, the Heart of Modern Agriculture

Information is essential to modern agriculture, from the creation of new hybrid and varietal products to the placement of products in the correct management zones to capturing value at harvest time. Choosing the correct products for a given farm field requires accurate information about traits in the germplasm, as well as the overall performance of those products. As the combinations of biotechnology traits in products become more complex, companies must provide more information to farmers about traits, product performance, and product fit to agronomic practices.

Recommending the right product for the right management environment or zone involves analyzing many variables (e.g., soil types, drainage patterns, yield in past years, insect pressure, disease pressure). Such analyses require that companies collect information about product performance at refined environmental scales and translate this information into recommendations for farmers. Increasingly, the product is information itself as much as it is seeds and traits.

The development of new products has become a race to integrate and leverage the vast amount of information produced by industrial research and development (R&D) pipelines and industry partners. The increasing complexity of information required for product creation, placement, and value capture requires sophisticated IT systems engineered to handle the scale of information and to interrogate that information to support intelligent decisions.

Success in the marketplace is directly linked to the efficiency and accuracy of translating research information into decisions about products. Thus, IT systems merit significant investment and are considered a key factor in a company’s competitive advantage.

Pipelines for Biotechnology and Breeding

Demand for biotechnology traits and locally adapted germplasm has led to the development of agricultural R&D pipelines similar in overall form to those used in the pharmaceutical industry (Figure 3). Product advancement decisions are made in distinct phases. In each phase, candidate products are culled, leaving only the most promising to move on to the next phase. Testing in subsequent phases is increasingly stringent to ensure that the future commercialized product will meet the needs of customers.

Figure 3

Information is generated and consumed by thousands of scientists in high-throughput product development and assessment pipelines. Monsanto has two primary pipelines, one for biotechnology and one for breeding (Figure 4). The biotechnology pipeline identifies new and novel genes for traits, such as insect resistance, herbicide tolerance, and drought tolerance. The breeding pipeline creates germplasm with advantageous native genes that is locally adapted to specific geographic regions across the globe.

Figure 4

Laboratory and field work, each of which contributes to screening and the understanding of novel candidate genes and germplasm, are integrated in both pipelines. Information collection is industrial in scale and often highly automated.

In the biotechnology pipeline, for example, genomics and molecular biology tools are used to select and characterize thousands of genes every year. The pipeline is organized into a series of steps, including gene nomination, sequencing, cloning, transformation, and greenhouse and field testing. Each step involves integrated laboratory or field processes that together provide a comprehensive view of gene function and quality.

Marker-assisted breeding, a technique in which DNA markers are used to identify individuals with favorable gene combinations, is used in the breeding pipeline. Individual seeds are then “chipped,” and the shavings are used to analyze the DNA genotype of each seed (Figure 5). Millions of samples are processed and billions of data points are generated every year.

Figure 5
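The selection step described above can be illustrated with a minimal sketch: keep only the seeds whose genotype carries the favorable allele combination at every marker of interest. The marker names and genotype codes here are hypothetical illustrations, not Monsanto's actual markers or pipeline logic.

```python
# Minimal sketch of marker-assisted selection: retain only the seeds whose
# genotype matches the favorable allele at every marker of interest.
# Marker names and genotype codes are hypothetical.

FAVORABLE = {"mkr_drought_1": "AA", "mkr_insect_3": "AA"}  # target genotypes

def select_seeds(seed_genotypes):
    """Return IDs of seeds matching the favorable genotype at all markers."""
    selected = []
    for seed_id, genotype in seed_genotypes.items():
        if all(genotype.get(m) == g for m, g in FAVORABLE.items()):
            selected.append(seed_id)
    return selected

seeds = {
    "seed-001": {"mkr_drought_1": "AA", "mkr_insect_3": "AA"},
    "seed-002": {"mkr_drought_1": "AB", "mkr_insect_3": "AA"},
    "seed-003": {"mkr_drought_1": "AA", "mkr_insect_3": "BB"},
}
print(select_seeds(seeds))  # only seed-001 carries both favorable genotypes
```

At production scale this filter runs over millions of seed-chip genotypes, which is why the step must be automated rather than inspected by hand.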

The chipping, genotyping, and planting of predicted best selections is extremely time sensitive and highly automated. Monsanto runs the largest field testing network in the world. Seeds are planted in millions of field plots every year in thousands of locations around the globe, and information about multiple characteristics of each resulting plant is collected (e.g., yield, moisture content, plant height, disease reactions). Evaluations are conducted in differing geographies many times per year.

The logistical operation of this network involves management of seed inventory, distribution, and experimental design for every plot planted, harvested, and analyzed. Millions of data points are analyzed for the North American harvest alone.

Finally, the products advanced through both the biotechnology and breeding pipelines are integrated (again using genetic crosses, field plot evaluations, and molecular markers) to create superior products with the new genes of interest in the best locally adapted germplasm. Creating novel biotechnology products and assessing product performance on this massive scale can only be done with advanced IT systems.

The IT Pipeline

It can take 10 to 15 years to discover and commercialize new genetically enhanced products. The process requires standardized procedures and regulated testing to verify trait expression and farm value. To orchestrate this multistep, multistage, multiyear, interdependent evaluation process, IT systems must model the pipeline so candidate products can be tracked and capacity optimized, results analyzed and interpreted, and compliance ensured throughout. In other words, it takes an IT pipeline to manage an R&D pipeline.

For example, managing a global-scale plant biotechnology pipeline requires an IT workflow system that connects laboratory assessment points (gene nomination, gene construction, transformation) seamlessly to track detailed information to understand and evaluate genes for various traits. The system brings transparency to the pipeline, enables decisions about gene advancements, and eliminates laborious manual tracking.
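A workflow system of this kind can be sketched as a simple stage tracker: each gene candidate moves through the ordered laboratory steps named in the text, and the system records where every candidate stands, eliminating manual tracking. The class and identifiers below are illustrative, not an actual Monsanto system.

```python
# Sketch of a pipeline workflow tracker: each gene candidate advances through
# ordered stages, and the tracker records every candidate's current position.
# Stage names follow the steps described in the text; the code is illustrative.

STAGES = ["nomination", "sequencing", "cloning", "transformation",
          "greenhouse", "field"]

class PipelineTracker:
    def __init__(self):
        self.stage_of = {}  # candidate id -> index into STAGES

    def nominate(self, gene_id):
        self.stage_of[gene_id] = 0

    def advance(self, gene_id):
        """Move a candidate to the next stage, if one remains."""
        i = self.stage_of[gene_id]
        if i + 1 < len(STAGES):
            self.stage_of[gene_id] = i + 1

    def stage(self, gene_id):
        return STAGES[self.stage_of[gene_id]]

tracker = PipelineTracker()
tracker.nominate("gene-42")
tracker.advance("gene-42")
tracker.advance("gene-42")
print(tracker.stage("gene-42"))  # two advancements past nomination: cloning
```

The value of such a system lies less in the data structure than in the transparency it provides: at any moment, the position and history of every candidate is queryable across the whole pipeline.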

Science at the Speed of IT

An organization with the ability to quickly leverage new scientific breakthroughs on a massive scale has a substantial competitive advantage, and IT systems are a core component in the transition from breakthrough science to industrial-scale implementation. For example, breakthroughs more than a decade ago in molecular marker technology, which enabled accurate “tagging” of genes of interest, led to a massive change in the incorporation of novel genes into adapted germplasm. That change could only be made thanks to wholesale innovations in IT systems, which established high-throughput, high-capacity DNA genotyping laboratories, optimized workflows that automatically select plants with the genes of interest in the best backgrounds, and integrated these workflows into the context of an already massive, global breeding pipeline.

Like those of most companies, Monsanto’s R&D pipelines rely on a mix of third-party and highly customized proprietary IT systems that integrate information across workflow steps in each pipeline. These systems make it possible to eliminate many manual steps and replace them with repeatable automation. Having IT systems do much of the work makes it possible for science to be automated and innovations to be leveraged on a massive scale at “the speed of an integrated circuit.”

Building such systems at the desired speed and scale requires creating a new type of IT organization that melds scientific expertise in breeding and biotechnology with computer and industrial engineering. Thus, the IT organization comprises expert scientists in addition to computer engineers—creating a powerful combination of specialized skills in biotechnology, breeding, genomics, bioinformatics, and computer science in a single organization.

This new model for IT enables a rapid transition from emerging R&D needs and breakthroughs to engineered software solutions that industrialize the application of innovations by standardizing workflow steps, automating information production and analysis, and using standardized algorithms to recommend the best seed placement decisions possible.

In the DNA molecular-marker genotyping example, results from genotyping assays must be scored for each individual seed. With millions of samples, this task cannot be done cost effectively by manual inspection of the scores. Automating the process eliminates a workflow bottleneck and a source of human error.
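A toy version of such automated scoring is sketched below. Endpoint genotyping assays typically report two fluorescence signals per sample; samples cluster by allele, and software assigns a call without manual inspection. The thresholds here are illustrative placeholders, not production values or Monsanto's actual algorithm.

```python
# Toy automated genotype scorer: classify each sample as AA, BB, AB, or
# no-call from two fluorescence signals. Thresholds are illustrative only.

def call_genotype(signal_a, signal_b, min_signal=0.2):
    """Classify one sample from its allele-A and allele-B signal strengths."""
    if max(signal_a, signal_b) < min_signal:
        return "NC"                     # too dim to call
    ratio = signal_a / (signal_a + signal_b)
    if ratio > 0.8:
        return "AA"                     # allele-A signal dominates
    if ratio < 0.2:
        return "BB"                     # allele-B signal dominates
    return "AB"                         # both alleles contribute

samples = [(0.95, 0.05), (0.10, 0.90), (0.50, 0.55), (0.05, 0.04)]
print([call_genotype(a, b) for a, b in samples])  # ['AA', 'BB', 'AB', 'NC']
```

A production caller would fit clusters per assay plate rather than use fixed thresholds, but the principle is the same: turn millions of raw signal pairs into calls with no human in the loop.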

Another example in the breeding pipeline is the analysis of field-trial information during harvest, when many thousands of designed experiments must be statistically analyzed to determine which candidate products are the best performers in terms of traits of interest, geographies of interest, and years of evaluation. Because every harvest takes a period of weeks and advancement decisions are time sensitive, analyses must be repeated many times as more information is collected from the field. The data are automatically quality checked and analyzed within hours of harvesting as new information becomes available.
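The rolling nature of this analysis can be sketched as follows: as new plot results arrive from the field, per-candidate summaries are recomputed and candidates re-ranked, so advancement decisions always reflect the latest data. A real pipeline fits statistical models over designed experiments; simple means here are a stand-in, and all names are hypothetical.

```python
# Sketch of rolling harvest analysis: recompute candidate rankings as each
# new plot result arrives. Simple means stand in for the statistical models
# a real pipeline would fit; candidate names are hypothetical.

from collections import defaultdict

class RollingTrialAnalysis:
    def __init__(self):
        self.yields = defaultdict(list)   # candidate -> observed plot yields

    def add_plot(self, candidate, plot_yield):
        self.yields[candidate].append(plot_yield)

    def ranking(self):
        """Candidates sorted by mean yield, best first."""
        means = {c: sum(v) / len(v) for c, v in self.yields.items()}
        return sorted(means, key=means.get, reverse=True)

trial = RollingTrialAnalysis()
for cand, y in [("H1", 180), ("H2", 200), ("H1", 190), ("H2", 184)]:
    trial.add_plot(cand, y)
print(trial.ranking())  # H2 (mean 192) ranks ahead of H1 (mean 185)
```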

Smart R&D Pipelines Rely on Smart IT Systems

To accelerate the creation of the next generation of seed products, the next generation of IT systems must do more than enable decisions and manage complex workflows. These smart IT systems must embody the processes they support, transform scientific intuition into repeatable decisions, and enrich and accelerate resource-constrained pipelines. From the perspective of an R&D pipeline, the focus must change from generating and managing information per se to explicitly defining the decisions made at key points in the pipeline.

The creation of smart IT systems requires a paradigm shift in the role of IT systems in the generation of advanced seed products. Traditionally, IT systems have been considered tools for collecting and managing information used by individuals to make decisions. This mindset focused more on the aggregation of information than on how that information could be used to make optimal decisions and how systems themselves could learn from those decisions in a loop of continuous improvement (Figure 6).

As the volume and variety of information to be aggregated has increased exponentially, decision making based on information aggregation followed by manual decisions by experts has begun to exceed human capacity. For example, selecting a site to conduct a regulated biotechnology field trial seems simple enough until one considers the variety of questions that must be answered to proceed. Do I have the required regulatory permits for the trial? What was grown the previous year in the same field that I should monitor? Are other biotechnology trials nearby? What is the likelihood of the stress of interest occurring (e.g., drought)?

These are just a few of the most straightforward questions that must be answered for any one trial or trait. Now consider testing thousands of genes in multiple backgrounds in thousands of locations each year, globally, in an environment in which product delays can cost hundreds of millions of dollars in lost opportunity.
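Automating this kind of screening means turning each question above into an explicit check that a system can run over thousands of candidate sites. The field names and thresholds below are hypothetical illustrations of the idea, not actual regulatory criteria.

```python
# Sketch of automated trial-site screening: each site-selection question
# becomes an explicit check; a site is eligible only if every check passes.
# Field names and thresholds are hypothetical.

SITE_CHECKS = [
    ("has regulatory permit",     lambda s: s["permit_approved"]),
    ("no conflicting prior crop", lambda s: s["prior_crop"] not in s["monitored_crops"]),
    ("no nearby biotech trial",   lambda s: s["nearest_trial_km"] >= 1.0),
    ("stress likely to occur",    lambda s: s["drought_probability"] >= 0.5),
]

def site_eligible(site):
    """Return (eligible, list of failed checks) for one candidate site."""
    failures = [name for name, check in SITE_CHECKS if not check(site)]
    return (not failures, failures)

site = {"permit_approved": True, "prior_crop": "soy",
        "monitored_crops": {"corn"}, "nearest_trial_km": 2.5,
        "drought_probability": 0.3}
print(site_eligible(site))  # fails only the drought-likelihood check
```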

Smart IT systems designed for decision management have at least three characteristics: they capture the rules and criteria for making a decision, they capture the context in which decisions are made, and they capture the outcome of those decisions so the process can be improved upon (Figure 6).

Figure 6

Defining and Capturing Decisions

The first step in developing a “smart system” is to expose and capture the “decision” itself. The key is to identify decisions in the pipeline that are repeatable and that can be understood as a system of rules that can be applied to similar information repeatedly (Taylor and Raden, 2007). Exposing these rules and criteria requires detailed knowledge about the purpose of experiments and their role in the R&D pipeline.
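Once exposed, such a decision can live as data rather than in a scientist's head, so the same criteria are applied repeatedly and consistently. The sketch below expresses a hypothetical advancement decision as an explicit rule list; the field names and thresholds are illustrative, not actual advancement criteria.

```python
# Sketch of a captured decision: advancement criteria expressed as explicit,
# repeatable rules over candidate data. Fields and thresholds are hypothetical.

RULES = [
    ("yield_vs_check_pct", lambda v: v >= 3.0),   # must beat check hybrid by 3%
    ("lodging_pct",        lambda v: v <= 5.0),   # acceptable standability
    ("disease_score",      lambda v: v <= 2),     # 1 = resistant ... 9 = susceptible
]

def decide(candidate):
    """Advance a candidate only if it satisfies every rule."""
    for field, passes in RULES:
        if not passes(candidate[field]):
            return "cull"
    return "advance"

print(decide({"yield_vs_check_pct": 4.2, "lodging_pct": 3.1, "disease_score": 2}))
print(decide({"yield_vs_check_pct": 1.0, "lodging_pct": 3.1, "disease_score": 2}))
```

Because the rules are data, they can be versioned, reviewed, and tightened between phases without rewriting the system.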

Capturing Decision Context

A smart IT system must do more than enumerate and apply a list of rules that lead to a decision. It must also capture the context—a snapshot of temporally relevant information—in which the decision is made. Capturing the context is important for learning which information is relevant and how it can be optimally weighted.

Historical context can be provided by information warehouses that store not only what decisions were made but also the exact information that was used to make them at the time they were made. In this way, a record can be built of specific information used for a decision, which, over time, can be leveraged to reconstruct the decision process. Over time, a system “memory” is created that can be used to improve and refine future iterations of that decision.
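In code, capturing context amounts to logging, alongside each decision, a timestamped snapshot of exactly the information used to make it; the snapshot stays fixed even as the underlying data continues to change. This minimal sketch uses hypothetical field names.

```python
# Sketch of decision-context capture: store each decision with a timestamped
# deep copy of the inputs used, so the decision can later be reconstructed
# even after the live data changes. Field names are hypothetical.

import copy
import datetime

decision_log = []

def record_decision(name, inputs, outcome):
    decision_log.append({
        "decision": name,
        "made_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "inputs": copy.deepcopy(inputs),   # snapshot, immune to later edits
        "outcome": outcome,
    })

inputs = {"yield_vs_check_pct": 4.2, "trial_locations": 38}
record_decision("advance-to-phase-3", inputs, "advance")
inputs["yield_vs_check_pct"] = 1.0       # live data changes afterward...
print(decision_log[0]["inputs"]["yield_vs_check_pct"])  # snapshot still 4.2
```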

Capturing Decision Outcomes

A smart IT system must also be able to evaluate and “learn” from the outcome of a decision. What was the impact of the decision? Did the products that advanced at one stage continue to advance? Did this candidate product actually perform better, as predicted, when a certain agronomic practice was used?

Humans learn, at least in part, from the consequences of actions, and a smart IT system must learn in a similar way. By capturing outcomes, the decision model can be benchmarked and improved upon, thus transforming expert decisions into corporate assets that can be learned from, measured, modeled, and subsequently improved upon again by the next generation of scientists.
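Benchmarking a decision model against outcomes can be as simple as asking what fraction of advanced candidates actually succeeded in the next phase. The records below are fabricated for illustration; a real system would compute many such metrics over its decision log.

```python
# Sketch of learning from outcomes: measure how often past "advance"
# decisions were vindicated downstream. Records are illustrative.

def decision_accuracy(records):
    """Fraction of advanced candidates that also succeeded in the next phase."""
    advanced = [r for r in records if r["decision"] == "advance"]
    if not advanced:
        return None
    good = sum(1 for r in advanced if r["next_phase_result"] == "success")
    return good / len(advanced)

history = [
    {"decision": "advance", "next_phase_result": "success"},
    {"decision": "advance", "next_phase_result": "failure"},
    {"decision": "advance", "next_phase_result": "success"},
    {"decision": "cull",    "next_phase_result": None},
]
print(decision_accuracy(history))  # 2 of 3 advanced candidates succeeded
```

Tracked over time, a metric like this turns expert judgment into a measurable, improvable corporate asset, exactly the feedback loop described above.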

Partnerships across Industry

Supporting the generation of information from high-throughput pipelines and integrating that information to enable smart IT systems are both massive tasks in terms of scope and scale. Take, for example, the scale of DNA sequencing, which can generate terabytes of information every day. In addition, projections call for exponential growth every 2 to 3 years (Kahn, 2011). Turning these data into information requires analyzing complex DNA sequences and combining that information with other information on gene function to identify new markers and genes.

Facing these challenges and others will require optimized infrastructure, including specialized computing platforms (Schadt et al., 2010), optimized data storage, and networks that support global operations. Designing and implementing these capabilities will be beyond the capacity of a single entity and will require partnerships among public and private institutions that specialize in each of these areas and have experience in applying industry-standard technologies at scale.

Building partnerships with IT providers that have demonstrated expertise in a given computing domain will enable new systems to leverage industry-standard solutions. Partnerships in highly technical areas (e.g., sequence assembly optimization) will also enable the development of sophisticated or custom solutions. Monsanto, for example, has partnered with sequencing equipment manufacturers, universities, specialized hardware manufacturers, and large IT companies to tackle the production and analysis of sequencing information to enable delivery of continually improving seed.

Information, the Next Agriculture Frontier

Agriculture is on the threshold of realizing an exciting and dynamic opportunity to apply and benefit from innovation in IT. The next generation of seed innovations, which will be necessary to double yields by 2030, will depend on how effectively the industry can collect, analyze, and use the explosion of new information to make decisions for the R&D pipeline and the farm. As products become more complex and their selection is driven by more detailed characteristics of local management zones, companies will have to provide more specific on-farm product information. In fact, this information will be as essential as the product itself (i.e., seeds and traits) to realizing maximal yields and economic benefit.

Realizing the goal of doubling agricultural yields by 2030 will require that product performance both improve and become more predictable as multiple traits are targeted to specific stresses and management zones. Optimization of agricultural inputs will become more feasible with the next generation of biotechnology and breeding traits, which will reduce the environmental impact of agriculture by reducing the use of herbicides, insecticides, fungicides, and fertilizers. Combined with advances in farm machinery and precision agriculture, farmers will be able to cultivate the same or more acres more profitably.

These advances mean that agriculture will also become increasingly technology driven, and this “technification” will depend on how we use information. Advances in the extraction and use of information from genomics and molecular breeding will create opportunities for identifying new genes for biotechnology traits. Advances in how we use information to define and use management zones will lead to optimization of product use and seed performance. Advances in information about weather and potential insect infestations will create opportunities to manage risk.

All phases of the agricultural pipeline will be transformed and will require innovations in IT systems to realize the full potential of the seed products under development and to achieve the goal of doubling yields by 2030. Success will be linked to the “intelligence” of smart IT systems, because time-to-market for new products will be linked, in turn, to how quickly and effectively information can be transformed into decisions. Realizing this vision will require new partnerships among seed companies, IT providers, and other key players in the agricultural industry to bring cutting-edge IT to the product development pipeline as well as to the farm.

References

Carpenter, J.E. 2010. Peer-reviewed surveys indicate positive impact of commercialized GM crops. Nature Biotechnology 28(4): 319–321.

Hallauer, A.R., W.A. Russell, and K.R. Lamkey. 1988. Corn Breeding. Pp. 463–554 in Corn and Corn Improvement, 3rd ed., Agronomy Monograph 18, edited by G.F. Sprague and J.W. Dudley. Madison, Wisc.: American Society of Agronomy.

Kahn, S.D. 2011. On the future of genomic data. Science 331(6018): 728–729.

Park, J.R., I. McFarlane, R.H. Phipps, and G. Ceddia. 2011. The role of transgenic crops in sustainable development. Plant Biotechnology Journal 9: 2–21.

Schadt, E.E., M.D. Linderman, J. Sorenson, L. Lee, and G.P. Nolan. 2010. Computational solutions to large-scale data management and analysis. Nature Reviews Genetics 11: 647–657.

Taylor, J., and N. Raden. 2007. Smart (Enough) Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions. Boston, Mass.: Prentice Hall.

FOOTNOTES 

1 See http://www.monsanto.com/ourcommitments/Pages/sustainable-agriculture-producing-more.aspx.

2 See http://www.monsanto.com/newsviews/Pages/do-gm-crops-increase-yield.aspx.


About the Authors: Jason K. Bull is head of Technology Pipeline Solutions, the R&D IT Division of Monsanto. Andrew W. Davis is a leader in Emerging Strategies, and Paul W. Skroch heads the Pipeline Advancement and Analytics Team, both in Technology Pipeline Solutions.