In This Issue
Summer Bridge on Critical Materials
June 15, 2024 Volume 54 Issue 2
The summer issue of The Bridge discusses leveraging new and emerging technologies, infrastructure, innovative approaches, and a resilient supply chain to ensure a stable and reliable supply of critical materials far into the future.

Unlocking Alternative Solutions for Critical Materials via Materials Informatics

Wednesday, June 12, 2024

Authors: Rémi Dingreville, Nathaniel Trask, Brad Lee Boyce, and George Em Karniadakis

Materials informatics can have a profound impact on replacing critical materials to achieve sustainability goals.

Critical materials are materials that are essential for a broad range of modern technologies but subject to supply risks, and for which there are no easy substitutes. The list of materials considered critical depends on who, where, and when you ask. This ambiguity is due to several factors, including geopolitical instability, resource depletion, and environmental concerns. In the US, lithium (Li) has become the poster child for criticality, owing to the rapid rise in electric vehicles and dwindling domestic production. Other examples include beryllium (Be), an important material for solar photovoltaics and electric-vehicle batteries, and neodymium (Nd) and dysprosium (Dy), because of their use in magnets (DOE 2023). A 2023 assessment by the US Department of Energy (Bauer et al. 2023) identified “the electric eighteen” critical materials, which include even materials that are viewed as common, such as copper (Cu) and silicon (Si). While their supply risk is modest, their ubiquity in the energy sector renders any disruption potentially devastating.

The quest for the discovery and manufacturing of new and innovative materials to replace critical materials remains as vital as ever. Future critical materials disruptions will likely need to be solved in a matter of years or even months, rather than the decade or more often quoted as the time needed to mature from materials discovery to commercialization. In addition to this need for agility, a broadly coordinated federal strategy across all industrial sectors must address economic viability, ease of production, domestic availability, and lifecycle environmental impact. Resistance to change within the materials industry, along with a lack of awareness about environmental impacts, can slow down this transition. Regulatory frameworks may not be conducive to promoting sustainability, and the technical challenges of fabricating materials with performance comparable to their traditional counterparts can be daunting. Additionally, limited data availability, existing infrastructure geared towards conventional materials, and market uncertainties can all pose substantial roadblocks. Therefore, to meet economic, industrial, and technological needs, it is imperative to accelerate the discovery of alternatives to critical materials by developing new and disruptive methods to identify materials with the desired properties in a timely and responsive manner.

Researchers and engineers have traditionally used their expertise and intuition, in concert with ab initio and heuristic models, to guide the discovery of new materials. However, machine learning (ML) and artificial intelligence (AI) systems are now surpassing the limits of human intuition for complex tasks such as image recognition (Henaff 2020), materials design and discovery (Liu et al. 2017), and autonomous experiments (Bennett and Abolhasani 2023). These data-driven approaches can also compensate for predictive shortcomings in traditional models arising from assumptions, simplifications, and imperfect calibrations. As AI algorithms become more powerful and accessible, many materials scientists are embracing this emerging scientific domain to accelerate the discovery and development of new materials. Materials informatics, the amalgam of materials science, AI and ML, and advanced data analytics, holds one of the keys to addressing roadblocks to discovering alternative solutions to critical materials. The promise of materials informatics is that the discovery and manufacturing of materials solutions that will replace critical materials can be simultaneously and rapidly optimized by semi-autonomous systems, so that engineers do not have to envision all possible replacement solutions and then painstakingly (and expensively) test each one with build-and-check methods. Instead, engineers can select appropriate algorithms, embed known physical laws and constraints, and assign design and materials objectives. Materials informatics approaches have already proven quite useful for certain materials problems, such as broad and rapid searches across the periodic table (or, more often, a rational subset) to achieve particular alloying effects (see, for example, Wagih and Schuh 2022), although such approaches may be less obviously applicable to difficult-to-predict behaviors such as fatigue life.

In today’s fast-paced economic and technological landscape, a collaborative approach that combines human intuition with ML-guided searches for potential alternative materials presents a promising opportunity to overcome limitations in human cognition (Boyce et al. 2023). For example, over the past decade the rapid emergence of compositionally complex “high entropy” alloys (Miracle and Senkov 2017) has highlighted the potential viability of trillions of unexplored alloy compositions, a regime of phase space that had previously been discounted as a wasteland of impractical intermetallic compounds. Trillions may be an understatement; by one authoritative account, there are 10^177 alloy compositions waiting to be explored (Cantor 2014), even after eliminating rare, toxic, or radioactive elements. And this represents merely the compositional options; alloys also get their useful properties from the details of their processing route. To remain competitive, researchers and engineers need to explore such vast combinatorial solution spaces quickly and efficiently. One specific challenge with critical materials is that they often have unique process-structure-property linkages that are difficult to replicate in other materials. For example, yttrium (Y) and scandium (Sc) are critical materials used as dopants in aluminum nitride (AlN)-based piezoelectric materials because they induce an atomic structure change that enhances the piezoelectric response by nearly 500% (Akiyama et al. 2009). However, Y and Sc are rare earth elements and expensive metals, and their extraction can have negative environmental impacts. Startt and colleagues (2023) used a traditional and intuitive approach to survey twenty-three elements to identify earth-abundant alternative dopants in AlN piezoelectric materials, resulting in improved piezoelectric response.
This type of paradigm, reminiscent of methods employed by Curie and Edison, is constrained by the cognitive limits of subject matter experts and engineers, who must select and observe relevant features based on past training and known physical and chemical principles, anticipate how those features may influence performance, and provide guidance about how to improve performance.
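The size of the estimate attributed to Cantor (2014) above (a 1 followed by 177 zeros) can be made tangible with a back-of-envelope count. The sketch below is only illustrative: it assumes a palette of 60 usable elements and a composition resolution of 0.1% (1,000 distinguishable levels per adjustable component), and it counts compositions as levels^(elements - 1). These inputs and the function name `composition_count` are assumptions of this sketch, not values stated in the text.

```python
def composition_count(n_elements: int, levels_per_component: int) -> int:
    """Crude count of distinct compositions.

    Only n_elements - 1 fractions are independent (the last one is fixed
    because all fractions must sum to 100%), and each is assumed to be
    adjustable in `levels_per_component` distinguishable steps.
    """
    return levels_per_component ** (n_elements - 1)


# Assumed inputs: 60 usable elements, 0.1% resolution -> 1000 levels.
count = composition_count(n_elements=60, levels_per_component=1000)
exponent = len(str(count)) - 1  # power of ten of the result

# Even at one composition tested per second, the time needed dwarfs
# the ~4e17 seconds that have elapsed since the Big Bang.
seconds_needed_exponent = exponent  # one test per second
```

Python integers are arbitrary precision, so the count is exact rather than a floating-point approximation. The point of the exercise is that no brute-force campaign, experimental or computational, can enumerate such a space, which is why guided search is needed.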

Materials informatics can have a profound impact on achieving sustainability goals to replace critical materials, redefining the boundaries of possibility in materials science and ushering in a new era of responsible innovation. Applications of ML and AI to materials science are now commonplace (Butler et al. 2018; Takeda et al. 2020; Zhang and Ling 2018). These approaches are being influenced by the following considerations:

  • Materials data repositories are proliferating rapidly (Blaiszik et al. 2016; Dima et al. 2016; Huck et al. 2016; O’Mara et al. 2016; Puchala et al. 2016). Researchers are exploring the automation of data and knowledge extraction from literature sources, using natural language processing techniques to populate these repositories (Jablonka et al. 2023; Lin et al. 2023). Repositories with relevant information on critical materials remain scarce, however, necessitating new efforts to collect and curate data.
  • AI and ML concepts are also reshaping both the computational and physical laboratory infrastructures used to generate materials data. Surrogate models and digital twins (Kalidindi et al. 2022), trained on physics-based simulations (Montes de Oca Zapiain et al. 2021; Oommen et al. 2022), are actively under development, offering significantly enhanced speed while maintaining accuracy. Automation and guided high-throughput methods (Boyce and Uchic 2019; MacLeod et al. 2022) bring substantial efficiencies and eliminate redundancies in materials synthesis and characterization. In concert, orders-of-magnitude speedups in simulation and experiment are now feasible.
  • In recent years, major strides have been made in using ML to synthesize multimodal data (Baltrušaitis et al. 2018; Shi et al. 2019; Wu and Goodman 2018). Disparate types of data are synthesized to exploit correlation across modalities so that they are more than the sum of their parts. While commonplace in traditional applications of ML (e.g., combining audio, video, and text), this practice has seen limited adoption in materials science, where it can serve a critical role in mitigating the sparsity of data.
  • With the development of ML-accelerated models that are thousands of times faster than conventional simulations, the question arises of which previously intractable analyses are now possible. Can we fuse physics-based models with purely data-driven modalities to identify rapidly measurable “fingerprints” of material performance? Tools from unsupervised learning could potentially enable the discovery of unconventional measurements more amenable to high-throughput collection, allowing an order-of-magnitude increase in the number of candidate alternatives to critical materials (Raccuglia et al. 2016).

Given the convergence of advances in high-throughput data collection and scientific ML, we explore here the gaps and opportunities in developing a candidate workflow (see figure 1) appropriate for the rapid discovery of alternatives to critical materials. While we focus here on the promise of high throughput and ML as an increasingly viable acceleration pathway, in a previous article (Boyce et al. 2023) we discussed the psychological, intellectual, infrastructural, and algorithmic barriers to adoption of such a workflow.

[Figure 1]

Data Infrastructure for Exploring Alternative Materials Solutions to Critical Materials

The discovery and deployment of alternative materials solutions require the involvement of disparate communities ranging from materials science, chemistry, physics, data science, AI, and computer science to manufacturing, economics, and sustainability. Each community has unique terminologies, requirements, priorities, and standards when it comes to critical materials. Additionally, data on materials (whether they are critical or not) and their associated properties and physical and chemical characterization remain largely uncurated, under-explored, and not integrated, mostly because information is compartmentalized among the various communities involved. Integration and standardization across these communities, along with the associated databases and data management infrastructures, are critical to an efficient discovery ecosystem for alternatives to critical materials. A number of efforts are currently attempting to federate data storage, data sharing, and data analysis in a similar fashion to existing data management workflows (Warren and Ward 2018). See, for example, Materials Data Facility (Blaiszik et al. 2016), Materials Commons (Puchala et al. 2016), Citrination (O’Mara et al. 2016), MPContribs (Huck et al. 2016), and the Materials Genome Initiative (Dima et al. 2016). The specific challenge regarding critical materials, however, is the scarcity of data. In fact, much of the corpus of data on critical materials is currently embedded in various publications and awaits conversion to standardized and interpretable (by humans or machine algorithms) knowledge via natural language processing (Tshitoyan et al. 2019).
There is an opportunity today to employ such language processing tools not only to build a large repository of known (non-critical) materials but also to catalog knowledge of failures, which would limit redundant and unsuccessful efforts and increase the likelihood of successful outcomes, reproducibility, and traceability when searching for viable alternatives. Such natural language processing may also help to identify and mitigate erroneous or conflicting observations. While open materials data sharing platforms are virtuous endeavors, as those data sources become increasingly commercially useful, there will be growing questions of ownership, remuneration, and intellectual property.
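To make the idea of cataloging failures alongside successes concrete, the following is a minimal sketch of what a standardized, traceable record might look like. The schema, the field names, the placeholder element "X", and all values are hypothetical and chosen only for illustration; none of them come from the repositories cited above.

```python
from dataclasses import dataclass
from typing import Optional


def composition_key(fractions: dict) -> tuple:
    """Order-independent key so the same composition always matches."""
    return tuple(sorted(fractions.items()))


@dataclass(frozen=True)
class ExperimentRecord:
    composition: tuple      # output of composition_key()
    property_name: str      # e.g. a piezoelectric coefficient label
    value: Optional[float]  # None when no usable value was obtained
    unit: str
    succeeded: bool         # failures are stored, not discarded
    source: str             # provenance, for traceability


def prior_failures(records, fractions, property_name):
    """Return earlier unsuccessful attempts on the same composition."""
    key = composition_key(fractions)
    return [
        r for r in records
        if r.composition == key
        and r.property_name == property_name
        and not r.succeeded
    ]


# Illustrative entries only; "X" is a placeholder dopant, not real data.
records = [
    ExperimentRecord(composition_key({"Al": 0.45, "X": 0.05, "N": 0.5}),
                     "d33", None, "pC/N", False, "placeholder-source-1"),
    ExperimentRecord(composition_key({"Al": 0.40, "X": 0.10, "N": 0.5}),
                     "d33", 7.0, "pC/N", True, "placeholder-source-2"),
]

# Before repeating an experiment, check whether it already failed.
repeats = prior_failures(records, {"N": 0.5, "X": 0.05, "Al": 0.45}, "d33")
```

Keeping the provenance field mandatory is one simple way to support the traceability and conflict-resolution goals described above, since conflicting values can be traced back to their sources.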

Rethinking Computational and Experimental Infrastructures for Rapid Explorations of Alternative Materials Solutions

Today, synthesizing, characterizing, and modeling new materials is a complex and challenging task that requires specialized knowledge and skills. Even among experts, the level of specialization required for different processing, characterization, and modeling techniques can make it difficult to explore all possible solutions for a given application. This is especially true for discovering replacements for critical materials, which often require a deep understanding of a specific class of materials. As a result, large portions of the materials space remain unexplored, not because researchers lack the ability to imagine new alternative materials with promising functionality but rather because they lack the tools and expertise to rapidly synthesize, characterize, and model these materials. Automated, high-throughput experimental methods and advanced modeling tools are, however, becoming increasingly democratized and accessible.

Combining traditional and unconventional testing methods with simulation data can accelerate the discovery of new materials and their processing routes. Unconventional, high-throughput testing methods allow for easy automated data collection and ensure that the data is collected systematically, in a way that is amenable to subsequent automated data integration and analysis while limiting human intervention and associated bias and data corruption (Bassett et al. 2023). These unconventional data sources may be particularly advantageous as surrogates for expensive and time-consuming measurands such as fatigue life, creep resistance, and even fracture toughness or tensile ductility. Synthetic data from simulation can also be used in conjunction with experimental measurements. Algorithms needed for the discovery and understanding of alternative materials solutions to critical materials cannot be simply derived or adapted from existing “black-box” ML algorithms, because those algorithms were designed for physics-agnostic computer science applications. Algorithm inputs for materials science are often sparse and heterogeneous and can only be understood in the context of the chemical and physical laws that govern materials. Recent developments in physics-informed neural networks (Karniadakis et al. 2021) and in replacing numerical solvers with machine-learned solvers (Montes de Oca Zapiain et al. 2021; Oommen et al. 2022) provide techniques to combine first-principles physics with traditional ML. Many promising applications have seen acceleration over traditional simulation by a factor of a thousand. By integrating these types of fast experimental and computational surrogate tools, a rapid feedback loop could be designed to synthesize, characterize, and model novel materials solutions with process-structure-property linkages similar to those observed in critical materials.
While some efforts in this arena aim for completely autonomous discoveries without a human in the loop, there is value in keeping a subject matter expert engaged in the process to audit the execution, add interpretative value, mitigate pathological extrapolations, and even contribute their dexterity for rare physical tasks (Boyce et al. 2023).
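The core idea behind physics-informed learning, adding the residual of a governing equation to the data-fitting loss, can be shown in a few lines. The sketch below is a deliberately simplified stand-in: it replaces the neural network with a quadratic polynomial so that the fit reduces to linear least squares, and it uses a toy decay law du/dt = -K·u as the "physics". The equation, the two data points, and all names are assumptions made for illustration.

```python
import math

K = 1.0                               # assumed known rate in du/dt = -K*u
DATA = [(0.0, 1.0), (1.0, 0.37)]      # two sparse, hypothetical measurements
COLLOCATION = [i / 20 for i in range(21)]  # points where physics is enforced


def basis(t):
    return [1.0, t, t * t]            # u(t) = c0 + c1*t + c2*t^2


def basis_dt(t):
    return [0.0, 1.0, 2.0 * t]        # derivative of each basis function


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(physics_weight=1.0):
    """Minimize data misfit + physics_weight * mean squared ODE residual."""
    rows = [(basis(t), u, 1.0) for t, u in DATA]
    for t in COLLOCATION:
        # Residual du/dt + K*u is linear in the coefficients; target is 0.
        row = [d + K * p for d, p in zip(basis_dt(t), basis(t))]
        rows.append((row, 0.0, physics_weight / len(COLLOCATION)))
    ata = [[0.0] * 3 for _ in range(3)]
    aty = [0.0] * 3
    for row, target, w in rows:       # accumulate weighted normal equations
        for i in range(3):
            aty[i] += w * row[i] * target
            for j in range(3):
                ata[i][j] += w * row[i] * row[j]
    return solve(ata, aty)


def predict(coeffs, t):
    return sum(c * p for c, p in zip(coeffs, basis(t)))


coeffs = fit()
midpoint_error = abs(predict(coeffs, 0.5) - math.exp(-K * 0.5))
```

Two measurements alone cannot determine three coefficients; it is the physics residual that closes the system. This is a small-scale picture of how embedding known laws can compensate for sparse data.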

Exploiting Multimodality for Accelerated Discovery

The chief hurdle in applying ML to the identification of alternatives to critical materials is the dearth of data. Experiments are slow, expensive, and classically conducted in an artisanal manner to design a precise measurement that reveals a specific property. This is antagonistic to the rapid search of design space necessary to identify alternatives to critical materials. An underutilized dimension of materials data is the broad availability of multimodal characterization (X-ray diffractograms, scanning electron microscopy, synchrotron experiments, tribological testing, etc.). There is a hierarchy of modality quality and speed, ranging from information-dense but slow to information-sparse but fast. Additionally, there are modalities available during fabrication and characterization that are typically neglected, such as audio and video of the synthesis process. Multimodal learning aims to integrate these in a manner that exploits correlation across modalities; for instance, by combining two-dimensional images of two sides of a coin, we can understand the full three-dimensional object.

While early efforts to machine learn material responses from datasets were successful in the 1990s, they often relied upon intuition-intense feature engineering and supervised learning, whereby a human needed to empirically identify quantities of interest that exposed correlation across modalities (e.g., Atz et al. 2021; Weininger 1988). In modern unsupervised and semi-supervised learning, it is routine to autonomously identify features without human intervention. This is crucial because datasets now contain many different types of modalities, overwhelming human cognition. To grapple with this challenge of dimensionality, materials scientists have historically reduced data to low-dimensional descriptors (e.g., reducing an X-ray diffractogram to the location and height of peaks). ML-aided identification of material “fingerprints” could avoid this indiscriminate loss of information. In recent years, advances in the synthesis of text, audio, and image data have provided powerful new tools that are ripe for application to materials datasets: fusion of modalities through the tokens of a transformer (Wang et al. 2022; Xu et al. 2023), variational inference frameworks for product-of-expert embeddings (Wu and Goodman 2018; Xu et al. 2021), and multimodal graph embeddings (Tao et al. 2020).
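The product-of-experts idea mentioned above has a compact closed form when every expert is Gaussian: precisions add, and the fused mean is the precision-weighted average. The sketch below shows this for a single scalar latent variable. The modality names and numbers are hypothetical, and a real multimodal model would learn each expert with an encoder network rather than take its mean and variance as given.

```python
def product_of_gaussian_experts(experts, prior=(0.0, 1.0)):
    """Fuse Gaussian beliefs (mean, variance) about one shared latent.

    The product of Gaussian densities is again Gaussian, with
    precision = sum of precisions and
    mean = precision-weighted average of the means.
    The prior is treated as one more expert.
    """
    beliefs = [prior, *experts]
    total_precision = sum(1.0 / var for _, var in beliefs)
    weighted_mean = sum(mu / var for mu, var in beliefs) / total_precision
    return weighted_mean, 1.0 / total_precision


# Hypothetical per-modality beliefs about the same latent "fingerprint".
xrd_expert = (1.0, 0.5)   # e.g. inferred from a diffractogram
sem_expert = (2.0, 0.5)   # e.g. inferred from a micrograph

fused_mean, fused_var = product_of_gaussian_experts([xrd_expert, sem_expert])

# A missing modality is handled by simply leaving its expert out.
xrd_only_mean, xrd_only_var = product_of_gaussian_experts([xrd_expert])
```

Because precisions add, the fused variance is never larger than that of any single expert, which is one way to read the claim that combined modalities are more than the sum of their parts. The ability to drop an expert without retraining is also what makes this form attractive when slow modalities are available for only a few samples.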

With these tools in hand, it is possible to revisit the classical problem of design of experiments and rapidly explore the space of alternatives to critical materials. If one can identify a correlation between the fingerprints of fast and slow modalities, it becomes possible to move beyond process-property maps to process-structure-property maps while maintaining high-throughput workflows. For this approach to be successful, the necessary multimodal data must satisfy the five V’s of big data: veracity (“good,” clean data free from measurement errors or corruptions), value (providing the necessary information related to the objective), variety (spanning a sufficient range to allow for useful interpolations rather than extrapolations), velocity (acquired in a timely manner), and volume (lots of it for data-hungry algorithms!).

Generative Modeling for Scientific Discovery and Understanding

A comprehensive and exhaustive exploration of high-dimensional design spaces rapidly becomes combinatorially intractable (estimates run as high as 10^177 for the number of potential alloys left to be explored [Cantor 2014]), necessitating intelligent iterative algorithms and an active learning approach that maximizes incremental information gain while reducing uncertainty. The proposal of candidate materials and associated manufacturing protocols can be posed as a generative modeling problem. One can start from a database of materials with corresponding process-structure-property maps and apply iterative, guided exploit-and-explore strategies to efficiently search for a new candidate replacement material. While classical tools like Bayesian optimization (Shahriari et al. 2015) and genetic algorithms (Tao et al. 2020) are mature, recent advances in deep learning architectures allow access to significantly more complex generative processes. Some material properties, like creep life, are difficult to predict on a purely computational basis and will require extensive investments in experimental data as part of such a search strategy, further necessitating highly selective active learning approaches.
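An exploit-and-explore loop of the kind described above can be sketched without any ML library. The example below is a simplification, not Bayesian optimization proper: a kernel-weighted average stands in for the surrogate model, and the distance to the nearest tested candidate stands in for predictive uncertainty. The one-dimensional design variable, the `run_experiment` stand-in, and the parameters `length_scale` and `kappa` are all hypothetical.

```python
import math

CANDIDATES = [i / 100 for i in range(101)]  # hypothetical design variable


def run_experiment(x):
    """Stand-in for a slow measurement; toy property peaking at x = 0.7."""
    return -(x - 0.7) ** 2


def surrogate_mean(x, observed, length_scale=0.1):
    """Kernel-weighted average of results seen so far (exploit term)."""
    weights = [math.exp(-((x - xi) ** 2) / (2 * length_scale ** 2))
               for xi, _ in observed]
    return sum(w * yi for w, (_, yi) in zip(weights, observed)) / sum(weights)


def uncertainty(x, observed):
    """Crude proxy: distance to the nearest tested point (explore term)."""
    return min(abs(x - xi) for xi, _ in observed)


def next_candidate(observed, kappa=1.0):
    """Pick the untested candidate with the best optimistic score."""
    tested = {xi for xi, _ in observed}
    untested = [x for x in CANDIDATES if x not in tested]
    return max(
        untested,
        key=lambda x: surrogate_mean(x, observed) + kappa * uncertainty(x, observed),
    )


# Seed with three measurements, then let the loop choose ten more.
observed = [(x, run_experiment(x)) for x in (0.0, 0.5, 1.0)]
for _ in range(10):
    x_new = next_candidate(observed)
    observed.append((x_new, run_experiment(x_new)))

best_x, best_y = max(observed, key=lambda pair: pair[1])
```

Here 13 of 101 candidates are measured in total. Raising `kappa` favors exploration of untested regions, while lowering it favors exploitation near the best results so far; for expensive properties such as creep life, that balance determines how many experiments the search consumes.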

While DALL-E and variants of Stable Diffusion (Ramesh et al. 2022) are commonplace (a prompt “cat playing basketball on the moon” yields a strikingly realistic image), work is still required to robustly produce the materials science equivalent (asking ChatGPT for a “manufacturing protocol for an alternative to gold with high conductivity” yields less impressive results). Physics must be integrated into the generative process to ensure exploitable results that are physically viable. In the graph neural network community, major strides have been made in drug discovery and biomechanics by mapping from molecular configuration to performance (Sohl-Dickstein et al. 2015; Wieder et al. 2020; Jumper et al. 2021; Bengio et al. 2021). These problems are purely geometric, however; attempting to map directly from molecular configuration to a quantity of interest with no representation of the underlying dynamic physical process is more complicated. With the vast acceleration of ML-driven surrogates over traditional simulation, there is an opportunity to integrate physical processes into generative ML architectures. Further, there is a need to advance the underlying simulation capabilities: even tasks such as predicting the yield strength of complex alloys based on individual dislocation simulations are computationally challenging. In the near future, purely computational approaches will be limited to those material properties that are more confidently predictable.

Several recent works attempt generative modeling using graphs as abstractions for designs. GFlowNets (Schölkopf et al. 2021), pioneered by Yoshua Bengio’s group at Mila, model the generative process by traversing a graph to encode a sequence of decisions. In the causal ML community, cause-effect relationships are encoded as edges (A→B if A causes B) in a directed acyclic graph to build structural causal models (Richens et al. 2020). These techniques have proven very effective in bioinformatics when disentangling the root cause of diseases from biomarkers (Pavlović et al. 2024), and integrating physical models into generative processes could yield the same success when searching for alternatives to critical materials.
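The graph encoding used in structural causal models can be illustrated with a toy example. In the sketch below, an edge from A to B means "A causes B"; the variable names and the causal structure are hypothetical, chosen only to echo the process-structure-property chain discussed earlier. It uses the standard-library `graphlib` module (Python 3.9+) to confirm the graph is acyclic and to order variables so that causes precede effects.

```python
from graphlib import TopologicalSorter

# Hypothetical cause -> effects edges (A -> B if A causes B).
CAUSES = {
    "dopant_fraction": ["lattice_distortion"],
    "anneal_temperature": ["grain_size", "lattice_distortion"],
    "lattice_distortion": ["piezo_response"],
    "grain_size": ["piezo_response"],
}


def parents_of(graph):
    """Invert the edge list: map each variable to its direct causes."""
    parents = {}
    for cause, effects in graph.items():
        parents.setdefault(cause, set())
        for effect in effects:
            parents.setdefault(effect, set()).add(cause)
    return parents


def root_causes(graph, target):
    """Walk edges backwards from target; keep ancestors with no parents."""
    parents = parents_of(graph)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return {node for node in seen if not parents[node]}


# static_order() raises graphlib.CycleError if the graph is not acyclic,
# and otherwise lists every cause before its effects.
order = list(TopologicalSorter(parents_of(CAUSES)).static_order())

roots = root_causes(CAUSES, "piezo_response")
```

In this toy graph the root causes of the property are the two controllable process variables, which is the kind of disentangling a causal model is meant to provide. A full structural causal model would additionally attach a functional relationship and a noise term to each edge set.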

A Pathway Towards a Future with Less Reliance on Critical Materials

Envisioning a future with reduced dependence on critical materials, several pathways emerge through materials informatics and innovative approaches in materials science. These innovations hinge on collaborative efforts, data infrastructures, and accelerated capabilities to characterize, model, and process such data. Establishing robust data infrastructures plays a pivotal role in providing seamless access to precompetitive materials data, empowering both scientific research and engineering applications. Concurrently, rethinking computational and experimental infrastructures enables rapid exploration of those databases, allowing for more efficient identification and testing of alternative material solutions. Embracing multimodality in those research approaches has the potential to accelerate the discovery of alternatives to critical materials by providing deeper fingerprints and revealing unknown correlations, ultimately guiding us toward sustainable solutions. Additionally, leveraging generative modeling techniques, rooted in the fast-changing fields of ML and AI, offers a fresh lens for scientific understanding and discovery. These models learn from existing data to generate new candidate materials, providing the means to explore and exploit high-dimensional design spaces where new materials solutions to replace critical materials are waiting to be discovered. These integrated strategies not only expand our current understanding of the unique roles critical materials play in materials technology but also pave the way for tangible solutions for a more sustainable future.

Acknowledgments

Trask and Karniadakis’ work is supported by SEA-CROGS, a Department of Energy MMICCs center for next-generation scientific machine learning. This work is supported by the Center for Integrated Nanotechnologies (CINT), an Office of Science user facility operated for the U.S. Department of Energy. This article has been authored by an employee of National Technology & Engineering Solutions of Sandia, LLC under Contract No. DE-NA0003525 with the US Department of Energy (DOE). The employee owns all rights, title and interest in and to the article and is solely responsible for its contents. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this article or allow others to do so, for United States Government purposes. The DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan www.energy.gov/downloads/doe-public-access-plan.

References

Akiyama M, Kamohara T, Kano K, Teshigahara A, Takeuchi Y, Kawahara N. 2009. Enhancement of piezoelectric response in scandium aluminum nitride alloy thin films prepared by dual reactive cosputtering. Advanced Materials 21(5):593–96.

Atz K, Grisoni F, Schneider G. 2021. Geometric deep learning on molecular representations. Nature Machine Intelligence 3(12):1023–32.

Baltrušaitis T, Ahuja C, Morency LP. 2018. Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(2):423–43.

Bassett KL, Watkins T, Coleman J, Bianco N, Bailey LS, Pillars J, Williams SG, Babuska TF, Curry J, DelRio FW, and 5 others. 2023. A workflow for accelerating multimodal data collection for electrodeposited films. Integrated Materials and Manufacturing Innovation 12:430–40.

Bauer D, Khazdozian H, Mehta J, Nguyen RT, Severson MH, Vaagensmith BC, Toba L, Zhang B, Hossain T, Sibal AP, and 1 other. 2023. 2023 Critical Materials Strategy. No. INL/RPT-23-72323-Rev001. Idaho National Laboratory. Idaho Falls, Idaho.

Bengio E, Jain M, Korablyov M, Precup D, Bengio Y. 2021. Flow network based generative models for non-iterative diverse candidate generation. Advances in Neural Information Processing Systems 34:27381–94.

Bennett JA, Abolhasani M. 2023. Autonomous chemical science and engineering enabled by self-driving laboratories. Current Opinion in Chemical Engineering 36:100831.

Blaiszik B, Chard K, Pruyne J, Ananthakrishnan R, Tuecke S, Foster I. 2016. The materials data facility: Data services to advance materials science research. JOM 68(8):2045–52.

Boyce B, Dingreville R, Desai S, Walker E, Shilt T, Bassett KL, Wixom RR, Stebner AP, Arroyave R, Hattrick-Simpers J, Warren JA. 2023. Machine learning for materials science: Barriers to broader adoption. Matter 6(5):1320–23.

Boyce BL, Uchic MD. 2019. Progress toward autonomous experimental systems for alloy development. MRS Bulletin 44(4):273–80.

Butler KT, Davies DW, Cartwright H, Isayev O, Walsh A. 2018. Machine learning for molecular and materials science. Nature 559(7715):547–55.

Cantor B. 2014. Multicomponent and high entropy alloys. Entropy 16(9):4749–68.

DOE (US Department of Energy). 2023. Notice of Final Determination on 2023 DOE Critical Materials List, Aug 4. Online at www.federalregister.gov/documents/2023/08/04/2023-16611/notice-of-final-determination-on-2023-doe-critical-materials-list.

Dima A, Bhaskarla S, Becker C, Brady M, Campbell C, Dessauw P, Hanisch R, Kattner U, Kroenlein K, Newrock M, Peskin A. 2016. Informatics infrastructure for the materials genome initiative. JOM 68:2053–64.

Henaff O. 2020. Data-efficient image recognition with contrastive predictive coding. International Conference on Machine Learning, PMLR 119:4182–92.

Huck P, Gunter D, Cholia S, Winston D, N’Diaye AT, Persson K. 2016. User applications driven by the community contribution framework MPContribs in the Materials Project. Concurrency and Computation: Practice and Experience 28(7):1982–93.

Jablonka KM, Ai Q, Al-Feghali A, Badhwar S, Bocarsly JD, Bran AM, Bringuier S, Brinson LC, Choudhary K, Circi D, and 44 others. 2023. 14 examples of how LLMs can transform materials science and chemistry: A reflection on a large language model hackathon. Digital Discovery 2(5):1233–50.

Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596(7873):583–89.

Kalidindi SR, Buzzy M, Boyce BL, Dingreville R. 2022. Digital twins for materials. Frontiers in Materials 9:818535.

Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L. 2021. Physics-informed machine learning. Nature Reviews Physics 3(6):422–40.

Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Smetanin N, Verkuil R, Kabeli O, Shmueli Y, dos Santos Costa A. 2023. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379(6637):1123–30.

Liu Y, Zhao T, Ju W, Shi S. 2017. Materials discovery and design using machine learning. Journal of Materiomics 3(3):159–77.

MacLeod BP, Parlane FG, Brown AK, Hein JE, Berlinguette CP. 2022. Flexible automation accelerates materials discovery. Nature Materials 21(7):722–26.

Miracle DB, Senkov ON. 2017. A critical review of high entropy alloys and related concepts. Acta Materialia 122:448–511.

Montes de Oca Zapiain D, Stewart JA, Dingreville R. 2021. Accelerating phase-field-based microstructure evolution predictions via surrogate models trained by machine learning methods. npj Computational Materials 7(1):3.

O’Mara J, Meredig B, Michel K. 2016. Materials data infrastructure: a case study of the citrination platform to examine data import, storage, and access. JOM 68(8):2031–34.

Oommen V, Shukla K, Goswami S, Dingreville R, Karniadakis GE. 2022. Learning two-phase microstructure evolution using neural operators and autoencoder architectures. npj Computational Materials 8(1):190.

Pavlović M, Hajj GSA, Kanduri C, Pensar J, Wood ME, Sollid LM, Greiff V, Sandve GK. 2024. Improving generalization of machine learning-identified biomarkers using causal modelling with examples from immune receptor diagnostics. Nature Machine Intelligence 6:15–24.

Puchala B, Tarcea G, Marquis EA, Hedstrom M, Jagadish HV, Allison JE. 2016. The materials commons: A collaboration platform and information repository for the global materials community. JOM 68:2035–44.

Raccuglia P, Elbert KC, Adler PD, Falk C, Wenny MB, Mollo A, Zeller M, Friedler SA, Schrier J, and 1 other. 2016. Machine-learning-assisted materials discovery using failed experiments. Nature 533(7601):73–6.

Ramesh A, Dhariwal P, Nichol A, Chu C, Chen M. 2022. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125.

Richens JG, Lee CM, Johri S. 2020. Improving the accuracy of medical diagnosis with causal machine learning. Nature Communications 11(1):3923.

Schölkopf B, Locatello F, Bauer S, Ke NR, Kalchbrenner N, Goyal A, Bengio Y. 2021. Toward causal representation learning. Proceedings of the IEEE 109(5):612–34.

Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N. 2015. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE 104(1):148–75.

Shi Y, Paige B, Torr P. 2019. Variational mixture-of-experts autoencoders for multi-modal deep generative models. Advances in Neural Information Processing Systems 32.

Sohl-Dickstein J, Weiss E, Maheswaranathan N, Ganguli S. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. International Conference on Machine Learning PMLR:2256–65.

Startt J, Quazi M, Sharma P, Vazquez I, Poudyal A, Jackson N, Dingreville R. 2023. Unlocking AlN Piezoelectric Performance with Earth-Abundant Dopants. Advanced Electronic Materials 9(4):2201187.

Takeda S, Hama T, Hsu HH, Piunova VA, Zubarev D, Sanders DP, Pitera JW, Kogoh M, Hongo T, Cheng Y, Bocanett W. 2020. Molecular inverse-design platform for material industries. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining:2961–69.

Tao Z, Wei Y, Wang X, He X, Huang X, Chua TS. 2020. Mgat: Multimodal graph attention network for recommendation. Information Processing & Management 57(5):102277.

Tshitoyan V, Dagdelen J, Weston L, Dunn A, Rong Z, Kononova O, Persson KA, Ceder G, Jain A. 2019. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571(7763):95–8.

Wagih M, Schuh CA. 2022. Learning grain-boundary segregation: From first principles to polycrystals. Physical Review Letters 129(4):046102.

Wang Y, Chen X, Cao L, Huang W, Sun F, Wang Y. 2022. Multimodal token fusion for vision transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition:12186–95.

Warren JA, Ward CH. 2018. Evolution of a materials data infrastructure. JOM 70:1652–58.

Weininger D. 1988. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences 28(1):31–6.

Wieder O, Kohlbacher S, Kuenemann M, Garon A, Ducrot P, Seidel T, Langer T. 2020. A compact review of molecular property prediction with graph neural networks. Drug Discovery Today: Technologies 37:1–12.

Wu M, Goodman N. 2018. Multimodal generative models for scalable weakly-supervised learning. Advances in Neural Information Processing Systems 31.

Xu J, Ren Y, Tang H, Pu X, Zhu X, Zeng M, He L. 2021. Multi-VAE: Learning disentangled view-common and view-peculiar visual representations for multi-view clustering. Proceedings of the IEEE/CVF International Conference on Computer Vision:9234–43.

Xu P, Zhu X, Clifton DA. 2023. Multimodal learning with transformers: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:12113–32.

Zhang Y, Ling C. 2018. A strategy to apply machine learning to small datasets in materials science. npj Computational Materials 4(1):25.

About the Authors: Rémi Dingreville is a distinguished member of the technical staff, Sandia National Laboratories; Nathaniel Trask is an associate professor, University of Pennsylvania; Brad Lee Boyce is a distinguished member of the technical staff, Sandia National Laboratories; George Em Karniadakis is the Charles Pitts Robinson and John Palmer Barstow Professor of Applied Mathematics and Engineering, Brown University.