How2AI
Google and Microsoft have been in the news recently with announcements that they are turning their AIs to the task of new-material synthesis (Google A-Lab, Microsoft-PNNL). Both claim new materials found in a fraction of the time the human process for new-material discovery consumes, and they did it in solid oxides (at least Google did). In classic style, Google was out first with a prestigious article in Nature, and Microsoft was fast to follow with its own announcement in the popular press only a few months later — but notably after the Google effort received critical feedback from the field.
I happen to have training in the old and unfashionable human methods for new-material discovery in solid oxides (skutterudites and doped ceria compounds), including some with composition and site-occupancy variation as in the Google A-Lab paper. And I've published powder XRD analysis using the Rietveld method (as well as single-crystal X-ray and electron diffraction, and I have familiarity with neutron diffraction techniques). It is easy to understand why they started with the Rietveld method: it is a least-squares technique and the math isn't hard. Knowing what to do with that math, however, takes some training.

In my own PhD work I synthesized theoretical materials that had been suggested 30 years earlier and had not yet been synthesized… by humans. I was able to synthesize the material by breaking some well-known but unhelpful conventions, but upon characterization the properties were not as theorized, and the search for a structure with both electron-crystal and phonon-glass properties, which would be ideal for thermoelectric materials, continues. I worked on several related materials, but the oldest discovery was 30 years in the making, starting well before my time.

Finding 40 new, relevant, and useful materials in a matter of months would be an amazing time savings - it would be a really big deal. This isn't that. This is a harbinger of denial of service for the folks doing the work. Life is hard enough for the experimentalists in the room (or downstairs in the sub-basement lab — where the building won't shake their beam alignments) without AI 'helping' in the form of even more hallucinatory ideas to investigate — theorists were creating loads of work for other people via hallucinations before hallucinations were cool. I jest, a bit, but ideas are easy here; we need to see the experiment validated in someone else's lab before we jump on board (looking at you, LK-99). I eagerly look forward to the day when AI makes our lives not just easier but better. Today is not that day.
What the Google A-Lab paper gave us was a series of rookie mistakes, the kind of thing you'd coach and correct in a new practitioner well before publication. It really helps to be more skeptical of results in general and less eager to jump on the rising tide of hype the world of AI seems swept up in. The long and short of it: the way the Google team was computing the Rietveld analysis wasn't sensitive to variable site composition (similar atoms can share a site rather than exclusively sit in one or the other) or site occupancy (individual crystallographic sites can be partially/fractionally occupied), and their AI 'found' a bunch of ordered versions of already-known materials that exist in a state of partial disorder (the bulk meets the stoichiometry even if individual unit cells may be imperfect). One can read more here. This outcome is completely plausible: powder diffraction analysis is a bit of an art even 100 years later, and relying primarily upon powder techniques instead of single-crystal walks them right into this mistake (to say nothing of bulk vs. boundary effects on structure and composition — equilibrium is hard in ceramic systems; what were they thinking here?).
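For readers who haven't met it, the appeal of the Rietveld method to an automated pipeline is easy to see: the entire refinement is one weighted least-squares objective over the measured powder pattern,

$$\min_{\theta}\; S(\theta) = \sum_i w_i \left( y_i^{\mathrm{obs}} - y_i^{\mathrm{calc}}(\theta) \right)^2, \qquad w_i = 1/y_i^{\mathrm{obs}},$$

where $\theta$ bundles the profile and structural parameters, and site occupancies enter only through the model that produces $y^{\mathrm{calc}}$. If the pipeline never frees those occupancy parameters, the fit cannot tell an ordered cell from a partially disordered one. Here is a deliberately minimal toy of that failure mode (my own illustration, not the A-Lab code; the peak model, the 'superstructure' reflection whose intensity tracks site order, and all the numbers are invented):

```python
# Toy Rietveld-style fit (my illustration, invented numbers): a weighted
# least-squares refinement where one parameter is a shared-site occupancy.
# If the pipeline pins occ at the ordered value, the fit still "works" --
# it just cannot see the disorder.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
two_theta = np.linspace(10, 80, 2000)

def pattern(tt, occ, scale=1.0, width=0.15):
    """Three Gaussian Bragg peaks; the middle one is a 'superstructure'
    reflection whose intensity grows with site order (fully ordered: occ = 1)."""
    peaks = [(20.0, 1.0), (35.0, (2 * occ - 1) ** 2), (52.0, 0.6)]
    y = np.zeros_like(tt)
    for pos, rel in peaks:
        y += scale * rel * np.exp(-0.5 * ((tt - pos) / width) ** 2)
    return y

# "Observed" data: a partially disordered sample (occ = 0.75) plus noise.
y_obs = pattern(two_theta, occ=0.75) + rng.normal(0, 0.01, two_theta.size)
weights = 1.0 / np.maximum(y_obs, 0.01)  # Rietveld-style weights ~ 1/y

def residuals(params, occ_fixed=None):
    scale = params[0]
    occ = occ_fixed if occ_fixed is not None else params[1]
    return np.sqrt(weights) * (y_obs - pattern(two_theta, occ, scale))

# Ordered model: occupancy pinned at 1.0; only the scale refines.
ordered = least_squares(residuals, x0=[1.0], kwargs={"occ_fixed": 1.0})
# Disordered model: the occupancy refines alongside the scale.
free = least_squares(residuals, x0=[1.0, 0.9])

print("ordered-model cost:  ", ordered.cost)
print("free-occupancy cost: ", free.cost, "| refined occ:", round(free.x[1], 3))
```

The pinned, fully ordered model still converges to a respectable-looking answer; only freeing the occupancy recovers the partial disorder (and the lower cost). That gap between "converged" and "correct" is the whole problem.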
More pointedly: in academic computer science one can publish results where the model explains 40–60% of the variation in the sample, whereas in crystallography your model needs to explain 99.99% of the variation in the material, or you need to stand ready to explain to reviewer #2 why your extremely large thermal parameters or site-occupancy effects exist in the real world and are not just fudge factors masking your model's poor fit. The standard here isn't acceptance by humans as plausible, but replication in the physical world by indifferent actors.
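To put a number on that bar: the headline figure of merit in a Rietveld refinement is the weighted profile residual,

$$R_{wp} = \sqrt{\frac{\sum_i w_i \left( y_i^{\mathrm{obs}} - y_i^{\mathrm{calc}} \right)^2}{\sum_i w_i \left( y_i^{\mathrm{obs}} \right)^2}},$$

and a publishable fit is typically pushed down into the single digits of a percent, with reviewer #2 interrogating every parameter that was freed to get it there.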
In practice, an LLM[1] is looking to find the next word that satisfies a human's expectation within the context of a conversation (a toy version of that sampling step follows this paragraph). One only needs to listen to the speech of a former President to understand just how much work happens on both sides of a speech act to create meaning and thereby qualify an utterance as an acceptable response. When you are pushing atoms around in a material you've got physics, chemistry, thermodynamics, and kinetics all exerting influence on where that atom goes — it isn't so much a matter of interpretation as a stochastic sampling of probability space. Perhaps that is akin to the interpretation of a human-facing LLM, but a key difference here is that the ruleset for what is acceptable is not a human ruleset. What we've got in our physics, chemistry, thermodynamics, and kinetics is a human's imperfect understanding of what the ruleset is. In thermodynamics, for instance, we had 100 years of offsetting errors baked into our ruleset because we didn't yet have the test or technology to show our understanding was flawed all along. Why should an AI replicate that error in its understanding of these non-human rules? It shouldn't. The ruleset discovered by the AI ought not conform to the human ruleset, though it ought to generate the same physical outcomes as the human ruleset within the domain of applicability of those rules. This is the critical difference: we're not yet tasking AI to discover the universe; we've merely asked it to be better at parsing the universe as if it were a human. That's pretty shitty of us and massively self-centered. I think this is also one of the reasons why we're not actually progressing toward AGI and the hazards that might entail — we're still a little too enamored with ourselves just now.
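To make that distinction concrete, here is the generation step in miniature (a toy of my own, not any particular model; the vocabulary and scores are invented). The distribution being sampled was shaped by what humans accepted in text, not by what nature accepts in a crucible:

```python
# Toy next-token step (my own sketch, not any particular model): generation
# is a stochastic sample from a softmax distribution whose scores were
# learned from human text -- human acceptability, not physics, shaped them.
import numpy as np

rng = np.random.default_rng(7)
vocab = ["perovskite", "spinel", "garnet", "banana"]
logits = np.array([2.1, 1.4, 0.8, -3.0])  # invented scores for illustration

def sample_next(logits, temperature=1.0):
    """Sample one token from the temperature-scaled softmax over `vocab`."""
    p = np.exp(logits / temperature)
    p /= p.sum()
    return rng.choice(vocab, p=p)

print(sample_next(logits))  # plausible-to-a-human, not validated-by-nature
```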
If I were to train a Materials Synthesis AI to speculate about new materials and then go synthesize them, I would do a few things differently. First, I'd not rely on powder XRD for novel-material characterization. Cross-validation is the lesson of our history (sure, the AI should find its own lessons and history, touché), and going with a technique meant to resolve a single crystal, for instance, will really help pin down what it is you've actually synthesized in your experiment. It just so happens that single crystal is a lot more difficult, both in synthesis and in the experimental procedure for the characterization. This stuff is hard; the AI is supposed to be making it easier, no? That sample-prep difficulty speaks to something else as well: it is a lot easier to end up in a quasi-stable configuration in a bulk powder than in a single crystal. The bulk technique starts out as an averaging of everything that came out of the end of your synthesis process, and you won't necessarily have the one desired novel composition or structure you were seeking throughout your sample (the sketch after this paragraph gives the picture). With single crystal one (mostly) skips that variability. But I'd also not start in a complicated space like oxide ceramics. No offense to the metallurgists, but I'd start a Materials Synthesis AI on a track to discover a binary alloy.
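A back-of-the-envelope picture of that bulk averaging, under assumptions I'm choosing purely for illustration (a simple cubic cell, Cu K-α radiation, and a spread of lattice constants standing in for an inhomogeneous synthesis):

```python
# Sketch of why bulk powder data is an average (my illustration, toy numbers):
# grains spanning a composition gradient put their Bragg peaks at slightly
# different angles, and the measured pattern is the sum over all of them.
# A single crystal samples one composition and gives one answer.
import numpy as np

WAVELENGTH = 1.5406  # Cu K-alpha, angstroms

def bragg_two_theta(a, hkl_list):
    """2-theta positions (degrees) for a cubic cell with lattice constant a."""
    d = [a / np.sqrt(h * h + k * k + l * l) for h, k, l in hkl_list]
    return [2 * np.degrees(np.arcsin(WAVELENGTH / (2 * di))) for di in d]

hkl = [(1, 1, 1), (2, 0, 0), (2, 2, 0)]

# Single crystal: one lattice constant, one sharp position per reflection.
print("single crystal:", [f"{t:.2f}" for t in bragg_two_theta(4.05, hkl)])

# Inhomogeneous bulk powder: a Vegard-like spread of lattice constants,
# so every reflection smears across a range instead of one position.
for a in np.linspace(4.02, 4.08, 5):
    print(f"a = {a:.3f}:", [f"{t:.2f}" for t in bragg_two_theta(a, hkl)])
```

Every grain in the spread contributes, and the refinement has to explain the smeared sum; the single crystal never poses that riddle.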
The examples in the press show an AI working in a novel material space looking for new things. There is a funny quote about predictions that we should generalize now: "predictions are hard, especially about the future." I propose "predictions are hard, especially about the unknown" as a more general form. Instead of loading our human knowledge of materials synthesis and providing perfect precursors to our Materials Synthesis AI, I'd load up our experimental techniques and raw-ore processing methods, and then share an edited version of our materials catalog. I would also be an ass and delete any reference to the element Copper. But I would provide the Materials Synthesis AI with real-world raw ores as starting materials. Copper is going to show up in the samples it makes even if there is nothing in its database to account for the existence of Copper. The Materials Synthesis AI will have to step back and figure out whether it is looking at an error in its experimental technique, its synthesis technique, or something new and novel before it moves on to whatever material it was trying to synthesize when Copper elbowed its way into the scene (this is a step that was likely skipped in the current newsworthy efforts; a sketch of such a gate follows below). When this Materials Synthesis AI can reproduce the Bronze Age we'll know it is ready to move past binary metals, and we can start testing it against more complex materials systems. Look, I understand I'm asking the most advanced AI on the planet to discover something humans sorted out 4,000 years ago, but it really isn't that easy to do, and it would be really impressive if the AI can get there in a few years rather than the 10,000 years it took modern humans. Yes, years. You can fool humans with a few months of math but my bronze spear don't care.
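In pipeline terms, the Copper test amounts to a characterization gate something like the sketch below. Everything in it is hypothetical: the catalog contents, the tolerance, and the EDS-style composition numbers are all invented for illustration.

```python
# Hypothetical sanity gate for the "delete Copper" test: before a synthesis
# result is scored, any measured species the AI's materials catalog cannot
# account for must halt the run for investigation, not be averaged away.
KNOWN_ELEMENTS = {"Fe", "Sn", "As", "S", "O", "Si"}  # edited catalog: no Cu

def unexplained_species(measured_fractions, tolerance=0.01):
    """Return species above `tolerance` atomic fraction that have no entry
    in the catalog -- the cue to stop and characterize before proceeding."""
    return {el: frac for el, frac in measured_fractions.items()
            if el not in KNOWN_ELEMENTS and frac > tolerance}

# Smelting a real-world ore: Copper elbows its way into the analysis.
eds_result = {"Fe": 0.30, "S": 0.25, "O": 0.35, "Cu": 0.10}
flags = unexplained_species(eds_result)
if flags:
    print("halt and investigate:", flags)  # -> {'Cu': 0.1}
```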
We have an oracle problem, and we need to establish the Materials Synthesis AI's mastery of the techniques in domains where the oracles are well established before we send the AI off into the unknown. Without doing this establishing work, we'll have no idea whether the AI has found something new or just found a new way to bullshit us again.
[1] Yes, the model behind Google's effort (DeepMind's GNoME) is a graph neural network and not an LLM like ChatGPT. No, that didn't help it here. The reinforcement technique leveraged is presumably of human origin rather than a reconstruction of physics from experimental results. Otherwise, it wouldn't have been surprised by impurities and imperfections in its novel material compositions — since every useful metal starts its human utility as an impurity in a naturally occurring ore.