An AI Read 551 Papers in 30 Minutes and Suggested a New Drug

So far on this blog, AI has mostly been a very sophisticated instrument. A tool that predicts a protein's shape, reads a tumor's mutations, tunes a stimulator. In every case a human scientist holds the instrument and decides what to do with what it shows. A pair of papers in Nature this month pushes at something more unsettling and more interesting: AI not as the instrument, but as something closer to the scientist. Systems that generate their own hypotheses, design their own experiments, and decide what to investigate next.

I want to be careful here, because this is a topic where the hype is thick and the reality is more modest and, honestly, more interesting than the hype.

What these systems actually did

Two systems feature in the reporting. One is Google DeepMind's Co-Scientist, a reasoning system built from multiple cooperating AI agents that review the literature, propose ideas, and argue competing hypotheses against each other. The other, which I find the clearer story, is called Robin, built by a nonprofit lab called FutureHouse.

Robin is what people mean by a multi-agent system. Rather than one model doing everything, it is a team of specialized agents: some scour the scientific literature, some generate hypotheses, some design experiments, some analyze the resulting data. They pass work back and forth in a loop. You hand the system a disease, and it starts reasoning its way toward a therapeutic strategy.
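For readers who think in code, the loop described above can be sketched in a few lines of Python. This is a toy illustration only, not FutureHouse's implementation: every name here (Notebook, literature_agent, bench_step, run_loop) is hypothetical, each "agent" is a stub that writes a placeholder string, and the bench step stands in for work that happens outside the software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Notebook:
    """Shared state the hypothetical agents read from and write to."""
    disease: str
    findings: List[str] = field(default_factory=list)
    hypotheses: List[str] = field(default_factory=list)
    plans: List[str] = field(default_factory=list)
    results: List[str] = field(default_factory=list)


def literature_agent(nb: Notebook) -> None:
    # Stub: a real agent would search and summarize papers.
    nb.findings.append(f"literature summary for {nb.disease}")


def hypothesis_agent(nb: Notebook) -> None:
    # Stub: builds on everything gathered so far, including earlier results.
    nb.hypotheses.append(
        f"hypothesis based on {len(nb.findings)} summaries "
        f"and {len(nb.results)} results"
    )


def experiment_agent(nb: Notebook) -> None:
    # Stub: turns the latest hypothesis into an experimental plan.
    nb.plans.append(f"plan to test: {nb.hypotheses[-1]}")


def bench_step(plan: str) -> str:
    # Placeholder for the physical experiment, which is not software.
    return f"raw data from [{plan}]"


def analysis_agent(nb: Notebook, raw: str) -> None:
    # Stub: interprets the data and feeds it back into the notebook.
    nb.results.append(f"analysis of {raw}")


def run_loop(disease: str, rounds: int = 2) -> Notebook:
    """Pass work between the agents for a fixed number of rounds."""
    nb = Notebook(disease)
    for _ in range(rounds):
        literature_agent(nb)
        hypothesis_agent(nb)
        experiment_agent(nb)
        analysis_agent(nb, bench_step(nb.plans[-1]))
    return nb
```

Calling run_loop("dry AMD", rounds=2) returns a notebook with two hypotheses, the second of which was written after the first round's analysis came back. The only point of the sketch is the shape: specialized steps sharing one record, with each pass informed by the last.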

The team pointed Robin at dry age-related macular degeneration, a leading cause of irreversible blindness in the developed world with few good treatments. What happened next is the part worth pausing on. Robin reviewed the literature, proposed ten possible disease mechanisms, and settled on one specific strategy: enhancing a particular housekeeping function of the cells at the back of the eye. It then proposed an existing drug, ripasudil, normally used for glaucoma, as a candidate, something that had not previously been suggested for this condition. When the experiments came back, it proposed a follow-up that pointed toward a possible new molecular target.

According to the report, Robin worked through 551 papers in about half an hour, a task estimated to take a human on the order of hundreds of hours. The system, the authors note, generated the hypotheses, the experimental plans, and the data analyses itself.

The sentence everyone skips

Here is the line that the breathless coverage tends to bury, and it is the most important one: humans ran every physical experiment.

Robin did the reading, the reasoning, the planning, and the interpretation. But when it came time to actually put cells in a dish and test the drug, people did that. The system operates in what its makers call a scientist-in-the-loop framework. It is a brilliant research assistant that never touches a pipette. That distinction is not a footnote. It is the whole shape of what was and was not achieved.

And the achievement is real. Knowing what to read, which of ten plausible mechanisms to chase, and which existing drug might be quietly repurposable is genuinely hard intellectual work, and it is often the slowest part of science. Compressing the literature-and-hypothesis stage from weeks into an afternoon is not nothing. If it holds up, it changes the tempo of discovery.

Where I plant my feet

Now the cold water, which the careful scientists involved would pour themselves.

A candidate that shows promise in lab-grown cells is at the very beginning of a long road, not the end. The overwhelming majority of drug candidates that look good in a dish go on to fail in more demanding tests, and then most of the survivors fail in human trials. Ripasudil being an intriguing, AI-surfaced idea for this disease is exciting; it is not a treatment, and it would be irresponsible to suggest otherwise. None of these AI-proposed candidates has been fully evaluated.

There is also a subtler worry, one that a companion Nature editorial put well under a pointed title to the effect that AI cannot do good science without humans. An AI system trained on the existing literature is, almost by construction, very good at recombining what is already known. It is much less obvious that it can do the thing that defines the best science: the leap that contradicts the consensus, the weird result nobody was looking for, the intuition born of years at the bench. There is a real risk that armies of AI agents, all trained on the same corpus, could make us faster at exploring the ideas we already have while quietly narrowing the diversity of ideas we generate. Speed is not the same as imagination.

What I actually think

I land somewhere that is neither dismissive nor dazzled. What Robin represents is a genuine new kind of tool, and the honest framing is the one its own creators chose: not an artificial scientist replacing the human kind, but a tireless collaborator that handles the synthesis and the bookkeeping of ideas so the humans can do the parts that need judgment, taste, and a willingness to be surprised.

That framing connects to the thread running through everything I have written lately. We taught machines to read the language of proteins and cells; now we are teaching them to read the language of the scientific literature itself and to propose what to do with it. The capability is striking and the trajectory is steep. But the experiment still has to be run, the result still has to be doubted, and the meaning still has to be judged. An AI can now read 551 papers in 30 minutes and hand you a hypothesis. Whether that hypothesis is true, and whether it matters, remains, for now and for good reason, a human question.
