The threat is comfortable drift toward not understanding what you're doing

AI is transforming academic research by automating complex tasks, but it risks turning scientists into mere operators who no longer understand the underlying principles of their work. This shift threatens the fundamental goal of scientific training: the development of independent, critical thinkers.

The machines are fine. I'm worried about us.

Imagine you're a new assistant professor at a research university. You just got the job, you just got a small pot of startup funding, and you just hired your first two PhD students: Alice and Bob. You're in astrophysics. This is the beginning of everything.

You do what your supervisor did for you, years ago: you give each of them a well-defined project. Something you know is solvable, because other people have solved adjacent versions of it. Something that would take you, personally, about a month or two. You expect it to take each student about a year, because they don't know what they're doing yet, and that's the point. The project isn't the deliverable. The project is the vehicle. The deliverable is the scientist that comes out the other end.

Alice's project is to build an analysis pipeline for measuring a particular statistical signature in galaxy clustering data. Bob's is something similar in scope and difficulty, a different signal, a different dataset, the same basic arc of learning. You send them each a few papers to read, point them at some publicly available data, and tell them to start by reproducing a known result. Then you wait.

The academic year unfolds the way academic years do. You have weekly meetings with each student. Alice gets stuck on the coordinate system. Bob can't get his likelihood function to converge. Alice writes a plotting script that produces garbage. Bob misreads a sign convention in a key paper and spends two weeks chasing a factor-of-two error. You give them both similar feedback: read the paper again, check your units, try printing the intermediate output, think about what the answer should look like before you look at what the code gives you. Normal things. The kind of things you say fifty times a year and never remember saying.

By summer, both students have finished. Both papers are solid. Not groundbreaking, not going to change the field, but correct, useful, and publishable. Both go through a round of minor revisions at a decent journal and come out the other side. A perfectly ordinary outcome. The kind of outcome that the entire apparatus of academic training is designed to produce.

But Bob has a secret.

Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent's fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.

Here's where it gets interesting. If you are an administrator, a funding body, a hiring committee, or a metrics-obsessed department head, Alice and Bob had the same year. One paper each. One set of minor revisions each. One solid contribution to the literature each. By every quantitative measure that the modern academy uses to assess the worth of a scientist, they are interchangeable. We have built an entire evaluation system around counting things that can be counted, and it turns out that what actually matters is the one thing that can't be.

It gets worse. The majority of PhD students will leave academia within a few years of finishing. Everyone knows this. The department knows it, the funding body knows it, the supervisor probably knows it too even if nobody says it out loud. Which means that, from the institution's perspective, the question of whether Alice or Bob becomes a better scientist is largely someone else's problem. The department needs papers, because papers justify funding, and funding justifies the department. The student is the means of production. Whether that student walks out the door five years later as an independent thinker or a competent prompt engineer is, institutionally speaking, irrelevant. The incentive structure doesn't just fail to distinguish between Alice and Bob. It has no reason to try.

This is the part where I'd like to tell you the system is broken. It isn't. It's working exactly as designed.

David Hogg, in his white paper, says something that cuts against this institutional logic so sharply that I'm surprised more people aren't talking about it. He argues that in astrophysics, people are always the ends, never the means. When we hire a graduate student to work on a project, it should not be because we need that specific result. It should be because the student will benefit from doing that work. This sounds idealistic until you think about what astrophysics actually is. Nobody's life depends on the precise value of the Hubble constant. No policy changes if the age of the Universe turns out to be 13.77 billion years instead of 13.79. Unlike medicine, where a cure for Alzheimer's would be invaluable regardless of whether a human or an AI discovered it, astrophysics has no clinical output. The results, in a strict practical sense, don't matter. What matters is the process of getting them: the development and application of methods, the training of minds, the creation of people who know how to think about hard problems. If you hand that process to a machine, you haven't accelerated science. You've removed the only part of it that anyone actually needed.

That's a hard sell to a funding agency, admittedly.

Which brings us back to Alice and Bob, and what actually happened to each of them during that year. Alice can now do things. She can open a paper she's never seen before and, with effort, follow the argument. She can write a likelihood function from scratch. She can stare at a plot and know, before checking, that something is wrong with the normalization. She spent a year building a structure inside her own head, and that structure is hers now, permanently, portable, independent of any tool or subscription. Bob has none of this. Take away the agent, and Bob is still a first-year student who hasn't started yet. The year happened around him but not inside him. He shipped a product, but he didn't learn a trade.

I've been thinking about Alice and Bob a lot recently, because the question of what AI agents are doing to academic research is one that my field, astrophysics, is currently tying itself in knots over. Several people I respect have written thoughtful pieces about it. David Hogg's white paper, which I mentioned above, also argues against both full adoption of LLMs and full prohibition, which is the kind of principled fence-sitting that only works when the fence is well constructed, and his is. Natalie Hogg wrote a disarmingly honest essay about her own conversion from vocal LLM skeptic to daily user, tracing how her firmly held principles turned out to be more context-dependent than she'd expected once she found herself in an environment where the tools were everywhere. Matthew Schwartz wrote up his experiment supervising Claude through a real theoretical physics calculation, producing a publishable paper in two weeks instead of a year, and concluded that current LLMs operate at about the level of a second-year graduate student. Each of these pieces is interesting. Each captures a real facet of the problem. None of them quite lands on the thing that keeps me up at night.

Schwartz's experiment is the most revealing, and not for the reason he thinks. What he demonstrated is that Claude can, with detailed supervision, produce a technically rigorous physics paper. What he actually demonstrated, if you read carefully, is that the supervision is the physics. Claude produced a complete first draft in three days. It looked professional. The equations were correct.

Source: Hacker News