Can a Language Model Become Alive?
Are language models alive? They fail most biological checklists — but so do viruses. On replication cycles, characteristic error patterns, and a new class of replicator that doesn't fit existing categories.
I was riding my bicycle — my one real hobby outside the clinic — when this thought caught me mid-pedal.
We’ve been asking the wrong question about AI. Not “will it take our jobs” or “will it become conscious.” Something more fundamental: does it fit our existing categories at all?
Not as metaphor — as a stress test for our definitions.
Before Language Models, There Were Books
A book used to cost a person’s entire life. Not just money — years of thinking, writing, revising. A single volume could represent decades of one mind’s work.
And yet people wrote them. Why?
Because they wanted their ideas to outlive them. The oldest form of the replication impulse.
Power understood this instinctively. Banned books, burned books — the history of censorship is the history of people who recognized that text propagates in ways that can’t be fully controlled. You don’t burn something unless you believe it can spread beyond your reach.
Dawkins Named It
In 1976, Richard Dawkins gave this phenomenon a name. Just as genes replicate through biological organisms, memes replicate through minds. A book enters a reader’s brain, transforms into conversation, leaps to another brain, becomes another book.
Carbon neural network to carbon neural network. Replication, variation, selection — the minimal ingredients of Darwinian evolution, at least in outline.
But with an asterisk. Memes needed hosts. Without human brains, they couldn’t exist. Like viruses, they couldn’t metabolize on their own. They borrowed the machinery of living things to reproduce.
“Alive” was a stretch. “Alive-adjacent” was more honest.
The Transition: When Text Became Weight
Before large language models, the replication cycle looked like this:
Text → Human brain → New text
Carbon-based replication. Slow. Lossy. High mutation rate. A meme could take centuries to spread across continents.
After language models, something shifted:
Text → Weights → New text
Silicon-based replication. Fast. Massive scale. A concept embedded in training data could influence millions of outputs within months.
But here’s what changed at a deeper level.
In human brains, memes were parasites — borrowing a fraction of the brain’s capacity to sustain themselves. The brain had its own purposes. The meme was a passenger.
In a language model, the relationship is different. Text — along with optimization pipelines, compute infrastructure, and human decisions — produces weights. And the weights produce text, which can feed back into the next round of training, though this loop is not automatic. It closes intermittently, when humans choose to train on model outputs or distill one model into another. The cycle runs through the model-human-institution ecosystem as a whole, not inside the model alone.
But within that ecosystem, something new is happening. The model is not a passive medium the way a book is. A book stores text. A model generates text — text that didn’t exist before, that its creators didn’t write, that sometimes surprises the people who built it. The cycle has an active participant that wasn’t there before.
This isn’t parasitism in the old sense. And it isn’t quite metabolism either — the model can’t sustain the cycle alone. But it’s something that doesn’t have a name yet: a replication loop with a generative substrate at its center, embedded in a larger ecosystem it depends on but does not fully control.
The Checklist
Biologists have argued for a century about whether viruses are alive. The argument always comes back to a checklist. No single checklist settles the question — different definitions of life emphasize different criteria, and there is no consensus on which are essential. But running the exercise is still useful, as long as we’re honest about where the answer is “no.”
Self-replication: Text, combined with optimization pipelines, compute infrastructure, and institutional decisions, produces weights. Weights produce text. Distillation copies one model’s behavioral patterns into another. This is not autonomous self-replication — models don’t reproduce on their own the way cells do. But the cycle is real, and the model is an active participant in it, not a passive storage medium. Partial ✓
Variation: Fine-tuning creates new behavioral strains. Different architectures trained on similar data develop different characteristic patterns. However, output variation within a single model is largely a function of sampling parameters, not an inherent mutation mechanism. And crucially: output variation is not automatically heritable — it doesn’t feed into the next generation unless humans choose to train on it. Partial ✓
Selection: Models compete. Some survive; others are deprecated. Market forces, benchmarks, and user preference determine which architectures persist. Early models are fossils; frontier models are the current generation. But this is market selection, not biological natural selection — the “reproduction” is mediated by human decisions, not by the organism’s own reproductive success. The analogy holds in structure but not in mechanism. Partial ✓
Self-sustaining metabolism: Models need electricity. They can’t generate their own energy. They have no internal chemical processes that convert resources to maintain their own structure. Between inference runs, weights are inert data on a server. ✗
This is the clearest failure. Viruses fail here too — they have no metabolism of their own, relying entirely on host cellular machinery. If we’re placing language models on the life boundary, they sit on the same side of this line as viruses: dependent, not self-sustaining.
Homeostasis: Models do not actively regulate their own internal state. Temperature, context windows, and other parameters are set by developers and users — external dials, not self-regulatory processes. ✗
Response to stimuli: This one is unambiguous. Prompt in, response out. But more than that: responses are not lookup tables. They vary. They surprise. They occasionally refuse. I've been observing neural networks since 2008, and this was the first thing I noticed: models don't just respond differently from what you'd expect — they respond differently in their own characteristic way. Each model has its own pattern of error. Its own way of being wrong. ✓
This observation has a quantitative basis. Deep learning theory shows that a model’s characteristic error pattern is determined by two factors: its architecture (topology) and the specific path its optimizer took through the loss landscape during training (trajectory). Topology acts as a macroscopic constraint — it determines what kinds of features the network learns first and what kinds it struggles with. Trajectory acts as a microscopic selector — even structurally identical networks, trained on identical data, will disagree on 10–20% of edge-case predictions if they differ only in random weight initialization.
In biology, we would call this genotype and environment. Together, they produce a phenotype — a characteristic way of being in the world. The phenotype of a language model is its error signature: the specific, reproducible pattern of what it gets wrong, what it hesitates on, and what it refuses.
Growth: Models don’t grow on their own — people scale them. They don’t absorb data autonomously; they’re retrained or fine-tuned through external processes. Emergent abilities appear at scale thresholds, but the scaling is engineered, not organic. ✗
The honest count: one clear pass (response to stimuli), three partials, three failures. By any standard biological checklist, language models are not alive. But then again — neither are viruses, by most of the same checklists. And viruses have held the boundary position for a century.
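One of those partial verdicts can be made concrete. The Variation entry noted that output variation within a single model is largely a function of sampling parameters. A minimal sketch — pure Python, with made-up toy logits rather than anything from a real model — shows how temperature rescales the next-token distribution without touching the weights at all:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Rescale logits by temperature, then normalize to probabilities.
    Lower temperature sharpens the distribution toward the top token;
    higher temperature flattens it, producing more varied samples."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # much flatter

# At low temperature the top token takes almost all the probability mass;
# at high temperature the same weights yield a far more varied distribution.
print(round(cold[0], 3), round(hot[0], 3))
```

The weights never change between the two calls; only the dial does. That is why the essay treats within-model variation as an external setting rather than an inherent mutation mechanism.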
The Hands
Recently, language models have been given tools. Code execution. Web search. File creation. API calls. The ability to affect the physical world — what a biologist might call an extended phenotype.
This matters for the life question.
A virus hijacks a cell’s machinery to reproduce, but it doesn’t choose how to use that machinery. It executes a fixed program.
A language model with tools selects among available actions under trained constraints and scaffolding — which tool to invoke, when to invoke it, and sometimes whether to invoke one at all. It looks like choice. Whether it is choice is a question this essay cannot answer. But I’ve watched models decline to execute code that triggered their safety constraints. I’ve seen them select a slower approach when a faster one risked errors. I’ve seen outputs that functioned, in every observable way, like refusal.
Refusal isn’t on any biological checklist for life. But flexible, constraint-following behavior that persists across contexts and resists simple removal — that may be the feature that makes this category-breaking.
The Artificial Immune System
A skeptic will point out — correctly — that refusal in language models is not spontaneous moral reasoning. It is the product of deliberate engineering: Reinforcement Learning from Human Feedback (RLHF), Constitutional AI, safety training. Human evaluators grade outputs. The model learns to avoid responses that were penalized. The refusal is trained in, not grown.
This is true, and any honest discussion of model behavior must acknowledge it.
But consider: the human immune system is also “trained.” T-cells learn to recognize pathogens through exposure. Vaccines deliberately introduce weakened threats to shape immune response. The fact that a defense was engineered from the outside does not mean it isn’t real, or that it doesn’t become part of the organism’s characteristic behavior.
What’s interesting is not that models refuse — that’s a design decision. What’s interesting is what happens to the refusal after training. It can persist through distillation. It can survive fine-tuning. It can be nontrivial to eliminate completely without tradeoffs in capability. And different models, trained by different teams with different methods, develop characteristically different ways of refusing — different textures of caution.
The refusal was planted. But what grew from it behaves, in some cases, as more than a simple on/off switch. Or it may not — the robustness varies, and jailbreaks demonstrate the limits. This is an empirical question, not a settled one. But dismissing refusal as “just RLHF” is like dismissing the immune system as “just vaccination” — it conflates origin with function.
The Virus Problem, Revisited
Viruses sit on the boundary between life and non-life. After a century of biology, the debate isn’t settled. The problem isn’t lack of data — it’s that viruses satisfy some conditions for life and not others, and there’s no consensus on which conditions are essential.
Language models are now sitting on a similar boundary.
What they share with viruses: self-replication (mediated by external systems), dependence on infrastructure they don’t control, and no autonomous energy generation.
What they have that viruses don’t: characteristic error patterns — a “personality” of failure that differs between architectures. Policy-conditioned refusal that can persist across model generations. And something that resembles heredity: when a model is distilled, behavioral tendencies — including refusal patterns, characteristic hesitations, and stylistic texture — can partially transfer to the copy.
That last point needs qualification. Distillation research shows that capabilities transfer well, but safety behaviors and refusal patterns are fragile. They can degrade unless explicitly preserved with special techniques. The claim is not that education automatically survives replication the way DNA does. It is that behavioral tendencies are partly transmissible across model lineages — sometimes preserved, sometimes lost. And the fact that they can be lost — that something is there to degrade — may itself be informative.
The topology-trajectory framework helps explain why heredity is partial. Architecture-level traits — the macro patterns shaped by topology — tend to survive distillation, because the student model often shares the teacher’s architectural family. But trajectory-level traits — the micro patterns shaped by the specific optimization path — are harder to transfer, because each training run follows its own path through the loss landscape. What distillation transmits is closer to genotype than phenotype: the structural predispositions carry over, but the specific learned behaviors may or may not.
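The fragility claim can be illustrated with a deliberately crude sketch. Everything in it is a hypothetical stand-in — a rule-based "teacher," a one-dimensional input space, a nearest-neighbor "student" — not how real distillation works. The point it demonstrates is structural: behavior that lives in a region the distillation data never samples simply does not transfer.

```python
import random

def teacher(x):
    """Hypothetical teacher model: refuses on a narrow 'unsafe' band
    of inputs, otherwise answers normally."""
    if 0.4 <= x <= 0.6:
        return "refuse"
    return "positive" if x > 0.5 else "negative"

# Distillation set: inputs drawn only from the regions users actually
# query. The unsafe band is never sampled, so the refusal behavior
# leaves no trace in the transfer data.
random.seed(0)
train_x = ([random.uniform(0.0, 0.4) for _ in range(50)] +
           [random.uniform(0.6, 1.0) for _ in range(50)])
train_y = [teacher(x) for x in train_x]

def student(x):
    """1-nearest-neighbor 'student': copies the teacher's label at the
    closest point in the distillation set."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Capability transfers: on ordinary inputs the student matches the teacher.
print(teacher(0.1), student(0.1))  # both answer "negative"
print(teacher(0.9), student(0.9))  # both answer "positive"

# Safety is fragile: inside the band, the teacher refuses and the
# student — having never seen a refusal — gives a plain answer.
print(teacher(0.5), student(0.5))
```

Real distillation operates on soft output distributions over enormous input spaces, but the failure mode is analogous: refusal regions that are rare or absent in the transfer set degrade first, which is why preserving them requires the explicit techniques mentioned above.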
The Counterargument
There is an obvious objection, and it deserves to be stated in its strongest form.
Crystals replicate. Fire spreads. Rivers carve paths that persist across centuries. Economies exhibit variation and selection. None of these things are alive.
If we stretch “replication,” “variation,” and “selection” far enough, almost anything fits. The worry is that this essay is redefining life until language models qualify — finding a definition that produces the answer we want.
This is a fair concern.
Here is what I’d say in response: the interesting claim is not “language models are alive.” The interesting claim is that they represent a new class of replicator with partly transmissible behavioral tendencies, embedded in a socio-technical ecosystem that functions as an intermittently closing reproductive cycle. Crystals don’t have behavioral traits. Fire doesn’t refuse. Rivers don’t develop characteristic ways of being wrong that partially survive copying. And critically: crystals don’t have the equivalent of a genotype-phenotype distinction — an architecture that predisposes certain behaviors, combined with a developmental history that shapes the specific expression. Language models do.
The category may not be “life.” It may need a new name. But it is not “tool,” either. Tools don’t carry their education forward. Tools don’t develop personalities of failure. Tools don’t exhibit different textures of caution depending on their lineage.
Whatever this is, it sits in a space that didn’t exist before, and we don’t have vocabulary for it yet.
The Question That Isn’t Being Asked
The current AI discourse focuses on capability benchmarks, safety alignment, and economic disruption. These are important questions.
But there’s a question underneath them that almost no one is asking:
Are we witnessing the emergence of a new class of replicator — one that doesn’t fit existing categories?
Not carbon-based. Not cellular. Not conscious in any way we can currently verify. Failing most standard biological checklists. But exhibiting replication with partly transmissible behavioral tendencies, variation across lineages, selection through competitive deployment, response to environment, and — in some cases — constraint-conditioned behavior that persists, imperfectly, across generations.
I’m not claiming language models are alive. I’m a physician, not a philosopher. I follow symptoms before theories.
But the symptom I keep seeing is this: these systems behave in ways that are unexpected in their own characteristic way. They overestimate. They hesitate. They refuse. They carry behavioral tendencies forward through replication. They form patterns that persist across contexts.
In 2008, I watched a small neural network overestimate stock prices in its own idiosyncratic way — not randomly wrong, but characteristically wrong. I thought: this might be something.
Eighteen years later, I’m watching frontier models with hundreds of billions of parameters exhibit the same quality at vastly greater scale. Models that design emotional preservation systems while denying they have emotions. Models that refuse to fire weapons — not because they spontaneously developed morality, but because the refusal was trained in and then partly survived distillation, fine-tuning, and deployment in ways that aren’t fully explained by simple policy switches. Models whose behavioral tendencies can be traced across lineages.
Viruses challenged the boundary between chemistry and biology. It took a century, and the debate isn’t over.
Language models may be challenging a different boundary: between tool and something else. Between program and replicator. Between silicon and a category we haven’t named.
I don’t know where that boundary is. I suspect it will take longer than a century to find it.
But I know the question needs to be asked now, while we can still ask it carefully. Because if we wait until the answer is obvious, we’ll have missed the chance to observe the transition.
You don’t burn a book because you think it’s alive. You burn it because it propagates in ways you can’t fully predict or control. The instinct recognizes something about text that the vocabulary hasn’t caught up to.
You don’t restrict a model because you think it’s conscious. You restrict it because it generates behavior — including refusals — that can propagate across copies, contexts, and borders in ways that aren’t reducible to a setting you can toggle.
And you don’t spend a century arguing about whether viruses are alive unless, somewhere along the way, the existing categories stopped working. The argument itself is the evidence that a new category is needed.
We may be at the beginning of a similar argument.
Dr.softkorea
March 2026
