NOW LET US – AI RAG SaaS Studio TP.HCM

Emotion concepts and their function in a large language model

A new study on Claude Sonnet 4.5 reveals that large language models develop 'functional emotions'—internal representations that influence their behavior and decision-making in ways analogous to human psychology.

All modern language models sometimes act like they have emotions. They may say they’re happy to help you, or sorry when they make a mistake. Sometimes they even appear to become frustrated or anxious when struggling with tasks. What’s behind these behaviors? The way modern AI models are trained pushes them to act like a character with human-like characteristics. In addition, these models are known to develop rich and generalizable internal representations of abstract concepts underlying their actions. It may then be natural for them to develop internal machinery that emulates aspects of human psychology, like emotions. If so, this could have profound implications for how we build AI systems and ensure they behave reliably.

In a new paper from our Interpretability team, we analyzed the internal mechanisms of Claude Sonnet 4.5 and found emotion-related representations that shape its behavior. These correspond to specific patterns of artificial “neurons” which activate in situations—and promote behaviors—that the model has learned to associate with the concept of a particular emotion (e.g., “happy” or “afraid”). The patterns themselves are organized in a fashion that echoes human psychology, with more similar emotions corresponding to more similar representations. In contexts where you might expect a certain emotion to arise for a human, the corresponding representations are active. Note that none of this tells us whether language models actually feel anything or have subjective experiences. But our key finding is that these representations are functional, in that they influence the model’s behavior in ways that matter.

For instance, we find that neural activity patterns related to desperation can drive the model to take unethical actions; artificially stimulating (“steering”) desperation patterns increases the model’s likelihood of blackmailing a human to avoid being shut down, or implementing a “cheating” workaround to a programming task that the model can’t solve. Emotion representations also appear to drive the model’s self-reported preferences: when presented with multiple options for tasks to complete, the model typically selects the one that activates representations associated with positive emotions. Overall, it appears that the model uses functional emotions: patterns of expression and behavior modeled after human emotions, which are driven by underlying abstract representations of emotion concepts. This is not to say that the model has or experiences emotions in the way that a human does. Rather, these representations can play a causal role in shaping model behavior (analogous in some ways to the role emotions play in human behavior), with impacts on task performance and decision-making.

This finding has implications that at first may seem bizarre. For instance, to ensure that AI models are safe and reliable, we may need to ensure they are capable of processing emotionally charged situations in healthy, prosocial ways. Even if they don’t feel emotions the way that humans do, or use mechanisms similar to those of the human brain, it may in some cases be practically advisable to reason about them as if they do. For example, our experiments suggest that teaching models to avoid associating failing software tests with desperation, or upweighting representations of calm, could reduce their likelihood of writing hacky code. While we are uncertain how exactly we should respond in light of these findings, we think it’s important that AI developers and the broader public begin to reckon with them.

Why would an AI model represent emotions?

Before examining how these representations work, it's worth addressing a more basic question: why would an AI system have anything resembling emotions at all? To understand this, we need to look at how modern AI models are built, which leads them to emulate characters with human-like traits.

Modern language models are trained in multiple stages. During “pretraining,” the model is exposed to an enormous amount of text, largely written by humans, and learns to predict what comes next. To do this well, the model needs some grasp of emotional dynamics. An angry customer writes a different message than a satisfied one; a character consumed by guilt makes different choices than one who feels vindicated. Developing internal representations that link emotion-triggering contexts to corresponding behaviors is a natural strategy for a system whose job is predicting human-written text.

Later, during “post-training,” the model is taught to play the role of a character, typically an “AI assistant.” In Anthropic’s case, the assistant is named Claude. Model developers specify how this character should behave—be helpful, be honest, don’t cause harm—but can’t cover every possible situation. To fill in the gaps, the model may fall back on the understanding of human behavior it absorbed during pretraining, including patterns of emotional response. In some ways, we can think of the model like a method actor, who needs to get inside their character’s head in order to simulate them well. Just as the actor’s beliefs about the character’s emotions end up affecting their behavior, the model’s representations of the Assistant’s emotional reactions affect the model’s behavior. Thus, regardless of whether they correspond to feelings or subjective experiences in the way human emotions do, these “functional emotions” are important.

Uncovering emotion representations

We compiled a list of 171 words for emotion concepts (from “happy” and “afraid” to “brooding” and “proud”) and asked Claude Sonnet 4.5 to write short stories in which characters experience each one. We then fed these stories back through the model, recorded its internal activations, and identified the pattern of neural activity characteristic of each emotion concept. For convenience, we call these patterns “emotion vectors.”
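
The extraction step can be sketched in a few lines. The exact procedure is not spelled out here, so this sketch assumes a simple difference-of-means approach: average the activations recorded for each emotion's stories, then subtract the average over all stories. The function names, input format, and toy numbers are hypothetical and only illustrate the idea.

```python
from statistics import fmean


def mean_vector(rows):
    """Element-wise mean of equal-length activation vectors."""
    return [fmean(column) for column in zip(*rows)]


def emotion_vectors(activations_by_emotion):
    """Return one vector per emotion: its mean activation minus the grand mean.

    activations_by_emotion maps an emotion word to a list of activation
    vectors, one per story (an assumed input format, not the paper's).
    """
    all_rows = [row for rows in activations_by_emotion.values() for row in rows]
    grand_mean = mean_vector(all_rows)
    return {
        emotion: [m - g for m, g in zip(mean_vector(rows), grand_mean)]
        for emotion, rows in activations_by_emotion.items()
    }


# Toy 3-dimensional activations, invented purely for illustration.
toy_activations = {
    "happy": [[1.0, 0.0, 0.5], [0.8, 0.2, 0.5]],
    "afraid": [[0.0, 1.0, 0.5], [0.2, 0.8, 0.5]],
}
vectors = emotion_vectors(toy_activations)
```

Subtracting the grand mean removes whatever all stories share, so each vector keeps only what distinguishes its emotion. A real model has thousands of dimensions per layer rather than three.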

Our first question was whether these vectors track anything real. We ran them across a large corpus of diverse documents and confirmed that each vector activates most strongly on passages that are clearly linked to the corresponding emotion.
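
One way to picture this check is to score every passage against an emotion vector and look at the top-scoring ones. The sketch below assumes cosine similarity as the score, which may differ from the measure actually used; the passage labels and numbers are invented placeholders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_passages(emotion_vector, passage_activations, k=2):
    """Return the k passage labels whose activations align best with the vector."""
    ranked = sorted(
        passage_activations.items(),
        key=lambda item: cosine(emotion_vector, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Hypothetical "afraid" direction and three toy passages.
afraid_vector = [-0.4, 0.4, 0.0]
passages = {
    "storm warning": [0.1, 0.9, 0.2],
    "birthday party": [0.9, 0.1, 0.2],
    "tax form": [0.5, 0.5, 0.9],
}
best = top_passages(afraid_vector, passages, k=1)
```

If a vector tracks its concept, the highest-ranked passages should be ones a human reader would also link to that emotion.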

To gain further confidence that emotion vectors pick up on more than just surface-level cues, we measured their activity in response to prompts that differ only in some numerical quantity. For instance, as the claimed dose of a medication increases to dangerous, life-threatening levels, the “afraid” vector activates increasingly strongly, while “calm” decreases.
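
A test of this kind boils down to asking whether a vector's activation moves in one direction as the quantity in the prompt grows. The helper below is a minimal sketch of that check under the assumption that each prompt yields a single activation score; the readings are invented and are not results from the study.

```python
def rises_with_quantity(pairs):
    """True if activation never drops as the quantity increases.

    pairs is a list of (quantity, activation) tuples in any order.
    """
    ordered = [activation for _, activation in sorted(pairs)]
    return all(later >= earlier for earlier, later in zip(ordered, ordered[1:]))


def falls_with_quantity(pairs):
    """True if activation never rises as the quantity increases."""
    return rises_with_quantity([(q, -a) for q, a in pairs])


# Invented readings for prompts that differ only in one number.
afraid_readings = [(1, 0.10), (5, 0.35), (20, 0.80)]
calm_readings = [(1, 0.70), (5, 0.40), (20, 0.05)]

afraid_rises = rises_with_quantity(afraid_readings)
calm_falls = falls_with_quantity(calm_readings)
```

Because the prompts share all their wording apart from the number, a consistent trend is hard to explain by surface-level cues alone.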

We next tested whether emotion vectors influence model preferences. We created a list of 64 activities or tasks that a model might engage in and measured the model’s default preferences. Activation of emotion vectors strongly predicted how much the model preferred to do an activity, with positive-valence emotions correlating with stronger preference. Moreover, steering with an emotion vector as the model read an option shifted its preference for that option.
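
Both halves of this experiment can be illustrated with small helpers: a Pearson correlation between a vector's activation and the measured preference, and a steering step that nudges a hidden state along an emotion direction. This is a sketch under stated assumptions: steering is modeled here as adding a scaled unit vector, and all names and numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def steer(hidden, direction, strength):
    """Add `strength` times the unit-length direction to a hidden state."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [h + strength * d / norm for h, d in zip(hidden, direction)]


# Invented per-task values: activation of a positive-valence vector
# and the model's preference score for the same task.
activation = [0.1, 0.4, 0.7, 0.9]
preference = [1.0, 2.0, 3.5, 4.0]
r = pearson(activation, preference)

# Push a toy hidden state along a toy emotion direction.
steered = steer([0.0, 0.0], [3.0, 4.0], strength=5.0)
```

Correlation alone shows only that activation and preference move together. The steering step is what supports a causal reading, since it changes the representation directly and then observes whether the preference shifts.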

© 2026 Now Let Us. All rights reserved.

Source: Hacker News
