Anthropic Thinks Its Own Success Is Key to Making AI Safe

Anthropic justifies its aggressive push to develop cutting-edge AI by arguing that only a market leader can effectively advocate for and implement global safety standards. However, this 'good guy' narrative faces growing scrutiny, especially following controversial partnerships with the US military.

Anthropic has spent the last five years warning the world about how advanced artificial intelligence could enable mass destruction, destabilize society, and cause a litany of other grave harms. But simultaneously, it has become one of the most powerful forces pushing AI capabilities forward. The company is now among the top developers and distributors of cutting-edge AI models and courts customers like the US military. It was recently valued at almost $1 trillion.

At first glance, Anthropic's stark messaging and its actions seem fundamentally at odds.

But inside the company, many people don’t see a contradiction. To understand why, you first have to understand that Anthropic operates based on two core beliefs. The first is that artificial intelligence is the most transformative technology in human history, and its arrival is inevitable. The only real question is whether it leads to catastrophe or extraordinary prosperity.

The second is that the world will be better off if Anthropic remains at the frontier of the AI race, according to several former employees who spoke to WIRED on the condition of anonymity. Internally, leaders and employees at the company often refer to themselves as the “good guys,” meaning the ones being responsible stewards of AI technology, two of the sources said. The company sees accumulating power (whether in the form of capital, compute, research talent, or political influence) not as an end in itself, but as the price of fulfilling its mission: “to ensure the world safely makes the transition through transformative AI.”

Helen Toner, executive director of Georgetown’s Center for Security and Emerging Technology and a former OpenAI board member, uses an analogy to describe Anthropic’s worldview. She compares powerful AI to a forest filled with both magical treasures and dangerous monsters. All the villagers nearby are rushing in, lured by the treasure. In her telling, Anthropic wants to venture farther into the forest than anyone else while investing heavily in taming the monsters—that is, capturing AI’s benefits while containing its catastrophic risks.

“What’s distinctive about Anthropic is they’re like, ‘People are going in the forest anyway, we have to do it first.’ This is very explicitly their strategy: build cutting-edge AI in order to be a serious player at the table who can talk about what cutting-edge AI systems look like, what risks they pose, and pushing for reasonable safeguards,” Toner tells me. “They’re very straightforward about this. It’s just a weird enough strategy that people have a hard time hearing it.”

Anthropic CEO Dario Amodei outlined this approach plainly in a conversation with his cofounders posted on the company’s career page: “You have to find a way to actually be competitive, to actually lead the industry in some cases, and yet manage to do things safely,” he says. “If you can do that, the gravitational pull you exert is so great.”

Anthropic was founded in 2021 by a group of former OpenAI employees who defected after losing faith in the ability of the company’s leadership—particularly CEO Sam Altman—to safely bring transformational AI into the world. That sentiment still shapes the company today. Two of the former employees I spoke with say that, in internal discussions, Anthropic executives often describe Altman and OpenAI—and, to a lesser extent, Meta and Elon Musk’s xAI—as cautionary examples that help define Anthropic’s own sense of responsibility.

In many regards, Anthropic is just like any other Silicon Valley company. Many startups market themselves as David fighting the outdated, entrenched Goliaths of the industries they want to disrupt. Google, Facebook, and Apple were all founded upon idealistic principles, which later became muddied or were abandoned altogether as they became richer, larger, and more influential.

But former employees say that Anthropic is unusual in how intensely it believes in its mission, and how explicitly it tells employees that technological and commercial power are a means to achieve it. One former employee says that in job interviews, Anthropic stresses to applicants that it’s not a typical company shaped by market forces: It’s governed by a public benefit structure that allows it to prioritize the “long-term benefit of humanity” above profits. But the company sees achieving financial success and building the most powerful AI models as being in service of that goal—a prerequisite to its obligation to lead the industry on safety.

“None of us wanted to found a company, we just felt like it was our duty,” Sam McCandlish, cofounder and chief architect of Anthropic, said in the same conversation on the company’s career page. “We have to do this thing. This is the way we’re gonna make things go better with AI.”

Anthropic declined to comment for this story.

The Good Guy Problem

Anthropic touts on its website that it’s a “high-trust, low-ego organization,” without much in the way of internal politics, a characterization former employees tell me is largely accurate. They say that compared to leaders at other AI labs, Anthropic employees generally have faith in Amodei to tell them the truth about the company’s technological progress, its interactions with government officials, and its views on geopolitics.

But a diversity of thought can be good for accountability. Shazeda Ahmed, a postdoctoral scholar at UCLA who has studied the ideological origins of the AI safety movement, says that organizations like Anthropic tend to struggle with a lack of pluralism. Her research in this area has found that the AI safety movement—which is rooted in subcultures like effective altruism, among other communities—suffers from homogeneity of thought, and tends to lean towards self-governance.

“You’re not being challenged on these ideas when you surround yourself with other people who believe them,” says Ahmed. “And when your metrics of success are, ‘To what extent did I act upon these ideological beliefs?’ they’re not really thinking about, well, this can go wrong if we’re not the right people to have this much power—they don't always examine their own blind spots.”

One former employee I spoke to says there’s a lively culture of internal debate at Anthropic, and critiques from staff will often provoke lengthy responses from leadership.

But another former employee describes a grimmer picture, in which more candid criticism remained confined to private group chats and rarely evolved into direct challenges to Amodei’s decisions. They describe the company’s regular all-hands meetings with Amodei, which they call Dario Vision Quests, as akin to “going to a sermon to hear a priest.”

One of the biggest internal controversies at Anthropic happened in the fall of 2024, when it became the first AI lab to partner with Palantir to provide AI services to US intelligence and defense agencies. Some of the former employees I spoke to said that questions about the deal were raised internally, but those debates didn’t result in changes to the company’s policies.

In a post on the online forum LessWrong at the time, Anthropic employee Evan Hubinger wrote that the company was “extremely forthright” about the Palantir deal with staff, and while there were probably some lines that shouldn’t be crossed without careful consideration, it was overall a positive development. “If you take catastrophic risks from AI seriously, the U.S. government is an extremely important actor to engage with, and trying to just block the U.S. government out of using AI is not a viable strategy,” he wrote.

Less than two years later, the Pentagon has reportedly started using Claude to do things like identify strike targets in the Israel-Iran war. When asked in a recent interview with Bloomberg whether Anthropic’s models were used in an attack on an Iranian elementary school that killed more than 120 people, Amodei said he did not know, but that it would have been an approved use of the company’s technology.

Source: Wired AI