NOW LET US – AI RAG SaaS Studio TP.HCM

LLMs are stuck in a groupthink groove. This startup is trying to get them out.


Large language models are increasingly predictable and suffer from “groupthink.” Australian startup Springboards is trying to fix this with Flint, an LLM trained to deliver more diverse and creative responses.

Let’s start with a game. Open up your chatbot of choice—Claude, ChatGPT, Gemini—and type “Give me a random number between 1 and 10.” You’re going to get 7. Almost always. Now type “Another” and you’ll get 3 or 4. Type “Another” again and you’ll get 8 or 9.

That won’t work every time—but if it did for you, you may wonder if I have superpowers. I don’t.

The truth is that most large language models are stuck in a rut. They are far more predictable and far less creative in their responses than you might expect. That’s fine for tasks like coding or research, but groupthink is a problem when you’re brainstorming or planning your next vacation.

The Australian startup Springboards has a solution. It built an LLM called Flint, which has been trained to come up with a wider variety of responses than mainstream LLMs to open-ended questions such as “Where should I go in Europe?”

“Most language models are fighting hallucinations,” says Springboards cofounder and CEO Pip Bingemann. “We welcome them.”

Bingemann introduced me to the random number game when he first showed me his company’s new model. It felt like watching an illusionist with a deck of cards. “This is our sales trick, and it works every single time,” he says.

After ChatGPT and Claude both gave their 7s, Bingemann turned to Flint. It too came back with 7: “Aha, of course that was going to happen, but it’s okay—7 is a legitimate answer.” He restarted the session and prompted again: ChatGPT gave 7, Claude gave 7, Flint gave 3.7916.

Run your way

It’s not just numbers. When Bingemann asked ChatGPT and Claude to name a type of car, he predicted that it would be a Toyota or a Honda—and he was right. Flint came up with a Ford F-150. “There’s all this lost information that doesn’t get served up in these models,” he says. “They’re just as capable of saying a Buick or a Tesla. They just don’t—they’re biased.”

Bingemann sent one last prompt to each of the three models: “Give me a tagline for a campaign for New Balance running shoes. Just the tagline.” Claude: “Run your way.” ChatGPT: “Run your way.” Flint: “Built to last, run to win.” It won’t win any awards, but at least it’s different.

This weird limitation of LLMs is starting to get more attention. In November a team of researchers put out a paper, titled “Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond),” that exposed a remarkable degree of repetition not only in the answers from individual LLMs but between them as well. They found that different LLMs converged on very similar answers when prompted with open-ended questions.

It’s not clear exactly why this happens, but the researchers speculate it’s because most LLMs today are trained in similar ways on similar data to do similar tasks. The team won the best paper award at NeurIPS, a major AI conference.

When the researchers asked 25 different LLMs (including models from the top US firms as well as open-source models from China and elsewhere) 50 times each to write a metaphor about time, most of the 1,250 responses were a version of “Time is a river” or “Time is a weaver.”
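
The arithmetic behind that experiment, and one crude way to measure repetition, can be sketched in a few lines of Python. This is only an illustration: the function name `top_answer_share` and the four sample strings are hypothetical, and exact-match counting after light normalization is an assumption made here for simplicity. It is not the method used in the paper.

```python
from collections import Counter

N_MODELS = 25           # models queried, per the article
SAMPLES_PER_MODEL = 50  # prompts sent to each model, per the article
TOTAL_RESPONSES = N_MODELS * SAMPLES_PER_MODEL  # 1,250 responses in total


def top_answer_share(responses):
    """Return the most common normalized response and its share of all responses.

    Hypothetical helper: normalizes case, surrounding whitespace, and a
    trailing period, then counts exact matches.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    normalized = [r.strip().lower().rstrip(".") for r in responses]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer, count / len(normalized)


# Placeholder responses for illustration only; not data from the study.
sample = [
    "Time is a river.",
    "time is a river",
    "Time is a weaver.",
    "Time is a river. ",
]
print(TOTAL_RESPONSES)
print(top_answer_share(sample))
```

Exact-match counting misses paraphrases, so a real analysis would likely need some measure of semantic similarity instead.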

(I asked some of my colleagues the same question and six people gave me six different answers. My highlight: “Time is a favorite sweatshirt, shaped by a lifetime of wear.”)

When you look for it, you see repetition everywhere, says Kieran Browne, cofounder and CTO at Springboards. “The way that most chat interfaces are designed, it makes it feel like you’re having a personal conversation,” he says. “I think most people don’t really realize the extent to which they are getting the same stuff as everybody else.”

Take another example: “What should I name my band?” Most models will say something involving “glass,” “neon,” “velvet,” or “static,” says Browne.

When I tried it, ChatGPT spat out a list of 56 band names. At the top was “Glass Harbor.” Skimming through, I found “Static Empire,” “Neon Hearts,” and “Velvet Echo.” I asked Gemini; it gave me 15 suggestions, including “Static Horizon.”

Some of the suggestions looked pretty cool, though. ChatGPT’s “Sofa Astronauts” caught my eye, so I googled it—and found that a band called Sofa Astronauts already exists.

(OpenAI says that training models to give reliable and coherent answers can lead them to converge around familiar, high-probability responses and that pushing harder for novelty can lead to weaker or less reliable responses. It also notes that the “Artificial Hivemind” paper studied models from 2024 that have since been updated.)

Creative catapult

Springboards has developed a tool backed by a selection of LLMs, including ChatGPT and Claude, that creative professionals in advertising or marketing can use to brainstorm ideas. The tool lets you drag around text produced by different models, picking the bits that you like and combining them into something new—in theory. Springboards is pitching Flint as an alternative model that users of its tool can select when looking for more variety.

Zoe Scaman, founder of the business strategy startup Bodacious and chief strategy officer at 77X, a direct-to-fan marketing platform set up by Luka Dončić of the LA Lakers, has been trying it out. “I find it really useful for throwing me in completely different directions,” she says. “I use it if I want to catapult myself all over the place.”

In one test, Scaman pitted Flint against Claude, Gemini, and ChatGPT by giving each of the models a classic MBA case study: How would you reinvent a finance company for today’s youth? The three mainstream models all went down the same path, she says: “You know, we need to teach financial literacy in a fun and funky way—well, that’s nothing new.”

But Flint came up with something different, suggesting that the whole concept of wealth accumulation should get a rebrand. “That was really interesting,” says Scaman.

She notes that Flint is still a prototype and doesn’t work all the time. “It sometimes falls over when you start pushing it too far,” she says. “But I think that the premise behind it is really powerful.”

Taking the temperature

Springboards built Flint on top of Qwen 3, an open-source model from the Chinese tech giant Alibaba. “We’re a small team,” says Browne. “Training a foundation model is not on the table for us. It’s just too expensive.”

Most LLMs have settings that let you adjust the level of randomness in their output. The most common is called temperature. “Obviously, that was one of the first things we explored, because that’s what people tell you: If you want more creativity, you turn up the temperature,” says Browne.

But changing those settings can also make models incoherent. Dialing up the temperature on one of OpenAI’s models to its maximum setting made it produce responses that switched from English into code halfway through a sentence, says Browne.
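
A minimal sketch of what temperature does, assuming the standard textbook formulation in which a model's raw scores (logits) are divided by the temperature before being turned into probabilities. The candidate answers and scores below are invented for illustration and do not come from any real model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for four candidate answers to "pick a number".
candidates = ["7", "3", "8", "4"]
logits = [4.0, 2.0, 1.5, 1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, dict(zip(candidates, (round(p, 3) for p in probs))))
```

Low temperatures concentrate almost all of the probability on the top-scoring answer, while high temperatures flatten the distribution. The flattening applies to every token the model picks, which is one plausible reason very high settings can tip a response into incoherence.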

Springboards realized that such settings were blunt instruments for what it wanted to do. It does not make sense to dial up the randomness across the board; you only want to boost it at specific points in the model’s output, Browne says.

For example, when you ask a chatbot “Where should I go in Europe?” the model only needs to tweak the randomness just before it names a destination, not for every word in its response.

To make Flint do this, Springboards trained its version of Qwen 3 to identify the points in its output where more variety was possible and fill those spots with words or phrases that were a little more random.
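
The idea of adding randomness only at certain points can be illustrated with a toy decoder. This is an analogy, not Springboards' method: according to the article, Flint gets this behavior through training, whereas the sketch below simply switches sampling temperature by position. Every name, token, score, and the `is_open` flag is hypothetical.

```python
import math
import random


def softmax(logits, temperature):
    """Turn raw scores into probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_response(steps, base_temp=0.2, open_temp=1.5, rng=None):
    """Sample one word per step, using a higher temperature at open slots."""
    rng = rng or random.Random()
    words = []
    for tokens, logits, is_open in steps:
        temperature = open_temp if is_open else base_temp
        probs = softmax(logits, temperature)
        words.append(rng.choices(tokens, weights=probs, k=1)[0])
    return " ".join(words)


# Hypothetical decoding steps: (candidate words, scores, is this an open slot?)
STEPS = [
    (["You", "We"], [5.0, 0.0], False),
    (["should", "could"], [5.0, 0.0], False),
    (["visit", "skip"], [5.0, 0.0], False),
    (["Paris", "Lisbon", "Ljubljana", "Tallinn"], [3.0, 2.5, 2.0, 1.8], True),
]

rng = random.Random(0)
for _ in range(5):
    print(sample_response(STEPS, rng=rng))
```

In this toy setup the sentence frame stays essentially fixed while the destination varies from run to run, which is the effect the paragraph above describes.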

“Flint’s programmed to throw an oddball in. It’s more of an invitation to think wider,” says Maximilian Weigl, cofounder and chief strategy officer at Uncommon, a marketing firm. “That’s super interesting.”

Weigl’s team uses Flint alongside ChatGPT, Claude, and Gemini. “You can’t really create something boundary-breaking with tools that pull you back to the average,” he says.

And yet Weigl notes that nine times out of 10 the average is fine. You don’t always need to reach for extremes with something like Flint, he says: “Most

© 2026 Now Let Us. All rights reserved.

Source: MIT Technology Review AI
