How robots learn: A brief, contemporary history

A look at the evolution of robotics from rigid, rule-based programming to modern AI-driven learning models that allow machines to adapt and interact with the real world.
Roboticists used to dream big but build small. They’d hope to match or exceed the extraordinary complexity of the human body, and then they’d spend their careers refining robotic arms for auto plants. Aim for C-3PO; end up with the Roomba.
The real ambition for many of these researchers was the robot of science fiction—one that could move through the world, adapt to different environments, and interact safely and helpfully with people. For the socially minded, such a machine could help those with mobility issues, ease loneliness, or do work too dangerous for humans. For the more financially inclined, it would mean a bottomless source of wage-free labor. Either way, a long history of failure left most of Silicon Valley hesitant to bet on helpful robots.
That has changed. The machines are yet unbuilt, but the money is flowing: Companies and investors put $6.1 billion into humanoid robots in 2025 alone, four times what was invested in 2024.
What happened? A revolution in how machines have learned to interact with the world.
Imagine you’d like a pair of robot arms installed in your home purely to do one thing: fold clothes. How would it learn to do that? You could start by writing rules. Check the fabric to figure out how much deformation it can tolerate before tearing. Identify a shirt’s collar. Move the gripper to the left sleeve, lift it, and fold it inward by exactly this distance. Repeat for the right sleeve. If the shirt is rotated, turn the plan accordingly. If the sleeve is twisted, correct it. Very quickly the number of rules explodes, but a complete accounting of them could produce reliable results. This was the original craft of robotics: anticipating every possibility and encoding it in advance.
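To make the rule-writing approach concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Shirt` fields, the step descriptions, and the order of the rules are invented for illustration and do not come from any real folding system.

```python
from dataclasses import dataclass


@dataclass
class Shirt:
    """Hypothetical summary of what the robot's sensors report."""
    rotation_deg: float = 0.0          # how far the shirt is turned on the table
    left_sleeve_twisted: bool = False
    right_sleeve_twisted: bool = False


def plan_fold(shirt: Shirt) -> list[str]:
    """Return an ordered list of hand-written steps for one shirt."""
    steps = []
    # Rule: if the shirt is rotated, square it up before anything else.
    if shirt.rotation_deg != 0.0:
        steps.append(f"rotate shirt by {-shirt.rotation_deg:g} degrees")
    for side in ("left", "right"):
        # Rule: a twisted sleeve must be corrected before it is folded.
        if getattr(shirt, f"{side}_sleeve_twisted"):
            steps.append(f"untwist {side} sleeve")
        steps.append(f"fold {side} sleeve inward")
    steps.append("fold bottom hem up to collar")
    return steps
```

A flat, untwisted shirt yields three steps; a rotated one with a twisted sleeve yields five. Every new situation (a wrinkle, a hoodie, a sock) needs another branch, which is how the rule count explodes.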
Around 2015, the cutting edge started to do things differently: Build a digital simulation of the robotic arms and the clothes, and give the program a reward signal every time it folds successfully and a ding every time it fails. This way, it gets better through trial and error, trying all sorts of techniques over millions of iterations, the same way AI got good at playing games.
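The reward-and-ding idea can be shown with a toy far simpler than anything a robotics lab would use. The sketch below is an epsilon-greedy learner choosing among three made-up folding techniques. The success probabilities stand in for a physics simulation and are invented, and real systems of that period trained neural networks rather than keeping a small table of averages.

```python
import random


def learn_by_trial(success_prob, trials=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error over a few candidate techniques.

    success_prob: hidden chance that each technique folds the shirt
    (a stand-in for a simulator; the numbers are assumptions).
    Returns the average reward the learner has observed per technique.
    """
    rng = random.Random(seed)
    value = [0.0] * len(success_prob)   # running average reward
    count = [0] * len(success_prob)     # times each technique was tried
    for _ in range(trials):
        if rng.random() < epsilon:
            # Explore: try a technique at random.
            choice = rng.randrange(len(success_prob))
        else:
            # Exploit: use the technique that looks best so far.
            choice = max(range(len(value)), key=value.__getitem__)
        # Reward of +1 for a successful fold, a "ding" of -1 for a failure.
        reward = 1.0 if rng.random() < success_prob[choice] else -1.0
        count[choice] += 1
        value[choice] += (reward - value[choice]) / count[choice]
    return value


values = learn_by_trial([0.2, 0.5, 0.8])
best = max(range(len(values)), key=values.__getitem__)
```

Nobody tells the learner which technique is best. After enough attempts, `best` settles on the one that succeeds most often, purely from the rewards and dings it has collected.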
The arrival of ChatGPT in 2022 catalyzed the current boom. Trained on vast amounts of text, large language models work not through trial and error but by learning to predict what word should come next in a sentence. Similar models adapted to robotics were soon able to absorb pictures, sensor readings, and the position of a robot’s joints and predict the next action the machine should take, issuing dozens of motor commands every second.
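Next-word prediction is easier to picture with a toy. The sketch below counts which word follows which in a tiny, made-up training sentence and then predicts the most common follower. It is only an illustration of the objective: real language models use neural networks trained on vast amounts of text, not word counts, and the function names here are invented.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str):
    """Count, for every word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word: str):
    """Return the most frequent follower of `word`, or None if unseen."""
    options = follows.get(word.lower())
    if not options:
        return None
    return options.most_common(1)[0][0]


corpus = "fold the shirt then fold the towel then fold the shirt again"
model = train_bigrams(corpus)
```

Here `predict_next(model, "the")` returns `"shirt"`, because "shirt" follows "the" twice and "towel" only once. Swap words for motor commands, and the table for a large neural network fed with camera images and joint positions, and the same objective yields a robot's next action.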
This conceptual shift—to reliance on AI models that ingest large amounts of data—seems to work whether that helpful robot is supposed to talk to people, move through an environment, or even do complicated tasks. And it was paired with other ideas about how to accomplish this new way of learning, like deploying robots even if they aren’t yet perfect so they can learn from the environment they’re meant to work in. Today, Silicon Valley roboticists are dreaming big again. Here’s how that happened.
Jibo
Jibo
A movable social robot carried out conversations long before the age of LLMs.
An MIT robotics researcher named Cynthia Breazeal introduced an armless, legless, faceless robot called Jibo to the world in 2014. It looked, in fact, like a lamp. Breazeal’s aim was to create a social robot for families, and the idea pulled in $3.7 million in a crowdsourced funding campaign. Early preorders cost $749.
The early Jibo could introduce itself and dance to entertain kids, but that was about it. The vision was always for it to become a sort of embodied assistant that could handle everything from scheduling and emails to telling stories. It earned a number of devoted users, but ultimately the company shut down in 2019.
In retrospect, one thing that Jibo really needed was better language capabilities. It was competing against Apple’s Siri and Amazon’s Alexa, and all those technologies at the time relied on heavy scripting. In broad terms, when you spoke to them, software would translate your speech into text, analyze what you wanted, and create a response pulled from preapproved snippets. Those snippets could be charming, but they were also repetitive and simply boring—downright robotic. That was especially a challenge for a robot that was supposed to be social and family oriented.
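A scripted assistant of that kind can be sketched in a few lines. The example assumes the speech has already been turned into text, and its intents, trigger words, and replies are all invented; no real product's script is reproduced.

```python
import re

# Hypothetical preapproved snippets, one per intent.
SNIPPETS = {
    "greeting": "Hi there! Nice to see you.",
    "weather": "I can't see outside, but I hope it's sunny.",
    "fallback": "Sorry, I didn't catch that.",
}

# Hypothetical trigger words used to guess what the speaker wants.
KEYWORDS = {
    "greeting": ("hello", "hi", "hey"),
    "weather": ("weather", "rain", "sunny"),
}


def respond(transcript: str) -> str:
    """Map transcribed speech to an intent, then to a canned reply."""
    words = re.findall(r"[a-z']+", transcript.lower())
    for intent, triggers in KEYWORDS.items():
        if any(word in triggers for word in words):
            return SNIPPETS[intent]
    return SNIPPETS["fallback"]
```

Ask it about rain a hundred times and it gives the same answer a hundred times. That keeps it from going off the rails, and it is also why such systems felt repetitive.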
What has happened since, of course, is a revolution in how machines can generate language. Voice mode from any leading AI provider is now engaging and impressive, and multiple hardware startups are trying (and failing) to build products that take advantage of it.
But that comes with a new risk: While scripted conversations can’t really go off the rails, ones generated by AI certainly can. Some popular AI toys have, for example, talked to kids about how to find matches and knives.
OpenAI
Dactyl
A robot hand trained with simulations tries to model the unpredictability and variation of the real world.
By 2018, every leading robotics lab was trying to scrap the old scripted rules and train robots through trial and error. OpenAI tried to train its robotic hand, Dactyl, virtually—with digital models of the hand and of the palm-size cubes Dactyl was supposed to manipulate. The cubes had letters and numbers on their faces; the model might set a task like “Rotate the cube so the red side with the letter O faces upward.”
Here’s the problem: A robotic hand might get really good at doing this in its simulated world, but when you take that program and ask it to work on a real version in the real world, the slight differences between the two can cause things to go awry. Colors might be slightly different, or the deformable rubber in the robot’s fingertips could turn out to be stretchier than it was in simulation.
The solution is called domain randomization. You essentially create millions of simulated worlds that all vary slightly and randomly from one another. In each one the friction might be less, or the lighting more harsh, or the colors darkened. Exposure to enough of this variation means the robots will be better able to manipulate the cube in the real world. The approach worked on Dactyl, and one year later it was able to use the same core techniques to do something harder: solving Rubik’s Cubes (though it worked only 60% of the time, and just 20% when the scrambles were particularly hard).
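In code, the heart of domain randomization is just sampling. The sketch below draws a fresh set of simulation settings for every training episode. The parameter names and ranges are assumptions chosen for illustration, not the values OpenAI used, and a real setup would pass each sample to a physics simulator.

```python
import random
from dataclasses import dataclass


@dataclass
class SimWorld:
    """Hypothetical settings for one simulated training world."""
    friction: float      # multiplier on the nominal fingertip friction
    light: float         # multiplier on the nominal lighting intensity
    color_shift: float   # brightness offset applied to the cube's colors


def sample_world(rng: random.Random) -> SimWorld:
    """Draw one randomly perturbed world (ranges are invented)."""
    return SimWorld(
        friction=rng.uniform(0.5, 1.5),
        light=rng.uniform(0.6, 1.4),
        color_shift=rng.uniform(-0.2, 0.2),
    )


rng = random.Random(0)
# One world per training episode; a real run would use far more.
worlds = [sample_world(rng) for _ in range(1000)]
```

Because the policy never sees the same friction or lighting twice, it cannot rely on any one value. The hope is that the real world ends up looking like one more variation.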
Still, the limits of simulation mean that this technique plays a far smaller role today than it did in 2018. OpenAI shuttered its robotics effort in 2021 but has recently started the division up again—reportedly focusing on humanoids.
Google DeepMind
RT-2
Training on images from across the internet helps robots translate language into action.
Around 2022, Google’s robotics team was up to some strange things. It spent 17 months handing people robot controllers and filming them doing everything from picking up bags of chips to opening jars. The team ended up cataloguing 700 different tasks.
The point was to build and test one of the first large-scale foundation models for robotics. As with large language models, the idea was to input lots of text, tokenize it into a format an algorithm could work with, and then generate an output. Google’s RT-1 received input about what the robot was looking at and how the many parts of the robotic arm were positioned; then it took an instruction and translated it into motor commands to move the robot. When it had seen tasks before, it carried out 97% of them successfully; it succeeded at 76% of the instructions it hadn’t seen before.
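One way to put motor commands into a form a language-style model can work with is to chop each continuous value into numbered bins, so that an action becomes a short sequence of integers. The sketch below assumes 256 bins and commands normalized to the range -1 to 1. Those choices, and the function names, are assumptions for illustration rather than details given above.

```python
def to_token(value: float, low: float = -1.0, high: float = 1.0,
             bins: int = 256) -> int:
    """Map a continuous command to an integer bin index (a 'token')."""
    clipped = min(max(value, low), high)          # keep it inside the range
    token = int((clipped - low) / (high - low) * bins)
    return min(token, bins - 1)                   # the top edge joins the last bin


def from_token(token: int, low: float = -1.0, high: float = 1.0,
               bins: int = 256) -> float:
    """Map a bin index back to the command at the center of that bin."""
    width = (high - low) / bins
    return low + (token + 0.5) * width


# A hypothetical three-joint command, encoded and then decoded.
command = [0.3, -1.0, 0.0]
tokens = [to_token(v) for v in command]
decoded = [from_token(t) for t in tokens]
```

The round trip loses a little precision, at most half a bin width, which is the price of turning motion into a vocabulary a model can predict one token at a time.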
The second iteration, RT-2, came out the following year and went even further. Instead of training on data specific to robotics, it went broad: It trained on more general images from across the internet, like the vision-language models lots of researchers were working on at the time. That allowed the robot to interpret where certain objects were in the scene.
Source: MIT Technology Review AI









