AGENTIC-SYSTEMS | March 13, 2026 | 1 min read

Figuring out why AIs get flummoxed by some games

Powerful AIs like DeepMind's AlphaGo, trained through self-play, have been found to fail at an entire category of 'impartial games,' revealing critical blind spots that need to be addressed for future AI development.

With its Alpha series of game-playing AIs, Google’s DeepMind group seemed to have found a way for its AIs to tackle any game, mastering games like chess and Go by repeatedly playing against themselves during training. But then some odd things happened: people started identifying Go positions that would lose against relative newcomers to the game but easily defeat a similar Go-playing AI.

While beating an AI at a board game may seem relatively trivial, it can help us identify an AI's failure modes, or ways to improve its training so that it doesn't develop these blind spots in the first place. Both may become critical as people rely on AI input for a growing range of problems.

A recent paper published in Machine Learning describes an entire category of games where the method used to train AlphaGo and AlphaZero fails. The games in question can be remarkably simple, as exemplified by the one the researchers worked with: Nim, which involves two players taking turns removing matchsticks from a pyramid-shaped board until one is left without a legal move.

Impartiality

Nim involves laying out rows of matchsticks, with the top row having a single match and every row below it having two more than the one above. This creates a pyramid-shaped board. Two players then take turns removing matchsticks from the board, choosing a row and then removing anywhere from one item to the entire contents of the row. The game continues until there are no legal moves left. It’s a simple game that can easily be taught to children.
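The rules above are compact enough to write down directly. The following is a minimal Python sketch, not code from the paper: the function names (`make_board`, `legal_moves`, `apply_move`) are hypothetical, and the board is assumed to be a plain list of row sizes.

```python
from typing import List, Tuple

Move = Tuple[int, int]  # (row index, number of matchsticks to remove)


def make_board(rows: int) -> List[int]:
    """Build the pyramid: 1 match in the top row, 2 more in each row below."""
    return [2 * i + 1 for i in range(rows)]


def legal_moves(board: List[int]) -> List[Move]:
    """List every move: pick one row, take between 1 and all of its matches."""
    return [
        (row, count)
        for row, size in enumerate(board)
        for count in range(1, size + 1)
    ]


def apply_move(board: List[int], move: Move) -> List[int]:
    """Return a new board with the chosen matches removed."""
    row, count = move
    if not 1 <= count <= board[row]:
        raise ValueError(f"illegal move {move} on board {board}")
    new_board = list(board)
    new_board[row] -= count
    return new_board


board = make_board(4)
print(board)                      # [1, 3, 5, 7]
print(len(legal_moves(board)))    # 16
print(apply_move(board, (2, 5)))  # [1, 3, 0, 7]
```

A position with an empty `legal_moves` list is terminal, which matches the description of the game running until no legal moves remain.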

It also turns out to be a critical example of an entire category of rule sets that define “impartial games.” These differ from something like chess, where each player has their own set of pieces; in impartial games, the two players share the same pieces and are bound by the same set of rules. Nim’s importance stems from a theorem showing that any position in an impartial game can be represented by a configuration of a Nim pyramid. That means that if something applies to Nim, it applies to all impartial games.
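The theorem mentioned here is usually called the Sprague-Grundy theorem. A related classical result gives Nim a simple closed-form solution: under the normal-play convention (assumed here, where the player left without a legal move loses), the player to move can force a win exactly when the bitwise XOR of the row sizes, the "nim-sum," is non-zero. The sketch below is an illustration rather than anything taken from the paper; the names `nim_sum` and `mover_wins` are hypothetical. It checks the XOR rule against a brute-force search of the game tree.

```python
from functools import lru_cache, reduce
from operator import xor
from typing import Tuple


def nim_sum(rows: Tuple[int, ...]) -> int:
    """Bitwise XOR of all row sizes."""
    return reduce(xor, rows, 0)


@lru_cache(maxsize=None)
def mover_wins(rows: Tuple[int, ...]) -> bool:
    """Brute force: can the player to move force a win?

    Assumes normal play: a player with no legal move loses, so an
    empty board is a loss for whoever must move (any() of nothing is False).
    """
    return any(
        not mover_wins(rows[:r] + (size - take,) + rows[r + 1:])
        for r, size in enumerate(rows)
        for take in range(1, size + 1)
    )


pyramid = (1, 3, 5, 7)
print(nim_sum(pyramid))     # 0
print(mover_wins(pyramid))  # False
```

For the four-row pyramid the nim-sum is 0, so with perfect play the first player loses. The striking part of the finding is that a game with a rule this short is one where self-play training reportedly struggles.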

Source: Ars Technica AI
