NOW LET US – AI RAG SaaS Studio TP.HCM

Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

Researchers introduce Brick-Composer, a learning framework that equips multimodal large language models (MLLMs) with spatial reasoning and visual grounding capabilities for brick assembly, raising strict step-level assembly success from less than 1% to around 15%.

Abstract: We dream of AI agents that can read arbitrary designs and construct real-world objects from reusable building blocks. As a first step toward this vision, we study whether multimodal large language models (MLLMs) possess the visual grounding and spatial reasoning capabilities required for brick assembly. We formulate brick assembly as a sequential decision-making problem, where each step involves two subtasks: brick selection, identifying the target brick from candidate components, and brick pose estimation, predicting where and how the selected brick should be placed. To support this study, we introduce BC-Bench (Brick Construction Benchmark), the first benchmark for evaluating MLLMs on assembly with diverse bricks. Experiments show that current state-of-the-art MLLMs remain far from reliable builders, struggling with fine-grained brick selection and failing at precise pose estimation. To bridge this gap, we propose Brick-Composer, a learning framework that equips MLLMs with assembly skills through three complementary signals: Human Design Sparks, which provide affordance-rich construction demonstrations; World Feedback, which grounds predicted actions in visual and physical consequences; and Synthetic Experience, which scales learning beyond existing object designs. Brick-Composer improves brick selection accuracy by over three times, substantially reduces pose estimation errors, and raises strict step-level assembly success from less than 1% to around 15%. After training, a Qwen-3-8B can correctly compose up to 42% of the steps for a complete object, suggesting that MLLMs can acquire assembly capabilities through targeted, physically grounded learning.
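To make the two-subtask formulation concrete, here is a minimal Python sketch of how a strict step-level metric could be scored. It is an illustration only and does not reproduce BC-Bench: the `Step` record, the yaw-only rotation, the tolerance values, and the function names are all assumptions introduced here. The idea it shows is that a step counts as a strict success only when the selected brick matches the target and the predicted pose falls within tolerance, so strict success can never exceed selection accuracy.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Step:
    """One assembly step (hypothetical schema, not taken from the paper)."""
    brick_id: str                          # which candidate brick is chosen
    position: tuple[float, float, float]   # where the brick is placed
    yaw_deg: float                         # rotation about the vertical axis


def angle_diff_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def step_correct(pred: Step, gold: Step,
                 pos_tol: float = 0.5, yaw_tol: float = 1.0) -> bool:
    """Strict check: brick AND pose must both be right.

    pos_tol and yaw_tol are assumed thresholds chosen for illustration.
    """
    if pred.brick_id != gold.brick_id:
        return False
    if math.dist(pred.position, gold.position) > pos_tol:
        return False
    return angle_diff_deg(pred.yaw_deg, gold.yaw_deg) <= yaw_tol


def evaluate(preds: list[Step], golds: list[Step]) -> dict[str, float]:
    """Return selection accuracy and strict step-level success rate."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equal length")
    n = len(golds)
    selection = sum(p.brick_id == g.brick_id for p, g in zip(preds, golds)) / n
    strict = sum(step_correct(p, g) for p, g in zip(preds, golds)) / n
    return {"selection_accuracy": selection, "strict_step_success": strict}


if __name__ == "__main__":
    golds = [
        Step("2x4", (0.0, 0.0, 0.0), 0.0),
        Step("2x2", (2.0, 0.0, 1.0), 90.0),
        Step("1x2", (0.0, 4.0, 1.0), 0.0),
        Step("2x4", (0.0, 0.0, 2.0), 180.0),
    ]
    preds = [
        Step("2x4", (0.0, 0.0, 0.0), 0.0),     # right brick, right pose
        Step("2x2", (2.0, 0.0, 1.0), 0.0),     # right brick, wrong rotation
        Step("1x4", (0.0, 4.0, 1.0), 0.0),     # wrong brick
        Step("2x4", (0.0, 0.0, 2.0), 180.0),   # right brick, right pose
    ]
    print(evaluate(preds, golds))
```

In this toy run three of four bricks are selected correctly, but only two steps pass the strict check, which mirrors the gap the abstract describes between brick selection and precise pose estimation.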

© 2026 Now Let Us. All rights reserved.

Source: arXiv cs.AI Recent

