NOW LET US – AI RAG SaaS Studio TP.HCM

Unverified: What Practitioners Post About OCR, Agents, and Tables

A deep dive into the unverified but consistent reports from engineers and practitioners on the real-world challenges of OCR and AI document processing, highlighting the gap between vendor promises and production reality.

On This Page

  • About This Report
  • The Demo Works. Production Does Not.
  • The OCR Fragmentation
  • 85% on Page One. 65% by Page Three.
  • The €2,000 Stack
  • The Hybrid Pipeline Won

About This Report

I spent a month reading engineering forums and practitioner discussion boards instead of vendor press releases. Anonymous posts, unverified credentials, no editorial review. Someone claims to have processed 150,000 handwritten pages. Someone else claims their agent failed silently on day 11. A developer says they replaced $100 per month in API costs with a €2,000 eBay purchase. None of this is verified.

What I can verify is that the same patterns showed up independently across all 22 capability areas on this site. The same complaints, the same workarounds, the same numbers within the same ranges, posted by people who do not appear to know each other. That consistency is either a coincidence or a signal. I am treating it as a signal, with the caveat that forum posts are forum posts.

The Demo Works. Production Does Not.

Someone describing themselves as an operations coordinator writes about testing eight OCR tools on 200+ multilingual shipping invoices. Most destroyed table formatting: perfectly organized invoices turned into alphabet soup. Adobe Acrobat, Google Docs upload, and free online OCR tools all failed to maintain structure. ABBYY delivered better accuracy but felt dated. The poster reports spending weeks finding something that worked.

A poster claiming to process 10,000 NASA technical documents (scanned typewriter reports, handwritten notes, and propulsion diagrams from the 1950s onward) describes rebuilding their entire pipeline from scratch using vision-language models. Off-the-shelf parsers reportedly broke down on the first batch.

An RPA developer describes spending weeks building regex-based document parsing for loan applications, then rebuilding the entire workflow in two hours using n8n plus a language model.

From our February vendor coverage: Box Extract reported contract processing reduced from 20 minutes to under 2 minutes. UiPath's healthcare launch claimed medical record review dropped from 70 minutes to 6 minutes. If those numbers hold on vendor-selected use cases, they are impressive. The question is whether they hold on yours.

The OCR Fragmentation

Six months ago, a practitioner could name a preferred OCR engine with confidence. Based on what I read, that confidence is gone.

One widely discussed benchmark tested seven solutions on an academic document with footnotes, tables, figures, and equations. Mistral's API ranked first. Marker with Gemini second. Docling third. Tesseract did not place. The discussion that followed was more revealing than the rankings. Nearly every practitioner preferred a different stack. PaddleOCR. MinerU. Qwen2.5-VL with Marker. PyMuPDF4LLM. Each reportedly worked on someone's documents and failed on someone else's.

Posters describing handwriting-heavy workloads report that legacy OCR achieves zero useful accuracy on cursive. Cloud OCR from Azure, Google, and AWS reportedly manages 45-50% on handwriting. For these posters, the shift to vision-language models is not optional.

85% on Page One. 65% by Page Three.

Someone describing a 12-month production deployment (not a benchmark, but an operation processing 150,000+ handwritten pages) posted accuracy numbers that vendor pitch decks do not contain.

GPT-4.1 reportedly achieved roughly 85% accuracy on clean single-page handwriting. By page three it dropped to 65%. The poster describes the model fabricating data for later pages rather than flagging uncertainty. One inspector's name from page one appeared on page three's extraction where a different inspector had signed.
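The cross-page contamination described above (a page-one name appearing in page three's extraction) is the kind of error a simple grounding check can surface. The following is a minimal sketch, not the poster's method: it assumes a separate OCR pass supplies each page's own text, and the function names, field names, and sample data are all hypothetical. It flags any extracted value that does not literally occur in the text of the page it was supposedly read from.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons tolerate layout noise."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_fields(page_text: str, extracted: dict[str, str]) -> list[str]:
    """Return names of extracted fields whose value is absent from this page's text."""
    haystack = normalize(page_text)
    return [
        name
        for name, value in extracted.items()
        if normalize(value) not in haystack
    ]


# Hypothetical data: page three was signed by a different inspector than page one,
# but the model carried the page-one name forward.
page_three_text = "Inspection report page 3\nInspector: R. Alvarez\nResult: pass"
page_three_fields = {"inspector": "J. Chen", "result": "pass"}

flagged = ungrounded_fields(page_three_text, page_three_fields)
print(flagged)  # the "inspector" field is flagged for human review
```

An exact substring match is deliberately strict. On noisy handwriting transcriptions it would likely need to be relaxed to a fuzzy comparison, which trades missed fabrications for fewer false alarms.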

Claude Sonnet 4 was described as the most consistent at approximately 83% across all pages. However, it reportedly returned editorial prose when raw field extraction was needed: asked for structured JSON, it produced a summary of the document instead.

Gemini reportedly achieved around 84% on clean sections and 70% on messier content. Structured output reportedly came back as valid JSON sometimes and garbled at other times. Multiple posters independently described being surprised by Gemini's OCR quality, particularly on handwritten documents where other frontier models extracted nothing useful.
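Both failure modes reported here (prose where JSON was requested, and intermittently garbled JSON) can be caught before they reach downstream systems by validating every model reply. This is a minimal sketch using only Python's standard `json` module; the required field names are a hypothetical schema, and nothing here reflects what any poster actually ran.

```python
import json
from typing import Optional

# Hypothetical schema: the fields every extraction record must contain.
REQUIRED_FIELDS = {"inspector", "date", "result"}


def parse_extraction(raw: str) -> Optional[dict]:
    """Return the parsed record, or None if the reply is not a usable JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # prose summary or garbled output
    if not isinstance(data, dict):
        return None  # valid JSON, but not an object (e.g. a bare list or number)
    if not REQUIRED_FIELDS.issubset(data):
        return None  # object is missing at least one required field
    return data


good = '{"inspector": "R. Alvarez", "date": "2024-03-01", "result": "pass"}'
prose = "This document is an inspection report signed by R. Alvarez."

print(parse_extraction(good) is not None)  # True
print(parse_extraction(prose) is None)     # True
```

Returning `None` rather than raising keeps the caller's retry-or-escalate decision explicit. A rejected reply can be re-requested or routed to a human instead of being silently stored.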

The €2,000 Stack

One poster documented replacing roughly $100 per month in cloud API costs with a Mac Studio M1 Ultra purchased on eBay for under €2,000. Three AI agents coordinating through Telegram. Qwen 3.5 running at 60 tokens per second. Vision processing, speech, document extraction, all local. Zero cloud dependencies. If the numbers are accurate, the hardware pays for itself in under two years.
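The payback claim is simple arithmetic and can be checked directly. The sketch below assumes an exchange rate of 1.10 USD per EUR (an illustrative figure, not from the post), treats €2,000 as an upper bound since the poster said "under", and ignores electricity and maintenance.

```python
def payback_months(hardware_cost: float, monthly_savings: float) -> float:
    """Months until cumulative savings equal the up-front hardware cost."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return hardware_cost / monthly_savings


EUR_TO_USD = 1.10  # assumed exchange rate, for illustration only

hardware_usd = 2000 * EUR_TO_USD  # upper bound on the reported purchase price
months = payback_months(hardware_usd, monthly_savings=100.0)

print(f"Payback: about {months:.0f} months")  # about 22 months
```

Under these assumptions the break-even point is roughly 22 months, which is consistent with "under two years" but leaves little margin once running costs are included.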

This matches a broader pattern in what I read. Developers who need document processing describe building their own pipelines rather than evaluating IDP vendors. Docling, Marker, PaddleOCR, Kreuzberg, MinerU. The open-source tools appear to be good enough for many use cases.

The Hybrid Pipeline Won

The technical consensus, if forum posts reflect consensus, points to a two-stage architecture. A dedicated OCR or layout model converts documents to structured markdown, then a language model handles extraction and reasoning.
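The two-stage shape can be sketched as follows. This is an illustration under stated assumptions, not any poster's implementation: `run_hybrid_pipeline`, `fake_layout_model`, and `fake_llm_extractor` are hypothetical names, and the two stand-in functions replace what would in practice be a real OCR or layout tool and a real language model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    markdown: str  # output of stage 1 (OCR / layout model)
    fields: dict   # output of stage 2 (language model extraction)


def run_hybrid_pipeline(
    document: bytes,
    to_markdown: Callable[[bytes], str],
    extract_fields: Callable[[str], dict],
) -> PipelineResult:
    """Stage 1 converts the document to markdown; stage 2 sees only that markdown."""
    markdown = to_markdown(document)
    fields = extract_fields(markdown)
    return PipelineResult(markdown=markdown, fields=fields)


def fake_layout_model(document: bytes) -> str:
    """Stand-in for a dedicated OCR or layout model (hypothetical)."""
    return "| Item | Qty |\n|---|---|\n| Bolt | 40 |"


def fake_llm_extractor(markdown: str) -> dict:
    """Stand-in for the language model: reads rows from the markdown table."""
    rows = [line.split("|")[1:-1] for line in markdown.splitlines()[2:]]
    return {"items": [{"name": n.strip(), "qty": int(q)} for n, q in rows]}


result = run_hybrid_pipeline(b"%PDF-...", fake_layout_model, fake_llm_extractor)
print(result.fields)
```

Passing the two stages in as callables keeps them independently swappable. That matters given how much the preferred OCR stack reportedly varies from one document set to another.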

Posters describing large-scale deployments (one claims 2.6 million pages in a single burst ingestion) report that the hybrid approach beats sending raw images directly to a vision model in both accuracy and cost. End-to-end LLM processing reportedly costs an order of magnitude more than reserving the LLM for extraction logic.

© 2026 Now Let Us. All rights reserved.

Source: Hacker News
