NOW LET US – AI RAG SaaS Studio TP.HCM

Show HN: Classify mechanical faults using Contrastive Language-Audio Pretraining

cardiag is an end-to-end audio-ML pipeline that cleans noisy mechanical recordings and uses a frozen CLAP model with linear heads to triage car faults.

cardiag is an end-to-end audio-ML pipeline. It scrapes fault-sound clips from YouTube/TikTok, cleans the audio (isolating the mechanical sound from speech, music, and noise), embeds it with a frozen CLAP model, and trains small linear heads to triage the fault. It is exposed as a CLI and a live web app.

This is a proof of concept, and honest about what that means. Diagnosing a car fault from a phone recording is genuinely hard, so cardiag is built as a calibrated triage aid rather than a diagnoser: it tells you whether something sounds wrong, roughly where in the car it is, and a ranked shortlist of likely parts. When the audio won't support a call, it says "uncertain" instead of bluffing.
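The "uncertain instead of bluffing" behaviour can be pictured as an abstention band around a calibrated probability. The sketch below is only an illustration of that idea: the function name `triage_verdict` and the thresholds 0.35 and 0.65 are hypothetical and are not taken from cardiag.

```python
UNCERTAIN = "UNCERTAIN"


def triage_verdict(p_fault: float, low: float = 0.35, high: float = 0.65) -> str:
    """Map a calibrated fault probability to a verdict.

    The thresholds are illustrative assumptions, not cardiag's values.
    Probabilities inside the (low, high) band are reported as UNCERTAIN.
    """
    if not 0.0 <= p_fault <= 1.0:
        raise ValueError("p_fault must be a probability in [0, 1]")
    if not low < high:
        raise ValueError("low must be strictly less than high")
    if p_fault >= high:
        return "fault"
    if p_fault <= low:
        return "normal"
    return UNCERTAIN


if __name__ == "__main__":
    for p in (0.10, 0.50, 0.90):
        print(f"p_fault={p:.2f} -> {triage_verdict(p)}")
```

A band like this only makes sense when the probabilities are calibrated; otherwise the thresholds carry no consistent meaning across clips.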

The real contribution is the cleaning + honest-training recipe, which is reusable on other audio datasets. The modest accuracy here reflects how hard the problem is from crude phone audio (we hit the literature ceiling); the same method reaches 0.93 AUROC on clean engine audio. See docs/DEFENSE.md.

Two pages visualize the first two stages of the pipeline:

  • Isolating the engine audio: an interactive look at the clean() cascade pulling a short mechanical span out of noisy YouTube audio (speech, music, road noise).
  • CLAP, visualized: how the frozen CLAP model turns those spans into the 512-d embedding the linear heads classify.

All results are measured out-of-sample with leakage-safe evaluation (by-video grouped CV over 1,031 video groups; permutation p = 0.0005). These are honest numbers, not a leaderboard.

| Capability | Result | vs. chance |
|---|---|---|
| Is something wrong? (fault/normal) | AUROC 0.79 [0.76, 0.83] | 0.50 |
| Where in the car? (6 zones) | right zone in top-3 ≈ 75% | 2× |
| Which part? (12+ families) | right part in top-3 ≈ 45–65% | 3–4× |
| Knows when it doesn't know | calibrated (ECE ≈ 0.04), returns UNCERTAIN | — |
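For readers unfamiliar with the ECE figure in the last row, here is a minimal sketch of one common binary variant of expected calibration error: bin the predicted fault probabilities, then take the sample-weighted gap between mean prediction and observed fault rate in each bin. The source does not say which ECE variant or bin count cardiag uses, so the equal-width binning and `n_bins=10` here are assumptions.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE: weighted mean |mean predicted prob - observed positive rate|.

    probs  -- predicted probability of the positive class, each in [0, 1]
    labels -- ground truth, 1 for positive and 0 for negative
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins; p == 1.0 is folded into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_p - pos_rate)
    return ece


if __name__ == "__main__":
    # Toy data: five predictions of 0.8, of which four are true faults.
    print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))
```

A value near zero means that predicted probabilities match observed frequencies, which is what makes a confidence-based UNCERTAIN rule meaningful.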

Full details, and the one head we demoted for failing out-of-sample (knock), are in docs/MODEL_CARD.md.
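The "by-video grouped CV" behind these numbers means that every clip cut from the same video stays on the same side of each train/test split, so a model cannot score well by recognising a recording it has already seen. The sketch below shows the idea in plain Python; the function name `grouped_folds` and the greedy balancing strategy are illustrative assumptions, not cardiag's code.

```python
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def grouped_folds(
    group_ids: Sequence[Hashable], n_folds: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs where no group spans both sides."""
    by_group = defaultdict(list)
    for i, g in enumerate(group_ids):
        by_group[g].append(i)
    if n_folds < 2 or n_folds > len(by_group):
        raise ValueError("need 2 <= n_folds <= number of distinct groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest samples.
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    ordered = sorted(by_group, key=lambda g: (-len(by_group[g]), str(g)))
    for g in ordered:
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_group[g])

    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j in range(n_folds) if j != k for i in folds[j])
        yield train, test


if __name__ == "__main__":
    # Hypothetical video IDs, one per clip.
    video_of_clip = ["vidA", "vidA", "vidB", "vidC", "vidC", "vidC", "vidD"]
    for train, test in grouped_folds(video_of_clip, n_folds=3):
        print("train:", train, "test:", test)
```

In practice a library splitter such as scikit-learn's `GroupKFold` gives the same guarantee; the hand-rolled version is shown only to make the constraint explicit.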

A fresh clone is immediately usable. A small pre-trained model ships in models/, and a synthetic demo clip is bundled, so nothing needs to be downloaded or scraped.

git clone <this-repo> && cd car-diagnosis
uv venv && source .venv/bin/activate
uv pip install -e ".[scrape,web,dev,viz]" # Python 3.11
cardiag doctor # preflight: what's installed
cardiag train --fixtures # a working model offline in ~2s (no scrape, no 2 GB download)
cardiag diagnose <clip.wav> # verdict + where-in-the-car + ranked parts
cardiag serve --model models # live web app: drop a clip / paste a link, "explain why"

Verify the whole thing end-to-end in an isolated worktree: bash scripts/clone_verify.sh.

audio ──► clean() cascade ──► CLAP embedding ──► linear heads ──► Diagnosis
          (isolate spans)     (frozen, 512-d)    (fault/region/   (calibrated,
                                                  part/knock)      UNCERTAIN-aware)

There is one segmentation path. Scraped clips, your own recordings (cardiag ingest, any length), and uploads at inference all flow through the same clean() cascade that isolates short mechanical spans. Spans over ~10 s are split into windows so CLAP never silently truncates them. Training and serving share one embedding contract, so there is no train/serve skew.
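As an illustration of the windowing step, the sketch below splits a span into windows of at most 10 s so that no audio is dropped before embedding. The source only says that long spans are split; the function name `split_span`, the non-overlapping default, and the handling of the short tail window are assumptions.

```python
from typing import List, Optional, Tuple


def split_span(
    start_s: float,
    end_s: float,
    max_len_s: float = 10.0,
    hop_s: Optional[float] = None,
) -> List[Tuple[float, float]]:
    """Split [start_s, end_s) into windows no longer than max_len_s.

    hop_s defaults to max_len_s (no overlap); a smaller hop gives
    overlapping windows. The last window is clipped to end_s.
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    if max_len_s <= 0:
        raise ValueError("max_len_s must be positive")
    hop = max_len_s if hop_s is None else hop_s
    if not 0 < hop <= max_len_s:
        raise ValueError("hop_s must be in (0, max_len_s]")

    if end_s - start_s <= max_len_s:
        return [(start_s, end_s)]  # short spans pass through unchanged

    windows = []
    t = start_s
    while t < end_s:
        windows.append((t, min(t + max_len_s, end_s)))
        if t + max_len_s >= end_s:
            break  # this window already reaches the end of the span
        t += hop
    return windows


if __name__ == "__main__":
    print(split_span(0.0, 25.0))
    print(split_span(3.0, 8.0))
```

Each window would then be embedded separately. How the per-window outputs are combined into one diagnosis is not described in the source, so it is left out here.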

cardiag diagnose clip.wav # full model: verdict + region + ranked parts
cardiag triage clip.wav # calibrated engine-vs-running-gear
cardiag clean clip.wav # isolate the mechanical sound (no model needed)
cardiag inspect clip.wav -o r.html # SEE/HEAR the pipeline: spans, spectrograms, scores
cardiag ingest ./my_audio --kind fault --cause wheel_bearing # bring your own audio
cardiag scrape youtube|tiktok # build a corpus (Reddit is deprecated — too noisy)
cardiag train # train on your corpus

Add --json to any inference command for machine-readable output.

  • docs/DEFENSE.md — the honest case that a deliberately crude method earns a real triage result.
  • docs/MODEL_CARD.md — per-head metrics, intended use, limitations.
  • docs/architecture.md — pipeline diagrams.
  • docs/scraping-guide.md — start-to-finish corpus building.

cardiag is valid for social-style or targeted-upload audio (YouTube, TikTok, or a phone clip a user records deliberately). It is not a safety-critical or standalone diagnostic: it is a triage assistant that narrows where to look and is honest about its uncertainty. Model files are joblib artifacts: load only ones you trust.
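The warning about joblib artifacts matters because joblib files are pickle-based, and loading one can execute arbitrary code. One common precaution is to compare a file's SHA-256 digest with a value obtained from a source you trust before loading it. The sketch below uses only the standard library. The helper names are hypothetical, and whether the project publishes checksums for its models is not stated in the source.

```python
import hashlib
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def sha256_of(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: PathLike, expected_sha256: str) -> Path:
    """Return the path if its digest matches, otherwise raise ValueError.

    expected_sha256 must come from a trusted channel; it is an input
    you supply, not something this function can discover.
    """
    actual = sha256_of(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return Path(path)
```

A matching checksum only shows that the file is the one its publisher intended; it does not make an untrusted publisher's model safe to load.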

License: see LICENSE.

Source: Hacker News
