I built an AI receptionist for a mechanic shop

A tech engineer built 'Axle', a custom AI voice agent, to help a luxury mechanic shop stop losing thousands of dollars from missed calls by using RAG and advanced LLMs.

My brother is a luxury mechanic shop owner, and he’s losing thousands of dollars per month because he misses hundreds of calls per week. He’s under the hood all day. The phone rings, he can’t answer, the customer hangs up and calls someone else. That’s a lost job — sometimes a $450 brake service, sometimes a $2,000 engine repair — just gone because no one picked up.

So I’m building him an AI receptionist. I named it Axle — like a car axle — because of course I did. 😏

This isn’t a generic chatbot. It’s a custom-built voice agent that answers his phone, knows his exact prices, his hours, his policies, and can collect a callback when it doesn’t know something. To get this right requires a custom build, so first I scraped his website data, created a product requirements document (PRD), and scoped the project into a 3-part build.

Step 1: Building the Brain (RAG Pipeline)

The first step was making sure the AI could actually answer questions accurately — without hallucinating prices or making things up.

A raw LLM is dangerous here. If a customer asks “how much for brakes?” and the AI guesses $200 when the real answer is $450, that’s a broken expectation and a frustrated customer. The fix is Retrieval-Augmented Generation (RAG): instead of letting the model guess, you give it a knowledge base of real information and make it answer only from that.

Here’s what I did:

Scraped Dane’s website — I pulled his service pages and pricing into markdown files. From there I built a structured knowledge base covering 21+ documents: every service type, pricing, turnaround times, hours, payment methods, cancellation policies, warranty info, loaner vehicles, and what car makes he specializes in.

Embedded the knowledge base into MongoDB Atlas — Each document gets converted into a 1024-dimensional vector using Voyage AI (voyage-3-large). These vectors capture the semantic meaning of each document, not just the keywords. They’re stored in MongoDB Atlas alongside the raw text, with an Atlas Vector Search index on the embedding field.

Built the retrieval pipeline — When a customer asks a question, the query gets embedded using the same Voyage AI model and then run against the Atlas Vector Search index. It returns the top 3 most semantically similar documents — so “how much for a brake job?” correctly retrieves the brake service pricing doc even if those exact words don’t appear together.

Wired up Claude for response generation — The retrieved documents get passed as context to Anthropic Claude (claude-sonnet-3.5) along with a strict system prompt: answer only from the knowledge base, keep responses short and conversational, and if you don’t know — say so and offer to take a message. No hallucinations allowed.

By the end of Part 1, I could type a question in the terminal and get a grounded, accurate answer back. “How much is an oil change?” → “$45 for conventional, $75 for synthetic. Includes oil filter, fluid top-off, and tire pressure check. Takes about 30 minutes.”

Step 2: Connecting It to a Real Phone Number

Next I had to get this brain onto an actual phone line that customers could call.

I chose Vapi as the voice platform. It handles everything on the telephony side: purchasing a phone number, speech-to-text (via Deepgram), text-to-speech (via ElevenLabs), and real-time function calling back to my server. The whole voice infrastructure is handled — I just needed to build the webhook it calls.

Built a FastAPI webhook server — Every time a caller asks a question, Vapi sends a tool-calls request to my /webhook endpoint with the caller’s query. The server routes that to the RAG pipeline, gets a response from Claude, and sends it back to Vapi, which reads it aloud to the caller. The whole round trip has to be fast enough to feel like a natural conversation.

Exposed it with Ngrok — During development, the server runs locally on port 8000. Ngrok punches a tunnel through to a public HTTPS URL, which I paste into the Vapi dashboard as the webhook endpoint. Vapi can now reach my local server in real time as calls come in.

Configured the Vapi assistant — In the Vapi dashboard I set up the assistant with a greeting, wired up two tools (answerQuestion for RAG-backed responses and saveCallback for collecting a name and number when a question can’t be answered), and pointed both at the webhook URL.

Logged every call to MongoDB — Each interaction gets stored in a calls collection: the caller’s number, the query, the AI’s response, and the timestamp. This turns the phone system into a data asset — he can see what customers are asking most, when call volume spikes, and how often the AI hands off to a human.

Step 3: Tuning for Voice

Then finally, the thing that took the most iteration: making it sound right.

Text responses and voice responses are completely different. A response that reads fine on screen — with bullet points, dollar signs formatted as “$45.00” — sounds awful when spoken aloud. I had to tune the system prompt specifically for voice delivery.

Picked the right voice — Vapi integrates with ElevenLabs. I went through about 20 of them and landed on Christopher — calm, natural, unhurried. The kind of voice that sounds like someone who actually knows cars.

Rewrote the system prompt for voice — Short sentences. No markdown. No filler phrases. Prices spoken naturally (“forty-five dollars” instead of “$45”). The goal is to sound like a knowledgeable, friendly human — not a chatbot reading a webpage.

Tested the escalation flow — When a caller asks something that isn’t in the knowledge base, the AI doesn’t guess. It tells the caller it doesn’t have that information, asks for their name and a good callback number, and saves that to MongoDB. Dane gets a list of callbacks to return — no lost leads.

The Stack

Vapi (with Deepgram & ElevenLabs integration) — phone number, speech-to-text, text-to-speech, tool calling
Ngrok — local development tunnel
FastAPI + Uvicorn — webhook server
MongoDB Atlas — knowledge base storage, vector search, call logs
Voyage AI (voyage-3-large) — text embeddings for semantic retrieval
Anthropic Claude (claude-sonnet-3.5) — response generation
Python — everything glued together