I Built an AI Receptionist for a Luxury Mechanic Shop – Part 1

A developer builds 'Axle', a custom AI voice agent using RAG and MongoDB Atlas, to help a luxury mechanic shop recover thousands of dollars in lost revenue from missed customer calls.

My brother is a luxury mechanic shop owner, and he’s losing thousands of dollars per month because he misses hundreds of calls per week. He’s under the hood all day. The phone rings, he can’t answer, the customer hangs up and calls someone else. That’s a lost job — sometimes a $450 brake service, sometimes a $2,000 engine repair — just gone because no one picked up.

So I’m building him an AI receptionist. I named it Axle — like a car axle — because of course I did. 😏

This isn’t a generic chatbot. It’s a custom-built voice agent that answers his phone, knows his exact prices, his hours, his policies, and can collect a callback when it doesn’t know something. To get this right requires a custom build, so first I scraped his website data, created a product requirements document (PRD), and scoped the project into a 3-part build.

Step 1: Building the Brain (RAG Pipeline)

The first step was making sure the AI could actually answer questions accurately — without hallucinating prices or making things up.

A raw LLM is dangerous here. If a customer asks “how much for brakes?” and the AI guesses $200 when the real answer is $450, that’s a broken expectation and a frustrated customer. The fix is Retrieval-Augmented Generation (RAG): instead of letting the model guess, you give it a knowledge base of real information and make it answer only from that.

Here’s what I did:

Scraped Dane’s website — I pulled his service pages and pricing into markdown files. From there I built a structured knowledge base covering 21+ documents: every service type, pricing, turnaround times, hours, payment methods, cancellation policies, warranty info, loaner vehicles, and what car makes he specializes in.

Embedded the knowledge base into MongoDB Atlas — Each document gets converted into a 1024-dimensional vector using Voyage AI (voyage-3-large). These vectors capture the semantic meaning of each document, not just the keywords. They’re stored in MongoDB Atlas alongside the raw text, with an Atlas Vector Search index on the embedding field.

Built the retrieval pipeline — When a customer asks a question, the query gets embedded using the same Voyage AI model and then run against the Atlas Vector Search index. It returns the top 3 most semantically similar documents — so “how much for a brake job?” correctly retrieves the brake service pricing doc even if those exact words don’t appear together.

Wired up Claude for response generation — The retrieved documents get passed as context to Anthropic Claude (claude-sonnet-4-6) along with a strict system prompt: answer only from the knowledge base, keep responses short and conversational, and if you don’t know — say so and offer to take a message. No hallucinations allowed.

By the end of Part 1, I could type a question in the terminal and get a grounded, accurate answer back. “How much is an oil change?” → “$45 for conventional, $75 for synthetic. Includes oil filter, fluid top-off, and tire pressure check. Takes about 30 minutes.”

Step 2: Connecting It to a Real Phone Number

Next I had to get this brain onto an actual phone line that customers could call.

I chose Vapi as the voice platform. It handles everything on the telephony side: purchasing a phone number, speech-to-text (via Deepgram), text-to-speech (via ElevenLabs), and real-time function calling back to my server. The whole voice infrastructure is handled — I just needed to build the webhook it calls.

Built a FastAPI webhook server — Every time a caller asks a question, Vapi sends a tool-calls request to my /webhook endpoint with the caller’s query. The server routes that to the RAG pipeline, gets a response from Claude, and sends it back to Vapi, which reads it aloud to the caller. The whole round trip has to be fast enough to feel like a natural conversation.

Exposed it with Ngrok — During development, the server runs locally on port 8000. Ngrok punches a tunnel through to a public HTTPS URL, which I paste into the Vapi dashboard as the webhook endpoint. Vapi can now reach my local server in real time as calls come in.

Configured the Vapi assistant — In the Vapi dashboard I set up the assistant with a greeting, wired up two tools (answerQuestion for RAG-backed responses and saveCallback for collecting a name and number), and pointed both at the webhook URL.

Logged every call to MongoDB — Each interaction gets stored in a calls collection. Callback requests from unknown questions go into a separate callbacks collection so Dane can follow up. This turns the phone system into a data asset.

Step 3: Tuning for Voice

Then finally, the thing that took the most iteration: making it sound right.

Text responses and voice responses are completely different. A response that reads fine on screen sounds awful when spoken aloud. I had to tune the system prompt specifically for voice delivery.

Picked the right voice — I landed on Christopher — calm, natural, unhurried. The kind of voice that sounds like someone who actually knows cars.

Rewrote the system prompt for voice — Short sentences. No markdown. No filler phrases. Prices spoken naturally (“forty-five dollars” instead of “$45”). Responses capped at 2–4 sentences max.

Tested the escalation flow — When a caller asks something that isn’t in the knowledge base, the AI doesn’t guess. It tells the caller it doesn’t have that information, asks for their name and a good callback number, and saves that to MongoDB.

The Stack

Vapi (with Deepgram & ElevenLabs integration) — phone number, STT, TTS, tool calling
Ngrok — local development tunnel
FastAPI + Uvicorn — webhook server
MongoDB Atlas — knowledge base storage, vector search, call logs, callback queue
Voyage AI (voyage-3-large) — text embeddings
Anthropic Claude (claude-sonnet-4-6) — response generation
Python — everything glued together