Chef's Console: teardown of an AI email-to-booking pipeline
Chef's Console reads a restaurant's inbox and turns customer email into structured bookings. No one retypes anything. The hard part is not reading an email. It is deciding, before the system writes a record, what the email actually is. Which client sent it. Which thread it belongs to. Whether the extraction is good enough to trust. One email is easy. Reconciling it against everything already in the database is the job.
Snapshot
| Role | Multi-tenant SaaS for restaurants that run bulk and group bookings: catering, tour groups, corporate events, galas |
| Core job | Connect a restaurant inbox → AI reads incoming mail → structured Enquiry → Booking + Order |
| Architecture | Three-app monorepo: backend (FastAPI), frontend (Next.js 15 dashboard), admin (Next.js 15 billing panel) |
| Datastore | MongoDB via Beanie 2.0, an async ODM over Pydantic models. 14 collections |
| AI | OpenAI primary → DeepSeek fallback → regex/keyword floor |
| Email | Gmail API (OAuth2), Outlook (MSAL), IMAP. SMTP for auto-reply |
| Hosting | Backend: one containerized instance on AWS, single region. Frontend and admin: Vercel |
| Scale (achieved, not projected) | 10+ restaurants live · ~1,200 emails processed/day · ~20 enquiry→order conversions per restaurant · enquiry→confirm time down ~10× · 24/7 |
| Status | In production. Early product, lots of room above current load |
The problem
Group-booking email is about the worst input you can build on. It arrives as prose. It carries no schema. It spikes hard around events. One message might hold the party size, the date, the adult and child split, dietary notes, and a delivery window. The next holds none of that, buried three replies deep in a forward.
The two obvious fixes both fail:
| Obvious approach | Why it breaks |
|---|---|
| Staff read and transcribe each email | The exact cost you are trying to remove. Slow, error-prone, and it collapses under a spike, right when a big event is being planned |
| Force customers onto a structured web form | Customers ignore it and email you anyway. Now you parse email and maintain a form nobody uses |
Most teams treat this as a parsing problem. It is a reconciliation problem. Pulling fields out of one email is the part a model does well on the first try. The part that quietly makes a mess is matching. Is this the same "Acme Tours" you already have, or a near-duplicate? Is this a new enquiry, or the fourth reply on a thread you already opened? Get it wrong and you mint duplicate clients and orphaned bookings faster than staff can clean them. The tool that was supposed to save time becomes a second mess to reconcile.
The architecture
One decision drives everything else. Keep the slow, bursty, failure-prone work away from the request path the dashboard depends on. Talking to mailboxes and AI providers is slow and unreliable. Serving the dashboard has to stay fast. So the API only ever does quick reads and writes against MongoDB. The email work runs in a background loop. The two meet in exactly one place: the database. Neither can stall the other.
How a customer email becomes a dashboard record without ever touching the request path. The API and the ingestion loop share one process, but they only ever meet at the database.
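A minimal sketch of that shape, not the product's actual code. The names `ingestion_loop`, `fetch_unread_blocking`, and `demo` are hypothetical, and the mailbox call is a stub. It shows the two properties the design leans on: one failed pass never kills the loop, and blocking I/O is pushed to a worker thread so the event loop stays free for requests.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def ingestion_loop(
    poll_once: Callable[[], Awaitable[None]],
    interval_s: float,
    stop: asyncio.Event,
) -> None:
    """Run poll_once repeatedly until stop is set, pausing interval_s between passes."""
    while not stop.is_set():
        try:
            await poll_once()
        except Exception as exc:
            # One bad pass (mailbox down, provider error) must not end ingestion.
            print(f"poll failed: {exc!r}")
        try:
            # Sleep for the interval, but wake immediately if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass


def fetch_unread_blocking() -> list[str]:
    """Hypothetical stand-in for a synchronous mailbox client call."""
    time.sleep(0.01)
    return ["msg-1"]


async def demo(passes: int = 3) -> list[str]:
    """Drive the loop for a few passes with a short interval, then stop it."""
    seen: list[str] = []
    stop = asyncio.Event()

    async def poll_once() -> None:
        # Blocking I/O runs in a worker thread, so the event loop is not stalled.
        seen.extend(await asyncio.to_thread(fetch_unread_blocking))
        if len(seen) >= passes:
            stop.set()

    await ingestion_loop(poll_once, interval_s=0.01, stop=stop)
    return seen
```

In a FastAPI process, a coroutine like this would typically be started with `asyncio.create_task` at application startup and stopped by setting the event at shutdown, with the interval set to 60 seconds.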
The core choices, and what each costs
| Decision | Chose | Over | Why it wins here | What it costs |
|---|---|---|---|---|
| Where the email worker lives | One asyncio loop inside the FastAPI process, polling every 60s | Celery / Arq / SQS worker fleet | Zero extra infrastructure, one deploy artifact, easily fits 1,200 emails/day (~1 every 70s on average) | Single point of failure. Workers can't scale apart from the API. One blocking call on that loop steals from request latency |
| Backend footprint | Single instance | Autoscaled fleet | Sidesteps distributed coordination entirely. With one worker, two processes can never double-fetch the same inbox | Mail→enquiry latency grows under load. Request latency doesn't, as long as the loop stays non-blocking |
| Datastore shape | Document store (MongoDB + Beanie) | Relational (Postgres) | extracted_entities changes shape every time you revise the AI prompt. A schemaless sub-document absorbs that with no migration | No foreign keys. Every cross-collection link is enforced in your code, not the database |
| AI reliability | OpenAI → DeepSeek → regex cascade | OpenAI only | The pipeline never hard-stops because a provider had a bad hour | Fallback quality sits well below GPT. The floor keeps the lights on. It is not where the accuracy number comes from |
| Billing controls | Separate admin app | Role-gated route inside the main app | Smaller blast radius for the controls that flip every tenant's paywall. Ships on its own cadence | One more app to build and authenticate against |
Here is the discipline worth naming. Build the thing that survives the next 10x, not the next 100x. A single async loop steadily clearing 1,200 emails a day tells me more about an engineer's judgment than a message-queue cluster doing the same work for a product that doesn't need one yet.
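The AI reliability row can be sketched as an ordered list of extractors. This is an illustration under assumptions: `extract_with_cascade`, `regex_floor`, and `failing_provider` are hypothetical names, the provider calls are stubbed, and the regex only recognises a party size in simple phrasing. Returning the tier name alongside the result is my addition; it lets you attribute each extraction to the path that produced it.

```python
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[dict]]


def regex_floor(body: str) -> Optional[dict]:
    """Keyword floor: pulls only a party size, and only from simple phrasing."""
    m = re.search(r"\b(\d{1,4})\s*(?:people|guests|pax|persons)\b", body, re.IGNORECASE)
    return {"party_size": int(m.group(1))} if m else None


def extract_with_cascade(body: str, tiers: list[tuple[str, Extractor]]) -> tuple[str, dict]:
    """Try each tier in order and return (tier_name, entities) from the first success."""
    for name, extractor in tiers:
        try:
            result = extractor(body)
        except Exception:
            continue  # provider error or timeout: fall through to the next tier
        if result:
            return name, result
    return "none", {}  # nothing could be extracted by any tier


def failing_provider(body: str) -> Optional[dict]:
    """Stub that simulates a provider outage."""
    raise TimeoutError("provider unavailable")


tiers = [
    ("openai", failing_provider),
    ("deepseek", failing_provider),
    ("regex", regex_floor),
]
tier, entities = extract_with_cascade("Dinner for 45 guests on Friday", tiers)
```

With both providers down, the call falls through to the regex tier and still yields a party size, which is the "keeps the lights on" behaviour the table describes.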
The core loop: one pass over the inbox
Dedup and client-matching are the reconciliation core. The system decides what an email is before it writes anything. Steps 7 to 10 are where duplicates get prevented, or created.
One detail worth flagging: processing is at-least-once. The loop marks mail read after it acts. So a crash mid-handle can re-send or drop a message. Thread dedup, on sender plus normalized subject, is the only idempotency guard right now. It is what stops a retried email from spawning a second enquiry.
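A sketch of what a dedup key on sender plus normalized subject might look like. The normalization rules here (strip leading Re:/Fw:/Fwd: prefixes, collapse whitespace, lowercase) are assumptions, and `normalize_subject` and `thread_key` are hypothetical names, not the product's functions.

```python
import re

# Leading reply/forward markers, possibly stacked: "Re: FW: Fwd: ..."
_PREFIX = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes, collapse whitespace, and lowercase."""
    stripped = _PREFIX.sub("", subject)
    return re.sub(r"\s+", " ", stripped).strip().lower()


def thread_key(sender: str, subject: str) -> tuple[str, str]:
    """Key used to decide whether an email belongs to an already-open thread."""
    return sender.strip().lower(), normalize_subject(subject)


seen: set[tuple[str, str]] = set()


def is_new_thread(sender: str, subject: str) -> bool:
    """Return True the first time a key is seen, False on any retry or reply."""
    key = thread_key(sender, subject)
    if key in seen:
        return False
    seen.add(key)
    return True
```

Because the key is deterministic, reprocessing the same email after a crash maps to the same thread instead of opening a second enquiry.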
Enquiry → Booking + Order conversion
The /convert endpoint resolves date, time, and party size through a cascade of fallbacks. It tries the conversion form first. Then the enquiry's extracted_entities. Then a text parser over the raw body. A missing form field degrades to AI metadata instead of failing the request. Then it computes total = adults × price + children × child_price from the client's default pricing, and creates a Booking and an Order that share the enquiry's booking_reference. There is no multi-row transaction here by default, so atomicity is a compensating rollback: if the second write fails, the first one is deleted. A half-converted enquiry never persists. Edit the event date on the booking and the change cascades back to the enquiry and the order, so the three never drift apart.
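The conversion steps can be sketched with an in-memory stand-in for the two collections. Everything named here is hypothetical (`first_present`, `Store`, `convert`, the `BR-1` reference format), and the real endpoint writes to MongoDB rather than dicts. The sketch covers the three mechanics described above: the source cascade, the total, and the compensating rollback.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


def first_present(key: str, *sources: Optional[dict]) -> Any:
    """Return the first non-None value for key, checking sources in priority order."""
    for source in sources:
        if source and source.get(key) is not None:
            return source[key]
    return None


@dataclass
class Store:
    """In-memory stand-in for the bookings and orders collections."""
    bookings: dict = field(default_factory=dict)
    orders: dict = field(default_factory=dict)
    fail_order_write: bool = False  # hook to simulate the second write failing

    def insert_order(self, ref: str, doc: dict) -> None:
        if self.fail_order_write:
            raise RuntimeError("order write failed")
        self.orders[ref] = doc


def convert(store: Store, ref: str, form: Optional[dict], entities: Optional[dict],
            parsed: Optional[dict], pricing: dict) -> float:
    """Resolve fields (form, then extracted entities, then text parse), then write both records."""
    adults = first_present("adults", form, entities, parsed)
    children = first_present("children", form, entities, parsed) or 0
    if adults is None:
        raise ValueError("party size could not be resolved from any source")

    # total = adults x price + children x child_price
    total = adults * pricing["price"] + children * pricing["child_price"]

    store.bookings[ref] = {"booking_reference": ref, "adults": adults, "children": children}
    try:
        store.insert_order(ref, {"booking_reference": ref, "total": total})
    except Exception:
        # Compensating rollback: undo the first write so no half-converted state persists.
        store.bookings.pop(ref, None)
        raise
    return total
```

A form that leaves `adults` empty falls back to the extracted entities, and a failed order write leaves both collections empty for that reference.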
Data model
The schema is the backbone. It is built around one idea: a linked chain joined by a shared booking_reference string, scoped to a tenant by restaurant_id.
The 14 collections, centered on the booking_reference chain. MongoDB enforces none of these relationships. Your code does. That is the schema's biggest risk and its biggest source of flexibility at once.
Schema decisions that hold up
| Decision | Consequence in practice |
|---|---|
| booking_reference as the cross-collection join key | Enquiry, Booking, and Order stay linked without a relational join. The catch: integrity is application-enforced. Every write path has to set it correctly, or records orphan in silence |
| extracted_entities as a free-form sub-document on Enquiry | This is where the document store earns its keep. The AI's output shape can change with every prompt revision, and nothing migrates |
| RestaurantMember embedded in Restaurant, not its own collection | A permission check becomes a single-document read on the hot path. Fast. Safe too, because members per restaurant is small and bounded, so the document never grows out of control |
| restaurant_id stamped on every tenant-scoped collection | Multi-tenancy is logical, not physical. One database, filtered per request. Cheap and simple. The whole wall is the discipline of always applying the filter |
| Separate PasswordResetToken / EmailVerificationToken collections | Short-lived auth artifacts, kept out of the User document. A natural fit for TTL expiry |
| Status as a constrained string lifecycle (new → read → replied → approved → rejected → closed) | You read the enquiry's state straight off the data, instead of inferring it from scattered booleans |
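Since the database enforces none of these links, a periodic integrity check is one way to catch a record that has orphaned in silence. The sketch below is an assumption about how such a check could look, not something the source describes. `find_orphans` is a hypothetical helper, and plain dicts stand in for documents. It relies only on the stated rule that a conversion creates a Booking and an Order sharing the enquiry's booking_reference within one restaurant_id.

```python
from typing import Iterable

Key = tuple[str, str]  # (restaurant_id, booking_reference)


def _keys(docs: Iterable[dict]) -> set[Key]:
    """Collect the tenant-scoped reference of every document."""
    return {(d["restaurant_id"], d["booking_reference"]) for d in docs}


def find_orphans(enquiries: Iterable[dict], bookings: Iterable[dict],
                 orders: Iterable[dict]) -> dict[str, set[Key]]:
    """Report references that break the Enquiry -> Booking + Order chain."""
    e, b, o = _keys(enquiries), _keys(bookings), _keys(orders)
    return {
        "booking_without_enquiry": b - e,
        "booking_without_order": b - o,
        "order_without_booking": o - b,
    }


enquiries = [
    {"restaurant_id": "r1", "booking_reference": "BR-1"},
    {"restaurant_id": "r1", "booking_reference": "BR-2"},  # not converted yet: fine
]
bookings = [
    {"restaurant_id": "r1", "booking_reference": "BR-1"},
    {"restaurant_id": "r1", "booking_reference": "BR-9"},  # no enquiry, no order
]
orders = [{"restaurant_id": "r1", "booking_reference": "BR-1"}]

report = find_orphans(enquiries, bookings, orders)
```

An unconverted enquiry is not flagged, because an enquiry without a booking is a normal state. Only a booking or order missing its counterpart is.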
Infrastructure and operations
Build and deploy path. The frontend ships on its own through Vercel. The backend ships as one Docker image to one AWS instance.
| Concern | Posture | Where it bites, and how the design absorbs it |
|---|---|---|
| Scaling | Single backend instance | At ~1,200/day, a 60s poll and sequential AI calls have huge headroom. Volume doesn't touch correctness. Latency degrades first, and a few seconds of mail→enquiry lag is invisible in an automation like this |
| Availability | 24/7. Restart on failure. 10-min stuck-job sweep | The in-process loop is a single point of failure. If the process dies, ingestion pauses, but the dashboard keeps serving from MongoDB. The stuck-job sweep recovers hung jobs on its own |
| Multi-tenancy | Logical isolation via restaurant_id + verify_restaurant_access() on every scoped route | The risk is never a breached wall. It is a forgotten filter. One query that misses its tenant scope is the whole failure mode. That is why the access check sits in one place instead of being rewritten per endpoint |
| AuthN/Z | JWT access + refresh (python-jose), auto-refresh at 25 min and on 401. bcrypt password hashing. RBAC across owner / manager / staff with a granular Permission enum | Google sign-in (account auth) stays deliberately separate from email-provider OAuth (inbox connection). Different scopes, different blast radius |
| CI/CD | GitHub Actions → Docker → AWS. Vercel for the frontend | Backend and frontend deploy on their own cadences. A UI change can't take down the API, or the other way around |
| Secrets | Per-tenant email OAuth tokens / app-passwords stored in MongoDB | Credentials live with the tenant record they belong to. Scoped per integration. Fine at this scale |
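The "forgotten filter" risk in the multi-tenancy row is easier to contain if queries cannot be built without a tenant scope. A minimal sketch, assuming filters are plain MongoDB-style dicts: `scoped_filter` and `TenantScopeError` are hypothetical names and are not the product's verify_restaurant_access().

```python
from typing import Optional


class TenantScopeError(Exception):
    """Raised when a query would run without, or outside, its tenant scope."""


def scoped_filter(restaurant_id: str, extra: Optional[dict] = None) -> dict:
    """Build a query filter that always carries the caller's restaurant_id."""
    if not restaurant_id:
        raise TenantScopeError("restaurant_id is required for tenant-scoped queries")
    extra = dict(extra or {})
    # Refuse any attempt to smuggle a different tenant in through the extra filter.
    if "restaurant_id" in extra and extra["restaurant_id"] != restaurant_id:
        raise TenantScopeError("filter tried to override the tenant scope")
    return {**extra, "restaurant_id": restaurant_id}


query = scoped_filter("r1", {"status": "new"})
```

Routing every tenant-scoped read through one helper like this follows the same reasoning as keeping the access check in one place: the filter is applied by construction, not by each endpoint remembering it.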
Outcome
Every number below is measured on current production load, not projected.
| Metric | Result |
|---|---|
| Restaurants live | 10+ |
| Emails processed | ~1,200 / day |
| Enquiry→order conversions | ~20 per restaurant |
| Enquiry → confirmation time | down ~10× |
| Uptime | 24/7 |
| Extraction accuracy | 100% on booking emails processed so far |
That 100% needs a denominator, or no engineer will believe it. Here is the honest version. Across every booking email handled so far, none has needed a manual correction. That is the OpenAI path working on a narrow, high-signal domain: group-dining enquiries, from senders whose emails tend to look the same. Measured over a production volume in the thousands, not the millions. The DeepSeek and regex fallback did not produce that number. It exists so the pipeline survives a provider outage. Read it that way and 100% is a fair early result, not a pitch. The ceiling sits well above the load that produced it.
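To make the denominator point concrete: with zero corrections in n emails, the error rate can only be bounded, not proven to be zero. The sketch below computes the standard one-sided upper bound for zero observed failures, which is close to the "rule of three" (3/n at 95%). The value n = 3000 is purely illustrative and is not a figure from the production data.

```python
def error_rate_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Upper bound on the true error rate after n trials with zero failures.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# Illustrative only: a hypothetical sample of 3,000 emails with no corrections.
bound = error_rate_upper_bound(3000)
rule_of_three = 3 / 3000
```

For a sample in the low thousands the bound is on the order of one error per thousand emails, which is the sense in which 100% is a fair early result and not a guarantee.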
What I'd watch
Four things, in rough order of when they'll matter:
- Linkage integrity is the soft spot. booking_reference is enforced in code across three collections, and Mongo won't catch a mistake. Route every create-and-link through the one conversion service. The second write path that forgets the reference is where your orphaned bookings start.
- Matching is heuristic, and heuristics drift. Threads dedup on sender plus normalized subject. Clients auto-match or auto-create. Two ways that bites: a customer edits the subject line and one conversation forks into two threads, or two unrelated enquiries from the same sender share a similar subject and collapse into one. Both are the exact reconciliation mess the system exists to prevent. Before tenant count climbs, add a confidence threshold that sends low-certainty matches to a human queue instead of creating records in silence.
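A sketch of what the suggested confidence threshold could look like for client matching. The scoring method (difflib's SequenceMatcher on lowercased names), the two thresholds, and the names `similarity` and `route_client_match` are all assumptions chosen for illustration. A production matcher would likely weigh more signals than the name alone.

```python
from difflib import SequenceMatcher
from typing import Optional


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] between two client names."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def route_client_match(
    name: str,
    known_clients: list[str],
    auto_threshold: float = 0.92,    # assumed: at or above this, link automatically
    review_threshold: float = 0.75,  # assumed: between the two, ask a human
) -> tuple[str, Optional[str]]:
    """Return ("match" | "review" | "create", best_candidate_or_None)."""
    best_name, best_score = None, 0.0
    for candidate in known_clients:
        score = similarity(name, candidate)
        if score > best_score:
            best_name, best_score = candidate, score

    if best_score >= auto_threshold:
        return "match", best_name
    if best_score >= review_threshold:
        return "review", best_name  # low certainty: send to the human queue
    return "create", None


known = ["Acme Tours", "Harbour Events"]
exact = route_client_match("acme tours", known)
near = route_client_match("Acme Tours Ltd", known)
unrelated = route_client_match("Blue Lagoon Weddings", known)
```

The middle band is the point of the design: a near-duplicate such as "Acme Tours Ltd" is neither merged nor created silently, it is handed to a person.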