// LIVE: chefsconsole.com

Chef's Console: teardown of an AI email-to-booking pipeline

Chef's Console reads a restaurant's inbox and turns customer email into structured bookings. No one retypes anything. The hard part is not reading an email. It is deciding, before the system writes a record, what the email actually is. Which client sent it. Which thread it belongs to. Whether the extraction is good enough to trust. One email is easy. Reconciling it against everything already in the database is the job.

Snapshot


Role	Multi-tenant SaaS for restaurants that run bulk and group bookings: catering, tour groups, corporate events, galas
Core job	Connect a restaurant inbox → AI reads incoming mail → structured Enquiry → Booking + Order
Architecture	Three-app monorepo: `backend` (FastAPI), `frontend` (Next.js 15 dashboard), `admin` (Next.js 15 billing panel)
Datastore	MongoDB via Beanie 2.0, an async ODM over Pydantic models. 14 collections
AI	OpenAI primary → DeepSeek fallback → regex/keyword floor
Email	Gmail API (OAuth2), Outlook (MSAL), IMAP. SMTP for auto-reply
Hosting	Backend: one containerized instance on AWS, single region. Frontend and admin: Vercel
Scale (achieved, not projected)	10+ restaurants live · ~1,200 emails processed/day · ~20 enquiry→order conversions per restaurant · enquiry→confirm time down ~10× · 24/7
Status	In production. Early product, lots of room above current load

The problem

Group-booking email is about the worst input you can build on. It arrives as prose. It carries no schema. It spikes hard around events. One message might hold the party size, the date, the adult and child split, dietary notes, and a delivery window. The next holds none of that, buried three replies deep in a forward.

The two obvious fixes both fail:

Obvious approach	Why it breaks
Staff read and transcribe each email	The exact cost you are trying to remove. Slow, error-prone, and it collapses under a spike, right when a big event is being planned
Force customers onto a structured web form	Customers ignore it and email you anyway. Now you parse email and maintain a form nobody uses

Most teams treat this as a parsing problem. It is a reconciliation problem. Pulling fields out of one email is the part a model does well on the first try. The part that quietly makes a mess is matching. Is this the same "Acme Tours" you already have, or a near-duplicate? Is this a new enquiry, or the fourth reply on a thread you already opened? Get it wrong and you mint duplicate clients and orphaned bookings faster than staff can clean them. The tool that was supposed to save time becomes a second mess to reconcile.

The architecture

One decision drives everything else. Keep the slow, bursty, failure-prone work away from the request path the dashboard depends on. Talking to mailboxes and AI providers is slow and unreliable. Serving the dashboard has to stay fast. So the API only ever does quick reads and writes against MongoDB. The email work runs in a background loop. The two meet in exactly one place: the database. Neither can stall the other.

How a customer email becomes a dashboard record without ever touching the request path. The API and the ingestion loop share one process, but they only ever meet at the database.
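
A minimal sketch of that in-process loop, using only the standard library. The 60-second interval comes from the design; the names `ingestion_loop`, `poll_once`, and `fetch_unread_blocking` are hypothetical stand-ins, not the product's actual code. It shows the two properties the design leans on: a failed pass does not kill the loop, and blocking mailbox I/O is pushed to a worker thread so it cannot steal from request latency.

```python
import asyncio
from typing import Awaitable, Callable

POLL_INTERVAL_SECONDS = 60  # from the article; every name below is hypothetical


def fetch_unread_blocking() -> list:
    """Stand-in for a synchronous mailbox client call (for example an IMAP fetch)."""
    return ["enquiry for 40 guests"]


async def poll_once() -> int:
    # Blocking I/O runs in a worker thread so the event loop stays free for requests.
    messages = await asyncio.to_thread(fetch_unread_blocking)
    return len(messages)


async def ingestion_loop(
    poll: Callable[[], Awaitable[int]],
    stop: asyncio.Event,
    interval: float = POLL_INTERVAL_SECONDS,
) -> int:
    """Call poll repeatedly until stop is set; return the total emails handled."""
    handled = 0
    while not stop.is_set():
        try:
            handled += await poll()
        except Exception as exc:
            # One bad pass (provider timeout, mailbox error) must not end ingestion.
            print(f"poll failed, retrying next interval: {exc!r}")
        try:
            # Sleep for the interval, but wake immediately on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
    return handled
```

In a FastAPI app, a loop like this would typically be started with `asyncio.create_task` from the application's lifespan handler and stopped by setting the event on shutdown. That wiring is an assumption about how such a loop is hosted, not something the source confirms.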

The core choices, and what each costs

Decision	Chose	Over	Why it wins here	What it costs
Where the email worker lives	One asyncio loop inside the FastAPI process, polling every 60s	Celery / Arq / SQS worker fleet	Zero extra infrastructure, one deploy artifact, easily fits 1,200 emails/day (~1 every 70s on average)	Single point of failure. Workers can't scale apart from the API. One blocking call on that loop steals from request latency
Backend footprint	Single instance	Autoscaled fleet	Sidesteps distributed coordination entirely. With one worker, two processes can never double-fetch the same inbox	Mail→enquiry latency grows under load. Request latency doesn't, as long as the loop stays non-blocking
Datastore shape	Document store (MongoDB + Beanie)	Relational (Postgres)	`extracted_entities` changes shape every time you revise the AI prompt. A schemaless sub-document absorbs that with no migration	No foreign keys. Every cross-collection link is enforced in your code, not the database
AI reliability	OpenAI → DeepSeek → regex cascade	OpenAI only	The pipeline never hard-stops because a provider had a bad hour	Fallback quality sits well below GPT. The floor keeps the lights on. It is not where the accuracy number comes from
Billing controls	Separate `admin` app	Role-gated route inside the main app	Smaller blast radius for the controls that flip every tenant's paywall. Ships on its own cadence	One more app to build and authenticate against
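
The AI reliability row can be sketched as an ordered list of extractors with a regex floor at the end. This is an illustration under assumptions: `extract_with_fallback`, `regex_floor`, and the provider callables are hypothetical names, and the regex only pulls a party size. Real provider clients would replace the callables.

```python
import re
from typing import Callable

Extractor = Callable[[str], dict]


def regex_floor(body: str) -> dict:
    """Last-resort extraction: only finds a party size such as '40 guests'."""
    match = re.search(r"(\d+)\s*(?:guests|people|pax)", body, re.IGNORECASE)
    return {"party_size": int(match.group(1))} if match else {}


def extract_with_fallback(body: str, providers: list) -> dict:
    """Try each (name, extractor) pair in order; fall through on any exception."""
    for name, extractor in providers:
        try:
            return {"source": name, "entities": extractor(body)}
        except Exception as exc:
            print(f"{name} failed ({exc!r}), falling back")
    # Every provider failed: degrade to the keyword floor instead of stopping.
    return {"source": "regex", "entities": regex_floor(body)}
```

Recording `source` on each result is a design choice worth keeping in any variant: it lets you say which path produced a given extraction, which matters when the accuracy figure is attributed to the primary provider only.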

Here is the discipline worth naming. Build the thing that survives the next 10x, not the next 100x. A single async loop steadily clearing 1,200 emails a day says more about an engineer's judgment than a message-queue cluster doing the same work for a product that doesn't need one yet.

The core loop: one pass over the inbox

Dedup and client-matching are the reconciliation core. The system decides what an email is before it writes anything. Steps 7 to 10 are where duplicates get prevented, or created.

One detail worth flagging: processing is at-least-once. The loop marks mail read only after it acts, so a crash mid-handle leaves the message unread and the next pass processes it again. The message is not lost, but any side effect that already ran, such as an auto-reply, can repeat. Thread dedup, on sender plus normalized subject, is the only idempotency guard right now. It is what stops a retried email from spawning a second enquiry.
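
A dedup key on sender plus normalized subject might look like the sketch below. The prefix list, the hashing, and the inclusion of `restaurant_id` in the key are assumptions for illustration; the source only states that threads dedup on sender and normalized subject.

```python
import hashlib
import re

# Reply/forward prefixes to strip. The exact list is an assumption.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Drop leading Re:/Fwd: chains, collapse whitespace, lowercase."""
    stripped = _PREFIX.sub("", subject)
    return re.sub(r"\s+", " ", stripped).strip().lower()


def thread_key(restaurant_id: str, sender: str, subject: str) -> str:
    """Stable key for 'same conversation'; scoped per tenant (an assumption)."""
    raw = f"{restaurant_id}|{sender.strip().lower()}|{normalize_subject(subject)}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

A retried message produces the same key as its first delivery, so a lookup on the key before creating an enquiry is enough to make the retry a no-op. The same property is also the weakness: any edit to the subject beyond a reply prefix produces a different key.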

Enquiry → Booking + Order conversion

The `/convert` endpoint resolves date, time, and party size through a cascade of fallbacks. It tries the conversion form first, then the enquiry's `extracted_entities`, then a text parser over the raw body. A missing form field degrades to AI metadata instead of failing the request. It then computes `total = adults × price + children × child_price` from the client's default pricing, and creates a Booking and an Order that share the enquiry's `booking_reference`.

There is no multi-document transaction here by default, so atomicity is a compensating rollback: if the second write fails, the first one is deleted. A half-converted enquiry never persists. Edit the event date on the booking and the change cascades back to the enquiry and the order, so the three never drift apart.
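
The three moving parts of the conversion (field fallback, total, compensating rollback) can be sketched as plain functions. All names here are hypothetical, and the storage callables stand in for whatever the real service uses to write and delete records.

```python
from typing import Any, Callable, Optional


def first_present(field: str, *sources: dict) -> Optional[Any]:
    """Return the first non-empty value for field, checking sources in priority order."""
    for source in sources:
        value = source.get(field)
        if value not in (None, ""):
            return value
    return None


def compute_total(adults: int, children: int, price: float, child_price: float) -> float:
    """total = adults x price + children x child_price."""
    return adults * price + children * child_price


def convert(
    create_booking: Callable[[dict], str],
    create_order: Callable[[dict], str],
    delete_booking: Callable[[str], None],
    payload: dict,
) -> tuple:
    """Two writes with a compensating rollback: undo the first if the second fails."""
    booking_id = create_booking(payload)
    try:
        order_id = create_order(payload)
    except Exception:
        delete_booking(booking_id)  # compensate so no half-converted state persists
        raise
    return booking_id, order_id
```

For example, `first_present("date", form, entities, parsed)` returns the form's date when it is filled in and falls back to the extracted entities, then the parsed body, when it is not. Note that a compensating delete is weaker than a transaction: if the process dies between the failed second write and the delete, the first record survives, so a periodic consistency check is still worth having.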

Data model

The schema is the backbone. It is built around one idea: a linked chain joined by a shared `booking_reference` string, scoped to a tenant by `restaurant_id`.

The 14 collections, centered on the `booking_reference` chain. MongoDB enforces none of these relationships. Your code does. That is the schema's biggest risk and its biggest source of flexibility at once.

Schema decisions that hold up

Decision	Consequence in practice
`booking_reference` as the cross-collection join key	Enquiry, Booking, and Order stay linked without a relational join. The catch: integrity is application-enforced. Every write path has to set it correctly, or records orphan in silence
`extracted_entities` as a free-form sub-document on Enquiry	This is where the document store earns its keep. The AI's output shape can change with every prompt revision, and nothing migrates
`RestaurantMember` embedded in Restaurant, not its own collection	A permission check becomes a single-document read on the hot path. Fast. Safe too, because members per restaurant is small and bounded, so the document never grows out of control
`restaurant_id` stamped on every tenant-scoped collection	Multi-tenancy is logical, not physical. One database, filtered per request. Cheap and simple. The whole wall is the discipline of always applying the filter
Separate `PasswordResetToken` / `EmailVerificationToken` collections	Short-lived auth artifacts, kept out of the User document. A natural fit for TTL expiry
Status as a constrained string lifecycle (`new → read → replied → approved → rejected → closed`)	You read the enquiry's state straight off the data, instead of inferring it from scattered booleans
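
Because the `booking_reference` link is enforced only in application code, a periodic consistency check is a natural companion to the schema. The sketch below is an assumption-laden illustration: `Record` and `find_orphans` are hypothetical, the records are plain in-memory objects, not Beanie documents, and it assumes every Booking and Order should trace back to an Enquiry in the same tenant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str                # "enquiry", "booking", or "order"
    restaurant_id: str       # tenant scope
    booking_reference: str   # shared join key across the three collections


def find_orphans(records: list) -> list:
    """Return bookings and orders whose (tenant, reference) has no matching enquiry."""
    enquiries = {
        (r.restaurant_id, r.booking_reference)
        for r in records
        if r.kind == "enquiry"
    }
    return [
        r
        for r in records
        if r.kind != "enquiry"
        and (r.restaurant_id, r.booking_reference) not in enquiries
    ]
```

Matching on the pair of tenant and reference, not the reference alone, means a reference that happens to collide across two restaurants does not hide an orphan.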

Infrastructure and operations

Build and deploy path. The frontend ships on its own through Vercel. The backend ships as one Docker image to one AWS instance.

Concern	Posture	Where it bites, and how the design absorbs it
Scaling	Single backend instance	At ~1,200/day, a 60s poll and sequential AI calls have huge headroom. Volume doesn't touch correctness. Latency degrades first, and a few seconds of mail→enquiry lag is invisible in an automation like this
Availability	24/7. Restart on failure. 10-min stuck-job sweep	The in-process loop is a single point of failure. If the process dies, ingestion pauses, but the dashboard keeps serving from MongoDB. The stuck-job sweep recovers hung jobs on its own
Multi-tenancy	Logical isolation via `restaurant_id` + `verify_restaurant_access()` on every scoped route	The risk is never a breached wall. It is a forgotten filter. One query that misses its tenant scope is the whole failure mode. That is why the access check sits in one place instead of being rewritten per endpoint
AuthN/Z	JWT access + refresh (`python-jose`), auto-refresh at 25 min and on 401. bcrypt password hashing. RBAC across owner / manager / staff with a granular `Permission` enum	Google sign-in (account auth) stays deliberately separate from email-provider OAuth (inbox connection). Different scopes, different blast radius
CI/CD	GitHub Actions → Docker → AWS. Vercel for the frontend	Backend and frontend deploy on their own cadences. A UI change can't take down the API, or the other way around
Secrets	Per-tenant email OAuth tokens / app-passwords stored in MongoDB	Credentials live with the tenant record they belong to. Scoped per integration. Fine at this scale
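
The multi-tenancy row turns on never issuing a query without its tenant filter. One way to make a forgotten filter fail loudly is to build every filter document through a single helper. This is a hypothetical sketch: `scoped_filter` and `MissingTenantScope` are illustrative names, and it is not the source's `verify_restaurant_access()`, whose implementation is not shown.

```python
from typing import Optional


class MissingTenantScope(ValueError):
    """Raised when a query would run without, or against, the caller's tenant scope."""


def scoped_filter(restaurant_id: str, extra: Optional[dict] = None) -> dict:
    """Build a MongoDB-style filter document that always carries restaurant_id."""
    if not restaurant_id:
        raise MissingTenantScope("refusing to build an unscoped query")
    query = dict(extra or {})  # copy so the caller's dict is not mutated
    if query.get("restaurant_id", restaurant_id) != restaurant_id:
        raise MissingTenantScope("filter targets a different tenant")
    query["restaurant_id"] = restaurant_id
    return query
```

For example, `scoped_filter("r1", {"status": "new"})` yields a filter containing both the status and the tenant id, while an empty tenant id raises before any query is sent.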

Outcome

Every number below is measured on current production load, not projected.

Metric	Result
Restaurants live	10+
Emails processed	~1,200 / day
Enquiry→order conversions	~20 per restaurant
Enquiry → confirmation time	down ~10×
Uptime	24/7
Extraction accuracy	100% on booking emails processed so far

That 100% needs a denominator, or no engineer will believe it. Here is the honest version. Across every booking email handled so far, none has needed a manual correction. That is the OpenAI path working on a narrow, high-signal domain: group-dining enquiries, from senders whose emails tend to look the same. Measured over a production volume in the thousands, not the millions. The DeepSeek and regex fallback did not produce that number. It exists so the pipeline survives a provider outage. Read it that way and 100% is a fair early result, not a pitch. The ceiling sits well above the load that produced it.

What I'd watch

Four things, in rough order of when they'll matter:

Linkage integrity is the soft spot. `booking_reference` is enforced in code across three collections, and Mongo won't catch a mistake. Route every create-and-link through the one conversion service. The second write path that forgets the reference is where your orphaned bookings start.
Matching is heuristic, and heuristics drift. Threads dedup on sender plus normalized subject. Clients auto-match or auto-create. Two ways that bites: a customer edits the subject line and one conversation forks into two threads, or two unrelated enquiries from the same sender share a similar subject and collapse into one. Both are the exact reconciliation mess the system exists to prevent. Before tenant count climbs, add a confidence threshold that sends low-certainty matches to a human queue instead of creating records in silence.
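
The confidence threshold suggested above could be sketched with the standard library's `difflib`. Everything specific here is an assumption: the function names, the two threshold values, and the use of `SequenceMatcher` as the similarity measure. Real thresholds would need tuning against actual client names.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

AUTO_MATCH = 0.92  # hypothetical: at or above this, link to the existing client
REVIEW = 0.75      # hypothetical: between REVIEW and AUTO_MATCH, ask a human


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def route_client_match(name: str, known: list) -> Tuple[str, Optional[str]]:
    """Return ('match' | 'review' | 'create', best existing client name or None)."""
    best = max(known, key=lambda k: similarity(name, k), default=None)
    if best is None:
        return ("create", None)
    score = similarity(name, best)
    if score >= AUTO_MATCH:
        return ("match", best)
    if score >= REVIEW:
        return ("review", best)  # low-certainty: send to the human queue
    return ("create", None)
```

With these thresholds, "Acme Tours" against an existing "acme tours" auto-matches, while "Acme Tours" against "Acme Tours Ltd" lands in the review queue instead of silently creating or merging a client.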