V.Trivedy_

Chef's Console: teardown of an AI email-to-booking pipeline

Chef's Console reads a restaurant's inbox and turns customer email into structured bookings. No one retypes anything. The hard part is not reading an email. It is deciding, before the system writes a record, what the email actually is. Which client sent it. Which thread it belongs to. Whether the extraction is good enough to trust. One email is easy. Reconciling it against everything already in the database is the job.


Snapshot

Role | Multi-tenant SaaS for restaurants that run bulk and group bookings: catering, tour groups, corporate events, galas
Core job | Connect a restaurant inbox → AI reads incoming mail → structured Enquiry → Booking + Order
Architecture | Three-app monorepo: backend (FastAPI), frontend (Next.js 15 dashboard), admin (Next.js 15 billing panel)
Datastore | MongoDB via Beanie 2.0, an async ODM over Pydantic models. 14 collections
AI | OpenAI primary → DeepSeek fallback → regex/keyword floor
Email | Gmail API (OAuth2), Outlook (MSAL), IMAP. SMTP for auto-reply
Hosting | Backend: one containerized instance on AWS, single region. Frontend and admin: Vercel
Scale (achieved, not projected) | 10+ restaurants live · ~1,200 emails processed/day · ~20 enquiry→order conversions per restaurant · enquiry→confirm time down ~10× · 24/7
Status | In production. Early product, lots of room above current load

The problem

Group-booking email is about the worst input you can build on. It arrives as prose. It carries no schema. It spikes hard around events. One message might hold the party size, the date, the adult and child split, dietary notes, and a delivery window. The next holds none of that, buried three replies deep in a forward.

The two obvious fixes both fail:

Obvious approach | Why it breaks
Staff read and transcribe each email | The exact cost you are trying to remove. Slow, error-prone, and it collapses under a spike, right when a big event is being planned
Force customers onto a structured web form | Customers ignore it and email you anyway. Now you parse email and maintain a form nobody uses

Most teams treat this as a parsing problem. It is a reconciliation problem. Pulling fields out of one email is the part a model does well on the first try. The part that quietly makes a mess is matching. Is this the same "Acme Tours" you already have, or a near-duplicate? Is this a new enquiry, or the fourth reply on a thread you already opened? Get it wrong and you mint duplicate clients and orphaned bookings faster than staff can clean them. The tool that was supposed to save time becomes a second mess to reconcile.
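That matching decision can be sketched as a three-way split: reuse an existing client, send the pair to a person, or create a new one. This is a minimal illustration, not Chef's Console's matcher. It assumes a plain string-similarity score from Python's standard difflib. The normalize_name and classify_client helpers, the suffix list, and both thresholds are hypothetical values chosen for the example.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple

# Hypothetical thresholds, not values from the product.
AUTO_MATCH = 0.92  # at or above: treat as the same client
REVIEW = 0.75      # between REVIEW and AUTO_MATCH: ask a human

# Common company suffixes that should not make two names look different.
_SUFFIXES = re.compile(r"\b(?:ltd|limited|inc|llc|co|company)\b\.?")


def normalize_name(name: str) -> str:
    """Lowercase, drop common company suffixes and punctuation, squeeze spaces."""
    name = _SUFFIXES.sub(" ", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def classify_client(
    incoming: str, existing: Iterable[str]
) -> Tuple[str, Optional[str], float]:
    """Return (decision, best_existing_name, score); decision is match, review, or create."""
    target = normalize_name(incoming)
    best_name: Optional[str] = None
    best_score = 0.0
    for name in existing:
        score = SequenceMatcher(None, target, normalize_name(name)).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= AUTO_MATCH:
        return "match", best_name, best_score
    if best_score >= REVIEW:
        return "review", best_name, best_score
    return "create", None, best_score
```

Take the existing clients "Acme Tours" and "Bella Catering". An incoming "ACME Tours Ltd." normalizes to "acme tours" and matches outright. "Acme Touring" scores about 0.82 against "Acme Tours" and lands in the review band. The middle band exists because an uncertain match is the expensive case, and it should not silently mint or merge a client.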


The architecture

One decision drives everything else. Keep the slow, bursty, failure-prone work away from the request path the dashboard depends on. Talking to mailboxes and AI providers is slow and unreliable. Serving the dashboard has to stay fast. So the API only ever does quick reads and writes against MongoDB. The email work runs in a background loop. The two meet in exactly one place: the database. Neither can stall the other.

How a customer email becomes a dashboard record without ever touching the request path. The API and the ingestion loop share one process, but they only ever meet at the database.
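The boundary described above can be sketched with nothing but asyncio. A background coroutine polls on an interval, keeps blocking I/O off the shared event loop, and survives a failed pass. This is a hedged illustration, not the product's code. poll_inbox_forever, fetch_unread, and handle are hypothetical names, and the blocking fetch stands in for whatever mailbox client is actually used.

```python
import asyncio
from typing import Awaitable, Callable, List


async def poll_inbox_forever(
    fetch_unread: Callable[[], Awaitable[List[str]]],
    handle: Callable[[str], Awaitable[None]],
    stop: asyncio.Event,
    interval_s: float = 60.0,
) -> None:
    """Run one ingestion pass per interval until `stop` is set."""
    while not stop.is_set():
        try:
            for message in await fetch_unread():
                await handle(message)
        except Exception as exc:
            # A failed pass is logged and retried next interval; it must not kill the loop.
            print(f"ingestion pass failed: {exc!r}")
        try:
            # Sleep until the next pass, waking early if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass


async def demo() -> List[str]:
    """Drive three passes against a fake inbox and record what was handled."""
    batches = [["email-1", "email-2"], [], ["email-3"]]
    handled: List[str] = []
    stop = asyncio.Event()

    def blocking_fetch() -> List[str]:
        # Stand-in for a synchronous mailbox call, such as IMAP or a vendor SDK.
        return batches.pop(0) if batches else []

    async def fetch_unread() -> List[str]:
        # Run the blocking call in a worker thread so the shared event loop stays free.
        return await asyncio.to_thread(blocking_fetch)

    async def handle(message: str) -> None:
        handled.append(message)
        if message == "email-3":
            stop.set()

    await poll_inbox_forever(fetch_unread, handle, stop, interval_s=0.01)
    return handled
```

In a FastAPI app, a coroutine like this would typically be started with asyncio.create_task from the app's lifespan handler and stopped by setting the event on shutdown. The asyncio.to_thread wrapper is what protects request latency. A synchronous mailbox call run directly on the loop would stall every request the API is serving.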

The core choices, and what each costs

Decision | Chose | Over | Why it wins here | What it costs
Where the email worker lives | One asyncio loop inside the FastAPI process, polling every 60s | Celery / Arq / SQS worker fleet | Zero extra infrastructure, one deploy artifact, easily fits 1,200 emails/day (~1 every 70s on average) | Single point of failure. Workers can't scale apart from the API. One blocking call on that loop steals from request latency
Backend footprint | Single instance | Autoscaled fleet | Sidesteps distributed coordination entirely. With one worker, two processes can never double-fetch the same inbox | Mail→enquiry latency grows under load. Request latency doesn't, as long as the loop stays non-blocking
Datastore shape | Document store (MongoDB + Beanie) | Relational (Postgres) | extracted_entities changes shape every time you revise the AI prompt. A schemaless sub-document absorbs that with no migration | No foreign keys. Every cross-collection link is enforced in your code, not the database
AI reliability | OpenAI → DeepSeek → regex cascade | OpenAI only | The pipeline never hard-stops because a provider had a bad hour | Fallback quality sits well below GPT. The floor keeps the lights on. It is not where the accuracy number comes from
Billing controls | Separate admin app | Role-gated route inside the main app | Smaller blast radius for the controls that flip every tenant's paywall. Ships on its own cadence | One more app to build and authenticate against
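The AI reliability row describes an ordered fallback. A minimal sketch of that shape follows. It assumes each provider is wrapped as a plain function that either returns a dict of entities or raises. The provider functions here are stand-ins that simulate outages or a canned answer, not real OpenAI or DeepSeek client calls. The party-size regex is a hypothetical example of a keyword floor.

```python
import re
from typing import Callable, Dict, List, Tuple

Entities = Dict[str, object]
Extractor = Callable[[str], Entities]

# Hypothetical keyword floor: a number followed by a headcount word.
_PARTY = re.compile(r"\b(\d{1,4})\s*(?:people|guests|pax|persons)\b", re.IGNORECASE)


def regex_floor(body: str) -> Entities:
    """Last-resort extraction: only a party size, and only if stated plainly."""
    match = _PARTY.search(body)
    return {"party_size": int(match.group(1))} if match else {}


def extract_with_fallback(
    body: str, providers: List[Tuple[str, Extractor]]
) -> Tuple[str, Entities]:
    """Try each provider in order and return (source, entities) from the first success."""
    for name, extract in providers:
        try:
            return name, extract(body)
        except Exception:
            # Outage, timeout, rate limit, or unparseable output: try the next tier.
            continue
    return "regex", regex_floor(body)


# Hypothetical provider stand-ins used to exercise the cascade.
def openai_down(body: str) -> Entities:
    raise TimeoutError("simulated OpenAI outage")


def deepseek_down(body: str) -> Entities:
    raise ConnectionError("simulated DeepSeek outage")


def deepseek_ok(body: str) -> Entities:
    return {"party_size": 40}
```

Returning which tier answered lets each enquiry record where its extraction came from. That way, fallback-quality results can be told apart from primary-path results.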

Here is the discipline worth naming. Build the thing that survives the next 10x, not the next 100x. A single async loop steadily clearing 1,200 emails a day tells me more about an engineer's judgment than a message-queue cluster doing the same work for a product that doesn't need one yet.

The core loop: one pass over the inbox

Dedup and client-matching are the reconciliation core. The system decides what an email is before it writes anything. Steps 7 to 10 are where duplicates get prevented, or created.

One detail worth flagging: processing is at-least-once. The loop marks mail as read only after it acts. So a crash mid-handle leaves the message unread, and the next pass handles it again. Any side effect that already ran, such as the auto-reply, can fire a second time. Thread dedup, on sender plus normalized subject, is the only idempotency guard right now. It is what stops a retried email from spawning a second enquiry.
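A minimal sketch of a sender-plus-normalized-subject key, assuming "normalized" means stripping reply and forward prefixes, lowercasing, and collapsing whitespace. The exact normalization rules and the helper names are hypothetical. The dict stands in for an enquiry lookup. In MongoDB, the same guarantee would come from querying on the key before insert, or more robustly from a unique index over it.

```python
import re
from typing import Dict, Tuple

# One or more leading reply/forward markers such as "Re:", "FW:", "Re[2]:".
_REPLY_PREFIX = re.compile(r"^\s*(?:(?:re|fwd|fw)\s*(?:\[\d+\])?\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes, lowercase, and collapse whitespace."""
    return " ".join(_REPLY_PREFIX.sub("", subject).lower().split())


def thread_key(sender: str, subject: str) -> Tuple[str, str]:
    return sender.strip().lower(), normalize_subject(subject)


def find_or_create_enquiry(
    index: Dict[Tuple[str, str], str], sender: str, subject: str, new_id: str
) -> Tuple[str, bool]:
    """Return (enquiry_id, created). A retry or reply maps to the existing id."""
    key = thread_key(sender, subject)
    if key in index:
        return index[key], False
    index[key] = new_id
    return new_id, True
```

The same key also exposes the weakness listed under What I'd watch. If a customer edits the subject, the key changes and the conversation forks into a new thread.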

Enquiry → Booking + Order conversion

The /convert endpoint resolves date, time, and party size through a cascade of fallbacks. It tries the conversion form first, then the enquiry's extracted_entities, then a text parser over the raw body. A missing form field degrades to AI metadata instead of failing the request. It then computes total = adults × price + children × child_price from the client's default pricing, and creates a Booking and an Order that share the enquiry's booking_reference. By default the two writes do not run inside a MongoDB multi-document transaction, so atomicity comes from a compensating rollback: if the second write fails, the first one is deleted. A failed conversion never leaves a half-converted enquiry behind. Edit the event date on the booking and the change cascades back to the enquiry and the order, so the three never drift apart.
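Here is a minimal sketch of that conversion path. It assumes documents are plain dicts and uses an in-memory stand-in instead of Beanie documents. resolve, Collection, and convert_enquiry are hypothetical names. The field names (date, time, adults, children) and the pricing keys (price, child_price) are assumptions that mirror the formula above. Prices use Decimal so currency arithmetic stays exact.

```python
from decimal import Decimal
from typing import Any, Dict, Optional, Tuple

Doc = Dict[str, Any]


def resolve(name: str, *sources: Doc) -> Optional[Any]:
    """Return the first non-empty value for `name`, checking sources in order."""
    for source in sources:
        value = source.get(name)
        if value not in (None, ""):
            return value
    return None


class Collection:
    """In-memory stand-in for a MongoDB collection (hypothetical, for illustration)."""

    def __init__(self, fail_inserts: bool = False) -> None:
        self.docs: Dict[str, Doc] = {}
        self.fail_inserts = fail_inserts

    def insert(self, doc: Doc) -> None:
        if self.fail_inserts:
            raise RuntimeError("simulated write failure")
        self.docs[doc["_id"]] = doc

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)


def convert_enquiry(
    enquiry: Doc, form: Doc, parsed_body: Doc, pricing: Doc,
    bookings: Collection, orders: Collection,
) -> Tuple[Doc, Doc]:
    entities = enquiry.get("extracted_entities", {})
    # Fallback order: conversion form, then AI metadata, then the raw-text parser.
    date = resolve("date", form, entities, parsed_body)
    time = resolve("time", form, entities, parsed_body)
    adults = int(resolve("adults", form, entities, parsed_body) or 0)
    children = int(resolve("children", form, entities, parsed_body) or 0)
    total = adults * pricing["price"] + children * pricing["child_price"]

    ref, tenant = enquiry["booking_reference"], enquiry["restaurant_id"]
    booking = {"_id": f"booking-{ref}", "booking_reference": ref, "restaurant_id": tenant,
               "date": date, "time": time, "adults": adults, "children": children}
    order = {"_id": f"order-{ref}", "booking_reference": ref, "restaurant_id": tenant,
             "total": total}

    bookings.insert(booking)
    try:
        orders.insert(order)
    except Exception:
        bookings.delete(booking["_id"])  # compensating rollback of the first write
        raise
    return booking, order
```

This rollback only covers a second write that fails and raises. If the process dies between the two inserts, the booking survives without its order. Where the deployment runs on a replica set, a MongoDB multi-document transaction closes that gap.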


Data model

The schema is the backbone. It is built around one idea: a linked chain joined by a shared booking_reference string, scoped to a tenant by restaurant_id.

The 14 collections, centered on the booking_reference chain. MongoDB enforces none of these relationships. Your code does. That is the schema's biggest risk and its biggest source of flexibility at once.
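Because MongoDB enforces none of these links, integrity checks live in application code. The sketch below models the chain with plain dataclasses. In the real backend, Beanie documents would presumably play this role. It shows one such check: any Booking or Order whose (restaurant_id, booking_reference) pair has no matching Enquiry is an orphan. The class shapes and find_orphans are illustrative assumptions, not the product's schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class Enquiry:
    restaurant_id: str
    booking_reference: str
    status: str = "new"
    # Free-form: the AI output shape can change with each prompt revision.
    extracted_entities: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Booking:
    restaurant_id: str
    booking_reference: str


@dataclass
class Order:
    restaurant_id: str
    booking_reference: str


def find_orphans(
    enquiries: Iterable[Enquiry], bookings: Iterable[Booking], orders: Iterable[Order]
) -> List[Tuple[str, str, str]]:
    """List (kind, restaurant_id, booking_reference) for records with no parent enquiry."""
    known = {(e.restaurant_id, e.booking_reference) for e in enquiries}
    orphans: List[Tuple[str, str, str]] = []
    for kind, docs in (("booking", bookings), ("order", orders)):
        for doc in docs:
            # Matching on the tenant too catches a reference stamped with the wrong restaurant.
            if (doc.restaurant_id, doc.booking_reference) not in known:
                orphans.append((kind, doc.restaurant_id, doc.booking_reference))
    return orphans
```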

Schema decisions that hold up

Decision | Consequence in practice
booking_reference as the cross-collection join key | Enquiry, Booking, and Order stay linked without a relational join. The catch: integrity is application-enforced. Every write path has to set it correctly, or records orphan in silence
extracted_entities as a free-form sub-document on Enquiry | This is where the document store earns its keep. The AI's output shape can change with every prompt revision, and nothing migrates
RestaurantMember embedded in Restaurant, not its own collection | A permission check becomes a single-document read on the hot path. Fast. Safe too, because members per restaurant is small and bounded, so the document never grows out of control
restaurant_id stamped on every tenant-scoped collection | Multi-tenancy is logical, not physical. One database, filtered per request. Cheap and simple. The whole wall is the discipline of always applying the filter
Separate PasswordResetToken / EmailVerificationToken collections | Short-lived auth artifacts, kept out of the User document. A natural fit for TTL expiry
Status as a constrained string lifecycle (new → read → replied → approved or rejected → closed) | You read the enquiry's state straight off the data, instead of inferring it from scattered booleans
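A constrained string lifecycle maps naturally onto a str-valued Enum. Unknown strings fail to parse, and an explicit transition map rejects illegal jumps. The table lists the states. The allowed moves below are an assumption: approved and rejected are read as alternative outcomes, and both end in closed. EnquiryStatus, ALLOWED, and transition are hypothetical names.

```python
from enum import Enum
from typing import Dict, Set


class EnquiryStatus(str, Enum):
    NEW = "new"
    READ = "read"
    REPLIED = "replied"
    APPROVED = "approved"
    REJECTED = "rejected"
    CLOSED = "closed"


S = EnquiryStatus
# Assumed transition map: approved and rejected are alternative outcomes,
# and both lead to closed. The source lists the states, not these rules.
ALLOWED: Dict[EnquiryStatus, Set[EnquiryStatus]] = {
    S.NEW: {S.READ},
    S.READ: {S.REPLIED},
    S.REPLIED: {S.APPROVED, S.REJECTED},
    S.APPROVED: {S.CLOSED},
    S.REJECTED: {S.CLOSED},
    S.CLOSED: set(),
}


def transition(current: str, target: str) -> EnquiryStatus:
    """Validate a status change; unknown strings and illegal moves raise ValueError."""
    cur, nxt = EnquiryStatus(current), EnquiryStatus(target)
    if nxt not in ALLOWED[cur]:
        raise ValueError(f"illegal status change: {cur.value} -> {nxt.value}")
    return nxt
```

Because the enum subclasses str, a stored value still compares equal to the plain string. For example, EnquiryStatus.REJECTED == "rejected" holds, so existing documents and queries keep working.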

Infrastructure and operations

Build and deploy path. The frontend ships on its own through Vercel. The backend ships as one Docker image to one AWS instance.

Concern | Posture | Where it bites, and how the design absorbs it
Scaling | Single backend instance | At ~1,200/day, a 60s poll and sequential AI calls have huge headroom. Volume doesn't touch correctness. Latency degrades first, and a few seconds of mail→enquiry lag is invisible in an automation like this
Availability | 24/7. Restart on failure. 10-min stuck-job sweep | The in-process loop is a single point of failure. If the process dies, ingestion pauses, but the dashboard keeps serving from MongoDB. The stuck-job sweep recovers hung jobs on its own
Multi-tenancy | Logical isolation via restaurant_id + verify_restaurant_access() on every scoped route | The risk is never a breached wall. It is a forgotten filter. One query that misses its tenant scope is the whole failure mode. That is why the access check sits in one place instead of being rewritten per endpoint
AuthN/Z | JWT access + refresh (python-jose), auto-refresh at 25 min and on 401. bcrypt password hashing. RBAC across owner / manager / staff with a granular Permission enum | Google sign-in (account auth) stays deliberately separate from email-provider OAuth (inbox connection). Different scopes, different blast radius
CI/CD | GitHub Actions → Docker → AWS. Vercel for the frontend | Backend and frontend deploy on their own cadences. A UI change can't take down the API, or the other way around
Secrets | Per-tenant email OAuth tokens / app-passwords stored in MongoDB | Credentials live with the tenant record they belong to. Scoped per integration. Fine at this scale
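Keeping the tenant check in one place might look like the sketch below. verify_restaurant_access() is named in the table, but this signature, the Permission values, and the role-to-permission mapping are assumptions for illustration. It reads the embedded member list off the Restaurant document. In a FastAPI route, AccessDenied would typically become an HTTPException with status 403. The scoped helper shows one way to make the tenant filter impossible to override.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Permission(str, Enum):
    # Hypothetical permission names; the real enum is described as more granular.
    VIEW_ENQUIRIES = "view_enquiries"
    CONVERT_ENQUIRIES = "convert_enquiries"
    MANAGE_BILLING = "manage_billing"


# Assumed role mapping across owner / manager / staff.
ROLE_PERMISSIONS = {
    "owner": set(Permission),
    "manager": {Permission.VIEW_ENQUIRIES, Permission.CONVERT_ENQUIRIES},
    "staff": {Permission.VIEW_ENQUIRIES},
}


@dataclass
class RestaurantMember:
    user_id: str
    role: str


@dataclass
class Restaurant:
    id: str
    members: List[RestaurantMember] = field(default_factory=list)  # embedded, not a collection


class AccessDenied(Exception):
    pass


def verify_restaurant_access(
    restaurant: Restaurant, user_id: str, needed: Permission
) -> RestaurantMember:
    """Single-document check: the user must be a member whose role grants `needed`."""
    for member in restaurant.members:
        if member.user_id == user_id:
            if needed in ROLE_PERMISSIONS.get(member.role, set()):
                return member
            raise AccessDenied(f"{member.role} lacks {needed.value}")
    raise AccessDenied("not a member of this restaurant")


def scoped(filters: Dict[str, object], restaurant_id: str) -> Dict[str, object]:
    """Add the tenant filter last so caller-supplied filters cannot override it."""
    return {**filters, "restaurant_id": restaurant_id}
```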

Outcome

Every number below is measured on current production load, not projected.

Metric | Result
Restaurants live | 10+
Emails processed | ~1,200 / day
Enquiry→order conversions | ~20 per restaurant
Enquiry → confirmation time | down ~10×
Uptime | 24/7
Extraction accuracy | 100% on booking emails processed so far

That 100% needs a denominator, or no engineer will believe it. Here is the honest version. Across every booking email handled so far, none has needed a manual correction. That is the OpenAI path working on a narrow, high-signal domain: group-dining enquiries, from senders whose emails tend to look the same. Measured over a production volume in the thousands, not the millions. The DeepSeek and regex fallback did not produce that number. It exists so the pipeline survives a provider outage. Read it that way and 100% is a fair early result, not a pitch. The ceiling sits well above the load that produced it.
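One standard way to attach a denominator to a zero-correction result is the zero-failure bound. Suppose n emails were checked and none needed correction. The largest error rate p still consistent with that at a given confidence level solves (1 - p)^n = 1 - confidence. At 95% this works out to roughly 3/n, the "rule of three". The source does not state n, so any value plugged in here is hypothetical. The function names are illustrative.

```python
import math


def zero_failure_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on the error rate after 0 failures in n trials.

    Solves (1 - p) ** n == 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of checked emails")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def approx_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Large-n approximation, -ln(1 - confidence) / n; about 3 / n at 95%."""
    return -math.log(1.0 - confidence) / n
```

Take a hypothetical n of 1,000 checked booking emails. A 100% observed result then rules out only error rates above roughly 0.3% at 95% confidence. Publishing n next to the 100% is what makes the claim checkable.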


What I'd watch

Four things, in rough order of when they'll matter:

  1. Linkage integrity is the soft spot. booking_reference is enforced in code across three collections, and Mongo won't catch a mistake. Route every create-and-link through the one conversion service. The second write path that forgets the reference is where your orphaned bookings start.
  2. Matching is heuristic, and heuristics drift. Threads dedup on sender plus normalized subject. Clients auto-match or auto-create. Two ways that bites: a customer edits the subject line and one conversation forks into two threads, or two unrelated enquiries from the same sender share a similar subject and collapse into one. Both are the exact reconciliation mess the system exists to prevent. Before tenant count climbs, add a confidence threshold that sends low-certainty matches to a human queue instead of creating records in silence.