SMAD AI: Architecture Teardown
A multi-tenant SaaS that carries a social post from raw idea to live publication across LinkedIn, Meta (Facebook/Instagram), and YouTube. The scheduling was never the hard part. The hard part is keeping one state machine correct while three kinds of unreliable actor (language models, human designers, and third-party platform APIs) each fail in their own way and on their own clock.
Snapshot
| Field | Detail |
|---|---|
| What it is | Social Media Automation Dashboard: plan → generate → design → approve → schedule → publish → measure |
| Domain | Multi-tenant content-operations SaaS for marketing teams |
| Stage / status | Deployed for selected clients |
| Frontend | Next.js 15 (App Router), Tailwind, Shadcn/UI, Zustand, React Query, FullCalendar, dnd-kit |
| Backend | FastAPI, async SQLAlchemy, Pydantic; layered services and adapters; Alembic migrations |
| Data plane | PostgreSQL 16 + pgvector, Redis, AWS S3 |
| Async plane | Celery worker and Beat, Redis broker |
| AI | LangChain over OpenAI / DeepSeek / Gemini / Anthropic behind a fallback router; RAG on pgvector |
| Publish targets | LinkedIn, Facebook, Instagram, YouTube |
| Packaging | docker-compose (Postgres, Redis, API, worker, beat, frontend) for local and dev |
The problem
A team, not one person, pushes many posts through a pipeline full of handoffs. That single sentence is what breaks the obvious design. Five constraints sit underneath it.
| Constraint | Why it bites |
|---|---|
| LLM latency and flakiness | A generation call takes seconds to tens of seconds and sometimes times out or rate-limits. Block the request on it and the UI hangs. |
| Four incompatible platforms | LinkedIn, Meta's Facebook and Instagram, and YouTube disagree on auth, media format, rate limits, and even how you hand them a file. |
| Human gates that vary per tenant | A post is only ready when copy is approved, a designer has delivered assets, and someone signed off, and each brand wants those rules slightly different. |
| Concurrency at publish time | Many posts come due in the same minute. Two workers must never publish the same one twice. |
| Per-tenant secrets | Each tenant brings its own Meta app credentials. Isolation is about secrets, not just rows. |
A CRUD app with a cron job models exactly one thing: publish at time T. It has no opinion about whether the post is ready. So it does one of two bad things: publishes unapproved junk precisely on schedule, or freezes the UI while a model thinks. Most teams who build "a social scheduler" stop here and discover, six platforms in, that the scheduler was the easy 20%. The remaining 80% is a state machine that stays honest when a language model, a person, and a platform API can each fail independently.
SMAD's answer is to make readiness a first-class concept and to push everything slow or untrusted off the request path.
The architecture
The guiding principle is plain: the spine is a content-aware state machine, and anything slow or untrustworthy lives behind an idempotent async job layer so it can fail without taking down the request or a neighboring tenant.
The backend is layered so each concern has exactly one home. Endpoints stay thin (auth, validation, HTTP status), services own all business rules and transactions, adapters wrap every external system behind a stable interface, and Celery workers run whatever should not make a user wait.
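A minimal sketch of that layering, assuming nothing beyond the names the text gives. BaseSocialAdapter is named later in this teardown; PublishResult, PublishService, FakeLinkedInAdapter, and every method signature here are hypothetical stand-ins, not the project's actual interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class PublishResult:
    external_post_id: str


class BaseSocialAdapter(ABC):
    """Stable interface the service layer sees; one subclass per platform."""

    @abstractmethod
    def publish(self, caption: str, media_urls: list[str]) -> PublishResult:
        ...


class FakeLinkedInAdapter(BaseSocialAdapter):
    """Stand-in adapter: a real one would call the platform API here."""

    def publish(self, caption: str, media_urls: list[str]) -> PublishResult:
        return PublishResult(external_post_id=f"li-{len(caption)}")


class PublishService:
    """Owns the business rule; knows nothing about HTTP or platform details."""

    def __init__(self, adapters: dict[str, BaseSocialAdapter]) -> None:
        self._adapters = adapters

    def publish(self, platform: str, caption: str, media_urls: list[str]) -> str:
        if not caption.strip():
            raise ValueError("caption must not be empty")
        adapter = self._adapters[platform]
        return adapter.publish(caption, media_urls).external_post_id
```

An endpoint in this shape would only authenticate, validate the payload, call PublishService.publish, and map the result or exception to an HTTP status.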
How a request moves: the browser talks REST/JWT to FastAPI; anything slow (AI, publishing) is handed to Redis and run later by Celery; Postgres is the system of record, Redis is the queue and cache, S3 holds blobs, and the two external systems, the LLM providers and the social platform APIs, are the ones nobody on the team controls.
Why each load-bearing choice was made
| Decision | Chosen | Over | What it buys | What it costs |
|---|---|---|---|---|
| Vector store | pgvector inside Postgres | Pinecone / Qdrant / Weaviate | One datastore to run; vectors and their metadata commit in the same transaction; tenant isolation reuses the row filtering already in place | Weaker beyond ~1M vectors per tenant; index type must be chosen and tuned by hand |
| Message broker | Redis | RabbitMQ / managed SQS | Simplest thing to operate, and it doubles as the engagement cache | At-least-once delivery: a job hidden by the visibility timeout can be redelivered even while still running, so the app must be idempotent or it double-acts |
| Concurrency control | SELECT … FOR UPDATE SKIP LOCKED | Advisory locks / external lock service / a single publisher | N publish workers scale out and never collide on the same row; no extra infrastructure | Postgres-specific; transaction scope must be disciplined |
| AI generation | Two-phase: sync create, async fill | Synchronous model call in the request | The UI returns in milliseconds and never blocks on a 10 to 30 second model call or a provider timeout | Eventual consistency, so the client must poll job status |
| LLM access | LangChain + an LLMRouter fallback chain | One hardcoded provider | Survives a single provider's outage, rate-limit, or price spike by trying the next | Abstraction overhead; prompts must stay portable across models |
| Platform integration | Adapter pattern (BaseSocialAdapter) | Per-platform branches inside services | Four wildly different APIs, auth models, and media rules stay quarantined behind one interface | The interface is a lowest common denominator; platform-only features need careful escape hatches |
| Tenant scoping | Enforced in the service layer on every query | Route-guard or middleware only | Defense at the data-access layer: a forgotten endpoint guard cannot leak another tenant's rows | Requires discipline on every query; no database-level row security is noted |
| Auth | Stateless JWT bearer tokens | Server-side sessions | Horizontal scale with no session store; super-admins cross tenants via explicit tenant_id | Revocation is harder; token lifetimes need tuning |
The pgvector call is the one I would defend hardest. For a per-tenant knowledge base (modest size, documents trickling in over time), a graph index like HNSW is the sensible default. It gives strong recall with little tuning, and unlike IVFFlat it can be built on an empty table because there is no training step, so a brand that uploads its first document still gets working search. Reaching for a dedicated vector database here would buy a second system to run and a consistency seam between the vectors and everything else, all to solve a scale this product does not have.
The workflow engine: the actual product
This is what separates SMAD from a scheduler. A post never advances on a clock. It advances only when its copy is complete and the gate guarding the next step is open.
The post lifecycle. Each arrow fires only when the work for that step is done and its gate is open; a rejected design loops back with notes, and a failed publish retries without re-running the whole pipeline.
Three gates govern the path, and each gate's policy is read from the most specific source available: Schedule, then Profile, then Tenant. A brand sets a default once and overrides it per profile or per recurring schedule without forking the logic.
| Gate | Enforced when | Evidence required |
|---|---|---|
| Content (1) | Leaving AI_GENERATED toward design or scheduling | content_approved_at is set |
| Design (2) | Before a visual post becomes SCHEDULED | design_approved_at set and media assets present |
| Publish (3) | Before the actual platform API call | publish_approved_at set (recorded implicitly on "Post Now") |
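The most-specific-source lookup (Schedule, then Profile, then Tenant) can be sketched as below. This is an illustration under assumptions: the GatePolicy shape, the field names, and the fail-closed default when no level sets a value are all hypothetical, not taken from the codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatePolicy:
    # None means "not set at this level, defer to the next, broader level".
    require_content_approval: Optional[bool] = None
    require_design_approval: Optional[bool] = None
    require_publish_approval: Optional[bool] = None


GATES = (
    "require_content_approval",
    "require_design_approval",
    "require_publish_approval",
)


def resolve_policy(
    schedule: Optional[GatePolicy],
    profile: Optional[GatePolicy],
    tenant: Optional[GatePolicy],
) -> dict[str, bool]:
    """Return the effective setting per gate, most specific source first."""
    resolved: dict[str, bool] = {}
    for gate in GATES:
        for level in (schedule, profile, tenant):
            value = getattr(level, gate) if level is not None else None
            if value is not None:
                resolved[gate] = value
                break
        else:
            # Assumed default: if nobody configured the gate, keep it closed.
            resolved[gate] = True
    return resolved
```

Resolving each gate independently is what lets a recurring schedule relax one gate while still inheriting the other two from its profile or tenant.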
PostWorkflowService handles content-aware auto-promotion. When copy is complete and the relevant gate passes, the post moves on its own: AI_GENERATED to NEEDS_DESIGN, or a text-only post with scheduled_at straight to SCHEDULED. The post type (TEXT_ONLY, SINGLE_IMAGE, CAROUSEL, VIDEO, STORY, REEL) decides whether the design detour applies at all. Text skips it; anything visual has to route through a human.
The trap this design avoids is encoding the gate rules inside endpoints. Do that, and a designer-facing route and a calendar drag each grow their own slightly wrong copy of "is this ready?" One service holding the machine means there is exactly one definition of ready, and that definition is testable.
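A sketch of what "one definition of ready" might look like for the first transition. The status and post-type names come from the text above; the Post shape, the auto_promote function, and its content_gate_required parameter are assumptions for illustration, and only Gate 1 is modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(str, Enum):
    AI_GENERATED = "AI_GENERATED"
    NEEDS_DESIGN = "NEEDS_DESIGN"
    SCHEDULED = "SCHEDULED"


@dataclass
class Post:
    post_type: str  # TEXT_ONLY, SINGLE_IMAGE, CAROUSEL, VIDEO, STORY, REEL
    status: Status = Status.AI_GENERATED
    caption: str = ""
    content_approved_at: Optional[datetime] = None
    scheduled_at: Optional[datetime] = None


def auto_promote(post: Post, content_gate_required: bool = True) -> Status:
    """Advance the post one step if, and only if, it is ready."""
    if post.status is not Status.AI_GENERATED or not post.caption.strip():
        return post.status  # not at this step, or copy incomplete
    if content_gate_required and post.content_approved_at is None:
        return post.status  # Gate 1 is still closed
    if post.post_type == "TEXT_ONLY":
        # Text skips the design detour, but still needs a due time.
        if post.scheduled_at is not None:
            post.status = Status.SCHEDULED
    else:
        # Anything visual has to route through a human designer.
        post.status = Status.NEEDS_DESIGN
    return post.status
```

Because every caller goes through the same function, a calendar drag and a designer-facing route cannot disagree about readiness, and the rule can be unit-tested without HTTP.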
Data model
Schema is the backbone here, not a footnote. Two decisions make it sound. First, tenant_id rides on every table as the isolation root. Second, per-tenant secrets are split from per-profile OAuth tokens: SocialMediaKey holds a tenant's own Meta app credentials and is a separate entity from the access tokens stored on each SocialProfile. Conflate the two and one tenant ends up publishing through another's app.
The core entities. Tenant is the root every other row hangs from; Post is the workflow-bearing object; a knowledge source fans out into many embedded chunks; background jobs are real rows, not just queue messages. (Schema reconstructed from the attributes the brief describes. The id and key types are illustrative, while the approval timestamps, status, post_type, external_post_id, error_log, frequency, time_of_day, and the embedding column are named in the brief.)
A few details earn their place:
- Approval state lives as timestamps on the post (content_approved_at, design_approved_at, publish_approved_at) rather than booleans. The gate cares not just whether a post was approved but when, which gives an audit trail for free and lets a rejection reset a single gate cleanly.
- KnowledgeSource to KnowledgeChunk is one-to-many, with the embedding stored on the chunk. A document is parsed once, split into roughly 1,500-token chunks, embedded, and stored, so retrieval pulls the relevant passages into the prompt instead of whole files.
- BackgroundJob is a real table, not just a Celery message. It carries an idempotency_key and a state, which is what makes the job-claim gate possible and turns "did that generation actually run?" into a query rather than a guess.
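The source-to-chunk fan-out can be sketched as follows. This is a simplification under stated assumptions: whitespace-separated words stand in for model tokens, the KnowledgeChunk fields other than the embedding are hypothetical, and the embedding is left empty because the embedding call is not described in the brief.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnowledgeChunk:
    tenant_id: str
    source_id: str
    position: int
    text: str
    embedding: Optional[list[float]] = None  # filled in by a later embedding step


def chunk_source(
    tenant_id: str, source_id: str, text: str, max_tokens: int = 1500
) -> list[KnowledgeChunk]:
    """Split one parsed document into ordered chunks of at most max_tokens words."""
    words = text.split()  # assumption: words approximate tokens
    chunks = []
    for position, start in enumerate(range(0, len(words), max_tokens)):
        piece = " ".join(words[start:start + max_tokens])
        chunks.append(KnowledgeChunk(tenant_id, source_id, position, piece))
    return chunks
```

Carrying tenant_id on each chunk is what lets retrieval reuse the same tenant filter as every other query.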
AI generation and RAG
The two-phase pattern is the single trick that keeps the studio feeling instant despite multi-second model calls.
Generation in two phases: phase one creates placeholder posts and returns immediately; phase two does the slow model work in the background and advances each post through the workflow. The user sees posts appear at once and watches them fill in.
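A minimal in-memory sketch of the two phases. The dict and deque stand in for the posts table and the Redis-backed Celery queue, and the GENERATING placeholder status, create_batch, and fill_next are hypothetical names; only AI_GENERATED comes from the text.

```python
import uuid
from collections import deque
from typing import Callable

posts: dict[str, dict] = {}        # stand-in for the posts table
job_queue: deque = deque()         # stand-in for the Celery queue on Redis


def create_batch(topics: list[str]) -> list[str]:
    """Phase one, in the request: insert placeholders, enqueue, return at once."""
    ids = []
    for topic in topics:
        post_id = str(uuid.uuid4())
        posts[post_id] = {"topic": topic, "status": "GENERATING", "caption": None}
        job_queue.append(post_id)
        ids.append(post_id)
    return ids


def fill_next(generate: Callable[[str], str]) -> None:
    """Phase two, in a worker: do the slow model call, then update the row."""
    post_id = job_queue.popleft()
    post = posts[post_id]
    post["caption"] = generate(post["topic"])
    post["status"] = "AI_GENERATED"
```

The request handler only ever calls create_batch, so its latency is independent of the model; the client polls the rows (or the job status) to see them fill in.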
The ContentEngine pipeline runs in order: resolve platform, type, and caption style; load brand voice and schedule templates; do RAG retrieval over relevant knowledge chunks and recent posts; call the model through the provider fallback chain; then apply platform-specific optimization such as caption-length limits, hashtag rules, and design-brief templates. RAG is what makes the output sound like the brand instead of generic AI copy, because the model is grounded in the tenant's own uploaded documents.
The LLMRouter is cheap insurance with real teeth. When OpenAI rate-limits at the wrong moment, generation falls through to DeepSeek, Gemini, or Anthropic rather than failing the job.
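The fallback chain can be illustrated without LangChain at all. In this sketch the class name LLMRouter is from the text, but the constructor, the generate signature, and AllProvidersFailed are assumptions; providers are plain callables so the example stays self-contained.

```python
from typing import Callable


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored."""


class LLMRouter:
    def __init__(self, providers: list[tuple[str, Callable[[str], str]]]) -> None:
        # Order matters: the first entry is the preferred provider.
        self._providers = providers

    def generate(self, prompt: str) -> tuple[str, str]:
        """Return (provider_name, completion) from the first provider that succeeds."""
        errors = []
        for name, call in self._providers:
            try:
                return name, call(prompt)
            except Exception as exc:
                # A production router would likely catch only timeouts and
                # rate-limit errors, and let programming errors surface.
                errors.append(f"{name}: {exc}")
        raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the text makes it possible to record which model produced a caption, which helps when prompts behave differently across providers.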
The idempotency layer is where Redis's delivery model gets paid for. A Redis broker gives at-least-once delivery: once the visibility timeout elapses, a task that has not yet been acknowledged is redelivered, which under late acknowledgement can happen while a worker is still processing it. A long generation job can outrun that timeout, so the 24-hour idempotency key and the job-claim gate make a redelivered job a no-op instead of a second expensive model batch or a duplicate post. Teams that assume Celery on Redis is exactly-once skip this and find out the hard way, usually on a bill or a double publish.
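A sketch of the job-claim gate, assuming an in-memory store in place of the BackgroundJob table. JobStore, try_claim, and run_once are hypothetical names. In a real deployment the claim would need to be atomic across workers, for example a unique constraint on idempotency_key or a Redis SET with NX and an expiry; this single-process version only shows the control flow.

```python
import time
from typing import Callable, Optional

DAY_SECONDS = 24 * 60 * 60


class JobStore:
    """In-memory stand-in for a table with a unique idempotency_key."""

    def __init__(
        self, ttl_seconds: float = DAY_SECONDS, clock: Callable[[], float] = time.time
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._claims: dict[str, float] = {}

    def try_claim(self, key: str) -> bool:
        """Return True only for the first caller within the TTL window."""
        now = self._clock()
        claimed_at = self._claims.get(key)
        if claimed_at is not None and now - claimed_at < self._ttl:
            return False
        self._claims[key] = now
        return True


def run_once(store: JobStore, idempotency_key: str, work: Callable[[], str]) -> Optional[str]:
    """Run work unless this key was already claimed; a redelivery is a no-op."""
    if not store.try_claim(idempotency_key):
        return None
    return work()
```

The key has to be derived from the job's intent (for instance tenant, batch, and post identifiers) rather than from the Celery message id, since a redelivered message is the case being defended against.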
Publishing and the failure boundary
This is where the system touches the part it does not control. After enough integrations you stop trusting any API you didn't write. You assume it will fail at the least convenient second, so you build the retry path before the happy path.
| Path | Trigger | Behavior |
|---|---|---|
| Scheduled | Beat every 60s | Find SCHEDULED posts with scheduled_at <= now(), lock with FOR UPDATE SKIP LOCKED, publish via the adapter |
| Post Now | POST /posts/{id}/publish-now | Synchronous in the API; ignores due time; records Gate 3 implicitly |
The scheduled publish path. Two beat ticks can overlap, so each worker grabs its rows with a lock that hides them from the other workers. SKIP LOCKED is the mechanism that lets you run many publishers in parallel without one post going out twice.
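The claim query might look like the sketch below. The status and scheduled_at columns are named in the brief; the posts table name, the id column, the batch size, and the claim_due_posts helper are assumptions. The function takes any DB-API connection that uses the %s parameter style, as psycopg does.

```python
CLAIM_DUE_POSTS_SQL = """
SELECT id
FROM posts
WHERE status = 'SCHEDULED'
  AND scheduled_at <= now()
ORDER BY scheduled_at
LIMIT %s
FOR UPDATE SKIP LOCKED
"""


def claim_due_posts(conn, batch_size: int = 10) -> list:
    """Lock up to batch_size due posts and return their ids.

    Rows already locked by another worker are skipped instead of waited on.
    The locks are held until the caller commits or rolls back, so the status
    change for these rows must happen inside the same transaction.
    """
    with conn.cursor() as cur:
        cur.execute(CLAIM_DUE_POSTS_SQL, (batch_size,))
        return [row[0] for row in cur.fetchall()]
```

This is the "transaction scope must be disciplined" cost from the decision table: committing before the row's status has moved on releases the lock while the post still looks due, and the next tick can pick it up again.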
Two platform realities shape this boundary:
- Token resolution differs by platform. Meta publishing needs the tenant's app credentials plus a page token resolved through meta_graph_service, while LinkedIn and YouTube use tokens stored on the profile. The adapter hides the difference, but the system still has to keep every token alive, which is what the token-refresh workers are for.
- Media has to be fetchable by the platform. For standard image and carousel posts, Meta's Graph API does not accept a direct file upload. Its servers fetch the media from a public URL at publish time, so the file must be reachable then. That is why uploads land in S3 and publishing hands over either a public object or a presigned URL. The convenience is also the danger: flipping a bucket fully public to "just make it work" is how a tenant's unpublished drafts end up indexable by a search engine. Presigned, short-lived URLs are the disciplined answer.
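Generating a short-lived URL for the platform to fetch from is a single call on a boto3 S3 client. In this sketch the helper name, the bucket and key arguments, and the five-minute default lifetime are assumptions; the client is passed in so the function has no import of its own.

```python
def presigned_media_url(s3_client, bucket: str, key: str, ttl_seconds: int = 300) -> str:
    """Return a time-limited GET URL for one private S3 object.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    The object itself stays private; only holders of this URL can read it,
    and only until the URL expires.
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=ttl_seconds,
    )
```

The lifetime has to cover the window in which the platform actually fetches the media, so the URL is best generated at publish time rather than at upload time.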
A failed publish does not vanish. It lands in FAILED with an error_log, and /retry-publish walks it back to SCHEDULED once the credential or media problem is fixed.
Infrastructure and operations
docker-compose wires Postgres, Redis, the API, the Celery worker and beat, and the frontend with health checks for local and dev.
Celery Beat is the system's heartbeat:
| Cadence | Task | Why this interval |
|---|---|---|
| 60s | Publish due posts | Keeps a due post at most about a minute late without polling so often that workers are overloaded |
| Hourly | Pending-design check | Warns on designs approaching a 24h SLA |
| Daily | YouTube token refresh | Token expiry window |
| Daily | Meta token-expiry check | Proactive: fails before a post is due, not during it |
| Weekly | LinkedIn company token refresh | Longer-lived tokens |
The token-refresh cadence is quiet but load-bearing. OAuth tokens expire silently, and without proactive refresh the failure surfaces at the cruelest moment: a scheduled post is due and the person who could fix it is asleep. Refreshing on a schedule moves that failure into working hours.
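The cadence table maps naturally onto a Celery beat schedule, which accepts a timedelta as the schedule value. The entry names and task paths below are hypothetical; only the intervals come from the table.

```python
from datetime import timedelta

# Hypothetical task paths; the intervals mirror the cadence table.
BEAT_SCHEDULE = {
    "publish-due-posts": {
        "task": "tasks.publish_due_posts",
        "schedule": timedelta(seconds=60),
    },
    "pending-design-check": {
        "task": "tasks.check_pending_designs",
        "schedule": timedelta(hours=1),
    },
    "youtube-token-refresh": {
        "task": "tasks.refresh_youtube_tokens",
        "schedule": timedelta(days=1),
    },
    "meta-token-expiry-check": {
        "task": "tasks.check_meta_token_expiry",
        "schedule": timedelta(days=1),
    },
    "linkedin-company-token-refresh": {
        "task": "tasks.refresh_linkedin_company_tokens",
        "schedule": timedelta(weeks=1),
    },
}

# With a Celery app object this would be wired up as:
#   app.conf.beat_schedule = BEAT_SCHEDULE
```

If the daily and weekly jobs should land in working hours, as the paragraph above argues, a crontab schedule pinned to a specific hour would express that more directly than a plain interval.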
Engagement reads are cached, not hammered. Platform APIs are rate-limited, and Instagram's Graph API allows on the order of 200 calls per hour per account, so analytics are cached in Redis for 10 minutes and comments for 3, with the cache cleared on interaction. A dashboard refresh hits Redis, not the platform.
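A read-through cache with those two lifetimes can be sketched as follows. The helper names and the JSON serialization are assumptions; the cache argument is expected to behave like a redis-py client, of which only get, setex, and delete are used.

```python
import json
from typing import Any, Callable

ANALYTICS_TTL = 10 * 60  # seconds, per the 10-minute figure above
COMMENTS_TTL = 3 * 60    # seconds, per the 3-minute figure above


def cached_fetch(cache, key: str, ttl_seconds: int, fetch: Callable[[], Any]) -> Any:
    """Return the cached value for key, calling fetch() only on a miss."""
    raw = cache.get(key)
    if raw is not None:
        return json.loads(raw)
    value = fetch()  # the rate-limited platform call
    cache.setex(key, ttl_seconds, json.dumps(value))
    return value


def invalidate(cache, *keys: str) -> None:
    """Drop cached entries after an interaction so the next read is fresh."""
    if keys:
        cache.delete(*keys)
```

Including the tenant and profile in the cache key keeps one tenant's analytics from ever being served to another, which matters as much here as in the database.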
Security posture:
| Surface | Control |
|---|---|
| AuthN/AuthZ | JWT bearer tokens; role guards (Super Admin, Tenant Admin, Content Manager, Graphic Designer, Viewer) |
| Tenant isolation | tenant_id enforced in services on every query; super-admin crosses tenants only via explicit tenant_id |
| Secrets | Tenant Meta credentials in SocialMediaKey; OAuth tokens on each SocialProfile |
| Transport / origin | CORS via CORS_ORIGINS |
| Content validation | Schema layer (Pydantic) and service layer; platform media rules such as Instagram aspect ratios and YouTube requiring video |
Where load or failure would bite, and how the design absorbs it:
| Pressure point | Absorber |
|---|---|
| LLM latency or outage | Two-phase async + provider fallback chain |
| Duplicate job delivery (Redis) | Job-claim gate + 24h idempotency keys |
| Concurrent publishers | FOR UPDATE SKIP LOCKED row ownership |
| Expired OAuth tokens | Scheduled refresh workers |
| Platform read rate limits | Redis caching (10m / 3m) |
| Failed publish | FAILED + error_log → /retry-publish |
Outcome
What the architecture demonstrably delivers, from the brief:
- A content-aware workflow engine, built as a state machine with three hierarchically resolved approval gates, that advances posts on readiness rather than on a timer.
- RAG-grounded AI generation across four model providers with automatic failover, returning to the UI in milliseconds through the two-phase pattern.
- A designer handoff pipeline with brief export, S3 asset upload, and review and reject loops.
- Publishing and engagement through a clean adapter layer for LinkedIn, Meta, and YouTube, with concurrency-safe scheduled publishing and a real failure and retry path.
- Per-tenant isolation of both data and secrets, with five-role RBAC.