// LIVE: brahma-labs.com · Product deployed on each client's system

SMAD AI: Architecture Teardown

A multi-tenant SaaS that carries a social post from raw idea to live publication across LinkedIn, Meta (Facebook/Instagram), and YouTube. The scheduling was never the hard part. The hard part is keeping one state machine correct while three kinds of unreliable actor (language models, human designers, and third-party platform APIs) each fail in their own way and on their own clock.

Snapshot


What it is	Social Media Automation Dashboard: plan → generate → design → approve → schedule → publish → measure
Domain	Multi-tenant content-operations SaaS for marketing teams
Stage / status	Deployed for selected clients
Frontend	Next.js 15 (App Router), Tailwind, Shadcn/UI, Zustand, React Query, FullCalendar, dnd-kit
Backend	FastAPI, async SQLAlchemy, Pydantic; layered services and adapters; Alembic migrations
Data plane	PostgreSQL 16 + pgvector, Redis, AWS S3
Async plane	Celery worker and Beat, Redis broker
AI	LangChain over OpenAI / DeepSeek / Gemini / Anthropic behind a fallback router; RAG on pgvector
Publish targets	LinkedIn, Facebook, Instagram, YouTube
Packaging	docker-compose (Postgres, Redis, API, worker, beat, frontend) for local and dev

The problem

A team, not one person, pushes many posts through a pipeline full of handoffs. That single sentence is what breaks the obvious design. Five constraints sit underneath it.

Constraint	Why it bites
LLM latency and flakiness	A generation call takes seconds to tens of seconds and sometimes times out or rate-limits. Block the request on it and the UI hangs.
Four incompatible platforms	LinkedIn, Meta (Facebook and Instagram), and YouTube disagree on auth, media format, rate limits, and even how you hand them a file.
Human gates that vary per tenant	A post is only ready when copy is approved, a designer has delivered assets, and someone signed off, and each brand wants those rules slightly different.
Concurrency at publish time	Many posts come due in the same minute. Two workers must never publish the same one twice.
Per-tenant secrets	Each tenant brings its own Meta app credentials. Isolation is about secrets, not just rows.

A CRUD app with a cron job models exactly one thing: publish at time T. It has no opinion about whether the post is ready. So it does one of two bad things: publishes unapproved junk precisely on schedule, or freezes the UI while a model thinks. Most teams who build "a social scheduler" stop here and discover, six platforms in, that the scheduler was the easy 20%. The remaining 80% is a state machine that stays honest when a language model, a person, and a platform API can each fail independently.

SMAD's answer is to make readiness a first-class concept and to push everything slow or untrusted off the request path.

The architecture

The guiding principle is plain: the spine is a content-aware state machine, and anything slow or untrustworthy lives behind an idempotent async job layer so it can fail without taking down the request or a neighboring tenant.

The backend is layered so each concern has exactly one home. Endpoints stay thin (auth, validation, HTTP status), services own all business rules and transactions, adapters wrap every external system behind a stable interface, and Celery workers run whatever should not make a user wait.
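As a rough illustration of the adapter boundary described above, the sketch below shows a service function that depends only on an abstract base class. `BaseSocialAdapter` is the name used in this teardown, but its method signature here is an assumption, and `PublishResult`, `EchoAdapter`, and `publish_post` are hypothetical names made up for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class PublishResult:
    external_post_id: str  # the platform's id for the published post


class BaseSocialAdapter(ABC):
    """The single interface services call, whatever platform sits behind it."""

    @abstractmethod
    def publish(self, caption: str, media_urls: list[str]) -> PublishResult:
        """Publish one post and return the platform's identifier for it."""


class EchoAdapter(BaseSocialAdapter):
    """Hypothetical stand-in; a real adapter would call a platform API here."""

    def publish(self, caption: str, media_urls: list[str]) -> PublishResult:
        return PublishResult(external_post_id=f"echo-{len(caption)}")


def publish_post(adapter: BaseSocialAdapter, caption: str, media_urls: list[str]) -> PublishResult:
    # Service-layer code depends only on the base class, never on a platform.
    return adapter.publish(caption, media_urls)
```

Because the service only sees the base class, a test can swap in a stand-in adapter without touching any platform API.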

How a request moves: the browser talks REST/JWT to FastAPI; anything slow (AI, publishing) is handed to Redis and run later by Celery; Postgres is the system of record, Redis is the queue and cache, S3 holds blobs, and the two boxes on the right are the systems nobody on the team controls.

Why each load-bearing choice was made

Decision	Chosen	Over	What it buys	What it costs
Vector store	pgvector inside Postgres	Pinecone / Qdrant / Weaviate	One datastore to run; vectors and their metadata commit in the same transaction; tenant isolation reuses the row filtering already in place	Weaker beyond ~1M vectors per tenant; index type must be chosen and tuned by hand
Message broker	Redis	RabbitMQ / managed SQS	Simplest thing to operate, and it doubles as the engagement cache	At-least-once delivery: an unacknowledged job is redelivered once the visibility timeout elapses, even if a worker is still running it, so the app must be idempotent or it double-acts
Concurrency control	`SELECT … FOR UPDATE SKIP LOCKED`	Advisory locks / external lock service / a single publisher	N publish workers scale out and never collide on the same row; no extra infrastructure	Postgres-specific; transaction scope must be disciplined
AI generation	Two-phase: sync create, async fill	Synchronous model call in the request	The UI returns in milliseconds and never blocks on a 10 to 30 second model call or a provider timeout	Eventual consistency, so the client must poll job status
LLM access	LangChain + an `LLMRouter` fallback chain	One hardcoded provider	Survives a single provider's outage, rate-limit, or price spike by trying the next	Abstraction overhead; prompts must stay portable across models
Platform integration	Adapter pattern (`BaseSocialAdapter`)	Per-platform branches inside services	Four wildly different APIs, auth models, and media rules stay quarantined behind one interface	The interface is a lowest common denominator; platform-only features need careful escape hatches
Tenant scoping	Enforced in the service layer on every query	Route-guard or middleware only	Defense at the data-access layer: a forgotten endpoint guard cannot leak another tenant's rows	Requires discipline on every query; no database-level row security is noted
Auth	Stateless JWT bearer tokens	Server-side sessions	Horizontal scale with no session store; super-admins cross tenants via explicit `tenant_id`	Revocation is harder; token lifetimes need tuning

The pgvector call is the one I would defend hardest. For a per-tenant knowledge base (modest size, documents trickling in over time), a graph index like HNSW is the sensible default. It gives strong recall with little tuning, and unlike IVFFlat it can be built on an empty table because there is no training step, so a brand that uploads its first document still gets working search. Reaching for a dedicated vector database here would buy a second system to run and a consistency seam between the vectors and everything else, all to solve a scale this product does not have.
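A minimal sketch of what that choice might look like in practice, written as SQL held in Python strings. The `hnsw` index method, the `vector_cosine_ops` operator class, and the `<=>` cosine-distance operator are pgvector's own; the table name `knowledge_chunks`, the `content` column, and the helper functions are assumptions for illustration, and the parameter style assumes a psycopg-like driver.

```python
# Table and column names are hypothetical; only `tenant_id` and an embedding
# column are described in the text.
CREATE_HNSW_INDEX = """
CREATE INDEX IF NOT EXISTS knowledge_chunks_embedding_hnsw
ON knowledge_chunks USING hnsw (embedding vector_cosine_ops);
"""

# Tenant scoping is an ordinary WHERE clause next to the vector ordering.
NEAREST_CHUNKS = """
SELECT id, content
FROM knowledge_chunks
WHERE tenant_id = %(tenant_id)s
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(k)s;
"""


def to_pgvector_literal(embedding: list[float]) -> str:
    """Format a Python list in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def nearest_chunk_params(tenant_id: int, embedding: list[float], k: int = 5) -> dict:
    """Build the parameter dict to pass alongside NEAREST_CHUNKS."""
    return {
        "tenant_id": tenant_id,
        "query_embedding": to_pgvector_literal(embedding),
        "k": k,
    }
```

The point of the sketch is that the tenant filter and the similarity search live in one statement against one datastore, with no second system to keep in sync.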

The workflow engine: the actual product

This is what separates SMAD from a scheduler. A post never advances on a clock. It advances only when its copy is complete and the gate guarding the next step is open.

The post lifecycle. Each arrow fires only when the work for that step is done and its gate is open; a rejected design loops back with notes, and a failed publish retries without re-running the whole pipeline.

Three gates govern the path, and each gate's policy is read from the most specific source available: Schedule, then Profile, then Tenant. A brand sets a default once and overrides it per profile or per recurring schedule without forking the logic.

Gate	Enforced when	Evidence required
Content (1)	Leaving `AI_GENERATED` toward design or scheduling	`content_approved_at` is set
Design (2)	Before a visual post becomes `SCHEDULED`	`design_approved_at` set and media assets present
Publish (3)	Before the actual platform API call	`publish_approved_at` set (recorded implicitly on "Post Now")
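The most-specific-wins lookup can be sketched as below. This is an illustration under assumptions, not the product's code: `GatePolicy`, its field names, and `resolve_policy` are hypothetical, and defaulting an unset gate to "required" is a choice made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatePolicy:
    # None means "not set at this level, defer to the next, broader level".
    require_content_approval: Optional[bool] = None
    require_design_approval: Optional[bool] = None
    require_publish_approval: Optional[bool] = None


GATES = (
    "require_content_approval",
    "require_design_approval",
    "require_publish_approval",
)


def resolve_policy(
    schedule: Optional[GatePolicy],
    profile: Optional[GatePolicy],
    tenant: Optional[GatePolicy],
) -> dict[str, bool]:
    """Resolve each gate from Schedule, then Profile, then Tenant."""
    resolved: dict[str, bool] = {}
    for gate in GATES:
        for level in (schedule, profile, tenant):
            value = getattr(level, gate) if level is not None else None
            if value is not None:
                resolved[gate] = value
                break
        else:
            # Assumption for this sketch: an unset gate is treated as required.
            resolved[gate] = True
    return resolved
```

Resolving each gate independently lets a profile relax the design gate while still inheriting the tenant's content and publish rules.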

`PostWorkflowService` handles content-aware auto-promotion. When copy is complete and the relevant gate passes, the post moves on its own: `AI_GENERATED` to `NEEDS_DESIGN`, or a text-only post with `scheduled_at` straight to `SCHEDULED`. The post type (`TEXT_ONLY`, `SINGLE_IMAGE`, `CAROUSEL`, `VIDEO`, `STORY`, `REEL`) decides whether the design detour applies at all. Text skips it; anything visual has to route through a human.
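A minimal sketch of that auto-promotion rule, assuming a simplified `Post` with a `copy_complete` flag. The status and post-type names come from the text; the `auto_promote` function, the `copy_complete` field, and the decision to leave an unscheduled text post where it is are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(str, Enum):
    AI_GENERATED = "AI_GENERATED"
    NEEDS_DESIGN = "NEEDS_DESIGN"
    SCHEDULED = "SCHEDULED"


class PostType(str, Enum):
    TEXT_ONLY = "TEXT_ONLY"
    SINGLE_IMAGE = "SINGLE_IMAGE"
    CAROUSEL = "CAROUSEL"
    VIDEO = "VIDEO"
    STORY = "STORY"
    REEL = "REEL"


@dataclass
class Post:
    post_type: PostType
    status: Status = Status.AI_GENERATED
    copy_complete: bool = False  # hypothetical flag for "copy is complete"
    content_approved_at: Optional[datetime] = None
    scheduled_at: Optional[datetime] = None


def auto_promote(post: Post, content_gate_required: bool = True) -> Status:
    """Advance a post out of AI_GENERATED only when it is actually ready."""
    if post.status is not Status.AI_GENERATED or not post.copy_complete:
        return post.status
    if content_gate_required and post.content_approved_at is None:
        return post.status  # Gate 1 is still closed
    if post.post_type is PostType.TEXT_ONLY:
        # Text skips design; without a scheduled_at it stays put (assumption).
        if post.scheduled_at is not None:
            post.status = Status.SCHEDULED
    else:
        post.status = Status.NEEDS_DESIGN  # anything visual goes to a human
    return post.status
```

Keeping this in one function is what gives the system a single, testable definition of "ready".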

The trap this design avoids is encoding the gate rules inside endpoints. Do that, and a designer-facing route and a calendar drag each grow their own slightly wrong copy of "is this ready?" One service holding the machine means there is exactly one definition of ready, and that definition is testable.

Data model

Schema is the backbone here, not a footnote. Two decisions make it sound. First, `tenant_id` rides on every table as the isolation root. Second, per-tenant secrets are split from per-profile OAuth tokens: `SocialMediaKey` holds a tenant's own Meta app credentials and is a separate entity from the access tokens stored on each `SocialProfile`. Conflate the two and one tenant ends up publishing through another's app.

The core entities. Tenant is the root every other row hangs from; Post is the workflow-bearing object; a knowledge source fans out into many embedded chunks; background jobs are real rows, not just queue messages. (Schema reconstructed from the attributes the brief describes. The id and key types are illustrative, while the approval timestamps, `status`, `post_type`, `external_post_id`, `error_log`, `frequency`, `time_of_day`, and the embedding column are named in the brief.)

A few details earn their place:

Approval state lives as timestamps on the post (`content_approved_at`, `design_approved_at`, `publish_approved_at`) rather than booleans. The gate cares not just whether a post was approved but when, which gives an audit trail for free and lets a rejection reset a single gate cleanly.
`KnowledgeSource` to `KnowledgeChunk` is one-to-many, with the embedding stored on the chunk. A document is parsed once, split into roughly 1,500-token chunks, embedded, and stored, so retrieval pulls the relevant passages into the prompt instead of whole files.
`BackgroundJob` is a real table, not just a Celery message. It carries an `idempotency_key` and a state, which is what makes the job-claim gate possible and turns "did that generation actually run?" into a query rather than a guess.
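To make the timestamps-over-booleans point concrete, here is a small sketch in which rejecting one gate clears only that gate's timestamp. The three column names come from the text; `PostApprovals`, `approve`, and `reject` are hypothetical helpers written for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

GATES = ("content", "design", "publish")


@dataclass
class PostApprovals:
    content_approved_at: Optional[datetime] = None
    design_approved_at: Optional[datetime] = None
    publish_approved_at: Optional[datetime] = None


def _column(gate: str) -> str:
    if gate not in GATES:
        raise ValueError(f"unknown gate: {gate}")
    return f"{gate}_approved_at"


def approve(approvals: PostApprovals, gate: str, when: Optional[datetime] = None) -> None:
    """Record when a gate was approved; the timestamp doubles as audit data."""
    setattr(approvals, _column(gate), when or datetime.now(timezone.utc))


def reject(approvals: PostApprovals, gate: str) -> None:
    """Reset a single gate without touching the other approvals."""
    setattr(approvals, _column(gate), None)


def is_open(approvals: PostApprovals, gate: str) -> bool:
    return getattr(approvals, _column(gate)) is not None
```

With booleans the same reset would lose the record of when the earlier approvals happened; here the untouched timestamps survive.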

AI generation and RAG

The two-phase pattern is the single trick that keeps the studio feeling instant despite multi-second model calls.

Generation in two phases: phase one creates placeholder posts and returns immediately; phase two does the slow model work in the background and advances each post through the workflow. The user sees posts appear at once and watches them fill in.
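The two phases can be sketched with in-memory stand-ins. Everything here is illustrative: the dict and list stand in for Postgres and the Redis-backed queue, `PENDING_GENERATION` is a hypothetical placeholder status (the text does not name one), and `generate` is whatever callable wraps the model.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass
class Post:
    id: str
    status: str = "PENDING_GENERATION"  # hypothetical placeholder status
    caption: str = ""


POSTS: dict[str, Post] = {}   # stands in for the posts table
JOB_QUEUE: list[str] = []     # stands in for the Celery queue on Redis


def create_placeholders(count: int) -> list[str]:
    """Phase one: runs in the request. Cheap writes only, then return."""
    ids = []
    for _ in range(count):
        post = Post(id=uuid.uuid4().hex)
        POSTS[post.id] = post
        JOB_QUEUE.append(post.id)  # hand the slow work to the async plane
        ids.append(post.id)
    return ids


def run_generation_job(post_id: str, generate: Callable[[str], str]) -> None:
    """Phase two: runs in a worker. The slow model call lives here."""
    post = POSTS[post_id]
    post.caption = generate(post_id)
    post.status = "AI_GENERATED"
```

The request handler would return the ids from `create_placeholders` immediately, and the client would poll until each post leaves the placeholder status.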

The `ContentEngine` pipeline runs in order: resolve platform, type, and caption style; load brand voice and schedule templates; do RAG retrieval over relevant knowledge chunks and recent posts; call the model through the provider fallback chain; then apply platform-specific optimization such as caption-length limits, hashtag rules, and design-brief templates. RAG is what makes the output sound like the brand instead of generic AI copy, because the model is grounded in the tenant's own uploaded documents.

The `LLMRouter` is cheap insurance with real teeth. When OpenAI rate-limits at the wrong moment, generation falls through to DeepSeek, Gemini, or Anthropic rather than failing the job.
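A fallback chain of this kind can be sketched in a few lines. The class name matches the text, but the constructor, the `generate` signature, and `AllProvidersFailed` are assumptions, and the providers are plain callables rather than real LangChain models.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored."""


class LLMRouter:
    def __init__(self, providers: Sequence[tuple[str, Callable[[str], str]]]):
        # Ordered (name, callable) pairs; earlier entries are preferred.
        self.providers = list(providers)

    def generate(self, prompt: str) -> tuple[str, str]:
        """Return (provider_name, completion) from the first provider that works."""
        errors = []
        for name, call in self.providers:
            try:
                return name, call(prompt)
            except Exception as exc:
                # A production router would likely catch narrower errors
                # (timeouts, rate limits) instead of every exception.
                errors.append(f"{name}: {exc}")
        raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the text makes it possible to log which model actually produced a given post.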

The idempotency layer is where Redis's delivery model gets paid for. A Redis broker gives at-least-once delivery: once the visibility timeout elapses, a task can be redelivered even while a worker is still processing it. A long generation job can outrun that timeout, so the 24-hour idempotency key and the job-claim gate make a redelivered job a no-op instead of a second expensive model batch or a duplicate post. Teams that assume Celery on Redis is exactly-once skip this and find out the hard way, usually on a bill or a double publish.
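A job-claim gate of this shape can be sketched with a unique key and an insert that silently loses the race. The sketch uses SQLite from the standard library so it runs anywhere; the table layout, `claim_job`, and `run_once` are assumptions, and the 24-hour expiry of keys is left out for brevity.

```python
import sqlite3
from typing import Callable


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE background_jobs (
               idempotency_key TEXT PRIMARY KEY,
               state TEXT NOT NULL
           )"""
    )
    return conn


def claim_job(conn: sqlite3.Connection, idempotency_key: str) -> bool:
    """Return True only for the first caller to present this key."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO background_jobs (idempotency_key, state) "
        "VALUES (?, 'RUNNING')",
        (idempotency_key,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means the key already existed


def run_once(conn: sqlite3.Connection, idempotency_key: str, work: Callable[[], None]) -> bool:
    """Run `work` unless this key was already claimed; a redelivery is a no-op."""
    if not claim_job(conn, idempotency_key):
        return False
    work()
    conn.execute(
        "UPDATE background_jobs SET state = 'DONE' WHERE idempotency_key = ?",
        (idempotency_key,),
    )
    conn.commit()
    return True
```

In Postgres the same idea is usually written as `INSERT ... ON CONFLICT DO NOTHING` against a unique index on the key.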

Publishing and the failure boundary

This is where the system touches the part it does not control. After enough integrations you stop trusting any API you didn't write. You assume it will fail at the least convenient second, so you build the retry path before the happy path.

Path	Trigger	Behavior
Scheduled	Beat every 60s	Find `SCHEDULED` posts with `scheduled_at <= now()`, lock with `FOR UPDATE SKIP LOCKED`, publish via the adapter
Post Now	`POST /posts/{id}/publish-now`	Synchronous in the API; ignores due time; records Gate 3 implicitly

The scheduled publish path. Two beat ticks can overlap, so each worker grabs its rows with a lock that hides them from the other workers. `SKIP LOCKED` is the mechanism that lets you run many publishers in parallel without one post going out twice.
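The claim step might look roughly like the query below. The `status` and `scheduled_at` columns and the `SCHEDULED` value come from the text; the `posts` table name, the batch size, and the `claim_due_posts` helper are assumptions, and the cursor is any DB-API cursor on a PostgreSQL connection with a psycopg-style parameter format.

```python
CLAIM_DUE_POSTS = """
SELECT id
FROM posts
WHERE status = 'SCHEDULED'
  AND scheduled_at <= now()
ORDER BY scheduled_at
LIMIT %(batch_size)s
FOR UPDATE SKIP LOCKED;
"""


def claim_due_posts(cursor, batch_size: int = 20) -> list:
    """Lock and return ids of due posts that no other worker holds.

    Must run inside an open transaction: the row locks last until that
    transaction commits or rolls back, so the status update for each post
    belongs in the same transaction.
    """
    cursor.execute(CLAIM_DUE_POSTS, {"batch_size": batch_size})
    return [row[0] for row in cursor.fetchall()]
```

Rows already locked by another worker are skipped rather than waited on, which is why overlapping ticks split the due posts between them instead of contending for the same ones.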

Two platform realities shape this boundary:

Token resolution differs by platform. Meta publishing needs the tenant's app credentials plus a page token resolved through `meta_graph_service`, while LinkedIn and YouTube use tokens stored on the profile. The adapter hides the difference, but the system still has to keep every token alive, which is what the token-refresh workers are for.
Media has to be fetchable by the platform. For standard image and carousel posts, Meta's Graph API does not accept a direct file upload. Its servers fetch the media from a public URL at publish time, so the file must be reachable then. That is why uploads land in S3 and publishing hands over either a public object or a presigned URL. The convenience is also the danger: flipping a bucket fully public to "just make it work" is how a tenant's unpublished drafts end up indexable by a search engine. Presigned, short-lived URLs are the disciplined answer.
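For the presigned-URL route, a minimal sketch using boto3's `generate_presigned_url` follows. The helper name, the bucket and key arguments, and the 15-minute lifetime are assumptions; the client is passed in so the function does not care how it was constructed.

```python
def presigned_media_url(s3_client, bucket: str, key: str, expires_in: int = 900) -> str:
    """Return a short-lived GET URL for one private S3 object.

    `s3_client` is expected to be a boto3 S3 client, for example
    boto3.client("s3"). The bucket stays private; only this URL, and only
    until it expires, lets the platform fetch the file.
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # seconds; 900 is an illustrative choice
    )
```

The lifetime needs to cover the gap between handing the URL to the platform and the platform actually fetching the media, so the right value depends on the publish flow.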

A failed publish does not vanish. It lands in `FAILED` with an `error_log`, and `/retry-publish` walks it back to `SCHEDULED` once the credential or media problem is fixed.

Infrastructure and operations

docker-compose wires Postgres, Redis, the API, the Celery worker and beat, and the frontend with health checks for local and dev.

Celery Beat is the system's heartbeat:

Cadence	Task	Why this interval
60s	Publish due posts	Tightest tolerable lateness against worker load
Hourly	Pending-design check	Warns on designs approaching a 24h SLA
Daily	YouTube token refresh	Token expiry window
Daily	Meta token-expiry check	Proactive: fails before a post is due, not during it
Weekly	LinkedIn company token refresh	Longer-lived tokens

The token-refresh cadence is quiet but load-bearing. OAuth tokens expire silently, and without proactive refresh the failure surfaces at the cruelest moment: a scheduled post is due and the person who could fix it is asleep. Refreshing on a schedule moves that failure into working hours.
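The cadence table could be expressed as a Celery `beat_schedule` along these lines. Celery accepts a number of seconds or a `timedelta` for `schedule`; the entry names and task paths are hypothetical, and the intervals are copied from the table above.

```python
from datetime import timedelta

# Entry names and task paths are placeholders for illustration.
beat_schedule = {
    "publish-due-posts": {
        "task": "tasks.publish_due_posts",
        "schedule": 60.0,  # seconds
    },
    "pending-design-check": {
        "task": "tasks.check_pending_designs",
        "schedule": timedelta(hours=1),
    },
    "youtube-token-refresh": {
        "task": "tasks.refresh_youtube_tokens",
        "schedule": timedelta(days=1),
    },
    "meta-token-expiry-check": {
        "task": "tasks.check_meta_token_expiry",
        "schedule": timedelta(days=1),
    },
    "linkedin-company-token-refresh": {
        "task": "tasks.refresh_linkedin_company_tokens",
        "schedule": timedelta(weeks=1),
    },
}

# With a Celery app object in scope, this would be attached as:
# app.conf.beat_schedule = beat_schedule
```

Interval schedules fire relative to when Beat starts; pinning the daily checks to working hours would mean switching those entries to Celery's `crontab` schedules.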

Engagement reads are cached, not hammered. Platform APIs are rate-limited, and Instagram's Graph API allows on the order of 200 calls per hour per account, so analytics are cached in Redis for 10 minutes and comments for 3, with the cache cleared on interaction. A dashboard refresh hits Redis, not the platform.
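The read path can be sketched as a cache-aside helper. `get`, `setex`, and `delete` are standard redis-py client methods; the TTL constants mirror the figures above, while the key names, `cached_fetch`, and `invalidate` are assumptions for illustration.

```python
import json
from typing import Any, Callable

ANALYTICS_TTL_SECONDS = 10 * 60
COMMENTS_TTL_SECONDS = 3 * 60


def cached_fetch(redis_client, key: str, ttl_seconds: int, fetch: Callable[[], Any]) -> Any:
    """Serve from Redis when possible; call the platform only on a miss."""
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached)
    value = fetch()  # the rate-limited platform call
    redis_client.setex(key, ttl_seconds, json.dumps(value))
    return value


def invalidate(redis_client, *keys: str) -> None:
    """Drop cached entries after an interaction such as replying to a comment."""
    if keys:
        redis_client.delete(*keys)
```

A dashboard refresh inside the TTL window then costs one Redis read and zero platform calls.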

Security posture:

Surface	Control
AuthN/AuthZ	JWT bearer tokens; role guards (Super Admin, Tenant Admin, Content Manager, Graphic Designer, Viewer)
Tenant isolation	`tenant_id` enforced in services on every query; super-admin crosses tenants only via explicit `tenant_id`
Secrets	Tenant Meta credentials in `SocialMediaKeys`; OAuth tokens on profiles
Transport / origin	CORS via `CORS_ORIGINS`
Content validation	Schema layer (Pydantic) and service layer; platform media rules such as Instagram aspect ratios and YouTube requiring video

Where load or failure would bite, and how the design absorbs it:

Pressure point	Absorber
LLM latency or outage	Two-phase async + provider fallback chain
Duplicate job delivery (Redis)	Job-claim gate + 24h idempotency keys
Concurrent publishers	`FOR UPDATE SKIP LOCKED` row ownership
Expired OAuth tokens	Scheduled refresh workers
Platform read rate limits	Redis caching (10m / 3m)
Failed publish	`FAILED` + `error_log` → `/retry-publish`

Outcome

What the architecture demonstrably delivers, from the brief:

A content-aware workflow engine, built as a state machine with three hierarchically resolved approval gates, that advances posts on readiness rather than on a timer.
RAG-grounded AI generation across four model providers with automatic failover, returning to the UI in milliseconds through the two-phase pattern.
A designer handoff pipeline with brief export, S3 asset upload, and review and reject loops.
Publishing and engagement through a clean adapter layer for LinkedIn, Meta, and YouTube, with concurrency-safe scheduled publishing and a real failure and retry path.
Per-tenant isolation of both data and secrets, with five-role RBAC.