V.Trivedy_

SMAD AI: Architecture Teardown

A multi-tenant SaaS that carries a social post from raw idea to live publication across LinkedIn, Meta (Facebook/Instagram), and YouTube. The scheduling was never the hard part. The hard part is keeping one state machine correct while three kinds of unreliable actor (language models, human designers, and third-party platform APIs) each fail in their own way and on their own clock.


Snapshot

What it is: Social Media Automation Dashboard: plan → generate → design → approve → schedule → publish → measure
Domain: Multi-tenant content-operations SaaS for marketing teams
Stage / status: Deployed for selected clients
Frontend: Next.js 15 (App Router), Tailwind, Shadcn/UI, Zustand, React Query, FullCalendar, dnd-kit
Backend: FastAPI, async SQLAlchemy, Pydantic; layered services and adapters; Alembic migrations
Data plane: PostgreSQL 16 + pgvector, Redis, AWS S3
Async plane: Celery worker and Beat, Redis broker
AI: LangChain over OpenAI / DeepSeek / Gemini / Anthropic behind a fallback router; RAG on pgvector
Publish targets: LinkedIn, Facebook, Instagram, YouTube
Packaging: docker-compose (Postgres, Redis, API, worker, beat, frontend) for local and dev

The problem

A team, not one person, pushes many posts through a pipeline full of handoffs. That single sentence is what breaks the obvious design. Five constraints sit underneath it.

| Constraint | Why it bites |
| --- | --- |
| LLM latency and flakiness | A generation call takes seconds to tens of seconds and sometimes times out or rate-limits. Block the request on it and the UI hangs. |
| Four incompatible platforms | LinkedIn, Meta, and YouTube disagree on auth, media format, rate limits, and even how you hand them a file. |
| Human gates that vary per tenant | A post is only ready when copy is approved, a designer has delivered assets, and someone signed off, and each brand wants those rules slightly different. |
| Concurrency at publish time | Many posts come due in the same minute. Two workers must never publish the same one twice. |
| Per-tenant secrets | Each tenant brings its own Meta app credentials. Isolation is about secrets, not just rows. |

A CRUD app with a cron job models exactly one thing: publish at time T. It has no opinion about whether the post is ready. So it does one of two bad things: publishes unapproved junk precisely on schedule, or freezes the UI while a model thinks. Most teams who build "a social scheduler" stop here and discover, six platforms in, that the scheduler was the easy 20%. The remaining 80% is a state machine that stays honest when a language model, a person, and a platform API can each fail independently.

SMAD's answer is to make readiness a first-class concept and to push everything slow or untrusted off the request path.


The architecture

The guiding principle is plain: the spine is a content-aware state machine, and anything slow or untrustworthy lives behind an idempotent async job layer so it can fail without taking down the request or a neighboring tenant.

The backend is layered so each concern has exactly one home. Endpoints stay thin (auth, validation, HTTP status), services own all business rules and transactions, adapters wrap every external system behind a stable interface, and Celery workers run whatever should not make a user wait.

How a request moves: the browser talks REST/JWT to FastAPI; anything slow (AI, publishing) is handed to Redis and run later by Celery; Postgres is the system of record, Redis is the queue and cache, and S3 holds blobs. The model providers and the social platforms are the systems nobody on the team controls.

Why each load-bearing choice was made

| Decision | Chosen | Over | What it buys | What it costs |
| --- | --- | --- | --- | --- |
| Vector store | pgvector inside Postgres | Pinecone / Qdrant / Weaviate | One datastore to run; vectors and their metadata commit in the same transaction; tenant isolation reuses the row filtering already in place | Weaker beyond ~1M vectors per tenant; index type must be chosen and tuned by hand |
| Message broker | Redis | RabbitMQ / managed SQS | Simplest thing to operate, and it doubles as the engagement cache | At-least-once delivery: a job hidden by the visibility timeout can be redelivered even while still running, so the app must be idempotent or it double-acts |
| Concurrency control | SELECT … FOR UPDATE SKIP LOCKED | Advisory locks / external lock service / a single publisher | N publish workers scale out and never collide on the same row; no extra infrastructure | Postgres-specific; transaction scope must be disciplined |
| AI generation | Two-phase: sync create, async fill | Synchronous model call in the request | The UI returns in milliseconds and never blocks on a 10 to 30 second model call or a provider timeout | Eventual consistency, so the client must poll job status |
| LLM access | LangChain + an LLMRouter fallback chain | One hardcoded provider | Survives a single provider's outage, rate-limit, or price spike by trying the next | Abstraction overhead; prompts must stay portable across models |
| Platform integration | Adapter pattern (BaseSocialAdapter) | Per-platform branches inside services | Four wildly different APIs, auth models, and media rules stay quarantined behind one interface | The interface is a lowest common denominator; platform-only features need careful escape hatches |
| Tenant scoping | Enforced in the service layer on every query | Route-guard or middleware only | Defense at the data-access layer: a forgotten endpoint guard cannot leak another tenant's rows | Requires discipline on every query; no database-level row security is noted |
| Auth | Stateless JWT bearer tokens | Server-side sessions | Horizontal scale with no session store; super-admins cross tenants via explicit tenant_id | Revocation is harder; token lifetimes need tuning |

The pgvector call is the one I would defend hardest. For a per-tenant knowledge base (modest size, documents trickling in over time), a graph index like HNSW is the sensible default. It gives strong recall with little tuning, and unlike IVFFlat it can be built on an empty table because there is no training step, so a brand that uploads its first document still gets working search. Reaching for a dedicated vector database here would buy a second system to run and a consistency seam between the vectors and everything else, all to solve a scale this product does not have.


The workflow engine: the actual product

This is what separates SMAD from a scheduler. A post never advances on a clock. It advances only when its copy is complete and the gate guarding the next step is open.

The post lifecycle. Each arrow fires only when the work for that step is done and its gate is open; a rejected design loops back with notes, and a failed publish retries without re-running the whole pipeline.

Three gates govern the path, and each gate's policy is read from the most specific source available: Schedule, then Profile, then Tenant. A brand sets a default once and overrides it per profile or per recurring schedule without forking the logic.
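
The most-specific-wins lookup can be sketched in a few lines. This is a minimal illustration, not the product's code: the `GatePolicy` class, its field names, and the fail-closed default (an unset gate counts as required) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatePolicy:
    # Hypothetical shape. None means "not set at this level, defer upward".
    require_content_approval: Optional[bool] = None
    require_design_approval: Optional[bool] = None
    require_publish_approval: Optional[bool] = None


def resolve_gate(
    field: str,
    schedule: Optional[GatePolicy],
    profile: Optional[GatePolicy],
    tenant: Optional[GatePolicy],
    default: bool = True,
) -> bool:
    """Return the first explicitly set value: Schedule, then Profile, then Tenant."""
    for level in (schedule, profile, tenant):
        if level is None:
            continue
        value = getattr(level, field)
        if value is not None:
            return value
    # Assumption: if no level sets the gate, treat it as required.
    return default


# A tenant default, overridden for one profile, untouched by the schedule.
tenant = GatePolicy(require_content_approval=True, require_design_approval=True)
profile = GatePolicy(require_design_approval=False)
schedule = GatePolicy()

design_required = resolve_gate("require_design_approval", schedule, profile, tenant)
content_required = resolve_gate("require_content_approval", schedule, profile, tenant)
```

Here the profile's override wins for the design gate, while the content gate falls through to the tenant default.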

| Gate | Enforced when | Evidence required |
| --- | --- | --- |
| Content (1) | Leaving AI_GENERATED toward design or scheduling | content_approved_at is set |
| Design (2) | Before a visual post becomes SCHEDULED | design_approved_at set and media assets present |
| Publish (3) | Before the actual platform API call | publish_approved_at set (recorded implicitly on "Post Now") |

PostWorkflowService handles content-aware auto-promotion. When copy is complete and the relevant gate passes, the post moves on its own: AI_GENERATED to NEEDS_DESIGN, or a text-only post with scheduled_at straight to SCHEDULED. The post type (TEXT_ONLY, SINGLE_IMAGE, CAROUSEL, VIDEO, STORY, REEL) decides whether the design detour applies at all. Text skips it; anything visual has to route through a human.

The trap this design avoids is encoding the gate rules inside endpoints. Do that, and a designer-facing route and a calendar drag each grow their own slightly wrong copy of "is this ready?" One service holding the machine means there is exactly one definition of ready, and that definition is testable.
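
A single, testable definition of ready might look like the sketch below. It covers only the auto-promotion out of AI_GENERATED described above; the `Post` dataclass, the `auto_promote` function name, and the `caption` field are assumptions for illustration, while the status and post-type names come from the brief.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class PostStatus(Enum):
    AI_GENERATED = "AI_GENERATED"
    NEEDS_DESIGN = "NEEDS_DESIGN"
    SCHEDULED = "SCHEDULED"


class PostType(Enum):
    TEXT_ONLY = "TEXT_ONLY"
    SINGLE_IMAGE = "SINGLE_IMAGE"
    CAROUSEL = "CAROUSEL"
    VIDEO = "VIDEO"
    STORY = "STORY"
    REEL = "REEL"


@dataclass
class Post:
    post_type: PostType
    status: PostStatus = PostStatus.AI_GENERATED
    caption: str = ""  # hypothetical field holding the generated copy
    content_approved_at: Optional[datetime] = None
    scheduled_at: Optional[datetime] = None


def auto_promote(post: Post, content_gate_required: bool = True) -> PostStatus:
    """Advance a post out of AI_GENERATED only when copy is done and Gate 1 is open."""
    if post.status is not PostStatus.AI_GENERATED:
        return post.status
    if not post.caption.strip():
        return post.status  # copy is incomplete, stay put
    if content_gate_required and post.content_approved_at is None:
        return post.status  # content gate is closed, stay put
    if post.post_type is PostType.TEXT_ONLY:
        # Text skips the design detour, but only moves once it has a due time.
        if post.scheduled_at is not None:
            post.status = PostStatus.SCHEDULED
    else:
        # Anything visual has to route through a human designer.
        post.status = PostStatus.NEEDS_DESIGN
    return post.status
```

Because the rule is a pure function of the post and the resolved gate policy, a designer-facing route and a calendar drag can both call it instead of carrying their own copy.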


Data model

Schema is the backbone here, not a footnote. Two decisions make it sound. First, tenant_id rides on every table as the isolation root. Second, per-tenant secrets are split from per-profile OAuth tokens: SocialMediaKey holds a tenant's own Meta app credentials and is a separate entity from the access tokens stored on each SocialProfile. Conflate the two and one tenant ends up publishing through another's app.

The core entities. Tenant is the root every other row hangs from; Post is the workflow-bearing object; a knowledge source fans out into many embedded chunks; background jobs are real rows, not just queue messages. (Schema reconstructed from the attributes the brief describes. The id and key types are illustrative, while the approval timestamps, status, post_type, external_post_id, error_log, frequency, time_of_day, and the embedding column are named in the brief.)

A few details earn their place:

  • Approval state lives as timestamps on the post (content_approved_at, design_approved_at, publish_approved_at) rather than booleans. The gate cares not just whether a post was approved but when, which gives an audit trail for free and lets a rejection reset a single gate cleanly.
  • KnowledgeSource to KnowledgeChunk is one-to-many, with the embedding stored on the chunk. A document is parsed once, split into roughly 1,500-token chunks, embedded, and stored, so retrieval pulls the relevant passages into the prompt instead of whole files.
  • BackgroundJob is a real table, not just a Celery message. It carries an idempotency_key and a state, which is what makes the job-claim gate possible and turns "did that generation actually run?" into a query rather than a guess.

AI generation and RAG

The two-phase pattern is the single trick that keeps the studio feeling instant despite multi-second model calls.

Generation in two phases: phase one creates placeholder posts and returns immediately; phase two does the slow model work in the background and advances each post through the workflow. The user sees posts appear at once and watches them fill in.
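
The shape of the two phases can be shown without any infrastructure. In this sketch a dict stands in for the posts table and a deque stands in for the Redis-backed Celery queue; the `GENERATING` placeholder status and the function names are assumptions, not names from the brief.

```python
import uuid
from collections import deque
from typing import Callable, Deque, Dict, List

posts: Dict[str, dict] = {}      # stand-in for the posts table
job_queue: Deque[str] = deque()  # stand-in for the Celery queue on Redis


def create_batch(count: int) -> List[str]:
    """Phase one: insert placeholders, enqueue the slow work, return at once."""
    ids = []
    for _ in range(count):
        post_id = str(uuid.uuid4())
        # "GENERATING" is a hypothetical placeholder status for this sketch.
        posts[post_id] = {"status": "GENERATING", "caption": None}
        job_queue.append(post_id)
        ids.append(post_id)
    return ids


def run_worker(generate: Callable[[str], str]) -> None:
    """Phase two: drain the queue in the background and fill each placeholder."""
    while job_queue:
        post_id = job_queue.popleft()
        try:
            posts[post_id]["caption"] = generate(post_id)
            posts[post_id]["status"] = "AI_GENERATED"
        except Exception as exc:
            # A model failure marks one post, not the whole batch.
            posts[post_id]["status"] = "FAILED"
            posts[post_id]["error_log"] = str(exc)
```

The request handler only ever calls `create_batch`, so its latency is independent of the model. The client polls the post rows to watch them fill in.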

The ContentEngine pipeline runs in order: resolve platform, type, and caption style; load brand voice and schedule templates; do RAG retrieval over relevant knowledge chunks and recent posts; call the model through the provider fallback chain; then apply platform-specific optimization such as caption-length limits, hashtag rules, and design-brief templates. RAG is what makes the output sound like the brand instead of generic AI copy, because the model is grounded in the tenant's own uploaded documents.

The LLMRouter is cheap insurance with real teeth. When OpenAI rate-limits at the wrong moment, generation falls through to DeepSeek, Gemini, or Anthropic rather than failing the job.
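
The fallback idea reduces to trying providers in order and moving on when one fails. The sketch below is not the actual LLMRouter: each provider is modeled as a plain callable, and `generate_with_fallback` and `AllProvidersFailed` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, callable) pair; the callable takes a prompt and
# returns text, or raises on timeout, rate limit, or outage.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised only when every provider in the chain has failed."""


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Return (provider_name, text) from the first provider that succeeds."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the text makes it cheap to log which model actually produced a post. That matters for the portability cost noted earlier, since prompts behave differently across models.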

The idempotency layer is where Redis's delivery model gets paid for. A Redis broker gives at-least-once delivery: once the visibility timeout elapses, a task can be redelivered even while a worker is still processing it. A long generation job can outrun that timeout, so the 24-hour idempotency key and the job-claim gate make a redelivered job a no-op instead of a second expensive model batch or a duplicate post. Teams that assume Celery on Redis is exactly-once skip this and find out the hard way, usually on a bill or a double publish.
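
A job-claim gate can be sketched as "first caller with this key wins, everyone else is a no-op until the key expires". The in-memory `JobClaims` class below is a stand-in written for this example. A Redis-backed version could get the same effect from an atomic `SET key value NX EX 86400`, and the brief's version also records state on the BackgroundJob row.

```python
import time
from typing import Callable, Dict


class JobClaims:
    """In-memory stand-in for an idempotency-key store with a TTL."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def claim(self, key: str) -> bool:
        """Return True only for the first claim of a key within the TTL."""
        now = self._clock()
        expires = self._expiry.get(key)
        if expires is not None and expires > now:
            return False  # already claimed: this is a redelivery
        self._expiry[key] = now + self._ttl
        return True


def run_once(claims: JobClaims, idempotency_key: str, work: Callable[[], None]) -> bool:
    """Run work() unless this key was already claimed. Returns whether it ran."""
    if not claims.claim(idempotency_key):
        return False
    work()
    return True
```

This sketch is single-process and not thread-safe. The point is the contract: a redelivered message with the same key returns without touching the model or the platform.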


Publishing and the failure boundary

This is where the system touches the part it does not control. After enough integrations you stop trusting any API you didn't write. You assume it will fail at the least convenient second, so you build the retry path before the happy path.

| Path | Trigger | Behavior |
| --- | --- | --- |
| Scheduled | Beat every 60s | Find SCHEDULED posts with scheduled_at <= now(), lock with FOR UPDATE SKIP LOCKED, publish via the adapter |
| Post Now | POST /posts/{id}/publish-now | Synchronous in the API; ignores due time; records Gate 3 implicitly |

The scheduled publish path. Two beat ticks can overlap, so each worker grabs its rows with a lock that hides them from the other workers. SKIP LOCKED is the mechanism that lets you run many publishers in parallel without one post going out twice.
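
The claim query itself is short. The sketch below assumes a table named `posts`, a DB-API cursor with psycopg-style `%s` placeholders, and a hypothetical `claim_due_posts` helper. The `status` and `scheduled_at` columns and the locking clause are the ones the brief describes.

```python
from typing import Any, List

# Rows already locked by another worker's transaction are skipped, not waited on.
CLAIM_DUE_POSTS_SQL = """
SELECT id
FROM posts
WHERE status = 'SCHEDULED'
  AND scheduled_at <= now()
ORDER BY scheduled_at
LIMIT %s
FOR UPDATE SKIP LOCKED
"""


def claim_due_posts(cursor: Any, batch_size: int = 20) -> List[Any]:
    """Lock and return up to batch_size due post ids.

    Must run inside an open transaction: the row locks are released at
    commit or rollback, so the status change has to land before then.
    """
    cursor.execute(CLAIM_DUE_POSTS_SQL, (batch_size,))
    return [row[0] for row in cursor.fetchall()]
```

The locks live only as long as the transaction. That is the "transaction scope must be disciplined" cost in practice: commit too early and a second worker can pick the same row, hold the transaction across a slow platform call and the lock is held for that long.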

Two platform realities shape this boundary:

  • Token resolution differs by platform. Meta publishing needs the tenant's app credentials plus a page token resolved through meta_graph_service, while LinkedIn and YouTube use tokens stored on the profile. The adapter hides the difference, but the system still has to keep every token alive, which is what the token-refresh workers are for.
  • Media has to be fetchable by the platform. For standard image and carousel posts on Instagram, Meta's Graph API does not accept a direct file upload. Its servers fetch the media from a public URL at publish time, so the file must be reachable then. That is why uploads land in S3 and publishing hands over either a public object or a presigned URL. The convenience is also the danger: flipping a bucket fully public to "just make it work" is how a tenant's unpublished drafts end up indexable by a search engine. Presigned, short-lived URLs are the disciplined answer.
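
Generating a short-lived URL is a one-call operation on an S3 client. The sketch below takes the client as a parameter (for example one created with `boto3.client("s3")`) and calls boto3's `generate_presigned_url`. The helper name and the 15-minute default are assumptions, and the expiry has to comfortably outlast the platform's fetch.

```python
from typing import Any


def presigned_media_url(s3_client: Any, bucket: str, key: str,
                        expires_in: int = 900) -> str:
    """Return a time-limited GET URL for one object, leaving the bucket private.

    s3_client is expected to behave like a boto3 S3 client.
    """
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # seconds until the URL stops working
    )
```

Only the object named in the URL becomes reachable, and only until the expiry, so drafts elsewhere in the bucket stay private.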

A failed publish does not vanish. It lands in FAILED with an error_log, and /retry-publish walks it back to SCHEDULED once the credential or media problem is fixed.


Infrastructure and operations

docker-compose wires Postgres, Redis, the API, the Celery worker and beat, and the frontend with health checks for local and dev.

Celery Beat is the system's heartbeat:

| Cadence | Task | Why this interval |
| --- | --- | --- |
| 60s | Publish due posts | Tightest tolerable lateness against worker load |
| Hourly | Pending-design check | Warns on designs approaching a 24h SLA |
| Daily | YouTube token refresh | Token expiry window |
| Daily | Meta token-expiry check | Proactive: fails before a post is due, not during it |
| Weekly | LinkedIn company token refresh | Longer-lived tokens |

The token-refresh cadence is quiet but load-bearing. OAuth tokens expire silently, and without proactive refresh the failure surfaces at the cruelest moment: a scheduled post is due and the person who could fix it is asleep. Refreshing on a schedule moves that failure into working hours.
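
Expressed as Celery Beat configuration, the cadences might look like the dict below. Celery accepts a number of seconds or a `timedelta` as a `schedule` value. The entry names and task paths are hypothetical, and the real project may use crontab schedules instead.

```python
from datetime import timedelta

# Entry names and task paths are hypothetical; the cadences come from the table above.
beat_schedule = {
    "publish-due-posts": {
        "task": "tasks.publish_due_posts",
        "schedule": 60.0,  # seconds
    },
    "pending-design-check": {
        "task": "tasks.check_pending_designs",
        "schedule": timedelta(hours=1),
    },
    "youtube-token-refresh": {
        "task": "tasks.refresh_youtube_tokens",
        "schedule": timedelta(days=1),
    },
    "meta-token-expiry-check": {
        "task": "tasks.check_meta_token_expiry",
        "schedule": timedelta(days=1),
    },
    "linkedin-company-token-refresh": {
        "task": "tasks.refresh_linkedin_company_tokens",
        "schedule": timedelta(weeks=1),
    },
}

# With a Celery app object this would be applied as:
#   app.conf.beat_schedule = beat_schedule
```

Interval schedules like these run relative to when Beat started. Pinning the refresh jobs to working hours would call for crontab-style schedules instead.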

Engagement reads are cached, not hammered. Platform APIs are rate-limited, and Instagram's Graph API allows on the order of 200 calls per hour per account, so analytics are cached in Redis for 10 minutes and comments for 3, with the cache cleared on interaction. A dashboard refresh hits Redis, not the platform.
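
The read path is a standard cache-aside lookup with a per-kind TTL. In the sketch below an in-memory `TTLCache` stands in for Redis, and the class, method, and key names are assumptions. Only the 10-minute and 3-minute lifetimes and the clear-on-interaction rule come from the brief.

```python
import time
from typing import Any, Callable, Dict, Tuple

ANALYTICS_TTL = 10 * 60  # seconds
COMMENTS_TTL = 3 * 60    # seconds


class TTLCache:
    """In-memory stand-in for the Redis engagement cache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, ttl: float, fetch: Callable[[], Any]) -> Any:
        """Serve a fresh cached value, else call the platform and cache the result."""
        now = self._clock()
        hit = self._items.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = fetch()  # the only place a platform API call happens
        self._items[key] = (now + ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key, for example after a user replies to a comment."""
        self._items.pop(key, None)
```

A dashboard that refreshes every few seconds costs at most one platform call per key per TTL window, which is what keeps the read path inside the rate limit.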

Security posture:

| Surface | Control |
| --- | --- |
| AuthN/AuthZ | JWT bearer tokens; role guards (Super Admin, Tenant Admin, Content Manager, Graphic Designer, Viewer) |
| Tenant isolation | tenant_id enforced in services on every query; super-admin crosses tenants only via explicit tenant_id |
| Secrets | Tenant Meta credentials in SocialMediaKeys; OAuth tokens on profiles |
| Transport / origin | CORS via CORS_ORIGINS |
| Content validation | Schema layer (Pydantic) and service layer; platform media rules such as Instagram aspect ratios and YouTube requiring video |

Where load or failure would bite, and how the design absorbs it:

| Pressure point | Absorber |
| --- | --- |
| LLM latency or outage | Two-phase async + provider fallback chain |
| Duplicate job delivery (Redis) | Job-claim gate + 24h idempotency keys |
| Concurrent publishers | FOR UPDATE SKIP LOCKED row ownership |
| Expired OAuth tokens | Scheduled refresh workers |
| Platform read rate limits | Redis caching (10m / 3m) |
| Failed publish | FAILED + error_log, then /retry-publish |

Outcome

What the architecture demonstrably delivers, from the brief:

  • A content-aware workflow engine, built as a state machine with three hierarchically resolved approval gates, that advances posts on readiness rather than on a timer.
  • RAG-grounded AI generation across four model providers with automatic failover, returning to the UI in milliseconds through the two-phase pattern.
  • A designer handoff pipeline with brief export, S3 asset upload, and review and reject loops.
  • Publishing and engagement through a clean adapter layer for LinkedIn, Meta, and YouTube, with concurrency-safe scheduled publishing and a real failure and retry path.
  • Per-tenant isolation of both data and secrets, with five-role RBAC.