V.Trivedy_

Clixs.ai: live photos, searchable in seconds

A guest uploads a selfie and gets back every photo they're in. Not next week. While the party is still on. The face matching is the easy part. AWS Rekognition does that. The hard part is the camera. Pull frames off a Canon, Nikon, or Sony body, across the venue's WiFi, into the cloud, and make them searchable in seconds. Do that without leaking one event's photos into another's, on a backend that charges for every call. That's the system.


Snapshot

Product: Clixs.ai. Live photo sharing for events: weddings, conferences, corporate, sports, graduations.
Positioning: Marketed as India's first live Cam2Cloud platform. Photos delivered as they're clicked (the company's words).
The edge: Photos go straight from the camera to the cloud, so guests find them during the event, not days later.
Stage: In production. Mid-migration from an album model to an event/function/camera model.
Backend: FastAPI (Python, async) with in-process background workers.
Frontend: Next.js 14 App Router, TypeScript, Tailwind, Framer Motion.
Datastores: MongoDB for metadata and billing. SQLite for the edge upload queue and credentials.
Cloud: AWS S3 for storage, Rekognition for face vectors, SQS for event fan-in.
Edge ingestion: A custom FTP/FTPS service, 3 instances, handling Canon, Nikon, and Sony cameras.
Tenancy: Event-scoped. One Rekognition collection per event.
Payments: Razorpay (INR). Per-event, per-photo pricing. Coupons, webhooks, fraud checks.
Surfaces: Marketing clixs.photo. App app.clixs.ai. Find page app.clixs.ai/find.
Production scale: 10,000+ customers, 2.5M+ photos matched (company figures). Photo discovery averages 25s, 2 min at the tail. Ingest runs at 10,000 photos a minute and scales out.

The problem

The promise is speed. A guest on the dance floor scans a QR code, uploads a selfie, and finds the shot the photographer took two minutes ago. "Delivered as they're clicked" is not a slogan. It is a deadline. The clock starts at the shutter and runs across the venue's WiFi, into the cloud, through face indexing, and back to a phone, in seconds. Four things make that deadline hard to hit.

Constraint | Why it bites
Live edge ingestion | The frames start on Canon, Nikon, and Sony bodies. These are sealed appliances. They speak their own dialect of FTP over congested venue WiFi that drops mid-transfer. You don't control the client and you can't fix the network.
Per-event isolation | A guest's selfie must match only that event's photos. A cross-event match is a privacy breach. Your face turning up in a stranger's wedding gallery.
Instant search, heavy indexing | The guest is waiting. You can't index a continuous stream of full-res frames inside that wait. But the stream has to become searchable almost at once.
Metered AI | Rekognition charges per IndexFaces and per SearchFacesByImage. Every call also burns a paid quota. Index one frame twice and you pay twice, and you bill the customer for work you didn't do.

The obvious build fails on every count. One shared face index leaks faces between events. A synchronous cloud upload stalls the camera and backs up the next frame. With no idempotency, every venue-WiFi retry turns into money.

Two rules run through the design. On a pay-per-call AI, the unit of correctness is not the row. It's the line on the customer's invoice. So duplicate-safety comes before speed. And "live" is a deadline, not a feature. The system is a latency pipeline. Its slowest hop is the product.


The architecture

Two hard problems, one on each side of the pipeline.

Start with search. The search primitive is a face-vector lookup, and Rekognition's unit of search is the collection. So each event's 8-character event_code is its Rekognition collection ID. Isolation stops being app logic you can get wrong. It becomes a property of which collection you query. You cannot match a guest against another event, because you are searching one collection and only one.
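A minimal sketch of what event-scoped search could look like with boto3. The function name, the default threshold, and the default result cap are assumptions for illustration, not the product's code; only the `search_faces_by_image` call and its parameters come from the Rekognition API.

```python
from typing import Any


def find_guest_photos(rekognition: Any, event_code: str, selfie: bytes,
                      threshold: float = 90.0, max_faces: int = 100) -> list[str]:
    """Return the FaceIds in one event's collection that match the selfie.

    `rekognition` is expected to be a boto3 Rekognition client, for example
    boto3.client("rekognition"). It is passed in so the function can be
    exercised with a stub.
    """
    response = rekognition.search_faces_by_image(
        CollectionId=event_code,        # the event code doubles as the collection ID
        Image={"Bytes": selfie},
        FaceMatchThreshold=threshold,   # hypothetical default
        MaxFaces=max_faces,             # hypothetical default
    )
    # Each match carries the FaceId that was stored when the frame was indexed.
    return [match["Face"]["FaceId"] for match in response.get("FaceMatches", [])]
```

There is no event filter to forget here. The only way to reach another event's faces is to pass a different collection ID.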

Now ingest. The rule is simple: never let the camera wait on the cloud. Indexing is asynchronous and queue-backed. Search is synchronous, at request time. At the edge, the FTP handler does one thing. It writes the frame to local disk, fast. A durable queue and a worker pool carry it to S3 behind that. The camera sees a quick local transfer and fires the next shot. The upload, the SQS event, and the indexing all happen in the seconds after, off the camera's clock.

Decision | Chose | Over | Why | What it costs
Tenant boundary | One Rekognition collection per event (event_code = collection ID) | One shared collection filtered by event_id | Isolation is structural. Per-event delete is one DeleteCollection. No documented cap on collections per account, so it scales across many tenants. | event_code is now load-bearing identity tied to an external resource. Renaming or merging events becomes a migration. Rekognition TPS is account-global, shared by every collection.
Edge ingestion | A custom FTP/FTPS service: land-to-staging, durable SQLite queue, retrying workers | FTP inside the FastAPI app, or a synchronous upload inside the FTP session | A stalled camera never starves the API. The camera never waits on a slow cloud leg. Frames survive a restart. | A second service, a second datastore, a credential-sync contract, and a whole class of edge failures to own.
Ingest vs search | Async indexing (S3 → SQS → consumer). Synchronous search at request time. | Sync on both, or async on both | A continuous stream indexes in the background while guests still get sub-second results. | Two code paths. An eventual-consistency window. Both fight for the same Rekognition TPS.
Upload fan-in | S3 ObjectCreated → SQS → one consumer for every source | Each uploader indexing its own faces | One indexing path, whatever the source: Cam2Cloud, web, or legacy. S3 is the only trigger. | SQS standard delivery is at-least-once. The consumer has to be idempotent or it double-indexes and double-bills.
Auth surface | JWT and RBAC enforced at the API. Client-side route guards for UX only. | Next.js middleware as the security gate | The real boundary is the API. Client guards are for show. | Next.js auth middleware is off. Every protected resource has to be checked server-side, or it's exposed.

The full system. Cameras flow through the custom Cam2Cloud service. Web and guest traffic go through the API. Both land objects in S3. S3 fires one SQS stream into one consumer, which indexes faces into the right per-event Rekognition collection. Guest search skips the queue and hits Rekognition directly.


Cam2Cloud: the hardest part

This is the part the team found hardest, and it's worth saying why. The difficulty isn't the happy path. It's everything that goes wrong between the shutter and the guest's screen, at a live event where there's no second take.

Here's the inversion. In normal client/server work, you control the client. Here the clients are sealed cameras. Canon, Nikon, and Sony bodies and their wireless FTP transmitters, each speaking FTP its own way, each with fixed retry and timeout behavior you can't change, and no way to push a firmware fix. The server has to bend to the device, because the device will never bend to the server. Now add the network. Venue WiFi is built for guests, not uploads. It's congested, NAT'd, sometimes behind a captive portal, and it drops connections mid-transfer. Every assumption a clean FTP server makes is wrong here.

What's true at the edge | Why it breaks the naive build | What the service does
Cameras are sealed appliances with fixed, quirky FTP/FTPS behavior you can't patch | A server that expects well-behaved clients rejects them or mishandles them | The handler and authorizer are built around the real devices, not the spec. Plain FTP and FTPS are both supported, because some firmware can't do TLS reliably.
Venue WiFi drops connections mid-transfer | A synchronous upload stalls, and frames are lost | Land-to-staging. The FTP leg writes locally and fast. A durable queue and retrying workers move the frame to S3 after. The camera never waits on the cloud.
A dropped transfer leaves a truncated file | Indexing half a JPEG wastes a paid call and corrupts results | A frame joins the queue only after the transfer is confirmed complete and the file is whole. A FileJanitor sweeps up the debris from aborted transfers.
The edge box restarts. The backend goes briefly unreachable. | In-memory work vanishes. A remote queue needs connectivity the venue can't promise. | A persistent SQLite queue on local disk. It's ACID, embedded, needs no network to enqueue, and survives a reboot.
Checking every login against the backend ties the edge to backend uptime | One backend hiccup locks every camera out, mid-event | Credentials are pushed once to the service's local SQLite, stored as a SHA-256 hash. The SQLiteAuthorizer validates logins offline. To revoke, push a delete.
One event's camera must not see another's frames | Shared staging leaks frames between tenants | Each upload key is chroot-jailed to its own directory. The worker maps that staging path to the right tenant's S3 prefix, and so to the right Rekognition collection.
At a live event, a silent stall is invisible until guests complain | Missing photos that nobody sees coming | Prometheus metrics: connections, queue depth, upload success and retry, latency. An operator can watch frames flow during the event.

Shutter to searchable. Splitting the FTP leg (fast, local, on the camera's clock) from the S3 upload (slow, retryable) is the move that makes "delivered as they're clicked" real. The camera's clock never depends on the cloud's. A frame is still searchable seconds after it lands.
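The land-to-staging split can be sketched with nothing but the standard library. This is an illustration under assumptions, not the service's code: the table name, the column names, and the exponential backoff are hypothetical, and `upload` stands in for whatever moves a staged file to S3. `enqueue` is assumed to run only after the FTP transfer is confirmed complete.

```python
import sqlite3
import time
from typing import Callable, Optional


def open_queue(path: str) -> sqlite3.Connection:
    """Open (or create) the on-disk queue. A file path survives a reboot."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS upload_queue (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               staged_path TEXT NOT NULL UNIQUE,
               attempts INTEGER NOT NULL DEFAULT 0,
               next_try_at REAL NOT NULL DEFAULT 0,
               done INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.commit()
    return conn


def enqueue(conn: sqlite3.Connection, staged_path: str) -> None:
    """Record a fully received frame. Needs no network, only local disk."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO upload_queue (staged_path) VALUES (?)",
            (staged_path,),
        )


def drain_once(conn: sqlite3.Connection, upload: Callable[[str], None],
               now: Optional[float] = None, base_delay: float = 2.0) -> int:
    """Try every due job once. Return how many uploads succeeded."""
    now = time.time() if now is None else now
    due = conn.execute(
        "SELECT id, staged_path, attempts FROM upload_queue "
        "WHERE done = 0 AND next_try_at <= ? ORDER BY id",
        (now,),
    ).fetchall()
    uploaded = 0
    for job_id, staged_path, attempts in due:
        try:
            upload(staged_path)
        except Exception:
            # Leave the job queued and back off: 2s, 4s, 8s, ...
            delay = base_delay * (2 ** attempts)
            with conn:
                conn.execute(
                    "UPDATE upload_queue SET attempts = attempts + 1, "
                    "next_try_at = ? WHERE id = ?",
                    (now + delay, job_id),
                )
        else:
            with conn:
                conn.execute("UPDATE upload_queue SET done = 1 WHERE id = ?", (job_id,))
            uploaded += 1
    return uploaded
```

A crash between the upload and the `done = 1` update would re-upload the same file on restart. That is safe only if the upload writes to a deterministic S3 key and the downstream consumer is idempotent.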

The backend's only job in this loop is paperwork. When a host makes an upload key in the dashboard, UploadKeyService hashes the secret into MongoDB and FTPNotificationService posts the credential to the FTP service's notification API, which is protected by a bearer token. After that, the edge service runs on its own. That independence is the whole reason to draw the microservice boundary here.
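A rough sketch of the credential contract, assuming the simplest possible shape: the backend hashes the secret, pushes only the hash, and the edge checks logins against its local SQLite. The table and function names are hypothetical, and whether the real service salts the hash or hashes the key as well is not something the text settles.

```python
import hashlib
import hmac
import sqlite3


def hash_secret(secret: str) -> str:
    """SHA-256 of the upload-key secret, as a hex string."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def open_credential_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS credentials ("
        "upload_key TEXT PRIMARY KEY, secret_hash TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def push_credential(conn: sqlite3.Connection, upload_key: str, secret_hash: str) -> None:
    """What the notification API would do on the edge: store the hash only."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO credentials (upload_key, secret_hash) VALUES (?, ?)",
            (upload_key, secret_hash),
        )


def revoke_credential(conn: sqlite3.Connection, upload_key: str) -> None:
    """Revocation is a pushed delete."""
    with conn:
        conn.execute("DELETE FROM credentials WHERE upload_key = ?", (upload_key,))


def validate_login(conn: sqlite3.Connection, upload_key: str, secret: str) -> bool:
    """Check a camera login without calling the backend."""
    row = conn.execute(
        "SELECT secret_hash FROM credentials WHERE upload_key = ?", (upload_key,)
    ).fetchone()
    if row is None:
        return False
    # Constant-time comparison of the stored and presented hashes.
    return hmac.compare_digest(row[0].encode("ascii"), hash_secret(secret).encode("ascii"))
```

Usage follows the paperwork described above: the backend calls `hash_secret` once and sends the result, and every later camera login goes through `validate_login` on the edge box alone.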

And it holds up at volume. The service takes in around 10,000 photos a minute, and you add throughput by adding instances.

The other clock: synchronous guest search

Ingestion is slow and forgiving. Search is fast and unforgiving. The guest is waiting, so the request goes straight to Rekognition and back. The same path enforces the quota.

The synchronous clock. The quota check is the same gate that returns 402 Payment Required when an event has used up its plan. Billing enforcement, on the hot path.


Data model

Schema is the one layer you don't get to quietly refactor on a slow afternoon. The moment real data lands on it, it's load-bearing. Here it does something unusual. The primary key of an event is also the identity of an AWS resource.

The current model. The legacy albums/photos pair at the bottom still runs alongside the media/functions/cameras tree and shares face_index. A migration in flight, not dead code.

Schema decision | What it does | What it buys, and costs
event_code = Rekognition collection ID | One identity for the DB key and the AI resource | Isolation is enforced in the data layer, not just app code. But event_code can never change casually. It's wired into an external system.
host_sharing_key separate from event_code | A 16-char unlisted gallery URL, distinct from the face-search code | A host can share the full gallery without exposing the face-search endpoint or the internal ID. Revoke one link and the other still works.
Hierarchy mirrors the S3 prefix | Event → Function → Camera → Media maps 1:1 to the storage key | One structure is the DB tree, the storage path, and the access scope at once. Where a frame lives and who can see it never drift apart.
face_ids on media, detail in face_index | A small array on the hot document. Rich match data in a side collection. | Hot-path reads stay cheap. The expensive per-face detail is paged only when you resolve matches.
SHA-256 hash of the upload-key secret, in backend and edge | The camera secret is never stored in plaintext anywhere | Leak either store and you get no working credential. The edge validates logins without calling the backend.
status enum on the event | One field for publish, share, and billing state | Public sharing, the guest waitlist, and the 402 gate all read one field.
Two media models, side by side | A strangler-fig migration. The consumer branches on the S3 key format. | New features land on the new model while old data keeps working. But every read path carries a branch, and that branch is where bugs hide.

One small tell of the same shift: the S3 prefix is still mifotos/. A fossil of the product's old name. Harmless, but it shows how a rebrand reaches the marketing site long before it reaches the storage keys. Internal names are the hardest to change, for the same reason event_code is. They hold the system up.
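To make the "hierarchy mirrors the S3 prefix" idea concrete, here is a sketch of a key parser. The exact key layout is an assumption: the text only says the prefix is mifotos/ and that keys are scoped per user, event, function, and camera, so the segment order and the `MediaScope` type below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ROOT_PREFIX = "mifotos"  # named in the text; the rest of the layout is assumed


@dataclass(frozen=True)
class MediaScope:
    user_id: str
    event_code: str
    function_id: str
    camera_id: str
    filename: str

    @property
    def collection_id(self) -> str:
        # The event code doubles as the Rekognition collection ID.
        return self.event_code


def parse_media_key(key: str) -> Optional[MediaScope]:
    """Parse an assumed key of the form
    mifotos/<user>/<event_code>/<function>/<camera>/<filename>.

    Returns None for anything else, which a consumer could treat as the
    legacy album format and route to the old code path.
    """
    parts = key.split("/")
    if len(parts) != 6 or parts[0] != ROOT_PREFIX or not all(parts):
        return None
    return MediaScope(*parts[1:])
```

Under this layout the storage path alone says which tenant, which function, and which collection a frame belongs to, which is the property the table describes.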


Infrastructure and operations

Runtime. The async FastAPI API and the Next.js frontend ship to Ubuntu through shell scripts (deploy.sh). The Cam2Cloud service ships on its own (deploy_ftp.sh, with Docker Compose available). The SQS consumer and the hourly ZIP-cleanup worker start inside the API process (main.py), not as separate daemons. Simple to run, with one scaling catch noted below.

Tenant isolation is layered

Isolation isn't one wall. It's five.

Layer | Mechanism | What it isolates
Identity and compute | JWT (python-jose, 24h) plus RBAC. User roles (event_host, photographer, event_manager, wedding_planner, admin) and per-event member roles (creator, admin, member, viewer) with permissions. | Who can touch which event
Face data | One Rekognition collection per event | Cross-event matching. Impossible by construction. Not a filter.
Object storage | S3 key prefix per user, event, function, camera. Presigned URLs are per-object and time-boxed. | Media access and download links
Metadata | MongoDB documents scoped by user_id and event membership | Query-level tenant separation
Camera credentials | SHA-256 hash of key and secret in SQLite, plus a chroot-jailed FTP home per key | One camera can't reuse another's credentials or read its frames

The rest of the security model, from the codebase. Razorpay webhooks are checked with HMAC-SHA256 over the raw request body against X-Razorpay-Signature, and duplicate webhooks are de-duped on the event-id header. The Cam2Cloud notification API needs a bearer token. Fraud scoring runs on event creation and on payments. Uploads are checked for type and size; Rekognition itself takes only PNG or JPEG, capped at 5 MB when the image is sent as raw bytes and 15 MB when it is read from S3. HTTPS is forced by middleware in production. Host galleries are shared only when status is published.
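The webhook check described above can be sketched with the standard library. The function and class names are hypothetical, and the in-memory set is a stand-in: a real handler would need a durable store for seen event IDs.

```python
import hashlib
import hmac


def verify_razorpay_webhook(raw_body: bytes, signature_header: str,
                            webhook_secret: str) -> bool:
    """Compare HMAC-SHA256(raw body, webhook secret) with X-Razorpay-Signature.

    The digest must be computed over the raw bytes exactly as received.
    Re-serializing parsed JSON can change the bytes and break the check.
    """
    expected = hmac.new(
        webhook_secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison; bytes on both sides tolerate odd header input.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


class WebhookDeduper:
    """Remembers which webhook event IDs have already been processed."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_delivery(self, event_id: str) -> bool:
        """Return True the first time an event ID is seen, False on repeats."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True
```

A handler would call `verify_razorpay_webhook` first, reject on failure, then provision only when `first_delivery` returns True.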

Billing and usage gating

The flow is order-first. POST /payments/create-order makes a Razorpay order. A signature-checked webhook confirms it and provisions an event_subscriptions record. From there, UsageTrackingService checks every upload and every guest search against the plan and returns 402 on overage.

Two things to watch. The webhook handler has to verify against the raw body and dedupe on the event-id, because Razorpay delivers at-least-once. And the live pricing has moved past the code. The product now sells per-event, per-photo: ₹0.89 a photo on Starter, ₹0.69 on Pro, down to ₹0.29 in bulk, first event free, 500 GB up to unlimited storage. The codebase overview still describes monthly tiers. The enforcement is the same either way. The pricing gap is one more sign of a system that outgrew its own description.
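One way the usage gate could avoid a lost update is to fold the limit check and the increment into a single conditional write. This is a sketch under assumptions, not UsageTrackingService itself: the `<kind>_used` and `<kind>_limit` field names and the exception class are hypothetical, and `event_subscriptions` is expected to behave like a pymongo collection.

```python
from typing import Any


class QuotaExceeded(Exception):
    """Raised when a plan is used up; the API layer would map it to HTTP 402."""

    status_code = 402


def consume_unit(event_subscriptions: Any, event_code: str, kind: str) -> None:
    """Atomically spend one unit of `kind` (for example "uploads" or "searches").

    The filter matches only while usage is below the limit, and the $inc is
    applied in the same operation, so two concurrent requests cannot both
    take the last unit.
    """
    used, limit = f"{kind}_used", f"{kind}_limit"
    previous = event_subscriptions.find_one_and_update(
        {"event_code": event_code, "$expr": {"$lt": [f"${used}", f"${limit}"]}},
        {"$inc": {used: 1}},
    )
    if previous is None:
        # No document matched: either no subscription or the plan is exhausted.
        raise QuotaExceeded(f"{kind} quota exhausted for event {event_code}")
```

Because the check and the write are one operation, the counter the gate reads is the counter it just changed, which is what keeps the quota from drifting under load.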

Where load and failure bite

Pressure point | What happens under load | What absorbs it, and what to watch
The "live" deadline | Shutter to searchable runs through staging, the SQLite queue, S3, SQS, the consumer, Rekognition, then Mongo. It lands at about 25s on average, 2 min at the tail. Any one stall pushes the tail out. | The async design makes "live" possible: the camera never waits on the cloud. It also makes it fragile: every hop can stall. Instrument the full chain end-to-end and alert when the tail drifts past 2 min. The edge Prometheus metrics cover only the first leg. [watch]
Rekognition account TPS (global) | Every event shares one TPS budget. Two big events indexing at once can throttle indexing and live search for a third, unrelated event. | SQS buffers indexing off the request path, and RekognitionService bounds concurrency with a thread pool. But synchronous search competes for the same budget. Per-event collections isolate data, not throughput. Levers: raise the TPS quota, give search a priority lane over batch indexing, or rate-limit ingest per account. [watch]
SQS at-least-once delivery | A duplicate ObjectCreated would become a duplicate IndexFaces. Corrupted matches, and inflated usage that over-bills or burns quota. | The consumer dedupes on the object key, so a redelivered event is a no-op. Keep that, run a visibility timeout around 6x the processing time, and send poison messages to a DLQ. Idempotency here is a billing control.
Usage counters in MongoDB | Per-upload and per-search increments are a write hotspot. Read-modify-write loses updates under load, and the quota drifts. | Use atomic $inc. The 402 gate has to read a consistent counter. [watch]
In-process workers vs API scale-out | N API replicas spawn N SQS consumers (fine, they compete for messages) but also N hourly cleanup crons (racing S3 list and delete). | Consumers scale cleanly. The cleanup job should be leader-elected, or moved to a single scheduler, the moment the API runs more than one replica. [watch]
Bulk ZIP download | Zipping hundreds of full-res photos in-request would blow memory and block. | download_links plus presigned URLs keep big downloads off the hot path. The cleanup worker expires stale ZIPs after 6h and old links after 90 days.

Testing and deploy, as described. Unit and API tests in backend/app/tests/ cover photos, albums, guest flows, and security. The edge service has lab and integration scripts. Deployment is script-based on Ubuntu.


Outcome

The codebase shows what was built. The live site shows positioning and traction. Kept apart, both are honest.

Dimension | Source | Status
Live Cam2Cloud as the edge | Marketed as India's first live Cam2Cloud platform. Canon, Nikon, Sony, and AWS listed as technology partners (clixs.photo). | Company's claim
Per-event face capacity | Up to 20 million face vectors per Rekognition collection. For a single event, effectively no ceiling. | Designed limit
Ingestion paths shipped | Cam2Cloud (the custom edge service, 3 instances), web multipart, and legacy album. All funnel into one S3 → SQS → index pipeline. | Built
How it evolved | From a conference-only album gallery to a full event/function/camera SaaS with live Cam2Cloud, payments, branding, and team roles. The album-to-media migration is still in flight. | Done, and ongoing
Commercial traction | 10,000+ customers, 2.5 million+ photos matched, 99.8% match accuracy (company figures). | Reported
Engineering scale | Photo discovery averages 25s, 2 min at the tail. Ingest runs at 10,000 photos a minute, scaled by adding FTP instances. | Measured

The system proved out a clean tenant boundary (one collection per event), a search path that stays fast, and the hard one: an edge service that turns sealed cameras on bad venue WiFi into a reliable live feed to the cloud. The proof is in the clock. Photos surface in about 25 seconds, 2 minutes at the worst, while the service takes in 10,000 a minute. It did all of this while a live data-model migration ran underneath it.


What I'd watch

Five things I'd keep an eye on.

"Live" is a deadline, and the slowest hop owns it. Today it runs about 25 seconds, 2 minutes at the tail. That headroom is thin, and the chain is long: staging, the SQLite queue, S3, SQS, Rekognition, Mongo. The edge metrics only see the first leg. Measure the whole chain end-to-end and alert when the tail drifts past 2 minutes. Skip that, and "delivered as they're clicked" quietly becomes "delivered later," and you'll hear it from guests before you see it on a graph.
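As a small illustration of the end-to-end check, the sketch below computes a tail latency from per-photo shutter-to-searchable times and flags a breach. The choice of the 99th percentile as "the tail" and the nearest-rank method are assumptions; the text gives only the 2 minute figure.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tail_breached(latencies_s: Sequence[float], pct: float = 99.0,
                  budget_s: float = 120.0) -> bool:
    """True when the chosen tail percentile exceeds the budget (2 min by default)."""
    return percentile(latencies_s, pct) > budget_s
```

Each latency would be the gap between the camera's capture timestamp and the moment the frame's faces are written to the index, so the number covers every hop and not only the edge leg.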

Per-event collections isolate data, not throughput. Rekognition TPS is account-global. The day two big events ingest at once is the day a third event's guests wait. Pick the priority rule before a flagship wedding, not during it. Let synchronous search jump the queue ahead of batch indexing, or throttle ingest per account. Raising the quota only moves the ceiling.

Idempotency is a billing control, not housekeeping. The S3 to SQS path is at-least-once, and venue WiFi guarantees retries. The consumer already dedupes on the object key, so a redelivered event does nothing. Keep it that way. The day someone "optimizes" that check out, one retry double-indexes a face and double-charges a paying customer's quota.
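A sketch of a dedupe-on-object-key consumer, assuming S3 notifications arrive directly in the SQS message body. The `claim`, `release`, and `index_faces` callables are hypothetical stand-ins: `claim` must be atomic in a real deployment (for example an insert guarded by a unique index), and `index_faces` represents the paid Rekognition call.

```python
import json
from typing import Callable
from urllib.parse import unquote_plus


def object_keys(sqs_body: str) -> list[str]:
    """Extract object keys from an S3 event notification.

    S3 URL-encodes keys in notifications, so they are decoded here.
    Messages without ObjectCreated records (such as S3's test event) yield [].
    """
    event = json.loads(sqs_body)
    return [
        unquote_plus(record["s3"]["object"]["key"])
        for record in event.get("Records", [])
        if record.get("eventName", "").startswith("ObjectCreated")
    ]


def handle_message(sqs_body: str,
                   claim: Callable[[str], bool],
                   release: Callable[[str], object],
                   index_faces: Callable[[str], object]) -> int:
    """Index each new object once. Return how many paid index calls were made."""
    indexed = 0
    for key in object_keys(sqs_body):
        if not claim(key):
            continue  # redelivered or duplicate event: no call, no charge
        try:
            index_faces(key)
        except Exception:
            release(key)  # let a later redelivery retry this key
            raise
        indexed += 1
    return indexed
```

The count returned is the number of billable calls, which is the figure that has to stay at one per frame no matter how many times the message is delivered.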