AI product · Full-stack engineer (end-to-end) · Sep 1, 2024

SMSAI — AI-powered SMS services on AWS

An AI application that delivers LLM-powered services over plain SMS — built with React, FastAPI and DynamoDB on AWS, designed for low-bandwidth markets.

Channel: SMS-first
Latency: <2s p95
Cost / msg: Predictable
Infra: Serverless

Stack

React, FastAPI, Python, DynamoDB, AWS Lambda, LangChain, Twilio

Problem

LLM apps assume you have a smartphone, a browser, and reliable bandwidth. In a lot of the world, none of that is true.

SMSAI started from a simple question: what if the only thing a user has is a basic phone and SMS? Could you still give them useful AI — search, translation, summarization, agriculture/health Q&A — through the most universal channel that exists?

Constraints

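  • Channel limits. A single SMS holds 160 GSM-7 characters; longer messages are split into 153-character segments, and text outside the GSM-7 alphabet (most non-Latin scripts) drops to 70 characters per message. The product had to feel useful within 1–3 segments.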
  • Latency. Carriers will retry. The whole round-trip — user → carrier → API → LLM → reply — had to feel snappy.
  • Cost. Every message has a real per-unit cost. The architecture had to make cost predictable, not "depends on the prompt."
  • Reliability over magic. Better a slightly simpler answer that always arrives than a brilliant one that fails 5% of the time.
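The segment budget above is easy to check mechanically. The following is a minimal sketch, not the project's actual code: `sms_segments` is a hypothetical helper that estimates segment count from the standard GSM-7 and UCS-2 limits. It is a slight simplification, since it ignores the rule that a two-septet extension character cannot straddle a segment boundary.

```python
import math

# GSM 03.38 basic alphabet: each character costs one septet.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# GSM 03.38 extension table: each character costs two septets (escape + code).
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def sms_segments(text: str) -> int:
    """Estimate how many SMS segments `text` needs (hypothetical helper)."""
    if not text:
        return 0
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        septets = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        # 160 septets fit in one message; concatenated parts carry 153 each.
        return 1 if septets <= 160 else math.ceil(septets / 153)
    # Any other character forces UCS-2, counted in UTF-16 code units:
    # 70 units in one message, 67 per part once concatenated.
    units = len(text.encode("utf-16-be")) // 2
    return 1 if units <= 70 else math.ceil(units / 67)
```

A check like this could run on every outgoing reply, so a response that would exceed three segments is trimmed or regenerated before it incurs carrier cost.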

My role

End-to-end engineer. Designed the API and data model, built the FastAPI backend and React admin, integrated the LLM provider, wired up Twilio + AWS, set up CI/CD.

Architecture

   ┌────────┐    SMS     ┌─────────┐  HTTPS   ┌──────────────┐
   │ Phone  │───────────▶│ Twilio  │─────────▶│  API Gateway │
   └────────┘            └─────────┘          │   + Lambda   │
                                              │  (FastAPI)   │
                                              └──────┬───────┘
                                                     │
                              ┌──────────────────────┼──────────────────────┐
                              ▼                      ▼                      ▼
                       ┌──────────────┐      ┌────────────┐         ┌──────────────┐
                       │  DynamoDB    │      │  LLM API   │         │ S3 (logs /   │
                       │ (sessions,   │      │  + tools   │         │   audits)    │
                       │  rate limits)│      └────────────┘         └──────────────┘
                       └──────────────┘

Two big simplifications that made everything else easier:

  1. DynamoDB single-table design. One table, three access patterns: by phone, by session, by date. No relational joins, no schema migrations.
  2. FastAPI behind API Gateway + Lambda. Pay per request, autoscale to zero overnight, no servers to babysit.
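To make the single-table idea concrete, here is one way the three access patterns (by phone, by session, by date) could map onto keys. This is a sketch under assumptions: the attribute names `PK`, `SK`, `GSI1PK`, `GSI1SK`, `GSI2PK`, `GSI2SK`, the key prefixes, and the use of two global secondary indexes are all illustrative, not the schema the project actually used.

```python
from datetime import datetime, timezone


def message_item(phone: str, session_id: str, body: str, sent_at: datetime) -> dict:
    """Build one message item for a hypothetical single-table layout."""
    ts = sent_at.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return {
        # Base table: all messages for a phone number, sorted by time.
        "PK": f"PHONE#{phone}",
        "SK": f"MSG#{ts}",
        # Assumed index 1: all messages in a session, sorted by time.
        "GSI1PK": f"SESSION#{session_id}",
        "GSI1SK": ts,
        # Assumed index 2: all messages on a UTC calendar day.
        "GSI2PK": f"DATE#{ts[:10]}",
        "GSI2SK": f"{ts}#{phone}",
        "body": body,
    }
```

Each access pattern then becomes a single query on one partition key. One trade-off worth noting: a per-day partition key concentrates a whole day's writes in one partition, so at higher volume the date key would likely need a shard suffix.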

A small React app handles ops: live message queue, error logs, prompt/tool versioning, per-user spend caps.

Key decisions

1. Treat the LLM as untrusted

The LLM is a remote API that can be slow, flaky, or expensive. Everything around it is built to compensate:

  • Hard timeouts on every call
  • Cached canned responses for the top intents (greeting, help, language switch)
  • A "router" before the LLM so simple intents skip the model entirely
  • Per-user rate limits stored in DynamoDB with TTL

The router alone took the LLM call rate down by ~35% in early testing.
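The router, canned responses, and hard timeout can be sketched together in a few lines. Everything here is illustrative: the keyword rules, the canned strings, the 1.5-second default, and the `call_llm` callable are assumptions standing in for the real intent router and provider client.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical canned replies for the top intents named above.
CANNED = {
    "greeting": "Hi! Text a question, or HELP for options.",
    "help": "Send any question. Text LANG <code> to switch language.",
    "language_switch": "Language updated.",
}
FALLBACK = "Sorry, that took too long. Please try again."


def route(text: str) -> Optional[str]:
    """Return a canned intent name, or None when the LLM is needed."""
    words = text.strip().lower().split()
    if not words:
        return "help"
    if words[0] in {"hi", "hello", "hey"} and len(words) <= 2:
        return "greeting"
    if words == ["help"]:
        return "help"
    if words[0] == "lang" and len(words) == 2:
        return "language_switch"
    return None


async def reply(
    text: str,
    call_llm: Callable[[str], Awaitable[str]],
    timeout_s: float = 1.5,  # assumed budget, chosen to fit under a 2s p95
) -> str:
    intent = route(text)
    if intent is not None:
        return CANNED[intent]  # simple intents never reach the model
    try:
        return await asyncio.wait_for(call_llm(text), timeout=timeout_s)
    except asyncio.TimeoutError:
        return FALLBACK  # a reply always goes out, even if the LLM stalls
```

The design choice is that the model sits behind two cheap gates: a deterministic router in front and a deadline around the call, so the worst case is a polite fallback rather than silence.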

2. SMS-aware prompting

SYSTEM = """You answer over SMS. Reply in <=2 messages of 160 chars.
No markdown. No emoji unless asked. If you must trim,
trim explanations first, never the answer."""

Prompts were treated as code: versioned in Git, A/B tested, and tied to specific tool sets. A "summarize" prompt is not allowed to call the "send_payment" tool — that boundary lives in the registry, not in the model's head.
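A registry that enforces the prompt-to-tool boundary might look like the sketch below. The names `PromptSpec`, `REGISTRY`, `authorize_tool_call`, the version strings, and the `fetch_url` tool are hypothetical; only the rule itself (a "summarize" prompt may not call "send_payment") comes from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSpec:
    version: str
    system: str
    allowed_tools: frozenset


# Hypothetical registry: each prompt is pinned to a version and a tool set.
REGISTRY = {
    "summarize": PromptSpec("v3", "Summarize for SMS.", frozenset({"fetch_url"})),
    "payments": PromptSpec("v1", "Handle payment requests.", frozenset({"send_payment"})),
}


class ToolNotAllowed(Exception):
    """Raised when a prompt requests a tool outside its registered set."""


def authorize_tool_call(prompt_name: str, tool_name: str) -> None:
    """Check a model-requested tool call against the registry before running it."""
    spec = REGISTRY[prompt_name]
    if tool_name not in spec.allowed_tools:
        raise ToolNotAllowed(
            f"{prompt_name}@{spec.version} may not call {tool_name}"
        )
```

Because the check runs in application code before any tool executes, a prompt injection that convinces the model to request `send_payment` still fails at the registry.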

3. Observability before scale

Every message hop is logged with a correlation ID. Dashboards show:

  • Median + p95 latency per intent
  • Cost per message (LLM + Twilio)
  • Failure mode breakdown (timeout vs LLM error vs tool error)

When something regresses, the dashboard tells me what and where in under a minute.
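As an illustration of what feeds such a dashboard, the sketch below emits one structured log line per hop and computes median and p95 latency per intent. The field names, the hop and outcome labels, and the nearest-rank percentile method are assumptions, not the project's actual logging schema.

```python
import json
import math
from collections import defaultdict


def log_hop(correlation_id: str, hop: str, intent: str,
            latency_ms: float, outcome: str = "ok") -> str:
    """Serialize one hop as a JSON log line (hypothetical field names)."""
    return json.dumps(
        {
            "correlation_id": correlation_id,  # same ID on every hop of a message
            "hop": hop,                        # e.g. "twilio_in", "llm", "twilio_out"
            "intent": intent,
            "latency_ms": latency_ms,
            "outcome": outcome,                # e.g. "ok", "timeout", "llm_error", "tool_error"
        },
        sort_keys=True,
    )


def percentile(values, pct: int):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_by_intent(records) -> dict:
    """Group latency samples by intent and report median and p95."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["intent"]].append(record["latency_ms"])
    return {
        intent: {"p50": percentile(samples, 50), "p95": percentile(samples, 95)}
        for intent, samples in grouped.items()
    }
```

With a correlation ID generated once at the webhook and passed through every hop, a slow message can be traced end to end by filtering the logs on that single value.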

Why DynamoDB and not Postgres

Most "SMS in, reply out" workloads are key-value lookups with TTL. DynamoDB on-demand pricing matches the load shape, and there's no idle RDS instance burning money at 3am.
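The per-user rate limit is a good example of that key-value-with-TTL shape. The sketch below uses a plain dict as a stand-in for the table so it runs anywhere; the window size, the limit, the key format, and the `ttl` attribute name are assumptions. In DynamoDB itself, the increment would more likely be a conditional `UpdateItem`, and the `ttl` attribute would hold the expiry as epoch seconds.

```python
WINDOW_S = 3600  # assumed window: one hour
LIMIT = 20       # assumed cap: messages per phone per window


def window_key(phone: str, now: int):
    """Return the item key for the current window and its expiry (epoch seconds)."""
    start = now - now % WINDOW_S
    return f"RATE#{phone}#{start}", start + WINDOW_S


def allow(table: dict, phone: str, now: int) -> bool:
    """Fixed-window rate limit; `table` is an in-memory stand-in for DynamoDB."""
    key, expires_at = window_key(phone, now)
    # The ttl attribute lets the store delete old windows on its own.
    item = table.setdefault(key, {"count": 0, "ttl": expires_at})
    if item["count"] >= LIMIT:
        return False
    item["count"] += 1
    return True
```

Putting the window start in the key matters because DynamoDB TTL deletion is not immediate: an expired item may linger, but a new window always reads and writes a different key, so stale counters are never consulted.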

Results

  • SMS-first AI assistant running end-to-end on AWS
  • Sub-2s p95 latency including LLM call + carrier round-trip
  • Predictable per-message cost through the router + cap system
  • Zero ops servers — full serverless on Lambda + DynamoDB + S3

What I would do differently

  • Start with a structured eval harness (offline test suite + scoring) on day one — I added it after the fact and wished it had been there for every prompt change
  • Push more logic into Step Functions for retry / fanout instead of inside FastAPI handlers
  • Consider on-device fallbacks for languages where the LLM was weak; sometimes a 10 MB local model beats a 200B-parameter remote one for SMS-length answers

Stack at a glance

  • Frontend (ops): React (Next.js compatible), TypeScript
  • Backend: FastAPI (Python), Lambda, API Gateway
  • Storage: DynamoDB, S3 (audit logs)
  • AI: LangChain orchestration, hosted LLM API + tool routing
  • Comms: Twilio Programmable SMS
  • Infra: AWS, GitHub Actions
