Stateless API, Stateful Sessions: The Best of Both Worlds

Here's a dirty secret about most AI applications: they're either blazingly fast but amnesiac, or they remember everything but crawl like molasses. The industry has been selling you a false dichotomy—stateless speed versus stateful intelligence.
At JobInterview.live, we built the AI Board Room on a simple premise: why not both?
Our architecture reconstructs complete session state on every API call using Redis hot cache backed by Postgres persistence. It's stateless from the API's perspective, but your AI agents—Atlas, Cipher, Nova, and the rest—remember every nuance of your conversation. This isn't just clever engineering; it's the foundation that lets solo founders have genuinely productive conversations with AI, not just rapid-fire Q&A sessions.
Key Takeaways
- Stateless APIs don't mean stateless experiences—reconstruct rich context on-demand for the best of both worlds
- Redis + Postgres hybrid architecture delivers sub-100ms session loading with bulletproof persistence
- The "User Dossier" pattern enables true personalization without sacrificing horizontal scalability
- Deterministic backbones (Google ADK) combined with stateful context create reliable, context-aware AI agents
- Action Extraction turns conversational state into executable tasks, bridging talk and work
The Problem: Context is Expensive, Amnesia is Useless
Most chatbots are idiots with short-term memory loss.
You tell them your business context. They forget it three messages later. You explain your constraints. Gone. You share your goals. Vanished into the token void.
The typical "solution"? Cram everything into the prompt window. But context windows, even at 2M tokens, have a fatal flaw: they're linear cost structures. Every API call pays the full price of your entire conversation history. Your costs scale with engagement—the exact opposite of what you want as a founder building a product people actually use.
The alternative—truly stateful servers with sticky sessions—creates operational nightmares. Load balancing becomes complex. Horizontal scaling requires session migration. Server restarts mean lost state. It's the architecture that enterprise loves and startups die from.
The Insight: State is Data, Not Architecture
Here's where we get provocative: state doesn't belong in your API layer.
State is data. Treat it like data. Store it in data systems—fast ones for reads, durable ones for writes. Then reconstruct it on-demand, every single request.
The AI Board Room's architecture is radically simple:
- Every API call is stateless—no session affinity, no sticky routing, pure horizontal scale
- Every conversation has a session_id—a simple UUID that unlocks everything
- Redis holds the hot state—last 50 messages, User Dossier, active Skills, current agent assignments
- Postgres holds the truth—complete conversation history, extracted actions, agent performance metrics
- Reconstruction happens in <100ms—fast enough that it feels instant, cheap enough that it scales
When you talk to Atlas about your startup strategy, here's what actually happens:
- Your request hits any available API server (stateless routing)
- Server pulls session:{uuid} from Redis (5-10ms); on a cache miss, it reconstructs from Postgres (50-80ms, rare)
- Loads your User Dossier—your goals, constraints, communication style
- Activates relevant Skills via SKILL.md modules (marketing, finance, strategy)
- Passes enriched context through the Deterministic Backbone
- Atlas responds with full awareness of your 47-message conversation history
- Critic Agent validates response quality before returning
- Session state updates to Redis (async, non-blocking)
- Postgres write queued for durability (also async)
Total overhead? Under 100ms. Total memory? Zero between requests.
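The whole flow fits in a few lines. Here's a deliberately simplified sketch of the pattern, with plain dicts standing in for Redis and Postgres and all names hypothetical:

```python
# Sketch of the per-request reconstruction flow: hot read, cold fallback,
# write-back. Dicts stand in for Redis (HOT) and Postgres (COLD).

HOT = {}    # Redis stand-in: session_id -> session state
COLD = {}   # Postgres stand-in: durable copy of every session

def load_session(session_id: str) -> dict:
    """Hot read first; on a cache miss, rebuild from the durable store."""
    state = HOT.get(session_id)
    if state is None:
        state = COLD.get(session_id, {"messages": [], "dossier": {}})
        HOT[session_id] = state          # re-warm the cache
    return state

def handle_turn(session_id: str, user_message: str) -> str:
    state = load_session(session_id)
    state["messages"].append({"role": "user", "content": user_message})
    reply = f"(agent reply with {len(state['messages'])} messages of context)"
    state["messages"].append({"role": "assistant", "content": reply})
    HOT[session_id] = state              # async update in production
    COLD[session_id] = state             # queued durable write in production
    return reply
```

Evict the session from the hot cache and the next call silently rebuilds it from the cold store, which is exactly the property the architecture depends on.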
The Architecture: Redis Hot, Postgres Cold
The Hot Layer: Redis for Speed
Redis is our working memory. Every active session lives here:
session:{uuid}:messages - Last 50 turns (circular buffer)
session:{uuid}:dossier - User context, goals, preferences
session:{uuid}:agents - Active agent assignments (Atlas, Cipher, Nova)
session:{uuid}:skills - Loaded expertise modules
session:{uuid}:actions - Extracted tasks awaiting execution
The magic number is 50 messages. Why? Because it covers 95% of meaningful conversation context while keeping memory footprint predictable. A typical session with full context weighs ~200KB in Redis—cheap enough to keep thousands hot simultaneously.
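Keeping that window bounded is plain list maintenance. In production it's a Redis list on the session:{uuid}:messages key kept trimmed with LPUSH + LTRIM; the equivalent logic in plain Python is:

```python
# The 50-message circular buffer, sketched in plain Python. The slice at the
# end mirrors what LTRIM does to the Redis list after each LPUSH.

MAX_TURNS = 50

def append_turn(messages: list[dict], turn: dict, max_turns: int = MAX_TURNS) -> list[dict]:
    """Append a turn and drop the oldest entries beyond the window."""
    messages = messages + [turn]
    return messages[-max_turns:]       # keep only the newest `max_turns` turns
```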
When you switch from talking strategy with Atlas to analyzing financials with Cipher, the system doesn't "remember" in the traditional sense. It reconstructs your conversation context, sees you were just discussing market positioning, and Cipher opens with relevant financial implications. It feels like memory. It's actually just really fast data retrieval.
The Cold Layer: Postgres for Truth
Postgres is our long-term memory and analytical engine. Everything gets written here:
- Complete conversation transcripts (searchable)
- Action Extraction results—every commitment, task, and decision
- Agent performance metrics for the Critic Agent
- User Dossier evolution over time
- A2A (Agent-to-Agent) delegation chains
This is where we get sophisticated. The User Dossier isn't static—it evolves. When you tell Nova about a new product direction, that updates your dossier. When you consistently ask Cipher to focus on runway, that preference persists. The system learns your communication style, your priorities, your constraints.
But here's the key: Postgres writes are async. Your conversation never waits for database commits. Redis serves the experience; Postgres ensures durability. If Redis evicts your session (rare, but possible), we reconstruct from Postgres in under 100ms. You never notice.
The Protocols: MCP, A2A, and Native Audio
This architecture enables something powerful: protocol diversity without complexity.
MCP: Tools Without Coupling
Model Context Protocol lets our agents use external tools (calendar, email, project management) without tight coupling. When Atlas needs to check your availability, it's an MCP call to your calendar. When Cipher needs market data, MCP to your data sources.
Because session state is reconstructed every turn, tool results get naturally incorporated into context. No special state management. No complex callbacks. Just: load session, call tool, update session, respond.
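That load-tool-update loop is short enough to sketch. The tool interface below is our illustration, not the actual MCP wire format:

```python
# A tool call is just more data folded into the reconstructed session:
# call the tool, append its result as a turn, and the next response sees it.

def mcp_turn(session: dict, tool_name: str, tool_args: dict, call_tool) -> dict:
    """Load -> call tool -> fold result into context -> ready to respond."""
    result = call_tool(tool_name, tool_args)        # e.g. a calendar lookup
    session["messages"].append(
        {"role": "tool", "name": tool_name, "content": result}
    )
    return session
```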
A2A: Agent Delegation
The really interesting part? Agent-to-Agent protocol for delegation. When you ask Atlas a complex question spanning strategy and execution, Atlas can delegate to Cipher (financial analysis) and Nova (technical feasibility) within a single conversation turn.
The stateless architecture makes this trivial:
- Atlas receives your question, sees it needs multi-agent input
- Spawns A2A requests to Cipher and Nova with current session context
- Each agent reconstructs state independently (parallel, fast)
- Results flow back to Atlas
- Atlas synthesizes and responds
- Single session update captures the entire multi-agent interaction
Traditional stateful architectures make this a nightmare of locks, transactions, and race conditions. With stateless reconstruction, it's just parallel data operations.
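The fan-out above really is just parallel awaits. A minimal sketch with stubbed agents (names and behavior illustrative):

```python
# A2A delegation as parallel data operations: each delegate works over its
# own reconstructed context, and Atlas only synthesizes the results.
import asyncio

async def delegate(agent: str, context: dict) -> str:
    """Each agent would reconstruct session state and answer independently."""
    await asyncio.sleep(0)                  # stands in for the agent's work
    return f"{agent}: analysis over {len(context['messages'])} messages"

async def atlas_turn(context: dict) -> str:
    # Fan out to Cipher and Nova concurrently, then synthesize.
    results = await asyncio.gather(
        delegate("Cipher", context),
        delegate("Nova", context),
    )
    return " | ".join(results)
```

No locks, no shared mutable session object: each delegate reads its own copy of the context, and the single write at the end captures the whole interaction.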
Native Audio: Voice Without Transcription
Here's where it gets wild. Native audio input means voice conversations skip transcription entirely. You speak, the model processes audio directly, and responses flow back.
But audio is big. Storing raw audio in session state would kill our architecture. Instead:
- Audio streams directly to the model (no intermediate storage)
- Only the semantic extraction gets added to session state
- User Dossier captures communication patterns from audio (pace, emphasis, emotion)
- Redis stays lean, Postgres gets rich conversational metadata
Voice feels continuous and natural, but architecturally, it's the same stateless reconstruction pattern.
The Intelligence Layer: Deterministic Backbone + Contextual State
Google ADK's Deterministic Backbone solves the "AI is unpredictable" problem. But determinism without context is just consistent mediocrity.
The breakthrough is combining them:
- Deterministic Backbone ensures reliability—Atlas doesn't hallucinate, Cipher doesn't make math errors
- Reconstructed state provides depth—every response draws on complete conversation context
- User Dossier adds personalization—responses match your communication style and goals
- Critic Agent validates quality—before any response ships, it's checked for relevance and accuracy
This is why conversations with the AI Board Room feel different. It's not just that the AI is smart—it's that it's smart about you and your situation, consistently, every single time.
The Outcome: Action Extraction and Task Bridges
Here's the payoff. All this architecture enables one critical feature: Action Extraction.
As you talk with your AI Board Room, the system identifies:
- Commitments you make ("I'll draft the pitch deck by Friday")
- Decisions you reach ("We're targeting B2B first")
- Tasks that emerge ("Need to research competitor pricing")
- Insights worth capturing ("Customer segment X values speed over features")
These get extracted into structured data, stored in both Redis (immediate access) and Postgres (long-term tracking). They become the bridge between conversation and execution.
Next time you open the AI Board Room, your session reconstructs with awareness of open actions. Atlas might open with: "You mentioned drafting that pitch deck—want to work on positioning?" Not because it's "remembering" in some mystical way, but because Action Extraction + session reconstruction makes context retrieval instant and automatic.
Why This Matters for Solo Founders
If you're building solo or with a small team, you can't afford enterprise complexity. You also can't afford amnesia AI that forgets your context.
This architecture gives you:
- Scalability without ops burden—stateless APIs scale horizontally, no session management
- Rich context without cost explosion—reconstruct on-demand, pay only for active conversations
- Personalization without privacy nightmares—your dossier stays in your session, isolated and secure
- Reliability without rigidity—deterministic responses that adapt to your evolving needs
You get the AI Board Room experience—Atlas, Cipher, Nova, and the team—that actually remembers your business, your goals, and your constraints. Not because of some architectural magic trick, but because we treat state as data and reconstruct it blazingly fast.
The Technical Bet: Context Reconstruction is Cheap, Stateful Servers are Expensive
We made a bet: context reconstruction would get faster and cheaper than stateful server management.
We were right. Redis operations are single-digit milliseconds. Postgres queries on indexed session data run in 50-80ms. Network latency dominates over computation. Meanwhile, managing stateful servers—session affinity, graceful shutdowns, state migration—consumes engineering time that could build features.
As models get better at processing context efficiently (and they are, rapidly), reconstruction costs drop. As infrastructure gets faster (and it does, continuously), latency shrinks. The economics favor stateless reconstruction over stateful servers, and the gap widens every quarter.
Call to Action: Experience Stateful Intelligence, Stateless Scale
This isn't just an architecture article. It's an invitation.
The AI Board Room at JobInterview.live is live, running this exact architecture. Talk to Atlas about your strategy. Work with Cipher on your financials. Brainstorm with Nova on product direction.
Notice how they remember your context. Notice how responses feel personalized. Notice how conversations flow naturally across sessions.
Then remember: it's all stateless reconstruction, Redis hot cache, and Postgres persistence. The best of both worlds, at scale, for founders who need AI that actually works.
Try the AI Board Room today. Your first conversation is free. Your context, always preserved.
Want to go deeper on AI architecture for startups? Follow JobInterview.live for weekly deep dives on building AI products that scale.