Career · Deep Dive

From Prototype to Production: What Nobody Tells You

The unglamorous work between "it works on my machine" and "it handles 10k users" — error budgets, observability, load testing, feature flags, and the mindset shift from builder to operator.

October 8, 2024 · 8 min read
Production · Engineering · Career · Best Practices

The Demo Worked Perfectly

Every prototype works. That's the trap. The demo runs on localhost, the database has twelve rows, the only user is you, and the network is a loopback interface. Then someone says "ship it," and you discover that the distance between a working demo and a production system is measured in sleepless nights.

At TwilightCore, we've crossed that gap enough times to map the terrain. Here's what we wish someone had told us before our first production deploy.

Error Handling Is an Architecture Decision

The prototype has try/catch blocks that log to the console. Production needs a coherent error handling strategy that spans the entire stack. This is where most teams accumulate tech debt fastest, because error handling feels like cleanup work rather than design work.

We use a Result type pattern inspired by Rust. Every function that can fail returns either a success or a typed error — no thrown exceptions crossing module boundaries.

lib/result.ts
type Success<T> = { ok: true; value: T };
type Failure<E> = { ok: false; error: E };
type Result<T, E = AppError> = Success<T> | Failure<E>;
 
function ok<T>(value: T): Success<T> {
  return { ok: true, value };
}
 
function err<E>(error: E): Failure<E> {
  return { ok: false, error };
}
 
// Domain errors are explicit, not stringly typed
type AppError =
  | { code: "NOT_FOUND"; resource: string; id: string }
  | { code: "VALIDATION"; field: string; message: string }
  | { code: "UNAUTHORIZED"; reason: string }
  | { code: "RATE_LIMITED"; retryAfter: number }
  | { code: "UPSTREAM_FAILURE"; service: string; statusCode: number };
 
export { ok, err, type Result, type AppError };
services/project.ts
async function getProject(
  id: string,
  organizationId: string
): Promise<Result<Project>> {
  const project = await db.query.projects.findFirst({
    where: eq(schema.projects.id, id),
  });

  if (!project) {
    return err({ code: "NOT_FOUND", resource: "project", id });
  }

  // Compare organization IDs, not a user ID against an org ID
  if (project.organizationId !== organizationId) {
    return err({
      code: "UNAUTHORIZED",
      reason: "Project belongs to another organization",
    });
  }

  return ok(project);
}

This pattern makes error states visible in the type system. You can't forget to handle them — the compiler won't let you access .value without checking .ok first.

Exceptions still happen

The Result pattern handles expected failures. Unexpected failures — out of memory, network partitions, corrupted state — still throw. We wrap top-level handlers in try/catch as a safety net, but the goal is to make those catches fire rarely and loudly.

Database Migrations Will Break Your Confidence

The prototype uses db push to sync schema changes. Production uses migrations, and migrations are where optimism goes to die.

The Rules We Learned the Hard Way

  1. Never rename a column in one step. Add the new column, backfill data, deploy code that reads both, drop the old column. Three deploys minimum.
  2. Every migration must be reversible. If you can't write the down migration, you don't understand the up well enough.
  3. Test migrations against a production-sized dataset. A migration that runs in 2ms on 100 rows can lock a table for 40 minutes on 2 million rows.
migrations/0024_add_project_slug.sql
-- Step 1: Add nullable column (safe, no lock)
ALTER TABLE projects ADD COLUMN slug TEXT;
 
-- Step 2: Backfill in batches (avoid long-running transactions)
-- Run this as a script, NOT in the migration file
-- UPDATE projects SET slug = generate_slug(title)
-- WHERE slug IS NULL
-- LIMIT 1000;
 
-- Step 3: After backfill is verified complete
-- ALTER TABLE projects ALTER COLUMN slug SET NOT NULL;
-- CREATE UNIQUE INDEX CONCURRENTLY idx_projects_slug ON projects(slug);
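The batched backfill in step 2 can be driven by a small script. Here is a sketch of the generic loop, with the actual UPDATE query abstracted behind a callback so the batching logic is visible on its own — the batch size, pause, and callback shape are illustrative assumptions, not our real tooling:

```typescript
// Batched backfill driver: keeps each transaction short by updating a
// bounded number of rows per iteration until no full batch remains.
// `updateBatch` stands in for a real "UPDATE ... WHERE slug IS NULL LIMIT n"
// query and must return the number of rows it updated.
async function backfillInBatches(
  updateBatch: (limit: number) => Promise<number>,
  batchSize = 1000,
  pauseMs = 100
): Promise<number> {
  let total = 0;
  while (true) {
    const updated = await updateBatch(batchSize);
    total += updated;
    if (updated < batchSize) break; // partial batch means we're done
    // Brief pause between batches so replication and other writers keep up.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return total;
}
```

With a real driver, the callback issues the UPDATE from the migration comments; the loop itself never holds a long-running transaction.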
| Risk Level | Example | Strategy |
| --- | --- | --- |
| Low | Add nullable column | Single migration, deploy freely |
| Medium | Add index on large table | Use CONCURRENTLY, run off-peak |
| High | Rename column | Multi-step expand/contract pattern |
| Critical | Change column type | New column + backfill + swap reads |
| Avoid | Drop column with no backup | Just don't. Add a deprecated_at flag |

Monitoring Is Not Logging

The prototype has console.log. Production needs observability — and those are fundamentally different things.

Logging tells you what happened. Monitoring tells you what's happening. Alerting tells you what needs attention. Most teams conflate all three and end up with noisy logs that nobody reads and alerts that everyone ignores.

Our Observability Stack

We settled on a three-layer approach:

  • Structured logging with correlation IDs that trace a request across services
  • Metrics for business-critical numbers (response times, error rates, queue depths)
  • Alerting only on actionable conditions — if the alert doesn't have a runbook, it shouldn't exist
lib/logger.ts
import { createId } from "@paralleldrive/cuid2";
 
type LogContext = {
  requestId: string;
  userId?: string;
  service: string;
  [key: string]: unknown;
};
 
function createLogger(baseContext: Partial<LogContext>) {
  const context: LogContext = {
    requestId: baseContext.requestId ?? createId(),
    service: baseContext.service ?? "unknown",
    ...baseContext,
  };
 
  return {
    info(message: string, data?: Record<string, unknown>) {
      emit("info", message, { ...context, ...data });
    },
    warn(message: string, data?: Record<string, unknown>) {
      emit("warn", message, { ...context, ...data });
    },
    error(message: string, error?: Error, data?: Record<string, unknown>) {
      emit("error", message, {
        ...context,
        ...data,
        error: error
          ? { message: error.message, stack: error.stack, name: error.name }
          : undefined,
      });
    },
  };
}
 
function emit(level: string, message: string, data: Record<string, unknown>) {
  const entry = JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    ...data,
  });
  process.stdout.write(entry + "\n");
}

Correlation IDs are non-negotiable

Without a correlation ID threading through every log entry for a single request, debugging production issues becomes archaeology. Generate the ID at the edge (middleware or API gateway) and propagate it through every function call, database query, and external service request.
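A sketch of generating the ID at the edge — the header name and the Express-style middleware shape are assumptions; the one non-negotiable part is that an inbound ID from an upstream proxy is reused rather than replaced:

```typescript
import { randomUUID } from "node:crypto";

const REQUEST_ID_HEADER = "x-request-id"; // assumed header name

type MinimalReq = {
  headers: Record<string, string | undefined>;
  requestId?: string;
};
type MinimalRes = { setHeader(name: string, value: string): void };

// Reuse an inbound correlation ID if an upstream proxy already set one;
// otherwise mint a fresh ID here, at the edge.
function getOrCreateRequestId(
  headers: Record<string, string | undefined>
): string {
  return headers[REQUEST_ID_HEADER] ?? randomUUID();
}

// Express-style middleware: attach the ID to the request for downstream
// handlers and echo it back to the client for support tickets.
function requestIdMiddleware(
  req: MinimalReq,
  res: MinimalRes,
  next: () => void
) {
  const requestId = getOrCreateRequestId(req.headers);
  req.requestId = requestId;
  res.setHeader(REQUEST_ID_HEADER, requestId);
  next();
}
```

Downstream code passes `req.requestId` into `createLogger`, so every entry for that request carries the same ID.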

Security Hardening Isn't a Checklist

The prototype trusts all input. Production trusts nothing. The gap between those two postures is where breaches live.

The Minimum Viable Security We Ship With

  • Input validation at the boundary. Every API endpoint validates with Zod schemas. No raw req.body access anywhere.
  • Rate limiting per endpoint, not globally. Login gets 5 attempts per minute. Search gets 30. File upload gets 3.
  • CSRF tokens on every state-changing request. Even API routes behind authentication.
  • Content Security Policy headers. Strict enough to block inline scripts, permissive enough to not break the app.
  • Dependency auditing in CI. npm audit runs on every PR. Critical vulnerabilities block merge.

Performance Optimization: Measure First, Always

The prototype is fast because it's empty. Production is slow because it's real. The instinct is to optimize everything — resist it. Profile first, optimize the actual bottleneck, measure again.

The Three Wins That Matter Most

After profiling dozens of production apps, the same three issues account for roughly 80% of performance problems we see:

  1. N+1 queries. The ORM fetches a list, then fetches related data one row at a time. Fix with eager loading or dataloaders.
  2. Missing database indexes. A sequential scan on a 500K-row table takes 400ms. Adding the right index drops it to 2ms.
  3. Unoptimized images and fonts. The browser downloads 4MB of assets before rendering anything useful. Fix with proper formats, lazy loading, and font subsetting.

Everything else — memoization, code splitting, edge caching — is optimization. These three are bug fixes.
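The N+1 shape is easiest to see in a toy sketch. Here an in-memory map stands in for the database, and a query counter makes the round trips visible — the naive loop issues one query per post, the batched version one query total (the moral equivalent of `WHERE post_id IN (...)` or a dataloader):

```typescript
// In-memory stand-in for a comments table, keyed by post ID.
const commentsByPostId = new Map<number, string[]>([
  [1, ["nice"]],
  [2, ["+1", "agreed"]],
]);

let queryCount = 0;

// N+1: one "query" per post — each loop iteration is a round trip.
async function commentsNaive(postIds: number[]): Promise<string[][]> {
  const out: string[][] = [];
  for (const id of postIds) {
    queryCount++;
    out.push(commentsByPostId.get(id) ?? []);
  }
  return out;
}

// Batched: a single "query" fetches comments for all posts at once.
async function commentsBatched(postIds: number[]): Promise<string[][]> {
  queryCount++;
  return postIds.map((id) => commentsByPostId.get(id) ?? []);
}
```

With a real ORM the fix is eager loading (`with`/`include`) or a dataloader that coalesces per-row lookups into one batch; the shape of the win is identical.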

Team Handoffs Are a System Design Problem

The prototype was built by one person who holds the entire system in their head. Production is maintained by a team, and the system must be legible to people who didn't build it.

What We Include in Every Project Handoff

  • Architecture Decision Records (ADRs) for every non-obvious choice. Not what we built, but why we built it that way and what alternatives we rejected.
  • A RUNBOOK.md covering common operational tasks: how to deploy, how to rollback, how to restart services, how to investigate common alerts.
  • Seed scripts that create a realistic local development environment in one command. Not 12 rows — thousands, with realistic relationships and edge cases.

The Production Readiness Checklist

After shipping enough projects, we formalized what "production ready" means at TwilightCore. A project doesn't ship until every item on this list has an owner and a status.

| Category | Item | Status Required |
| --- | --- | --- |
| Reliability | Error handling covers all API boundaries | Complete |
| Reliability | Health check endpoint exists and is monitored | Complete |
| Data | Migration strategy documented and tested | Complete |
| Data | Backup and restore procedure verified | Complete |
| Security | Input validation on all endpoints | Complete |
| Security | Rate limiting configured per route | Complete |
| Performance | Core Web Vitals passing on key pages | Complete |
| Performance | Database queries profiled under load | Complete |
| Operations | Structured logging with correlation IDs | Complete |
| Operations | Alerting configured with runbooks | Complete |
| Team | Architecture decisions documented | Complete |
| Team | Local dev setup runs in under 5 minutes | Complete |

The Real Gap

The distance from prototype to production isn't technical complexity — it's operational maturity. A prototype proves an idea works. Production proves it works reliably, securely, and maintainably, at scale, over time, when you're asleep.

Production readiness is a practice, not a phase

Treat production concerns as first-class requirements from day one, not as a hardening phase before launch. Error handling, monitoring, security, and documentation aren't polish — they're load-bearing structure. The teams that ship reliably are the ones that build these into their workflow, not the ones that bolt them on at the end.


TwilightCore Team

AI & Digital Studio

We build production AI systems and full-stack applications. Writing about the technical decisions, architecture patterns, and engineering practices behind real-world projects.