Career · Deep Dive

From Prototype to Production: What Nobody Tells You

The unglamorous work between "it works on my machine" and "it handles 10k users" — error budgets, observability, load testing, feature flags, and the mindset shift from builder to operator.

October 8, 2024 · 8 min read
Production · Engineering · Career · Best Practices

The Demo Worked Perfectly

Every prototype works. That's the trap. The demo runs on localhost, the database has twelve rows, the only user is you, and the network is a loopback interface. Then someone says "ship it," and you discover that the distance between a working demo and a production system is measured in sleepless nights.

At TwilightCore, we've crossed that gap enough times to map the terrain. Here's what we wish someone had told us before our first production deploy.

Error Handling Is an Architecture Decision

The prototype has try/catch blocks that log to the console. Production needs a coherent error handling strategy that spans the entire stack. This is where most teams accumulate tech debt fastest, because error handling feels like cleanup work rather than design work.

We use a Result type pattern inspired by Rust. Every function that can fail returns either a success or a typed error — no thrown exceptions crossing module boundaries.

lib/result.ts
type Success<T> = { ok: true; value: T };
type Failure<E> = { ok: false; error: E };
type Result<T, E = AppError> = Success<T> | Failure<E>;
 
function ok<T>(value: T): Success<T> {
  return { ok: true, value };
}
 
function err<E>(error: E): Failure<E> {
  return { ok: false, error };
}
 
// Domain errors are explicit, not stringly typed
type AppError =
  | { code: "NOT_FOUND"; resource: string; id: string }
  | { code: "VALIDATION"; field: string; message: string }
  | { code: "UNAUTHORIZED"; reason: string }
  | { code: "RATE_LIMITED"; retryAfter: number }
  | { code: "UPSTREAM_FAILURE"; service: string; statusCode: number };
 
export { ok, err, type Result, type AppError };
services/project.ts
async function getProject(
  id: string,
  organizationId: string
): Promise<Result<Project>> {
  const project = await db.query.projects.findFirst({
    where: eq(schema.projects.id, id),
  });

  if (!project) {
    return err({ code: "NOT_FOUND", resource: "project", id });
  }

  // Compare organization IDs, not a user ID against an org ID
  if (project.organizationId !== organizationId) {
    return err({
      code: "UNAUTHORIZED",
      reason: "Project belongs to another organization",
    });
  }

  return ok(project);
}

This pattern makes error states visible in the type system. You can't forget to handle them — the compiler won't let you access .value without checking .ok first.

Exceptions still happen

The Result pattern handles expected failures. Unexpected failures — out of memory, network partitions, corrupted state — still throw. We wrap top-level handlers in try/catch as a safety net, but the goal is to make those catches fire rarely and loudly.

Database Migrations Will Break Your Confidence

The prototype uses db push to sync schema changes. Production uses migrations, and migrations are where optimism goes to die.

The Rules We Learned the Hard Way

  1. Never rename a column in one step. Add the new column, backfill data, deploy code that reads both, drop the old column. Three deploys minimum.
  2. Every migration must be reversible. If you can't write the down migration, you don't understand the up well enough.
  3. Test migrations against a production-sized dataset. A migration that runs in 2ms on 100 rows can lock a table for 40 minutes on 2 million rows.
migrations/0024_add_project_slug.sql
-- Step 1: Add nullable column (safe, no lock)
ALTER TABLE projects ADD COLUMN slug TEXT;
 
-- Step 2: Backfill in batches (avoid long-running transactions)
-- Run this as a script, NOT in the migration file
-- UPDATE projects SET slug = generate_slug(title)
-- WHERE slug IS NULL
-- LIMIT 1000;
 
-- Step 3: After backfill is verified complete
-- ALTER TABLE projects ALTER COLUMN slug SET NOT NULL;
-- CREATE UNIQUE INDEX CONCURRENTLY idx_projects_slug ON projects(slug);
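The batched backfill in step 2 can be driven by a small script. Here is a sketch of the generic loop, with the actual UPDATE query abstracted behind a callback so the batching logic is visible on its own — the batch size, pause, and callback shape are illustrative assumptions, not our real tooling:

```typescript
// Batched backfill driver: keeps each transaction short by updating a
// bounded number of rows per iteration until no full batch remains.
// `updateBatch` stands in for a real "UPDATE ... WHERE slug IS NULL LIMIT n"
// query and must return the number of rows it updated.
async function backfillInBatches(
  updateBatch: (limit: number) => Promise<number>,
  batchSize = 1000,
  pauseMs = 100
): Promise<number> {
  let total = 0;
  while (true) {
    const updated = await updateBatch(batchSize);
    total += updated;
    if (updated < batchSize) break; // partial batch means we're done
    // Brief pause between batches so replication and other writers keep up.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return total;
}
```

With a real driver, the callback issues the UPDATE from the migration comments; the loop itself never holds a long-running transaction.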
| Risk Level | Example | Strategy |
| --- | --- | --- |
| Low | Add nullable column | Single migration, deploy freely |
| Medium | Add index on large table | Use CONCURRENTLY, run off-peak |
| High | Rename column | Multi-step expand/contract pattern |
| Critical | Change column type | New column + backfill + swap reads |
| Avoid | Drop column with no backup | Just don't. Add a deprecated_at flag |

Monitoring Is Not Logging

The prototype has console.log. Production needs observability — and those are fundamentally different things.

Logging tells you what happened. Monitoring tells you what's happening. Alerting tells you what needs attention. Most teams conflate all three and end up with noisy logs that nobody reads and alerts that everyone ignores.

Our Observability Stack

We settled on a three-layer approach:

  • Structured logging with correlation IDs that trace a request across services
  • Metrics for business-critical numbers (response times, error rates, queue depths)
  • Alerting only on actionable conditions — if the alert doesn't have a runbook, it shouldn't exist
lib/logger.ts
import { createId } from "@paralleldrive/cuid2";
 
type LogContext = {
  requestId: string;
  userId?: string;
  service: string;
  [key: string]: unknown;
};
 
function createLogger(baseContext: Partial<LogContext>) {
  const context: LogContext = {
    requestId: baseContext.requestId ?? createId(),
    service: baseContext.service ?? "unknown",
    ...baseContext,
  };
 
  return {
    info(message: string, data?: Record<string, unknown>) {
      emit("info", message, { ...context, ...data });
    },
    warn(message: string, data?: Record<string, unknown>) {
      emit("warn", message, { ...context, ...data });
    },
    error(message: string, error?: Error, data?: Record<string, unknown>) {
      emit("error", message, {
        ...context,
        ...data,
        error: error
          ? { message: error.message, stack: error.stack, name: error.name }
          : undefined,
      });
    },
  };
}
 
function emit(level: string, message: string, data: Record<string, unknown>) {
  const entry = JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    ...data,
  });
  process.stdout.write(entry + "\n");
}

Correlation IDs are non-negotiable

Without a correlation ID threading through every log entry for a single request, debugging production issues becomes archaeology. Generate the ID at the edge (middleware or API gateway) and propagate it through every function call, database query, and external service request.
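A sketch of generating the ID at the edge — the header name and the Express-style middleware shape are assumptions; the one non-negotiable part is that an inbound ID from an upstream proxy is reused rather than replaced:

```typescript
import { randomUUID } from "node:crypto";

const REQUEST_ID_HEADER = "x-request-id"; // assumed header name

type MinimalReq = {
  headers: Record<string, string | undefined>;
  requestId?: string;
};
type MinimalRes = { setHeader(name: string, value: string): void };

// Reuse an inbound correlation ID if an upstream proxy already set one;
// otherwise mint a fresh ID here, at the edge.
function getOrCreateRequestId(
  headers: Record<string, string | undefined>
): string {
  return headers[REQUEST_ID_HEADER] ?? randomUUID();
}

// Express-style middleware: attach the ID to the request for downstream
// handlers and echo it back to the client for support tickets.
function requestIdMiddleware(
  req: MinimalReq,
  res: MinimalRes,
  next: () => void
) {
  const requestId = getOrCreateRequestId(req.headers);
  req.requestId = requestId;
  res.setHeader(REQUEST_ID_HEADER, requestId);
  next();
}
```

Downstream code passes `req.requestId` into `createLogger`, so every entry for that request carries the same ID.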

Security Hardening Isn't a Checklist

The prototype trusts all input. Production trusts nothing. The gap between those two postures is where breaches live.

The Minimum Viable Security We Ship With

  • Input validation at the boundary. Every API endpoint validates with Zod schemas. No raw req.body access anywhere.
  • Rate limiting per endpoint, not globally. Login gets 5 attempts per minute. Search gets 30. File upload gets 3.
  • CSRF tokens on every state-changing request. Even API routes behind authentication.
  • Content Security Policy headers. Strict enough to block inline scripts, permissive enough to not break the app.
  • Dependency auditing in CI. npm audit runs on every PR. Critical vulnerabilities block merge.

Performance Optimization: Measure First, Always

The prototype is fast because it's empty. Production is slow because it's real. The instinct is to optimize everything — resist it. Profile first, optimize the actual bottleneck, measure again.

The Three Wins That Matter Most

After profiling dozens of production apps, the same three issues account for roughly 80% of performance problems we see:

  1. N+1 queries. The ORM fetches a list, then fetches related data one row at a time. Fix with eager loading or dataloaders.
  2. Missing database indexes. A sequential scan on a 500K-row table takes 400ms. Adding the right index drops it to 2ms.
  3. Unoptimized images and fonts. The browser downloads 4MB of assets before rendering anything useful. Fix with proper formats, lazy loading, and font subsetting.

Everything else — memoization, code splitting, edge caching — is optimization. These three are bug fixes.
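The N+1 shape is easiest to see in a toy sketch. Here an in-memory map stands in for the database, and a query counter makes the round trips visible — the naive loop issues one query per post, the batched version one query total (the moral equivalent of `WHERE post_id IN (...)` or a dataloader):

```typescript
// In-memory stand-in for a comments table, keyed by post ID.
const commentsByPostId = new Map<number, string[]>([
  [1, ["nice"]],
  [2, ["+1", "agreed"]],
]);

let queryCount = 0;

// N+1: one "query" per post — each loop iteration is a round trip.
async function commentsNaive(postIds: number[]): Promise<string[][]> {
  const out: string[][] = [];
  for (const id of postIds) {
    queryCount++;
    out.push(commentsByPostId.get(id) ?? []);
  }
  return out;
}

// Batched: a single "query" fetches comments for all posts at once.
async function commentsBatched(postIds: number[]): Promise<string[][]> {
  queryCount++;
  return postIds.map((id) => commentsByPostId.get(id) ?? []);
}
```

With a real ORM the fix is eager loading (`with`/`include`) or a dataloader that coalesces per-row lookups into one batch; the shape of the win is identical.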

Team Handoffs Are a System Design Problem

The prototype was built by one person who holds the entire system in their head. Production is maintained by a team, and the system must be legible to people who didn't build it.

What We Include in Every Project Handoff

  • Architecture Decision Records (ADRs) for every non-obvious choice. Not what we built, but why we built it that way and what alternatives we rejected.
  • A RUNBOOK.md covering common operational tasks: how to deploy, how to rollback, how to restart services, how to investigate common alerts.
  • Seed scripts that create a realistic local development environment in one command. Not 12 rows — thousands, with realistic relationships and edge cases.

The Production Readiness Checklist

After shipping enough projects, we formalized what "production ready" means at TwilightCore. A project doesn't ship until every item on this list has an owner and a status.

| Category | Item | Status Required |
| --- | --- | --- |
| Reliability | Error handling covers all API boundaries | Complete |
| Reliability | Health check endpoint exists and is monitored | Complete |
| Data | Migration strategy documented and tested | Complete |
| Data | Backup and restore procedure verified | Complete |
| Security | Input validation on all endpoints | Complete |
| Security | Rate limiting configured per route | Complete |
| Performance | Core Web Vitals passing on key pages | Complete |
| Performance | Database queries profiled under load | Complete |
| Operations | Structured logging with correlation IDs | Complete |
| Operations | Alerting configured with runbooks | Complete |
| Team | Architecture decisions documented | Complete |
| Team | Local dev setup runs in under 5 minutes | Complete |

The Real Gap

The distance from prototype to production isn't technical complexity — it's operational maturity. A prototype proves an idea works. Production proves it works reliably, securely, and maintainably, at scale, over time, when you're asleep.

Production readiness is a practice, not a phase

Treat production concerns as first-class requirements from day one, not as a hardening phase before launch. Error handling, monitoring, security, and documentation aren't polish — they're load-bearing structure. The teams that ship reliably are the ones that build these into their workflow, not the ones that bolt them on at the end.


TwilightCore Team

AI & Digital Studio

We build production AI systems and full-stack applications. Writing about the technical decisions, architecture patterns, and engineering practices behind real-world projects.