
Vector Databases Compared: pgvector vs Pinecone vs Qdrant

A practical benchmark of three vector databases across latency, recall, cost, and operational complexity — tested with real production workloads, not synthetic benchmarks.

July 22, 2025 · 11 min read
Vector DB · pgvector · Pinecone · Benchmarks

Why This Comparison Exists

Choosing a vector database is one of those decisions that feels reversible but is not. Migration costs are real — re-indexing millions of embeddings, rewriting query logic, updating infrastructure. At TwilightCore, we have deployed all three of these solutions across different projects, and the right choice depends entirely on your constraints.

This is not a feature checklist. It is an honest account of what we have experienced running these systems in production with real workloads.

The Contenders

pgvector is a PostgreSQL extension that adds vector similarity search to your existing Postgres database. It is the "no new infrastructure" option.

Pinecone is a fully managed vector database service. You get an API, a dashboard, and zero operational burden.

Qdrant is an open-source vector search engine written in Rust. You can self-host it or use their managed cloud offering.

Benchmark Results

We ran benchmarks on a standardized workload: 1 million vectors at 1536 dimensions (OpenAI text-embedding-3-small output size), measured on comparable hardware.

| Metric | pgvector (HNSW) | Pinecone (s1) | Qdrant (self-hosted) |
| --- | --- | --- | --- |
| Index build time | 47 min | N/A (managed) | 12 min |
| Query latency P50 | 8 ms | 22 ms | 4 ms |
| Query latency P99 | 45 ms | 89 ms | 18 ms |
| Recall@10 (ef=128) | 0.97 | 0.98 | 0.99 |
| Memory usage | 4.2 GB | N/A | 3.1 GB |
| Throughput (QPS) | 850 | 400 | 2,200 |
| Filtered query P50 | 15 ms | 28 ms | 6 ms |

Benchmarks Are Context-Dependent

These numbers reflect our specific workload, hardware, and tuning. Pinecone's latency includes network round-trip since it is a remote service — for a fair comparison of raw engine performance, subtract ~15ms. Your results will vary based on dimensionality, index parameters, and query patterns.

pgvector: When Your Database Is Enough

Setup

pgvector_setup.sql
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;
 
-- Create a table with a vector column
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding vector(1536),
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT now()
);
 
-- HNSW index — this is where the magic happens
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
 
-- Composite index for filtered queries
CREATE INDEX ON documents (created_at);
CREATE INDEX ON documents USING gin (metadata jsonb_path_ops);
 
-- Query with metadata filtering
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE metadata @> '{"category": "engineering"}'
  AND created_at > now() - INTERVAL '30 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;

Strengths

pgvector's killer advantage is no new infrastructure. If you already run Postgres — and you almost certainly do — you can add vector search without introducing a new database, new backups, new monitoring, or new failure modes. Your vectors live alongside your relational data, which means joins are trivial and transactional consistency is free.

Where It Struggles

pgvector starts to strain above 5-10 million vectors on a single instance. HNSW index builds are slow and memory-intensive. There is no built-in sharding — if you outgrow a single machine, you are on your own with Citus or manual partitioning.

Pinecone: Maximum Convenience

Setup

pinecone_setup.py
import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])  # avoid hardcoding keys
 
# Create a serverless index
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
 
index = pc.Index("documents")
 
# Upsert vectors with metadata; "embedding" is a 1536-dim
# list produced by your embedding model
index.upsert(
    vectors=[
        {
            "id": "doc-001",
            "values": embedding,
            "metadata": {
                "category": "engineering",
                "author": "twilightcore-team",
                "word_count": 1500,
            },
        }
    ],
    namespace="blog-posts",
)
 
# Query with metadata filtering ("query_embedding" comes from the same model)
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={
        "category": {"$eq": "engineering"},
        "word_count": {"$gte": 500},
    },
    include_metadata=True,
    namespace="blog-posts",
)

Strengths

Pinecone is genuinely zero-ops. No capacity planning, no index tuning, no infrastructure management. Their serverless offering scales automatically and you pay per query. For teams without dedicated infrastructure engineers, this is significant.

Namespaces are elegant for multi-tenancy — each customer's vectors are isolated without managing separate indexes.

Where It Struggles

Cost scales faster than alternatives. At 10 million vectors with moderate query volume, we were paying roughly 4x what equivalent self-hosted Qdrant cost us. Network latency is unavoidable — every query is an HTTP round-trip. And vendor lock-in is real; there is no standard vector database migration format.

Qdrant: The Performance Champion

Strengths

Qdrant consistently delivers the best raw performance in our benchmarks. Its Rust implementation and HNSW optimizations produce remarkable throughput. The filtering system is particularly strong — it applies filters during the HNSW traversal rather than post-query, which means filtered queries are nearly as fast as unfiltered ones.

The API is well-designed, with first-class support for batch operations, named vectors (multiple vector types per point), and payload indexing.

Where It Struggles

Self-hosting means you own the operational burden: backups, monitoring, scaling, upgrades. Their managed cloud offering mitigates this but is still newer and less battle-tested than Pinecone's. Documentation, while improving, has gaps in advanced clustering configurations.

Cost Analysis at Scale

Cost is often the deciding factor. Here is what we have seen at the 5 million vector scale with moderate query loads (roughly 100 QPS average).

| Cost Component | pgvector | Pinecone Serverless | Qdrant (self-hosted) | Qdrant Cloud |
| --- | --- | --- | --- | --- |
| Compute/hosting | $0 (existing DB)* | Included | ~$200/mo (dedicated) | ~$150/mo |
| Storage | ~$10/mo | ~$75/mo | ~$15/mo | ~$40/mo |
| Query costs | $0 | ~$120/mo at 100 QPS | $0 | Included |
| Operational overhead | Low | None | Medium-High | Low |
| Estimated monthly total | ~$10 | ~$195 | ~$215 | ~$190 |

*pgvector's compute cost is effectively zero if your Postgres instance has headroom. If you need to upgrade your instance to handle vector workloads, factor in that cost.
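
The line items above sum to the estimated totals; a trivial sketch makes the arithmetic explicit (the figures are this article's rough estimates, not price-list values):

```python
# Monthly cost estimates (USD/mo) at the 5M-vector, ~100 QPS scale,
# taken from the table above. These are the article's rough numbers.
costs = {
    "pgvector": {"compute": 0, "storage": 10, "queries": 0},
    "pinecone_serverless": {"compute": 0, "storage": 75, "queries": 120},
    "qdrant_self_hosted": {"compute": 200, "storage": 15, "queries": 0},
    "qdrant_cloud": {"compute": 150, "storage": 40, "queries": 0},
}

# Sum each option's line items into a monthly total
totals = {name: sum(items.values()) for name, items in costs.items()}
```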

The Hidden Cost: Engineering Time

These numbers do not capture engineering time. Setting up monitoring, writing migration scripts, debugging index performance — these hours add up. Pinecone's higher dollar cost often pays for itself in reduced engineering overhead, especially for smaller teams.

Decision Framework

After deploying all three, we have settled on a simple framework:

Choose pgvector when:

  • You have fewer than 5 million vectors
  • Your queries always combine vector search with relational filters
  • You want zero additional infrastructure
  • Latency under 50ms is acceptable

Choose Pinecone when:

  • Your team has no dedicated infrastructure capacity
  • You need multi-tenant isolation (namespaces)
  • You are willing to pay more for operational simplicity
  • You need to scale past 10 million vectors without planning

Choose Qdrant when:

  • Raw query performance is critical
  • You have the engineering capacity for self-hosting (or use their cloud)
  • You need advanced features like named vectors or complex filtering
  • Cost at scale is a primary concern
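
The bullets above can be condensed into a first-pass heuristic. This is a deliberately crude sketch of the framework, not a substitute for benchmarking your own workload; the thresholds come straight from the lists above.

```python
def pick_vector_db(
    n_vectors: int,
    needs_relational_joins: bool,
    has_infra_capacity: bool,
    latency_budget_ms: float,
) -> str:
    """First-pass heuristic mirroring the decision bullets above."""
    # pgvector: under ~5M vectors, and either the queries lean on
    # relational joins or a <50ms latency budget is not required
    if n_vectors < 5_000_000 and (needs_relational_joins or latency_budget_ms >= 50):
        return "pgvector"
    # Pinecone: no infrastructure capacity, pay for operational simplicity
    if not has_infra_capacity:
        return "pinecone"
    # Qdrant: raw performance and cost at scale, with ops capacity to match
    return "qdrant"
```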

Migration Strategies

When you do need to migrate — and we have done this twice — the process is:

  1. Dual-write first. Write new vectors to both old and new systems. Do not try a big-bang migration.
  2. Backfill in batches. Re-embed and insert historical data in chunks of 10,000. Rate-limit to avoid overwhelming either system.
  3. Shadow-read. Query both systems and compare results. Log discrepancies. We ran shadow reads for two weeks before cutting over.
  4. Gradual cutover. Route 10% of read traffic to the new system, then 50%, then 100%. Monitor recall and latency at each stage.
  5. Decommission after a soak period. Keep the old system running for 30 days after full cutover, just in case.
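
The dual-write, shadow-read, and gradual-cutover steps can be sketched as a thin wrapper around two store clients. The `InMemoryStore` stand-in and the top-k overlap metric here are illustrative, not any specific library's API:

```python
import random


class InMemoryStore:
    """Stand-in for a real vector store client (pgvector, Pinecone, Qdrant)."""

    def __init__(self):
        self.vectors = {}

    def upsert(self, doc_id, vector):
        self.vectors[doc_id] = vector

    def query(self, vector, top_k=10):
        # Brute-force dot-product ranking, purely for illustration
        def score(doc_id):
            return sum(a * b for a, b in zip(vector, self.vectors[doc_id]))

        return sorted(self.vectors, key=score, reverse=True)[:top_k]


class MigratingStore:
    """Wraps an old and a new store during a migration."""

    def __init__(self, old, new, read_from_new_pct=0.0):
        self.old, self.new = old, new
        self.read_from_new_pct = read_from_new_pct
        self.overlaps = []  # shadow-read overlap log

    def upsert(self, doc_id, vector):
        # Step 1: dual-write every new vector to both systems
        self.old.upsert(doc_id, vector)
        self.new.upsert(doc_id, vector)

    def query(self, vector, top_k=10):
        # Step 3: shadow-read both systems and log top-k overlap
        old_ids = self.old.query(vector, top_k)
        new_ids = self.new.query(vector, top_k)
        self.overlaps.append(len(set(old_ids) & set(new_ids)) / top_k)
        # Step 4: route a growing fraction of reads to the new system
        if random.random() < self.read_from_new_pct:
            return new_ids
        return old_ids
```

Raising `read_from_new_pct` from 0.1 to 0.5 to 1.0 implements the gradual cutover, while the overlap log gives you the discrepancy signal to watch at each stage.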

What We Use Today

For most projects, we start with pgvector. It eliminates an entire category of operational complexity, and for datasets under a few million vectors with moderate query loads, it performs admirably. When a project outgrows it — and we have a clear signal that it has, not a premature optimization hunch — we migrate to Qdrant.

We reserve Pinecone for client projects where the client's team will own operations after handoff and does not have infrastructure expertise. The managed experience is worth the premium in those cases.

Start Boring, Scale Intentionally

The best vector database is the one that matches your current scale and team capacity. pgvector is not glamorous, but it has saved us from managing additional infrastructure on a dozen projects. When you genuinely need more — and benchmarks on your actual workload prove it — migrate deliberately with dual-writes and shadow reads. The vector database landscape is evolving fast; avoid locking in prematurely.


TwilightCore Team

AI & Digital Studio

We build production AI systems and full-stack applications. Writing about the technical decisions, architecture patterns, and engineering practices behind real-world projects.