LAB — EXPERIMENTS

Experiments & Research

Where we push boundaries with LLMs, multi-agent systems, RAG architectures, and computer vision. Not every experiment ships — but every one teaches.

10 Total
8 Live
2 Research
All Experiments
LLM
Live

Local LLM Code Reviewer

Automated code review powered by locally-hosted LLMs. Privacy-first — zero data leaves your machine.

Models
Qwen 2.5 72B, Llama 3.3 70B, CodeLlama
Results
85% bug detection rate, 2.9s avg response, 0 API cost
Ollama, Python, AST, FastAPI, Tree-sitter
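
As a rough illustration of how AST parsing could feed a locally hosted model, the sketch below uses Python's standard `ast` module to split a source file into per-function snippets that could each be sent for review. The helper name `extract_functions` and the sample input are hypothetical and not taken from the project; the actual pipeline, which also lists Tree-sitter and Ollama, may work differently.

```python
import ast


def extract_functions(source: str) -> list[dict]:
    """Return one record per function definition found in `source`."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append({
                "name": node.name,
                "lineno": node.lineno,
                # get_source_segment recovers the exact text of the node
                "code": ast.get_source_segment(source, node),
            })
    # ast.walk is breadth-first, so sort by line to follow file order
    return sorted(records, key=lambda r: r["lineno"])


SAMPLE = '''
def add(a, b):
    return a + b

def div(a, b):
    return a / b
'''

functions = extract_functions(SAMPLE)
```

Each record carries the function name, its starting line, and its source text, which is enough to build a focused review prompt per function rather than sending a whole file at once.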
Agent
Live

Voice-to-SQL Agent

Speak a query in natural language and get SQL results in about a second. Function-calling agents translate intent into queries.

Models
GPT-4o, Whisper Large-v3
Results
94% query accuracy, 1.2s voice-to-result, Schema-aware validation
Whisper, FastAPI, SQLite, React, LangChain
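
One way schema-aware validation could work with SQLite is sketched below: the generated SQL is compiled with `EXPLAIN` so that unknown tables or columns are caught before anything runs. The function `validate_query`, the single-SELECT restriction, and the `orders` table are assumptions made for illustration, not details confirmed by the project.

```python
import sqlite3


def validate_query(conn: sqlite3.Connection, sql: str) -> tuple[bool, str]:
    """Check a generated query against the live schema without running it."""
    stripped = sql.strip().rstrip(";")
    # Assumption: only a single read-only SELECT statement is allowed.
    if not stripped.lower().startswith("select") or ";" in stripped:
        return False, "only a single SELECT statement is allowed"
    try:
        # EXPLAIN compiles the statement, so unknown tables or columns
        # raise an error, but the query itself is not executed.
        conn.execute(f"EXPLAIN {stripped}")
    except sqlite3.Error as exc:
        return False, str(exc)
    return True, "ok"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")

good = validate_query(conn, "SELECT id, total FROM orders WHERE total > 10")
bad_column = validate_query(conn, "SELECT price FROM orders")
not_select = validate_query(conn, "DROP TABLE orders")
```

The error message returned for a rejected query can be fed back to the agent as a correction hint. The semicolon check is deliberately crude and would also reject a legitimate string literal containing `;`.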
RAG
Live

RAG Evaluation Harness

Automated benchmark suite comparing chunking strategies, embedding models, and retrieval methods across 12 metrics.

Models
Claude Sonnet, GPT-4o, all-MiniLM-L6
Results
12 eval metrics, 6 chunking strategies, 4 embedding models compared
Python, DeepEval, RAGAS, pgvector, Streamlit
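
To make the comparison concrete, here is a minimal sketch of one chunking strategy (fixed-size windows with overlap) and one retrieval metric (recall@k). Both are generic textbook versions; the page does not say which six strategies or twelve metrics the harness uses, so the function names and parameters here are illustrative assumptions.

```python
def chunk_fixed(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant ids that appear among the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


chunks = chunk_fixed("abcdefghij", size=4, overlap=2)
score = recall_at_k(["d1", "d7", "d3"], relevant={"d3", "d9"}, k=3)
```

A harness would run each chunking strategy through the same retriever and compare metrics such as this one across identical question sets.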
Agent
Live

Multi-Agent Debate System

Three AI agents argue opposing perspectives on any topic, then a judge agent synthesizes the strongest arguments.

Models
Claude Sonnet, GPT-4o, Gemini 2.5 Pro
Results
3 debate rounds, Real-time streaming, Cross-model comparison
LangGraph, Next.js, Redis, WebSockets
Vision
Live

Document Vision Parser

Extract structured data from receipts, invoices, and forms using vision models — no OCR templates required.

Models
GPT-4o Vision, Claude Vision
Results
96% field accuracy, 1.8s processing, Zero template setup
GPT-4o Vision, FastAPI, Pydantic, React, Canvas API
Agent
Research

Agent Memory Architecture

Persistent memory layer for AI agents — episodic, semantic, and procedural memory with forgetting curves.

Models
Claude Sonnet, all-MiniLM-L6
Results
3 memory types, Ebbinghaus forgetting curves, Graph-based retrieval
Python, pgvector, NetworkX, LangGraph, Neo4j
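
The forgetting-curve idea can be sketched with the classic Ebbinghaus-style exponential form R = exp(-t / S), where t is the time since the memory was last used and S is its stability. The `Memory` class, the reinforcement rule in `recall`, and the pruning threshold below are assumptions for illustration; the page does not describe how the experiment actually parameterizes decay.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    stability_hours: float  # larger value = slower forgetting
    age_hours: float = 0.0

    def retention(self) -> float:
        """Ebbinghaus-style exponential decay: R = exp(-t / S)."""
        return math.exp(-self.age_hours / self.stability_hours)

    def recall(self, boost: float = 2.0) -> None:
        """Assumed reinforcement rule: recalling resets age and raises stability."""
        self.age_hours = 0.0
        self.stability_hours *= boost


def prune(memories: list[Memory], threshold: float = 0.2) -> list[Memory]:
    """Keep only memories whose retention is still at or above the threshold."""
    return [m for m in memories if m.retention() >= threshold]


memories = [
    Memory("user prefers dark mode", stability_hours=24.0, age_hours=12.0),
    Memory("one-off typo correction", stability_hours=2.0, age_hours=12.0),
]
kept = prune(memories)
```

With these numbers the stable preference survives pruning while the short-lived correction decays below the threshold and is dropped.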
LLM
Live

Real-time Sentiment Stream

Live sentiment analysis on streaming text — Twitter feeds, chat messages, or support tickets with sub-100ms latency.

Models
DistilBERT (fine-tuned), RoBERTa
Results
94% accuracy, <100ms latency, 10k msgs/sec throughput
DistilBERT, Kafka, FastAPI, Next.js, ClickHouse
RAG
Live

PDF → Knowledge Graph

Upload any PDF and watch it transform into an interactive knowledge graph with entity relationships and citations.

Models
Claude Sonnet, GPT-4o
Results
Entity extraction F1: 0.89, Interactive graph viz, Citation linking
Neo4j, LlamaParse, D3.js, Next.js, FastAPI
Automation
Live

Browser Automation Agent

Give it a goal in plain English — it navigates websites, fills forms, extracts data, and completes tasks autonomously.

Models
GPT-4o Vision, Claude Sonnet
Results
78% task completion, Multi-step navigation, Visual DOM understanding
Playwright, GPT-4o Vision, Python, LangGraph, Redis
LLM
Research

Prompt Optimization Lab

Automated prompt engineering — evolves prompts using genetic algorithms and A/B testing against eval suites.

Models
Claude Sonnet, GPT-4o, Gemini 2.5 Pro
Results
15% avg accuracy lift, Genetic algorithm evolution, Statistical significance testing
Python, DSPy, DeepEval, Streamlit, Redis
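
A toy version of the genetic-algorithm loop might look like the sketch below. Prompts are modeled as tuples of instruction fragments, and `fitness` is a hypothetical stand-in for scoring a prompt against an eval suite; none of the names, fragments, or hyperparameters come from the project itself.

```python
import random

FRAGMENTS = [
    "Think step by step.",
    "Answer concisely.",
    "Cite your sources.",
    "Return valid JSON.",
    "Ask for clarification if unsure.",
]
# Hypothetical stand-in for an eval suite: these fragments count as "helpful".
HELPFUL = {"Think step by step.", "Return valid JSON."}


def fitness(prompt: tuple) -> int:
    """Toy score: number of distinct helpful fragments in the prompt."""
    return len(set(prompt) & HELPFUL)


def crossover(a: tuple, b: tuple, rng: random.Random) -> tuple:
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randint(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def mutate(prompt: tuple, rng: random.Random, rate: float = 0.3) -> tuple:
    """Replace each fragment with a random one with probability `rate`."""
    return tuple(rng.choice(FRAGMENTS) if rng.random() < rate else f for f in prompt)


def evolve(generations: int = 30, pop_size: int = 20, length: int = 3, seed: int = 0) -> tuple:
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(FRAGMENTS) for _ in range(length)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 4]  # top quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return max(population, key=fitness)


best = evolve()
```

In a real setup the fitness call would be the expensive step, since each candidate prompt has to be run against the eval suite, so caching scores and keeping the population small matter more than they do in this toy.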