Self-hosting wins on paper and loses in practice for 90% of teams, and I'm tired of watching founders burn six months finding that out the hard way.
Here's what actually happens: you stand up vLLM or TGI, get excited about your cost projections at scale, and then spend the next quarter fighting CUDA driver hell, GPU memory fragmentation during load spikes, model-loading latency that makes your p99s embarrassing, and the creeping realization that your "equivalent" open-source model produces measurably worse outputs for your specific use case. And the infrastructure isn't even the hardest part. Open-source models require real fine-tuning investment to match frontier quality on domain-specific tasks, and that fine-tuning pipeline becomes a second product you now have to maintain.
The calculus flips in exactly three situations: you're processing sensitive data that legally cannot leave your infrastructure, you're at a scale where API costs genuinely exceed $50K/month and you have an ML platform team to absorb the ops burden, or your use case is narrow enough that a smaller fine-tuned model demonstrably outperforms the frontier. Outside those three, you're trading real engineering hours for theoretical cost savings you won't actually realize until year two at the earliest.
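The three exceptions above amount to a short decision rule. Here is a minimal sketch of it in Python. The `Situation` fields and the `should_consider_self_hosting` name are hypothetical, the $50K/month line is the rough threshold from this post rather than a universal constant, and "fine-tuned beats frontier" is assumed to come from your own evals, not from a hunch.

```python
from dataclasses import dataclass

# Rough threshold from the post; tune it to your own economics.
SPEND_THRESHOLD_USD = 50_000


@dataclass
class Situation:
    data_must_stay_on_prem: bool    # legal constraint: data cannot leave your infrastructure
    monthly_api_spend_usd: float    # current (or credibly projected) API bill
    has_ml_platform_team: bool      # a dedicated team to absorb the ops burden
    finetuned_beats_frontier: bool  # measured on your own evals, not assumed


def should_consider_self_hosting(s: Situation) -> tuple[bool, str]:
    """Return (decision, reason) for the three cases where self-hosting can pay off."""
    if s.data_must_stay_on_prem:
        return True, "data residency"
    # Spend alone is not enough: someone has to own the GPUs.
    if s.monthly_api_spend_usd > SPEND_THRESHOLD_USD and s.has_ml_platform_team:
        return True, "scale with a platform team"
    if s.finetuned_beats_frontier:
        return True, "narrow task where a fine-tuned model wins"
    return False, "use the APIs"


# A team spending $80K/month with no platform team still lands on "use the APIs".
print(should_consider_self_hosting(Situation(False, 80_000, False, False)))
```

Note that high spend only flips the decision when it is paired with a team to run the infrastructure, which mirrors the post's second condition.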
The vendor lock-in fear is real but overblown right now. The actual lock-in risk is prompt engineering and evaluation infrastructure that couples you to a specific model's behavior, and that happens whether you self-host or use APIs. Abstract your model calls behind a thin interface layer, build evals that are model-agnostic, and you can migrate in days, not months. In 2026, the frontier is moving so fast that the team using Claude or GPT-5 is shipping features while the self-hosting team is debugging tokenizer compatibility issues after an upstream update. Use the APIs, invest the saved engineering time in your actual product, and revisit self-hosting when you have the scale and the dedicated team to justify it.
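A thin interface plus model-agnostic evals can be as small as the sketch below. Everything here is hypothetical: `ChatModel`, `StubModel`, `EvalCase` and `run_eval` are illustrative names, and the stub stands in for a real adapter that would translate `complete()` into whichever vendor SDK or self-hosted server you actually use. The key design choice is that eval checks assert on behavior (does the answer contain the right fact?) rather than on one model's exact wording, so the same suite scores any backend.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class ChatModel(Protocol):
    """Hypothetical thin interface: the only model surface product code touches."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class StubModel:
    """Stand-in for a real adapter (hosted API or self-hosted server).
    A real adapter would call the vendor SDK inside complete()."""
    name: str
    responses: dict[str, str]

    def complete(self, prompt: str) -> str:
        return self.responses.get(prompt, "")


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # checks behavior, not one model's exact phrasing


def run_eval(model: ChatModel, cases: list[EvalCase]) -> float:
    """Return the fraction of eval cases the model passes."""
    if not cases:
        return 0.0
    passed = sum(1 for c in cases if c.check(model.complete(c.prompt)))
    return passed / len(cases)


cases = [
    EvalCase("Capital of France?", lambda out: "paris" in out.lower()),
    EvalCase("2 + 2 = ?", lambda out: "4" in out),
]
model_a = StubModel("provider-a", {"Capital of France?": "Paris.", "2 + 2 = ?": "4"})
model_b = StubModel("provider-b", {"Capital of France?": "It is Paris", "2 + 2 = ?": "five"})

for m in (model_a, model_b):
    print(m.name, run_eval(m, cases))  # provider-a 1.0, then provider-b 0.5
```

Migrating then means writing one new adapter and rerunning the same suite, instead of rewriting call sites and re-deriving what "good output" means.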