
Best LiteLLM Alternatives for Production LLM Routing (2026)

By Kylan Gibbs, CEO and Co-founder, Inworld AI
Last updated: April 2026
LiteLLM is an open-source Python proxy that provides a unified API for 100+ LLM providers. It became the default routing layer for many AI teams because it was free, flexible, and easy to set up.

In March 2026, a supply chain attack compromised LiteLLM versions 1.82.7 and 1.82.8 on PyPI, injecting credential-stealing malware that harvested SSH keys, cloud credentials, and Kubernetes secrets from affected machines. LiteLLM has since released a clean version (v1.83.0) and engaged Mandiant for a forensic review.

Inworld AI's Realtime Router is one of the managed alternatives many teams have evaluated since: fully managed infrastructure with no self-hosted attack surface, plus conditional routing, native A/B testing, and direct integration with a full voice pipeline.
The incident accelerated a question many production teams were already asking: should you self-host your LLM routing layer, or use managed infrastructure where supply chain security is handled for you?
This guide compares five LiteLLM alternatives for teams evaluating their options in 2026, with attention to security posture, routing intelligence, and production readiness.

Quick Comparison: LiteLLM Alternatives

Platform | Type | Models | Routing Intelligence | Self-Hosted Risk | Best For
Realtime Router | Managed intelligent router + gateway | Hundreds | Conditional routing (CEL), A/B testing, user/context-aware | None (fully managed) | Production teams needing intelligent routing, voice integration, and managed security
Portkey | AI gateway + LLMOps | Largest catalog with multimodal | Conditional, guardrails, compliance rules | Optional (cloud or self-hosted) | Teams prioritizing compliance, governance, and observability
OpenRouter | Managed marketplace proxy | Broadest catalog | Availability-based; manual model selection | None (fully managed) | Developers prototyping across many models
Vercel AI Gateway | Managed proxy + fallback | Major frontier providers | Static fallback; uptime/latency-based | None (Vercel-managed) | Teams already deployed on Vercel
Bifrost | Open-source gateway (Go) | Multi-provider | Load balancing, adaptive failover | Yes (self-hosted; same supply chain risk class as LiteLLM) | Teams wanting open-source performance without Python limitations

What Happened to LiteLLM: The Supply Chain Attack

In March 2026, a hacking group published two malicious versions of LiteLLM (1.82.7 and 1.82.8) directly to PyPI, bypassing the project's normal GitHub release process. The compromised package contained a credential-stealing payload that harvested SSH keys, cloud provider credentials, Kubernetes configuration files, cryptocurrency wallets, and API keys.
The attack was discovered when a researcher's machine shut down after downloading the compromised package; a bug in the malware code caused the crash that led to its detection. At the time, LiteLLM displayed SOC2 and ISO 27001 certifications obtained through Delve, a compliance startup whose attestation practices have separately drawn scrutiny.
LiteLLM released a clean version (v1.83.0) with a rebuilt CI/CD pipeline featuring isolated environments and stronger security gates. The incident was contained relatively quickly, but it exposed a structural risk: any self-hosted open-source tool with PyPI distribution is vulnerable to this class of supply chain attack.

What the incident means for LLM routing architecture

The LiteLLM compromise was not about bad code. LiteLLM's routing logic worked as intended. The vulnerability was in the distribution and dependency chain, a risk that applies to any self-hosted open-source infrastructure. Teams evaluating LLM routing alternatives should consider three factors the incident highlighted:
  • Distribution trust: self-hosted tools depend on package registries (PyPI, npm) where a compromised CI/CD pipeline can inject malware into legitimate packages. Managed services eliminate this vector by handling deployment internally.
  • Dependency surface area: LiteLLM had thousands of dependencies. Each dependency is an attack surface. Managed routing services control their own dependency chains and are accountable for securing them.
  • Credential exposure: self-hosted proxies hold API keys for every model provider in their configuration. A compromised proxy can harvest all of them simultaneously. Managed services store credentials in isolated, audited infrastructure.
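One practical mitigation for the distribution-trust problem is hash pinning: pip's --require-hashes mode refuses to install any artifact whose digest does not match a pinned value. Below is a minimal Python sketch of the same idea, verifying a downloaded wheel against a known-good digest before installation; the digest and filename are placeholders, not real values for any LiteLLM release.

# Minimal sketch: verify a downloaded package artifact against a known-good
# SHA-256 digest before installing it. The digest below is a placeholder.
import hashlib

KNOWN_GOOD_SHA256 = "<sha256-digest-from-a-trusted-source>"  # placeholder

def verify_artifact(path: str) -> bool:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == KNOWN_GOOD_SHA256

if not verify_artifact("litellm-1.83.0-py3-none-any.whl"):  # placeholder filename
    raise SystemExit("hash mismatch: do not install this artifact")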

1. Realtime Router: Intelligent Routing with Managed Security

Realtime Router is a managed LLM routing service that provides a single API endpoint for hundreds of models from OpenAI, Anthropic, Google, Mistral, DeepSeek, xAI, Meta, and other providers. Unlike LiteLLM's self-hosted proxy, Realtime Router runs as fully managed infrastructure with no package installation, no PyPI dependency, and no self-hosted attack surface.
Why teams choose Realtime Router over LiteLLM:
  • Managed infrastructure, no supply chain exposure. Nothing to install, no package registry dependency, no self-hosted CI/CD to compromise. Inworld handles deployment, scaling, and security internally.
  • Conditional routing with CEL expressions. Route requests based on user tier, query complexity, region, language, or any custom metadata. LiteLLM supports fallback chains and budget-based routing; Realtime Router supports business-logic routing that adapts per request (a sketch follows this list).
  • Native A/B testing. Split traffic across models with sticky user assignment. Compare cost, latency, and output quality on live production traffic without deploying new code. LiteLLM has no built-in A/B testing.
  • Automatic failover with full observability. When a provider returns a 429, 5xx, or timeout, Realtime Router retries the next model in the chain and includes the full attempt chain in response metadata for debugging (a client-side illustration follows the migration snippet below).
  • Voice pipeline integration. Uniquely among LLM routers, Realtime Router integrates with Realtime TTS (which holds the #1 spot and three of the top five entries on the Artificial Analysis Speech Arena), Realtime STT, and the Realtime API for end-to-end speech pipelines. Teams building voice agents, AI companions, or conversational applications can unify LLM routing and voice under one API.
  • Drop-in SDK compatibility. OpenAI and Anthropic SDK drop-in replacement. Migration from LiteLLM requires changing the base URL and API key.
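As a concrete illustration of conditional routing, here is a hedged sketch of a first-match rule set. The config shape, the route_request helper, and the model identifiers are illustrative assumptions, not Inworld's actual API; the commented rule strings are valid CEL, with equivalent Python predicates standing in for a real CEL evaluator.

# Hypothetical first-match routing rules. The commented CEL strings show the
# intent; the paired Python lambdas stand in for a real CEL evaluator.
ROUTING_RULES = [
    # CEL: request.user.tier == "pro" && request.complexity > 0.7
    (lambda req: req["user"]["tier"] == "pro" and req["complexity"] > 0.7,
     "gpt-5.5"),
    # CEL: request.language != "en"
    (lambda req: req["language"] != "en",
     "claude-sonnet-4"),  # hypothetical model identifier
]
DEFAULT_MODEL = "gemini-flash-latest"  # hypothetical cost-effective default

def route_request(req: dict) -> str:
    """Return the first model whose predicate matches, else the default."""
    for predicate, model in ROUTING_RULES:
        if predicate(req):
            return model
    return DEFAULT_MODEL

print(route_request({"user": {"tier": "pro"}, "complexity": 0.9, "language": "en"}))
# -> gpt-5.5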
Limitations:
  • Currently in Research Preview. Not yet at full enterprise SLA tier.
  • Not open source. Teams requiring auditable source code should evaluate Bifrost or Portkey's open-source gateway.
  • Focused on routing and model selection. Does not include prompt management or guardrails (Portkey is stronger here).
Production evidence: Inworld's infrastructure processes millions of real-time AI interactions daily across consumer apps, enterprise voice agents, and interactive media. Production deployments routinely show substantial cost reductions when intelligent model selection routes simple tasks to cost-effective models and complex tasks to frontier models.
Migration from LiteLLM: Realtime Router accepts standard OpenAI-compatible chat completion requests. Migration requires changing your base URL and API key; existing request structure stays the same.
# Before: LiteLLM
from openai import OpenAI
client = OpenAI(base_url="http://localhost:4000", api_key="...")

# After: Realtime Router
from openai import OpenAI
client = OpenAI(
    base_url="https://api.inworld.ai/v1",
    api_key="<your-inworld-api-key>"
)

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Hello"}],
)
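To illustrate the automatic failover behavior described above, here is a minimal client-side sketch of the same retry pattern. With a managed router the chain runs server-side; this is an illustration of the behavior, not Inworld's implementation, and the second model identifier is hypothetical.

# Client-side fallback sketch: try each model in order, moving on when a
# provider errors or times out. Managed routers do this server-side.
from openai import OpenAI, APIStatusError, APITimeoutError

client = OpenAI(
    base_url="https://api.inworld.ai/v1",
    api_key="<your-inworld-api-key>",
)

def complete_with_fallback(messages, models):
    last_error = None
    for model in models:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except (APIStatusError, APITimeoutError) as err:
            last_error = err  # try the next model in the chain
    raise last_error

response = complete_with_fallback(
    [{"role": "user", "content": "Hello"}],
    ["gpt-5.5", "claude-sonnet-4"],  # second identifier is hypothetical
)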

2. Portkey: Compliance-First Gateway with Governance

Portkey positions itself as an AI gateway with strong emphasis on observability, guardrails, and governance. It supports a large multimodal catalog and provides both cloud-managed and self-hosted deployment options.
Why teams choose Portkey over LiteLLM:
  • Guardrails and governance. Content filtering, PII detection, and compliance rules enforced at the gateway layer. Portkey's governance tooling is the strongest in this category for regulated industries.
  • Observability-first. Real-time dashboards tracking latency, cost, token usage, and error rates. Anomaly detection and proactive alerts.
  • Conditional routing. Route based on metadata, user attributes, or content policies.
  • Open-source core. The gateway layer is open source; enterprise features are paid. Teams can audit the routing logic.
  • Semantic caching. Reduces redundant API calls and lowers costs for applications with repetitive query patterns.
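Semantic caching embeds each query and serves a cached response when a new query lands close enough in embedding space. The sketch below illustrates the general idea only; it is not Portkey's implementation, and the similarity threshold and the absent embed() call are assumptions.

# Toy in-memory semantic cache: store (embedding, response) pairs and serve a
# hit when cosine similarity clears a threshold. A real system would compute
# embeddings with an embedding model before calling lookup() or store().
import math

THRESHOLD = 0.95  # arbitrary; tune per application

_cache: list[tuple[list[float], str]] = []

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lookup(query_embedding: list[float]) -> str | None:
    best = max(_cache, key=lambda e: cosine(e[0], query_embedding), default=None)
    if best and cosine(best[0], query_embedding) >= THRESHOLD:
        return best[1]  # cache hit: skip the LLM call entirely
    return None

def store(query_embedding: list[float], response: str) -> None:
    _cache.append((query_embedding, response))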
Limitations:
  • Governance-oriented, not optimization-oriented. Routing is rule-based and compliance-driven, not dynamically optimized for cost-per-quality or task-model matching.
  • No native A/B testing with sticky user assignment (basic traffic splitting only).
  • No voice pipeline integration. LLM routing only.

3. OpenRouter: Widest Model Catalog for Prototyping

OpenRouter is a managed marketplace proxy providing unified API access to a broad catalog of models from many providers. It requires no subscription and uses a credit-based billing system.
Why teams choose OpenRouter over LiteLLM:
  • Largest model catalog including open-weight models hosted by third-party providers.
  • Fully managed. No self-hosted infrastructure, no supply chain risk from package registries.
  • No subscription. Credit-based system with no monthly minimums. Low-friction entry for evaluation.
  • Automatic fallback. Provider switching when models are unavailable.
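OpenRouter exposes an OpenAI-compatible endpoint, so trying a model from its catalog typically takes a few lines, as in the sketch below. The model identifier is an example; check OpenRouter's live catalog for current names.

# OpenRouter is OpenAI-compatible; models are namespaced by provider.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<your-openrouter-api-key>",
)

response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # example identifier
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)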
Limitations:
  • Proxy, not intelligent router. Routing is availability-based or manual. No optimization for cost-per-quality, task complexity, or business KPIs.
  • No A/B testing, traffic splitting, or conditional routing.
  • Limited observability. Basic usage tracking only.
  • 25-40ms added latency per request, which may be unacceptable for real-time applications.

4. Vercel AI Gateway: Convenience for Vercel Teams

Vercel AI Gateway provides a single endpoint for major frontier models. It integrates natively with Vercel's deployment platform and AI SDK.
Why teams choose Vercel AI Gateway over LiteLLM:
  • Zero-friction for Vercel teams. If your application already deploys on Vercel, adding the gateway requires no additional infrastructure.
  • Built-in failovers. Automatic provider switching during outages.
  • OpenAI and Anthropic SDK compatibility. Drop-in replacement for existing integrations.
Limitations:
  • No conditional routing, A/B testing, or traffic splitting. Routing logic is limited to static fallback chains.
  • Tightly coupled to the Vercel platform. No self-hosted or multi-cloud option.
  • Limited observability. Usage and billing dashboard only; deeper tracing requires third-party tools.
  • Serverless execution limits constrain long-running agentic workflows.

5. Bifrost: High-Performance Open-Source Gateway

Bifrost by Maxim AI is an open-source AI gateway built in Go. It addresses LiteLLM's primary technical limitation (Python's GIL bottleneck) while maintaining the self-hosted, open-source model.
Why teams choose Bifrost over LiteLLM:
  • Dramatically better performance. Microsecond-scale overhead per request at high RPS, well ahead of LiteLLM's Python architecture, which degrades under sustained load.
  • Semantic caching. Identifies semantically similar queries and serves cached responses. LiteLLM supports exact-match caching only.
  • MCP support. Native Model Context Protocol integration for agentic workflows.
  • Open source (Go). Full source code visibility. Auditable routing logic.
Limitations:
  • Same supply chain risk class as LiteLLM. Bifrost is self-hosted open-source software distributed through package registries. The Go ecosystem has different characteristics than Python/PyPI, but the structural risk of dependency compromise exists.
  • Smaller provider catalog compared to LiteLLM or Realtime Router.
  • No conditional routing on business logic. Load balancing and failover only.
  • No A/B testing or traffic splitting.
  • Requires engineering resources for deployment, scaling, and maintenance.

How to Choose a LiteLLM Alternative

The right alternative depends on what drove your evaluation. If the security incident is the primary concern, the fundamental question is whether to remain on self-hosted infrastructure or move to a managed service.
  • You want managed infrastructure with intelligent routing: Realtime Router. No self-hosted attack surface, conditional routing with CEL expressions, native A/B testing, and the only LLM router that integrates with a full voice pipeline.
  • You need compliance and governance first: Portkey. Guardrails, PII filtering, and the strongest governance tooling in the category. Available as both cloud-managed and self-hosted.
  • You want the broadest model access for prototyping: OpenRouter. No subscription required. Managed infrastructure eliminates self-hosted risk. Move to a production router when you need optimization.
  • You are already on Vercel: Vercel AI Gateway. Zero-friction addition to existing Vercel deployments. Limited routing logic, but sufficient for teams that just need consolidated model access with failover.
  • You want to stay self-hosted but need better performance: Bifrost. Substantially faster than LiteLLM, Go-based architecture, open source. Understand that self-hosted infrastructure carries supply chain risk regardless of language or ecosystem.

Managed vs. Self-Hosted: The Security Trade-off

The LiteLLM incident did not create the managed vs. self-hosted debate, but it made the trade-offs concrete.
Dimension | Self-Hosted (LiteLLM, Bifrost) | Managed (Realtime Router, OpenRouter, Vercel AI Gateway)
Supply chain risk | Your team validates every dependency update. A single compromised package can expose all API keys. | Provider manages supply chain internally. You never install packages from public registries.
Credential storage | API keys stored in your infrastructure config. Compromised proxy exposes all provider credentials. | Credentials stored in provider's isolated, audited infrastructure.
Patch speed | Your team must detect, evaluate, and deploy patches. | Provider patches infrastructure centrally. No action required from your team.
Audit and compliance | You own the audit. Internal security review needed for every update. | Provider handles SOC2, penetration testing, and infrastructure security.
Customization | Full control over routing logic and infrastructure. Can modify source code. | Configuration through provider's API. Less flexibility, more guardrails.
For most production teams, the operational burden of securing self-hosted LLM routing infrastructure outweighs the flexibility benefits. Managed services like Realtime Router provide more routing intelligence (conditional routing, A/B testing, business-metric optimization) than LiteLLM offered, without the self-hosted attack surface.

FAQ

Is LiteLLM safe to use after the supply chain attack?

LiteLLM released a clean version (v1.83.0) on a redesigned CI/CD pipeline with isolated environments and stronger security gates. The compromised versions (1.82.7 and 1.82.8) have been removed from PyPI. LiteLLM is working with Mandiant on a forensic review. Teams that pinned to affected versions should rotate all credentials that were accessible on machines where the package was installed.
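A quick way to check whether an environment currently holds a compromised release is to inspect the installed version, as in this sketch (it checks the current environment only, not lockfiles or container images built earlier):

# Check whether the litellm version in this environment is a known-bad release.
from importlib.metadata import version, PackageNotFoundError

AFFECTED = {"1.82.7", "1.82.8"}

try:
    installed = version("litellm")
except PackageNotFoundError:
    installed = None

if installed in AFFECTED:
    print(f"litellm {installed} is a compromised release: rotate credentials")
elif installed:
    print(f"litellm {installed} is not one of the known-bad versions")
else:
    print("litellm is not installed in this environment")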

What makes Realtime Router different from other LiteLLM alternatives?

Realtime Router is the only LLM routing service that combines conditional routing with CEL expressions, native A/B testing with sticky user assignment, and integration with a full voice AI pipeline. Realtime Router connects directly to Realtime TTS (#1 on the Artificial Analysis Speech Arena), Realtime STT, and the Realtime API for teams building conversational or voice-enabled applications. No other LLM router offers voice pipeline integration.

Can I migrate from LiteLLM to Realtime Router without rewriting code?

Yes. Realtime Router accepts standard OpenAI-compatible chat completion requests. Migration requires changing your base URL and API key. Existing request structure, message formatting, and response handling stay the same. Migration documentation is available for teams moving from LiteLLM and other routing tools.

Does the supply chain risk apply to all self-hosted open-source tools?

Yes. Any software distributed through public package registries (PyPI, npm, Go modules) is structurally vulnerable to supply chain attacks where a compromised maintainer account or CI/CD pipeline injects malicious code into legitimate packages. The LiteLLM incident exploited the distribution mechanism, not LiteLLM's code. Self-hosted tools like Bifrost (Go-based) carry the same category of risk, though specific attack surfaces vary by ecosystem.

Which LiteLLM alternative is best for production voice applications?

Realtime Router is the only LLM router with direct integration into a voice pipeline. It connects to Realtime TTS, Realtime STT, and the Realtime API, so teams building voice agents, AI companions, or conversational applications can unify LLM routing and voice infrastructure under one API and one billing relationship.