# Provider Routing
BluesMinds intelligently routes each request to the best available upstream provider, giving you higher uptime and cost efficiency without changing your code.
## How Routing Works
When you call `POST /v1/chat/completions` with `"model": "gpt-4o"`:

- BluesMinds looks up all configured channels that serve `gpt-4o`
- It selects the optimal channel based on health, latency, and cost
- The request is forwarded; the response is normalized to OpenAI format
- If the selected channel fails, BluesMinds retries on the next best channel
```
Your App
    │
    ▼
BluesMinds Gateway
    │
    ├─► Channel A (OpenAI Direct)    ← preferred if healthy
    ├─► Channel B (Azure OpenAI)     ← failover #1
    └─► Channel C (3rd-party proxy)  ← failover #2
```
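Because routing and normalization happen inside the gateway, the client side is just a standard OpenAI-style request. A minimal sketch in Python, assuming an OpenAI-compatible `Authorization: Bearer` header (the exact auth scheme may differ):

```python
import requests

# Standard OpenAI-style chat completion request; BluesMinds picks
# the upstream channel (OpenAI, Azure, proxy, ...) transparently.
resp = requests.post(
    "https://api.bluesminds.com/v1/chat/completions",
    headers={"Authorization": "Bearer <your_api_key>"},  # assumed auth scheme
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```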
## Automatic Failover
Failover is transparent — your application receives the same response shape whether it came from Channel A or Channel C. You never need to handle provider-specific errors in your business logic.
Failover triggers on:

- `5xx` errors from the upstream provider
- Network timeouts
- Provider-reported rate limits (`429`) that cannot be resolved locally
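Conceptually, the failover loop behaves like the sketch below. This is an illustration of the rules above, not BluesMinds' actual implementation; `channel.forward`, the `status` field, and the exception type are all hypothetical.

```python
class AllChannelsExhausted(Exception):
    """Raised when every candidate channel failed for a request."""

def complete_with_failover(request, channels):
    """Illustration only: try channels in priority order, falling
    through to the next one on retryable errors (5xx, 429, timeouts)."""
    last_error = None
    for channel in channels:  # assumed pre-sorted by health/latency/cost
        try:
            response = channel.forward(request)  # hypothetical interface
        except TimeoutError as exc:
            last_error = exc                     # network timeout: next channel
            continue
        if response.status >= 500 or response.status == 429:
            last_error = response                # retryable upstream error
            continue
        return response                          # normalized OpenAI shape
    raise AllChannelsExhausted(last_error)
```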
## Load Balancing
When multiple healthy channels serve the same model, BluesMinds distributes traffic to:
- Minimize latency (prefer low-p50 channels)
- Spread rate-limit exposure across providers
- Maximize cost efficiency
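One way to picture this trade-off is a single score per healthy channel. The weights and field names below are purely illustrative, not BluesMinds internals:

```python
def pick_channel(healthy_channels):
    """Illustrative only: score each healthy channel and pick the best.
    Field names and weights are hypothetical."""
    def score(ch):
        return (
            0.5 * ch.p50_latency_ms         # minimize latency (low p50 preferred)
            + 0.3 * ch.cost_per_1k_tokens   # maximize cost efficiency
            - 0.2 * ch.rate_limit_headroom  # spread rate-limit exposure
        )
    return min(healthy_channels, key=score)  # lowest score wins
```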
## Channel Configuration (Admin)
Channels are configured server-side by BluesMinds administrators. Enterprise customers can request custom channel configurations.
To view available channels (requires an admin session token):

```bash
curl -H "Authorization: <session_token>" \
  https://api.bluesminds.com/api/channel/
```
## Best Practices
- Check `/v1/models` regularly: model availability reflects channel health
- Don't retry 404s: if a model isn't in `/v1/models`, no channel serves it
- Use backoff on 429: BluesMinds propagates upstream rate limits when all channels are exhausted (see the sketch after this list)
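A minimal backoff sketch for the 429 case, assuming the same OpenAI-compatible endpoint and a Bearer auth header as above:

```python
import time
import requests

def chat_with_backoff(payload, api_key, max_attempts=5):
    """Retry on 429 with exponential backoff. A 429 from BluesMinds
    means every upstream channel is rate-limited, so waiting helps."""
    for attempt in range(max_attempts):
        resp = requests.post(
            "https://api.bluesminds.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
            json=payload,
            timeout=30,
        )
        if resp.status_code != 429:
            return resp
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... before retrying
    return resp
```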
## Model Availability vs. Provider Availability
| Status | Meaning |
|---|---|
| Model in `/v1/models` | At least one healthy channel serves this model |
| Model absent from `/v1/models` | No channel currently available |
| `404` on inference call | Model was available at list time, but the channel went down between requests |
Always handle 404 gracefully by re-checking the model list.
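A minimal pattern for that re-check, assuming the OpenAI-standard `/v1/models` response shape (`{"data": [{"id": ...}]}`) and a Bearer auth header:

```python
import requests

BASE = "https://api.bluesminds.com"
HEADERS = {"Authorization": "Bearer <your_api_key>"}  # assumed auth scheme

def model_is_listed(model):
    """Re-check /v1/models to see whether any channel still serves the model."""
    resp = requests.get(f"{BASE}/v1/models", headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return any(m["id"] == model for m in resp.json()["data"])

resp = requests.post(
    f"{BASE}/v1/chat/completions",
    headers=HEADERS,
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]},
    timeout=30,
)
if resp.status_code == 404 and not model_is_listed("gpt-4o"):
    # The channel serving gpt-4o went down between requests:
    # fall back to another model or surface the outage to the caller.
    ...
```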