# Profiles

Profiles save your LLxprt Code configuration for quick switching between providers, models, and settings. There are two types of profiles: model profiles and load balancer profiles.
## Model Profiles
Model profiles capture your current session configuration including provider, model, and settings.
### Creating a Model Profile

```
/profile save model <name> [bucket1] [bucket2] ...
```
**What gets saved:**
- Current provider and model
- API base URL (if custom)
- Session settings (context limits, reasoning settings, etc.)
- OAuth bucket configuration (if specified)
#### Basic Example

```
/provider anthropic
/model claude-sonnet-4-5
/profile save model work-claude
```
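A saved profile is a small JSON file (see Profile Storage below). Based on the endpoint examples later on this page, `work-claude.json` would look roughly like this; exact contents depend on your session settings:

```json
{
  "version": 1,
  "provider": "anthropic",
  "model": "claude-sonnet-4-5",
  "modelParams": {},
  "ephemeralSettings": {}
}
```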
#### Profile with OAuth Bucket

```
# Authenticate first
/auth anthropic login work@company.com

# Save profile using that bucket
/profile save model work-profile work@company.com
```
#### Multi-Bucket Profiles (Automatic Failover)
Save a profile with multiple OAuth buckets for automatic failover when rate limits or quota errors occur:
```
# Authenticate to multiple buckets
/auth anthropic login work1@company.com
/auth anthropic login work2@company.com
/auth anthropic login work3@company.com

# Save profile with failover chain
/profile save model high-availability work1@company.com work2@company.com work3@company.com
```
**Failover behavior:**
- Buckets are tried in the order specified
- On 429 (rate limit): advance to next bucket immediately
- On 402 (quota/payment): advance to next bucket immediately
- On 401 (auth failure): attempt token refresh, retry once, then advance
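A rough sketch of this policy in TypeScript (illustrative only; the types and helpers are hypothetical stand-ins, not LLxprt Code's actual internals):

```typescript
// Illustrative sketch of the bucket failover rules above; `send` and
// `refreshToken` are hypothetical stand-ins, not real LLxprt Code APIs.
async function callWithBucketFailover(
  buckets: string[],
  send: (bucket: string) => Promise<Response>,
  refreshToken: (bucket: string) => Promise<void>,
): Promise<Response> {
  for (const bucket of buckets) {
    let response = await send(bucket);
    if (response.status === 401) {
      await refreshToken(bucket); // attempt token refresh...
      response = await send(bucket); // ...and retry once
    }
    if (response.status === 429 || response.status === 402) {
      continue; // rate limit or quota: advance to the next bucket
    }
    if (response.status === 401) {
      continue; // refresh did not help: advance to the next bucket
    }
    return response; // success (or a non-failover error) stops the chain
  }
  throw new Error('all OAuth buckets exhausted');
}
```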
### OpenAI-Compatible Endpoints
Profiles work with any OpenAI-compatible endpoint. Here's an example using Synthetic with Kimi K2:
```json
{
  "version": 1,
  "provider": "openai",
  "model": "hf:moonshotai/Kimi-K2-Thinking",
  "modelParams": {
    "temperature": 1
  },
  "ephemeralSettings": {
    "auth-keyfile": "/path/to/api_key",
    "context-limit": 190000,
    "base-url": "https://api.synthetic.new/openai/v1",
    "streaming": "enabled",
    "reasoning.enabled": true,
    "reasoning.includeInContext": true,
    "reasoning.includeInResponse": true,
    "reasoning.stripFromContext": "all"
  }
}
```
Another example with MiniMax M2:
```json
{
  "version": 1,
  "provider": "openai",
  "model": "hf:MiniMaxAI/MiniMax-M2",
  "modelParams": {
    "temperature": 1
  },
  "ephemeralSettings": {
    "auth-keyfile": "/path/to/api_key",
    "context-limit": 190000,
    "base-url": "https://api.synthetic.new/openai/v1",
    "streaming": "enabled",
    "reasoning.enabled": true,
    "reasoning.includeInContext": true,
    "reasoning.includeInResponse": true,
    "reasoning.stripFromContext": "none"
  }
}
```
## Load Balancer Profiles
Load balancer profiles combine multiple model profiles and distribute requests across them.
### Creating a Load Balancer Profile

```
/profile save loadbalancer <name> <roundrobin|failover> <profile1> <profile2> [profile3...]
```

Requires at least 2 existing model profiles.
### Policies

**roundrobin** - Distributes requests evenly across backends. Each request goes to the next profile in sequence.

```
/profile save loadbalancer balanced roundrobin claude-work openai-work gemini-work
```

**failover** - Uses the primary backend until it fails, then tries the next one.

```
/profile save loadbalancer resilient failover primary-claude backup-openai emergency-gemini
```
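Schematically, the two policies differ only in how the next backend is chosen. This sketch is hypothetical TypeScript, not LLxprt Code's actual implementation:

```typescript
// Hypothetical sketch contrasting the two load balancer policies.
type Policy = 'roundrobin' | 'failover';

class BackendSelector {
  private next = 0;

  constructor(
    private backends: string[],
    private policy: Policy,
  ) {}

  // roundrobin rotates through the backends on every request;
  // failover always starts at the primary, and later backends are
  // reached only when an earlier one fails (see the triggers below).
  pick(): string {
    if (this.policy === 'roundrobin') {
      const backend = this.backends[this.next];
      this.next = (this.next + 1) % this.backends.length;
      return backend;
    }
    return this.backends[0];
  }
}
```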
### What Triggers Failover
By default, failover occurs on:
- HTTP 429 (rate limit)
- HTTP 500, 502, 503, 504 (server errors)
- Network/TCP errors
### Combining Load Balancer with OAuth Buckets

Create profiles with buckets, then reference them in a load balancer:

```
# Create bucketed profiles
/profile save model claude-team1 bucket1@company.com
/profile save model claude-team2 bucket2@company.com

# Create load balancer over them
/profile save loadbalancer team-lb roundrobin claude-team1 claude-team2
```
## Cache Considerations
When using load balancers, understanding cache behavior helps optimize performance.
### Conversation History Is Always Preserved

**Important:** LLxprt Code maintains conversation history at the application layer (in the HistoryService), not at the provider level. This means:
- Full conversation history is sent with every request to whichever backend handles it
- Switching backends does NOT lose conversation context - the history travels with the request
- Both round-robin and failover strategies preserve complete conversation continuity
```
# With round-robin, each backend receives the FULL conversation history
/profile save loadbalancer multi roundrobin claude-work openai-work gemini-work

# Request 1 → claude-work (receives: [user message 1])
# Request 2 → openai-work (receives: [user message 1, assistant response 1, user message 2])
# Request 3 → gemini-work (receives: [full history including messages 1-3])

# All backends see the complete conversation!
```
### What IS Lost on Backend Switch: Provider-Side Caching
While conversation history is preserved, provider-side prompt caching may be lost:
- Anthropic prompt caching: Anthropic caches tokenized prefixes server-side for faster responses. Switching to a different backend (or even a different Anthropic bucket) invalidates this cache.
- OpenAI cached tokens: Similar server-side optimization that resets on backend switch.
- Gemini context caching: Google's cached context feature is tied to specific sessions.
This means:
- No loss of conversation context - the new backend receives full history
- Potential latency increase - first request to new backend may be slower (no cached tokens)
- Potential cost increase - provider may charge for re-tokenizing the conversation prefix
### Best Practices for Cache-Optimized Configuration

1. **Prefer failover over round-robin** to maximize provider-side caching:
```
# Good for conversations - stays on one backend, maximizes prompt cache hits
/profile save loadbalancer chat-resilient failover primary backup

# Round-robin is better for stateless batch jobs where caching doesn't matter
/profile save loadbalancer batch-jobs roundrobin worker1 worker2 worker3
```
2. **Use bucket failover within a single profile** for rate limit handling:
```
# Multiple buckets on same provider - preserves provider-side cache
/profile save model claude-multi bucket1 bucket2 bucket3

# Same provider = same tokenization = cache may still be valid
```
3. **Set appropriate retry counts** to avoid premature failover:
```
# Allow retries before switching backends (preserves cache longer)
/set failover_retry_count 3
/set failover_retry_delay_ms 2000
/profile save loadbalancer patient-lb failover primary backup
```
4. **For long conversations, prefer single-provider setups:**
Single-provider configurations maximize provider-side caching benefits. Load balancers are better suited for high-availability needs rather than routine conversations.
## Failover Behavior
Understanding when and how failover occurs helps you configure resilient setups.
### Default Failover Triggers
By default, these HTTP status codes trigger failover:
| Status Code | Meaning | Failover Behavior |
|---|---|---|
| 429 | Rate Limited | Immediate failover to next backend |
| 500 | Internal Server Error | Retry, then failover |
| 502 | Bad Gateway | Retry, then failover |
| 503 | Service Unavailable | Retry, then failover |
| 504 | Gateway Timeout | Retry, then failover |
Network errors (TCP connection failures, DNS resolution failures, timeouts) also trigger failover when `failover_on_network_errors` is enabled (default: true).
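Taken together, the defaults amount to a predicate like the following (a hypothetical TypeScript sketch, not LLxprt Code's actual code):

```typescript
// Hypothetical predicate combining the default failover triggers above.
const FAILOVER_STATUS_CODES = [429, 500, 502, 503, 504];

function shouldFailover(status: number | null, isNetworkError: boolean): boolean {
  if (isNetworkError) {
    return true; // failover_on_network_errors (default: true)
  }
  return status !== null && FAILOVER_STATUS_CODES.includes(status);
}
```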
### Customizing Failover Status Codes

Override the default status codes using `failover_status_codes`:

```
# Only failover on rate limits and service unavailable
/set failover_status_codes [429,503]

# Add 400 (bad request) to failover triggers
/set failover_status_codes [400,429,500,502,503,504]

# Failover on any 4xx or 5xx error
/set failover_status_codes [400,401,402,403,404,429,500,501,502,503,504]
```
Save the configuration to a profile:

```
/set failover_status_codes [429,500,502,503,504]
/profile save loadbalancer my-lb failover primary backup
```
### Bucket Failover vs Load Balancer Failover
There are two distinct failover mechanisms that can work together:
**Bucket Failover** (within a single model profile):

- Rotates OAuth buckets on the same provider
- Preserves model context and conversation state
- Triggers on: 429 (rate limit), 402 (quota), 401 (auth failure with refresh)

```
# Bucket failover within a profile
/profile save model claude-buckets bucket1 bucket2 bucket3
```
**Load Balancer Failover** (across model profiles):

- Switches between entirely different backends (potentially different providers)
- Loses provider-side cache on switch, though conversation history is preserved (see Cache Considerations above)
- Triggers on: configurable status codes (default: 429, 500, 502, 503, 504)

```
# LB failover across profiles
/profile save loadbalancer multi-provider failover claude-profile openai-profile
```
**Combined failover chain:**

When both are configured, bucket failover occurs first, then LB failover:

```
Request fails with 429
  → Try bucket2 (same profile)
  → Try bucket3 (same profile)
  → All buckets exhausted, LB failover
  → Try next profile in load balancer
```
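In code terms, the combined chain is an inner bucket loop nested inside an outer backend loop. This is an illustrative TypeScript sketch with a hypothetical `send` helper, not LLxprt Code's actual implementation:

```typescript
// Hypothetical sketch of the combined failover chain:
// bucket failover runs inside each profile, LB failover runs across profiles.
async function callWithCombinedFailover(
  profiles: { name: string; buckets: string[] }[],
  send: (profile: string, bucket: string) => Promise<Response>,
): Promise<Response> {
  for (const profile of profiles) {           // load balancer failover (outer)
    for (const bucket of profile.buckets) {   // bucket failover (inner)
      const response = await send(profile.name, bucket);
      if (response.status === 429 || response.status === 402) {
        continue; // exhaust all buckets before leaving this profile
      }
      return response;
    }
    // all buckets exhausted: fall through to the next profile
  }
  throw new Error('all profiles and buckets exhausted');
}
```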
### Retry Configuration

Fine-tune retry behavior before failover occurs:

```
# Number of retries per backend before moving to next
/set failover_retry_count 3

# Delay between retries (milliseconds)
/set failover_retry_delay_ms 1000

# Disable network error failover (not recommended)
/set failover_on_network_errors false
```
Example with aggressive retry:

```
/set failover_retry_count 5
/set failover_retry_delay_ms 2000  # 2 seconds between retries
/profile save loadbalancer patient-failover failover primary backup
```
Example with immediate failover:

```
/set failover_retry_count 0  # No retries, immediate failover
/set failover_retry_delay_ms 0
/profile save loadbalancer fast-failover failover primary backup
```
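The retry settings wrap every backend attempt. Schematically (hypothetical TypeScript, not LLxprt Code's actual code):

```typescript
// Hypothetical sketch of retry-before-failover. The real behavior also
// distinguishes immediate-failover codes (e.g. 429) from retryable
// server errors; that distinction is omitted here for brevity.
async function sendWithRetries(
  attempt: () => Promise<Response>,
  retryCount: number, // failover_retry_count
  retryDelayMs: number, // failover_retry_delay_ms
): Promise<Response | 'failover'> {
  for (let i = 0; i <= retryCount; i++) {
    const response = await attempt();
    if (response.ok) {
      return response;
    }
    if (i < retryCount) {
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
  return 'failover'; // retries exhausted: move to the next backend
}
```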
## Managing Profiles

### Loading a Profile

```
/profile load <name>
```

Or open interactive selection:

```
/profile load
```

### Listing Profiles

```
/profile list
```

### Deleting a Profile

```
/profile delete <name>
```

### Setting a Default Profile

```
/profile set-default <name>
```

The default profile loads automatically on startup.

### Loading via CLI Flag

```
llxprt --profile-load my-profile
```
## Advanced Load Balancer Settings

Configure these settings before saving a load balancer profile using `/set`:

| Setting | Default | Description |
|---|---|---|
| `failover_retry_count` | 1 | Retries per backend before moving to next |
| `failover_retry_delay_ms` | 0 | Delay between retries (milliseconds) |
| `failover_on_network_errors` | true | Failover on TCP/network errors |
| `failover_status_codes` | [429,500,502,503,504] | HTTP codes that trigger failover |
| `lb_tpm_failover_threshold` | (none) | Minimum TPM before triggering failover |
| `lb_circuit_breaker_threshold` | (none) | Failures before circuit opens |
| `lb_circuit_breaker_timeout_ms` | (none) | Time before half-open retry |
Example:

```
/set failover_retry_count 3
/set failover_retry_delay_ms 1000
/profile save loadbalancer my-lb failover profile1 profile2
```
## Viewing Profile Statistics

```
/stats lb        # Load balancer stats (requests per backend)
/stats buckets   # OAuth bucket usage stats
/diagnostics     # Full system status including active profile
```
## Common Workflows

### High-Availability Setup

```
# Create primary profile
/provider anthropic
/model claude-opus-4-5
/profile save model primary-claude

# Create backup profile
/provider openai
/model gpt-5.2
/profile save model backup-openai

# Create failover load balancer
/profile save loadbalancer ha-setup failover primary-claude backup-openai

# Set as default
/profile set-default ha-setup
```
### Rate Limit Distribution with Multiple Buckets

```
# Authenticate multiple buckets
/auth anthropic login team-bucket1
/auth anthropic login team-bucket2
/auth anthropic login team-bucket3

# Create profile with all buckets
/provider anthropic
/model claude-sonnet-4-5
/profile save model claude-multi team-bucket1 team-bucket2 team-bucket3
```
### Round-Robin Across Providers

```
# Create individual profiles
/provider anthropic
/model claude-sonnet-4-5
/profile save model claude-work

/provider openai
/model gpt-5.2
/profile save model openai-work

/provider gemini
/model gemini-3-flash-preview
/profile save model gemini-work

# Create round-robin load balancer
/profile save loadbalancer multi-provider roundrobin claude-work openai-work gemini-work
```
### Resilient Multi-Provider Setup with OAuth Buckets
This example demonstrates a complete high-availability configuration combining:
- Multiple providers (Anthropic, OpenAI)
- OAuth bucket failover within each provider
- Load balancer failover across providers
- Custom failover status codes
**Step 1: Set up OAuth buckets for each provider**

```
# Anthropic buckets (team accounts)
/auth anthropic login team1@company.com
/auth anthropic login team2@company.com
/auth anthropic login personal@gmail.com

# OpenAI buckets
/auth codex login enterprise@company.com
/auth codex login backup@company.com
```
**Step 2: Create model profiles with bucket chains**

```
# Anthropic profile with 3-bucket failover
/provider anthropic
/model claude-sonnet-4-5
/profile save model claude-ha team1@company.com team2@company.com personal@gmail.com

# OpenAI profile with 2-bucket failover
/provider openai
/model gpt-5.2
/profile save model openai-ha enterprise@company.com backup@company.com
```
**Step 3: Configure failover settings**

```
# Retry 2 times with 1 second delay before failover
/set failover_retry_count 2
/set failover_retry_delay_ms 1000

# Trigger failover on rate limits and server errors
/set failover_status_codes [429,500,502,503,504]

# Enable network error failover
/set failover_on_network_errors true
```
**Step 4: Create the load balancer**

```
# Create failover load balancer: try Anthropic first, fall back to OpenAI
/profile save loadbalancer enterprise-ha failover claude-ha openai-ha

# Set as default
/profile set-default enterprise-ha
```
**Complete failover chain in action:**

```
Request hits 429 (rate limit)
  ↓
Bucket failover: team1@company.com → team2@company.com
  ↓ (still 429)
Bucket failover: team2@company.com → personal@gmail.com
  ↓ (still 429, all Anthropic buckets exhausted)
LB failover: claude-ha profile → openai-ha profile
  ↓
New provider (OpenAI) with fresh bucket chain:
  enterprise@company.com → backup@company.com
  ↓
Request succeeds on OpenAI
```
Verify the configuration:

```
# Check load balancer stats
/stats lb

# Check bucket usage
/stats buckets

# View full diagnostics
/diagnostics
```
The complete profile JSON (`enterprise-ha.json`):

```json
{
  "version": 1,
  "type": "loadbalancer",
  "policy": "failover",
  "backends": ["claude-ha", "openai-ha"],
  "ephemeralSettings": {
    "failover_retry_count": 2,
    "failover_retry_delay_ms": 1000,
    "failover_status_codes": [429, 500, 502, 503, 504],
    "failover_on_network_errors": true
  }
}
```
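The backend profiles themselves are ordinary model profiles. As a rough illustration, `claude-ha.json` might look like the following; note that the `buckets` field name is a guess for illustration only, since the bucket-chain schema is not shown in the examples above:

```json
{
  "version": 1,
  "provider": "anthropic",
  "model": "claude-sonnet-4-5",
  "buckets": ["team1@company.com", "team2@company.com", "personal@gmail.com"]
}
```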
## Profile Storage

Profiles are stored as JSON files in `~/.llxprt/profiles/`. You can edit them directly if needed.
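For example, assuming each profile is saved as `<name>.json` (as the `enterprise-ha.json` example above suggests), the profiles from the workflow section would appear as:

```
ls ~/.llxprt/profiles/
# claude-ha.json   enterprise-ha.json   openai-ha.json
```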
## Troubleshooting

### Profile not loading

- Check profile exists: `/profile list`
- Profile names cannot contain path separators (`/` or `\`)

### OAuth bucket errors

- Check bucket is authenticated: `/auth <provider> status`
- Re-authenticate expired bucket: `/auth <provider> login <bucket>`

### Load balancer not failing over

- Check settings: `failover_on_network_errors`, `failover_status_codes`
- Verify all referenced profiles exist: `/profile list`
## See Also
- Authentication - Setting up provider authentication and OAuth buckets
- Commands - Complete command reference
- Configuration - Configuration options