Ephemeral Settings Reference
Complete reference for all ephemeral settings. Set them with `/set <key> <value>` during a session or `--set <key>=<value>` at startup. Ephemeral settings don't persist to `settings.json`; they live only for the current session unless saved to a profile with `/profile save`.
For guidance on tuning these for specific models, see Settings and Profiles.
Reasoning
Control extended thinking / chain-of-thought. Most reasoning models need `reasoning.enabled` set to true; the remaining settings have sensible defaults.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `reasoning.enabled` | boolean | false | yes | Turn on thinking mode. Required for models like Kimi K2-Thinking, Claude with thinking, o3. |
| `reasoning.effort` | enum | provider default | yes | How hard the model thinks: minimal, low, medium, high, xhigh. Higher = slower + more tokens but better results. Anthropic Opus defaults to high. Codex defaults to medium. |
| `reasoning.maxTokens` | number | — | yes | Cap the thinking token budget (OpenAI). Limits how much the model can think per turn. |
| `reasoning.budgetTokens` | number | — | yes | Anthropic-specific thinking budget. Usually set automatically via reasoning.effort or adaptive thinking. |
| `reasoning.adaptiveThinking` | boolean | false | yes | Let Anthropic auto-tune the thinking budget based on task complexity. Enabled by default for Claude via the anthropic provider alias. |
| `reasoning.includeInResponse` | boolean | true | yes | Show thinking blocks in the terminal. Set false to get reasoning quality without the visual noise. |
| `reasoning.includeInContext` | boolean | true | yes | Keep thinking in the conversation history sent to the model. If false, the model can't reference its own prior reasoning, which hurts multi-step tasks. |
| `reasoning.stripFromContext` | enum | none | yes | Prune old thinking to manage context growth. none = keep all (best quality). allButLast = keep only the latest thinking (good balance). all = discard all thinking from context (saves tokens). |
| `reasoning.format` | enum | — | yes | API format: native or field. Leave unset unless you know your provider needs a specific format. |
| `reasoning.summary` | enum | — | yes | OpenAI Responses API reasoning summary: auto, concise, detailed, none. Codex alias defaults to auto. |
| `text.verbosity` | enum | — | yes | OpenAI Responses API text verbosity for thinking output: low, medium, high. |
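As a concrete sketch, this enables thinking at high effort while hiding the thinking blocks in the terminal (the values are illustrative, not recommendations):

```
/set reasoning.enabled true
/set reasoning.effort high
/set reasoning.includeInResponse false
```

Save the result with `/profile save` if you want to reuse it across sessions.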
Context and Compression
Control how much context the model sees and when/how history is compressed. These directly affect quality — too small and the model loses track; too large and it drowns in noise.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `context-limit` | number | model default | yes | Max tokens for the entire context window (system prompt + history + tool output). Set lower than the model's max to leave headroom. |
| `compression-threshold` | number | model default | yes | Fraction of context-limit that triggers compression (0.0–1.0). E.g., 0.7 means compress when 70% full. Lower = more frequent compression but more headroom. |
| `max-prompt-tokens` | number | 200000 | yes | Hard ceiling on any single prompt to the API. Safety net to prevent runaway costs. |
| `maxOutputTokens` | number | — | yes | Max output tokens per response (generic, translated by provider). The Anthropic alias sets this to 40000. Limits how much the model writes per turn. |
| `compression.strategy` | enum | middle-out | yes | Compression algorithm: middle-out (an LLM summarizes the middle turns) or top-down-truncation (drops the oldest turns). |
| `compression.profile` | string | — | yes | Profile to use for compression LLM calls. Lets you use a cheaper model for summarization. |
| `compression.density.readWritePruning` | boolean | true | yes | Drop read-file results when the file was subsequently written. Reduces noise from obsolete reads. |
| `compression.density.fileDedupe` | boolean | true | yes | Deduplicate repeated @file inclusions. |
| `compression.density.recencyPruning` | boolean | false | yes | Keep only the N most recent results per tool type. Aggressive; enable only for very long sessions. |
| `compression.density.recencyRetention` | number | 3 | yes | How many recent results to keep per tool type when recencyPruning is on. |
| `compression.density.compressHeadroom` | number | 0.6 | yes | Multiplier for the compression target (0–1). Lower = more aggressive compression. |
| `compression.density.optimizeThreshold` | number | strategy default | yes | Context usage fraction that triggers density optimization. |
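For example, a session tuned for long-running work might cap the context and hand summarization to a cheaper model. A sketch, where `fast-cheap` is a hypothetical profile name and the numbers are illustrative:

```
/set context-limit 120000
/set compression-threshold 0.7
/set compression.profile fast-cheap
```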
Tool Output Limits
Prevent tools from flooding the context. Applied to all tools via the batch scheduler. See Settings and Profiles for how these interact.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `tool-output-max-items` | number | 50 (read-many-files), 1000 (grep) | yes | Max files/matches per tool call. Lower to force the model to be more surgical. |
| `tool-output-max-tokens` | number | 50000 | yes | Max tokens across tool output in a batch. Split across concurrent tool calls. |
| `tool-output-truncate-mode` | enum | warn | yes | What happens when output exceeds limits. warn = drop output entirely, tell model to narrow query. truncate = cut to fit silently. sample = pick representative lines. |
| `tool-output-item-size-limit` | number | 524288 (512KB) | yes | Max bytes per individual file/item. Prevents one huge file from consuming the budget. |
| `file-read-max-lines` | number | 2000 | yes | Default max lines when reading a text file with no explicit limit. Prevents accidentally reading massive files. |
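To force more surgical tool use when context is tight, you might tighten the limits (illustrative values):

```
/set tool-output-max-items 20
/set tool-output-max-tokens 25000
/set tool-output-truncate-mode truncate
```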
Timeouts
Prevent commands and tasks from hanging indefinitely. Values are in seconds (not milliseconds, despite older docs), except socket-timeout, which is in milliseconds.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `shell-default-timeout-seconds` | number | 300 (5 min) | yes | Default timeout for shell commands. The model can request a specific timeout, but this applies when it doesn't. |
| `shell-max-timeout-seconds` | number | 900 (15 min) | yes | Hard ceiling; the model can't request longer than this. Increase for long builds/test suites. |
| `shell-inactivity-timeout-seconds` | number | — (disabled) | yes | Kill commands that produce no output for this long. Resets on each output line. Good for catching commands that hang waiting for input. |
| `task-default-timeout-seconds` | number | 900 (15 min) | yes | Default timeout for subagent tasks. |
| `task-max-timeout-seconds` | number | 1800 (30 min) | yes | Hard ceiling for subagent tasks. |
| `socket-timeout` | number (ms) | — | yes | HTTP request timeout for API calls, in milliseconds. Useful for slow local models. |
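At startup these can be passed as `--set` flags, e.g. to accommodate a slow test suite. A sketch, assuming the CLI binary is named `llxprt` (adjust to your install) and with illustrative values:

```shell
llxprt --set shell-default-timeout-seconds=600 --set shell-max-timeout-seconds=3600
```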
Loop Detection
Catch models that get stuck repeating the same action.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `maxTurnsPerPrompt` | number | -1 (unlimited) | yes | Hard limit on turns per prompt. Set to a positive integer to cap runaway sessions. |
| `loopDetectionEnabled` | boolean | true | yes | Master switch for all loop detection. Disable only if you're sure the model won't loop. |
| `toolCallLoopThreshold` | number | 50 | yes | Consecutive identical tool calls before intervention. -1 = unlimited. |
| `contentLoopThreshold` | number | 50 | yes | Consecutive identical content chunks before intervention. -1 = unlimited. |
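For a tighter leash on an unfamiliar model, you might lower the thresholds (illustrative values):

```
/set maxTurnsPerPrompt 40
/set toolCallLoopThreshold 10
/set contentLoopThreshold 10
```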
Streaming and Network
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `streaming` | enum | enabled | yes | enabled or disabled. Disable for providers that don't support streaming or for debugging. |
| `api-version` | string | — | yes | API version string. Required by some providers (e.g., Azure OpenAI). |
| `socket-keepalive` | boolean | — | yes | TCP keepalive for local AI servers. Prevents idle connections from dropping. |
| `socket-nodelay` | boolean | — | yes | TCP_NODELAY for local AI servers. Reduces latency at the cost of more packets. |
| `stream-options` | JSON | — | yes | Extra stream options passed to the OpenAI API (e.g., `{"include_usage": true}`). |
| `retries` | number | — | yes | Max retry attempts for failed API calls. |
| `retrywait` | number (ms) | — | yes | Initial delay between retries. Exponential backoff applies. |
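A typical tweak for a flaky local endpoint, as a sketch (values illustrative; the exact quoting rules for JSON values may vary):

```
/set streaming disabled
/set retries 5
/set retrywait 2000
/set stream-options {"include_usage": true}
```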
Rate Limiting
Proactive throttling to stay within provider rate limits.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `rate-limit-throttle` | enum | — | yes | on or off. When on, LLxprt proactively slows down before hitting rate limits. |
| `rate-limit-throttle-threshold` | number | — | yes | Percentage of rate limit (1–100) to start throttling at. |
| `rate-limit-max-wait` | number (ms) | — | yes | Max time to wait for rate limit headroom before sending anyway. |
| `prompt-caching` | enum | off | yes | Provider-side prompt caching: off, 5m, 1h, 24h. Saves costs when repeating similar prompts. Codex alias defaults to 24h. |
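For example, to start throttling at 80% of the provider's limit and cache prompts for an hour (illustrative values):

```
/set rate-limit-throttle on
/set rate-limit-throttle-threshold 80
/set prompt-caching 1h
```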
Load Balancer
Settings for multi-endpoint load balancing. Only apply when using load-balanced provider configurations.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `tpm_threshold` | number | — | yes | Minimum tokens/minute before triggering failover to the next endpoint. |
| `timeout_ms` | number (ms) | — | yes | Max request duration before the load balancer fails over. |
| `circuit_breaker_enabled` | boolean | — | yes | Enable circuit breaker for failing backends. |
| `circuit_breaker_failure_threshold` | number | 3 | yes | Failures before opening the circuit (stop sending to that backend). |
| `circuit_breaker_failure_window_ms` | number (ms) | 60000 | yes | Time window for counting failures. |
| `circuit_breaker_recovery_timeout_ms` | number (ms) | 30000 | yes | Cooldown before retrying an opened circuit. |
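A sketch of enabling the circuit breaker at startup, assuming the CLI binary is named `llxprt` (values illustrative):

```shell
llxprt --set circuit_breaker_enabled=true --set circuit_breaker_failure_threshold=3 --set circuit_breaker_recovery_timeout_ms=30000
```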
Subagent and Task Control
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `task-max-async` | number | 5 | yes | Max concurrent async subagent tasks. -1 = unlimited (up to 100). |
| `subagents.async.enabled` | boolean | true | yes | Enable/disable async subagent execution. |
| `todo-continuation` | boolean | — | yes | Enable todo continuation mode: the model picks up where it left off from a todo list. |
Tool Control
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `tools.disabled` | string[] | — | yes | List of tool names to disable. The model won't see these tools at all. |
| `tools.allowed` | string[] | — | yes | Allowlist: if set, only these tools are available. Overrides tools.disabled. |
| `tool_choice` | string | — | yes | Tool choice strategy sent to the API: auto, required, none. |
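For example, to hide a tool from the model entirely (the tool name and the exact array syntax here are illustrative assumptions):

```
/set tools.disabled ["web_search"]
```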
Prompt Configuration
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `enable-tool-prompts` | boolean | false | yes | Load tool-specific prompt files from ~/.llxprt/prompts/tools/. Adds specialized instructions per tool. |
| `include-folder-structure` | boolean | — | yes | Include the workspace folder tree in the system prompt. Helps the model navigate, but costs tokens. |
Custom Headers
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `custom-headers` | JSON | — | yes | Custom HTTP headers as a JSON object. Applied to all API requests. |
| `user-agent` | string | — | yes | Override the User-Agent header. Some providers (e.g., Kimi) require specific user agents. |
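A sketch for a provider that needs extra headers (the header name, value, and quoting are illustrative assumptions):

```
/set custom-headers {"X-Api-Region": "eu-west-1"}
/set user-agent MyClient/1.0
```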
Shell Behavior
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `shell-replacement` | string | — | yes | Command substitution mode: allowlist (safe subset), all (everything), none/false (disabled). Controls whether $() and backticks work in shell commands. |
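For example, to allow only the safe subset of command substitution:

```
/set shell-replacement allowlist
```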
Authentication
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `auth.noBrowser` | boolean | false | yes | Skip automatic browser launch for OAuth. Use manual code entry instead. Useful for SSH/headless environments. |
| `authOnly` | boolean | — | yes | Force OAuth-only authentication. |
Memory
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `model.canSaveCore` | boolean | false | no | Allow the model to write to .LLXPRT_SYSTEM (core system memory). Unsafe: the model can override your own directives. Deliberately not saved to profiles. |
| `model.allMemoriesAreCore` | boolean | false | yes | Load LLXPRT.md files as part of the system prompt instead of user context. Makes the model treat your memories as hard directives rather than suggestions. |
Debugging
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `emojifilter` | enum | auto | yes | Emoji handling: allowed, auto (detect terminal support), warn, error. |
| `dumponerror` | enum | — | yes | Dump API request body to ~/.llxprt/dumps/ on errors: enabled or disabled. |
| `dumpcontext` | enum | — | yes | Context dumping: now (dump immediately), status, on (every turn), error (on errors), off. |
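When diagnosing a misbehaving provider, you might turn on both dump modes (values taken from the enums above):

```
/set dumponerror enabled
/set dumpcontext error
```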
Model Parameters
These are passed directly to the provider API as-is. LLxprt doesn't validate them. Set with `/set modelparam <name> <value>`.
| Parameter | Type | Description |
| --- | --- | --- |
| `temperature` | number | Sampling temperature (0.0–2.0). Lower = more deterministic. |
| `max_tokens` | number | Max tokens to generate (OpenAI/Anthropic). Alias: maxTokens. |
| `max_output_tokens` | number | Max output tokens (Gemini native param). |
| `top_p` | number | Nucleus sampling threshold. |
| `top_k` | number | Top-k sampling. |
| `frequency_penalty` | number | Penalize repeated tokens. |
| `presence_penalty` | number | Penalize tokens that appeared at all. |
| `seed` | number | Random seed for deterministic output (OpenAI only). |
| `stop` | string[] | Stop sequences: the model stops generating when it produces any of these. |
| `response_format` | JSON | Response format (e.g., `{"type": "json_object"}`). |
| `logit_bias` | JSON | Per-token bias. |
| `reasoning` | JSON | OpenAI reasoning config object. Usually set via reasoning.* settings instead. |
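For example, pinning sampling for more reproducible runs (illustrative values; remember these are passed through unvalidated):

```
/set modelparam temperature 0.2
/set modelparam top_p 0.9
/set modelparam seed 42
```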