Ephemeral Settings Reference
Complete reference for all ephemeral settings. Set them with `/set <key> <value>` during a session or `--set <key>=<value>` at startup. Ephemeral settings don't persist to `settings.json`; they live only for the current session unless saved to a profile with `/profile save`.
For guidance on tuning these for specific models, see Settings and Profiles.
Reasoning
Control extended thinking / chain-of-thought. Most reasoning models need `reasoning.enabled` set to true; the remaining settings have sensible defaults.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `reasoning.enabled` | boolean | false | yes | Turn on thinking mode. Required for models like Kimi K2-Thinking, Claude with thinking, o3. |
| `reasoning.effort` | enum | provider default | yes | How hard the model thinks: minimal, low, medium, high, xhigh. Higher = slower + more tokens but better results. Anthropic Opus defaults to high. Codex defaults to medium. |
| `reasoning.maxTokens` | number | — | yes | Cap the thinking token budget (OpenAI). Limits how much the model can think per turn. |
| `reasoning.budgetTokens` | number | — | yes | Anthropic-specific thinking budget. Usually set automatically via reasoning.effort or adaptive thinking. |
| `reasoning.adaptiveThinking` | boolean | false | yes | Let Anthropic auto-tune the thinking budget based on task complexity. Enabled by default for Claude via the anthropic provider alias. |
| `reasoning.includeInResponse` | boolean | true | yes | Show thinking blocks in the terminal. Set false to get reasoning quality without the visual noise. |
| `reasoning.includeInContext` | boolean | true | yes | Keep thinking in the conversation history sent to the model. If false, the model can't reference its own prior reasoning, which hurts multi-step tasks. |
| `reasoning.stripFromContext` | enum | none | yes | Prune old thinking to manage context growth. none = keep all (best quality). allButLast = keep only the latest thinking (good balance). all = discard all thinking from context (saves tokens). |
| `reasoning.format` | enum | — | yes | API format: native or field. Leave unset unless you know your provider needs a specific format. |
| `reasoning.summary` | enum | — | yes | OpenAI Responses API reasoning summary: auto, concise, detailed, none. Codex alias defaults to auto. |
| `text.verbosity` | enum | — | yes | OpenAI Responses API text verbosity for thinking output: low, medium, high. |
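As a concrete sketch, this enables thinking at high effort while hiding the thinking blocks in the terminal (the values are illustrative, not recommendations):

```
/set reasoning.enabled true
/set reasoning.effort high
/set reasoning.includeInResponse false
```

Save the result with `/profile save` if you want to reuse it across sessions.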
Context and Compression
Control how much context the model sees and when/how history is compressed. These directly affect quality — too small and the model loses track; too large and it drowns in noise.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `context-limit` | number | model default | yes | Max tokens for the entire context window (system prompt + history + tool output). Set lower than the model's max to leave headroom. |
| `compression-threshold` | number | model default | yes | Fraction of context-limit that triggers compression (0.0–1.0). E.g., 0.7 means compress when 70% full. Lower = more frequent compression but more headroom. |
| `max-prompt-tokens` | number | 200000 | yes | Hard ceiling on any single prompt to the API. Safety net to prevent runaway costs. |
| `maxOutputTokens` | number | — | yes | Max output tokens per response (generic, translated by provider). The Anthropic alias sets this to 40000. Limits how much the model writes per turn. |
| `compression.strategy` | enum | middle-out | yes | Compression algorithm: middle-out (an LLM summarizes the middle turns) or top-down-truncation (drops the oldest turns). |
| `compression.profile` | string | — | yes | Profile to use for compression LLM calls. Lets you use a cheaper model for summarization. |
| `compression.density.readWritePruning` | boolean | true | yes | Drop read-file results when the file was subsequently written. Reduces noise from obsolete reads. |
| `compression.density.fileDedupe` | boolean | true | yes | Deduplicate repeated @file inclusions. |
| `compression.density.recencyPruning` | boolean | false | yes | Keep only the N most recent results per tool type. Aggressive; enable only for very long sessions. |
| `compression.density.recencyRetention` | number | 3 | yes | How many recent results to keep per tool type when recencyPruning is on. |
| `compression.density.compressHeadroom` | number | 0.6 | yes | Multiplier for the compression target (0–1). Lower = more aggressive compression. |
| `compression.density.optimizeThreshold` | number | strategy default | yes | Context usage fraction that triggers density optimization. |
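For example, a session tuned for long-running work might cap the context and hand summarization to a cheaper model. A sketch, where `fast-cheap` is a hypothetical profile name and the numbers are illustrative:

```
/set context-limit 120000
/set compression-threshold 0.7
/set compression.profile fast-cheap
```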
Tool Output Limits
Prevent tools from flooding the context. Applied to all tools via the batch scheduler. See Settings and Profiles for how these interact.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `tool-output-max-items` | number | 50 (read-many-files), 1000 (grep) | yes | Max files/matches per tool call. Lower to force the model to be more surgical. |
| `tool-output-max-tokens` | number | 50000 | yes | Max tokens across tool output in a batch. Split across concurrent tool calls. |
| `tool-output-truncate-mode` | enum | warn | yes | What happens when output exceeds limits. warn = drop output entirely, tell model to narrow query. truncate = cut to fit silently. sample = pick representative lines. |
| `tool-output-item-size-limit` | number | 524288 (512KB) | yes | Max bytes per individual file/item. Prevents one huge file from consuming the budget. |
| `file-read-max-lines` | number | 2000 | yes | Default max lines when reading a text file with no explicit limit. Prevents accidentally reading massive files. |
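To force more surgical tool use when context is tight, you might tighten the limits (illustrative values):

```
/set tool-output-max-items 20
/set tool-output-max-tokens 25000
/set tool-output-truncate-mode truncate
```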
Timeouts
Prevent commands and tasks from hanging indefinitely. Values are in seconds (not milliseconds, despite older docs), except socket-timeout, which is in milliseconds.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `shell-default-timeout-seconds` | number | 300 (5 min) | yes | Default timeout for shell commands. The model can request a specific timeout, but this applies when it doesn't. |
| `shell-max-timeout-seconds` | number | 900 (15 min) | yes | Hard ceiling; the model can't request longer than this. Increase for long builds/test suites. |
| `shell-inactivity-timeout-seconds` | number | — (disabled) | yes | Kill commands that produce no output for this long. Resets on each output line. Good for catching commands that hang waiting for input. |
| `task-default-timeout-seconds` | number | 900 (15 min) | yes | Default timeout for subagent tasks. |
| `task-max-timeout-seconds` | number | 1800 (30 min) | yes | Hard ceiling for subagent tasks. |
| `socket-timeout` | number (ms) | — | yes | HTTP request timeout for API calls, in milliseconds. Useful for slow local models. |
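At startup these can be passed as `--set` flags, e.g. to accommodate a slow test suite. A sketch, assuming the CLI binary is named `llxprt` (adjust to your install) and with illustrative values:

```shell
llxprt --set shell-default-timeout-seconds=600 --set shell-max-timeout-seconds=3600
```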
Loop Detection
Catch models that get stuck repeating the same action.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `maxTurnsPerPrompt` | number | -1 (unlimited) | yes | Hard limit on turns per prompt. Set to a positive integer to cap runaway sessions. |
| `loopDetectionEnabled` | boolean | true | yes | Master switch for all loop detection. Disable only if you're sure the model won't loop. |
| `toolCallLoopThreshold` | number | 50 | yes | Consecutive identical tool calls before intervention. -1 = unlimited. |
| `contentLoopThreshold` | number | 50 | yes | Consecutive identical content chunks before intervention. -1 = unlimited. |
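For a tighter leash on an unfamiliar model, you might lower the thresholds (illustrative values):

```
/set maxTurnsPerPrompt 40
/set toolCallLoopThreshold 10
/set contentLoopThreshold 10
```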
Streaming and Network
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `streaming` | enum | enabled | yes | enabled or disabled. Disable for providers that don't support streaming or for debugging. |
| `api-version` | string | — | yes | API version string. Required by some providers (e.g., Azure OpenAI). |
| `socket-keepalive` | boolean | — | yes | TCP keepalive for local AI servers. Prevents idle connections from dropping. |
| `socket-nodelay` | boolean | — | yes | TCP_NODELAY for local AI servers. Reduces latency at the cost of more packets. |
| `stream-options` | JSON | — | yes | Extra stream options passed to the OpenAI API (e.g., `{"include_usage": true}`). |
| `retries` | number | — | yes | Max retry attempts for failed API calls. |
| `retrywait` | number (ms) | — | yes | Initial delay between retries. Exponential backoff applies. |
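A typical tweak for a flaky local endpoint, as a sketch (values illustrative; the exact quoting rules for JSON values may vary):

```
/set streaming disabled
/set retries 5
/set retrywait 2000
/set stream-options {"include_usage": true}
```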
Rate Limiting
Proactive throttling to stay within provider rate limits.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `rate-limit-throttle` | enum | — | yes | on or off. When on, LLxprt proactively slows down before hitting rate limits. |
| `rate-limit-throttle-threshold` | number | — | yes | Percentage of rate limit (1–100) to start throttling at. |
| `rate-limit-max-wait` | number (ms) | — | yes | Max time to wait for rate limit headroom before sending anyway. |
| `prompt-caching` | enum | off | yes | Provider-side prompt caching: off, 5m, 1h, 24h. Saves costs when repeating similar prompts. Codex alias defaults to 24h. |
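For example, to start throttling at 80% of the provider's limit and cache prompts for an hour (illustrative values):

```
/set rate-limit-throttle on
/set rate-limit-throttle-threshold 80
/set prompt-caching 1h
```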
Load Balancer
Settings for multi-endpoint load balancing. Only apply when using load-balanced provider configurations.
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `tpm_threshold` | number | — | yes | Minimum tokens/minute before triggering failover to the next endpoint. |
| `timeout_ms` | number (ms) | — | yes | Max request duration before the load balancer fails over. |
| `circuit_breaker_enabled` | boolean | — | yes | Enable circuit breaker for failing backends. |
| `circuit_breaker_failure_threshold` | number | 3 | yes | Failures before opening the circuit (stop sending to that backend). |
| `circuit_breaker_failure_window_ms` | number (ms) | 60000 | yes | Time window for counting failures. |
| `circuit_breaker_recovery_timeout_ms` | number (ms) | 30000 | yes | Cooldown before retrying an opened circuit. |
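A sketch of enabling the circuit breaker at startup, assuming the CLI binary is named `llxprt` (values illustrative):

```shell
llxprt --set circuit_breaker_enabled=true --set circuit_breaker_failure_threshold=3 --set circuit_breaker_recovery_timeout_ms=30000
```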
Subagent and Task Control
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `task-max-async` | number | 5 | yes | Max concurrent async subagent tasks. -1 = unlimited (up to 100). |
| `subagents.async.enabled` | boolean | true | yes | Enable/disable async subagent execution. |
| `todo-continuation` | boolean | — | yes | Enable todo continuation mode: the model picks up where it left off from a todo list. |
Tool Control
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `tools.disabled` | string[] | — | yes | List of tool names to disable. The model won't see these tools at all. |
| `tools.allowed` | string[] | — | yes | Allowlist: if set, only these tools are available. Overrides tools.disabled. |
| `tool_choice` | string | — | yes | Tool choice strategy sent to the API: auto, required, none. |
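For example, to hide a tool from the model entirely (the tool name and the exact array syntax here are illustrative assumptions):

```
/set tools.disabled ["web_search"]
```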
Prompt Configuration
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `enable-tool-prompts` | boolean | false | yes | Load tool-specific prompt files from ~/.llxprt/prompts/tools/. Adds specialized instructions per tool. |
| `include-folder-structure` | boolean | — | yes | Include the workspace folder tree in the system prompt. Helps the model navigate, but costs tokens. |
Custom Headers
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `custom-headers` | JSON | — | yes | Custom HTTP headers as a JSON object. Applied to all API requests. |
| `user-agent` | string | — | yes | Override the User-Agent header. Some providers (e.g., Kimi) require specific user agents. |
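A sketch for a provider that needs extra headers (the header name, value, and quoting are illustrative assumptions):

```
/set custom-headers {"X-Api-Region": "eu-west-1"}
/set user-agent MyClient/1.0
```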
Shell Behavior
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `shell-replacement` | string | — | yes | Command substitution mode: allowlist (safe subset), all (everything), none/false (disabled). Controls whether $() and backticks work in shell commands. |
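For example, to allow only the safe subset of command substitution:

```
/set shell-replacement allowlist
```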
Authentication
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `auth.noBrowser` | boolean | false | yes | Skip automatic browser launch for OAuth. Use manual code entry instead. Useful for SSH/headless environments. |
| `authOnly` | boolean | — | yes | Force OAuth-only authentication. |
Memory
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `model.canSaveCore` | boolean | false | no | Allow the model to write to .LLXPRT_SYSTEM (core system memory). Unsafe: the model can override your own directives. Deliberately not saved to profiles. |
| `model.allMemoriesAreCore` | boolean | false | yes | Load LLXPRT.md files as part of the system prompt instead of user context. Makes the model treat your memories as hard directives rather than suggestions. |
Debugging
| Setting | Type | Default | Profile | Description |
| --- | --- | --- | --- | --- |
| `emojifilter` | enum | auto | yes | Emoji handling: allowed, auto (detect terminal support), warn, error. |
| `dumponerror` | enum | — | yes | Dump API request body to ~/.llxprt/dumps/ on errors: enabled or disabled. |
| `dumpcontext` | enum | — | yes | Context dumping: now (dump immediately), status, on (every turn), error (on errors), off. |
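When diagnosing a misbehaving provider, you might turn on both dump modes (values taken from the enums above):

```
/set dumponerror enabled
/set dumpcontext error
```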
Model Parameters
These are passed directly to the provider API as-is. LLxprt doesn't validate them. Set with `/set modelparam <name> <value>`.
| Parameter | Type | Description |
| --- | --- | --- |
| `temperature` | number | Sampling temperature (0.0–2.0). Lower = more deterministic. |
| `max_tokens` | number | Max tokens to generate (OpenAI/Anthropic). Alias: maxTokens. |
| `max_output_tokens` | number | Max output tokens (Gemini native param). |
| `top_p` | number | Nucleus sampling threshold. |
| `top_k` | number | Top-k sampling. |
| `frequency_penalty` | number | Penalize repeated tokens. |
| `presence_penalty` | number | Penalize tokens that appeared at all. |
| `seed` | number | Random seed for deterministic output (OpenAI only). |
| `stop` | string[] | Stop sequences: the model stops generating when it produces any of these. |
| `response_format` | JSON | Response format (e.g., `{"type": "json_object"}`). |
| `logit_bias` | JSON | Per-token bias. |
| `reasoning` | JSON | OpenAI reasoning config object. Usually set via reasoning.* settings instead. |
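For example, pinning sampling for more reproducible runs (illustrative values; remember these are passed through unvalidated):

```
/set modelparam temperature 0.2
/set modelparam top_p 0.9
/set modelparam seed 42
```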