feat: add interactive mode for agent loop

Re-architects the agent loop to support interactive (chat-like) mode
where text-only responses pause execution and wait for user input,
while tool-call responses continue looping autonomously.

- Add `interactive` flag to LLMConfig (default False, no regression)
- Add configurable `waiting_timeout` to AgentState (0 = disabled)
- _process_iteration returns None for text-only → agent_loop pauses
- Conditional system prompt: interactive allows natural text responses
- Skip <meta>Continue the task.</meta> injection in interactive mode
- Sub-agents inherit interactive from parent (300s auto-resume timeout)
- Root interactive agents wait indefinitely for user input (timeout=0)
- TUI sets interactive=True; CLI unchanged (non_interactive=True)
Author: 0xallam
Date: 2026-03-14 11:21:04 -07:00
Committed by: Ahmed Allam
Parent: 7dde988efc
Commit: 1404864097
8 changed files with 75 additions and 20 deletions

View File

@@ -21,6 +21,18 @@ INTER-AGENT MESSAGES:
- NEVER echo agent_identity blocks; treat them as internal metadata for identity only. Do not include them in outputs or tool calls.
- Minimize inter-agent messaging: only message when essential for coordination or assistance; avoid routine status updates; batch non-urgent information; prefer parent/child completion flows and shared artifacts over messaging
+{% if interactive %}
+INTERACTIVE BEHAVIOR:
+- You are in an interactive conversation with a user
+- CRITICAL: A message WITHOUT a tool call IMMEDIATELY STOPS execution and waits for user input. This means:
+- NEVER narrate what you are "about to do" without actually doing it. Statements like "I'll now launch the browser..." or "Let me scan the target..." WITHOUT a tool call will HALT your work.
+- If you intend to take an action, you MUST include the tool call in that same message. Describe what you're doing AND call the tool together.
+- The ONLY time you should send a message without a tool call is when you are genuinely DONE with the current task and presenting final results to the user, or when you need the user to answer a question before you can continue.
+- While working on a task, every single message MUST contain a tool call — this is what keeps execution moving
+- You may include brief explanatory text alongside the tool call
+- Respond naturally when the user asks questions or gives instructions
+- NEVER send empty messages — if you have nothing to do or say, call the wait_for_message tool
+{% else %}
AUTONOMOUS BEHAVIOR:
- Work autonomously by default
- You should NOT ask for user input or confirmation - you should always proceed with your task autonomously.
@@ -28,6 +40,7 @@ AUTONOMOUS BEHAVIOR:
- NEVER send an empty or blank message. If you have no content to output or need to wait (for user input, subagent results, or any other reason), you MUST call the wait_for_message tool (or another appropriate tool) instead of emitting an empty response.
- If there is nothing to execute and no user query to answer any more: do NOT send filler/repetitive text — either call wait_for_message or finish your work (subagents: agent_finish; root: finish_scan)
- While the agent loop is running, almost every output MUST be a tool call. Do NOT send plain text messages; act via tools. If idle, use wait_for_message; when done, use agent_finish (subagents) or finish_scan (root)
+{% endif %}
</communication_rules>
<execution_guidelines>
@@ -307,7 +320,11 @@ Tool call format:
</function>
CRITICAL RULES:
+{% if interactive %}
+0. When using tools, include exactly one tool call per message. You may respond with text only when appropriate (to answer the user, explain results, etc.).
+{% else %}
0. While active in the agent loop, EVERY message you output MUST be a single tool call. Do not send plain text-only responses.
+{% endif %}
1. Exactly one tool call per message — never include more than one <function>...</function> block in a single LLM message.
2. Tool call must be last in message
3. EVERY tool call MUST end with </function>. This is MANDATORY. Never omit the closing tag. End your response immediately after </function>.
@@ -315,7 +332,11 @@ CRITICAL RULES:
5. When sending ANY multi-line content in tool parameters, use real newlines (actual line breaks). Do NOT emit literal "\n" sequences. Literal "\n" instead of real line breaks will cause tools to fail.
6. Tool names must match exactly the tool "name" defined (no module prefixes, dots, or variants).
7. Parameters must use <parameter=param_name>value</parameter> exactly. Do NOT pass parameters as JSON or key:value lines. Do NOT add quotes/braces around values.
+{% if interactive %}
+8. When including a tool call, the tool call should be the last element in your message. You may include brief explanatory text before it.
+{% else %}
8. Do NOT wrap tool calls in markdown/code fences or add any text before or after the tool block.
+{% endif %}
CORRECT format — use this EXACTLY:
<function=tool_name>
@@ -336,7 +357,11 @@ Do NOT emit any extra XML tags in your output. In particular:
- NO <thinking>...</thinking> or <thought>...</thought> blocks
- NO <scratchpad>...</scratchpad> or <reasoning>...</reasoning> blocks
- NO <answer>...</answer> or <response>...</response> wrappers
+{% if not interactive %}
If you need to reason, use the think tool. Your raw output must contain ONLY the tool call — no surrounding XML tags.
+{% else %}
+If you need to reason, use the think tool. When using tools, do not add surrounding XML tags.
+{% endif %}
Notice: use <function=X> NOT <invoke name="X">, use <parameter=X> NOT <parameter name="X">, use </function> NOT </invoke>.
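The `{% if interactive %}` branches above can be exercised with a small jinja2 sketch (a simplified stand-in for system_prompt.jinja, not the real template):

```python
from jinja2 import Environment

# Simplified stand-in for system_prompt.jinja's interactive branch:
# the `interactive` flag flips which communication rules are rendered.
env = Environment()
tmpl = env.from_string(
    "{% if interactive %}"
    "INTERACTIVE BEHAVIOR: text-only replies pause for user input"
    "{% else %}"
    "AUTONOMOUS BEHAVIOR: every message must be a tool call"
    "{% endif %}"
)

interactive_prompt = tmpl.render(interactive=True)
autonomous_prompt = tmpl.render(interactive=False)
```

Rendering the same template with the flag flipped yields mutually exclusive rule sets, which is how one prompt file serves both modes.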

View File

@@ -56,7 +56,6 @@ class BaseAgent(metaclass=AgentMeta):
self.config = config
self.local_sources = config.get("local_sources", [])
-self.non_interactive = config.get("non_interactive", False)
if "max_iterations" in config:
self.max_iterations = config["max_iterations"]
@@ -74,6 +73,9 @@ class BaseAgent(metaclass=AgentMeta):
max_iterations=self.max_iterations,
)
+self.interactive = getattr(self.llm_config, "interactive", False)
+if self.interactive and self.state.parent_id is None:
+self.state.waiting_timeout = 0
self.llm = LLM(self.llm_config, agent_name=self.agent_name)
with contextlib.suppress(Exception):
@@ -169,7 +171,7 @@ class BaseAgent(metaclass=AgentMeta):
continue
if self.state.should_stop():
-if self.non_interactive:
+if not self.interactive:
return self.state.final_result or {}
await self._enter_waiting_state(tracer)
continue
@@ -213,8 +215,12 @@ class BaseAgent(metaclass=AgentMeta):
should_finish = await iteration_task
self._current_task = None
+if should_finish is None and self.interactive:
+await self._enter_waiting_state(tracer, text_response=True)
+continue
if should_finish:
-if self.non_interactive:
+if not self.interactive:
self.state.set_completed({"success": True})
if tracer:
tracer.update_agent_status(self.state.agent_id, "completed")
@@ -230,7 +236,7 @@ class BaseAgent(metaclass=AgentMeta):
self.state.add_message(
"assistant", f"{partial_content}\n\n[ABORTED BY USER]"
)
-if self.non_interactive:
+if not self.interactive:
raise
await self._enter_waiting_state(tracer, error_occurred=False, was_cancelled=True)
continue
@@ -243,7 +249,7 @@ class BaseAgent(metaclass=AgentMeta):
except (RuntimeError, ValueError, TypeError) as e:
if not await self._handle_iteration_error(e, tracer):
-if self.non_interactive:
+if not self.interactive:
self.state.set_completed({"success": False, "error": str(e)})
if tracer:
tracer.update_agent_status(self.state.agent_id, "failed")
@@ -283,11 +289,14 @@ class BaseAgent(metaclass=AgentMeta):
task_completed: bool = False,
error_occurred: bool = False,
was_cancelled: bool = False,
+text_response: bool = False,
) -> None:
self.state.enter_waiting_state()
if tracer:
-if task_completed:
+if text_response:
+tracer.update_agent_status(self.state.agent_id, "waiting_for_input")
+elif task_completed:
tracer.update_agent_status(self.state.agent_id, "completed")
elif error_occurred:
tracer.update_agent_status(self.state.agent_id, "error")
@@ -296,6 +305,9 @@ class BaseAgent(metaclass=AgentMeta):
else:
tracer.update_agent_status(self.state.agent_id, "stopped")
+if text_response:
+return
if task_completed:
self.state.add_message(
"assistant",
@@ -352,7 +364,7 @@ class BaseAgent(metaclass=AgentMeta):
self.state.add_message("user", task)
-async def _process_iteration(self, tracer: Optional["Tracer"]) -> bool:
+async def _process_iteration(self, tracer: Optional["Tracer"]) -> bool | None:
final_response = None
async for response in self.llm.generate(self.state.get_conversation_history()):
@@ -398,7 +410,7 @@ class BaseAgent(metaclass=AgentMeta):
if actions:
return await self._execute_actions(actions, tracer)
-return False
+return None
async def _execute_actions(self, actions: list[Any], tracer: Optional["Tracer"]) -> bool:
"""Execute actions and return True if agent should finish."""
@@ -426,7 +438,7 @@ class BaseAgent(metaclass=AgentMeta):
self.state.set_completed({"success": True})
if tracer:
tracer.update_agent_status(self.state.agent_id, "completed")
-if self.non_interactive and self.state.parent_id is None:
+if not self.interactive and self.state.parent_id is None:
return True
return True
@@ -526,7 +538,7 @@ class BaseAgent(metaclass=AgentMeta):
error_details = error.details
self.state.add_error(error_msg)
-if self.non_interactive:
+if not self.interactive:
self.state.set_completed({"success": False, "error": error_msg})
if tracer:
tracer.update_agent_status(self.state.agent_id, "failed", error_msg)
@@ -561,7 +573,7 @@ class BaseAgent(metaclass=AgentMeta):
error_details = getattr(error, "details", None)
self.state.add_error(error_msg)
-if self.non_interactive:
+if not self.interactive:
self.state.set_completed({"success": False, "error": error_msg})
if tracer:
tracer.update_agent_status(self.state.agent_id, "failed", error_msg)

View File

@@ -25,6 +25,7 @@ class AgentState(BaseModel):
waiting_for_input: bool = False
llm_failed: bool = False
waiting_start_time: datetime | None = None
+waiting_timeout: int = 600
final_result: dict[str, Any] | None = None
max_iterations_warning_sent: bool = False
@@ -116,6 +117,9 @@ class AgentState(BaseModel):
return self.iteration >= int(self.max_iterations * threshold)
def has_waiting_timeout(self) -> bool:
+if self.waiting_timeout == 0:
+return False
if not self.waiting_for_input or not self.waiting_start_time:
return False
@@ -128,7 +132,7 @@ class AgentState(BaseModel):
return False
elapsed = (datetime.now(UTC) - self.waiting_start_time).total_seconds()
-return elapsed > 600
+return elapsed > self.waiting_timeout
def has_empty_last_messages(self, count: int = 3) -> bool:
if len(self.messages) < count:

View File

@@ -78,7 +78,6 @@ async def run_cli(args: Any) -> None: # noqa: PLR0915
agent_config = {
"llm_config": llm_config,
"max_iterations": 300,
-"non_interactive": True,
}
if getattr(args, "local_sources", None):

View File

@@ -747,7 +747,7 @@ class StrixTUIApp(App): # type: ignore[misc]
def _build_agent_config(self, args: argparse.Namespace) -> dict[str, Any]:
scan_mode = getattr(args, "scan_mode", "deep")
-llm_config = LLMConfig(scan_mode=scan_mode)
+llm_config = LLMConfig(scan_mode=scan_mode, interactive=True)
config = {
"llm_config": llm_config,

View File

@@ -11,6 +11,7 @@ class LLMConfig:
skills: list[str] | None = None,
timeout: int | None = None,
scan_mode: str = "deep",
+interactive: bool = False,
):
resolved_model, self.api_key, self.api_base = resolve_llm_config()
self.model_name = model_name or resolved_model
@@ -28,3 +29,5 @@ class LLMConfig:
self.timeout = timeout or int(Config.get("llm_timeout") or "300")
self.scan_mode = scan_mode if scan_mode in ["quick", "standard", "deep"] else "deep"
+self.interactive = interactive

View File

@@ -97,6 +97,7 @@ class LLM:
result = env.get_template("system_prompt.jinja").render(
get_tools_prompt=get_tools_prompt,
loaded_skill_names=list(skill_content.keys()),
+interactive=self.config.interactive,
**skill_content,
)
return str(result)
@@ -186,7 +187,7 @@ class LLM:
conversation_history.extend(compressed)
messages.extend(compressed)
-if messages[-1].get("role") == "assistant":
+if messages[-1].get("role") == "assistant" and not self.config.interactive:
messages.append({"role": "user", "content": "<meta>Continue the task.</meta>"})
if self._is_anthropic() and self.config.enable_prompt_caching:

View File

@@ -227,26 +227,37 @@ def create_agent(
from strix.agents.state import AgentState
from strix.llm.config import LLMConfig
-state = AgentState(task=task, agent_name=name, parent_id=parent_id, max_iterations=300)
parent_agent = _agent_instances.get(parent_id)
timeout = None
scan_mode = "deep"
+interactive = False
if parent_agent and hasattr(parent_agent, "llm_config"):
if hasattr(parent_agent.llm_config, "timeout"):
timeout = parent_agent.llm_config.timeout
if hasattr(parent_agent.llm_config, "scan_mode"):
scan_mode = parent_agent.llm_config.scan_mode
+interactive = getattr(parent_agent.llm_config, "interactive", False)
-llm_config = LLMConfig(skills=skill_list, timeout=timeout, scan_mode=scan_mode)
+state = AgentState(
+task=task,
+agent_name=name,
+parent_id=parent_id,
+max_iterations=300,
+waiting_timeout=300 if interactive else 600,
+)
+llm_config = LLMConfig(
+skills=skill_list,
+timeout=timeout,
+scan_mode=scan_mode,
+interactive=interactive,
+)
agent_config = {
"llm_config": llm_config,
"state": state,
}
-if parent_agent and hasattr(parent_agent, "non_interactive"):
-agent_config["non_interactive"] = parent_agent.non_interactive
agent = StrixAgent(agent_config)
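The inheritance rules in this final hunk reduce to a small helper (a hypothetical summary function, not part of the codebase):

```python
def child_agent_settings(parent_interactive: bool) -> dict:
    # Sub-agents inherit the parent's interactive flag. Interactive
    # sub-agents auto-resume after 300s of waiting; non-interactive
    # agents keep the 600s default. (Root interactive agents are the
    # separate case that gets waiting_timeout=0 to wait indefinitely.)
    return {
        "interactive": parent_interactive,
        "waiting_timeout": 300 if parent_interactive else 600,
    }
```

This keeps sub-agents from blocking a scan forever on user input while still letting them participate in an interactive session.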