fix(screentracker): make writeStabilize Phase 1 non-fatal when agents don't echo input#208
fix(screentracker): make writeStabilize Phase 1 non-fatal when agents don't echo input#208
Conversation
…agents don't echo input Phase 1 of writeStabilize waited 15s for the screen to change after writing message text (echo detection), returning HTTP 500 if it didn't. Many TUI agents using bracketed paste don't echo input until Enter is pressed, causing every message send to fail. Phase 1 timeout is now non-fatal (2s) — if the screen doesn't change, we log at Info level and proceed to Phase 2 (send carriage return). Phase 2 remains the real indicator of agent responsiveness. Key changes: - Guard non-fatal path with errors.Is(err, util.WaitTimedOut) so context cancellation still propagates as a fatal error - Reduce Phase 1 timeout from 15s to 2s (echo is near-instant) - Extract named constants for both timeouts - Add tests for no-echo-success and no-echo-no-react-failure - Add documentation test for TUI selection prompt ESC limitation Closes #123
|
✅ Preview binaries are ready! To test with modules: |
…tests Restructure test comments to follow Cucumber-style Given/When/Then pattern for clarity. Also fix send-no-echo-agent-reacts assertion to scan for the user message instead of assuming it's the last message in the conversation (the snapshot loop may append an agent message after Send returns).
mafredri
left a comment
There was a problem hiding this comment.
Clean design. Making Phase 1 non-fatal for WaitTimedOut while preserving context cancellation as fatal is the right call. The errors.Is guard, extracted constants, and 2s timeout are all well-calibrated. Two P2 findings (missing test coverage for the key invariant, gofmt failure), two P3s (doc accuracy, test layering), and a handful of notes.
"Oh, this test suite looks lovely! Fifty rows, full coverage, green across the board. It's fake. Every row hits the same code path. You dressed up one test in fifty outfits." -- Bisky, on a different test. The new tests here are mostly genuine.
Severity count: 0 P0, 0 P1, 2 P2, 2 P3, 3 Notes.
🤖 This review was automatically generated with Coder Agents.
- Add send-message-no-echo-context-cancelled test: verifies the errors.Is(WaitTimedOut) guard by cancelling context during Phase 1 and asserting context.Canceled propagates (P2) - Fix gofmt: correct indentation, proper brace placement (P2) - Fix constant comment: describe WaitFor timeout semantics accurately, note 1s stability check can extend past timeout, add TODO tag (P3) - Drop send-tui-selection-esc-cancels test from screentracker, add ESC limitation comment to formatPaste in claude.go instead (P3) - Shorten log message to match codebase style (Note) - Rename tests to send-message-* prefix, use newConversation helper with opts callbacks (Note)
The test had a race: advanceFor could complete before the Send() goroutine enqueued, so the stableSignal never fired, and sendCancel ran while the message was still queued (never reaching writeStabilize). Fix: use onWrite callback as a synchronization point. advanceUntil waits for writeStabilize to start writing (onWrite fires), then cancels. This guarantees Phase 1 WaitFor is running when ctx is cancelled, and its sleep select sees ctx.Done() immediately.
There was a problem hiding this comment.
Pull request overview
This PR updates the screentracker PTY conversation send pipeline to avoid failing requests when an agent doesn’t echo typed input during writeStabilize Phase 1 (echo detection), and adds tests to codify the new behavior.
Changes:
- Make
writeStabilizePhase 1 (echo detection) timeout non-fatal and proceed to Phase 2 (processing detection), while still propagating context cancellation. - Reduce Phase 1 timeout and extract Phase 1/2 timeouts into constants.
- Add new test coverage for non-echoing agents (reacting vs unresponsive) and context-cancellation behavior; add clarifying documentation comment about bracketed paste and TUI selection cancellation.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| lib/screentracker/pty_conversation.go | Makes echo-detection timeout non-fatal, adds timeouts as constants, and adjusts logging/error handling. |
| lib/screentracker/pty_conversation_test.go | Adds tests for non-echoing agents, unresponsive agents, and context cancellation during Phase 1. |
| lib/httpapi/claude.go | Documents bracketed-paste ESC interaction and suggests MessageTypeRaw for TUI selection prompts. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Fixes #123.
Changes
writeStabilizenon-fatal on timeout — proceed to Phase 2 instead of returning HTTP 500errors.Is(err, util.WaitTimedOut)so context cancellation still propagateswriteStabilizeEchoTimeoutandwriteStabilizeProcessTimeoutconstantssend-message-no-echo-agent-reactstest: agent does not echo but reacts to Enter → successsend-message-no-echo-no-reacttest: agent is unresponsive → error from Phase 2send-message-no-echo-context-cancelledtest: context cancellation during Phase 1 propagates as fatal (validateserrors.Isguard)formatPasteinclaude.godocumenting the ESC limitation with TUI selection promptsKnown limitation
For TUI selection prompts (numbered/arrow-key lists), this fix eliminates the 500 but does not deliver the correct selection — the
\x1b(ESC) in bracketed paste cancels the selection widget. The correct approach isMessageTypeRaw. Documented via a comment onformatPasteinlib/httpapi/claude.go.Also discovered a separate issue during smoke-testing: #209
Implementation plan and decision log
Root cause
writeStabilizePhase 1 assumes the screen will change after writing message text (echo detection). TUI agents using bracketed paste buffer input internally and do not render until Enter. Phase 1 waited 15s for a change that never came → timeout → HTTP 500.Key decisions
WaitTimedOutctx.Err()must still propagate — otherwise context cancellation logs a misleading warning and writes a spurious\rtimeoutfield carries the duration.Behavioral changes