* Address CodeRabbit review feedback for chat context and compaction.
- Track tool-use turns as groups instead of one-to-one pairs, so
multi-tool assistant messages don't leave orphaned results.
- Add fallback to shrink the recent window when protected messages
alone exceed the token budget, preventing compaction no-ops.
- Fix low-value test fixtures to keep transient messages short so
they actually classify as low-importance.
- Guard Clear button against in-flight stream race conditions by
adding a clearedRef flag and cancelling active streams.
- Assert that conversation history is actually passed through to
chat_with_database in the "With History" test.
* Address remaining CodeRabbit review feedback for compaction module.
- Expand protected set to cover full tool groups, preventing orphaned
tool call/result messages when a turn straddles the recent window.
- Add input validation in deserialize_history() for non-list/non-dict data.
- Strengthen test assertion for preserved recent window tail.
* Fix CI test failures in compaction and NLQ chat tests.
- Lower max_tokens budget in test_drops_low_value to reliably force
compaction (500 was borderline, use 200).
- Consume SSE response data before asserting mock calls in NLQ chat
test, since Flask's streaming generator only executes on iteration.
* Clarify mock patch target in NLQ chat test.
Add comment explaining why we patch the source module rather than the
use site: the endpoint uses a local import inside the function body,
so there is no module-level binding to patch.