LLM Module Tests

This directory contains comprehensive tests for the pgAdmin LLM/AI functionality.

Test Files

Python Tests

test_client.py - LLM Client Tests

Tests the core LLM client functionality including:

  • Provider initialization (Anthropic, OpenAI, Ollama)
  • API key loading from files and environment variables
  • Graceful handling of missing API keys
  • User preference overrides
  • Provider selection logic
  • Whitespace handling in API keys

Key Features:

  • Tests pass even without API keys configured
  • Mocks external API calls
  • Tests all three provider types

test_reports.py - Report Generation Tests

Tests report generation functionality including:

  • Security, performance, and design report types
  • Server, database, and schema level reports
  • Report request validation
  • Progress callback functionality
  • Error handling during generation
  • Markdown formatting

Key Features:

  • Tests data collection from PostgreSQL
  • Validates report structure
  • Tests streaming progress updates
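
The progress-callback shape these tests cover can be sketched like this (all names are illustrative, not the real report API):

```python
def generate_report(collect, render, on_progress=None):
    """Hypothetical sketch: each phase reports a label and a
    percentage before running, so callers can stream progress."""
    if on_progress:
        on_progress('collecting', 0)
    data = collect()          # e.g. queries against PostgreSQL (mocked in tests)
    if on_progress:
        on_progress('rendering', 50)
    report = render(data)     # e.g. Markdown formatting of the findings
    if on_progress:
        on_progress('done', 100)
    return report


events = []
report = generate_report(
    collect=lambda: {'tables': 3},
    render=lambda d: '# Report\n\n%d tables scanned' % d['tables'],
    on_progress=lambda label, pct: events.append((label, pct)),
)
assert events == [('collecting', 0), ('rendering', 50), ('done', 100)]
```

A test can then assert on both the rendered output and the recorded progress events without touching a real database or LLM.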

test_chat.py - Chat Session Tests

Tests interactive chat functionality including:

  • Chat session initialization
  • Message history management
  • Context passing (database, SQL queries)
  • Streaming responses
  • Token counting for context management
  • Maximum history limits
  • Error handling

Key Features:

  • Tests conversation flow
  • Validates context integration
  • Tests memory management
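
The maximum-history-limit behavior can be sketched as below; `trim_history` is a hypothetical simplification, not the actual chat module:

```python
def trim_history(messages, max_messages=20):
    """Hypothetical sketch of a maximum-history limit: keep the first
    message (the original question/context) plus the most recent ones,
    capped at max_messages total."""
    if len(messages) <= max_messages:
        return list(messages)
    return [messages[0]] + messages[-(max_messages - 1):]


history = [{'role': 'user', 'content': 'msg %d' % i} for i in range(50)]
trimmed = trim_history(history, max_messages=10)
assert len(trimmed) == 10
assert trimmed[0] == history[0]      # first message survives
assert trimmed[1:] == history[-9:]   # plus the 9 most recent
```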

test_compaction.py - Conversation Compaction Tests

Tests the conversation history compaction module including:

  • Token estimation with provider-specific ratios
  • SQL content token multiplier
  • History compaction with token budget enforcement
  • First message and recent window preservation
  • Low-value message dropping by importance classification
  • Tool call/result pair integrity during compaction
  • History deserialization from frontend JSON format
  • Conversational message filtering (stripping tool internals)

Key Features:

  • Tests all five importance classification tiers
  • Validates tool pair preservation (no orphaned tool results)
  • Tests round-trip serialization/deserialization
  • Tests edge cases (empty history, within-budget, unknown roles)

test_views.py - API Endpoint Tests

Tests Flask endpoints including:

  • /llm/status - LLM availability check
  • /llm/reports/security/* - Security report endpoints
  • /llm/reports/performance/* - Performance report endpoints
  • /llm/reports/design/* - Design review endpoints
  • /llm/chat - Chat endpoint
  • Streaming endpoints with SSE

Key Features:

  • Tests authentication and permissions
  • Tests API error responses
  • Tests SSE streaming format
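
One subtlety these streaming-endpoint tests must respect: Flask builds streaming responses around a generator, so backend side effects only happen when the body is iterated, and asserting mock calls before consuming the response fails spuriously. A stdlib-only sketch of the pitfall (the `backend` mock and `sse_stream` generator are illustrative):

```python
from unittest import mock

backend = mock.Mock(return_value=['data: one\n\n', 'data: two\n\n'])

def sse_stream():
    """Hypothetical streaming view body: nothing below runs until the
    generator is iterated, which is how Flask streams responses."""
    for chunk in backend():
        yield chunk

resp = sse_stream()          # building the generator calls nothing yet
backend.assert_not_called()  # so mock assertions here would be premature

body = ''.join(resp)         # consuming the stream drives the generator
backend.assert_called_once()
assert 'data: one' in body
```

Hence the rule in these tests: drain the SSE response data first, then assert on the mocks.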

JavaScript Tests

AIReport.spec.js - AIReport Component Tests

Tests the React component for AI report display including:

  • Component rendering in light and dark modes
  • Theme detection from body styles
  • Progress display during generation
  • Error handling
  • Markdown rendering
  • Download functionality
  • SSE event handling
  • Support for all report categories and types

Key Features:

  • Tests with React Testing Library
  • Mocks EventSource for SSE
  • Tests theme transitions
  • Validates accessibility

Running the Tests

Python Tests

From the web directory:

# Run all LLM tests
python -m pytest pgadmin/llm/tests/

# Run specific test file
python -m pytest pgadmin/llm/tests/test_client.py

# Run specific test case
python -m pytest pgadmin/llm/tests/test_client.py::LLMClientTestCase::test_anthropic_provider_with_api_key

# Run with coverage
python -m pytest --cov=pgadmin/llm pgadmin/llm/tests/

JavaScript Tests

From the web directory:

# Run all JavaScript tests
yarn run test:karma

# Run specific test file
yarn run test:karma -- --file regression/javascript/llm/AIReport.spec.js

Test Coverage

What's Tested

  • LLM client initialization with all providers
  • API key loading from files and environment
  • Graceful handling of missing API keys
  • User preference overrides
  • Report generation for all categories (security, performance, design)
  • Report generation for all levels (server, database, schema)
  • Chat session management and history
  • Conversation history compaction and token budgets
  • Conversational message filtering
  • History serialization/deserialization round-trip
  • Streaming progress updates via SSE
  • API endpoint authentication and authorization
  • React component rendering in both themes
  • Dark mode text color detection
  • Error handling throughout the stack

What's Mocked

  • External LLM API calls (Anthropic, OpenAI, Ollama)
  • PostgreSQL database connections
  • File system access for API keys
  • EventSource for SSE streaming
  • Theme detection (window.getComputedStyle)

Environment Variables for Testing

These environment variables can be set for integration testing with real APIs:

# For Anthropic
export ANTHROPIC_API_KEY="your-api-key"

# For OpenAI
export OPENAI_API_KEY="your-api-key"

# For Ollama
export OLLAMA_API_URL="http://localhost:11434"

Note: Tests are designed to pass without these variables set. They will mock API responses when keys are not available.

Test Philosophy

  1. Graceful Degradation: All tests pass even without API keys configured
  2. Mocking by Default: External APIs are mocked to avoid dependencies
  3. Comprehensive Coverage: Tests cover happy paths, error cases, and edge cases
  4. Documentation: Tests serve as documentation for expected behavior
  5. Integration Ready: Tests can be run with real APIs when keys are provided

Adding New Tests

When adding new functionality to the LLM module:

  1. Add unit tests to the appropriate test file
  2. Mock external dependencies
  3. Test both success and failure cases
  4. Test with and without API keys/configuration
  5. Update this README with new test coverage

Troubleshooting

Common Issues

Import errors: Make sure you're running tests from the web directory

API key warnings: These are expected - tests should pass without API keys

Theme mocking errors: Ensure fake_theme.js is available in regression/javascript/

EventSource not found: EventSource is mocked in the JavaScript tests; ensure the mocks are properly set up