Commit Graph

15 Commits (sql-difference-fn)

Author SHA1 Message Date
Jason Stirnaman c2093c8212
Feature: Generate documentation in LLM-friendly Markdown (#6555)
* feat(llms): LLM-friendly Markdown, ChatGPT and Claude links.

This enables LLM-friendly documentation for entire sections,
allowing users to copy complete documentation sections with a single click.

Lambda@Edge now generates .md files on-demand with:
- Evaluated Hugo shortcodes
- Proper YAML frontmatter with product metadata
- Clean markdown without UI elements
- Section aggregation (parent + children in single file)

The llms.txt files are now generated automatically during build from
content structure and product metadata in data/products.yml, eliminating
the need for hardcoded files and ensuring maintainability.

**Testing**:
- Automated markdown generation in test setup via cy.exec()
- Implement dynamic content validation that extracts HTML content and
  verifies it appears in markdown version

**Documentation**:
Documents LLM-friendly markdown generation

**Details**:
Add gzip decompression for S3 HTML files in Lambda markdown generator

HTML files stored in S3 are gzip-compressed but the Lambda was attempting
to parse compressed data as UTF-8, causing JSDOM to fail to find article
elements. This resulted in 404 errors for .md and .section.md requests.

- Add zlib gunzip decompression in s3-utils.js fetchHtmlFromS3()
- Detect gzip via ContentEncoding header or magic bytes (0x1f 0x8b)
- Add configurable DEBUG constant for verbose logging
- Add debug logging for buffer sizes and decompression in both files

The decompression adds ~1-5ms per request but is necessary to parse
HTML correctly. CloudFront caching minimizes Lambda invocations.

Await async markdown conversion functions

The convertToMarkdown and convertSectionToMarkdown functions are async
but weren't being awaited, causing the Lambda to return a Promise object
instead of a string. This resulted in CloudFront validation errors:
"The body is not a string, is not an object, or exceeds the maximum size"

**Troubleshooting**:

- Set DEBUG for troubleshooting in lambda

* feat(llms): Add build-time LLM-friendly Markdown generation

Implements static Markdown generation during Hugo build.

**Key Features:**
- Two-phase generation: HTML→MD (memory-bounded), MD→sections (fast)
- Automatic redirect detection via file size check (skips Hugo aliases)
- Product detection using compiled TypeScript product-mappings module
- Token estimation for LLM context planning (4 chars/token heuristic)
- YAML serialization with description sanitization

**Performance:**
- ~105 seconds for 5,000 pages + 500 sections
- ~300MB peak memory (safe for 2GB CircleCI environment)
- 23 files/sec conversion rate with controlled concurrency

**Configuration Parameters:**
- MIN_HTML_SIZE_BYTES (default: 1024) - Skip files below threshold
- CHARS_PER_TOKEN (default: 4) - Token estimation ratio
- Concurrency: 10 workers (CI), 20 workers (local)

**Output:**
- Single pages: public/*/index.md (with frontmatter + content)
- Section bundles: public/*/index.section.md (aggregated child pages)

**Files Changed:**
- scripts/build-llm-markdown.js (new) - Main build script
- scripts/lib/markdown-converter.cjs (renamed from .js) - Core conversion
- scripts/html-to-markdown.js - Updated import path
- package.json - Updated exports for .cjs module

Related: Replaces Lambda@Edge on-demand generation (5s response time)
with build-time static generation for production deployment.

feat(deploy): Add staging deployment workflow and update CI

Integrates LLM markdown generation into deployment workflows with
a complete staging deployment solution.

**CircleCI Updates:**
- Switch from legacy html-to-markdown.js to optimized build:md
- 2x performance improvement (105s vs 200s+ for 5000 pages)
- Better memory management (300MB vs variable)
- Enables section bundle generation (index.section.md files)

**Staging Deployment:**
- New scripts/deploy-staging.sh for local staging deploys
- Complete workflow: Hugo build → markdown gen → S3 upload
- Environment variable driven configuration
- Optional step skipping for faster iteration
- CloudFront cache invalidation support

**NPM Scripts:**
- Added deploy:staging command for convenience
- Wraps deploy-staging.sh script

**Documentation:**
- Updated DOCS-DEPLOYING.md with comprehensive guide
- Merged staging/production workflows with Lambda@Edge docs
- Build-time generation now primary, Lambda@Edge fallback
- Troubleshooting section with common issues
- Environment variable reference
- Performance metrics and optimization tips

**Benefits:**
- Manual staging validation before production
- Consistent markdown generation across environments
- Faster CI builds with optimized script
- Better error handling and progress reporting
- Section aggregation for improved LLM context

**Usage:**
```bash
export STAGING_BUCKET="test2.docs.influxdata.com"
export AWS_REGION="us-east-1"
export STAGING_CF_DISTRIBUTION_ID="E1XXXXXXXXXX"

yarn deploy:staging
```

Related: Completes build-time markdown generation implementation

refactor: Remove Lambda@Edge implementation

Build-time markdown generation has replaced Lambda@Edge on-demand
generation as the primary method. Removed Lambda code and updated
documentation to focus on build-time generation and testing.

Removed:
- deploy/llm-markdown/ directory (Lambda@Edge code)
- Lambda@Edge section from DOCS-DEPLOYING.md

Added:
- Testing and Validation section in DOCS-DEPLOYING.md
- Focus on build-time generation workflow

* feat: Add Rust HTML-to-Markdown prototype

Implements core markdown-converter.cjs functions in Rust for performance comparison.

Performance results:
- Rust: ~257 files/sec (10× faster)
- JavaScript: ~25 files/sec average

Recommendation: Keep JavaScript for now, implement incremental builds first.
Rust migration provides 10× speedup but requires 3-4 weeks integration effort.

Files:
- Cargo.toml: Rust dependencies (html2md, scraper, serde_yaml, clap)
- src/main.rs: Core conversion logic + CLI benchmark tool
- benchmark-comparison.js: Side-by-side performance testing
- README.md: Comprehensive findings and recommendations

* fix(ui): improve dropdown positioning on viewport resize

- Ensure dropdown stays within viewport bounds (min 8px padding)
- Reposition dropdown on window resize and scroll events
- Clean up event listeners when dropdown closes

* chore(deps): add remark and unified packages for markdown processing

Add remark-parse, remark-frontmatter, remark-gfm, and unified for
enhanced markdown processing capabilities.

* fix(edge): add return to prevent trailing-slash redirect for valid extensions

Without the return statement, the Lambda@Edge function would continue
executing after the callback, eventually hitting the trailing-slash
redirect logic. This caused .md files to redirect to URLs with trailing
slashes, which returned 404 from S3.

* fix(md): add built-in product mappings and full URL support

- Add URL_PATTERN_MAP and PRODUCT_NAME_MAP constants directly in the
  CommonJS module (ESM product-mappings.js cannot be require()'d)
- Update generateFrontmatter() to accept baseUrl parameter and construct
  full URLs for the frontmatter url field
- Update generateSectionFrontmatter() similarly for section pages
- Update all call sites to pass baseUrl parameter

This fixes empty product fields and relative URLs in generated markdown
frontmatter when served via Lambda@Edge.

* feat(md): add environment flag for base URL control

Add -e, --env flag to html-to-markdown.js to control the base URL
in generated markdown frontmatter. This matches Hugo's -e flag behavior
and allows generating markdown with staging or production URLs.

Also update build-llm-markdown.js with similar environment support.

* feat(md): add Rust markdown converter and improve validation

- Add Rust-based HTML-to-Markdown converter with NAPI-RS bindings
- Update Cypress markdown validation tests
- Update deploy-staging.sh with force upload flag

* deploy-staging.sh:
  - Defaults STAGING_URL to https://test2.docs.influxdata.com
  if not set
  - Exports it so yarn build:md -e staging can use it
  - Displays it in the summary

* Delete scripts/prototypes/rust-markdown/benchmark-comparison.js

* Delete scripts/prototypes directory

* fix(llms): Include full URL for section page Markdown and list of child pages

* feat(llms): clarify format selector text for AI use case

Update button and dropdown text to make the AI/LLM purpose clearer:
- Button: "Copy page for AI" / "Copy section for AI"
- Sublabel: "Clean Markdown optimized for AI assistants"
- Section sublabel: "{N} pages combined as clean Markdown for AI assistants"

Cypress tests updated and passing (13/13).

---------

Co-authored-by: Scott Anderson <scott@influxdata.com>
2025-12-01 12:32:28 -06:00
Jason Stirnaman e93e78be0a feat(influxdb): Version detector shortcode triggers a modal
Creates an interactive InfluxDB version detector component in TypeScript and a shortcode that generates a button to trigger
the version detector modal.
The shortcode takes a parameter that displays a predefined set of links for results.
- Support URL pattern matching and ping header analysis
- Add questionnaire-based product identification logic
- Adds the shortcode in a note in /influxdb3/core/visualize-data/grafana/
- Set up TypeScript configuration for the project
  - Configure automatic TypeScript compilation in pre-commit hooks
- Add to Grafana documentation pages
- Remove last remnants of old Cypress link checker
- Add Cypress tests, but many are still broken

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestions from code review

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

Update layouts/shortcodes/influxdb-version-detector.html

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

Update assets/js/influxdb-version-detector.ts

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

Update assets/styles/components/_influxdb-version-detector.scss

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

Fixes:
- Fix Hugo template to include product names in detector config
- Change elimination scores from -100 to -1000 for proper filtering
- Add scoring logic for generic "InfluxDB" product (OSS v2.x)
- Exclude generic "InfluxDB" from results (too vague)
- Add comprehensive test scenario checklist to Cypress tests
- Free license now correctly excludes Enterprise, Clustered, Dedicated
- Self-hosted now correctly excludes all Cloud products
- SQL language now correctly excludes v1 and v2 products
- Results now show only specific products (OSS 1.x, OSS 2.x, etc.)
Changes:
- When users answer "I'm not sure" to all questions, show a helpful
message directing them to the reference table instead of showing
a weak ranking with low confidence.
- Detect when all questionnaire answers are "unknown"
- Display custom message explaining lack of information
- Auto-expand reference table for easy product identification
- Hide ranked results when insufficient information provided
- Make product names clickable in the quick reference table to allow
users to quickly navigate to product documentation after identifying
their InfluxDB version.
2025-09-30 19:01:21 -05:00
Jason Stirnaman 717ec5cd1d chore(js): Extract Hugo params imports to single-purpose modules, fix environment-specific Hugo configs, use Hugo environments instead of specifying the config file, configure source maps and ESM for development and testing 2025-06-09 16:46:26 -05:00
Jason Stirnaman cd087af85d chore(js): Format fixes 2025-06-09 14:42:38 -05:00
Jason Stirnaman 4973026adf Merge pull request #6079 from influxdata/chore-js-refactor-footer-scripts-modules
Chore: JavaScript: refactor footer scripts modules
2025-06-09 14:40:37 -05:00
Jason Stirnaman f819f48de9
Revert "Chore: JavaScript: refactor footer scripts modules" 2025-06-05 09:46:37 -05:00
Jason Stirnaman 4bbf1d7094 Improve DocSearch loading
1. Refactor JavaScript from search.html into a new DocSearch Component Module. The component needs to include: - Asynchronous loading of Algolia docsearch.min.js - Proper Algolia search facets:

  - Nested arrays for AND conditions: In Algolia, filters in the same array are combined with OR, while arrays in an array are combined with AND
  - Space after colon: Algolia's parser expects this specific format
2. Update Search HTML Template: refactor the search.html to use the new component:

3. Register the Component in main.js: update main.js to import and register our new DocSearch component:

Improve SearchInteractions event handler coordination with DocSearch:

- Improve SearchInteractions to dynamically detect the dropdown added by Algolia DocSearch and ensure that blur and focus event handlers are registered after the dropdown loads.

Benefits of This Approach:

Asynchronous Loading: DocSearch.js is now loaded asynchronously, improving page load performance
Search functionality is encapsulated as a component
Clean Separation: HTML template only contains configuration data, not implementation
Event Communication: Components can react to DocSearch initialization via events
SearchInteractions dynamically detects dropdown: Instead of waiting for an event or using setTimeout, we actively observe DOM changes to detect when the dropdown is created.
Nested Observation: A second observer monitors changes to the dropdown itself, allowing code to respond to style changes.
Better Event Handling: The blur handler checks if the related target is inside the search components, preventing unwanted dropdown closing.
Resource Cleanup: The cleanup function disconnects all observers, preventing memory leaks.
Additional Interaction Control: Adds an event listener to the dropdown to prevent unwanted blur events when interacting with search results.
2025-06-02 13:05:02 -05:00
Jason Stirnaman f7a5c53ecf WIP: removed remaining scripts from footer/javascript.html. Migrated to components and utilities. Conditionally imports Mermaid if page has a diagram. 2025-06-02 12:51:52 -05:00
Jason Stirnaman ccf74df759 WIP: Refactored all scripts from footer.js into components. Removed footer.js. Exports jQuery ($) to a window global for use by other scripts. Scripts bundled into footer.js didn't reliably load after jQuery. 2025-06-02 12:51:52 -05:00
Jason Stirnaman 4cfff239f3 End-to-end testing, CI script, and JavaScript QoL improvements:
- **Environment variable formatting** - Updated environment variable configuration from array format to object format to comply with Lefthook schema validation requirements.
- **Unified link testing** - Consolidated multiple product-specific link testing commands into a single `e2e-links` command that processes all staged Markdown and HTML files across content directories.
- **Package script integration** - Modified commands to use centralized yarn scripts instead of direct execution, improving maintainability and consistency.
- **Source information extraction** - Enhanced to correctly extract and report source information from frontmatter.
- **URL and source mapping** - Improved handling of URL to source path mapping for better reporting.
- **Ignored anchor links configuration** - Added proper exclusion of behavior-triggering anchor links (like tab navigation) to prevent false positives.
- **Request options correction** - Fixed Cypress request options to ensure `failOnStatusCode` is properly set when `retryOnStatusCodeFailure` is enabled.
- **Improved error reporting** - Enhanced error reporting with more context about broken links.
- **New test scripts added** - Added centralized testing scripts for link checking and codeblock validation.
- **Product-specific test commands** - Added commands for each product version (InfluxDB v2, v3 Core, Enterprise, Cloud, etc.).
- **API docs testing** - Added specialized commands for testing API documentation links.
- **Comprehensive test runners** - Added commands to run all tests of a specific type (`test:links:all`, `test:codeblocks:all`).
- Fix Docker build command and update CONTRIBUTING.

chore(js): JavaScript QoL improvements:

- Refactor main.js with a componentRegistry object and clear initialization of components and globals
- Add a standard index.js with all necessary exports.
- Update javascript.html to use the index.js
- Remove jQuery script tag from header javascript.html (remains in footer)
- Update package file to improve module discovery.
- Improve Hugo and ESLint config for module discovery and ES6 syntax
2025-05-19 09:50:33 -05:00
Jason Stirnaman bf7b395b06 chore(code-placeholders): Support path-like patterns:
- Assigns the match expression to a data attribute (data-id) to use in eleement selection. Previously, it assigned the regex expression to input.id and used the ID to select the element when updating the placeholder value. A regex that starts with a slash causes an error; input#/path/to/match isn't a valid id.
- Supports placeholders like '\/path\/to\/file.parquet'
- Refactors code-placeholders to the component pattern.
2025-04-23 10:27:20 -05:00
Jason Stirnaman ae054efed7
Revert "chore(code-placeholders): Support path-like patterns:" 2025-04-23 10:21:02 -05:00
Jason Stirnaman 6287e0aa76 chore(code-placeholders): Support path-like patterns:
- Assigns the match expression to a data attribute (data-id) to use in eleement selection. Previously, it assigned the regex expression to input.id and used the ID to select the element when updating the placeholder value. A regex that starts with a slash causes an error; input#/path/to/match isn't a valid id.
- Supports placeholders like '\/path\/to\/file.parquet'
- Refactors code-placeholders to the component pattern.
2025-04-23 10:14:45 -05:00
Jason Stirnaman 5a8571915d fix(flux-influxdb-versions): Fixes modal trigger for flux-influxdb-versions and refactors the trigger to a component. 2025-03-27 20:10:13 -05:00
Jason Stirnaman f9f81ae8f9 Initial Kapa.ai chat integration.
Continue refactoring JavaScript into a component pattern and ESM.
Replaces some jQuery with native DOM API.

chore(ai): reference documentation and instructions for training AI

chore(ai): implement Kapa AI chat widget
- Move script tag to HTML template to make it obvious.
- Cleanup javascript to make it more component-like
- Set Kapa attributes, support setting userid

chore(js): add JS dependencies, previously referenced in script tags, to package.json for JS builds.

fix(api): indents

chore(js): package Mermaid diagram library

chore(js): refactor JS for AIChat and Theme as examples of using the component pattern for HTML/CSS/JS

chore(js): Use the new local-storage API in refactored module code and in code not yet ported. Cleanup syntax in local-storage and make functions available from window.LocalStorageAPI.

fix(js): theme.js name-change

chore(js): fix ai-chat.js file name

fix(js): refactor:
- componentNames are snakecase in HTML
- replace DOM selection method and jQuery eventhandler assignment
- remove old theme.js references

chore(ai): configure chat window overlay, size, and position:
- removes overlay and scroll lock
- positions chat to the right and bottom
- expands sample question width to 12 cols

chore(ai): edit disclaimer

fix(ai): size and position

chore(js): make ai-chat specific to configuration and and setting userid (for testing and future use).

fix(js): copy referrerHost variable to v3-wayfinding instead of relying on influxdb-url to assign it.

chore(ai): add a footer div at page bottom to contain modal triggers for custom-time and ask-ai. Still needs some CSS help. Moves tooltip text from CSS to HTML data attribute.

chore(ai): dynamically load AI script tag after DOMContentLoaded to avoid race conditions. Call initialization from the modal trigger module and pass the show trigger function to the onload handler.

fix(ai): fix modal triggers to viewport

fix(modal-triggers): stack the triggers into a single column.

restyle footer widgets

updated time selector modal to use correct storage term

minor style update

WIP(ai-chat): get product data

chore(js): Factor out pageContext module from influxdb-url.js

chore(js): Refactor helpers.js out of inflluxdb-url.js

WIP: refactor influxdburl - minimal changes for module conversions

feat(ai): Custom AI chat example questions product and version.
Ask AI example questions:
- Adds support for customizing example Ask AI questions per product or version.
- Configure questions in site `data/products.yml`; otherwise, it uses default questions from `ask-ai.js`

Context, page, and product data:
- Adds sample URLs for remaining versions in influxdb_urls
- `page-context.js` consolidates and exports constants for page context (protocol, host, path, referrer) and path-to-data mappings for product and influxdb_url site data

Module refactor:
- Refactors some JavaScript into ES6 modules, and refactors some of those further into a Component pattern--just vanilla JS and no shadow DOM stuff. The Component pattern that uses data attributes to "bind" JavaScript modules with CSS and HTML is a popular approach in modern web development. This pattern enhances modularity, reusability, and maintainability by associating behavior (JavaScript), structure (HTML), and style (CSS) through the use of data attributes.
- `assets/main.js` is the entrypoint
- Passes pageParams from the Hugo page to modules that import `@params`.
- Moves most external dependencies out of `script` tags and into package.json to be managed with `yarn`.
- Adds `eslint`.
- For modules that aren't yet components, wraps execution statements inside an `initialize()` function and calls the function from `main.js` on `DOMContentLoaded`.
- For components, if the page contains the `data-component="<component-name>"`, the matching element is passed to the component function on `DOMContentLoaded`.
- I tried to avoid changing logic where it wasn't necessary.

Update DOC_GPT_PROFILE.md

customize ai chat modal styles

fix(influxdb-url): Rename to cloud_dedicated in influxdb_urls.yml, remove newly added placeholder URL and use the extant default, refactor
- Rename  to  in influxdb_urls.yml
- Fix influxdb-url.js and data provision in local-storage.js to use the new name, mapping it to  to retain the existing local storage key

chore(api-lib): Use local-storage import instead of window global

chore(js): cleanup

fix(js): Ensure feature-callout initializes on page load

fix(theme): Load preferred theme before making the page visible. Execute a predefined function by specifying the function name in data-theme-callback

fix(search-toggle): Restores toggling the search field when sidebar is collapsed. Moves the event handler to a new search-button component

fix(ai): Fix custom attribute assignment. Rename property to ai_example_questions

Include the word `Bearer` or `Token`, a space, and your **token** value (all case-sensitive). Fix TOC links.
Fixes #5781

fix(api-docs): Update API reference directories and generation script for influxdb3 URL paths, update links and names in reference content

fix(api-ref): Update getswagger.sh destination paths to use the new directory structure when fetching spec files. Update the redocly  plugin module path.

hotfix: fix hlevel bug in children shortcode

Remove underline from custom time widget

add color to custom time widget styling
2025-02-03 08:31:55 -06:00