* feat(llms): LLM-friendly Markdown, ChatGPT and Claude links.
This enables LLM-friendly documentation for entire sections,
allowing users to copy complete documentation sections with a single click.
Lambda@Edge now generates .md files on-demand with:
- Evaluated Hugo shortcodes
- Proper YAML frontmatter with product metadata
- Clean markdown without UI elements
- Section aggregation (parent + children in single file)
The llms.txt files are now generated automatically during build from
content structure and product metadata in data/products.yml, eliminating
the need for hardcoded files and ensuring maintainability.
**Testing**:
- Automated markdown generation in test setup via cy.exec()
- Implement dynamic content validation that extracts HTML content and
verifies it appears in markdown version
**Documentation**:
Documents LLM-friendly markdown generation
**Details**:
Add gzip decompression for S3 HTML files in Lambda markdown generator
HTML files stored in S3 are gzip-compressed but the Lambda was attempting
to parse compressed data as UTF-8, causing JSDOM to fail to find article
elements. This resulted in 404 errors for .md and .section.md requests.
- Add zlib gunzip decompression in s3-utils.js fetchHtmlFromS3()
- Detect gzip via ContentEncoding header or magic bytes (0x1f 0x8b)
- Add configurable DEBUG constant for verbose logging
- Add debug logging for buffer sizes and decompression in both files
The decompression adds ~1-5ms per request but is necessary to parse
HTML correctly. CloudFront caching minimizes Lambda invocations.
Await async markdown conversion functions
The convertToMarkdown and convertSectionToMarkdown functions are async
but weren't being awaited, causing the Lambda to return a Promise object
instead of a string. This resulted in CloudFront validation errors:
"The body is not a string, is not an object, or exceeds the maximum size"
**Troubleshooting**:
- Set DEBUG for troubleshooting in lambda
* feat(llms): Add build-time LLM-friendly Markdown generation
Implements static Markdown generation during Hugo build.
**Key Features:**
- Two-phase generation: HTML→MD (memory-bounded), MD→sections (fast)
- Automatic redirect detection via file size check (skips Hugo aliases)
- Product detection using compiled TypeScript product-mappings module
- Token estimation for LLM context planning (4 chars/token heuristic)
- YAML serialization with description sanitization
**Performance:**
- ~105 seconds for 5,000 pages + 500 sections
- ~300MB peak memory (safe for 2GB CircleCI environment)
- 23 files/sec conversion rate with controlled concurrency
**Configuration Parameters:**
- MIN_HTML_SIZE_BYTES (default: 1024) - Skip files below threshold
- CHARS_PER_TOKEN (default: 4) - Token estimation ratio
- Concurrency: 10 workers (CI), 20 workers (local)
**Output:**
- Single pages: public/*/index.md (with frontmatter + content)
- Section bundles: public/*/index.section.md (aggregated child pages)
**Files Changed:**
- scripts/build-llm-markdown.js (new) - Main build script
- scripts/lib/markdown-converter.cjs (renamed from .js) - Core conversion
- scripts/html-to-markdown.js - Updated import path
- package.json - Updated exports for .cjs module
Related: Replaces Lambda@Edge on-demand generation (5s response time)
with build-time static generation for production deployment.
feat(deploy): Add staging deployment workflow and update CI
Integrates LLM markdown generation into deployment workflows with
a complete staging deployment solution.
**CircleCI Updates:**
- Switch from legacy html-to-markdown.js to optimized build:md
- 2x performance improvement (105s vs 200s+ for 5000 pages)
- Better memory management (300MB vs variable)
- Enables section bundle generation (index.section.md files)
**Staging Deployment:**
- New scripts/deploy-staging.sh for local staging deploys
- Complete workflow: Hugo build → markdown gen → S3 upload
- Environment variable driven configuration
- Optional step skipping for faster iteration
- CloudFront cache invalidation support
**NPM Scripts:**
- Added deploy:staging command for convenience
- Wraps deploy-staging.sh script
**Documentation:**
- Updated DOCS-DEPLOYING.md with comprehensive guide
- Merged staging/production workflows with Lambda@Edge docs
- Build-time generation now primary, Lambda@Edge fallback
- Troubleshooting section with common issues
- Environment variable reference
- Performance metrics and optimization tips
**Benefits:**
- Manual staging validation before production
- Consistent markdown generation across environments
- Faster CI builds with optimized script
- Better error handling and progress reporting
- Section aggregation for improved LLM context
**Usage:**
```bash
export STAGING_BUCKET="test2.docs.influxdata.com"
export AWS_REGION="us-east-1"
export STAGING_CF_DISTRIBUTION_ID="E1XXXXXXXXXX"
yarn deploy:staging
```
Related: Completes build-time markdown generation implementation
refactor: Remove Lambda@Edge implementation
Build-time markdown generation has replaced Lambda@Edge on-demand
generation as the primary method. Removed Lambda code and updated
documentation to focus on build-time generation and testing.
Removed:
- deploy/llm-markdown/ directory (Lambda@Edge code)
- Lambda@Edge section from DOCS-DEPLOYING.md
Added:
- Testing and Validation section in DOCS-DEPLOYING.md
- Focus on build-time generation workflow
* feat: Add Rust HTML-to-Markdown prototype
Implements core markdown-converter.cjs functions in Rust for performance comparison.
Performance results:
- Rust: ~257 files/sec (10× faster)
- JavaScript: ~25 files/sec average
Recommendation: Keep JavaScript for now, implement incremental builds first.
Rust migration provides 10× speedup but requires 3-4 weeks integration effort.
Files:
- Cargo.toml: Rust dependencies (html2md, scraper, serde_yaml, clap)
- src/main.rs: Core conversion logic + CLI benchmark tool
- benchmark-comparison.js: Side-by-side performance testing
- README.md: Comprehensive findings and recommendations
* fix(ui): improve dropdown positioning on viewport resize
- Ensure dropdown stays within viewport bounds (min 8px padding)
- Reposition dropdown on window resize and scroll events
- Clean up event listeners when dropdown closes
* chore(deps): add remark and unified packages for markdown processing
Add remark-parse, remark-frontmatter, remark-gfm, and unified for
enhanced markdown processing capabilities.
* fix(edge): add return to prevent trailing-slash redirect for valid extensions
Without the return statement, the Lambda@Edge function would continue
executing after the callback, eventually hitting the trailing-slash
redirect logic. This caused .md files to redirect to URLs with trailing
slashes, which returned 404 from S3.
* fix(md): add built-in product mappings and full URL support
- Add URL_PATTERN_MAP and PRODUCT_NAME_MAP constants directly in the
CommonJS module (ESM product-mappings.js cannot be require()'d)
- Update generateFrontmatter() to accept baseUrl parameter and construct
full URLs for the frontmatter url field
- Update generateSectionFrontmatter() similarly for section pages
- Update all call sites to pass baseUrl parameter
This fixes empty product fields and relative URLs in generated markdown
frontmatter when served via Lambda@Edge.
* feat(md): add environment flag for base URL control
Add -e, --env flag to html-to-markdown.js to control the base URL
in generated markdown frontmatter. This matches Hugo's -e flag behavior
and allows generating markdown with staging or production URLs.
Also update build-llm-markdown.js with similar environment support.
* feat(md): add Rust markdown converter and improve validation
- Add Rust-based HTML-to-Markdown converter with NAPI-RS bindings
- Update Cypress markdown validation tests
- Update deploy-staging.sh with force upload flag
* deploy-staging.sh:
- Defaults STAGING_URL to https://test2.docs.influxdata.com
if not set
- Exports it so yarn build:md -e staging can use it
- Displays it in the summary
* Delete scripts/prototypes/rust-markdown/benchmark-comparison.js
* Delete scripts/prototypes directory
* fix(llms): Include full URL for section page Markdown and list of child pages
* feat(llms): clarify format selector text for AI use case
Update button and dropdown text to make the AI/LLM purpose clearer:
- Button: "Copy page for AI" / "Copy section for AI"
- Sublabel: "Clean Markdown optimized for AI assistants"
- Section sublabel: "{N} pages combined as clean Markdown for AI assistants"
Cypress tests updated and passing (13/13).
---------
Co-authored-by: Scott Anderson <scott@influxdata.com>
- Build resources if not cached
- Ensure node_modules dependencies are available for asset processing
- Be more precise in the template when building assets in production mode and avoid conflicts with SCSS and CSS processing.
- Ignore node_modules when loading source maps
- Add .vscode/launch.json with debugging configuration for localhost:1313. In VSCode, go to Run and select the site to launch Chrome, connect to Developer Tools, and start debugging.
- Add support for debugging in VS Code without using source maps. Adds a debug helpers module for developers to use in IDE debugging and interact with the browser console. This is a workaround for lack of good source map support with js.Build and the Hugo asset pipeline.
Background:
Hugo and js.Build don't have good support for external source maps.
Internal source mapping is also unreliable; the base64 in the source map reference for some files is too long for the browser console to keep on 1 line--remaining characters are printed on the next line, resulting in a syntax error.
- Extracted platform detection to its own reusable utility module
- Eliminates deprecated Navigator.platform API usage; uses the User-Agent Client Hints API as the primary method when available
- Handles iOS and Android devices correctly
1. Refactor JavaScript from search.html into a new DocSearch Component Module. The component needs to include: - Asynchronous loading of Algolia docsearch.min.js - Proper Algolia search facets:
- Nested arrays for AND conditions: In Algolia, filters in the same array are combined with OR, while arrays in an array are combined with AND
- Space after colon: Algolia's parser expects this specific format
2. Update Search HTML Template: refactor the search.html to use the new component:
3. Register the Component in main.js: update main.js to import and register our new DocSearch component:
Improve SearchInteractions event handler coordination with DocSearch:
- Improve SearchInteractions to dynamically detect the dropdown added by Algolia DocSearch and ensure that blur and focus event handlers are registered after the dropdown loads.
Benefits of This Approach:
Asynchronous Loading: DocSearch.js is now loaded asynchronously, improving page load performance
Search functionality is encapsulated as a component
Clean Separation: HTML template only contains configuration data, not implementation
Event Communication: Components can react to DocSearch initialization via events
SearchInteractions dynamically detects dropdown: Instead of waiting for an event or using setTimeout, we actively observe DOM changes to detect when the dropdown is created.
Nested Observation: A second observer monitors changes to the dropdown itself, allowing code to respond to style changes.
Better Event Handling: The blur handler checks if the related target is inside the search components, preventing unwanted dropdown closing.
Resource Cleanup: The cleanup function disconnects all observers, preventing memory leaks.
Additional Interaction Control: Adds an event listener to the dropdown to prevent unwanted blur events when interacting with search results.