docs-v2

Commit Graph

Author	SHA1	Message	Date
Jason Stirnaman	4cb455b1ae	feat(ci): add incremental builds and shared content-utils (#6582) - Incremental Markdown build for PRs, full build for production - Shared content-utils library for: - Mapping shared content to consuming pages (Markdown generation, Cypress) - Listing changed content pages (committed, uncommitted, staged) - Extracting source frontmatter (docs edit) - Fix CSS parsing warnings with JSDOM VirtualConsole - Remove unused imports and variables	2025-12-01 19:45:42 -05:00
Jason Stirnaman	c2093c8212	Feature: Generate documentation in LLM-friendly Markdown (#6555) * feat(llms): LLM-friendly Markdown, ChatGPT and Claude links. This enables LLM-friendly documentation for entire sections, allowing users to copy complete documentation sections with a single click. Lambda@Edge now generates .md files on-demand with: - Evaluated Hugo shortcodes - Proper YAML frontmatter with product metadata - Clean markdown without UI elements - Section aggregation (parent + children in single file) The llms.txt files are now generated automatically during build from content structure and product metadata in data/products.yml, eliminating the need for hardcoded files and ensuring maintainability. Testing: - Automated markdown generation in test setup via cy.exec() - Implement dynamic content validation that extracts HTML content and verifies it appears in markdown version Documentation: Documents LLM-friendly markdown generation Details: Add gzip decompression for S3 HTML files in Lambda markdown generator HTML files stored in S3 are gzip-compressed but the Lambda was attempting to parse compressed data as UTF-8, causing JSDOM to fail to find article elements. This resulted in 404 errors for .md and .section.md requests. - Add zlib gunzip decompression in s3-utils.js fetchHtmlFromS3() - Detect gzip via ContentEncoding header or magic bytes (0x1f 0x8b) - Add configurable DEBUG constant for verbose logging - Add debug logging for buffer sizes and decompression in both files The decompression adds ~1-5ms per request but is necessary to parse HTML correctly. CloudFront caching minimizes Lambda invocations. Await async markdown conversion functions The convertToMarkdown and convertSectionToMarkdown functions are async but weren't being awaited, causing the Lambda to return a Promise object instead of a string.
This resulted in CloudFront validation errors: "The body is not a string, is not an object, or exceeds the maximum size" Troubleshooting: - Set DEBUG for troubleshooting in lambda * feat(llms): Add build-time LLM-friendly Markdown generation Implements static Markdown generation during Hugo build. Key Features: - Two-phase generation: HTML→MD (memory-bounded), MD→sections (fast) - Automatic redirect detection via file size check (skips Hugo aliases) - Product detection using compiled TypeScript product-mappings module - Token estimation for LLM context planning (4 chars/token heuristic) - YAML serialization with description sanitization Performance: - ~105 seconds for 5,000 pages + 500 sections - ~300MB peak memory (safe for 2GB CircleCI environment) - 23 files/sec conversion rate with controlled concurrency Configuration Parameters: - MIN_HTML_SIZE_BYTES (default: 1024) - Skip files below threshold - CHARS_PER_TOKEN (default: 4) - Token estimation ratio - Concurrency: 10 workers (CI), 20 workers (local) Output: - Single pages: public//index.md (with frontmatter + content) - Section bundles: public//index.section.md (aggregated child pages) Files Changed: - scripts/build-llm-markdown.js (new) - Main build script - scripts/lib/markdown-converter.cjs (renamed from .js) - Core conversion - scripts/html-to-markdown.js - Updated import path - package.json - Updated exports for .cjs module Related: Replaces Lambda@Edge on-demand generation (5s response time) with build-time static generation for production deployment. feat(deploy): Add staging deployment workflow and update CI Integrates LLM markdown generation into deployment workflows with a complete staging deployment solution. 
CircleCI Updates: - Switch from legacy html-to-markdown.js to optimized build:md - 2x performance improvement (105s vs 200s+ for 5000 pages) - Better memory management (300MB vs variable) - Enables section bundle generation (index.section.md files) Staging Deployment: - New scripts/deploy-staging.sh for local staging deploys - Complete workflow: Hugo build → markdown gen → S3 upload - Environment variable driven configuration - Optional step skipping for faster iteration - CloudFront cache invalidation support NPM Scripts: - Added deploy:staging command for convenience - Wraps deploy-staging.sh script Documentation: - Updated DOCS-DEPLOYING.md with comprehensive guide - Merged staging/production workflows with Lambda@Edge docs - Build-time generation now primary, Lambda@Edge fallback - Troubleshooting section with common issues - Environment variable reference - Performance metrics and optimization tips Benefits: - Manual staging validation before production - Consistent markdown generation across environments - Faster CI builds with optimized script - Better error handling and progress reporting - Section aggregation for improved LLM context Usage: ```bash export STAGING_BUCKET="test2.docs.influxdata.com" export AWS_REGION="us-east-1" export STAGING_CF_DISTRIBUTION_ID="E1XXXXXXXXXX" yarn deploy:staging ``` Related: Completes build-time markdown generation implementation refactor: Remove Lambda@Edge implementation Build-time markdown generation has replaced Lambda@Edge on-demand generation as the primary method. Removed Lambda code and updated documentation to focus on build-time generation and testing. Removed: - deploy/llm-markdown/ directory (Lambda@Edge code) - Lambda@Edge section from DOCS-DEPLOYING.md Added: - Testing and Validation section in DOCS-DEPLOYING.md - Focus on build-time generation workflow * feat: Add Rust HTML-to-Markdown prototype Implements core markdown-converter.cjs functions in Rust for performance comparison. 
Performance results: - Rust: ~257 files/sec (10× faster) - JavaScript: ~25 files/sec average Recommendation: Keep JavaScript for now, implement incremental builds first. Rust migration provides 10× speedup but requires 3-4 weeks integration effort. Files: - Cargo.toml: Rust dependencies (html2md, scraper, serde_yaml, clap) - src/main.rs: Core conversion logic + CLI benchmark tool - benchmark-comparison.js: Side-by-side performance testing - README.md: Comprehensive findings and recommendations * fix(ui): improve dropdown positioning on viewport resize - Ensure dropdown stays within viewport bounds (min 8px padding) - Reposition dropdown on window resize and scroll events - Clean up event listeners when dropdown closes * chore(deps): add remark and unified packages for markdown processing Add remark-parse, remark-frontmatter, remark-gfm, and unified for enhanced markdown processing capabilities. * fix(edge): add return to prevent trailing-slash redirect for valid extensions Without the return statement, the Lambda@Edge function would continue executing after the callback, eventually hitting the trailing-slash redirect logic. This caused .md files to redirect to URLs with trailing slashes, which returned 404 from S3. * fix(md): add built-in product mappings and full URL support - Add URL_PATTERN_MAP and PRODUCT_NAME_MAP constants directly in the CommonJS module (ESM product-mappings.js cannot be require()'d) - Update generateFrontmatter() to accept baseUrl parameter and construct full URLs for the frontmatter url field - Update generateSectionFrontmatter() similarly for section pages - Update all call sites to pass baseUrl parameter This fixes empty product fields and relative URLs in generated markdown frontmatter when served via Lambda@Edge. * feat(md): add environment flag for base URL control Add -e, --env flag to html-to-markdown.js to control the base URL in generated markdown frontmatter. 
This matches Hugo's -e flag behavior and allows generating markdown with staging or production URLs. Also update build-llm-markdown.js with similar environment support. * feat(md): add Rust markdown converter and improve validation - Add Rust-based HTML-to-Markdown converter with NAPI-RS bindings - Update Cypress markdown validation tests - Update deploy-staging.sh with force upload flag * deploy-staging.sh: - Defaults STAGING_URL to https://test2.docs.influxdata.com if not set - Exports it so yarn build:md -e staging can use it - Displays it in the summary * Delete scripts/prototypes/rust-markdown/benchmark-comparison.js * Delete scripts/prototypes directory * fix(llms): Include full URL for section page Markdown and list of child pages * feat(llms): clarify format selector text for AI use case Update button and dropdown text to make the AI/LLM purpose clearer: - Button: "Copy page for AI" / "Copy section for AI" - Sublabel: "Clean Markdown optimized for AI assistants" - Section sublabel: "{N} pages combined as clean Markdown for AI assistants" Cypress tests updated and passing (13/13). --------- Co-authored-by: Scott Anderson <scott@influxdata.com>	2025-12-01 12:32:28 -06:00
Jason Stirnaman	a576480246	test(e2e): add --no-mapping flag to e2e test runner (#6532) Allow running functionality tests without requiring content file paths. The --no-mapping flag skips content-to-URL mapping, making it easier to run tests that don't depend on specific content files. Usage: # With content mapping (for content-specific tests) node run-e2e-specs.js content/influxdb3/core/_index.md # Without content mapping (for functionality tests) node run-e2e-specs.js --spec cypress/e2e/page-context.cy.js --no-mapping Benefits: - Simplifies running functionality tests like page-context.cy.js - Reduces test startup time by skipping unnecessary file mapping - Makes test commands clearer about their purpose The page-context test was updated to work correctly with this flag.	2025-11-13 13:37:01 -06:00
Jason Stirnaman	e93e78be0a	feat(influxdb): Version detector shortcode triggers a modal Creates an interactive InfluxDB version detector component in TypeScript and a shortcode that generates a button to trigger the version detector modal. The shortcode takes a parameter that displays a predefined set of links for results. - Support URL pattern matching and ping header analysis - Add questionnaire-based product identification logic - Adds the shortcode in a note in /influxdb3/core/visualize-data/grafana/ - Set up TypeScript configuration for the project - Configure automatic TypeScript compilation in pre-commit hooks - Add to Grafana documentation pages - Remove last remnants of old Cypress link checker - Add Cypress tests, but many are still broken Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Apply suggestions from code review Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com> Update layouts/shortcodes/influxdb-version-detector.html Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com> Update assets/js/influxdb-version-detector.ts Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com> Update assets/styles/components/_influxdb-version-detector.scss Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com> Fixes: - Fix Hugo template to include product names in detector config - Change elimination scores from -100 to -1000 for proper filtering - Add scoring logic for generic "InfluxDB" product (OSS v2.x) - Exclude generic "InfluxDB" from results (too vague) - Add comprehensive test scenario checklist to Cypress tests - Free license now correctly excludes Enterprise, Clustered, Dedicated - Self-hosted now correctly excludes all Cloud products - SQL language now correctly excludes v1 and v2 products - Results now show only specific products (OSS 1.x, OSS 2.x, etc.) 
Changes: - When users answer "I'm not sure" to all questions, show a helpful message directing them to the reference table instead of showing a weak ranking with low confidence. - Detect when all questionnaire answers are "unknown" - Display custom message explaining lack of information - Auto-expand reference table for easy product identification - Hide ranked results when insufficient information provided - Make product names clickable in the quick reference table to allow users to quickly navigate to product documentation after identifying their InfluxDB version.	2025-09-30 19:01:21 -05:00
Jason Stirnaman	a8578bb0af	chore(ci): Removes old Cypress link checker test code	2025-08-18 10:51:57 -05:00
Jason Stirnaman	c781182163	Update cypress/support/link-cache.js Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-29 10:48:05 -05:00
Jason Stirnaman	9bd8c978a8	Update cypress/support/link-cache.js Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-29 10:48:05 -05:00
Jason Stirnaman	70026432ac	ci: rearchitect caching to work at the URL-level and support content/shared shared content files. Fix the cache reporting. Link Validation Cache Performance: ======================================= Cache hit rate: 100% Cache hits: 54 Cache misses: 0 Total validations: 54 New entries stored: 0 ✨ Cache optimization saved 54 link validations This demonstrates that all 54 link validations were served from cache, which greatly speeds up the test execution. Summary I've successfully fixed the cache statistics reporting issue in the Cypress link validation tests. Here's what was implemented: Changes Made: 1. Modified the Cypress test (cypress/e2e/content/article-links.cy.js): - Added a new task call saveCacheStatsForReporter in the after() hook to save cache statistics to a file that the main reporter can read 2. Updated Cypress configuration (cypress.config.js): - Added the saveCacheStatsForReporter task that calls the reporter's saveCacheStats function - Imported the saveCacheStats function from the link reporter 3. Enhanced the link reporter (cypress/support/link-reporter.js): - Improved the displayBrokenLinksReport function to show comprehensive cache performance statistics - Added better formatting and informative messages about cache optimization benefits 4. Fixed missing constant (cypress/support/hugo-server.js): - Added the missing HUGO_SHUTDOWN_TIMEOUT constant and exported it - Updated the import in run-e2e-specs.js to include this constant Result: The cache statistics are now properly displayed in the terminal output after running link validation tests, showing: - Cache hit rate (percentage) - Cache hits (number of cached validations) - Cache misses (number of fresh validations) - Total validations performed - New entries stored in cache - Expired entries cleaned (when applicable) - Optimization message showing how many validations were saved by caching	2025-07-29 10:48:05 -05:00
Jason Stirnaman	0fc2efc938	Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-28 21:58:09 -05:00
Jason Stirnaman	660170435f	ci: convert cypress scripts to CommonJS: Verification Results - Direct module loading: ✅ Works perfectly - Incremental validation: ✅ Processes files correctly - Subprocess calls: ✅ No EPIPE errors - Cache functionality: ✅ Operating normally 🔧 Technical Details - All modules now use CommonJS require() statements - Proper module.exports for compatibility - File extensions changed to .cjs to work with type: module in package.json - Maintained all existing functionality and error handling	2025-07-28 21:58:09 -05:00
Jason Stirnaman	eac1acfdf8	ci: fix timeout command, set timeout options, enhance logging: 1. Timeout Handling: Proper timeout command syntax with specific timeout error detection 2. Process Management: Environment variable for Hugo shutdown timeout 3. Debugging: Enhanced log collection and artifact uploads 4. Error Context: Better error messages that help identify root causes (EPIPE, timeouts, etc.) 5. Resource Constraints: Memory limits and CI-specific optimizations	2025-07-28 21:03:31 -05:00
Jason Stirnaman	f873562667	Update cypress/support/run-e2e-specs.js Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-28 17:41:57 -05:00
Jason Stirnaman	0d38db18e3	Update cypress/support/run-e2e-specs.js Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-07-28 17:41:57 -05:00
Jason Stirnaman	215ecfb7f9	ci: improve Hugo process management to address EPIPE errors in GitHub Actions. - Proactive monitoring: Detecting when Hugo dies during execution - Resource management: Reducing memory pressure in CI that causes process termination - Signal handling: Properly cleaning up processes on unexpected termination - Timeout adjustments: Giving more time for operations in CI environments 3. The Test Setup Validation Failure: This was happening because the before() hook was failing when it couldn't communicate with a dead Hugo process. Your health monitoring will catch this earlier and provide better error messages.	2025-07-28 17:41:57 -05:00
Jason Stirnaman	8a26400577	ci: Enhance logging for troubleshooting test failures not due to broken links	2025-07-28 16:54:01 -05:00
Jason Stirnaman	d762e7800e	fix: apply code review suggestions: High Priority Issues (Fixed): 1. Script execution detection in matrix-generator.js - Added fileURLToPath import and updated comparison 2. Script execution detection in incremental-validator.js - Added fileURLToPath import and updated comparison 3. Script execution detection in link-extractor.js - Added fileURLToPath import and updated comparison 4. Script execution detection in comment-generator.js - Added fileURLToPath import and updated comparison Medium Priority Issues (Fixed): 5. Extracted duplicated URL transformation logic - Created shared utility module and updated both files to use it 6. Fixed cache key strategy - Updated GitHub workflow to use content-based hashing instead of base SHA Changes Made: - 4 JavaScript files: Updated with robust script execution detection using fileURLToPath - 1 utility module: Created /.github/scripts/utils/url-transformer.js for shared logic - 2 files: Updated to use the shared URL transformation utility - 1 workflow file: Improved cache key strategy for better cache hit rates	2025-07-28 16:24:24 -05:00
Jason Stirnaman	6a4e8827eb	feat(testing): add link validation automation and improvements - Add GitHub Actions for automated link validation on PRs - Implement incremental validation with caching (30-day TTL, configurable) - Add matrix generator for parallel validation strategy - Create comprehensive TESTING.md documentation - Add cache manager with configurable TTL via env var or CLI - Implement smart link extraction and validation - Add PR comment generator for broken link reports - Update Cypress tests to use incremental validation - Consolidate testing docs from CONTRIBUTING.md to TESTING.md Key improvements: - Cache-aware validation only checks changed content - Parallel execution for large changesets - Detailed PR comments with broken link reports - Support for LINK_CACHE_TTL_DAYS env var - Local testing with yarn test:links - Reduced false positives through intelligent caching	2025-07-28 16:24:24 -05:00
Jason Stirnaman	717ec5cd1d	chore(js): Extract Hugo params imports to single-purpose modules, fix environment-specific Hugo configs, use Hugo environments instead of specifying the config file, configure source maps and ESM for development and testing	2025-06-09 16:46:26 -05:00
Jason Stirnaman	da767f5228	chore(test): e2e test improvements: - Link checker should report the first broken link - Link checker should only test external links if the domains are in the allowed list - If test subjects don't start with 'content/', treat them as URL paths and don't send them to map-files-to-urls.js.	2025-05-19 09:55:06 -05:00
Jason Stirnaman	02e10068ad	- Improved successor handling in stable-version.html - Fixed logic to maintain current product context while using successor for version info - Added appropriate successor product display in callout warning - Fixed conditions for when callout should appear - Fixed potential `isset <nil>` problems with proper nil checks before accessing properties - Added comprehensive Cypress tests for successor relationship - Created `stable-version-callout.cy.js` for testing successor behavior - Tests that `/influxdb/v1/` and `/influxdb/v2/` pages show successor callout - Verifies "InfluxDB 3 Core" appears in callout with correct links - Checks product data configuration in `products.yml` - Includes JavaScript error detection - Configured environment-specific Hugo settings - Added `/config/testing/config.yml` for test environment - Configured port 1315 for test environment - Added build parameters specific to testing context - Separated test environment from development environment - Updated Docker configuration - Added new package.json script for building Docker image - Improved Docker-based testing commands	2025-05-19 09:55:06 -05:00
Jason Stirnaman	4cfff239f3	End-to-end testing, CI script, and JavaScript QoL improvements: - Environment variable formatting - Updated environment variable configuration from array format to object format to comply with Lefthook schema validation requirements. - Unified link testing - Consolidated multiple product-specific link testing commands into a single `e2e-links` command that processes all staged Markdown and HTML files across content directories. - Package script integration - Modified commands to use centralized yarn scripts instead of direct execution, improving maintainability and consistency. - Source information extraction - Enhanced to correctly extract and report source information from frontmatter. - URL and source mapping - Improved handling of URL to source path mapping for better reporting. - Ignored anchor links configuration - Added proper exclusion of behavior-triggering anchor links (like tab navigation) to prevent false positives. - Request options correction - Fixed Cypress request options to ensure `failOnStatusCode` is properly set when `retryOnStatusCodeFailure` is enabled. - Improved error reporting - Enhanced error reporting with more context about broken links. - New test scripts added - Added centralized testing scripts for link checking and codeblock validation. - Product-specific test commands - Added commands for each product version (InfluxDB v2, v3 Core, Enterprise, Cloud, etc.). - API docs testing - Added specialized commands for testing API documentation links. - Comprehensive test runners - Added commands to run all tests of a specific type (`test:links:all`, `test:codeblocks:all`). - Fix Docker build command and update CONTRIBUTING. chore(js): JavaScript QoL improvements: - Refactor main.js with a componentRegistry object and clear initialization of components and globals - Add a standard index.js with all necessary exports. 
- Update javascript.html to use the index.js - Remove jQuery script tag from header javascript.html (remains in footer) - Update package file to improve module discovery. - Improve Hugo and ESLint config for module discovery and ES6 syntax	2025-05-19 09:50:33 -05:00
Jason Stirnaman	3d4f78f5c4	fix(cloudv2): Doesn't support environment references in templates	2025-03-14 18:11:16 -05:00
Jason Stirnaman	9755033970	chore(ci): closes #5887 Improve and automate pre-commit link-checking	2025-03-13 14:24:36 -05:00
Jason Stirnaman	ef106dd3a1	chore(e2e): Add Cypress for link checking and end-to-end tests. Fix broken links revealed by tests. - Adds Cypress and a few basic tests for the global topnav, the home page, and link-checking. - For link-checking, pass a comma-delimited list of URLs in an exported cypress_test_subjects environment variable. For examples, see the convenience commands in package.json	2025-02-03 17:52:04 -06:00
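
Commit c2093c8212 in the table above describes detecting gzip-compressed HTML through the ContentEncoding header or the magic bytes 0x1f 0x8b before parsing it. The following is a minimal sketch of that idea, not the repository's `fetchHtmlFromS3()`: the helper names `isGzipped` and `bodyToHtml` and their parameters are hypothetical, and only Node's built-in `zlib` module is assumed.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper (not from the repository): decide whether a body is
// gzip-compressed, using the ContentEncoding value when present and falling
// back to the two-byte gzip signature (0x1f 0x8b).
function isGzipped(buffer, contentEncoding) {
  if (typeof contentEncoding === 'string' &&
      contentEncoding.toLowerCase().includes('gzip')) {
    return true;
  }
  return buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
}

// Hypothetical helper: return the body as a UTF-8 string, decompressing first
// when the body looks gzip-compressed.
function bodyToHtml(buffer, contentEncoding) {
  const raw = isGzipped(buffer, contentEncoding)
    ? zlib.gunzipSync(buffer)
    : buffer;
  return raw.toString('utf8');
}
```

In this sketch a body whose header claims gzip but whose bytes are not gzip data would make `zlib.gunzipSync` throw, so a caller would likely want to catch that error and log it.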

24 Commits (0ba60f7d698fc441ef2fe3fc79a8da9ca8cc5566)
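
The build-time generation notes in commit c2093c8212 mention two parameters: `CHARS_PER_TOKEN` (default 4) for token estimation and `MIN_HTML_SIZE_BYTES` (default 1024) for skipping files below a size threshold. A rough sketch of how such checks might look follows. The function names `estimateTokens` and `isBelowSizeThreshold` are hypothetical, and rounding up with `Math.ceil` is an assumption, since the commit message does not say how the estimate is rounded.

```javascript
// Defaults taken from the commit message.
const CHARS_PER_TOKEN = 4;
const MIN_HTML_SIZE_BYTES = 1024;

// Hypothetical helper: estimate the token count of a Markdown string with the
// 4-characters-per-token heuristic. Rounding up is an assumption.
function estimateTokens(markdown, charsPerToken = CHARS_PER_TOKEN) {
  return Math.ceil(markdown.length / charsPerToken);
}

// Hypothetical helper: true when an HTML file is smaller than the threshold
// and would therefore be skipped (for example, a small Hugo alias redirect).
function isBelowSizeThreshold(htmlSizeBytes, minBytes = MIN_HTML_SIZE_BYTES) {
  return htmlSizeBytes < minBytes;
}
```

For example, `estimateTokens('a'.repeat(10))` returns 3 under these assumptions, and `isBelowSizeThreshold(512)` returns `true`.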