# Documentation Build Scripts ## html-to-markdown.js Converts Hugo-generated HTML files to fully-rendered Markdown with evaluated shortcodes, dereferenced shared content, and removed comments. ### Purpose This script generates production-ready Markdown output for LLM consumption and user downloads. The generated Markdown: - Has all Hugo shortcodes evaluated to text (e.g., `{{% product-name %}}` → "InfluxDB 3 Core") - Includes dereferenced shared content in the body - Removes HTML/Markdown comments - Adds product context to frontmatter - Mirrors the HTML version but in clean Markdown format ### Usage ```bash # Generate all markdown files (run after Hugo build) yarn build:md # Generate with verbose logging yarn build:md:verbose # Generate for specific path node scripts/html-to-markdown.js --path influxdb3/core # Generate limited number for testing node scripts/html-to-markdown.js --limit 10 # Combine options node scripts/html-to-markdown.js --path telegraf/v1 --verbose ``` ### Options - `--path `: Process specific path within `public/` (default: process all) - `--limit `: Limit number of files to process (useful for testing) - `--verbose`: Enable detailed logging of conversion progress ### Build Process 1. **Hugo generates HTML** (with all shortcodes evaluated): ```bash npx hugo --quiet ``` 2. **Script converts HTML to Markdown**: ```bash yarn build:md ``` 3. **Generated files**: - Location: `public/**/index.md` (alongside `index.html`) - Git status: Ignored (entire `public/` directory is gitignored) - Deployment: Generated at build time, like API docs ### Features #### Product Context Detection Automatically detects and adds product information to frontmatter: ```yaml --- title: Set up InfluxDB 3 Core description: Install, configure, and set up authorization... url: /influxdb3/core/get-started/setup/ product: InfluxDB 3 Core product_version: core date: 2025-11-13 lastmod: 2025-11-13 --- ``` Supported products: - InfluxDB 3 Core, Enterprise, Cloud Dedicated, Cloud Serverless, Clustered - InfluxDB v2, v1, Cloud (TSM), Enterprise v1 - Telegraf, Chronograf, Kapacitor, Flux #### Turndown Configuration Custom Turndown rules for InfluxData documentation: - **Code blocks**: Preserves language identifiers - **GitHub callouts**: Converts to `> [!Note]` format - **Tables**: GitHub-flavored markdown tables - **Lists**: Preserves nested lists and formatting - **Links**: Keeps relative links intact - **Images**: Preserves alt text and paths #### Content Extraction Extracts only article content (removes navigation, footer, etc.): - Target selector: `article.article--content` - Skips files without article content (with warning) ### Integration **Local Development:** ```bash # After making content changes npx hugo --quiet && yarn build:md ``` **CircleCI Build Pipeline:** The script runs automatically in the CircleCI build pipeline after Hugo generates HTML: ```yaml # .circleci/config.yml - run: name: Hugo Build command: yarn hugo --environment production --logLevel info --gc --destination workspace/public - run: name: Generate LLM-friendly Markdown command: node scripts/html-to-markdown.js ``` **Build order:** 1. Hugo builds HTML → `workspace/public/**/*.html` 2. `html-to-markdown.js` converts HTML → `workspace/public/**/*.md` 3. All files deployed to S3 **Production Build (Manual):** ```bash npx hugo --quiet yarn build:md ``` **Watch Mode:** For development with auto-regeneration, run Hugo server and regenerate markdown after content changes: ```bash # Terminal 1: Hugo server npx hugo server # Terminal 2: After making changes yarn build:md ``` ### Performance - **Processing speed**: ~10-20 files/second - **Full site**: 5,581 HTML files in ~5 minutes - **Memory usage**: Minimal (processes files sequentially) - **Caching**: None (regenerates from HTML each time) ### Troubleshooting **No article content found:** ``` ⚠️ No article content found in /path/to/file.html ``` - File doesn't have `article.article--content` selector - Usually navigation pages or redirects - Safe to ignore **Shortcodes still present:** - Run after Hugo has generated HTML, not before - Hugo must complete its build first **Missing product context:** - Check that URL path matches patterns in `PRODUCT_MAP` - Add new products to the map if needed ### See Also - [Plan document](../.context/PLAN-markdown-rendering.md) - Architecture decisions - [API docs generation](../api-docs/README.md) - Similar pattern for API reference - [Package.json scripts](../package.json) - Build commands