docs-v2/scripts
Jason Stirnaman 4cb455b1ae
feat(ci): add incremental builds and shared content-utils (#6582)
- Incremental Markdown build for PRs, full build for production
- Shared content-utils library for:
  - Mapping shared content to consuming pages (Markdown generation, Cypress)
  - Listing changed content pages (committed, uncommitted, staged)
  - Extracting source frontmatter (docs edit)
- Fix CSS parsing warnings with JSDOM VirtualConsole
- Remove unused imports and variables
2025-12-01 19:45:42 -05:00

README.md

Documentation Build Scripts

html-to-markdown.js

Converts Hugo-generated HTML files to fully rendered Markdown in which shortcodes are evaluated, shared content is dereferenced, and comments are removed.

Purpose

This script generates production-ready Markdown output for LLM consumption and user downloads. The generated Markdown:

  • Has all Hugo shortcodes evaluated to text (e.g., {{% product-name %}} → "InfluxDB 3 Core")
  • Includes dereferenced shared content in the body
  • Removes HTML/Markdown comments
  • Adds product context to frontmatter
  • Mirrors the HTML version but in clean Markdown format

Usage

# Generate all markdown files (run after Hugo build)
yarn build:md

# Generate with verbose logging
yarn build:md:verbose

# Generate for specific path
node scripts/html-to-markdown.js --path influxdb3/core

# Generate limited number for testing
node scripts/html-to-markdown.js --limit 10

# Combine options
node scripts/html-to-markdown.js --path telegraf/v1 --verbose

Options

  • --path <path>: Process only the given path within public/ (default: all of public/)
  • --limit <n>: Limit number of files to process (useful for testing)
  • --verbose: Enable detailed logging of conversion progress
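The three options above could be parsed with Node's built-in util.parseArgs (Node 18.3 or later). This is a minimal sketch, not the script's actual argument handling. The function name parseOptions, the Infinity default for --limit, and the strict rejection of unknown flags are all assumptions.

```javascript
// Hypothetical sketch: parse the documented --path, --limit, and --verbose
// options with Node's built-in util.parseArgs. Not the script's real parser.
const { parseArgs } = require("node:util");

function parseOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      path: { type: "string" },     // subdirectory of public/ to process
      limit: { type: "string" },    // parseArgs has no number type; converted below
      verbose: { type: "boolean" }, // present => true, absent => undefined
    },
    strict: true, // throw on unknown flags instead of silently ignoring them
  });

  let limit = Infinity; // no --limit means "process every file"
  if (values.limit !== undefined) {
    limit = Number(values.limit);
    if (!Number.isInteger(limit) || limit <= 0) {
      throw new Error(`--limit must be a positive integer, got "${values.limit}"`);
    }
  }

  return { path: values.path ?? "", limit, verbose: values.verbose === true };
}

// Usage: const options = parseOptions(process.argv.slice(2));
```

Under these assumptions, node scripts/html-to-markdown.js --path telegraf/v1 --verbose would yield { path: "telegraf/v1", limit: Infinity, verbose: true }.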

Build Process

  1. Hugo generates HTML (with all shortcodes evaluated):

    npx hugo --quiet
    
  2. Script converts HTML to Markdown:

    yarn build:md
    
  3. Generated files:

    • Location: public/**/index.md (alongside index.html)
    • Git status: Ignored (entire public/ directory is gitignored)
    • Deployment: Generated at build time, like API docs
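Because every index.md sits next to an index.html, discovery can be a plain directory walk. The sketch below walks public/, honors --path and --limit, and maps each page to its Markdown output path. The function names findHtmlFiles and markdownPathFor are hypothetical, and the sketch assumes that every page Hugo emits is named index.html.

```javascript
// Hypothetical sketch: collect index.html files under a root directory
// (optionally limited to a subpath and a maximum count) and compute the
// index.md path next to each one. Not the script's actual implementation.
const fs = require("node:fs");
const path = require("node:path");

function findHtmlFiles(rootDir, subPath = "", limit = Infinity) {
  const results = [];
  const stack = [path.join(rootDir, subPath)];
  while (stack.length > 0 && results.length < limit) {
    const dir = stack.pop();
    let entries;
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch {
      continue; // missing or unreadable directory: nothing to convert
    }
    // readdir order isn't guaranteed; sorting makes --limit deterministic.
    entries.sort((a, b) => a.name.localeCompare(b.name));
    for (const entry of entries) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) stack.push(full);
      else if (entry.isFile() && entry.name === "index.html") results.push(full);
    }
  }
  return results.slice(0, limit);
}

// public/influxdb3/core/index.html -> public/influxdb3/core/index.md
function markdownPathFor(htmlPath) {
  return path.join(path.dirname(htmlPath), "index.md");
}
```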

Features

Product Context Detection

Automatically detects the product from the page's URL path and adds product information to the frontmatter:

---
title: Set up InfluxDB 3 Core
description: Install, configure, and set up authorization...
url: /influxdb3/core/get-started/setup/
product: InfluxDB 3 Core
product_version: core
date: 2025-11-13
lastmod: 2025-11-13
---
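Writing that frontmatter amounts to serializing a small object to YAML. The following is a minimal, hypothetical toFrontmatter helper that assumes flat, string-like fields. It quotes only the values YAML could misread. It doesn't handle nested fields, and it doesn't guard against values such as true or 123 that YAML would read as non-strings.

```javascript
// Hypothetical helper: serialize a flat object into a YAML frontmatter block.
// Risky strings are emitted as JSON-style double-quoted scalars, which YAML
// also accepts; simple values stay bare for readability.
function toFrontmatter(fields) {
  const lines = ["---"];
  for (const [key, value] of Object.entries(fields)) {
    if (value === undefined || value === null) continue; // omit missing fields
    const text = String(value);
    // Quote empty values, values containing ": " or " #", values ending in ":",
    // values starting with a YAML indicator character, or trailing whitespace.
    const needsQuotes =
      text === "" || /: | #|:$|^[\s?:,\[\]{}#&*!|>'"%@`-]|\s$/.test(text);
    lines.push(`${key}: ${needsQuotes ? JSON.stringify(text) : text}`);
  }
  lines.push("---", ""); // closing delimiter plus a trailing newline
  return lines.join("\n");
}
```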

Supported products:

  • InfluxDB 3 Core, Enterprise, Cloud Dedicated, Cloud Serverless, Clustered
  • InfluxDB v2, v1, Cloud (TSM), Enterprise v1
  • Telegraf, Chronograf, Kapacitor, Flux
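Detection is keyed on the page's URL path, and the Troubleshooting section mentions a PRODUCT_MAP. The sketch below assumes a map from URL prefixes to products and picks the longest matching prefix, so a more specific entry wins if prefixes ever overlap. The entries, and every product_version value other than core, are illustrative assumptions rather than the script's actual map.

```javascript
// Hypothetical sketch of URL-based product detection. The shape of this map
// and its entries are illustrative assumptions, not the real PRODUCT_MAP.
const PRODUCT_MAP = {
  "/influxdb3/core/": { product: "InfluxDB 3 Core", product_version: "core" },
  "/influxdb3/enterprise/": { product: "InfluxDB 3 Enterprise", product_version: "enterprise" },
  "/influxdb/v2/": { product: "InfluxDB v2", product_version: "v2" },
  "/telegraf/v1/": { product: "Telegraf", product_version: "v1" },
};

// Return a copy of the entry for the longest matching prefix, or null.
function detectProduct(urlPath) {
  let best = null;
  for (const [prefix, info] of Object.entries(PRODUCT_MAP)) {
    const longer = best === null || prefix.length > best.prefix.length;
    if (urlPath.startsWith(prefix) && longer) best = { prefix, info };
  }
  return best ? { ...best.info } : null;
}
```

A null result corresponds to the "Missing product context" case under Troubleshooting.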

Turndown Configuration

Custom Turndown rules for InfluxData documentation:

  • Code blocks: Preserves language identifiers
  • GitHub callouts: Converts to > [!Note] format
  • Tables: Converts to GitHub-flavored Markdown tables
  • Lists: Preserves nested lists and formatting
  • Links: Keeps relative links intact
  • Images: Preserves alt text and paths
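In Turndown, a rule is an object with a filter and a replacement function, and you register it with addRule. The real rules aren't shown here, so what follows is only a sketch. The first object keeps the language from a language-* class on a pre > code block. The helper formats already-converted Markdown as a GitHub callout. The rule name, the class convention, and the helper are all assumptions.

```javascript
// Hypothetical sketch of a Turndown-style rule: { filter, replacement }.
// Keeps the language from a class such as "language-sh" on <pre><code>.
const fencedCodeWithLanguage = {
  filter: (node) =>
    node.nodeName === "PRE" &&
    node.firstChild !== null &&
    node.firstChild.nodeName === "CODE",
  replacement: (_content, node) => {
    const code = node.firstChild;
    const match = /(?:^|\s)language-(\S+)/.exec(code.className || "");
    const lang = match ? match[1] : "";
    const body = code.textContent.replace(/\n$/, ""); // drop one trailing newline
    return `\n\n\`\`\`${lang}\n${body}\n\`\`\`\n\n`;
  },
};

// Hypothetical helper: prefix converted Markdown to form a "> [!Note]" callout.
function toGithubCallout(type, markdown) {
  const label = type.charAt(0).toUpperCase() + type.slice(1).toLowerCase();
  const body = markdown
    .trim()
    .split("\n")
    .map((line) => (line ? `> ${line}` : ">")); // keep blank lines inside the quote
  return [`> [!${label}]`, ...body].join("\n");
}

// With the turndown package installed, registration would look like:
//   const TurndownService = require("turndown");
//   const td = new TurndownService({ codeBlockStyle: "fenced" });
//   td.addRule("fencedCodeWithLanguage", fencedCodeWithLanguage);
```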

Content Extraction

Extracts only article content (removes navigation, footer, etc.):

  • Target selector: article.article--content
  • Skips files without article content (with warning)
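The extraction step is a single selector query plus a skip. In this hypothetical sketch, document is any DOM Document. JSDOM, which the commit notes above mention, is one way to get one. The warning text matches the message shown under Troubleshooting, and the function name extractArticle is an assumption.

```javascript
// Hypothetical sketch: return the article body's HTML, or warn and return
// null so the caller can skip the file.
const ARTICLE_SELECTOR = "article.article--content";

function extractArticle(document, filePath, warn = console.warn) {
  const article = document.querySelector(ARTICLE_SELECTOR);
  if (!article) {
    warn(`⚠️  No article content found in ${filePath}`);
    return null; // caller skips this file
  }
  return article.innerHTML;
}

// With jsdom installed:
//   const { JSDOM } = require("jsdom");
//   const { document } = new JSDOM(html).window;
//   const articleHtml = extractArticle(document, htmlPath);
```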

Integration

Local Development:

# After making content changes
npx hugo --quiet && yarn build:md

CircleCI Build Pipeline:

The script runs automatically in the CircleCI build pipeline after Hugo generates HTML:

# .circleci/config.yml
- run:
    name: Hugo Build
    command: yarn hugo --environment production --logLevel info --gc --destination workspace/public
- run:
    name: Generate LLM-friendly Markdown
    command: node scripts/html-to-markdown.js

Build order:

  1. Hugo builds HTML → workspace/public/**/*.html
  2. html-to-markdown.js converts HTML → workspace/public/**/*.md
  3. All files are deployed to S3

Production Build (Manual):

npx hugo --quiet
yarn build:md

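Watch Mode: Hugo server rebuilds HTML automatically as you edit, but Markdown is not regenerated automatically, so rerun yarn build:md after content changes. Because hugo server renders to memory by default, start it with --renderToDisk so that the updated HTML is written to public/, where the script reads it:

# Terminal 1: Hugo server, writing rendered pages to disk
npx hugo server --renderToDisk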

# Terminal 2: After making changes
yarn build:md

Performance

  • Processing speed: ~10-20 files/second
  • Full site: 5,581 HTML files in ~5 minutes
  • Memory usage: Minimal (processes files sequentially)
  • Caching: None (regenerates from HTML each time)
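Processing files one at a time keeps memory flat at the cost of throughput. For scale, 5,581 files in about 300 seconds works out to roughly 18.6 files per second, which is at the top of the stated range. In the sketch below, a hypothetical convertFile callback stands in for the real per-file conversion. The loop awaits each file before starting the next and reports the rate.

```javascript
// Hypothetical sketch: convert files strictly one at a time, so only one
// page's content needs to be in memory at once, then report throughput.
// convertFile(file) is an assumed async callback returning true (converted)
// or false (skipped, e.g. no article content).
async function processSequentially(files, convertFile, log = console.log) {
  const start = process.hrtime.bigint();
  let converted = 0;
  let skipped = 0;
  for (const file of files) {
    // Awaiting inside the loop serializes the work, unlike Promise.all(files.map(...)).
    const ok = await convertFile(file);
    if (ok) converted += 1;
    else skipped += 1;
  }
  const seconds = Number(process.hrtime.bigint() - start) / 1e9;
  const rate = seconds > 0 ? converted / seconds : 0;
  log(`Converted ${converted} files, skipped ${skipped}, ${rate.toFixed(1)} files/second`);
  return { converted, skipped, seconds };
}
```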

Troubleshooting

No article content found:

⚠️  No article content found in /path/to/file.html
  • The file has no element matching the article.article--content selector
  • Usually a navigation page or redirect
  • Safe to ignore

Shortcodes still present:

  • Run the script only after Hugo has finished building; the script converts Hugo's rendered HTML and doesn't evaluate shortcodes itself

Missing product context:

  • Check that the page's URL path matches a pattern in PRODUCT_MAP
  • Add new products to the map if needed

See Also