docs-v2/scripts
Jason Stirnaman b4e4e37099 refactor(ci): use shared resolve-shared-content.sh script
- Replace Node.js detect-test-products.js with shell-based approach
- Add .github/scripts/resolve-shared-content.sh (from docs-v2-jts-vale-ci)
- Remove Node.js setup step from detect-changes job
- No external dependencies required for shared content resolution
2026-02-11 18:22:27 -06:00
docs-cli chore: improve docs-cli with unified flag syntax and YAML config (#6778) 2026-02-04 16:44:35 -06:00
lib fix: tab names in generated Markdown (#6698) 2026-01-06 09:57:47 -06:00
puppeteer chore(scripts): JS style fix (#6780) 2026-02-04 17:07:14 -06:00
rust-markdown-converter Feature: Generate documentation in LLM-friendly Markdown (#6555) 2025-12-01 12:32:28 -06:00
schemas Jts agentsmd (#6486) 2025-10-28 07:20:13 -05:00
templates chore(docs): Redesign docs CLI tools for creating and editing content, add content/create.md tutorial page for the How to creat… (#6506) 2025-11-03 10:18:15 -06:00
README.md Feature: Generate documentation in LLM-friendly Markdown (#6555) 2025-12-01 12:32:28 -06:00
add-placeholders.js chore(docs): Redesign docs CLI tools for creating and editing content, add content/create.md tutorial page for the How to creat… (#6506) 2025-11-03 10:18:15 -06:00
build-llm-markdown.js feat(ci): add incremental builds and shared content-utils (#6582) 2025-12-01 19:45:42 -05:00
deploy-staging.sh Feature: Generate documentation in LLM-friendly Markdown (#6555) 2025-12-01 12:32:28 -06:00
html-to-markdown.js Feature: Generate documentation in LLM-friendly Markdown (#6555) 2025-12-01 12:32:28 -06:00
setup-local-bin.js fix(cli): Make docs edit non-blocking and reorganize CLI code (#6721) 2026-01-13 21:47:09 -06:00

README.md

Documentation Build Scripts

html-to-markdown.js

Converts Hugo-generated HTML files to fully rendered Markdown, with shortcodes evaluated, shared content dereferenced, and comments removed.

Purpose

This script generates production-ready Markdown output for LLM consumption and user downloads. The generated Markdown:

  • Has all Hugo shortcodes evaluated to text (e.g., {{% product-name %}} → "InfluxDB 3 Core")
  • Includes dereferenced shared content in the body
  • Removes HTML/Markdown comments
  • Adds product context to frontmatter
  • Mirrors the HTML version but in clean Markdown format
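
The comment-removal step in the list above can be approximated with a single regular expression. This is a minimal sketch rather than the script's actual code: `stripComments` is a hypothetical helper name, and it assumes comments use the HTML `<!-- ... -->` syntax.

```javascript
// Hypothetical helper: removes HTML-style comments (<!-- ... -->).
// Assumption: this is the only comment syntax that needs stripping.
function stripComments(text) {
  // [\s\S] matches across newlines; the lazy quantifier stops at the
  // first closing "-->" so adjacent comments are removed separately.
  return text.replace(/<!--[\s\S]*?-->/g, '');
}

const sample = 'Install <!-- TODO: verify -->the CLI.\n<!--\nmulti-line\n-->Done.';
console.log(stripComments(sample));
```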

Usage

# Generate all markdown files (run after Hugo build)
yarn build:md

# Generate with verbose logging
yarn build:md:verbose

# Generate for specific path
node scripts/html-to-markdown.js --path influxdb3/core

# Generate limited number for testing
node scripts/html-to-markdown.js --limit 10

# Combine options
node scripts/html-to-markdown.js --path telegraf/v1 --verbose

Options

  • --path <path>: Process specific path within public/ (default: process all)
  • --limit <n>: Limit number of files to process (useful for testing)
  • --verbose: Enable detailed logging of conversion progress
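
For readers who want to see how the three flags fit together, here is a minimal sketch of parsing them with Node's built-in `util.parseArgs`. The function name `parseCliOptions` and the validation rules are assumptions for illustration; the real script may parse its arguments differently.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical option parsing for the three documented flags.
// Assumption: --limit must be a positive integer when present.
function parseCliOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      path: { type: 'string' },     // subdirectory of public/ to process
      limit: { type: 'string' },    // converted to a number below
      verbose: { type: 'boolean' }, // detailed progress logging
    },
  });

  let limit = Infinity; // no --limit means "process all files"
  if (values.limit !== undefined) {
    limit = Number.parseInt(values.limit, 10);
    if (Number.isNaN(limit) || limit < 1) {
      throw new Error(`--limit expects a positive integer, got "${values.limit}"`);
    }
  }

  return {
    path: values.path ?? null,
    limit,
    verbose: Boolean(values.verbose),
  };
}

console.log(parseCliOptions(['--path', 'telegraf/v1', '--verbose']));
```

In this sketch an unknown flag makes `parseArgs` throw, which surfaces typos early.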

Build Process

  1. Hugo generates HTML (with all shortcodes evaluated):

    npx hugo --quiet
    
  2. Script converts HTML to Markdown:

    yarn build:md
    
  3. Generated files:

    • Location: public/**/index.md (alongside index.html)
    • Git status: Ignored (entire public/ directory is gitignored)
    • Deployment: Generated at build time, like API docs
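
The file layout described above (an index.md written next to each index.html) can be sketched as follows. `findIndexHtml` and `markdownPathFor` are hypothetical names, and the sketch assumes every page is rendered as an index.html file; the script's own traversal may differ.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Recursively collect every index.html under a root directory.
function findIndexHtml(rootDir) {
  const found = [];
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    const fullPath = path.join(rootDir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findIndexHtml(fullPath));
    } else if (entry.name === 'index.html') {
      found.push(fullPath);
    }
  }
  return found;
}

// Derive the sibling Markdown path:
// public/influxdb3/core/index.html -> public/influxdb3/core/index.md
function markdownPathFor(htmlPath) {
  return path.join(path.dirname(htmlPath), 'index.md');
}

console.log(markdownPathFor(path.join('public', 'influxdb3', 'core', 'index.html')));
```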

Features

Product Context Detection

Automatically detects and adds product information to frontmatter:

---
title: Set up InfluxDB 3 Core
description: Install, configure, and set up authorization...
url: /influxdb3/core/get-started/setup/
product: InfluxDB 3 Core
product_version: core
date: 2025-11-13
lastmod: 2025-11-13
---

Supported products:

  • InfluxDB 3 Core, Enterprise, Cloud Dedicated, Cloud Serverless, Clustered
  • InfluxDB v2, v1, Cloud (TSM), Enterprise v1
  • Telegraf, Chronograf, Kapacitor, Flux
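
Product detection of this kind is typically a prefix lookup on the page URL. The sketch below is an assumption about how such a lookup could work, not a copy of the script: the array shape and the `detectProduct` name are hypothetical. Only the InfluxDB 3 Core entry is taken from the frontmatter example above; the other entries are illustrative.

```javascript
// Hypothetical subset of a URL-prefix -> product map. The real PRODUCT_MAP
// in the script may use different keys, fields, and values.
const PRODUCT_MAP = [
  { prefix: '/influxdb3/core/', product: 'InfluxDB 3 Core', version: 'core' },
  // The two entries below are illustrative assumptions.
  { prefix: '/influxdb3/enterprise/', product: 'InfluxDB 3 Enterprise', version: 'enterprise' },
  { prefix: '/telegraf/v1/', product: 'Telegraf', version: 'v1' },
];

// Return the frontmatter fields for a URL, or null when no prefix matches.
function detectProduct(urlPath) {
  const match = PRODUCT_MAP.find(({ prefix }) => urlPath.startsWith(prefix));
  return match
    ? { product: match.product, product_version: match.version }
    : null;
}

console.log(detectProduct('/influxdb3/core/get-started/setup/'));
```

A `null` result in this sketch corresponds to the "Missing product context" case: the URL matched no known prefix, so a new entry would be needed.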

Turndown Configuration

Custom Turndown rules for InfluxData documentation:

  • Code blocks: Preserves language identifiers
  • GitHub callouts: Converts to > [!Note] format
  • Tables: GitHub-flavored markdown tables
  • Lists: Preserves nested lists and formatting
  • Links: Keeps relative links intact
  • Images: Preserves alt text and paths
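
As an illustration of the "Code blocks" rule, the sketch below builds a rule object in the shape that Turndown's `addRule(name, rule)` accepts (a `filter` plus a `replacement` function). It is not the script's actual rule: it assumes code blocks are rendered as `<pre><code class="language-xxx">`, and the rule name is hypothetical.

```javascript
// Rule object in the shape Turndown's addRule(name, rule) accepts.
// Assumption: code blocks are rendered as <pre><code class="language-xxx">.
const fencedCodeRule = {
  // Match <pre> elements whose first child is a <code> element.
  filter(node) {
    return node.nodeName === 'PRE'
      && node.firstChild !== null
      && node.firstChild.nodeName === 'CODE';
  },
  // Emit a fenced block, carrying over the language identifier if present.
  replacement(content, node) {
    const code = node.firstChild;
    const className = code.getAttribute('class') || '';
    const match = className.match(/language-(\S+)/);
    const language = match ? match[1] : '';
    const fence = '`'.repeat(3);
    const body = code.textContent.replace(/\n$/, '');
    return `\n\n${fence}${language}\n${body}\n${fence}\n\n`;
  },
};

// With Turndown installed, the rule would be registered like this:
//   turndownService.addRule('fencedCodeWithLanguage', fencedCodeRule);
```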

Content Extraction

Extracts only article content (removes navigation, footer, etc.):

  • Target selector: article.article--content
  • Skips files without article content (with warning)
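
The extract-or-skip behavior can be sketched as below. This is a hedged illustration only: `extractArticle` is a hypothetical name, and `document` stands for any DOM-like object that exposes `querySelector()` (for example, one produced by an HTML parser); the script's real parsing layer is not shown in this README.

```javascript
// Assumption: `document` is any DOM-like object exposing querySelector().
// Returns the article's inner HTML, or null (with a warning) when the
// page has no article.article--content element.
function extractArticle(document, filePath, warn = console.warn) {
  const article = document.querySelector('article.article--content');
  if (!article) {
    warn(`⚠️  No article content found in ${filePath}`);
    return null;
  }
  return article.innerHTML;
}
```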

Integration

Local Development:

# After making content changes
npx hugo --quiet && yarn build:md

CircleCI Build Pipeline:

The script runs automatically in the CircleCI build pipeline after Hugo generates HTML:

# .circleci/config.yml
- run:
    name: Hugo Build
    command: yarn hugo --environment production --logLevel info --gc --destination workspace/public
- run:
    name: Generate LLM-friendly Markdown
    command: node scripts/html-to-markdown.js

Build order:

  1. Hugo builds HTML → workspace/public/**/*.html
  2. html-to-markdown.js converts HTML → workspace/public/**/*.md
  3. All files deployed to S3

Production Build (Manual):

npx hugo --quiet
yarn build:md

Watch Mode: For development, run the Hugo server in one terminal and regenerate the Markdown in another after each content change:

# Terminal 1: Hugo server
npx hugo server

# Terminal 2: After making changes
yarn build:md

Performance

  • Processing speed: ~10-20 files/second
  • Full site: 5,581 HTML files in ~5 minutes
  • Memory usage: Minimal (processes files sequentially)
  • Caching: None (regenerates from HTML each time)
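
The two throughput figures above can be cross-checked with a little arithmetic, and the same rate gives a rough estimate for partial runs (for example, with --limit). The `estimateSeconds` helper is hypothetical, and the estimate assumes throughput stays roughly constant.

```javascript
// Figures quoted in the Performance list.
const totalFiles = 5581;
const totalSeconds = 5 * 60; // "~5 minutes"

// Throughput implied by the full-site figure; it falls inside the
// quoted ~10-20 files/second range.
const filesPerSecond = totalFiles / totalSeconds;
console.log(filesPerSecond.toFixed(1)); // 18.6

// Rough time estimate for a partial run at the same rate.
function estimateSeconds(fileCount, rate = filesPerSecond) {
  return fileCount / rate;
}
console.log(estimateSeconds(100).toFixed(1)); // 5.4
```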

Troubleshooting

No article content found:

⚠️  No article content found in /path/to/file.html
  • The file has no element matching the article.article--content selector
  • Usually navigation pages or redirects
  • Safe to ignore

Shortcodes still present:

  • Run the script only after Hugo has finished generating HTML
  • Hugo evaluates the shortcodes; the script only converts Hugo's HTML output

Missing product context:

  • Check that URL path matches patterns in PRODUCT_MAP
  • Add new products to the map if needed

See Also