History

Scott Anderson 7f7387ae9c feat(llms): LLM-friendly Markdown, ChatGPT and Claude links.

This enables LLM-friendly documentation for entire sections,
allowing users to copy complete documentation sections with a single click.

Lambda@Edge now generates .md files on-demand with:
- Evaluated Hugo shortcodes
- Proper YAML frontmatter with product metadata
- Clean markdown without UI elements
- Section aggregation (parent + children in single file)

The llms.txt files are now generated automatically during build from
content structure and product metadata in data/products.yml, eliminating
the need for hardcoded files and ensuring maintainability.

**Testing**:
- Automated markdown generation in test setup via cy.exec()
- Implement dynamic content validation that extracts HTML content and
  verifies it appears in markdown version

**Documentation**:
Documents LLM-friendly markdown generation

**Details**:
Add gzip decompression for S3 HTML files in Lambda markdown generator

HTML files stored in S3 are gzip-compressed but the Lambda was attempting
to parse compressed data as UTF-8, causing JSDOM to fail to find article
elements. This resulted in 404 errors for .md and .section.md requests.

- Add zlib gunzip decompression in s3-utils.js fetchHtmlFromS3()
- Detect gzip via ContentEncoding header or magic bytes (0x1f 0x8b)
- Add configurable DEBUG constant for verbose logging
- Add debug logging for buffer sizes and decompression in both files

The decompression adds ~1-5ms per request but is necessary to parse
HTML correctly. CloudFront caching minimizes Lambda invocations.

Await async markdown conversion functions

The convertToMarkdown and convertSectionToMarkdown functions are async
but weren't being awaited, causing the Lambda to return a Promise object
instead of a string. This resulted in CloudFront validation errors:
"The body is not a string, is not an object, or exceeds the maximum size"

**Troubleshooting**:

- Set DEBUG for troubleshooting in lambda

2025-11-21 13:49:36 -06:00

lambda-edge/markdown-generator

feat(llms): LLM-friendly Markdown, ChatGPT and Claude links.

2025-11-21 13:49:36 -06:00

README.md

feat(llms): LLM-friendly Markdown, ChatGPT and Claude links.

2025-11-21 13:49:36 -06:00

Lambda@Edge Markdown Generator

This directory contains the Lambda@Edge function that generates LLM-friendly Markdown versions of documentation pages on-demand for docs.influxdata.com.

Overview

When users request .md files (e.g., https://docs.influxdata.com/influxdb3/core/get-started/index.md), CloudFront triggers this Lambda function at the origin request stage. The function:

Fetches the corresponding HTML from S3
Converts it to clean Markdown using the shared library (scripts/lib/markdown-converter.js)
Returns Markdown with proper frontmatter and caching headers
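
The three steps above can be sketched as a small origin-request handler. This is an illustrative outline under stated assumptions, not the code in index.js: createHandler, the injected fetchObject and convert functions, the .md-to-.html key mapping, and the header values are all hypothetical. It also assumes HTML objects in S3 may be gzip-compressed and detects that through the gzip magic bytes (0x1f 0x8b).

```javascript
const zlib = require('zlib');

// Decompress only when the buffer starts with the gzip magic bytes (0x1f 0x8b).
function maybeGunzip(buffer) {
  const isGzip = buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
  return isGzip ? zlib.gunzipSync(buffer) : buffer;
}

// Hypothetical factory: fetchObject(key) resolves to a Buffer and
// convert(html, uri) resolves to a Markdown string. Both are stand-ins
// for the real S3 and conversion code.
function createHandler({ fetchObject, convert }) {
  return async function handler(event) {
    const request = event.Records[0].cf.request;

    // Anything that is not a .md request is forwarded to the origin as-is.
    if (!request.uri.endsWith('.md')) return request;

    // Assumed mapping: /path/index.md or /path/index.section.md -> path/index.html
    const key = request.uri.replace(/^\//, '').replace(/(\.section)?\.md$/, '.html');

    const html = maybeGunzip(await fetchObject(key)).toString('utf8');

    // The converter is async, so it must be awaited. Without await, body
    // would be a Promise and CloudFront would reject the response.
    const body = await convert(html, request.uri);

    return {
      status: '200',
      statusDescription: 'OK',
      headers: {
        'content-type': [{ key: 'Content-Type', value: 'text/markdown; charset=utf-8' }],
        'cache-control': [{ key: 'Cache-Control', value: 'public, max-age=3600' }],
      },
      body,
    };
  };
}
```

Injecting fetchObject and convert keeps the sketch runnable without AWS credentials; a real handler would presumably wire in the S3 utilities and the shared conversion library instead.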

Architecture

docs-v2/ (this repo)
├── scripts/
│   ├── lib/markdown-converter.js      # Shared conversion library
│   └── html-to-markdown.js            # Local CLI for testing
├── dist/
│   └── utils/product-mappings.js      # Product detection (compiled from TS)
└── deploy/
    └── llm-markdown/
        └── lambda-edge/
            └── markdown-generator/
                ├── index.js           # Lambda handler
                ├── lib/s3-utils.js    # S3 operations
                ├── deploy.sh          # Deployment script
                └── package.json       # Lambda dependencies

The Lambda handler imports the conversion library from the parent repo using relative paths, so everything stays in sync automatically.

Prerequisites

Node.js 18+
npm or yarn
AWS CLI configured with appropriate credentials
Access to the S3 bucket containing documentation HTML files

Local Testing

Before deploying, test the conversion library locally:

# From docs-v2 root
yarn install
yarn build:ts
npx hugo --quiet

# Generate markdown for testing
node scripts/html-to-markdown.js --path influxdb3/core/get-started --limit 5 --verbose

# Run validation tests
node cypress/support/run-e2e-specs.js \
  --spec "cypress/e2e/content/markdown-content-validation.cy.js"

Testing in AWS Console

You can test the Lambda function directly in the AWS Console using the Test feature:

Single Page Request

{
  "Records": [
    {
      "cf": {
        "request": {
          "uri": "/influxdb3/core/get-started/index.md",
          "querystring": "",
          "headers": {
            "host": [
              {
                "key": "Host",
                "value": "docs.influxdata.com"
              }
            ]
          }
        }
      }
    }
  ]
}

Section Aggregation Request

{
  "Records": [
    {
      "cf": {
        "request": {
          "uri": "/influxdb3/core/get-started/index.section.md",
          "querystring": "",
          "headers": {
            "host": [
              {
                "key": "Host",
                "value": "docs.influxdata.com"
              }
            ]
          }
        }
      }
    }
  ]
}
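
Both test events differ only in the uri field, so they can be generated rather than hand-edited. The helper below is a hypothetical convenience (buildTestEvent is not part of the repository) that produces the same minimal event shape shown above.

```javascript
// Build a minimal CloudFront origin-request test event for a given URI.
// buildTestEvent is a hypothetical helper, not part of the repository.
function buildTestEvent(uri, host = 'docs.influxdata.com') {
  return {
    Records: [
      {
        cf: {
          request: {
            uri,
            querystring: '',
            headers: { host: [{ key: 'Host', value: host }] },
          },
        },
      },
    ],
  };
}

// Print an event that can be pasted into the Lambda console Test tab.
console.log(JSON.stringify(buildTestEvent('/influxdb3/core/get-started/index.section.md'), null, 2));
```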

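Expected Response: An origin-request handler can return either the request object, which CloudFront then forwards to the S3 origin, or a generated response. For the .md URIs above, the function should return a generated response whose body is the Markdown string. A request returned unchanged means the function did not handle the URI and left it for S3.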

Code Architecture

CommonJS Module System

The Lambda function uses CommonJS (require/module.exports) instead of ES6 modules (import/export) because:

Lambda@Edge Compatibility: Lambda@Edge Node.js 18 runtime works best with CommonJS
No package.json type field: The package.json must NOT include "type": "module"
Shared Library: The markdown-converter library (scripts/lib/markdown-converter.js) has been converted to CommonJS for Lambda compatibility

Key Files

index.js: Lambda handler using CommonJS exports (exports.handler)
lib/s3-utils.js: S3 operations using CommonJS (module.exports)
scripts/lib/markdown-converter.js: Shared conversion library (CommonJS)
package.json: Dependencies WITHOUT "type": "module" field
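
As a rough illustration of the CommonJS conventions listed above, the sketch below exports a handler the way a handler setting of index.handler expects and adds a small guard for the package.json rule. The handler body and the isCommonJsPackage helper are assumptions made for the example, not code from the repository.

```javascript
// CommonJS export: Lambda resolves "index.handler" to this property.
// The body here is a placeholder that simply passes the request through.
const handler = async (event) => event.Records[0].cf.request;
module.exports = { handler };

// Hypothetical guard: a parsed package.json is treated as CommonJS unless it
// sets "type": "module" (Node.js defaults to CommonJS when the field is absent).
function isCommonJsPackage(pkg) {
  return pkg.type !== 'module';
}

console.log(isCommonJsPackage({ name: 'markdown-generator' })); // true
console.log(isCommonJsPackage({ type: 'module' }));             // false
```

A check like this could run before packaging so that an accidental "type": "module" is caught locally rather than at the edge.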

Testing Module Loading

To verify the Lambda function loads correctly:

cd deploy/llm-markdown/lambda-edge/markdown-generator
node -e "const h = require('./index.js'); console.log('Handler type:', typeof h.handler);"

Expected output: Handler type: function

Deployment

Step 1: Install Lambda Dependencies

# Navigate to Lambda directory
cd deploy/llm-markdown/lambda-edge/markdown-generator

# Install dependencies
npm install

Step 2: Deploy to AWS

# Deploy to staging environment
./deploy.sh staging

# Test in staging
curl -I https://test2.docs.influxdata.com/influxdb3/core/get-started/index.md

# Deploy to production (after staging verification)
./deploy.sh production

What Happens During Deployment

The deploy.sh script:

Runs npm install --production to ensure all dependencies are installed
Bundles the Lambda function with dependencies and conversion library
Creates a deployment package (ZIP file)
Uploads to AWS Lambda
Publishes a new version
Updates the CloudFront distribution

Configuration

Environment Variables

Configure in Lambda function settings:

S3_BUCKET: S3 bucket containing HTML files
NODE_ENV: Set to production

Lambda Settings

Runtime: Node.js 18.x
Memory: 512 MB
Timeout: 30 seconds
Handler: index.handler
Trigger: CloudFront Origin Request

Development Workflow

Making Changes to Conversion Logic

Edit the conversion library:

# Edit scripts/lib/markdown-converter.js in docs-v2 root

Test locally:

npx hugo --quiet
node scripts/html-to-markdown.js --path influxdb3/core/get-started --limit 5 --verbose

Run validation tests:

node cypress/support/run-e2e-specs.js \
  --spec "cypress/e2e/content/markdown-content-validation.cy.js"

Deploy to Lambda:

cd deploy/llm-markdown/lambda-edge/markdown-generator
./deploy.sh staging

# Test
curl https://test2.docs.influxdata.com/influxdb3/core/get-started/index.md

# Deploy to production
./deploy.sh production

Making Changes to Lambda Handler

Edit Lambda files:
```
# Edit index.js or lib/s3-utils.js
```

Deploy:

cd deploy/llm-markdown/lambda-edge/markdown-generator
./deploy.sh staging

Monitoring

CloudWatch Logs

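Lambda@Edge writes logs to CloudWatch in the AWS Region closest to the edge location where the function ran, so check other Regions if nothing appears in us-east-1. The log group name keeps the us-east-1. prefix in every Region: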

# View logs
aws logs tail /aws/lambda/us-east-1.docs-markdown-generator --follow

# Or use AWS Console:
# CloudWatch > Log groups > /aws/lambda/us-east-1.docs-markdown-generator

Key Metrics

Invocations: Number of .md requests
Duration: Time to generate markdown
Errors: Failed conversions or S3 errors
Cache Hit Rate: CloudFront caching effectiveness (should be high)

Troubleshooting

Issue: "Cannot find module" errors

Cause: Dependencies not installed or TypeScript not compiled

Solution:

# In docs-v2 root
yarn build:ts

# In Lambda directory
cd deploy/llm-markdown/lambda-edge/markdown-generator
npm install

Issue: "No article content found" warnings

Cause: Page doesn't have <article class="article--content"> element

Solution: This is normal for index/list pages. The converter skips these pages.

Issue: S3 access denied errors

Cause: Lambda execution role lacks S3 permissions

Solution: Update Lambda execution role with:

s3:GetObject permission for the docs S3 bucket
s3:ListBucket permission for listing child pages
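
The two permissions apply to different resources: s3:GetObject is granted on the objects in the bucket, while s3:ListBucket is granted on the bucket itself. A minimal sketch of such a policy document follows; buildReadOnlyPolicy and the bucket name example-docs-bucket are placeholders, not values from this repository.

```javascript
// Build a read-only IAM policy document for a docs bucket.
// buildReadOnlyPolicy and the bucket name are placeholders.
function buildReadOnlyPolicy(bucket) {
  return {
    Version: '2012-10-17',
    Statement: [
      // Read individual HTML objects: resource is the object ARN pattern.
      { Effect: 'Allow', Action: 's3:GetObject', Resource: `arn:aws:s3:::${bucket}/*` },
      // List child pages: resource is the bucket ARN, without a trailing /*.
      { Effect: 'Allow', Action: 's3:ListBucket', Resource: `arn:aws:s3:::${bucket}` },
    ],
  };
}

console.log(JSON.stringify(buildReadOnlyPolicy('example-docs-bucket'), null, 2));
```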

Issue: Markdown output doesn't reflect recent changes

Cause: CloudFront caching or Lambda using old code

Solution:

Ensure changes are saved
Redeploy Lambda: ./deploy.sh production
Invalidate CloudFront cache if needed

Key Features

Product Detection: Automatically detects InfluxDB product/version from URL
Frontmatter Generation: YAML frontmatter with title, description, URL, product info
Section Aggregation: Combines parent + children into single LLM-friendly document
Shortcode Evaluation: All Hugo shortcodes are evaluated (no raw {{< syntax)
UI Element Removal: Strips navigation, feedback forms, format selectors
GitHub Callouts: Converts to GitHub-style callout syntax
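
To make the frontmatter feature concrete, here is a minimal sketch of how YAML frontmatter could be prepended to converted Markdown. The function names and the exact field set are assumptions for illustration; the shared conversion library may structure this differently.

```javascript
// Hypothetical sketch of frontmatter generation; field names are assumptions.
function buildFrontmatter({ title, description, url, product }) {
  const fields = { title, description, url, product };
  const lines = Object.entries(fields)
    // Skip fields that were not provided.
    .filter(([, value]) => value !== undefined)
    // JSON string literals are valid YAML double-quoted scalars,
    // so JSON.stringify handles quoting and escaping.
    .map(([key, value]) => `${key}: ${JSON.stringify(String(value))}`);
  return ['---', ...lines, '---', ''].join('\n');
}

// Prepend the frontmatter block, separated from the body by a blank line.
function withFrontmatter(meta, markdown) {
  return buildFrontmatter(meta) + '\n' + markdown;
}

console.log(withFrontmatter(
  { title: 'Get started', url: '/influxdb3/core/get-started/' },
  '# Get started'
));
```

Quoting every value is a deliberately conservative choice: titles that contain colons or quotes stay valid YAML without special-casing.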

Security

Lambda execution role has minimal permissions (S3 read-only)
No secrets or credentials in code
CloudFront caching reduces Lambda invocations
Rate limiting via CloudFront/AWS WAF

Related Documentation

Local Testing: ../../DOCS-TESTING.md - Comprehensive testing guide
Deployment Overview: ../../DOCS-DEPLOYING.md - High-level deployment info
Conversion Library: ../../scripts/lib/markdown-converter.js - Core conversion logic
Local CLI: ../../scripts/html-to-markdown.js - Local markdown generation

Support

For issues or questions:

Documentation Team: Slack #team-docs
GitHub Issues: docs-v2 issues
CloudWatch Logs: See Monitoring section above
