Spell Checking Configuration Guide
This document explains the spell-checking rules and tools used in the InfluxData documentation repository.
Overview
The docs-v2 repository uses two complementary spell-checking tools:
- Vale - Integrated documentation spell checker (runs in pre-commit hooks)
- Codespell - Lightweight code comment spell checker (recommended for CI/CD)
Tool Comparison
| Feature | Vale | Codespell |
|---|---|---|
| Purpose | Document spell checking | Code comment spell checking |
| Integration | Pre-commit hooks (Docker) | Manual runs; recommended for CI/CD |
| False Positives | Low (comprehensive filters) | Low (clear dictionary only) |
| Customization | YAML rules | INI config + dictionary lists |
| Performance | Moderate | Fast |
| True Positive Detection | Document-level | Code-level |
Vale Configuration
File: .ci/vale/styles/InfluxDataDocs/Spelling.yml
Why Code Blocks Are Included
Unlike other documentation style checkers, this configuration intentionally includes code blocks (~code is NOT excluded). This is critical because:
- Comments in examples - Users copy code blocks with comments:
  # Download and verify the GPG key
  curl https://repos.influxdata.com/influxdata-archive.key
  Typos in such comments become part of user documentation/scripts.
- Documentation strings - Code examples may include documentation:
  def create_database(name):
      """This funtion creates a new database."""  # ← typo caught
      pass
- Inline comments - Shell script comments are checked:
  #!/bin/bash
  # Retrive configuration from server
  influxctl config get
Filter Patterns Explained
1. camelCase and snake_case Identifiers
(?:_*[a-z]+(?:[A-Z][a-z0-9]*)+(?:[A-Z][a-zA-Z0-9]*)*|[a-z_][a-z0-9]*_[a-z0-9_]*)
Why: Prevents false positives on variable/method names while NOT matching normal prose
Breakdown:
- camelCase: _*[a-z]+(?:[A-Z][a-z0-9]*)+(?:[A-Z][a-zA-Z0-9]*)*
  - Requires at least one uppercase letter (distinguishes myVariable from provide)
  - Allows leading underscores for private variables (_privateVar)
- snake_case: [a-z_][a-z0-9]*_[a-z0-9_]*
  - Requires at least one underscore
  - Distinguishes my_variable from normal words
  - Also covers all-lowercase names with leading underscores (__dunder__), which the camelCase branch cannot match because they contain no uppercase letter
Examples Ignored: myVariable, targetField, getCwd, _privateVar, my_variable, terminationGracePeriodSeconds
Examples NOT Ignored (caught by spell-checker): provide, database, variable (normal prose)
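The examples above can be checked with a short Python sketch. It copies the identifier pattern verbatim and assumes whole-token matching (`fullmatch`); how Vale itself applies filters is not stated in this guide, so treat the helper `is_identifier` as a hypothetical illustration rather than a model of Vale's behavior.

```python
import re

# Pattern copied verbatim from the identifier filter above.
IDENTIFIER = re.compile(
    r"(?:_*[a-z]+(?:[A-Z][a-z0-9]*)+(?:[A-Z][a-zA-Z0-9]*)*"
    r"|[a-z_][a-z0-9]*_[a-z0-9_]*)"
)


def is_identifier(token: str) -> bool:
    """Return True when the whole token matches the identifier filter.

    Assumption: whole-token matching; Vale may apply filters differently.
    """
    return IDENTIFIER.fullmatch(token) is not None


# Tokens the guide lists as ignored, and tokens it lists as still checked.
ignored = ["myVariable", "targetField", "getCwd", "_privateVar",
           "my_variable", "terminationGracePeriodSeconds"]
checked = ["provide", "database", "variable"]

for token in ignored + checked:
    status = "ignored" if is_identifier(token) else "spell-checked"
    print(f"{token}: {status}")
```

Under this assumption, every token in `ignored` matches and every token in `checked` does not, which is the behavior the filter is meant to have.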
2. UPPER_CASE Constants
[A-Z_][A-Z0-9_]+
Why: Prevents false positives on environment variables and constants
Examples Ignored: API_KEY, AWS_REGION, INFLUXDB_TOKEN
Note: Also matches standalone acronyms such as AWS and API, so any all-caps word of two or more characters is skipped. This is acceptable in docs.
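A minimal Python sketch of this trade-off, again assuming whole-token matching. The helper `is_constant` and the sample word `TEH` are illustrative only; the point is that an all-caps typo passes through the same filter as a real constant.

```python
import re

# Pattern copied verbatim from the UPPER_CASE filter above.
UPPER_CASE = re.compile(r"[A-Z_][A-Z0-9_]+")


def is_constant(token: str) -> bool:
    """Return True when the whole token matches the UPPER_CASE filter."""
    return UPPER_CASE.fullmatch(token) is not None


samples = ["API_KEY", "AWS_REGION", "INFLUXDB_TOKEN", "AWS", "API",
           "TEH",   # all-caps typo: also skipped by the filter
           "A",     # single letter: too short to match
           "Api"]   # mixed case: not matched

for token in samples:
    status = "ignored" if is_constant(token) else "spell-checked"
    print(f"{token}: {status}")
```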
3. Version Numbers
\d+\.\d+(?:\.\d+)*
Why: Version numbers aren't words
Examples Ignored: 1.0, 2.3.1, 0.101.0, 1.2.3.4, v1.2.3
Note: Handles any number of version parts (2-part, 3-part, 4-part, etc.)
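The sketch below exercises the version pattern in Python. Because this guide does not say whether filters are applied to whole tokens or to substrings, it shows both: `fullmatch` for the whole token and `search` for a match anywhere in it. The difference matters for `v1.2.3`, where the pattern covers only the numeric part.

```python
import re

# Pattern copied verbatim from the version-number filter above.
VERSION = re.compile(r"\d+\.\d+(?:\.\d+)*")

samples = ["1.0", "2.3.1", "0.101.0", "1.2.3.4", "v1.2.3"]

for token in samples:
    whole = VERSION.fullmatch(token) is not None  # entire token is a version
    found = VERSION.search(token)                 # version found anywhere
    matched_part = found.group(0) if found else None
    print(f"{token}: whole-token match={whole}, matched part={matched_part}")
```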
4. Hexadecimal Values
0[xX][0-9a-fA-F]+
Why: Hex values appear in code and aren't dictionary words
Examples Ignored: 0xFF, 0xDEADBEEF, 0x1A
5. URLs and Paths
/[a-zA-Z0-9/_\-\.\{\}]+ # Paths: /api/v2/write
https?://[^\s\)\]>"]+ # Full URLs: https://docs.example.com
Why: URLs contain hyphens, slashes, and special chars
Examples Ignored: /api/v2/write, /kapacitor/v1/, https://docs.influxdata.com
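As a rough illustration, the following Python sketch applies the two patterns to a sample sentence. The helper `filtered_spans` is hypothetical, and removing URLs before searching for paths is a choice made for this sketch: the path pattern on its own would also match the `//docs.influxdata.com` tail of a URL.

```python
import re

# Patterns copied verbatim from the two filters above.
PATH = re.compile(r"/[a-zA-Z0-9/_\-\.\{\}]+")
URL = re.compile(r'https?://[^\s\)\]>"]+')


def filtered_spans(text: str) -> tuple[list[str], list[str]]:
    """Return (urls, paths) that the two filters would cover in text.

    URLs are blanked out first so the path pattern does not
    re-match the part of a URL that follows the scheme.
    """
    urls = URL.findall(text)
    without_urls = URL.sub(" ", text)
    paths = PATH.findall(without_urls)
    return urls, paths


text = "POST to /api/v2/write or /kapacitor/v1/ (see https://docs.influxdata.com)."
urls, paths = filtered_spans(text)
print("URLs:", urls)
print("Paths:", paths)
```

Note that the path character class includes `.`, so a path followed directly by a sentence-ending period would absorb that period into the match.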
6. Shortcode Attributes
(?:endpoint|method|url|href|src|path)="[^"]+"
Why: Hugo shortcode attribute values often contain hyphens and special chars
Examples Ignored: endpoint="https://...", method="POST"
Future Enhancement: Add more attributes as needed (name, value, data, etc.)
7. Code Punctuation
[@#$%^&*()_+=\[\]{};:,.<>?/\\|-]+
Why: Symbols and special characters aren't words
Examples Ignored: (), {}, [], ->, =>, |, etc.
Ignored Words
The configuration references two word lists:
- InfluxDataDocs/Terms/ignore.txt - Product and technical terms (non-English)
- InfluxDataDocs/Terms/query-functions.txt - InfluxQL/Flux function names
To add a word that should be ignored, edit the appropriate file.
Codespell Configuration
File: .codespellrc
Dictionary Choice: "clear"
Why "clear" (not "rare" or "code"):
- clear - Unambiguous spelling errors only
  - Examples: "recieve" → "receive", "occured" → "occurred"
  - False positive rate: ~1%
- rare - Includes uncommon but valid English words
  - Would flag legitimate technical terms
  - False positive rate: ~15-20%
- code - Includes code-specific words
  - Too aggressive for documentation
  - False positive rate: ~25-30%
Skip Directories
skip = public,node_modules,dist,.git,.vale,api-docs
- public - Generated HTML (not source)
- node_modules - npm dependencies (not our code)
- dist - Compiled TypeScript output (not source)
- .git - Repository metadata
- .vale - Vale configuration and cache
- api-docs - Generated OpenAPI specifications (many false positives)
Ignored Words
ignore-words-list = aks,invokable
- aks - Azure Kubernetes Service (acronym)
- invokable - InfluxData product branding term (scriptable tasks/queries)
To add more:
- Edit .codespellrc
- Add word to ignore-words-list (comma-separated)
- Add inline comment explaining why
Running Spell Checkers
Vale (Pre-commit)
Vale automatically runs on files you commit via Lefthook.
Manual check:
# Check all content
docker compose run -T vale content/**/*.md
# Check specific file
docker compose run -T vale content/influxdb/cloud/reference/cli.md
Codespell (Manual/CI)
# Check entire content directory
codespell content/ --builtin clear
# Check specific directory
codespell content/influxdb3/core/
# Interactive mode (prompts for fixes)
codespell content/ --builtin clear -i 3
# Auto-fix (USE WITH CAUTION)
codespell content/ --builtin clear -w
Rule Validation
The spell-checking rules are designed to:
✅ Catch real spelling errors (true positives)
✅ Ignore code patterns, identifiers, and paths (false positive prevention)
✅ Respect product branding terms (invokable, Flux, InfluxQL)
✅ Work seamlessly in existing workflows
Manual Validation
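Pipe sample text through codespell to see how each pattern is handled. The trailing - tells codespell to read from standard input; without it, codespell scans the current directory and ignores the piped text:
# Test camelCase handling
echo "variable myVariable is defined" | codespell -
# Test version numbers
echo "InfluxDB version 2.3.1 is released" | codespell -
# Test real typos (should be caught)
echo "recieve the data" | codespell -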
Troubleshooting
Vale: False Positives
Problem: Vale flags a word that should be valid
Solutions:
- Check if it's a code identifier (camelCase, UPPER_CASE, hex, version)
- Add to InfluxDataDocs/Terms/ignore.txt if it's a technical term
- Add filter pattern to .ci/vale/styles/InfluxDataDocs/Spelling.yml if it's a pattern
Codespell: False Positives
Problem: Codespell flags a legitimate term
Solutions:
- Add to ignore-words-list in .codespellrc
- Add skip directory if entire directory should be excluded
- Use -i 3 (interactive mode) to review before accepting
Both Tools: Missing Real Errors
Problem: A real typo isn't caught
Solutions:
- Verify it's actually a typo (not a branding term or intentional)
- Check if it's in excluded scope (tables, URLs, code identifiers)
- Report as GitHub issue for tool improvement
Contributing
When adding content:
- Use semantic line feeds (one sentence per line)
- Run Vale pre-commit checks before committing
- Test code block comments for typos
- Avoid adding to ignore lists when possible
- Document why you excluded a term (if necessary)
Related Files
- .ci/vale/styles/InfluxDataDocs/ - Vale rule configuration
- .codespellrc - Codespell configuration
- .codespellignore - Codespell ignore word list
- DOCS-CONTRIBUTING.md - General contribution guidelines
- DOCS-TESTING.md - Testing and validation guide
Future Improvements
- Create comprehensive test suite for spell-checking rules
- Document how to add product-specific branding terms
- Consider adding codespell to CI/CD pipeline
- Monitor and update ignore lists quarterly