chore(tc-heartbeat-status): remove impl plans

2026-03-18 16:50:55 -06:00 · 2026-03-18 16:50:55 -06:00 · c519072c55
parent 07f29b2cf9
commit c519072c55
2 changed files with 0 additions and 1103 deletions
--- a/docs/superpowers/plans/2026-03-17-tc-cel-status.md
+++ b/docs/superpowers/plans/2026-03-17-tc-cel-status.md
@ -1,972 +0,0 @@
-# Telegraf Controller: Agent Status & CEL Reference Implementation Plan
-
-> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.
-
-**Goal:** Add comprehensive agent status configuration docs and a multi-page CEL expression reference to the Telegraf Controller documentation.
-
-**Architecture:** Update the existing status stub page with practical examples and create four new pages under `reference/cel/`. All content is documentation-only (Markdown). The CEL reference is self-contained and does not depend on the heartbeat plugin docs.
-
-**Tech Stack:** Hugo, Markdown, TOML (config examples), CEL (expression examples)
-
-**Spec:** `docs/superpowers/specs/2026-03-17-tc-cel-status-design.md`
-
-***
-
-## File Map
-
-| Action | File                                                     | Responsibility                                            |
-| ------ | -------------------------------------------------------- | --------------------------------------------------------- |
-| Modify | `content/telegraf/controller/agents/status.md`           | Practical guide: status values, config examples, UI steps |
-| Create | `content/telegraf/controller/reference/cel/_index.md`    | CEL overview, evaluation flow, config reference           |
-| Create | `content/telegraf/controller/reference/cel/variables.md` | All CEL variables: top-level, agent, inputs, outputs      |
-| Create | `content/telegraf/controller/reference/cel/functions.md` | CEL functions, operators, quick reference                 |
-| Create | `content/telegraf/controller/reference/cel/examples.md`  | Real-world CEL expression examples by scenario            |
-
-### Conventions (from existing TC docs)
-
- **Menu:** All TC pages use `menu: telegraf_controller:`. Child pages use `parent:` matching the parent's `name`.
- **Reference children:** Existing reference pages use `parent: Reference` with weights 101-110. The CEL section uses `parent: Reference` on `_index.md` with weight 107 (after authorization at 106, before EULA at 110). CEL child pages use `parent: CEL expressions`.
- **Product name shortcode:** Use `{{% product-name %}}` for "Telegraf Controller" and `{{% product-name "short" %}}` for "Controller".
- **Dynamic values shortcode:** Wrap TOML configs containing `&{...}` parameters with `{{% telegraf/dynamic-values %}}...{{% /telegraf/dynamic-values %}}`.
- **Callouts:** Use `> [!Note]`, `> [!Important]`, `> [!Warning]` syntax.
- **Semantic line feeds:** One sentence per line.
-
-***
-
-## Task 1: Create CEL reference index page
-
-**Files:**
-
- Create: `content/telegraf/controller/reference/cel/_index.md`
-
- [ ] **Step 1: Create the CEL reference index page**
-
-Create `content/telegraf/controller/reference/cel/_index.md` with the following content:
-
-````markdown
---
-title: CEL expressions
-description: >
-  Reference documentation for Common Expression Language (CEL) expressions used
-  to evaluate Telegraf agent status in {{% product-name %}}.
-menu:
-  telegraf_controller:
-    name: CEL expressions
-    parent: Reference
-weight: 107
-related:
-  - /telegraf/controller/agents/status/
-  - /telegraf/v1/output-plugins/heartbeat/
---
-
-[Common Expression Language (CEL)](https://cel.dev) is a lightweight expression
-language designed for evaluating simple conditions.
-{{% product-name %}} uses CEL expressions in the Telegraf
-[heartbeat output plugin](/telegraf/v1/output-plugins/heartbeat/) to evaluate
-agent status based on runtime data such as metric counts, error rates, and
-plugin statistics.
-
-## How status evaluation works
-
-You define CEL expressions for three status levels in the
-`[outputs.heartbeat.status]` section of your Telegraf configuration:
-
- **`ok`** — The agent is healthy.
- **`warn`** — The agent has a potential issue.
- **`fail`** — The agent has a critical problem.
-
-Each expression is a CEL program that returns a boolean value.
-Telegraf evaluates expressions in a configurable order (default:
-`ok`, `warn`, `fail`) and assigns the status of the **first expression that
-evaluates to `true`**.
-
-If no expression evaluates to `true`, the `default` status is used
-(default: `"ok"`).
-
-### Initial status
-
-Use the `initial` setting to define a status before the first Telegraf flush
-cycle.
-If `initial` is not set or is empty, Telegraf evaluates the status expressions
-immediately, even before the first flush.
-
-### Evaluation order
-
-The `order` setting controls which expressions are evaluated and in what
-sequence.
-
-> [!Note]
-> If you omit a status from the `order` list, its expression is **not
-> evaluated**.
-
-## Configuration reference
-
-Configure status evaluation in the `[outputs.heartbeat.status]` section of the
-heartbeat output plugin.
-You must include `"status"` in the `include` list for status evaluation to take
-effect.
-
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "agent-123"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ## CEL expressions that return a boolean.
-    ## The first expression that evaluates to true sets the status.
-    ok = "metrics > 0"
-    warn = "log_errors > 0"
-    fail = "log_errors > 10"
-
-    ## Evaluation order (default: ["ok", "warn", "fail"])
-    order = ["ok", "warn", "fail"]
-
-    ## Default status when no expression matches
-    ## Options: "ok", "warn", "fail", "undefined"
-    default = "ok"
-
-    ## Initial status before the first flush cycle
-    ## Options: "ok", "warn", "fail", "undefined", ""
-    # initial = ""
-````
-
-| Option    | Type            | Default                  | Description                                                                                                     |
-| :-------- | :-------------- | :----------------------- | :-------------------------------------------------------------------------------------------------------------- |
-| `ok`      | string (CEL)    | `"false"`                | Expression that, when `true`, sets status to **ok**.                                                            |
-| `warn`    | string (CEL)    | `"false"`                | Expression that, when `true`, sets status to **warn**.                                                          |
-| `fail`    | string (CEL)    | `"false"`                | Expression that, when `true`, sets status to **fail**.                                                          |
-| `order`   | list of strings | `["ok", "warn", "fail"]` | Order in which expressions are evaluated.                                                                       |
-| `default` | string          | `"ok"`                   | Status used when no expression evaluates to `true`. Options: `ok`, `warn`, `fail`, `undefined`.                 |
-| `initial` | string          | `""`                     | Status before the first flush. Options: `ok`, `warn`, `fail`, `undefined`, `""` (empty = evaluate expressions). |
-
-{{< children hlevel="h2" >}}
-
-````
-
- [ ] **Step 2: Verify the file renders correctly**
-
-Run: `npx hugo server` and navigate to the CEL expressions reference page.
-Verify: page renders, navigation shows "CEL expressions" under "Reference", child page links appear.
-
- [ ] **Step 3: Commit**
-
-```bash
-git add content/telegraf/controller/reference/cel/_index.md
-git commit -m "feat(tc-cel): add CEL expressions reference index page"
-````
-
-***
-
-## Task 2: Create CEL variables reference page
-
-**Files:**
-
- Create: `content/telegraf/controller/reference/cel/variables.md`
-
- [ ] **Step 1: Create the variables reference page**
-
-Create `content/telegraf/controller/reference/cel/variables.md` with the following content:
-
-````markdown
---
-title: CEL variables
-description: >
-  Reference for variables available in CEL expressions used to evaluate
-  Telegraf agent status in {{% product-name %}}.
-menu:
-  telegraf_controller:
-    name: Variables
-    parent: CEL expressions
-weight: 201
---
-
-CEL expressions for agent status evaluation have access to variables that
-represent data collected by Telegraf since the last successful heartbeat message
-(unless noted otherwise).
-
-## Top-level variables
-
-| Variable | Type | Description |
-|:---------|:-----|:------------|
-| `metrics` | int | Number of metrics arriving at the heartbeat output plugin. |
-| `log_errors` | int | Number of errors logged by the Telegraf instance. |
-| `log_warnings` | int | Number of warnings logged by the Telegraf instance. |
-| `last_update` | time | Timestamp of the last successful heartbeat message. Use with `now()` to calculate durations or rates. |
-| `agent` | map | Agent-level statistics. See [Agent statistics](#agent-statistics). |
-| `inputs` | map | Input plugin statistics. See [Input plugin statistics](#input-plugin-statistics-inputs). |
-| `outputs` | map | Output plugin statistics. See [Output plugin statistics](#output-plugin-statistics-outputs). |
-
-## Agent statistics
-
-The `agent` variable is a map containing aggregate statistics for the entire
-Telegraf instance.
-These fields correspond to the `internal_agent` metric from the
-Telegraf [internal input plugin](/telegraf/v1/plugins/#input-internal).
-
-| Field | Type | Description |
-|:------|:-----|:------------|
-| `agent.metrics_written` | int | Total metrics written by all output plugins. |
-| `agent.metrics_rejected` | int | Total metrics rejected by all output plugins. |
-| `agent.metrics_dropped` | int | Total metrics dropped by all output plugins. |
-| `agent.metrics_gathered` | int | Total metrics collected by all input plugins. |
-| `agent.gather_errors` | int | Total collection errors across all input plugins. |
-| `agent.gather_timeouts` | int | Total collection timeouts across all input plugins. |
-
-### Example
-
-```cel
-agent.gather_errors > 0
-````
-
-## Input plugin statistics (`inputs`)
-
-The `inputs` variable is a map where each key is a plugin type (for example,
-`cpu` for `inputs.cpu`) and the value is a **list** of plugin instances.
-Each entry in the list represents one configured instance of that plugin type.
-
-These fields correspond to the `internal_gather` metric from the Telegraf
-[internal input plugin](/telegraf/v1/plugins/#input-internal).
-
-| Field              | Type   | Description                                                                               |
-| :----------------- | :----- | :---------------------------------------------------------------------------------------- |
-| `id`               | string | Unique plugin identifier.                                                                 |
-| `alias`            | string | Alias set for the plugin. Only exists if an alias is defined in the plugin configuration. |
-| `errors`           | int    | Collection errors for this plugin instance.                                               |
-| `metrics_gathered` | int    | Number of metrics collected by this instance.                                             |
-| `gather_time_ns`   | int    | Time spent gathering metrics, in nanoseconds.                                             |
-| `gather_timeouts`  | int    | Number of timeouts during metric collection.                                              |
-| `startup_errors`   | int    | Number of times the plugin failed to start.                                               |
-
-### Access patterns
-
-Access a specific plugin type and iterate over its instances:
-
-```cel
-// Check if any cpu input instance has errors
-inputs.cpu.exists(i, i.errors > 0)
-```
-
-```cel
-// Access the first instance of the cpu input
-inputs.cpu[0].metrics_gathered
-```
-
-Use `has()` to safely check if a plugin type exists before accessing it:
-
-```cel
-// Safe access — returns false if no cpu input is configured
-has(inputs.cpu) && inputs.cpu.exists(i, i.errors > 0)
-```
-
-## Output plugin statistics (`outputs`)
-
-The `outputs` variable is a map with the same structure as `inputs`.
-Each key is a plugin type (for example, `influxdb_v2` for `outputs.influxdb_v2`)
-and the value is a list of plugin instances.
-
-These fields correspond to the `internal_write` metric from the Telegraf
-[internal input plugin](/telegraf/v1/plugins/#input-internal).
-
-| Field              | Type   | Description                                                                                              |
-| :----------------- | :----- | :------------------------------------------------------------------------------------------------------- |
-| `id`               | string | Unique plugin identifier.                                                                                |
-| `alias`            | string | Alias set for the plugin. Only exists if an alias is defined in the plugin configuration.                |
-| `errors`           | int    | Write errors for this plugin instance.                                                                   |
-| `metrics_filtered` | int    | Number of metrics filtered by the output.                                                                |
-| `write_time_ns`    | int    | Time spent writing metrics, in nanoseconds.                                                              |
-| `startup_errors`   | int    | Number of times the plugin failed to start.                                                              |
-| `metrics_added`    | int    | Number of metrics added to the output buffer.                                                            |
-| `metrics_written`  | int    | Number of metrics written to the output destination.                                                     |
-| `metrics_rejected` | int    | Number of metrics rejected by the service or serialization.                                              |
-| `metrics_dropped`  | int    | Number of metrics dropped (for example, due to buffer fullness).                                         |
-| `buffer_size`      | int    | Current number of metrics in the output buffer.                                                          |
-| `buffer_limit`     | int    | Capacity of the output buffer. Irrelevant for disk-based buffers.                                        |
-| `buffer_fullness`  | float  | Ratio of metrics in the buffer to capacity. Can exceed `1.0` (greater than 100%) for disk-based buffers. |
-
-### Access patterns
-
-```cel
-// Check if any InfluxDB v2 output has write errors
-outputs.influxdb_v2.exists(o, o.errors > 0)
-```
-
-```cel
-// Check buffer fullness across all instances of an output
-outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.8)
-```
-
-## Accumulation behavior
-
-Unless noted otherwise, all variable values are **accumulated since the last
-successful heartbeat message**.
-Use the `last_update` variable with `now()` to calculate rates — for example:
-
-```cel
-// True if the error rate exceeds 1 error per minute
-log_errors > 0 && duration.getMinutes(now() - last_update) > 0
-  && log_errors / duration.getMinutes(now() - last_update) > 1
-```
-
-````
-
- [ ] **Step 2: Verify the file renders correctly**
-
-Run: `npx hugo server` and navigate to the Variables page under CEL expressions.
-Verify: page renders, tables display correctly, code blocks have proper syntax highlighting, navigation shows "Variables" under "CEL expressions".
-
- [ ] **Step 3: Commit**
-
-```bash
-git add content/telegraf/controller/reference/cel/variables.md
-git commit -m "feat(tc-cel): add CEL variables reference page"
-````
-
-***
-
-## Task 3: Create CEL functions reference page
-
-**Files:**
-
- Create: `content/telegraf/controller/reference/cel/functions.md`
-
- [ ] **Step 1: Create the functions reference page**
-
-Create `content/telegraf/controller/reference/cel/functions.md` with the following content:
-
-````markdown
---
-title: CEL functions and operators
-description: >
-  Reference for functions and operators available in CEL expressions used to
-  evaluate Telegraf agent status in {{% product-name %}}.
-menu:
-  telegraf_controller:
-    name: Functions
-    parent: CEL expressions
-weight: 202
---
-
-CEL expressions for agent status evaluation support built-in CEL operators and
-the following function libraries.
-
-## Time functions
-
-### `now()`
-
-Returns the current time.
-Use with `last_update` to calculate durations or detect stale data.
-
-```cel
-// True if more than 10 minutes since last heartbeat
-now() - last_update > duration('10m')
-````
-
-```cel
-// True if more than 5 minutes since last heartbeat
-now() - last_update > duration('5m')
-```
-
-## Math functions
-
-Math functions from the
-[CEL math library](https://github.com/google/cel-go/blob/master/ext/README.md#math)
-are available for numeric calculations.
-
-### Commonly used functions
-
-| Function                   | Description                 | Example                                    |
-| :------------------------- | :-------------------------- | :----------------------------------------- |
-| `math.greatest(a, b, ...)` | Returns the greatest value. | `math.greatest(log_errors, log_warnings)`  |
-| `math.least(a, b, ...)`    | Returns the least value.    | `math.least(agent.metrics_gathered, 1000)` |
-
-### Example
-
-```cel
-// Warn if either errors or warnings exceed a threshold
-math.greatest(log_errors, log_warnings) > 5
-```
-
-## String functions
-
-String functions from the
-[CEL strings library](https://github.com/google/cel-go/blob/master/ext/README.md#strings)
-are available for string operations.
-These are useful when checking plugin `alias` or `id` fields.
-
-### Example
-
-```cel
-// Check if any input plugin has an alias containing "critical"
-inputs.cpu.exists(i, has(i.alias) && i.alias.contains("critical"))
-```
-
-## Encoding functions
-
-Encoding functions from the
-[CEL encoder library](https://github.com/google/cel-go/blob/master/ext/README.md#encoders)
-are available for encoding and decoding values.
-
-## Operators
-
-CEL supports standard operators for building expressions.
-
-### Comparison operators
-
-| Operator | Description           | Example                        |
-| :------- | :-------------------- | :----------------------------- |
-| `==`     | Equal                 | `metrics == 0`                 |
-| `!=`     | Not equal             | `log_errors != 0`              |
-| `<`      | Less than             | `agent.metrics_gathered < 100` |
-| `<=`     | Less than or equal    | `buffer_fullness <= 0.5`       |
-| `>`      | Greater than          | `log_errors > 10`              |
-| `>=`     | Greater than or equal | `metrics >= 1000`              |
-
-### Logical operators
-
-| Operator | Description | Example                                  |
-| :------- | :---------- | :--------------------------------------- |
-| `&&`     | Logical AND | `log_errors > 0 && metrics == 0`         |
-| `\|\|`   | Logical OR  | `log_errors > 10 \|\| log_warnings > 50` |
-| `!`      | Logical NOT | `!(metrics > 0)`                         |
-
-### Arithmetic operators
-
-| Operator | Description    | Example                                          |
-| :------- | :------------- | :----------------------------------------------- |
-| `+`      | Addition       | `log_errors + log_warnings`                      |
-| `-`      | Subtraction    | `agent.metrics_gathered - agent.metrics_dropped` |
-| `*`      | Multiplication | `log_errors * 2`                                 |
-| `/`      | Division       | `agent.metrics_dropped / agent.metrics_gathered` |
-| `%`      | Modulo         | `metrics % 100`                                  |
-
-### Ternary operator
-
-```cel
-// Conditional expression
-log_errors > 10 ? true : false
-```
-
-### List operations
-
-| Function                 | Description                    | Example                                     |
-| :----------------------- | :----------------------------- | :------------------------------------------ |
-| `exists(var, condition)` | True if any element matches.   | `inputs.cpu.exists(i, i.errors > 0)`        |
-| `all(var, condition)`    | True if all elements match.    | `outputs.influxdb_v2.all(o, o.errors == 0)` |
-| `size()`                 | Number of elements.            | `inputs.cpu.size() > 0`                     |
-| `has()`                  | True if a field or key exists. | `has(inputs.cpu)`                           |
-
-````
-
- [ ] **Step 2: Verify the file renders correctly**
-
-Run: `npx hugo server` and navigate to the Functions page under CEL expressions.
-Verify: page renders, tables display correctly, pipe characters in logical operators table render properly, navigation shows "Functions" under "CEL expressions".
-
- [ ] **Step 3: Commit**
-
-```bash
-git add content/telegraf/controller/reference/cel/functions.md
-git commit -m "feat(tc-cel): add CEL functions and operators reference page"
-````
-
-***
-
-## Task 4: Create CEL examples page
-
-**Files:**
-
- Create: `content/telegraf/controller/reference/cel/examples.md`
-
- [ ] **Step 1: Create the examples page**
-
-Create `content/telegraf/controller/reference/cel/examples.md` with the following content:
-
-````markdown
---
-title: CEL expression examples
-description: >
-  Real-world examples of CEL expressions for evaluating Telegraf agent status
-  in {{% product-name %}}.
-menu:
-  telegraf_controller:
-    name: Examples
-    parent: CEL expressions
-weight: 203
-related:
-  - /telegraf/controller/agents/status/
-  - /telegraf/controller/reference/cel/variables/
-  - /telegraf/controller/reference/cel/functions/
---
-
-Each example includes a scenario description, the CEL expression, a full
-heartbeat plugin configuration block, and an explanation.
-
-For the full list of available variables and functions, see:
-
- [CEL variables](/telegraf/controller/reference/cel/variables/)
- [CEL functions and operators](/telegraf/controller/reference/cel/functions/)
-
-## Basic health check
-
-**Scenario:** Report `ok` when Telegraf is actively processing metrics.
-Fall back to the default status (`ok`) when no expression matches — this means
-the agent is healthy as long as metrics are flowing.
-
-**Expression:**
-
-```cel
-ok = "metrics > 0"
-````
-
-**Configuration:**
-
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "agent-123"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ok = "metrics > 0"
-    default = "fail"
-```
-
-**How it works:** If the heartbeat plugin received metrics since the last
-heartbeat, the status is `ok`.
-If no metrics arrived, no expression matches and the `default` status of `fail`
-is used, indicating the agent is not processing data.
-
-## Error rate monitoring
-
-**Scenario:** Warn when any errors are logged and fail when the error count is
-high.
-
-**Expressions:**
-
-```cel
-warn = "log_errors > 0"
-fail = "log_errors > 10"
-```
-
-**Configuration:**
-
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "agent-123"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ok = "log_errors == 0 && log_warnings == 0"
-    warn = "log_errors > 0"
-    fail = "log_errors > 10"
-    order = ["fail", "warn", "ok"]
-    default = "ok"
-```
-
-**How it works:** Expressions are evaluated in `fail`, `warn`, `ok` order.
-If more than 10 errors occurred since the last heartbeat, the status is `fail`.
-If 1-10 errors occurred, the status is `warn`.
-If no errors or warnings occurred, the status is `ok`.
-
-## Buffer health
-
-**Scenario:** Warn when any output plugin's buffer exceeds 80% fullness,
-indicating potential data backpressure.
-
-**Expression:**
-
-```cel
-warn = "outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.8)"
-fail = "outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.95)"
-```
-
-**Configuration:**
-
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "agent-123"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ok = "metrics > 0"
-    warn = "outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.8)"
-    fail = "outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.95)"
-    order = ["fail", "warn", "ok"]
-    default = "ok"
-```
-
-**How it works:** The `outputs.influxdb_v2` map contains a list of all
-`influxdb_v2` output plugin instances.
-The `exists()` function iterates over all instances and returns `true` if any
-instance's `buffer_fullness` exceeds the threshold.
-At 95% fullness, the status is `fail`; at 80%, `warn`; otherwise `ok`.
-
-## Plugin-specific checks
-
-**Scenario:** Monitor a specific input plugin for collection errors and use
-safe access patterns to avoid errors when the plugin is not configured.
-
-**Expression:**
-
-```cel
-warn = "has(inputs.cpu) && inputs.cpu.exists(i, i.errors > 0)"
-fail = "has(inputs.cpu) && inputs.cpu.exists(i, i.startup_errors > 0)"
-```
-
-**Configuration:**
-
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "agent-123"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ok = "metrics > 0"
-    warn = "has(inputs.cpu) && inputs.cpu.exists(i, i.errors > 0)"
-    fail = "has(inputs.cpu) && inputs.cpu.exists(i, i.startup_errors > 0)"
-    order = ["fail", "warn", "ok"]
-    default = "ok"
-```
-
-**How it works:** The `has()` function checks if the `cpu` key exists in the
-`inputs` map before attempting to access it.
-This prevents evaluation errors when the plugin is not configured.
-If the plugin has startup errors, the status is `fail`.
-If it has collection errors, the status is `warn`.
-
-## Composite conditions
-
-**Scenario:** Combine multiple signals to detect a degraded agent — high error
-count combined with output buffer pressure.
-
-**Expression:**
-
-```cel
-fail = "log_errors > 5 && has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.9)"
-```
-
-**Configuration:**
-
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "agent-123"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ok = "metrics > 0 && log_errors == 0"
-    warn = "log_errors > 0 || (has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.8))"
-    fail = "log_errors > 5 && has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.9)"
-    order = ["fail", "warn", "ok"]
-    default = "ok"
-```
-
-**How it works:** The `fail` expression requires **both** a high error count
-**and** buffer pressure to trigger.
-The `warn` expression uses `||` to trigger on **either** condition independently.
-This layered approach avoids false alarms from transient spikes in a single
-metric.
-
-## Time-based expressions
-
-**Scenario:** Warn when the time since the last successful heartbeat exceeds a
-threshold, indicating potential connectivity or performance issues.
-
-**Expression:**
-
-```cel
-warn = "now() - last_update > duration('10m')"
-fail = "now() - last_update > duration('30m')"
-```
-
-**Configuration:**
-
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "agent-123"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ok = "metrics > 0"
-    warn = "now() - last_update > duration('10m')"
-    fail = "now() - last_update > duration('30m')"
-    order = ["fail", "warn", "ok"]
-    default = "undefined"
-    initial = "undefined"
-```
-
-**How it works:** The `now()` function returns the current time and
-`last_update` is the timestamp of the last successful heartbeat.
-Subtracting them produces a duration that can be compared against a threshold.
-The `initial` status is set to `undefined` so new agents don't immediately show
-a stale-data warning before their first successful heartbeat.
-
-## Custom evaluation order
-
-**Scenario:** Use fail-first evaluation to prioritize detecting critical issues
-before checking for healthy status.
-
-**Configuration:**
-
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "agent-123"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ok = "metrics > 0 && log_errors == 0"
-    warn = "log_errors > 0"
-    fail = "log_errors > 10 || agent.metrics_dropped > 100"
-    order = ["fail", "warn", "ok"]
-    default = "undefined"
-```
-
-**How it works:** By setting `order = ["fail", "warn", "ok"]`, the most severe
-conditions are checked first.
-If the agent has more than 10 logged errors or has dropped more than 100
-metrics, the status is `fail` — regardless of whether the `ok` or `warn`
-expression would also match.
-This is the recommended order for production deployments where early detection
-of critical issues is important.
-
-````
-
- [ ] **Step 2: Verify the file renders correctly**
-
-Run: `npx hugo server` and navigate to the Examples page under CEL expressions.
-Verify: page renders, all seven example sections display with correct TOML syntax highlighting, navigation shows "Examples" under "CEL expressions".
-
- [ ] **Step 3: Commit**
-
-```bash
-git add content/telegraf/controller/reference/cel/examples.md
-git commit -m "feat(tc-cel): add CEL expression examples page"
-````
-
-***
-
-## Task 5: Update the agent status page
-
-**Files:**
-
- Modify: `content/telegraf/controller/agents/status.md`
-
- [ ] **Step 1: Replace the status page content**
-
-Replace the full content of `content/telegraf/controller/agents/status.md` with the following:
-
-````markdown
---
-title: Set agent statuses
-description: >
-  Configure agent status evaluation using CEL expressions in the Telegraf
-  heartbeat output plugin and view statuses in {{% product-name %}}.
-menu:
-  telegraf_controller:
-    name: Set agent statuses
-    parent: Manage agents
-weight: 104
-related:
-  - /telegraf/controller/reference/cel/
-  - /telegraf/controller/agents/reporting-rules/
-  - /telegraf/v1/output-plugins/heartbeat/
---
-
-Agent statuses reflect the health of a Telegraf instance based on runtime data.
-The Telegraf [heartbeat output plugin](/telegraf/v1/output-plugins/heartbeat/)
-evaluates [Common Expression Language (CEL)](/telegraf/controller/reference/cel/)
-expressions against agent metrics, error counts, and plugin statistics to
-determine the status sent with each heartbeat.
-
-## Status values
-
-{{% product-name %}} displays the following agent statuses:
-
-| Status | Source | Description |
-|:-------|:-------|:------------|
-| **Ok** | Heartbeat plugin | The agent is healthy. Set when the `ok` CEL expression evaluates to `true`. |
-| **Warn** | Heartbeat plugin | The agent has a potential issue. Set when the `warn` CEL expression evaluates to `true`. |
-| **Fail** | Heartbeat plugin | The agent has a critical problem. Set when the `fail` CEL expression evaluates to `true`. |
-| **Undefined** | Heartbeat plugin | No expression matched and the `default` is set to `undefined`, or the `initial` status is `undefined`. |
-| **Not Reporting** | {{% product-name "short" %}} | The agent has not sent a heartbeat within the [reporting rule](/telegraf/controller/agents/reporting-rules/) threshold. {{% product-name "short" %}} applies this status automatically. |
-
-## How status evaluation works
-
-You define CEL expressions for `ok`, `warn`, and `fail` in the
-`[outputs.heartbeat.status]` section of your heartbeat plugin configuration.
-Telegraf evaluates expressions in a configurable order and assigns the status
-of the first expression that evaluates to `true`.
-
-For full details on evaluation flow, configuration options, and available
-variables and functions, see the
-[CEL expressions reference](/telegraf/controller/reference/cel/).
-
-## Configure agent statuses
-
-To configure status evaluation, add `"status"` to the `include` list in your
-heartbeat plugin configuration and define CEL expressions in the
-`[outputs.heartbeat.status]` section.
-
-### Example: Basic health check
-
-Report `ok` when metrics are flowing.
-If no metrics arrive, fall back to the `fail` status.
-
-{{% telegraf/dynamic-values %}}
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "&{agent_id}"
-  token = "${INFLUX_TOKEN}"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ok = "metrics > 0"
-    default = "fail"
-````
-
-{{% /telegraf/dynamic-values %}}
-
-### Example: Error-based status
-
-Warn when errors are logged, fail when the error count is high.
-
-{{% telegraf/dynamic-values %}}
-
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "&{agent_id}"
-  token = "${INFLUX_TOKEN}"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ok = "log_errors == 0 && log_warnings == 0"
-    warn = "log_errors > 0"
-    fail = "log_errors > 10"
-    order = ["fail", "warn", "ok"]
-    default = "ok"
-```
-
-{{% /telegraf/dynamic-values %}}
-
-### Example: Composite condition
-
-Combine error count and buffer pressure signals.
-
-{{% telegraf/dynamic-values %}}
-
-```toml
-[[outputs.heartbeat]]
-  url = "http://telegraf_controller.example.com/agents/heartbeat"
-  instance_id = "&{agent_id}"
-  token = "${INFLUX_TOKEN}"
-  interval = "1m"
-  include = ["hostname", "statistics", "status"]
-
-  [outputs.heartbeat.status]
-    ok = "metrics > 0 && log_errors == 0"
-    warn = "log_errors > 0 || (has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.8))"
-    fail = "log_errors > 5 && has(outputs.influxdb_v2) && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.9)"
-    order = ["fail", "warn", "ok"]
-    default = "ok"
-```
-
-{{% /telegraf/dynamic-values %}}
-
-For more examples including buffer health, plugin-specific checks, and
-time-based expressions, see
-[CEL expression examples](/telegraf/controller/reference/cel/examples/).
-
-## View an agent's status
-
-1. In {{% product-name %}}, go to **Agents**.
-2. Check the **Status** column for each agent.
-3. To see more details, click the **More button ({{% icon "tc-more" %}})** and
-   select **View Details**.
-4. The details page shows the reported status, reporting rule assignment, and
-   the time of the last heartbeat.
-
-## Learn more
-
- [CEL expressions reference](/telegraf/controller/reference/cel/) — Full
-  reference for CEL evaluation flow, configuration, variables, functions, and
-  examples.
- [Heartbeat output plugin](/telegraf/v1/output-plugins/heartbeat/) — Plugin
-  configuration reference.
- [Define reporting rules](/telegraf/controller/agents/reporting-rules/) — Configure
-  thresholds for the **Not Reporting** status.
-
-````
-
- [ ] **Step 2: Verify the file renders correctly**
-
-Run: `npx hugo server` and navigate to the "Set agent statuses" page under "Manage agents".
-Verify: page renders, status table displays correctly, all three example config blocks render with TOML syntax highlighting, cross-links resolve correctly, the "View an agent's status" section is preserved.
-
- [ ] **Step 3: Commit**
-
-```bash
-git add content/telegraf/controller/agents/status.md
-git commit -m "feat(tc-status): expand agent status page with CEL examples and configuration"
-````
-
-***
-
-## Task 6: Cross-link verification and final review
-
-**Files:**
-
- All files from Tasks 1-5
-
- [ ] **Step 1: Verify all cross-links**
-
-Run: `npx hugo server` and verify the following links resolve:
-
-1. Status page → CEL reference index: `/telegraf/controller/reference/cel/`
-2. Status page → Heartbeat plugin: `/telegraf/v1/output-plugins/heartbeat/`
-3. Status page → Reporting rules: `/telegraf/controller/agents/reporting-rules/`
-4. Status page → CEL examples: `/telegraf/controller/reference/cel/examples/`
-5. CEL index → Heartbeat plugin: `/telegraf/v1/output-plugins/heartbeat/`
-6. CEL examples → Variables: `/telegraf/controller/reference/cel/variables/`
-7. CEL examples → Functions: `/telegraf/controller/reference/cel/functions/`
-8. CEL examples → Status page: `/telegraf/controller/agents/status/`
-9. CEL variables → Internal input plugin: `/telegraf/v1/plugins/#input-internal`
-
- [ ] **Step 2: Verify navigation structure**
-
-In the left nav, confirm:
-
- "CEL expressions" appears under "Reference"
-
- "Variables", "Functions", and "Examples" appear as children of "CEL expressions"
-
- "Set agent statuses" remains under "Manage agents"
-
- [ ] **Step 3: Run Vale linting**
-
-Run: `.ci/vale/vale.sh content/telegraf/controller/agents/status.md content/telegraf/controller/reference/cel/`
-Fix any errors or warnings. Suggestions can be evaluated but are not blocking.
-
- [ ] **Step 4: Commit any linting fixes**
-
-```bash
-git add content/telegraf/controller/agents/status.md content/telegraf/controller/reference/cel/
-git commit -m "style(tc-cel): fix Vale linting issues"
-```
--- a/docs/superpowers/specs/2026-03-17-tc-cel-status-design.md
+++ b/docs/superpowers/specs/2026-03-17-tc-cel-status-design.md
@ -1,131 +0,0 @@
-# Telegraf Controller: Agent Status & CEL Expression Reference
-
-**Date:** 2026-03-17
-**Status:** Approved
-**Scope:** Documentation content (no code changes)
-
-## Summary
-
-Add comprehensive agent status configuration documentation to Telegraf Controller docs. This includes updating the existing status page with practical examples and creating a new multi-page CEL expression reference in the reference section.
-
-## Deliverables
-
-### 1. Update existing status page
-
-**File:** `content/telegraf/controller/agents/status.md`
-
-Expand from the current stub into a practical guide with the following structure:
-
-1. **Intro** — What agent statuses are, the four status values (`ok`, `warn`, `fail`, `undefined`) plus the Controller-applied `Not Reporting` state.
-2. **How status evaluation works** — Brief explanation of CEL expressions, evaluation order, defaults, and initial status. Links to the CEL reference for full details.
-3. **Configure agent statuses** — Example heartbeat plugin config with `include = ["status"]` and the `[outputs.heartbeat.status]` section. 2-3 practical inline examples:
-   - Basic health check (ok when metrics are flowing)
-   - Error-based warning/failure
-   - Composite condition
-4. **View an agent's status** — Keep existing UI steps as-is.
-5. **Link to CEL reference** — Points users to the full reference for all variables, functions, and more examples.
-
-### 2. Create CEL expression reference (multi-page)
-
-New section under `content/telegraf/controller/reference/cel/`.
-
-#### `_index.md` — CEL Overview
-
-1. **Intro** — What CEL is (Common Expression Language), how Telegraf Controller uses it to evaluate agent status from heartbeat data.
-2. **How status evaluation works** — Detailed evaluation flow:
-   - Expressions are defined for `ok`, `warn`, `fail` — each is a CEL program returning a boolean.
-   - Evaluation order is configurable via `order` (default: `["ok", "warn", "fail"]`).
-   - First expression evaluating to `true` sets the status.
-   - If none match, `default` status is used (default: `"ok"`).
-   - `initial` status can be set for the period before the first flush.
-3. **Configuration reference** — The `[outputs.heartbeat.status]` config block with all options: `ok`, `warn`, `fail`, `order`, `default`, `initial`.
-4. **Child page links** — Variables, Functions, Examples.
-
-#### `variables.md` — Variables Reference
-
-1. **Intro** — Variables represent data collected by Telegraf since the last successful heartbeat (unless noted otherwise).
-2. **Top-level variables** — Table or definition list:
-   - `metrics` (int) — metrics arriving at the heartbeat plugin
-   - `log_errors` (int) — errors logged
-   - `log_warnings` (int) — warnings logged
-   - `last_update` (time) — time of last successful heartbeat
-   - `agent` (map) — agent-level statistics
-   - `inputs` (map) — input plugin statistics
-   - `outputs` (map) — output plugin statistics
-3. **Agent statistics (`agent`)** — Map fields:
-   - `metrics_written`, `metrics_rejected`, `metrics_dropped`, `metrics_gathered`, `gather_errors`, `gather_timeouts`
-4. **Input plugin statistics (`inputs`)** — Map structure: key = plugin type (e.g., `cpu`), value = list of instances. Fields per instance:
-   - `id`, `alias`, `errors`, `metrics_gathered`, `gather_time_ns`, `gather_timeouts`, `startup_errors`
-5. **Output plugin statistics (`outputs`)** — Same map structure. Fields per instance:
-   - `id`, `alias`, `errors`, `metrics_filtered`, `write_time_ns`, `startup_errors`, `metrics_added`, `metrics_written`, `metrics_rejected`, `metrics_dropped`, `buffer_size`, `buffer_limit`, `buffer_fullness`
-6. **Note on accumulation** — Values accumulate since last successful heartbeat; `last_update` enables rate calculation.
-
-#### `functions.md` — Functions Reference
-
-1. **Intro** — CEL expressions support built-in CEL operators plus additional function libraries.
-2. **Time functions** — `now()` returns current time; usage with `last_update` for duration/rate calculations. Include usage example.
-3. **Math functions** — Link to CEL math library. Highlight commonly useful functions (e.g., `math.greatest()`, `math.least()`). Brief examples.
-4. **String functions** — Link to CEL strings library. Note usefulness for checking `alias` or `id` fields. Brief example.
-5. **Encoding functions** — Link to CEL encoder library. Brief note on relevance.
-6. **CEL operators reference** — Quick reference for comparison (`==`, `!=`, `<`, `>`), logical (`&&`, `||`, `!`), arithmetic (`+`, `-`, `*`, `/`), and ternary (`? :`) operators.
-
-#### `examples.md` — Examples
-
-Each example follows a consistent pattern: **scenario description → CEL expression(s) → full config block → explanation**.
-
-1. **Basic health check** — `ok` when metrics are flowing, `fail` otherwise.
-   - `ok = "metrics > 0"`
-2. **Error rate monitoring** — warn on logged errors, fail on high error count.
-   - `warn = "log_errors > 0"`, `fail = "log_errors > 10"`
-3. **Buffer health** — warn when any output buffer exceeds 80% fullness.
-   - Uses `outputs` map iteration to check `buffer_fullness` across plugin instances.
-4. **Plugin-specific checks** — check a specific input or output for errors.
-   - Demonstrates map access like `outputs.influxdb_v2.exists(o, o.errors > 0)` and safe access with `has()`.
-5. **Composite conditions** — combining multiple signals.
-   - `fail = "log_errors > 5 && outputs.influxdb_v2.exists(o, o.buffer_fullness > 0.9)"`
-6. **Time-based expressions** — using `now()` and `last_update` for staleness.
-   - e.g., `warn = "now() - last_update > duration('10m')"`
-7. **Custom evaluation order** — shows `order = ["fail", "warn", "ok"]` for fail-first evaluation.
-
-## File Structure
-
-### New files
-
-```
-content/telegraf/controller/reference/
-  cel/
-    _index.md          — CEL overview, evaluation flow, config reference
-    variables.md       — All variables (top-level, agent, inputs, outputs)
-    functions.md       — Functions, operators, quick reference
-    examples.md        — Real-world examples by scenario
-```
-
-### Updated files
-
-```
-content/telegraf/controller/agents/status.md  — Expand from stub to practical guide
-```
-
-## Navigation / Menu Structure
-
-The CEL section nests under the existing `Reference` parent in the `telegraf_controller` menu:
-
- **Reference** (existing)
-  - **CEL expressions** (`_index.md`)
-    - **Variables** (`variables.md`)
-    - **Functions** (`functions.md`)
-    - **Examples** (`examples.md`)
-
-## Cross-Linking Strategy
-
- Status page → CEL reference `_index.md` for full details
- Status page → heartbeat plugin for base config syntax
- CEL examples page → status page for UI context
- CEL variables/functions pages are **self-contained** (standalone, no dependency on heartbeat plugin docs)
-
-## Design Decisions
-
-1. **Standalone CEL reference** — The TC CEL reference is self-contained with its own variable and function documentation, independent of the heartbeat plugin page. Users configuring statuses in Controller shouldn't need to navigate to plugin docs for the variable reference.
-2. **Status page as practical guide** — Includes 2-3 inline examples for quick start; full reference lives in the CEL section.
-3. **Multi-page reference** — Keeps pages shorter and searchable. Variables, functions, and examples each get their own page. Function pages can be split further by category later if they grow large.
-4. **Consistent example format** — Every example includes scenario, expression, full config block, and explanation.