docs-v2

9.5 KiB

Raw Blame History

title

description

weight

Telegraf Controller architecture

Architectural overview of the {{% product-name %}} application.

telegraf_controller

name	parent
Architectural overview	Reference

105

{{% product-name %}} is a standalone application that provides centralized management for Telegraf agents. It runs as a single binary that starts two separate servers: a web interface/API server and a dedicated high-performance heartbeat server for agent monitoring.

Runtime Architecture

Application Components

When you run the Telegraf Controller binary, it starts four main subsystems:

Web Server: Serves the management interface (default port: 8888)
API Server: Handles configuration management and administrative requests (served on the same port as the web server)
Heartbeat Server: Dedicated high-performance server for agent heartbeats (default port: 8000)
Background Scheduler: Monitors agent health every 60 seconds

Process Model

telegraf_controller (single process, multiple servers)
- Main HTTP Server (port 8888)
  - Web UI (/)
  - API Endpoints (/api/*)
- Heartbeat Server (port 8000)
  - POST /heartbeat (high-performance endpoint)
- Database Connection
  - SQLite or PostgreSQL
- Background Tasks
  - Agent Status Monitor (60s interval)

The dual-server architecture separates high-frequency heartbeat traffic from regular management operations, ensuring that the web interface remains responsive even under heavy agent load.

Configuration

{{% product-name %}} configuration is controlled through command options and environment variables.

Command Option	Environment Variable	Description
`--port`	`PORT`	API server port (default is `8888`)
`--heartbeat-port`	`HEARTBEAT_PORT`	Heartbeat service port (default: `8000`)
`--database`	`DATABASE`	Database filepath or URL (default is SQLite path)
`--ssl-cert`	`SSL_CERT`	Path to SSL certificate
`--ssl-key`	`SSL_KEY`	Path to SSL private key

To use environment variables, create a .env file in the same directory as the binary or export these environment variables in your terminal session.

Database Selection

{{% product-name %}} automatically selects the database type based on the DATABASE string:

SQLite (default): Best for development and small deployments with less than 1000 agents. Database file created automatically.
PostgreSQL: Required for large deployments. Must be provisioned separately.

Example PostgreSQL configuration:

DATABASE="postgresql://user:password@localhost:5432/telegraf_controller"

Data Flow

Agent registration and heartbeats

{{< diagram >}} flowchart LR T["Telegraf Agents
(POST heartbeats)"] --> H["Port 8000
Heartbeat Server"] H --Direct Write--> D[("Database")] W["Web UI/API
"] --> A["Port 8888
API Server"] --View Agents (Read-Only)--> D
R["Rust Scheduler
(Agent status updates)"] --> D

Agents send heartbeats:

Telegraf agents with the heartbeat output plugin send POST requests to the dedicated heartbeat server (port 8000 by default).
Heartbeat server processes the heartbeat:

The heartbeat server is a high-performance Rust-based HTTP server that:
- Receives the POST request at /agents/heartbeat
- Validates the heartbeat payload
- Extracts agent information (ID, hostname, IP address, status, etc.)
- Uniquely identifies each agent using the instance_id in the heartbeat payload.
Heartbeat server writes directly to the database:

The heartbeat server uses a Rust NAPI module that:
- Bypasses the application ORM (Object-Relational Mapping) layer entirely
- Uses sqlx (Rust SQL library) to write directly to the database
- Implements batch processing to efficiently process multiple heartbeats
- Provides much higher throughput than going through the API layer
The Rust module performs these operations:
- Creates a new agent if it does not already exist
- Adds or updates the last_seen timestamp
- Adds or updates the agent status to the status reported in the heartbeat
- Updates other agent metadata (hostname, IP, etc.)
API layer reads agent data:

The API layer has read-only access for agent data and performs the following actions:
- GET /api/agents - List agents
- GET /api/agents/summary - Agent status summary
The API never writes to the agents table. Only the heartbeat server does.
The Web UI displays updated agent data:

The web interface polls the API endpoints to display:
- Real-time agent status
- Last seen timestamps
- Agent health metrics
The background scheduler evaluates agent statuses:

Every 60 seconds, a Rust-based scheduler (also part of the NAPI module):
- Scans all agents in the database
- Checks last_seen timestamps against the agent's assigned reporting rule
- Updates agent statuses:
  - ok → not_reporting (if heartbeat missed beyond threshold)
  - not_reporting → ok (if heartbeat resumes)
- Auto-deletes agents that have exceeded the auto-delete threshold (if enabled for the reporting rule)

Configuration distribution

An agent requests a configuration:

Telegraf agents request their configuration from the main API server (port 8888):
```
telegraf --config "http://localhost:8888/api/configs/{config-id}/toml?location=datacenter1&env=prod"
```
The agent makes a GET request with:
- Config ID: Unique identifier for the configuration template
- Query Parameters: Variables for parameter substitution
- Accept Header: Can specify text/x-toml or application/octet-stream for download
The API server receives request:

The API server on port 8888 handles the request at /api/configs/{id}/toml and does the following:
- Validates the configuration ID
- Extracts all query parameters for substitution
- Checks the Accept header to determine response format
The application retrieves the configuration from the database:

{{% product-name %}} fetches configuration data from the database:
- Configuration TOML: The raw configuration with parameter placeholders
- Configuration name: Used for filename if downloading
- Updated timestamp: For the Last-Modified header
{{% product-name %}} substitutes parameters:

{{% product-name %}} processes the TOML template and replaces parameters with parameter values specified in the GET request.
{{% product-name %}} sets response headers:
- Content-Type
- Last-Modified
Telegraf uses the Last-Modified header to determine if a configuration has been updated and, if so, download and use the updated configuration.
{{% product-name %}} delivers the response:

Based on the Accept header:

{{< tabs-wrapper >}} {{% tabs "medium" %}} text/x-toml (TOML) application/octet-stream (Download) {{% /tabs %}} {{% tab-content %}}

HTTP/1.1 200 OK
Content-Type: text/x-toml; charset=utf-8
Last-Modified: Mon, 05 Jan 2025 07:28:00 GMT

[agent]
  hostname = "server-01"
  environment = "prod"
...

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="config_name.toml"
Last-Modified: Mon, 05 Jan 2025 07:28:00 GMT

[agent]
  hostname = "server-01"
...

(Optional) Telegraf regularly checks the configuration for updates:

Telegraf agents can regularly check {{% product-name %}} for configuration updates and automatically load updates when detected. When starting a Telegraf agent, include the --config-url-watch-interval option with the interval that you want the agent to use to check for updates—for example:
```
telegraf \
  --config http://localhost:8888/api/configs/xxxxxx/toml \
  --config-url-watch-interval 1h
```

Reporting Rules

{{% product-name %}} uses reporting rules to determine when agents should be marked as not reporting:

Default Rule: Created automatically on first run
Heartbeat Interval: Expected frequency of agent heartbeats (default: 60s)
Threshold Multiplier: How many intervals to wait before marking not_reporting (default: 3x)

Access reporting rules via:

Web UI: Reporting Rules
API: GET /api/reporting-rules

9.5 KiB Raw Blame History