9.5 KiB
| title | description | menu | weight | ||||||
|---|---|---|---|---|---|---|---|---|---|
| Telegraf Controller architecture | Architectural overview of the {{% product-name %}} application. |
|
105 |
{{% product-name %}} is a standalone application that provides centralized management for Telegraf agents. It runs as a single binary that starts two separate servers: a web interface/API server and a dedicated high-performance heartbeat server for agent monitoring.
Runtime Architecture
Application Components
When you run the Telegraf Controller binary, it starts four main subsystems:
- Web Server: Serves the management interface (default port:
8888) - API Server: Handles configuration management and administrative requests (served on the same port as the web server)
- Heartbeat Server: Dedicated high-performance server for agent heartbeats
(default port:
8000) - Background Scheduler: Monitors agent health every 60 seconds
Process Model
- telegraf_controller (single process, multiple servers)
- Main HTTP Server (port
8888)- Web UI (
/) - API Endpoints (
/api/*)
- Web UI (
- Heartbeat Server (port
8000)- POST /heartbeat (high-performance endpoint)
- Database Connection
- SQLite or PostgreSQL
- Background Tasks
- Agent Status Monitor (60s interval)
- Main HTTP Server (port
The dual-server architecture separates high-frequency heartbeat traffic from regular management operations, ensuring that the web interface remains responsive even under heavy agent load.
Configuration
{{% product-name %}} configuration is controlled through command options and environment variables.
| Command Option | Environment Variable | Description |
|---|---|---|
--port |
PORT |
API server port (default is 8888) |
--heartbeat-port |
HEARTBEAT_PORT |
Heartbeat service port (default: 8000) |
--database |
DATABASE |
Database filepath or URL (default is SQLite path) |
--ssl-cert |
SSL_CERT |
Path to SSL certificate |
--ssl-key |
SSL_KEY |
Path to SSL private key |
To use environment variables, create a .env file in the same directory as the
binary or export these environment variables in your terminal session.
Database Selection
{{% product-name %}} automatically selects the database type based on the
DATABASE string:
- SQLite (default): Best for development and small deployments with less than 1000 agents. Database file created automatically.
- PostgreSQL: Required for large deployments. Must be provisioned separately.
Example PostgreSQL configuration:
DATABASE="postgresql://user:password@localhost:5432/telegraf_controller"
Data Flow
Agent registration and heartbeats
{{< diagram >}}
flowchart LR
T["Telegraf Agents
(POST heartbeats)"] --> H["Port 8000
Heartbeat Server"]
H --Direct Write--> D[("Database")]
W["Web UI/API
"] --> A["Port 8888
API Server"] --View Agents (Read-Only)--> D
R["Rust Scheduler
(Agent status updates)"] --> D
{{< /diagram >}}
-
Agents send heartbeats:
Telegraf agents with the heartbeat output plugin send
POSTrequests to the dedicated heartbeat server (port8000by default). -
Heartbeat server processes the heartbeat:
The heartbeat server is a high-performance Rust-based HTTP server that:
- Receives the
POSTrequest at/agents/heartbeat - Validates the heartbeat payload
- Extracts agent information (ID, hostname, IP address, status, etc.)
- Uniquely identifies each agent using the
instance_idin the heartbeat payload.
- Receives the
-
Heartbeat server writes directly to the database:
The heartbeat server uses a Rust NAPI module that:
- Bypasses the application ORM (Object-Relational Mapping) layer entirely
- Uses
sqlx(Rust SQL library) to write directly to the database - Implements batch processing to efficiently process multiple heartbeats
- Provides much higher throughput than going through the API layer
The Rust module performs these operations:
- Creates a new agent if it does not already exist
- Adds or updates the
last_seentimestamp - Adds or updates the agent status to the status reported in the heartbeat
- Updates other agent metadata (hostname, IP, etc.)
-
API layer reads agent data:
The API layer has read-only access for agent data and performs the following actions:
GET /api/agents- List agentsGET /api/agents/summary- Agent status summary
The API never writes to the agents table. Only the heartbeat server does.
-
The Web UI displays updated agent data:
The web interface polls the API endpoints to display:
- Real-time agent status
- Last seen timestamps
- Agent health metrics
-
The background scheduler evaluates agent statuses:
Every 60 seconds, a Rust-based scheduler (also part of the NAPI module):
- Scans all agents in the database
- Checks
last_seentimestamps against the agent's assigned reporting rule - Updates agent statuses:
- ok → not_reporting (if heartbeat missed beyond threshold)
- not_reporting → ok (if heartbeat resumes)
- Auto-deletes agents that have exceeded the auto-delete threshold (if enabled for the reporting rule)
Configuration distribution
-
An agent requests a configuration:
Telegraf agents request their configuration from the main API server (port
8888):telegraf --config "http://localhost:8888/api/configs/{config-id}/toml?location=datacenter1&env=prod"The agent makes a
GETrequest with:- Config ID: Unique identifier for the configuration template
- Query Parameters: Variables for parameter substitution
- Accept Header: Can specify
text/x-tomlorapplication/octet-streamfor download
-
The API server receives request:
The API server on port
8888handles the request at/api/configs/{id}/tomland does the following:- Validates the configuration ID
- Extracts all query parameters for substitution
- Checks the
Acceptheader to determine response format
-
The application retrieves the configuration from the database:
{{% product-name %}} fetches configuration data from the database:
- Configuration TOML: The raw configuration with parameter placeholders
- Configuration name: Used for filename if downloading
- Updated timestamp: For the
Last-Modifiedheader
-
{{% product-name %}} substitutes parameters:
{{% product-name %}} processes the TOML template and replaces parameters with parameter values specified in the
GETrequest. -
{{% product-name %}} sets response headers:
- Content-Type
- Last-Modified
Telegraf uses the
Last-Modifiedheader to determine if a configuration has been updated and, if so, download and use the updated configuration. -
{{% product-name %}} delivers the response:
Based on the
Acceptheader:{{< tabs-wrapper >}} {{% tabs "medium" %}} text/x-toml (TOML) application/octet-stream (Download) {{% /tabs %}} {{% tab-content %}}
HTTP/1.1 200 OK
Content-Type: text/x-toml; charset=utf-8
Last-Modified: Mon, 05 Jan 2025 07:28:00 GMT
[agent]
hostname = "server-01"
environment = "prod"
...
{{% /tab-content %}} {{% tab-content %}}
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="config_name.toml"
Last-Modified: Mon, 05 Jan 2025 07:28:00 GMT
[agent]
hostname = "server-01"
...
{{% /tab-content %}} {{< /tabs-wrapper >}}
-
(Optional) Telegraf regularly checks the configuration for updates:
Telegraf agents can regularly check {{% product-name %}} for configuration updates and automatically load updates when detected. When starting a Telegraf agent, include the
--config-url-watch-intervaloption with the interval that you want the agent to use to check for updates—for example:telegraf \ --config http://localhost:8888/api/configs/xxxxxx/toml \ --config-url-watch-interval 1h
Reporting Rules
{{% product-name %}} uses reporting rules to determine when agents should be marked as not reporting:
- Default Rule: Created automatically on first run
- Heartbeat Interval: Expected frequency of agent heartbeats (default: 60s)
- Threshold Multiplier: How many intervals to wait before marking not_reporting (default: 3x)
Access reporting rules via:
- Web UI: Reporting Rules
- API:
GET /api/reporting-rules