chore(telegraf): add Avro input format (parser). (#5068)

* chore(telegraf): add Avro input format (parser).

Closes #5054

* Apply suggestions from code review

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

---------

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
Jason Stirnaman 2023-08-02 19:58:19 -05:00 committed by GitHub
parent 09361bb4e3
commit f34f2d9450
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 110 additions and 5 deletions


@@ -9,9 +9,11 @@ menu:
parent: Data formats
---
Telegraf contains many general purpose plugins that use a configurable parser for parsing input data into [metrics][]. This allows, for example, the
`kafka_consumer` input plugin to process messages formatted in InfluxDB Line
Protocol or in JSON format. Telegraf supports the following input data formats:
Telegraf contains many general-purpose plugins that use a configurable parser for parsing input data into metrics.
This allows input plugins such as the [`kafka_consumer` plugin](/telegraf/v1.27/plugins/#input-kafka_consumer)
to consume and process different data formats, such as InfluxDB line
protocol or JSON.
Telegraf supports the following input **data formats**:
{{< children >}}


@@ -0,0 +1,103 @@
---
title: Avro input data format
description: Use the `avro` input data format to parse metrics from a message serialized as Avro binary or JSON format.
menu:
telegraf_1_27_ref:
name: Avro
weight: 10
parent: Input data formats
---
The Avro input data format parses messages serialized in the [Avro](https://avro.apache.org/) format, encoded as either binary or JSON.
## Wire format
Avro messages should conform to the Confluent [wire format](https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format), which uses the following byte mapping:
| Bytes | Area | Description |
| ----- | ---------- | ------------------------------------------------ |
| 0 | Magic Byte | Confluent serialization format version number; currently always `0`. |
| 1-4 | Schema ID | 4-byte schema ID as returned by Schema Registry. |
| 5- | Data | Serialized data. |
{{% caption %}}
Source: [Confluent Documentation](https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format)
{{% /caption %}}
For more information about Avro schemas and encodings, see the [specification](https://avro.apache.org/docs/current/specification/) in the Apache Avro documentation.
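The following Go sketch shows how a consumer could split a message into the three areas listed in the wire format table. It is an illustration, not Telegraf's parser code: `wireMessage` and `splitWireFormat` are hypothetical names, and the sketch assumes the 4-byte schema ID is stored big-endian (network byte order), as in Confluent's wire format.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// wireMessage is a hypothetical type holding the parts of a
// Confluent wire-format message after the magic byte.
type wireMessage struct {
	SchemaID uint32 // bytes 1-4, big-endian
	Payload  []byte // bytes 5 onward: the serialized data
}

// splitWireFormat (hypothetical helper) checks the magic byte and
// extracts the schema ID and the serialized payload.
func splitWireFormat(msg []byte) (wireMessage, error) {
	if len(msg) < 5 {
		return wireMessage{}, errors.New("message shorter than the 5-byte header")
	}
	if msg[0] != 0 {
		return wireMessage{}, fmt.Errorf("unexpected magic byte %d", msg[0])
	}
	return wireMessage{
		SchemaID: binary.BigEndian.Uint32(msg[1:5]),
		Payload:  msg[5:],
	}, nil
}

func main() {
	// Magic byte 0, schema ID 42, then two bytes of serialized data.
	msg := []byte{0x00, 0x00, 0x00, 0x00, 0x2A, 0x02, 0x61}
	m, err := splitWireFormat(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("schema ID:", m.SchemaID, "payload bytes:", len(m.Payload))
}
```

The `Payload` bytes are what an Avro decoder would then read, using the schema that the registry returns for `SchemaID`.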
## Configuration
```toml
[[inputs.kafka_consumer]]
## Kafka brokers.
brokers = ["localhost:9092"]
## Topics to consume.
topics = ["telegraf"]
## Maximum length of a message to consume, in bytes (default 0/unlimited);
## larger messages are dropped
max_message_len = 1000000
## Avro data format settings
data_format = "avro"
## Avro message format
## Supported values are "binary" (default) and "json"
# avro_format = "binary"
## URL of the schema registry; exactly one of avro_schema_registry and
## avro_schema must be set
avro_schema_registry = "http://localhost:8081"
## Schema string; exactly one of avro_schema_registry and avro_schema
## must be set
#avro_schema = '''
# {
# "type":"record",
# "name":"Value",
# "namespace":"com.example",
# "fields":[
# {
# "name":"tag",
# "type":"string"
# },
# {
# "name":"field",
# "type":"long"
# },
# {
# "name":"timestamp",
# "type":"long"
# }
# ]
# }
#'''
## Measurement string; if not set, determine measurement name from
## schema (as "<namespace>.<name>")
# avro_measurement = "ratings"
## Avro fields to be used as tags; optional.
# avro_tags = ["CHANNEL", "CLUB_STATUS"]
## Avro fields to be used as fields; if empty, any Avro fields
## detected from the schema, not used as tags, will be used as
## measurement fields.
# avro_fields = ["STARS"]
## Avro field to use as the timestamp; if empty, the current time is
## used as the measurement timestamp.
# avro_timestamp = ""
## If avro_timestamp is specified, avro_timestamp_format must be set
## to one of 'unix', 'unix_ms', 'unix_us', or 'unix_ns'
# avro_timestamp_format = "unix"
## Separator used to join the parts of flattened array structures. The
## default is the empty string, so a=["a", "b"] becomes a0="a", a1="b".
## If set to "_", it becomes a_0="a", a_1="b".
# avro_field_separator = "_"
## Default values for the given tags; optional.
# tags = { application = "hermes", region = "central" }
```
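To make the `avro_field_separator`, `avro_timestamp_format`, and `avro_measurement` comments in the configuration concrete, here is a Go sketch of the behavior they describe. It is an illustration, not Telegraf's implementation: `flattenArray`, `toTimestamp`, and `measurementName` are hypothetical helpers, and the sketch assumes the timestamp field holds an integer.

```go
package main

import (
	"fmt"
	"time"
)

// flattenArray (hypothetical) expands an array into one field per element,
// naming each field <name><separator><index>.
func flattenArray(name, sep string, values []string) map[string]string {
	out := make(map[string]string, len(values))
	for i, v := range values {
		out[fmt.Sprintf("%s%s%d", name, sep, i)] = v
	}
	return out
}

// toTimestamp (hypothetical) interprets an integer timestamp using one of
// the avro_timestamp_format values: unix, unix_ms, unix_us, or unix_ns.
func toTimestamp(v int64, format string) (time.Time, error) {
	switch format {
	case "unix":
		return time.Unix(v, 0), nil
	case "unix_ms":
		return time.UnixMilli(v), nil
	case "unix_us":
		return time.UnixMicro(v), nil
	case "unix_ns":
		return time.Unix(0, v), nil
	}
	return time.Time{}, fmt.Errorf("unsupported timestamp format %q", format)
}

// measurementName (hypothetical) returns avro_measurement when it is set,
// and otherwise "<namespace>.<name>" from the schema.
func measurementName(override, namespace, name string) string {
	if override != "" {
		return override
	}
	return namespace + "." + name
}

func main() {
	fmt.Println(flattenArray("a", "", []string{"a", "b"}))
	fmt.Println(flattenArray("a", "_", []string{"a", "b"}))
	ts, err := toTimestamp(1690000000000, "unix_ms")
	if err != nil {
		panic(err)
	}
	fmt.Println(ts.UTC())
	fmt.Println(measurementName("", "com.example", "Value"))
}
```

With the commented-out example schema and no `avro_measurement` set, this sketch would name the measurement `com.example.Value`.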


@@ -184,7 +184,7 @@ menu:
- Always disable cgo support (static builds).
- Plugin state-persistence.
- Add `/etc/telegraf/telegraf.d` to default configuration file locations.
- Print loaded configurationss.
- Print loaded configurations.
- Accept durations given in days (e.g. 7d).
- OAuth (`common.oauth`): Add `audience` parameter.
- TLS (`common.tls`): Add `enable` flag.
@@ -1147,7 +1147,7 @@ Telegraf without having to paste in sample configurations from each plugin's REA
- Update client API version.
- ECS (`ecs`): Use current time as timestamp.
- Execd (`execd`): Add newline for Prometheus parsing.
- File (`file`): Statefull parser handling.
- File (`file`): Stateful parser handling.
- GNMI (`gnmi`): Add dynamic tagging.
- Graylog (`graylog`):
- Add `toml` tags.