chore(telegraf): add Avro input format (parser). (#5068)

* chore(telegraf): add Avro input format (parser). Closes #5054
* Apply suggestions from code review

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

---------

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
2023-08-02 19:58:19 -05:00
parent 09361bb4e3
commit f34f2d9450
3 changed files with 110 additions and 5 deletions
--- a/content/telegraf/v1.27/data_formats/input/_index.md
+++ b/content/telegraf/v1.27/data_formats/input/_index.md
@@ -9,9 +9,11 @@ menu:
    parent: Data formats
 ---

-Telegraf contains many general purpose plugins that use a configurable parser for parsing input data into [metrics][].  This allows, for example, the
-`kafka_consumer` input plugin to process messages formatted in InfluxDB Line
-Protocol or in JSON format. Telegraf supports the following input data formats:
+Telegraf contains many general purpose plugins that use a configurable parser for parsing input data into metrics.
+This allows input plugins such as the [`kafka_consumer` plugin](/telegraf/v1.27/plugins/#input-kafka_consumer)
+to consume and process different data formats, such as InfluxDB line
+protocol or JSON.
+Telegraf supports the following input **data formats**:

 {{< children >}}

--- a/content/telegraf/v1.27/data_formats/input/avro.md
+++ b/content/telegraf/v1.27/data_formats/input/avro.md
@@ -0,0 +1,103 @@
+---
+title: Avro input data format
+description: Use the `avro` input data format to parse metrics from a message serialized as Avro binary or JSON format.
+menu:
+  telegraf_1_27_ref:
+    name: Avro
+    weight: 10
+    parent: Input data formats
+---
+
+The Avro input data format parses messages serialized in [Avro](https://avro.apache.org/) format and encoded as binary or JSON.
+
+## Wire format
+
+Avro messages should conform to the Confluent [wire format](https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format), which uses the following byte layout:
+
+| Bytes | Area       | Description                                      |
+| ----- | ---------- | ------------------------------------------------ |
+| 0     | Magic Byte | Confluent serialization format version number; currently always `0`. |
+| 1-4   | Schema ID  | 4-byte schema ID as returned by Schema Registry. |
+| 5-    | Data       | Serialized data.                                 |
+
+{{% caption %}}
+Source: [Confluent Documentation](https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format)
+{{% /caption %}}
+
+For more information about Avro schema and encodings, see the [specification](https://avro.apache.org/docs/current/specification/) in the Apache Avro documentation.
+
+## Configuration
+
+```toml
+[[inputs.kafka_consumer]]
+  ## Kafka brokers.
+  brokers = ["localhost:9092"]
+
+  ## Topics to consume.
+  topics = ["telegraf"]
+
+  ## Maximum length of a message to consume, in bytes (default 0/unlimited);
+  ## larger messages are dropped
+  max_message_len = 1000000
+
+  ## Avro data format settings
+  data_format = "avro"
+
+  ## Avro message format
+  ## Supported values are "binary" (default) and "json"
+  # avro_format = "binary"
+
+  ## URL of the schema registry; exactly one of schema registry and
+  ## schema must be set
+  avro_schema_registry = "http://localhost:8081"
+
+  ## Schema string; exactly one of schema registry and schema must be set
+  #avro_schema = '''
+  #        {
+  #          "type":"record",
+  #          "name":"Value",
+  #          "namespace":"com.example",
+  #          "fields":[
+  #              {
+  #                  "name":"tag",
+  #                  "type":"string"
+  #              },
+  #              {
+  #                  "name":"field",
+  #                  "type":"long"
+  #              },
+  #              {
+  #                  "name":"timestamp",
+  #                  "type":"long"
+  #              }
+  #          ]
+  #      }
+  #'''
+
+  ## Measurement string; if not set, determine measurement name from
+  ## schema (as "<namespace>.<name>")
+  # avro_measurement = "ratings"
+
+  ## Avro fields to be used as tags; optional.
+  # avro_tags = ["CHANNEL", "CLUB_STATUS"]
+
+  ## Avro fields to be used as fields; if empty, any Avro fields
+  ## detected from the schema, not used as tags, will be used as
+  ## measurement fields.
+  # avro_fields = ["STARS"]
+
+  ## Avro field to be used as timestamp; if empty, current time will
+  ## be used for the measurement timestamp.
+  # avro_timestamp = ""
+  ## If avro_timestamp is specified, avro_timestamp_format must be set
+  ## to one of 'unix', 'unix_ms', 'unix_us', or 'unix_ns'
+  # avro_timestamp_format = "unix"
+
+  ## Used to separate parts of array structures. The default is the
+  ## empty string, so a=["a", "b"] becomes a0="a", a1="b".
+  ## If this were set to "_", then it would be a_0="a", a_1="b".
+  # avro_field_separator = "_"
+
+  ## Default values for given tags: optional
+  # tags = { "application": "hermes", "region": "central" }
+```
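
Note on the wire-format table added in `avro.md` above: the 5-byte header can be split off a message as in the following minimal sketch. It is not Telegraf's implementation. The function name `split_confluent_frame` is hypothetical, and the big-endian (network byte order) reading of the schema ID follows the Confluent documentation linked in the table caption.

```python
import struct


def split_confluent_frame(message: bytes):
    """Split a Confluent wire-format message into (schema_id, payload).

    Hypothetical helper for illustration only; not part of Telegraf.
    """
    if len(message) < 5:
        raise ValueError("message is shorter than the 5-byte header")
    # Byte 0 is the magic byte; bytes 1-4 are the schema ID, read here
    # as a big-endian unsigned 32-bit integer.
    magic, schema_id = struct.unpack(">BI", message[:5])
    if magic != 0:
        raise ValueError(f"unexpected magic byte: {magic}")
    # Everything from byte 5 onward is the serialized Avro data.
    return schema_id, message[5:]


# Example frame: magic byte 0, schema ID 42, then three payload bytes.
frame = b"\x00" + (42).to_bytes(4, "big") + b"abc"
schema_id, payload = split_confluent_frame(frame)
```

The schema ID obtained this way is what a consumer would look up in the registry configured through `avro_schema_registry`; the payload still has to be decoded against that schema.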
--- a/content/telegraf/v1.27/release-notes-changelog.md
+++ b/content/telegraf/v1.27/release-notes-changelog.md
@@ -184,7 +184,7 @@ menu:
 - Always disable cgo support (static builds).
 - Plugin state-persistence.
 - Add `/etc/telegraf/telegraf.d` to default configuration file locations.
-- Print loaded configurationss.
+- Print loaded configurations.
 - Accept durations given in days (e.g. 7d).
 - OAuth (`common.oauth`): Add `audience` parameter.
 - TLS (`common.tls`): Add `enable` flag.
@@ -1147,7 +1147,7 @@ Telegraf without having to paste in sample configurations from each plugin's REA
  - Update client API version.
 - ECS (`ecs`): Use current time as timestamp.
 - Execd `execd`: Add newline for Prometheus parsing.
-- File (`file`): Statefull parser handling.
+- File (`file`): Stateful parser handling.
 - GNMI (`gnmi`): Add dynamic tagging.
 - Graylog (`graylog`):
  - Add `toml` tags.