Merge pull request #20 from influxdata/process-data

Process data
pull/24/head
Scott Anderson 2019-01-22 15:49:17 -07:00 committed by GitHub
commit 3355ba9b9e
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
12 changed files with 645 additions and 40 deletions

View File

@ -1,37 +0,0 @@
---
title: Using tasks
description: This is just an example post to show the format of new 2.0 posts
menu:
v2_0:
name: Using tasks
weight: 1
---
A task is a scheduled Flux query. Main use case is replacement for continuous queries, add info about CQs.
**To filter the list of tasks**:
1. Enable the **Show Inactive** option to include inactive tasks on the list.
2. Enter text in the **Filter tasks by name** field to search for tasks by name.
3. Select an organization from the **All Organizations** dropdown to filter the list by organization.
4. Click on the heading of any column to sort by that field.
**To import a task**:
1. Click the Tasks (calendar) icon in the left navigation menu.
2. Click **Import** in the upper right.
3. Drag and drop or select a file to upload.
4. !!!
**To create a task**:
1. Click **+ Create Task**.
2. In the left sidebar panel, enter the following details:
* **Name**: The name of your task.
* **Owner**: Select an organization from the drop-down menu.
* **Schedule Task**: Select **Interval** for !!!! or **Cron** to !!!. Also enter value below (interval window or Cron thing).
* **Offset**: Enter an offset time. If you schedule it to run at the hour but you have an offset of ten minutes, then it runs at an hour and ten minutes.
3. In the right panel, enter your task script.
4. Click **Save**.
**Disable tasks**

View File

@ -0,0 +1,29 @@
---
title: Process Data with InfluxDB tasks
seotitle: Process Data with InfluxDB tasks
description: >
InfluxDB's task engine runs scheduled Flux tasks that process and analyze data.
This collection of articles provides information about creating and managing InfluxDB tasks.
menu:
v2_0:
name: Process data
weight: 3
---
InfluxDB's _**task engine**_ is designed for processing and analyzing data.
A task is a scheduled Flux query that takes a stream of input data, modifies or
analyzes it in some way, then performs an action.
Examples include data downsampling, anomaly detection _(Coming)_, alerting _(Coming)_, etc.
{{% note %}}
Tasks are a replacement for InfluxDB v1.x's continuous queries.
{{% /note %}}
The following articles explain how to configure and build tasks using the InfluxDB user interface (UI)
and via raw Flux scripts with the `influx` command line interface (CLI).
They also provide examples of commonly used tasks.
[Write a task](/v2.0/process-data/write-a-task)
[Manage Tasks](/v2.0/process-data/manage-tasks)
[Common Tasks](/v2.0/process-data/common-tasks)
[Task Options](/v2.0/process-data/task-options)

View File

@ -0,0 +1,22 @@
---
title: Common data processing tasks
seotitle: Common data processing tasks performed with InfluxDB
description: >
InfluxDB Tasks process data on specified schedules.
This collection of articles walks through common use cases for InfluxDB tasks.
menu:
v2_0:
name: Common tasks
parent: Process data
weight: 4
---
The following articles walk through common task use cases.
[Downsample Data with InfluxDB](/v2.0/process-data/common-tasks/downsample-data)
{{% note %}}
This list will continue to grow.
If you have suggestions, please [create an issue](https://github.com/influxdata/docs-v2/issues/new)
on the InfluxData documentation repository on GitHub.
{{% /note %}}

View File

@ -0,0 +1,85 @@
---
title: Downsample data with InfluxDB
seotitle: Downsample data in an InfluxDB task
description: >
How to create a task that downsamples data much like continuous queries
in previous versions of InfluxDB.
menu:
v2_0:
name: Downsample data
parent: Common tasks
weight: 4
---
One of the most common use cases for InfluxDB tasks is downsampling data to reduce
overall disk usage as data accumulates over time.
In previous versions of InfluxDB, continuous queries filled this role.
This article walks through creating a continuous-query-like task that downsamples
data by aggregating data within windows of time, then storing the aggregate value in a new bucket.
### Requirements
To perform a downsampling task, you need the following:
##### A "source" bucket
The bucket from which data is queried.
##### A "destination" bucket
A separate bucket where aggregated, downsampled data is stored.
##### Some type of aggregation
To downsample data, it must be aggregated in some way.
The aggregation method you use depends on your use case,
but common examples include mean, median, top, and bottom.
View [Flux's aggregate functions](/v2.0/reference/flux/functions/transformations/aggregates/)
for more information and ideas.
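The aggregate method is typically just the `fn` parameter of `aggregateWindow()`. The sketch below swaps in `median`; the bucket and measurement names are placeholders:

```js
// Placeholder bucket and measurement names
from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "mem")
  // Swap fn for mean, top, bottom, etc. as needed
  |> aggregateWindow(every: 10m, fn: median)
```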
## Create a destination bucket
By design, tasks cannot write to the same bucket from which they are reading.
You need another bucket where the task can store the aggregated, downsampled data.
_For information about creating buckets, see [Create a bucket](#)._
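As a sketch, a destination bucket can also be created with the `influx` CLI; the bucket name, organization, and the `influx bucket create` flags shown here are assumptions based on the CLI's usual conventions and may differ in your version:

```sh
# Hypothetical example; flag names may vary by influx CLI version
influx bucket create --name system-data-downsampled --org my-org
```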
## Example downsampling task script
The example task script below is a very basic form of data downsampling that does the following:
1. Defines a task named "cq-mem-data-1w" that runs once a week.
2. Defines a `data` variable that represents all data from the last 2 weeks in the
`mem` measurement of the `system-data` bucket.
3. Uses the [`aggregateWindow()` function](/v2.0/reference/flux/functions/transformations/aggregates/aggregatewindow/)
to window the data into 1 hour intervals and calculate the average of each interval.
4. Stores the aggregated data in the `system-data-downsampled` bucket under the
`my-org` organization.
```js
// Task Options
option task = {
  name: "cq-mem-data-1w",
  every: 1w,
}

// Defines a data source
data = from(bucket: "system-data")
  |> range(start: -task.every * 2)
  |> filter(fn: (r) => r._measurement == "mem")

data
  // Windows and aggregates the data into 1h averages
  |> aggregateWindow(fn: mean, every: 1h)
  // Stores the aggregated data in a new bucket
  |> to(bucket: "system-data-downsampled", org: "my-org")
```
Again, this is a very basic example, but it should provide you with a foundation
to build more complex downsampling tasks.
## Add your task
Once your task is ready, see [Create a task](/v2.0/process-data/manage-tasks/create-task) for information about adding it to InfluxDB.
## Things to consider
- If there is a chance that data may arrive late, specify an `offset` in your
  task options long enough to account for late data.
- If running a task against a bucket with a finite retention policy, do not schedule
tasks to run too closely to the end of the retention policy.
Always provide a "cushion" for downsampling tasks to complete before the data
is dropped by the retention policy.
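These considerations map directly onto the task options. The sketch below pads the weekly task above with a hypothetical `30m` offset to catch late-arriving data:

```js
option task = {
  name: "cq-mem-data-1w",
  every: 1w,
  // Hypothetical cushion for late-arriving data
  offset: 30m,
}
```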

View File

@ -0,0 +1,21 @@
---
title: Manage tasks in InfluxDB
seotitle: Manage data processing tasks in InfluxDB
description: >
InfluxDB provides options for managing the creation, reading, updating, and deletion
of tasks using both the 'influx' CLI and the InfluxDB UI.
menu:
v2_0:
name: Manage tasks
parent: Process data
weight: 2
---
InfluxDB provides two options for managing the creation, reading, updating, and deletion (CRUD) of tasks:
the InfluxDB user interface (UI) and the `influx` command line interface (CLI).
Both tools can perform all task CRUD operations.
[Create a task](/v2.0/process-data/manage-tasks/create-task)
[View tasks](/v2.0/process-data/manage-tasks/view-tasks)
[Update a task](/v2.0/process-data/manage-tasks/update-task)
[Delete a task](/v2.0/process-data/manage-tasks/delete-task)

View File

@ -0,0 +1,85 @@
---
title: Create a task
seotitle: Create a task for processing data in InfluxDB
description: >
How to create a task that processes data in InfluxDB using the InfluxDB user
interface or the 'influx' command line interface.
menu:
v2_0:
name: Create a task
parent: Manage tasks
weight: 1
---
InfluxDB provides multiple ways to create tasks using both the InfluxDB user interface (UI)
and the `influx` command line interface (CLI).
_This article assumes you have already [written a task](/v2.0/process-data/write-a-task)._
## Create a task in the InfluxDB UI
The InfluxDB UI provides multiple ways to create a task:
- [Create a task from the Data Explorer](#create-a-task-from-the-data-explorer)
- [Create a task in the Task UI](#create-a-task-in-the-task-ui)
- [Import a task](#import-a-task)
### Create a task from the Data Explorer
1. Click on the **Data Explorer** icon in the left navigation menu.
{{< img-hd src="/img/data-explorer-icon.png" alt="Data Explorer Icon" />}}
2. Build a query and click **Save As** in the upper right.
3. Select the **Task** option.
4. Specify the task options. See [Task options](/v2.0/process-data/task-options)
for detailed information about each option.
5. Click **Save as Task**.
{{< img-hd src="/img/data-explorer-save-as-task.png" alt="Add a task from the Data Explorer"/>}}
### Create a task in the Task UI
1. Click on the **Tasks** icon in the left navigation menu.
{{< img-hd src="/img/tasks-icon.png" alt="Tasks Icon" />}}
2. Click **+ Create Task** in the upper right.
3. In the left panel, specify the task options.
See [Task options](/v2.0/process-data/task-options) for detailed information about each option.
4. In the right panel, enter your task script.
5. Click **Save** in the upper right.
{{< img-hd src="/img/tasks-create-edit.png" alt="Create a task" />}}
### Import a task
1. Click on the **Tasks** icon in the left navigation menu.
2. Click **Import** in the upper right.
3. Drag and drop or select a file to upload.
4. Click **Upload Task**.
{{< img-hd src="/img/tasks-import-task.png" alt="Import a task" />}}
## Create a task using the influx CLI
Use the `influx task create` command to create a new task.
It accepts either a file path or raw Flux.
###### Create a task using a file
```sh
# Pattern
influx task create --org <org-name> @</path/to/task-script>
# Example
influx task create --org my-org @/tasks/cq-mean-1h.flux
```
###### Create a task using raw Flux
```sh
influx task create --org my-org - # <return> to open stdin pipe
option task = {
  name: "task-name",
  every: 6h
}
# ... Task script ...
# <ctrl-d> to close the pipe and submit the command
```

View File

@ -0,0 +1,37 @@
---
title: Delete a task
seotitle: Delete a task for processing data in InfluxDB
description: >
How to delete a task in InfluxDB using the InfluxDB user interface or using
the 'influx' command line interface.
menu:
v2_0:
name: Delete a task
parent: Manage tasks
weight: 4
---
## Delete a task in the InfluxDB UI
1. Click the **Tasks** icon in the left navigation menu.
{{< img-hd src="/img/tasks-icon.png" alt="Tasks Icon" />}}
2. In the list of tasks, hover over the task you would like to delete.
3. Click **Delete** on the far right.
4. Click **Confirm**.
{{< img-hd src="/img/tasks-delete-task.png" alt="Delete a task" />}}
## Delete a task with the influx CLI
Use the `influx task delete` command to delete a task.
_This command requires a task ID, which is available in the output of `influx task find`._
```sh
# Pattern
influx task delete -i <task-id>
# Example
influx task delete -i 0343698431c35000
```

View File

@ -0,0 +1,63 @@
---
title: Update a task
seotitle: Update a task for processing data in InfluxDB
description: >
How to update a task that processes data in InfluxDB using the InfluxDB user
interface or the 'influx' command line interface.
menu:
v2_0:
name: Update a task
parent: Manage tasks
weight: 3
---
## Update a task in the InfluxDB UI
To view your tasks, click the **Tasks** icon in the left navigation menu.
{{< img-hd src="/img/tasks-icon.png" alt="Tasks Icon" />}}
#### Update a task's Flux script
1. In the list of tasks, click the **Name** of the task you would like to update.
2. In the left panel, modify the task options.
3. In the right panel, modify the task script.
4. Click **Save** in the upper right.
{{< img-hd src="/img/tasks-create-edit.png" alt="Update a task" />}}
#### Update the status of a task
In the list of tasks, click the toggle in the **Active** column of the task you
would like to activate or inactivate.
## Update a task with the influx CLI
Use the `influx task update` command to update or change the status of an existing task.
_This command requires a task ID, which is available in the output of `influx task find`._
#### Update a task's Flux script
Pass the file path of your updated Flux script to the `influx task update` command
with the ID of the task you would like to update.
Modified [task options](/v2.0/process-data/task-options) defined in the Flux
script are also updated.
```sh
# Pattern
influx task update -i <task-id> @/path/to/updated-task-script
# Example
influx task update -i 0343698431c35000 @/tasks/cq-mean-1h.flux
```
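The updated script itself is ordinary Flux; for example, changing the `every` option in the script changes the task's schedule when the update is applied. The values below are illustrative:

```js
// updated-task-script (illustrative values)
option task = {
  name: "task-name",
  every: 30m,
}

// ... rest of the task script ...
```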
#### Update the status of a task
Pass the ID of the task you would like to update to the `influx task update`
command with the `--status` flag.
_Possible arguments of the `--status` flag are `active` or `inactive`._
```sh
# Pattern
influx task update -i <task-id> --status < active | inactive >
# Example
influx task update -i 0343698431c35000 --status inactive
```

View File

@ -0,0 +1,39 @@
---
title: View tasks in InfluxDB
seotitle: View created tasks that process data in InfluxDB
description: >
How to view all created data processing tasks using the InfluxDB user interface
or the 'influx' command line interface.
menu:
v2_0:
name: View tasks
parent: Manage tasks
weight: 2
---
## View tasks in the InfluxDB UI
Click the **Tasks** icon in the left navigation to view the list of tasks.
{{< img-hd src="/img/tasks-icon.png" alt="Tasks Icon" />}}
### Filter the list of tasks
1. Enable the **Show Inactive** option to include inactive tasks in the list.
2. Enter text in the **Filter tasks by name** field to search for tasks by name.
3. Select an organization from the **All Organizations** dropdown to filter the list by organization.
4. Click on the heading of any column to sort by that field.
{{< img-hd src="/img/tasks-list.png" alt="View and filter tasks" />}}
## View tasks with the influx CLI
Use the `influx task find` command to return a list of created tasks.
```sh
influx task find
```
#### Filter tasks using the CLI
Other filtering options are available, such as filtering by organization or user
or limiting the number of tasks returned.
See the [`influx task find` documentation](/v2.0/reference/cli/influx/task/find)
for information about other available flags.

View File

@ -0,0 +1,109 @@
---
title: Task configuration options
seotitle: InfluxDB task configuration options
description: >
Task options define specific information about a task such as its name,
the schedule on which it runs, execution delays, and others.
menu:
v2_0:
name: Task options
parent: Process data
weight: 5
---
Task options define specific information about the task and are specified in your
Flux script or in the InfluxDB user interface (UI).
The following task options are available:
- [name](#name)
- [every](#every)
- [cron](#cron)
- [offset](#offset)
- [concurrency](#concurrency)
- [retry](#retry)
{{% note %}}
`every` and `cron` are mutually exclusive, but at least one is required.
{{% /note %}}
## name
The name of the task. _**Required**_.
_**Data type:** String_
```js
option task = {
  name: "taskName",
  // ...
}
```
## every
The interval at which the task runs.
_**Data type:** Duration_
_**Note:** In the InfluxDB UI, the **Interval** field sets this option_.
```js
option task = {
  // ...
  every: 1h,
}
```
## cron
The [cron expression](https://en.wikipedia.org/wiki/Cron#Overview) that
defines the schedule on which the task runs.
Cron scheduling is based on system time.
_**Data type:** String_
```js
option task = {
  // ...
  cron: "0 * * * *",
}
```
## offset
Delays the execution of the task but preserves the original time range.
For example, if a task is scheduled to run on the hour, a `10m` offset delays it to 10
minutes after the hour, but all time ranges defined in the task remain relative to
the scheduled execution time.
A common use case is offsetting execution to account for data that may arrive late.
_**Data type:** Duration_
```js
option task = {
  // ...
  offset: 10m,
}
```
## concurrency
The number of task executions that can run concurrently.
If the concurrency limit is reached, all subsequent executions are queued until
other running task executions complete.
_**Data type:** Integer_
```js
option task = {
  // ...
  concurrency: 2,
}
```
## retry
The number of times to retry the task before it is considered failed.
_**Data type:** Integer_
```js
option task = {
  // ...
  retry: 2,
}
```
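Putting these options together, a complete `task` option record might look like the following sketch. The task name is hypothetical, and only one of `every` or `cron` may appear:

```js
option task = {
  name: "example-hourly-task",
  cron: "0 * * * *",
  offset: 10m,
  concurrency: 1,
  retry: 2,
}
```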

View File

@ -0,0 +1,147 @@
---
title: Write an InfluxDB task
seotitle: Write an InfluxDB task that processes data
description: >
How to write an InfluxDB task that processes data in some way, then performs an action
such as storing the modified data in a new bucket or sending an alert.
menu:
v2_0:
name: Write a task
parent: Process data
weight: 1
---
InfluxDB tasks are scheduled Flux scripts that take a stream of input data, modify or analyze
it in some way, then store the modified data in a new bucket or perform other actions.
This article walks through writing a basic InfluxDB task that downsamples
data and stores it in a new bucket.
## Components of a task
Every InfluxDB task needs the following four components.
Their form and order can vary, but they are all essential parts of a task.
- [Task options](#define-task-options)
- [A data source](#define-a-data-source)
- [Data processing or transformation](#process-or-transform-your-data)
- [A destination](#define-a-destination)
_[Skip to the full example task script](#full-example-task-script)_
## Define task options
Task options define specific information about the task.
The example below illustrates how task options are defined in your Flux script:
```js
option task = {
  name: "cqinterval15m",
  every: 1h,
  offset: 0m,
  concurrency: 1,
  retry: 5
}
```
_See [Task configuration options](/v2.0/process-data/task-options) for detailed information
about each option._
{{% note %}}
If creating a task in the InfluxDB user interface (UI), task options are defined
in form fields when creating the task.
{{% /note %}}
## Define a data source
Define a data source using Flux's [`from()` function](/v2.0/reference/flux/functions/inputs/from/)
or any other [Flux input functions](/v2.0/reference/flux/functions/inputs/).
For convenience, consider creating a variable that includes the sourced data with
the required time range and any relevant filters.
```js
data = from(bucket: "telegraf/default")
  |> range(start: -task.every)
  |> filter(fn: (r) =>
    r._measurement == "mem" and
    r.host == "myHost"
  )
```
{{% note %}}
#### Using task options in your Flux script
Task options are passed as part of a `task` object and can be referenced in your Flux script.
In the example above, the time range is defined as `-task.every`.
`task.every` is dot notation that references the `every` property of the `task` object.
`every` is defined as `1h`, therefore `-task.every` equates to `-1h`.
Using task options to define values in your Flux script can make reusing your task easier.
{{% /note %}}
## Process or transform your data
The purpose of tasks is to process or transform data in some way.
What exactly happens and what form the output data takes is up to you and your
specific use case.
The example below illustrates a task that downsamples data by calculating the average of set intervals.
It uses the `data` variable defined [above](#define-a-data-source) as the data source.
It then windows the data into 5 minute intervals and calculates the average of each
window using the [`aggregateWindow()` function](/v2.0/reference/flux/functions/transformations/aggregates/aggregatewindow/).
```js
data
  |> aggregateWindow(
    every: 5m,
    fn: mean
  )
```
_See [Common tasks](/v2.0/process-data/common-tasks) for examples of tasks commonly used with InfluxDB._
## Define a destination
In the vast majority of task use cases, once data is transformed, it needs to be sent and stored somewhere.
This could be a separate bucket with a different retention policy, another measurement, or even an alert endpoint _(Coming)_.
The example below uses Flux's [`to()` function](/v2.0/reference/flux/functions/outputs/to)
to send the transformed data to another bucket:
```js
// ...
|> to(bucket: "telegraf_downsampled", org: "my-org")
```
{{% note %}}
You cannot write to the same bucket you are reading from.
{{% /note %}}
## Full example task script
Below is the full example task script that combines all of the components described above:
```js
// Task options
option task = {
  name: "cqinterval15m",
  every: 1h,
  offset: 0m,
  concurrency: 1,
  retry: 5
}

// Data source
data = from(bucket: "telegraf/default")
  |> range(start: -task.every)
  |> filter(fn: (r) =>
    r._measurement == "mem" and
    r.host == "myHost"
  )

data
  // Data transformation
  |> aggregateWindow(
    every: 5m,
    fn: mean
  )
  // Data destination
  |> to(bucket: "telegraf_downsampled", org: "my-org")
```

View File

@ -1,7 +1,12 @@
{{ .Inner }}
{{ $src := .Get "src" }}
{{ $alt := .Get "alt" }}
{{ with (imageConfig ( print "/static" $src )) }}
{{ $imageWidth := div .Width 2 }}
<img src='{{ $src }}' alt='{{ $alt }}' width='{{ $imageWidth }}' />
{{ if (fileExists ( print "/static" $src )) }}
{{ with (imageConfig ( print "/static" $src )) }}
{{ $imageWidth := div .Width 2 }}
<img src='{{ $src }}' alt='{{ $alt }}' width='{{ $imageWidth }}' />
{{ end }}
{{ else }}
<img src='{{ $src }}' alt='{{ $alt }}'/>
{{ end }}