Update influxdb join doc with new join content (#4480)

Scott Anderson 2022-09-23 16:59:34 -06:00 committed by GitHub
parent 9472a6c7fc
commit 07810bc0de
4 changed files with 409 additions and 270 deletions


@ -55,6 +55,9 @@ schemas or that come from two separate data sources.
If you're joining data from the same data source with the same schema, using
[`union()`](/flux/v0.x/stdlib/universe/union/) and [`pivot()`](/flux/v0.x/stdlib/universe/pivot/)
to combine the data will likely be more performant.
For more information, see
[When to use union and pivot instead of join functions](/{{< latest "influxdb" >}}/query-data/flux/join/#when-to-use-union-and-pivot-instead-of-join-functions).
{{% /note %}}
- [How join functions work](#how-join-functions-work)


@ -68,7 +68,7 @@ Which method you use depends on your desired behavior:
_For more information, see [join data requirements](/flux/v0.x/join-data/#data-requirements)._
3. Use `join.time()` to join the two streams together based on time values.
Provide the following parameters:
- `left`: ({{< req >}}) Stream of data representing the left side of the join.


@ -12,280 +12,389 @@ weight: 210
aliases:
- /influxdb/v2.4/query-data/guides/join/
related:
- /{{< latest "flux" >}}/join-data/
- /{{< latest "flux" >}}/join-data/inner/
- /{{< latest "flux" >}}/join-data/left-outer/
- /{{< latest "flux" >}}/join-data/right-outer/
- /{{< latest "flux" >}}/join-data/full-outer/
- /{{< latest "flux" >}}/join-data/time/
list_query_example: join
---
Use the Flux [`join` package](/{{< latest "flux" >}}/stdlib/join/) to join two data sets
based on common values using the following join methods:
{{< flex >}}
{{< flex-content "quarter" >}}
<p style="text-align:center"><strong>Inner join</strong></p>
{{< svg svg="static/svgs/join-diagram.svg" class="inner small center" >}}
{{< /flex-content >}}
{{< flex-content "quarter" >}}
<p style="text-align:center"><strong>Left outer join</strong></p>
{{< svg svg="static/svgs/join-diagram.svg" class="left small center" >}}
{{< /flex-content >}}
{{< flex-content "quarter" >}}
<p style="text-align:center"><strong>Right outer join</strong></p>
{{< svg svg="static/svgs/join-diagram.svg" class="right small center" >}}
{{< /flex-content >}}
{{< flex-content "quarter" >}}
<p style="text-align:center"><strong>Full outer join</strong></p>
{{< svg svg="static/svgs/join-diagram.svg" class="full small center" >}}
{{< /flex-content >}}
{{< /flex >}}
The join package lets you join data from different data sources such as
[InfluxDB](/{{< latest "flux" >}}/query-data/influxdb/), [SQL databases](/{{< latest "flux" >}}/query-data/sql/),
[CSV](/{{< latest "flux" >}}/query-data/csv/), and [others](/{{< latest "flux" >}}/query-data/).
## Use join functions to join your data
{{< tabs-wrapper >}}
{{% tabs %}}
[Inner join](#)
[Left join](#)
[Right join](#)
[Full outer join](#)
[Join on time](#)
{{% /tabs %}}
<!--------------------------------- BEGIN Inner --------------------------------->
{{% tab-content %}}
1. Import the `join` package.
2. Define the **left** and **right** data streams to join:
    - Each stream must have one or more columns with common values.
      Column labels do not need to match, but column values do.
    - Each stream should have identical [group keys](/{{< latest "flux" >}}/get-started/data-model/#group-key).

    _For more information, see [join data requirements](/{{< latest "flux" >}}/join-data/#data-requirements)._

3. Use [`join.inner()`](/{{< latest "flux" >}}/stdlib/join/inner/) to join the two streams together.
    Provide the following required parameters:

    - `left`: Stream of data representing the left side of the join.
    - `right`: Stream of data representing the right side of the join.
    - `on`: [Join predicate](/{{< latest "flux" >}}/join-data/#join-predicate-function-on).
      For example: `(l, r) => l.column == r.column`.
    - `as`: [Join output function](/{{< latest "flux" >}}/join-data/#join-output-function-as)
      that returns a record with values from each input stream.
      For example: `(l, r) => ({l with column1: r.column1, column2: r.column2})`.
```js
import "join"
import "sql"

left =
    from(bucket: "example-bucket-1")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "example-measurement")
        |> filter(fn: (r) => r._field == "example-field")

right =
    sql.from(
        driverName: "postgres",
        dataSourceName: "postgresql://username:password@localhost:5432",
        query: "SELECT * FROM example_table",
    )

join.inner(
    left: left,
    right: right,
    on: (l, r) => l.column == r.column,
    as: (l, r) => ({l with name: r.name, location: r.location}),
)
```
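
The joined output is a regular stream of tables that you can continue to transform.
For example, the following sketch (assuming a hypothetical `example-bucket` with
Telegraf `mem` and `processes` measurements, not part of the example above) joins
memory usage with process counts and computes bytes used per running process:

```js
import "join"

// Memory used, regrouped by host so both streams share identical group keys
memUsed =
    from(bucket: "example-bucket")
        |> range(start: -5m)
        |> filter(fn: (r) => r._measurement == "mem" and r._field == "used")
        |> group(columns: ["host"])

// Total running processes, regrouped the same way
procTotal =
    from(bucket: "example-bucket")
        |> range(start: -5m)
        |> filter(fn: (r) => r._measurement == "processes" and r._field == "total")
        |> group(columns: ["host"])

join.inner(
    left: memUsed,
    right: procTotal,
    on: (l, r) => l._time == r._time and l.host == r.host,
    as: (l, r) => ({_time: l._time, host: l.host, memUsed: l._value, procTotal: r._value}),
)
    // Average bytes of memory used per running process
    |> map(fn: (r) => ({r with _value: float(v: r.memUsed) / float(v: r.procTotal)}))
```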
For more information and detailed examples, see [Perform an inner join](/{{< latest "flux" >}}/join-data/inner/)
in the Flux documentation.
{{% /tab-content %}}
<!--------------------------------- END Inner --------------------------------->
<!------------------------------ BEGIN Left outer ----------------------------->
{{% tab-content %}}
1. Import the `join` package.
2. Define the **left** and **right** data streams to join:

    - Each stream must have one or more columns with common values.
      Column labels do not need to match, but column values do.
    - Each stream should have identical [group keys](/{{< latest "flux" >}}/get-started/data-model/#group-key).

    _For more information, see [join data requirements](/{{< latest "flux" >}}/join-data/#data-requirements)._
3. Use [`join.left()`](/{{< latest "flux" >}}/stdlib/join/left/) to join the two streams together.
    Provide the following required parameters:

    - `left`: Stream of data representing the left side of the join.
    - `right`: Stream of data representing the right side of the join.
    - `on`: [Join predicate](/{{< latest "flux" >}}/join-data/#join-predicate-function-on).
      For example: `(l, r) => l.column == r.column`.
    - `as`: [Join output function](/{{< latest "flux" >}}/join-data/#join-output-function-as)
      that returns a record with values from each input stream.
      For example: `(l, r) => ({l with column1: r.column1, column2: r.column2})`.
```js
import "join"
import "sql"

left =
    from(bucket: "example-bucket-1")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "example-measurement")
        |> filter(fn: (r) => r._field == "example-field")

right =
    sql.from(
        driverName: "postgres",
        dataSourceName: "postgresql://username:password@localhost:5432",
        query: "SELECT * FROM example_table",
    )

join.left(
    left: left,
    right: right,
    on: (l, r) => l.column == r.column,
    as: (l, r) => ({l with name: r.name, location: r.location}),
)
```
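
Rows from the left stream without a match in the right stream produce null values
for right-side columns. If nulls are a problem downstream, one option is to supply
a default in the `as` function. A self-contained sketch (hypothetical sample data
built with the `array` package):

```js
import "array"
import "join"

left =
    array.from(rows: [{id: 1, _value: 10.0}, {id: 2, _value: 20.0}])

right =
    array.from(rows: [{id: 1, location: "us-west"}])

join.left(
    left: left,
    right: right,
    on: (l, r) => l.id == r.id,
    // id 2 has no match in right, so r.location is null; fall back to a default
    as: (l, r) => ({l with location: if exists r.location then r.location else "unknown"}),
)
```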
For more information and detailed examples, see [Perform a left outer join](/{{< latest "flux" >}}/join-data/left-outer/)
in the Flux documentation.
{{% /tab-content %}}
<!------------------------------- END Left outer ------------------------------>
<!----------------------------- BEGIN Right outer ----------------------------->
{{% tab-content %}}
1. Import the `join` package.
2. Define the **left** and **right** data streams to join:

    - Each stream must have one or more columns with common values.
      Column labels do not need to match, but column values do.
    - Each stream should have identical [group keys](/{{< latest "flux" >}}/get-started/data-model/#group-key).

    _For more information, see [join data requirements](/{{< latest "flux" >}}/join-data/#data-requirements)._

3. Use [`join.right()`](/{{< latest "flux" >}}/stdlib/join/right/) to join the two streams together.
    Provide the following required parameters:

    - `left`: Stream of data representing the left side of the join.
    - `right`: Stream of data representing the right side of the join.
    - `on`: [Join predicate](/{{< latest "flux" >}}/join-data/#join-predicate-function-on).
      For example: `(l, r) => l.column == r.column`.
    - `as`: [Join output function](/{{< latest "flux" >}}/join-data/#join-output-function-as)
      that returns a record with values from each input stream.
      For example: `(l, r) => ({l with column1: r.column1, column2: r.column2})`.
```js
import "join"
import "sql"

left =
    from(bucket: "example-bucket-1")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "example-measurement")
        |> filter(fn: (r) => r._field == "example-field")

right =
    sql.from(
        driverName: "postgres",
        dataSourceName: "postgresql://username:password@localhost:5432",
        query: "SELECT * FROM example_table",
    )

join.right(
    left: left,
    right: right,
    on: (l, r) => l.column == r.column,
    as: (l, r) => ({l with name: r.name, location: r.location}),
)
```
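
In a right outer join, unmatched rows from the right stream yield null values for
all left-side (`l`) columns, so output records built with `l with` may carry nulls.
If that matters downstream, you can build the output record primarily from `r`.
A hypothetical, self-contained sketch using the `array` package for sample data:

```js
import "array"
import "join"

left =
    array.from(rows: [{id: 1, _value: 10.0}])

right =
    array.from(rows: [{id: 1, location: "us-west"}, {id: 2, location: "us-east"}])

join.right(
    left: left,
    right: right,
    on: (l, r) => l.id == r.id,
    // id 2 has no match in left, so l._value is null in that output row
    as: (l, r) => ({id: r.id, location: r.location, leftValue: l._value}),
)
```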
For more information and detailed examples, see [Perform a right outer join](/{{< latest "flux" >}}/join-data/right-outer/)
in the Flux documentation.
{{% /tab-content %}}
<!------------------------------ END Right outer ------------------------------>
<!------------------------------ BEGIN Full outer ----------------------------->
{{% tab-content %}}
1. Import the `join` package.
2. Define the **left** and **right** data streams to join:

    - Each stream must have one or more columns with common values.
      Column labels do not need to match, but column values do.
    - Each stream should have identical [group keys](/{{< latest "flux" >}}/get-started/data-model/#group-key).

    _For more information, see [join data requirements](/{{< latest "flux" >}}/join-data/#data-requirements)._

3. Use [`join.full()`](/{{< latest "flux" >}}/stdlib/join/full/) to join the two streams together.
    Provide the following required parameters:

    - `left`: Stream of data representing the left side of the join.
    - `right`: Stream of data representing the right side of the join.
    - `on`: [Join predicate](/{{< latest "flux" >}}/join-data/#join-predicate-function-on).
      For example: `(l, r) => l.column == r.column`.
    - `as`: [Join output function](/{{< latest "flux" >}}/join-data/#join-output-function-as)
      that returns a record with values from each input stream.
      For example: `(l, r) => ({l with column1: r.column1, column2: r.column2})`.
{{% note %}}
Full outer joins must account for non-group-key columns in both `l` and `r`
records being null. Use conditional logic to check which record contains non-null
values for columns not in the group key.
For more information, see [Account for missing, non-group-key values](/{{< latest "flux" >}}/join-data/full-outer/#account-for-missing-non-group-key-values).
{{% /note %}}
```js
import "join"
import "sql"
left =
    from(bucket: "example-bucket-1")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "example-measurement")
        |> filter(fn: (r) => r._field == "example-field")

right =
    sql.from(
        driverName: "postgres",
        dataSourceName: "postgresql://username:password@localhost:5432",
        query: "SELECT * FROM example_table",
    )

join.full(
    left: left,
    right: right,
    on: (l, r) => l.id == r.id,
    as: (l, r) => {
        id = if exists l.id then l.id else r.id

        return {name: l.name, location: r.location, id: id}
    },
)
```
For more information and detailed examples, see [Perform a full outer join](/{{< latest "flux" >}}/join-data/full-outer/)
in the Flux documentation.
{{% /tab-content %}}
<!------------------------------- END Full outer ------------------------------>
<!----------------------------- BEGIN Join on time ---------------------------->
{{% tab-content %}}
1. Import the `join` package.
2. Define the **left** and **right** data streams to join:

    - Each stream must also have a `_time` column.
    - Each stream must have one or more columns with common values.
      Column labels do not need to match, but column values do.
    - Each stream should have identical [group keys](/{{< latest "flux" >}}/get-started/data-model/#group-key).

    _For more information, see [join data requirements](/{{< latest "flux" >}}/join-data/#data-requirements)._

3. Use [`join.time()`](/{{< latest "flux" >}}/stdlib/join/time/) to join the two streams
    together based on time values.
    Provide the following parameters:

    - `left`: ({{< req >}}) Stream of data representing the left side of the join.
    - `right`: ({{< req >}}) Stream of data representing the right side of the join.
    - `as`: ({{< req >}}) [Join output function](/{{< latest "flux" >}}/join-data/#join-output-function-as)
      that returns a record with values from each input stream.
      For example: `(l, r) => ({r with column1: l.column1, column2: l.column2})`.
    - `method`: Join method to use (`inner`, `left`, `right`, or `full`). Default is `inner`.
```js
import "join"
import "sql"
left =
    from(bucket: "example-bucket-1")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "example-m1")
        |> filter(fn: (r) => r._field == "example-f1")

right =
    from(bucket: "example-bucket-2")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "example-m2")
        |> filter(fn: (r) => r._field == "example-f2")

join.time(method: "left", left: left, right: right, as: (l, r) => ({l with f2: r._value}))
```
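
As a self-contained illustration (hypothetical values built with the `array`
package), the following uses the `left` method to keep every row from the left
stream and align right-stream values by time:

```js
import "array"
import "join"

t1 =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, _value: 1.0},
            {_time: 2022-01-01T00:01:00Z, _value: 2.0},
        ],
    )

t2 =
    array.from(rows: [{_time: 2022-01-01T00:00:00Z, _value: 5.0}])

// The second t1 row has no matching time in t2, so f2 is null in that output row
join.time(method: "left", left: t1, right: t2, as: (l, r) => ({l with f2: r._value}))
```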
For more information and detailed examples, see [Join on time](/{{< latest "flux" >}}/join-data/time/)
in the Flux documentation.
{{% /tab-content %}}
<!------------------------------ END Join on time ----------------------------->
{{< /tabs-wrapper >}}
---
## When to use union and pivot instead of join functions
We recommend using the `join` package to join streams that have mostly different
schemas or that come from two separate data sources.
If you're joining two datasets queried from InfluxDB, using
[`union()`](/{{< latest "flux" >}}/stdlib/universe/union/) and [`pivot()`](/{{< latest "flux" >}}/stdlib/universe/pivot/)
to combine the data will likely be more performant.
For example, if you need to query fields from different InfluxDB buckets and align
field values in each row based on time:
```js
f1 =
    from(bucket: "example-bucket-1")
        |> range(start: -1h)
        |> filter(fn: (r) => r._field == "f1")
        |> drop(columns: ["_measurement"])

f2 =
    from(bucket: "example-bucket-2")
        |> range(start: -1h)
        |> filter(fn: (r) => r._field == "f2")
        |> drop(columns: ["_measurement"])

union(tables: [f1, f2])
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
```
{{< expand-wrapper >}}
{{% expand "View example input and output" %}}
#### Input
{{< flex >}}
{{% flex-content %}}
##### f1
| _time | _field | _value |
| :------------------- | :----- | -----: |
| 2020-01-01T00:01:00Z | f1 | 1 |
| 2020-01-01T00:02:00Z | f1 | 2 |
| 2020-01-01T00:03:00Z | f1 | 1 |
| 2020-01-01T00:04:00Z | f1 | 3 |
{{% /flex-content %}}
{{% flex-content %}}
##### f2
| _time | _field | _value |
| :------------------- | :----- | -----: |
| 2020-01-01T00:01:00Z | f2 | 5 |
| 2020-01-01T00:02:00Z | f2 | 12 |
| 2020-01-01T00:03:00Z | f2 | 8 |
| 2020-01-01T00:04:00Z | f2 | 6 |
{{% /flex-content %}}
{{< /flex >}}
#### Output
| _time | f1 | f2 |
| :------------------- | --: | --: |
| 2020-01-01T00:01:00Z | 1 | 5 |
| 2020-01-01T00:02:00Z | 2 | 12 |
| 2020-01-01T00:03:00Z | 1 | 8 |
| 2020-01-01T00:04:00Z | 3 | 6 |
{{% /expand %}}
{{< /expand-wrapper >}}


@ -241,39 +241,66 @@ join:
-
code: |
```js
import "join"
import "sql"

left =
    from(bucket: "example-bucket-1")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "example-m")
        |> filter(fn: (r) => r._field == "example-f")
        |> drop(columns: ["_measurement", "_field"])

right =
    sql.from(
        driverName: "postgres",
        dataSourceName: "postgresql://username:password@localhost:5432",
        query: "SELECT * FROM example_table",
    )

join.inner(
    left: left |> group(),
    right: right,
    on: (l, r) => l.sensorID == r.ID,
    as: (l, r) => ({l with expired: r.expired}),
)
    |> group(columns: ["_time", "_value"], mode: "except")
```
input: |
###### left
| _time | sensorID | _value |
|:----- | :------- | ------:|
| 2020-01-01T00:01:00Z | 1234 | 1 |
| 2020-01-01T00:02:00Z | 1234 | 2 |
| 2020-01-01T00:03:00Z | 1234 | 1 |
| 2020-01-01T00:04:00Z | 1234 | 3 |
| _time | sensorID | _value |
|:----- | :------- | ------:|
| 2020-01-01T00:01:00Z | 5678 | 2 |
| 2020-01-01T00:02:00Z | 5678 | 5 |
| 2020-01-01T00:03:00Z | 5678 | 1 |
| 2020-01-01T00:04:00Z | 5678 | 8 |
###### right
| ID | expired | serviced |
|:---- | :------ | ---------: |
| 1234 | false | 2022-01-01 |
| 5678 | true | 2022-01-01 |
output: |
| _time | sensorID | _value | expired |
| :------------------- | :------- | -----: | :------ |
| 2020-01-01T00:01:00Z | 1234 | 1 | false |
| 2020-01-01T00:02:00Z | 1234 | 2 | false |
| 2020-01-01T00:03:00Z | 1234 | 1 | false |
| 2020-01-01T00:04:00Z | 1234 | 3 | false |
| _time | sensorID | _value | expired |
| :------------------- | :------- | -----: | :------ |
| 2020-01-01T00:01:00Z | 5678 | 2 | true |
| 2020-01-01T00:02:00Z | 5678 | 5 | true |
| 2020-01-01T00:03:00Z | 5678 | 1 | true |
| 2020-01-01T00:04:00Z | 5678 | 8 | true |
map_math:
-