docs-v2/content/flux/v0/join-data/full-outer.md

11 KiB
Raw Permalink Blame History

title description menu weight related list_code_example
Perform a full outer join Use [`join.full()`](/flux/v0/stdlib/join/full/) to perform an full outer join of two streams of data. Full outer joins output a row for all rows in both the **left** and **right** input streams and join rows that match according to the `on` predicate.
flux_v0
name parent
Full outer join Join data
103
/flux/v0/join-data/troubleshoot-joins/
/flux/v0/stdlib/join/
/flux/v0/stdlib/join/full/
```js import "join" left = from(bucket: "example-bucket-1") |> //... right = from(bucket: "example-bucket-2") |> //... join.full( left: left, right: right, on: (l, r) => l.id== r.id, as: (l, r) => { id = if exists l.id then l.id else r.id return {name: l.name, location: r.location, id: id} }, ) ```

Use join.full() to perform an full outer join of two streams of data. Full outer joins output a row for all rows in both the left and right input streams and join rows that match according to the on predicate.

{{< svg svg="static/svgs/join-diagram.svg" class="full" >}}

{{< expand-wrapper >}} {{% expand "View table illustration of a full outer join" %}} {{< flex >}} {{% flex-content "third" %}}

left

r1
r2
{{% /flex-content %}}
{{% flex-content "third" %}}

right

r1
r3
r4
{{% /flex-content %}}
{{% flex-content "third" %}}

Full outer join result

r1
r2
r3
r4
{{% /flex-content %}}
{{< /flex >}}
{{% /expand %}}
{{< /expand-wrapper >}}

Use join.full to join your data

  1. Import the join package.

  2. Define the left and right data streams to join:

    • Each stream must have one or more columns with common values. Column labels do not need to match, but column values do.
    • Each stream should have identical group keys.

    For more information, see join data requirements.

  3. Use join.full() to join the two streams together. Provide the following required parameters:

    • left: Stream of data representing the left side of the join.

    • right: Stream of data representing the right side of the join.

    • on: Join predicate. For example: (l, r) => l.column == r.column.

    • as: Join output function that returns a record with values from each input stream.

      Account for missing, non-group-key values

      In a full outer join, its possible for either the left (l) or right (r) to contain null values for the columns used in the join operation and default to a default record (group key columns are populated and other columns are null). l and r will never both use default records at the same time.

      To ensure non-null values are included in the output for non-group-key columns, check for the existence of a value in the l or r record, and return the value that exists:

      (l, r) => {
          id = if exists l.id then l.id else r.id
      
          return {_time: l.time, location: r.location, id: id}
      }
      

The following example uses a filtered selection from the machineProduction sample data set as the left data stream and an ad-hoc table created with array.from() as the right data stream.

{{% note %}}

Example data grouping

The example below ungroups the left stream to match the grouping of the right stream. After the two streams are joined together, the joined data is grouped by stationID and sorted by _time. {{% /note %}}

import "array"
import "influxdata/influxdb/sample"
import "join"

left =
    sample.data(set: "machineProduction")
        |> filter(fn: (r) => r.stationID == "g1" or r.stationID == "g2" or r.stationID == "g3")
        |> filter(fn: (r) => r._field == "oil_temp")
        |> limit(n: 5)

right =
    array.from(
        rows: [
            {station: "g1", opType: "auto", last_maintained: 2021-07-15T00:00:00Z},
            {station: "g2", opType: "manned", last_maintained: 2021-07-02T00:00:00Z},
            {station: "g4", opType: "auto", last_maintained: 2021-08-04T00:00:00Z},
        ],
    )

join.full(
    left: left |> group(),
    right: right,
    on: (l, r) => l.stationID == r.station,
    as: (l, r) => {
        stationID = if exists l.stationID then l.stationID else r.station

        return {
            stationID: stationID,
            _time: l._time,
            _field: l._field,
            _value: l._value,
            opType: r.opType,
            maintained: r.last_maintained,
        }
    },
)
    |> group(columns: ["stationID"])
    |> sort(columns: ["_time"])

{{< expand-wrapper >}} {{% expand "View example input and output data" %}}

Input

left

{{% note %}} _start and _stop columns have been omitted. {{% /note %}}

_time _measurement stationID _field _value
2021-08-01T00:00:00Z machinery g1 oil_temp 39.1
2021-08-01T00:00:11.51Z machinery g1 oil_temp 40.3
2021-08-01T00:00:19.53Z machinery g1 oil_temp 40.6
2021-08-01T00:00:25.1Z machinery g1 oil_temp 40.72
2021-08-01T00:00:36.88Z machinery g1 oil_temp 40.8
_time _measurement stationID _field _value
2021-08-01T00:00:00Z machinery g2 oil_temp 40.6
2021-08-01T00:00:27.93Z machinery g2 oil_temp 40.6
2021-08-01T00:00:54.96Z machinery g2 oil_temp 40.6
2021-08-01T00:01:17.27Z machinery g2 oil_temp 40.6
2021-08-01T00:01:41.84Z machinery g2 oil_temp 40.6
_time _measurement stationID _field _value
2021-08-01T00:00:00Z machinery g3 oil_temp 41.4
2021-08-01T00:00:14.46Z machinery g3 oil_temp 41.36
2021-08-01T00:00:25.29Z machinery g3 oil_temp 41.4
2021-08-01T00:00:38.77Z machinery g3 oil_temp 41.4
2021-08-01T00:00:51.2Z machinery g3 oil_temp 41.4

right

station opType last_maintained
g1 auto 2021-07-15T00:00:00Z
g2 manned 2021-07-02T00:00:00Z
g4 auto 2021-08-04T00:00:00Z

Output

_time stationID _field _value maintained opType
2021-08-01T00:00:00Z g1 oil_temp 39.1 2021-07-15T00:00:00Z auto
2021-08-01T00:00:11.51Z g1 oil_temp 40.3 2021-07-15T00:00:00Z auto
2021-08-01T00:00:19.53Z g1 oil_temp 40.6 2021-07-15T00:00:00Z auto
2021-08-01T00:00:25.1Z g1 oil_temp 40.72 2021-07-15T00:00:00Z auto
2021-08-01T00:00:36.88Z g1 oil_temp 40.8 2021-07-15T00:00:00Z auto
_time stationID _field _value maintained opType
2021-08-01T00:00:00Z g2 oil_temp 40.6 2021-07-02T00:00:00Z manned
2021-08-01T00:00:27.93Z g2 oil_temp 40.6 2021-07-02T00:00:00Z manned
2021-08-01T00:00:54.96Z g2 oil_temp 40.6 2021-07-02T00:00:00Z manned
2021-08-01T00:01:17.27Z g2 oil_temp 40.6 2021-07-02T00:00:00Z manned
2021-08-01T00:01:41.84Z g2 oil_temp 40.6 2021-07-02T00:00:00Z manned
_time stationID _field _value maintained opType
2021-08-01T00:00:00Z g3 oil_temp 41.4
2021-08-01T00:00:14.46Z g3 oil_temp 41.36
2021-08-01T00:00:25.29Z g3 oil_temp 41.4
2021-08-01T00:00:38.77Z g3 oil_temp 41.4
2021-08-01T00:00:51.2Z g3 oil_temp 41.4
_time stationID _field _value maintained opType
g4 2021-08-04T00:00:00Z auto

Things to note about the join output

  • Because the right stream does not have rows with the g3 stationID tag, the joined output includes rows with the g3 stationID tag from the left stream with null values in columns populated from the right stream.
  • Because the left stream does not have rows with the g4 stationID tag, the joined output includes rows with the g4 stationID tag from the right stream with null values in columns populated from the left stream.

{{% /expand %}} {{< /expand-wrapper >}}