docs-v2


title

description

weight

list_code_example

Perform a full outer join

Use [`join.full()`](/flux/v0/stdlib/join/full/) to perform a full outer join of two streams of data. Full outer joins output a row for all rows in both the **left** and **right** input streams and join rows that match according to the `on` predicate.

flux_v0

name	parent
Full outer join	Join data

103

/flux/v0/join-data/troubleshoot-joins/

/flux/v0/stdlib/join/

/flux/v0/stdlib/join/full/

```js
import "join"

left = from(bucket: "example-bucket-1") |> //...
right = from(bucket: "example-bucket-2") |> //...

join.full(
    left: left,
    right: right,
    on: (l, r) => l.id == r.id,
    as: (l, r) => {
        id = if exists l.id then l.id else r.id

        return {name: l.name, location: r.location, id: id}
    },
)
```

Use join.full() to perform a full outer join of two streams of data. Full outer joins output a row for all rows in both the left and right input streams and join rows that match according to the on predicate.

{{< expand-wrapper >}}
{{% expand "View table illustration of a full outer join" %}}
{{< flex >}}
{{% flex-content "third" %}}

left


r1	●	●
r2	●	●
{{% /flex-content %}}
{{% flex-content "third" %}}

right


r1	▲	▲
r3	▲	▲
r4	▲	▲
{{% /flex-content %}}
{{% flex-content "third" %}}

Full outer join result


r1	●	●	▲	▲
r2	●	●
r3			▲	▲
r4			▲	▲
{{% /flex-content %}}
{{< /flex >}}
{{% /expand %}}
{{< /expand-wrapper >}}

Use join.full to join your data

1. Import the join package.
2. Define the left and right data streams to join:
   - Each stream must have one or more columns with common values. Column labels do not need to match, but column values do.
   - Each stream should have identical group keys.

   For more information, see join data requirements.
3. Use join.full() to join the two streams together. Provide the following required parameters:
   - left: Stream of data representing the left side of the join.
   - right: Stream of data representing the right side of the join.
   - on: Join predicate. For example: (l, r) => l.column == r.column.
   - as: Join output function that returns a record with values from each input stream.

     Account for missing, non-group-key values

     In a full outer join, a row in one stream may have no match in the other. When that happens, the unmatched side, either the left (l) or the right (r), contains null values for the columns used in the join operation and falls back to a default record, in which group key columns are populated and all other columns are null. l and r never both use default records at the same time.

     To ensure non-null values are included in the output for non-group-key columns, check for the existence of a value in the l or r record, and return the value that exists:
```js
(l, r) => {
    // Take id from whichever side is not a default (null-filled) record
    id = if exists l.id then l.id else r.id

    return {_time: l._time, location: r.location, id: id}
}
```
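To see the existence check in context, the following is a minimal, self-contained sketch rather than an example from the sample data. The column names (id, name, location) and all row values are hypothetical and built with array.from() so the query does not depend on any bucket.

```js
import "array"
import "join"

// Hypothetical left stream: ids 1 and 2
left =
    array.from(
        rows: [
            {id: 1, name: "alpha"},
            {id: 2, name: "beta"},
        ],
    )

// Hypothetical right stream: ids 1 and 3
right =
    array.from(
        rows: [
            {id: 1, location: "east"},
            {id: 3, location: "west"},
        ],
    )

join.full(
    left: left,
    right: right,
    on: (l, r) => l.id == r.id,
    as: (l, r) => {
        // Only one side can be a default record, so fall back to the other side
        id = if exists l.id then l.id else r.id

        return {id: id, name: l.name, location: r.location}
    },
)
```

Under these assumptions, the output should contain one row per id: id 1 with both name and location, id 2 with a null location, and id 3 with a null name. Without the existence check, the id column would be null for the row that exists only in the right stream.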

The following example uses a filtered selection from the machineProduction sample data set as the left data stream and an ad-hoc table created with array.from() as the right data stream.

{{% note %}}
Example data grouping

The example below ungroups the left stream to match the grouping of the right stream. After the two streams are joined together, the joined data is grouped by stationID and sorted by _time.
{{% /note %}}

```js
import "array"
import "influxdata/influxdb/sample"
import "join"

left =
    sample.data(set: "machineProduction")
        |> filter(fn: (r) => r.stationID == "g1" or r.stationID == "g2" or r.stationID == "g3")
        |> filter(fn: (r) => r._field == "oil_temp")
        |> limit(n: 5)

right =
    array.from(
        rows: [
            {station: "g1", opType: "auto", last_maintained: 2021-07-15T00:00:00Z},
            {station: "g2", opType: "manned", last_maintained: 2021-07-02T00:00:00Z},
            {station: "g4", opType: "auto", last_maintained: 2021-08-04T00:00:00Z},
        ],
    )

join.full(
    left: left |> group(),
    right: right,
    on: (l, r) => l.stationID == r.station,
    as: (l, r) => {
        stationID = if exists l.stationID then l.stationID else r.station

        return {
            stationID: stationID,
            _time: l._time,
            _field: l._field,
            _value: l._value,
            opType: r.opType,
            maintained: r.last_maintained,
        }
    },
)
    |> group(columns: ["stationID"])
    |> sort(columns: ["_time"])
```

Input

left

{{% note %}} _start and _stop columns have been omitted. {{% /note %}}

_time	_measurement	stationID	_field	_value
2021-08-01T00:00:00Z	machinery	g1	oil_temp	39.1
2021-08-01T00:00:11.51Z	machinery	g1	oil_temp	40.3
2021-08-01T00:00:19.53Z	machinery	g1	oil_temp	40.6
2021-08-01T00:00:25.1Z	machinery	g1	oil_temp	40.72
2021-08-01T00:00:36.88Z	machinery	g1	oil_temp	40.8

_time	_measurement	stationID	_field	_value
2021-08-01T00:00:00Z	machinery	g2	oil_temp	40.6
2021-08-01T00:00:27.93Z	machinery	g2	oil_temp	40.6
2021-08-01T00:00:54.96Z	machinery	g2	oil_temp	40.6
2021-08-01T00:01:17.27Z	machinery	g2	oil_temp	40.6
2021-08-01T00:01:41.84Z	machinery	g2	oil_temp	40.6

_time	_measurement	stationID	_field	_value
2021-08-01T00:00:00Z	machinery	g3	oil_temp	41.4
2021-08-01T00:00:14.46Z	machinery	g3	oil_temp	41.36
2021-08-01T00:00:25.29Z	machinery	g3	oil_temp	41.4
2021-08-01T00:00:38.77Z	machinery	g3	oil_temp	41.4
2021-08-01T00:00:51.2Z	machinery	g3	oil_temp	41.4

right

station	opType	last_maintained
g1	auto	2021-07-15T00:00:00Z
g2	manned	2021-07-02T00:00:00Z
g4	auto	2021-08-04T00:00:00Z

Output

_time	stationID	_field	_value	maintained	opType
2021-08-01T00:00:00Z	g1	oil_temp	39.1	2021-07-15T00:00:00Z	auto
2021-08-01T00:00:11.51Z	g1	oil_temp	40.3	2021-07-15T00:00:00Z	auto
2021-08-01T00:00:19.53Z	g1	oil_temp	40.6	2021-07-15T00:00:00Z	auto
2021-08-01T00:00:25.1Z	g1	oil_temp	40.72	2021-07-15T00:00:00Z	auto
2021-08-01T00:00:36.88Z	g1	oil_temp	40.8	2021-07-15T00:00:00Z	auto

_time	stationID	_field	_value	maintained	opType
2021-08-01T00:00:00Z	g2	oil_temp	40.6	2021-07-02T00:00:00Z	manned
2021-08-01T00:00:27.93Z	g2	oil_temp	40.6	2021-07-02T00:00:00Z	manned
2021-08-01T00:00:54.96Z	g2	oil_temp	40.6	2021-07-02T00:00:00Z	manned
2021-08-01T00:01:17.27Z	g2	oil_temp	40.6	2021-07-02T00:00:00Z	manned
2021-08-01T00:01:41.84Z	g2	oil_temp	40.6	2021-07-02T00:00:00Z	manned

_time	stationID	_field	_value
2021-08-01T00:00:00Z	g3	oil_temp	41.4
2021-08-01T00:00:14.46Z	g3	oil_temp	41.36
2021-08-01T00:00:25.29Z	g3	oil_temp	41.4
2021-08-01T00:00:38.77Z	g3	oil_temp	41.4
2021-08-01T00:00:51.2Z	g3	oil_temp	41.4

_time	stationID	_field	_value	maintained	opType
	g4			2021-08-04T00:00:00Z	auto

Things to note about the join output

- Because the right stream does not have rows with the g3 stationID tag, the joined output includes rows with the g3 stationID tag from the left stream with null values in columns populated from the right stream.
- Because the left stream does not have rows with the g4 stationID tag, the joined output includes rows with the g4 stationID tag from the right stream with null values in columns populated from the left stream.
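If null values like these are not wanted downstream, one option is to substitute placeholders inside the as function and filter out one-sided rows after the join. The sketch below is an illustration only: it uses small hypothetical streams built with array.from() instead of the machineProduction data, and the "unknown" placeholder is an assumption, not a convention of the join package.

```js
import "array"
import "join"

// Hypothetical, reduced versions of the left and right streams
left =
    array.from(
        rows: [
            {stationID: "g1", _value: 39.1},
            {stationID: "g3", _value: 41.4},
        ],
    )

right =
    array.from(
        rows: [
            {station: "g1", opType: "auto"},
            {station: "g4", opType: "auto"},
        ],
    )

join.full(
    left: left,
    right: right,
    on: (l, r) => l.stationID == r.station,
    as: (l, r) => {
        stationID = if exists l.stationID then l.stationID else r.station
        // Assumption: "unknown" is an acceptable placeholder for a missing opType
        opType = if exists r.opType then r.opType else "unknown"

        return {stationID: stationID, _value: l._value, opType: opType}
    },
)
    // Drop rows that exist only in the right stream (no left-side _value)
    |> filter(fn: (r) => exists r._value)
```

With these inputs, the g3 row should keep its _value and report an opType of "unknown", while the g4 row, which has no left-side value, should be filtered out. Whether to fill, keep, or drop null values depends on how the joined data is used.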
