Commit Graph

48 Commits (178483c1a0de928b47653c091b71838e3f1c025d)

Author SHA1 Message Date
Marco Neumann 335d9f7357
chore: minimize proptest features (#7993) 2023-06-14 12:28:18 +00:00
Dom Dwyer 050093df1e
feat: truncate partition key parts at 200 bytes
This commit ensures all partition key parts are less than or equal to
200 bytes long.

If a string exceeds the 200 byte limit, it is truncated (avoiding
splitting unicode code-points or graphemes) and then a single "#"
sentinel value is appended. When reversed from the string, these column
values are indicated to be suitable for prefix-matching only - a
property that is encoded into the type system.

This commit takes a conservative approach of not splitting graphemes as
outlined in the module documentation, but this could be relaxed in the
future if needed.
2023-06-09 13:44:32 +02:00
Dom Dwyer 47214ec9a0
fix: prevent panics in partitioning logic
Changes the partitioning logic to be fallible. This prevents an invalid
partition template from causing a panic, previously possible through two
known code paths:

    * TagValue formatter referencing a non-tag column
    * Time formatter using an invalid strftime format string

If either occurs, the write attempt is now aborted and an error returned
to the user with a HTTP 500 status code.

Additionally unexpected partitioner errors now map to a catch-all error
instead of panicking.
2023-06-01 17:44:44 +02:00
Dom Dwyer 37bb5e0585
test: arbitrary reversible partition keys
This test constructs a partition key from an arbitrary selection of
pre-defined parts, and uses the resulting template to partition a write
containing an arbitrary selection of pre-defined tag columns.

Once a partition key is derived, the test asserts build_column_values()
reverses it into the original set of tag (column_name, value) tuples
present in the write.
2023-05-30 15:58:26 +02:00
Dom Dwyer 27bef292a3
feat: unambiguously reversible partition keys
This commit changes the format of partition keys when generated with
non-default partition key templates ONLY. A prior fixture test is
unchanged by this commit, ensuring the default partition keys remain
the same.

When a custom partition key template is provided, it may specify one or
more parts, with the TagValue template causing values extracted from tag
columns to appear in the derived partition key.

This commit changes the generated partition key in the following ways:

    * The delimiter of multi-part partition keys; the character used to
      delimit partition key parts is changed from "/" to "|" (the pipe
      character) as it is less likely to occur in user-provided input,
      reducing the encoding overhead.

    * The format of the extracted TagValue values (see below).

Building on the work of custom partition key overrides, where an
immutable partition template is resolved and set at table creation time,
the changes in this PR enable the derived partition key to be
unambiguously reversed into the set of tag (column_name, column_value)
tuples it was generated from for use in query pruning logic. This is
implemented by the build_column_values() method in this commit, which
requires both the template, and the derived partition key.

Prior to this commit, a partition key value extracted from a tag column
was in the form "tagname_x" where "x" is the value and "tagname" is the
name of the tag column it was extracted from. After this commit, the
partition key value is in the form "x"; the column name is removed from
the derived string to reduce the catalog storage overhead (a key driver
of COGS). In the case of a NULL tag value, the sentinel value "!" is
inserted instead of the prior "tagname_" marker. In the case of an empty
string tag value (""), the sentinel "^" value is inserted instead of the
"tagname_-" marker, ensuring the distinction between an empty value and
a not-present tag is preserved.

Additionally tag values utilise percent encoding to encode reserved
characters (part delimiter, empty sentinel character, % itself) to
eliminate deserialisation ambiguity.

Examples of how this has changed derived partition keys, for a template
of [Time(YYYY-MM-DD), TagValue(region), TagValue(bananas)]:

    Write: time=1970-01-01,region=west,other=ignored
        Old: "1970-01-01-region_west-bananas"
        New: "1970-01-01|west|!"

    Write: time=1970-01-01,other=ignored
        Old: "1970-01-01-region-bananas"
        New: "1970-01-01|!|!"
2023-05-30 15:58:25 +02:00
Carol (Nichols || Goulding) faae5eb438 chore: Rerun cargo hakari manage-deps 2023-02-27 11:56:15 +01:00
Paul Dix 84698b3532
feat: add size_data to mutable batch (#6425)
This method will be used in the new ingestion pipeline to approximate how much memory a butable batch will take to convert to arrow and persist. It is meant only as a very rough estimate to trigger persistence for hot partitions.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-16 17:20:16 +00:00
Jake Goulding cc17e5a54b refactor: use a workspace dependency for hashbrown 2022-11-11 13:25:39 -05:00
dependabot[bot] 5024523f00 chore(deps): Bump hashbrown from 0.12.3 to 0.13.1
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.12.3 to 0.13.1.
- [Release notes](https://github.com/rust-lang/hashbrown/releases)
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.12.3...v0.13.1)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-11 13:24:56 -05:00
Carol (Nichols || Goulding) 3145e2c05b
feat: Use workspace dep inheritance for the arrow crate 2022-10-26 10:34:29 -04:00
Carol (Nichols || Goulding) 2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition 2022-10-24 13:04:09 -04:00
Andrew Lamb d706f8221d
chore: Update datafusion and arrow / parquet / arrow-flight 25.0.0 (#5900)
* chore: Update datafusion and  `arrow` / `parquet` / `arrow-flight` 25.0.0

* chore: Update for structure changes

* chore: Update for new projection pushdown

* chore: Run cargo hakari tasks

* fix: fmt

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-18 20:58:47 +00:00
Andrew Lamb d57c99638c
chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 (#5792)
* chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0

* fix: Update for coercion, fix explain plans for change in column name display

* chore: Update datafusion lock

* fix: Update for other API changes

* chore: Update to latest datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 16:19:14 +00:00
Andrew Lamb 66dbb9541f
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 (#5694)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight`  to 23.0.0

* chore: Update thrift / remove parquet_format

* fix: Update APIs

* chore: Update lock + Run cargo hakari tasks

* fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-27 12:50:54 +00:00
Andrew Lamb 1fd31ee3bf
chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 (#5591)
* chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0

* fix: enable dynamic comparison flag

* chore: derive Eq for clippy

* chore: update explain plans

* chore: Update sizes for ReadBuffer encoding

* chore: update more tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-12 17:45:03 +00:00
Andrew Lamb 6669d85fb4
chore: Update datafusion + arrow/parquet to `21.0.0` (#5519)
* chore: Update arrow/arrow-flight/parquet to 21.0.0

* chore: Update datafusion pin

* chore: Fix arrow update script

* chore: Update Cargo.lock

* chore: Update for new API
2022-08-31 13:30:47 +00:00
Andrew Lamb 16ddc5efc6
chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360)
* chore: Update datafusion and arrow

* chore: Update Cargo.lock

* chore: update to Decimal128

* chore: Update tonic/prost/pbjson/etc

* chore: Run cargo hakari tasks

* fix: doctest in generated types

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-09 17:30:44 +00:00
Andrew Lamb 9215a534d0
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0`

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 08:10:47 +00:00
Andrew Lamb e2d871b00b
chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079)
* chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18

* chore: Run cargo hakari tasks

* fix: update cargo pin

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-18 15:01:03 +00:00
Andrew Lamb c46e1c6347
chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021)
* fix: correct nullability declaration of system tables

* chore: Update datafusion and arrow/parquet/arrow-flight

* chore: Run cargo hakari tasks

* fix: Update tests

* fix: Update tests

* fix: predicate pruning

* fix: add some tests

* fix: query_functions

* fix: fix read_buffer test

* fix: fix clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 19:22:15 +00:00
Andrew Lamb e91d00b10c
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0 (#4851)
* chore: TEMP Update DataFusion to pre-release

* chore: update arrow et al to 16.0.0

* chore: Run cargo hakari tasks

* fix: update reader read_dictionary API

* chore: Update to real Datafusion release

* fix: Update parquet API

* fix: update test

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-06-14 16:31:40 +00:00
Andrew Lamb 3592aa52d8
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` (#4743)
* chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0`

* chore: Update APIs

* chore: Run cargo hakari tasks

* feat: normalize parquet file metadata

* chore: update size tests

* chore: add docs on metadata stripping

* chore: TEMP UPDATE TO DF BRANCH

* chore: Update for new API

* fix: Update to latest DF

* fix: cargo hakari

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
2022-06-03 10:32:26 +00:00
Andrew Lamb 3a33e806c7
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `14.0.0` (#4619)
* chore: Update datafusion deps

* chore: update arrow/parquet/arrow flight deps

* chore: Run cargo hakari tasks

* chore: Update location of utils

* chore: Update some more APIs

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-05-17 14:13:03 +00:00
Carol (Nichols || Goulding) 068096e7e1
fix: Rename data_types2 to data_types 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) 0541c6e40f
fix: Remove data_types crate where it's no longer used 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) d7304c1114
fix: Move TimestampSummary to the only place it's used 2022-05-06 14:45:38 -04:00
Carol (Nichols || Goulding) 12793bffbf
fix: Move Partition Template types to data_types2 2022-05-06 14:45:36 -04:00
Andrew Lamb 02893e598c
chore: Update datafusion and upgrade arrow/parquet/arrow-flight to 13 (#4516)
* chore: Tool for automating arrow version update

* chore: Update datafusion and arrow/parquet/arrow-flight

* fix: update for changes in Arrow API

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-05 00:21:02 +00:00
Andrew Lamb 73bed810da
chore: Update arrow, arrow-flight, parquet, tonic, prost, etc (#4357)
* chore: Update datafusion

* chore: Update arrow/arrow-flight/parquet to 12

* chore: update datafusion correctly

* chore: Update prost, tonic, and dependents

* fix: Fixup some api changes

* fix: Update test output in db

* fix: Update test output in parquet_file

* fix: remove old pbjson types

* fix: Add "--experimental_allow_proto3_optional" flag

* chore: Run cargo hakari tasks

* fix: compile error

* chore: Update heappy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-20 11:12:17 +00:00
Andrew Lamb 5c69a3f43b
chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119)
* chore: update datafusion

* chore(deps): Bump arrow from 10.0.0 to 11.0.0

Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0

Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow-flight
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update parquet to 11.0.0

* fix: error on create schema, test for same

* fix: upgrade zstd

* chore: Run cargo hakari tasks

* fix: fix logical merge conflict

* fix: hakari

* fix: hakari

* fix: update newly introduced dep

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:27:36 +00:00
Andrew Lamb 2c3d30ca32
chore: Update datafusion, arrow, flight and parquet (#4000)
* chore: Update datafusion, arrow, flight and parquet

* fix: api change

* fix: fmt

* fix: update test metadata size

* fix: Update sizes in parquet test

* fix: more metadata size update
2022-03-10 12:24:47 +00:00
dependabot[bot] 3b7d31c88a
chore(deps): Bump arrow from 9.0.2 to 9.1.0 (#3826)
Bumps [arrow](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-23 09:25:46 +00:00
Andrew Lamb a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 (#3733)
* chore: Update datafusion

* chore: Update arrow

* fix: missing updates

* chore: Update cargo.lock

* fix: update for smaller parquet size

* fix: update test for smaller parquet files

* test: ensure parquet_file tests write multiple row groups

* fix: update callsite

* fix: Update for tests

* fix: harkari

* fix: use IoxObjectStore::existing

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
Andrew Lamb 2062267d0f
chore: Update hashbrown (#3551)
* chore: Update hashbrown

* fix: hakari

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 15:34:10 +00:00
Andrew Lamb 5488c257d1
chore: Update datafusion, upgrade to arrow/parqet/arrow-flight 8.0.0 (#3517)
* chore: Update datafusion

* chore: update to arrow 8

* fix: update to use new DataFusion APIs

* fix: update case for sortedness

* fix: cargo hakari
2022-01-27 13:33:27 +00:00
Andrew Lamb dd23056efd
chore: update datafusion, arrow, prost, tonic, pbjson, etc (#3455)
* chore: update datafusion, arrow, prost, tonic, etc

* fix: update pprof as well

* chore: update hakari

* fix: update pbjson

* chore: update heappy

* fix: hakari

* fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-13 17:07:15 +00:00
Marco Neumann f3f6f335a9
chore: upgrade to snafu 0.7 (#3440) 2022-01-11 19:22:36 +00:00
Carol (Nichols || Goulding) 39862e3886
fix: Disable default chrono features
Connects to #3117. Won't affect the workspace-hack crate until changes
to some other crates can be upstreamed.
2021-12-06 09:36:49 -05:00
Carol (Nichols || Goulding) 9fd4a560f5
feat: Results of running cargo hakari manage-deps 2021-11-19 09:21:57 -05:00
Raphael Taylor-Davies 6f268f8260
refactor: extract DML types (#2731) (#3084)
* refactor: extract DML types (#2731)

* chore: fmt
2021-11-11 12:34:07 +00:00
Raphael Taylor-Davies 6ceab054ab
refactor: move DbWrite to mutable_batch (#2986)
* refactor: move DbWrite to mutable_batch

* chore: fix doc

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-29 15:13:05 +00:00
Raphael Taylor-Davies 8a2410e161
feat: mutable batch write entry (#2724) (#2973)
* feat: mutable batch write entry (#2724)

* chore: lint

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-28 20:15:28 +00:00
Marco Neumann bc7244c48e chore: use Rust edition 2021 2021-10-25 10:58:20 +02:00
Raphael Taylor-Davies 297f995a44
feat: partition MutableBatch rows (#2926)
* feat: partition MutableBatch rows

* chore: review feedback

* chore: review feedback

* chore: fmt
2021-10-21 12:30:13 +00:00
Raphael Taylor-Davies 272e43492a
feat: mutable batch filtering (#2920)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-21 12:06:26 +00:00
Raphael Taylor-Davies ce0127a1f7
feat: MutableBatch write API (#2090) (#2724) (#2882)
* feat: MutableBatch write API (#2090) (#2724)

* chore: fix lint

* fix: handle dictionaries with unused mappings

* chore: review feedback

* chore: further review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-20 08:44:14 +00:00
Andrew Lamb a82dc6f5f0
chore: Update datafusion + arrow (#2903)
* chore: Update datafusion to latest, arrow to 6.0.0

* fix: Update tests

* fix: bubble internal error

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-19 17:14:08 +00:00
Raphael Taylor-Davies bdd6d67e7a
refactor: split out mutable_batch crate (#2841)
* refactor: split out mutable_batch crate

* refactor: restore chunk module for better diffs

* chore: fmt

* chore: review feedback

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-14 16:46:54 +00:00