Commit Graph

536 Commits (f7ff87758200ac3532b113f9a211eae6cbd79292)

Author SHA1 Message Date
二手掉包工程师 4b47d723b1
refactor: Rename time to iox_time (#4416)
Signed-off-by: hi-rustin <rustin.liu@gmail.com>

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-26 00:19:59 +00:00
Marco Neumann 86e8f05ed1
fix: make all catalog IDs 64bit (#4418)
Closes #4365.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-25 16:49:34 +00:00
Nga Tran d963110842
feat: group chunk overlaps based on time range only (#4389)
* feat: overlap for NG querier

* chore: cleanup

* refactor: address review comments

* fix: typo

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-25 13:32:07 +00:00
Andrew Lamb 73bed810da
chore: Update arrow, arrow-flight, parquet, tonic, prost, etc (#4357)
* chore: Update datafusion

* chore: Update arrow/arrow-flight/parquet to 12

* chore: update datafusion correctly

* chore: Update prost, tonic, and dependents

* fix: Fixup some api changes

* fix: Update test output in db

* fix: Update test output in parquet_file

* fix: remove old pbjson types

* fix: Add "--experimental_allow_proto3_optional" flag

* chore: Run cargo hakari tasks

* fix: compile error

* chore: Update heappy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-20 11:12:17 +00:00
Carol (Nichols || Goulding) 94dcde4996
fix: Do fewer queries for metadata
By adding another _with_metadata catalog function. Also introduce a new
type rather than passing around tuples everywhere.
2022-04-13 10:43:20 -04:00
Carol (Nichols || Goulding) 02fee3b84f
feat: Request parquet metadata from the catalog when needed only 2022-04-13 10:43:19 -04:00
Dom Dwyer 6131381b8d refactor: extra debug in compactor
Continues pushing more debug through the compaction processing loop.
2022-04-08 11:20:19 +01:00
Dom Dwyer 3706ac042d refactor: add debug in compaction path
Adds debug!() and friends through the compaction path.
2022-04-07 17:13:45 +01:00
dependabot[bot] 438e739344
chore(deps): Bump parquet from 11.0.0 to 11.1.0 (#4240)
* chore(deps): Bump parquet from 11.0.0 to 11.1.0

Bumps [parquet](https://github.com/apache/arrow-rs) from 11.0.0 to 11.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/11.0.0...11.1.0)

---
updated-dependencies:
- dependency-name: parquet
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: Update tests

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-04-06 14:51:01 +00:00
Nga Tran ddc2c8304f
fix: have the compaction level set correctly (#4184)
* fix: have the compaction level set correctly, especially for compacted file from the compactor

* fix: typo
2022-03-30 21:23:40 +00:00
Marco Neumann 20bbb88dc5
refactor: remove table name from `TableSummary` (#4170)
This allows us to remove the table name from the low-level chunk
representations (like `ParquetFile`, RUB, ...) since table names are
already tracked by the higher-level data structures (e.g. catalog,
catalog chunk) that manage the low-level chunk representations.

This is similar to #4167.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 13:24:00 +00:00
Marco Neumann 036626a576
refactor: remove partition key from `ParquetChunk` (#4167)
The parquet chunk is always wrapped into some higher-level data
structure (e.g. a catalog chunk, a partition, ...) that knows exactly
"where" the chunk is located. There is no need for the parquet chunk to
back-reference container-level attributes. In the contrary:
double-bookkeeping makes the code more complex and costs additional
memory.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 09:24:56 +00:00
Marco Neumann 2b76c31157
refactor: make statistics null counts optional (#4160)
Min/max values and distinct counts are already optional, so let's make
the null counts optional as well. This will be helpful for NG to deal w/
partial statistics (e.g. we only populate stats for the time column).

Note that the total count is still mandatory, but we normally have the
chunk/file-level row count at hand.
2022-03-29 17:47:57 +00:00
Carol (Nichols || Goulding) f3f792fd08
feat: Add namespace_id to the parquet_files table; object store paths need it 2022-03-29 08:15:26 -04:00
Andrew Lamb 5c69a3f43b
chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119)
* chore: update datafusion

* chore(deps): Bump arrow from 10.0.0 to 11.0.0

Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0

Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow-flight
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update parquet to 11.0.0

* fix: error on create schema, test for same

* fix: upgrade zstd

* chore: Run cargo hakari tasks

* fix: fix logical merge conflict

* fix: hakari

* fix: hakari

* fix: update newly introduced dep

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:27:36 +00:00
Marco Neumann 51da6dd7fa
feat: store sort key in NG metadata (#4110)
The sort key is optional and currently only produced by `iox_tests`.
Writing it within the ingester/compactor is tracked by #3968. The sort
key is read by the querier (and this will be verified by the query tests
and is required to merge #4103).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:24:46 +00:00
Dom Dwyer 1d5066c421 refactor: rename ObjectStore -> ObjectStoreImpl
Frees up the name for so we can use `dyn ObjectStore` throughout the
code instead of `ObjectStoreApi`.
2022-03-15 16:29:43 +00:00
Carol (Nichols || Goulding) ecd06c6ec3
fix: ParquetFileRepo create should be responsible for setting INITIAL_COMPACTION_LEVEL
When created in the catalog, parquet files should always have compaction
level 0. Updating the compaction level should always happen in the
compactor.

Only the catalog should need to know about the initial compaction level
value.
2022-03-10 13:51:18 -05:00
Carol (Nichols || Goulding) ff31407dce
refactor: Extract a ParquetFileParams type for create
This has the advantages of:

- Not needing to create fake parquet file IDs or fake deleted_at
  values that aren't used by create before insertion
- Not needing too many arguments for create
- Naming the arguments so it's easier to see what value is what
  argument, especially in tests
- Easier to reuse arguments or parts of arguments by using copies of
  params, which makes it easier to see differences, especially in tests
2022-03-10 13:51:18 -05:00
Paul Dix 27999ff72f
feat: add compaction_level and created_at to parquet_file (#3972) 2022-03-10 15:56:57 +00:00
Andrew Lamb 2c3d30ca32
chore: Update datafusion, arrow, flight and parquet (#4000)
* chore: Update datafusion, arrow, flight and parquet

* fix: api change

* fix: fmt

* fix: update test metadata size

* fix: Update sizes in parquet test

* fix: more metadata size update
2022-03-10 12:24:47 +00:00
Nga Tran c6cab3538f
refactor: move parquet chunk's new and decode to parquet_file crate (#3987) 2022-03-08 22:04:32 +00:00
Andrew Lamb e09f39d6a0
chore: Update datafusion (#3943)
* chore: Update datafusion

* refactor: update for new datafusion

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-03-04 19:37:46 +00:00
Andrew Lamb 677a272095
refactor: Clean up some future clippy warnings from nightly (#3892)
* refactor: clean up new clippy lints

* refactor: complete other cleanups

* fix: ignore overzealous clippy

* fix: re-remove old code
2022-03-03 19:14:27 +00:00
Carol (Nichols || Goulding) 8f3e44bf76
refactor: Extract a crate for shared data types in the new design 2022-03-02 12:16:15 -05:00
Marco Neumann 33851be3a5
chore: upgrade Rust to 1.59 (#3875)
Mostly a few new clippy crates around `flat_map`, `and_then`, and
"underscore locks" (!!!):
https://rust-lang.github.io/rust-clippy/master/index.html#let_underscore_lock

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-28 15:14:19 +00:00
Raphael Taylor-Davies 2a842fbb1a
feat: correctly sort data and store in catalog metadata (#3864)
* feat: respect sort order in ChunkTableProvider (#3214)

feat: persist sort order in catalog (#3845)

refactor: owned SortKey (#3845)

* fix: size tests

* refactor: immutable SortKey

* test: test sort order restart (#3845)

* chore: explicit None for sort key

* chore: test cleanup

* fix: handling of sort keys containing fields

* chore: remove unused selected_sort_key

* chore: more docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 17:56:27 +00:00
Marco Neumann f966f4c7a4
feat: create `ParquetChunk` in querier (#3857)
Adds a small adapter that is able to produce `ParquetChunk`s for NG.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 08:54:16 +00:00
Marco Neumann 49d1be30e7
feat: wire up `ParquetFilePath` for NG (#3853)
It's a bit of a duck-type hack, but if we wanna just `ParquetFileChunk`
in the new architecture, we somehow need it to accept new-gen paths.
Also path handling should be somewhat centralized since
ingester/compactor/querier all need to construct them. So having a
`ParquetFilePath` that supports both path styles seems to be a
not-to-bad solution. This should obviously be cleaned up in some
not-to-distant future.
2022-02-24 16:05:38 +00:00
Carol (Nichols || Goulding) 252ced7adf
feat: Add row count to the parquet_file record in the catalog (#3847)
Fixes #3842.
2022-02-24 15:20:50 +00:00
Marco Neumann d62a052394
feat: extend catalog so we can recover `ParquetChunk`s from it (#3852)
* refactor: less parquet data copying

* feat: `PartitionRepo::get_by_id`

* feat: `TableRepo::get_by_id`

* feat: `ParquetFile::file_size_bytes`

* feat: `ParquetFile::parquet_metadata`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-24 13:16:15 +00:00
dependabot[bot] b63f920d4c
chore(deps): Bump parquet from 9.0.2 to 9.1.0 (#3828)
* chore(deps): Bump parquet from 9.0.2 to 9.1.0

Bumps [parquet](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0)

---
updated-dependencies:
- dependency-name: parquet
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update chunk size test

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 11:25:15 +00:00
dependabot[bot] 3b7d31c88a
chore(deps): Bump arrow from 9.0.2 to 9.1.0 (#3826)
Bumps [arrow](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-23 09:25:46 +00:00
dependabot[bot] ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814)
* chore(deps): Bump tokio from 1.16.1 to 1.17.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build: update workspace-hack

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Andrew Lamb a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 (#3733)
* chore: Update datafusion

* chore: Update arrow

* fix: missing updates

* chore: Update cargo.lock

* fix: update for smaller parquet size

* fix: update test for smaller parquet files

* test: ensure parquet_file tests write multiple row groups

* fix: update callsite

* fix: Update for tests

* fix: harkari

* fix: use IoxObjectStore::existing

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
Carol (Nichols || Goulding) 73828323ac
feat: Ingester Flight gRPC API (#3623)
* feat: Add a way to run ingester with an in-memory catalog from the CLI

If you set the --catalog-dsn string to "mem", rather than using that as
a Postgres connection URL, create an in-memory catalog.

Planning on using this in tests, so not documenting.

* fix: Set default topic to the same value as SHARED_KAFKA_TOPIC

Namely, both should use an underscore. I don't think there's a way to
directly share these values between a constant and an annotation.

* feat: Add a flight API (handshake only) to ingester

* fix: Create partitions if using file-based write buffer

* fix: Change the server fixture to handle ingester server type

For now, the ingester doesn't implement the deployment API. Not sure if
it should or not.

* feat: Start implementing ingester do_get, namely decoding the query

Skip serialization of the predicate for the moment.

* refactor: Rename ingest protos to ingester to match crate name

* refactor: Rename QueryResults to QueryData

* feat: Move ingester flight client to new querier crate

* fix: Off by one error, different starting indexes in sequencers

* fix: Create new CLI argument to pick the catalog type

* fix: Create a CLI option to set the number of topics to auto-create in the write buffer

* fix: Check the arrow flight service's health to tell that the ingester gRPC is up

* fix: Set postgres as the default catalog type

* fix: Return an error rather than panicking if CLI args aren't right
2022-02-09 19:07:44 +00:00
Carol (Nichols || Goulding) 2e30483f1f
refactor: Remove predicate module from predicate crate (#3648)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:54:07 +00:00
Nga Tran 17fbeaaade
feat: insert the persisted info into the catalog in one transaction (#3636)
* feat: add ProcessedTombstoneRepo

* feat: add function add_parquet_file_with_tombstones

* fix: remove unecessary use

* feat: handling transaction when adding parquet file and its processed tombstones

* feat: tests update catalog for parquet file and processed tombstones

* fix: make add parquet file & its processed tombstones fully transactional

* chore: cleanup

* test: add integration tests for new catalog update functions

* chore: remove catalog_update.rs

* chore: cleanup

* fix: assert the right values

* fix: create unique namespace

* fix: support non transaction create_many

* test: remove tests that do not work in a transaction

* fix: one more case with unique namespace

* chore: more verification around for better understanding why certain tests fail

* fix: compare difference rather than absolute becasue the DB already has data

* fix: fix the argument provided to SQL

* fix: return non-empty processed tombstones

* fix: insert the right parquet file

* chore: remove unsed file

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:44:15 +00:00
Carol (Nichols || Goulding) 62a2ad289b
feat: Implement deserializing IoxMetadata from protobuf (#3589)
Fixes #3587.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 16:05:21 +00:00
Marco Neumann 22778a3a80
chore: upgrade rskafka and parking_lot (#3592) 2022-02-01 11:50:42 +00:00
Carol (Nichols || Goulding) 093d5acfd4
fix: Unify temporary multiple definitions of IoxMetadata 2022-01-31 10:48:29 -05:00
Carol (Nichols || Goulding) 8f81ce5501
refactor: Share parquet_file::storage code between new and old metadata 2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding) bf89162fa5
refactor: Move IoxMetadata to parquet_file 2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding) 0f72a881ef
refactor: Rename Rust struct parquet_file::IoxMetadata to be IoxMetadataOld 2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding) 1b298bb5bd
refactor: Alias the old proto definitions to make clearer the new ones coming in 2022-01-31 10:36:33 -05:00
Dom 32d7c4cbfe
refactor: remove InfluxColumnType::IOx (#3565)
* refactor: remove InfluxColumnType::IOx

Remove unused column variant - see #3554 for context.

* refactor: reserve SEMANTIC_TYPE_IOX name in proto

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 21:15:36 +00:00
Andrew Lamb 5488c257d1
chore: Update datafusion, upgrade to arrow/parqet/arrow-flight 8.0.0 (#3517)
* chore: Update datafusion

* chore: update to arrow 8

* fix: update to use new DataFusion APIs

* fix: update case for sortedness

* fix: cargo hakari
2022-01-27 13:33:27 +00:00
Andrew Lamb dd23056efd
chore: update datafusion, arrow, prost, tonic, pbjson, etc (#3455)
* chore: update datafusion, arrow, prost, tonic, etc

* fix: update pprof as well

* chore: update hakari

* fix: update pbjson

* chore: update heappy

* fix: hakari

* fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-13 17:07:15 +00:00
Andrew Lamb cdf5c21cd4
fix: Fix max timestamp value comparison in chunk metadata (#3453)
* fix: Fix max timestamp value comparison in chunk metadata

* refactor: rename contains to overlaps

Co-authored-by: Edd Robinson <me@edd.io>
2022-01-13 16:58:30 +00:00
Raphael Taylor-Davies c5cf03511c
fix: parquet column count statistics (#2124) (#3444)
* fix: parquet metadata total_count (#2124)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-11 21:56:24 +00:00
Marco Neumann f3f6f335a9
chore: upgrade to snafu 0.7 (#3440) 2022-01-11 19:22:36 +00:00
Marco Neumann 37bb7f2120 chore: `cargo update`
dependabot currently doesn't work due to
https://github.com/dependabot/dependabot-core/issues/4574

Excluded `quote` due to
https://github.com/dtolnay/quote/issues/204
2022-01-11 14:57:51 +01:00
Nga Tran ec8644a39a refactor: return clearer error message 2021-12-07 12:24:28 -05:00
Nga Tran 561c5ed8e7 refactor: make checking no data happen during reading inout stream 2021-12-07 12:03:41 -05:00
Nga Tran c992c82582 chore: Merge branch 'main' into ntran/compact_os_tests 2021-12-07 11:08:12 -05:00
Raphael Taylor-Davies 5fdaa5b4ab
chore: don't panic with invalid parquet (#3309)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-12-06 21:15:35 +00:00
Carol (Nichols || Goulding) 7499eac067
fix: Disable uuid serde feature; we're not actually serializing any UUIDs
Connects to #3117.
2021-12-06 09:37:31 -05:00
Carol (Nichols || Goulding) 02c297e850
fix: Always specify the parking_lot feature of tokio to get potential perf boost 2021-12-06 09:37:15 -05:00
Carol (Nichols || Goulding) 0b24b3c227
fix: Use a consistent version specifier when depending on the futures crate 2021-12-06 09:37:12 -05:00
Raphael Taylor-Davies bca561366b
feat: don't copy parquet files out of disk object store (#3282) (#3293)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-12-05 16:31:40 +00:00
Raphael Taylor-Davies 11067bfe3f
feat: simplify parquet reader (#3282) (#3291)
* feat: simplify parquet reader (#3282)

* chore: add back log line
2021-12-03 23:21:58 +00:00
Nga Tran 86f9fe0bcb refactor: no longer need to create and test no-row-groups parquet files 2021-12-03 15:14:04 -05:00
Nga Tran 152281e428 fix: Capture the right 'no data' while parquet has no data 2021-12-03 12:19:48 -05:00
kodiakhq[bot] 2857b6a990
Merge branch 'main' into er/feat/load_chunk_cli 2021-12-02 20:20:56 +00:00
Edd Robinson b4ea9887ba refactor: error name 2021-12-02 20:14:02 +00:00
Carol (Nichols || Goulding) 5d0fd1c603
fix: Allow dead code on fields that are now detected as never read 2021-12-02 11:52:01 -05:00
Edd Robinson 88aedc556e feat: add FromStr implementation 2021-12-02 12:59:52 +00:00
Nga Tran bf74608dc8 docs: not persist of the input stream is empty 2021-12-01 17:53:19 -05:00
Nga Tran f085af034e
refactor: not persist empty chunk resulting from deleting & deduplicating (#3274)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-12-01 20:57:30 +00:00
Nga Tran f53cdca010 feat: handling empty compacted stream 2021-11-30 18:13:36 -05:00
Raphael Taylor-Davies 197634ed50
feat: reload chunk back into read buffer (#3209) (#3216)
* feat: reload chunk back into read buffer (#3209)

* chore: fix logical conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-29 11:34:55 +00:00
kodiakhq[bot] d16a7759ca
Merge branch 'main' into cn/workspace-hack 2021-11-22 17:05:31 +00:00
Raphael Taylor-Davies 73d60539ad
refactor: use ChunkGenerator in parquet_catalog (#2209) (#3167)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-22 10:29:33 +00:00
Carol (Nichols || Goulding) 9fd4a560f5
feat: Results of running cargo hakari manage-deps 2021-11-19 09:21:57 -05:00
Raphael Taylor-Davies ca4e0ad13b
refactor: add parquet chunk generator (#2209) (#3163)
* refactor: add parquet chunk generator (#2209)

* fix: tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-19 12:35:18 +00:00
Carol (Nichols || Goulding) c8d80e5c28
fix: Change database paths to be under /dbs/ instead of under /[server id]/ 2021-11-05 10:14:06 -04:00
Andrew Lamb 1902c4f8a9
chore: Update DataFusion (#3012)
* chore: Update DataFusion

* fix: restore Cargo.log

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-02 18:06:21 +00:00
Marco Neumann 4c9570b519 refactor: move `catalog` protobuf to `preserved_catalog`
This makes it clearer what's going since the contained messages are
only for the preserved part, not the in-mem catalog and its management.
2021-11-01 18:07:25 +01:00
dependabot[bot] c540b40f05
chore(deps): bump tokio from 1.12.0 to 1.13.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.12.0 to 1.13.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.12.0...tokio-1.13.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-01 11:21:59 +00:00
Carol (Nichols || Goulding) 990f768cda
fix: Assign a UUID when creating a database 2021-10-28 13:20:28 -04:00
Carol (Nichols || Goulding) 8198c1ff2a
refactor: Rename IoxObjectStore constructors to better match what server does with Databases 2021-10-28 13:20:27 -04:00
Marco Neumann bc7244c48e chore: use Rust edition 2021 2021-10-25 10:58:20 +02:00
Andrew Lamb a82dc6f5f0
chore: Update datafusion + arrow (#2903)
* chore: Update datafusion to latest, arrow to 6.0.0

* fix: Update tests

* fix: bubble internal error

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-19 17:14:08 +00:00
Marco Neumann d8f35d8ee9 chore: remove unused `parquet_file` => `chrono` dep 2021-10-19 14:45:56 +02:00
Marco Neumann 28195b9c0c chore: new `parquet_catalog` crate 2021-10-14 14:34:59 +02:00
Andrew Lamb 0568452a0c
chore: Update datafusion (#2838)
* chore: update datafusion version

* refactor: Update to use new datafusion apis

* fix: do not upgrade other packages
2021-10-13 20:51:19 +00:00
Marco Neumann 1523e0edcd refactor: clean up preserved catalog interface
1. Remove `new_empty` logic. It's a leftover from the time when the
   `PreservedCatalog` owned the in-memory catalog.
2. Make `db_name` a part of the `PreservedCatalogConfig`.
2021-10-13 13:58:11 +02:00
Raphael Taylor-Davies 8414e6edbb
feat: migrate preserved catalog to TimeProvider (#2722) (#2808)
* feat: migrate preserved catalog to TimeProvider (#2722)

* fix: deterministic catalog prune tests

* fix: failing test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-12 14:43:05 +00:00
Raphael Taylor-Davies 3dfe400e6b
feat: migrate write path to TimeProvider (#2722) (#2807) 2021-10-12 12:09:08 +00:00
Raphael Taylor-Davies b39e01f7ba
feat: migrate PersistenceWindows to TimeProvider (#2722) (#2798) 2021-10-11 20:40:00 +00:00
Raphael Taylor-Davies 06c2c23322
refactor: create PreservedCatalogConfig struct (#2793)
* refactor: create PreservedCatalogConfig struct

* chore: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-11 15:43:05 +00:00
Carol (Nichols || Goulding) 5da2f7b1b0
Merge branch 'main' into cn/less-database-name 2021-10-11 10:35:42 -04:00
Raphael Taylor-Davies afe34751e7
refactor: split out schema crate (#2781)
* refactor: split out schema crate

* chore: fix doc
2021-10-11 09:45:08 +00:00
Carol (Nichols || Goulding) 8407735e00 fix: Pass the database name into PreservedCatalog 2021-10-08 15:25:10 -04:00
Carol (Nichols || Goulding) 276aef69c9 refactor: Move PreservedCatalog test helper functions to test helpers and use them more 2021-10-08 15:25:10 -04:00
Carol (Nichols || Goulding) 3aff4fcb07 refactor: Extract test helper functions for common catalog operations
This will make the next change easier, and I think it makes the tests
easier to read.
2021-10-08 15:25:10 -04:00
kodiakhq[bot] 559a7e0221
Merge branch 'main' into cn/chunk-addr-smaller 2021-10-08 17:26:20 +00:00
Carol (Nichols || Goulding) fbe76935f4 fix: Remove some calls to iox_object_store.database_name 2021-10-08 09:50:14 -04:00
Marco Neumann 64bda1fc08 feat: improve `Debug`/`Display` for test `ChunkId`s 2021-10-08 13:55:56 +02:00
Marco Neumann d3de6bb6e4 refactor: `max_persisted_timestamp` => `flush_timestamp`
There might be data left before this timestamp that wasn't persisted
(e.g. incoming data while the persistence was running).
2021-10-08 12:36:23 +02:00
Marco Neumann 63a932fa37 refactor: "min unpersisted ts" => "max persisted ts"
Store the "maximum persisted timestamp" instead of the "minimum
unpersisted timestamp". This avoids the need to calculate the next
timestamp from the current one (which was done via "max TS + 1ns").

The old calculation was prone to overflow panics. Since the
timestamps in this calculation originate from user-provided data (and
not the wall clock), this was an easy DoS vector that could be triggered
via the following line protocol:

```text
table_1 foo=1 <i64::MAX>
```

which is

```text
table_1 foo=1 9223372036854775807
```

Bonus points: the timestamp persisted in the partition
checkpoints is now the very same that was used by the split query during
persistence. Consistence FTW!

Fixes #2225.
2021-10-08 11:52:49 +02:00
kodiakhq[bot] 7d6be3f500
Merge branch 'main' into crepererum/issue2748 2021-10-07 09:04:18 +00:00
Marco Neumann 63d74be490 refactor: make `ChunkId` a UUID 2021-10-07 10:23:27 +02:00
Marco Neumann 2a52fd90d9 fix: transaction pruning logic for "nothing to do" 2021-10-07 10:14:42 +02:00
kodiakhq[bot] d72a494198
Merge branch 'main' into crepererum/in_mem_expr_part5 2021-10-05 16:20:24 +00:00
Marco Neumann b8aa4c33ce refactor: use protobuf bytes for transaction UUIDs 2021-10-05 12:27:48 +02:00
Marco Neumann bb7a27e5ed refactor: use proper sets during delete predicate collection
We no longer need hacky pointer tricks to de-duplicate delete predicates
when collecting them for catalog checkpoints. This was once required
when the delete predicates didn't implement `Eq` and `Hash` but now it's
all way easier.
2021-10-05 10:37:34 +02:00
Marco Neumann 28ccf2a8c3 refactor: `TransactionHandle::delete_predicate` cannot fail 2021-10-05 09:41:46 +02:00
Marco Neumann 10c1a72402 refactor: remove unused fields from `DeletePredicate` 2021-10-05 09:29:24 +02:00
Marco Neumann 97881079e8 refactor: make `ChunkOrder` non-zero
This will make it easier to handle missing values.

Helps with #2633.
2021-10-04 17:49:12 +02:00
Marco Neumann 75ac6e8646 refactor: make `DeletePredicate::range` non-optional 2021-10-04 16:36:20 +02:00
Marco Neumann d1835a3eee fix: doc links 2021-10-04 16:36:20 +02:00
Marco Neumann 5a5a929b9e refactor: introduce `DeletePredicate`
`DeletePredicate` is a simpler version of `Predicate` that is based on
IOx `DeleteExpr` instead of the full-blown DataFusion `Expr`. This will
allow us to do a couple of things (in follow-up changes):

- Order and de-duplicate delete predicates
- Normalize predicates
- Infallible serialization
- Smaller memory footprint

Note that this change only affects delete expressions. Query expressions
that are supported via the API are not changed. The query subsystem also
still uses the full-featured expressions/predicates (delete
expressions/predicates are converted to the more powerful DataFusion
version on-the-fly).
2021-10-04 16:36:20 +02:00
Edd Robinson e72f7e958c test: update expected results 2021-10-04 12:20:21 +01:00
Andrew Lamb 7316f3407a
fix: Reduce log noise when no files are deleted (#2671)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-30 08:55:30 +00:00
Carol (Nichols || Goulding) 92583aee82 fix: Remove streaming API since we're not streaming anyway 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) d05528bcfd refactor: Use s3_request for put requests
Which meant we also needed to change the byte stream to be a closure
that can generate a byte stream
2021-09-29 08:19:32 -04:00
Raphael Taylor-Davies 86cee568d5
feat: use upstream pbjson (#2650)
* feat: use upstream pbjson

* chore: fmt
2021-09-28 16:29:26 +00:00
kodiakhq[bot] b16e7ea91a
Merge branch 'main' into crepererum/issue2518c 2021-09-22 16:09:04 +00:00
Marco Neumann d7b697dfe9 chore: remove unused `object_store` => `tracker` dep 2021-09-22 11:13:40 +02:00
Marco Neumann 981ee0c6df refactor: accept unknown chunks in persisted delete predicates
Due to the timing of the "persist" lifecycle action and that delete
predicates might arrive at any time + the fact that we don't wanna hold
transaction locks for too long, we should accept delete predicates for
chunks that are currently "persisting" even though that lifecycle action
might fail.
2021-09-22 09:29:50 +02:00
Marco Neumann 6682178d6f feat: teach preserved catalog to handle delete predicates 2021-09-20 15:51:14 +02:00
Marco Neumann cef5aeee52 refactor: introduce `ChunkId` type 2021-09-20 13:10:41 +02:00
Marco Neumann acf698c366 fix: delete predicate sorting 2021-09-20 10:48:32 +02:00
Marco Neumann 0c5ba3786b refactor: rename closure to make syntax a bit clearer 2021-09-20 10:48:32 +02:00
Marco Neumann 4c4fd59724 docs: extend comment about (not) cleanup up delete predicates 2021-09-20 10:48:32 +02:00
Marco Neumann 492d991f49 feat: delete catalog pres. catalog <=> in-mem catalog API
First step towards #2518. Creates the Rust API to communicate delete
predicates between the preserved catalog and the in-memory catalog and
adds tests ensuring that the in-mem catalog produces the wanted errors
as well as correct checkpoints (similar to how this is done for the
parquet file tracking already).

**This does NOT contain the actual preservation!**
2021-09-20 10:48:32 +02:00
Marco Neumann 831e55d79e refactor: make error messages more precise 2021-09-20 09:42:55 +02:00
Marco Neumann 9c80d32af5 refactor: use normal google timestamps in parquet metadata again
We changed from Google timestamp (which use variable-sized integers) to
our own fixed-sized integer timestamps so that the size of the parquet
metadata does not depend on the timestamp. However with the introduction
of compression this is the case anyways (since slightly different
timestamps lead to different compression results) and we need now
derministic timestamps for tests. So there is now point in using our own
timestamp type. Switching back to the variable-sized type also shrinks
the post-compression results a bit.
2021-09-20 09:34:03 +02:00
Marco Neumann afc507ae14 feat: compress encoded parquet metadata
Depending on the number of columns, this should safe between 60% and
75%.
2021-09-20 09:33:18 +02:00
Marco Neumann 2820db5583 refactor: split preserved catalog `api` into `core` and `interface`
This makes it clearer which traits and functions users of the preserved
catalog must implement. This also splits the error types into smaller
enums that are easier to understand.

This change should make it easier to implement new functionality (like
capturing delete predicates).
2021-09-16 10:30:11 +02:00
Raphael Taylor-Davies c66095cad1
feat: remove metrics crate (#2552) 2021-09-15 19:43:33 +00:00
kodiakhq[bot] de732b4273
Merge branch 'main' into crepererum/parquet_file_wo_query 2021-09-15 07:15:19 +00:00
Marco Neumann 509c07330d refactor: decouple `parquet_file` from `query` 2021-09-14 18:26:16 +02:00
kodiakhq[bot] d60aa5940b
Merge branch 'main' into crepererum/chunk_order_type 2021-09-14 16:25:17 +00:00
Marco Neumann bfaba78dc3 refactor: move `predicate` into its own crate
Two reasons:

1. I wanna decouple `parquet_file` from `query` (nearly done, needs a
   small follow-up PR).
2. `predicate` will have more and more features (like serialization)
   which justifies a new home
2021-09-14 17:13:02 +02:00
Marco Neumann becef1c75f refactor: introduce `ChunkOrder` type 2021-09-14 17:10:23 +02:00
Marco Neumann 1d8edd4683 fix: metadata size increased 2021-09-14 13:03:26 +02:00
Marco Neumann 45cb00d8c0 refactor: track chunk order in chunks 2021-09-14 13:00:55 +02:00
Marco Neumann 4769b67d14 feat: API-level code to prune old transaction from catalog 2021-09-14 10:26:38 +02:00
Marco Neumann f93984cd94 refactor: clarify wording
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-09-14 09:43:55 +02:00
Marco Neumann e7edb65b1d feat: show number of stripped bytes in catalog dump 2021-09-14 09:43:55 +02:00
Raphael Taylor-Davies 44918e4afc
feat: migrate chunk metrics (#2491)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-09 16:02:16 +00:00
Marco Neumann 4a863993ec feat: "dump catalog" debug CLI 2021-09-02 08:08:20 +02:00
Marco Neumann 581ee64049 feat: add functions to dump catalog data to text 2021-09-02 08:07:07 +02:00
Marco Neumann 06c941d798 refactor: split up `make_record_batch` 2021-09-01 11:26:05 +02:00
Marco Neumann 6ce586a2ac docs: add docstrings to `PreservedCatalog` members 2021-09-01 11:26:05 +02:00
Marco Neumann 70a5ffeae7 test: allow creation of deterministic chunks and transactions 2021-09-01 11:26:05 +02:00
Marco Neumann 06833110ab test: allow creation of less complex parquet chunks 2021-09-01 11:26:05 +02:00
Marco Neumann 27248850e5 refactor: use `byte::Bytes` for metadata in protobuf messages
That simplifies printing a bit since we `Vec<u8>` prints quite badly.
2021-09-01 11:26:05 +02:00
Marco Neumann a312f81bf2 refactor: move `storage_testing` to `storage::tests` 2021-08-27 15:59:59 +02:00
Marco Neumann a2efe3299d refactor: restructure catalog code in `parquet_file`
No functional change (except for slightly changing error messages). This
will make it easier to add more functionality.
2021-08-27 15:06:31 +02:00
Carol (Nichols || Goulding) 7ca177978e fix: Add missing await from a logical merge conflict 2021-08-26 09:27:16 -04:00
Carol (Nichols || Goulding) 18ba3b5c59 feat: Create database directories with a generation ID 2021-08-26 09:14:22 -04:00
Marco Neumann 026202a05c fix: correctly account for parquet metadata size
We need to hold the parquet metadata in memory so that we're able to
create catalog checkpoints. We used to do that by holding the decoded
structure (provided by the upstream `parquet` crate) in memory and
serializing that data on demand to Apache Thrift.

There are two drawbacks:

1. We did not account for the memory usage of the decoded structures (or
   at least not fully).
2. We actually don't need the decoded data in-memory, since for the
   checkpoint creation we only need to write the serialized data.

So this PR changes our wrapper so it holds the serialized data which is
then only decoded when it's really necessary. Since the serialized data
is a simple byte vector, we can also easily account for the size.

Note that this makes the accounted size of parquet chunks larger.
However this data was always there, we just ignored it up until now. If
the size of the parquet metadata really becomes an issue, we could trait
some CPU time for memory by compressing it.
2021-08-26 13:24:32 +02:00
Andrew Lamb 3ca0d5d42f
Merge branch 'main' into cn/bump 2021-08-19 14:08:49 -04:00
Raphael Taylor-Davies b0e8b75a8a fix: TestCatalogState unique chunk ID 2021-08-19 17:19:12 +01:00
Carol (Nichols || Goulding) 7246f2702a fix: Bump transaction version because of a change in the Parquet files 2021-08-19 09:32:37 -04:00
Raphael Taylor-Davies 5a841600d9 feat: make catalog state test deterministic (#2349) 2021-08-19 14:04:27 +01:00
Carol (Nichols || Goulding) 6390156c0e fix: Remove error types not used anywhere 2021-08-18 11:32:39 -04:00
Carol (Nichols || Goulding) ef0e1a3f60 refactor: Extract a transaction file path type 2021-08-18 11:32:39 -04:00
Carol (Nichols || Goulding) 6d5cb9c117 refactor: Extract a ParquetFilePath to handle paths to parquet files in a db's object store 2021-08-18 11:32:39 -04:00
Ning Sun c012e996ab
refactor: remove display methods, use fmt::Display instead. (#2272)
* refactor: remove display methods, use fmt::Display instead.

Signed-off-by: Ning Sun <sunng@protonmail.com>

* refactor: update a few calls from .display to .to_string()

* fix: consistently use `Path` rather than occasionally `DirsAndFileName`

* fix: fixup for merge conflicts

* fix: update test

* fix: Catch another case or two

* fix: fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-16 18:00:22 +00:00
Carol (Nichols || Goulding) 564238ad8c refactor: Organize uses 2021-08-12 15:05:32 -04:00
Carol (Nichols || Goulding) ae6b0e669b refactor: Extract a database persister type that wraps object store
Connects to #2193.
2021-08-12 15:05:32 -04:00
Carol (Nichols || Goulding) daa534ee32 refactor: Incorporate Path parsing into the TransactionFile type 2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding) ee3173efb1 refactor: Simplify implementation of parse_file_path 2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding) dbd1718fd2 refactor: Use the TransactionKey type 2021-08-12 09:06:14 -04:00
Carol (Nichols || Goulding) 7f7a911a9a refactor: Extract a TransactionFile type to manage transaction paths 2021-08-12 09:06:06 -04:00
Dom 3de6b44e23
build: use new rustdoc lint name (#2261)
* fix: nocache feature code rot

The MBChunk::snapshot code when using the "nocache" option no longer
compiles - this commit updates it to match the not(nocache) code.

* build: use updated broken_intra_doc_links name

The broken_intra_doc_links lint was renamed
rustdoc::broken_intra_doc_links

https://doc.rust-lang.org/rustdoc/lints.html
2021-08-11 19:48:51 +00:00
Marco Neumann 8721c5fcd6 fix: improve error messages 2021-08-09 10:54:23 +02:00
Marco Neumann 950286e5b7 feat: make replay planning work w/ unordered checkpoints 2021-08-09 10:54:23 +02:00
Andrew Lamb d41b44d312
feat: use zstd compression when writing parquet files (#2218)
* feat: use ZSTD when writing parquet files

* fix: test
2021-08-06 18:45:55 +00:00
Andrew Lamb e92e94caad
chore: Update deps (including arrow 5.1.0, tonic -> 0.5, and prost 0.5) (#2172)
* chore: Update deps (including arrow 5.0.0 --> arrow 5.1.0)

* chore: update all the things

* refactor: Update serving readiness check due to change in Tonic API

* chore: update more deps

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-08-05 15:57:38 +00:00
Andrew Lamb 1ccaa433e8
fix: Temporarily disable parquet predicate pushdown (#2164) 2021-07-30 20:24:30 +00:00
Carol (Nichols || Goulding) 9d15798288 fix: Address or allow Clippy warnings new with Rust 1.54 2021-07-30 09:59:59 -04:00
kodiakhq[bot] 545222303f
Merge branch 'main' into cn/cc-only 2021-07-29 17:18:16 +00:00
Carol (Nichols || Goulding) ad0a9549de fix: Avoid an unnecessary parsing of iox metadata
In one case where ParquetChunk::new was being called, the calling code
had just parsed the IoxMetadata too. In the other case, the calling code
had just *created* the IoxMetadata being parsed. In both cases, this
re-parsing wasn't actually needed; the two bits of info
ParquetChunk::new can be easily passed in.
2021-07-28 14:25:56 -04:00
Carol (Nichols || Goulding) af7866a638 refactor: Remove first/last write times from ParquetFile chunks 2021-07-28 14:12:36 -04:00
Marco Neumann 04e797c706 refactor: pass sequencer numbers directly to DB checkpoint
First of all using a partition checkpoint as some kind of intermediate
representation was kinda a hack because partition checkpoints should
only created for to-be-persisted partitions, not for the others.
API-wise it should only be possible to construct a partition checkpoint
from a flush handle.

Also we were only able to construct partition checkpoints for partitions
that had unpersisted data, otherwise there was no sane way to fill the
`min_unpersisted_timestamp`. We must however scan all partitions no
matter if there is unpersisted data so that we can determine the maximum
seen sequence numbers. This was caught by a replay test resulting in a
catalog state where the last database checkpoint had lower maximum seen
sequence numbers than some partition checkpoint, bailing out with an
error.

So overall it turns out that passing the sequencer numbers directly
instead of wrapping them into a partition checkpoint is the better
implementation.
2021-07-28 17:28:34 +02:00
Andrew Lamb 5fb3e00f2a
fix: Properly record total_count and null_count in statistics (#2103)
* fix: Properly record total_count and null_count in statistics

* fix: fix statistics calculation in mutable_buffer

* refactor: expose null counts in read_buffer

* refactor: expose null_count in parquet_file

* fix: update server crate tests

* fix: update query_tests tests

* docs: tweak comments

* refactor: Use storage_stats rather than adding `null_count`

* refactor: rename test data field for clarity

* fix: fixup merge conflicts

* refactor: rename initial_non_null_count to initial_total_count

* refactor: caculate null_count as row_count - to_add
2021-07-26 18:13:36 +00:00
Carol (Nichols || Goulding) 0acb0efbc9 fix: Bump METADATA and TRANSACTION versions 2021-07-26 10:52:42 -04:00
Jake Goulding d928bc84e6 feat: Thread time_of_{first,last}_write through Parquet metadata 2021-07-23 14:07:35 -04:00
Carol (Nichols || Goulding) 9604ce7084 fix: Don't pass table name around when it's only returned back
The read_statistics, read_statistics_from_parquet_row_group,
load_parquet_from_store, and load_parquet_from_store_for_chunk functions
weren't ever using table name, they just passed it around and passed it
back.
2021-07-23 13:48:16 -04:00
Carol (Nichols || Goulding) 3c794153dd refactor: Organize uses 2021-07-23 13:48:15 -04:00
kodiakhq[bot] 5b5453a020
Merge branch 'main' into pd/add-parquet-cache 2021-07-22 20:21:53 +00:00
Paul Dix 88e29dede9 chore: remove extraneous example code from parquet storage 2021-07-22 16:21:13 -04:00
Andrew Lamb 01c79f1a1a
fix: Print all timestamps using RFC3339 format (#2098)
* fix: Use IOx pretty printer rather than arrow pretty printer

* chore: update tests in the query crate

* chore: update influxdb_iox tests

* chore: Update end to end tests

* chore: update query_tests

* chore: update mutable_buffer tests

* refactor: update parquet_file tests

* refactor: update db tests

* chore: update kafka integration test output

* fix: merge conflict
2021-07-22 19:04:52 +00:00
Marco Neumann 50241bae9e refactor: do not abuse `uint64::MAX` as sentinal for `None` 2021-07-22 12:51:43 +02:00
Paul Dix d95b5df03e refactor: move cache to ObjectStore
Since the consumers of ObjectStore always use the concrete type rather than the ObjectStoreApi trait, it makes more sense to just change the concrete type to have a pointer to the cache. This removes the cache from the ObjectStoreApi trait and changes the ObjectStore to be a regular struct rather than a tuple around the ObjectStoreIntegration. Future work will have the server configure the cache on the ObjectStore struct when its options are set.
2021-07-21 18:27:56 -04:00
Paul Dix d0ea812041 feat: add skeleton for object store file cache 2021-07-21 18:27:56 -04:00
Marco Neumann 57a9d5ade0 refactor: correctly track "seen" ranges in persistence checkpoints
Now we can handle all these cases:

There are two partitions w/ a single write each:

1. A reads sequence number 1
2. B reads sequence number 2
3. we persist A which only knows the sequences up until 1
=> the DB checkpoint needs the global max, otherwise we forget sequences
   during replay (2 in this case, so B would be gone)

1. B reads sequence number 1
2. A reads sequence number 2
3. we persist A which (w/o this commit) would not track the sequencer at
   all in this checkpoint (since there is nothing to replay)
=> we MUST also remember that we already read up until 2, otherwise we'll
   re-read 2 after replay
=> the partition checkpoint needs the local seen max (no matter if there's
   something to to persist)
2021-07-21 19:19:49 +02:00
Marco Neumann a5fc1c7d38 fix: collect min AND max in database checkpoints
This is required to correctly handle the following case:

1. There are two partitions A and B w/ a single write each (from the same
   sequencer).
2. We persist A:
   - The partition checkpoint for A will be empty because after persistence
     there will be nothing to replay (the single write is persisted and
     we're ready).
   - The database checkpoint that contains the global minimum of all ranges
     recognizes that for the sequencer there is indeed something left (the
     minimum sequence number from B).
3. DB restart happens, replay starts
4. We scan all persisted files, figure out that we have a DB checkpoint
   with a sequence minimum but (w/o the change in this commit) there is no
   maximum. Only partition checkpoints contain maxima, and the only partition
   checkpoint that was persisted was the one for partition A and that one was
   empty (see above).
5. So now how do we recover partition B?
2021-07-21 14:48:29 +02:00
Andrew Lamb 4da8a16c18
chore: update to arrow 5.0 and master datafusion (#2049)
* chore: update to arrow 5.0 and master datafusion

* fix: Update test for change in object size
2021-07-19 12:49:51 +00:00
Jake Goulding 42b56ad657 refactor: Use SNAFU's context instead of `ok_or_else` 2021-07-16 09:59:54 -04:00
Jake Goulding 939d15a21f perf: Avoid clone when an error doesn't occur 2021-07-16 09:59:54 -04:00
Marco Neumann f57ba6afdb
fix: use fixed-size timestamps for parquet metadata (#2032)
This fixes flaky tests that rely on predictable files sizes.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-16 13:14:02 +00:00
Andrew Lamb 0c86d1dccf
feat: Record parquet bytes size in catalog / parquet_file (#2006)
* feat: Store object store size in parquet_file

* fix: update TRANSACTION_VERSION to 8

* refactor: rename os_bytes --> file_size_bytes
2021-07-15 12:07:11 +00:00
Marco Neumann 40047a76bc refactor: `remove_parquet` cannot fail 2021-07-15 12:07:56 +02:00
Raphael Taylor-Davies 1d00fa2fd8
refactor: track memory metrics in catalog (#1995)
* refactor: track memory metrics in catalog

* chore: update comment
2021-07-14 16:23:00 +00:00