3297 Commits (f4693e36c065c065599141d12d452b73a4674321)

Author SHA1 Message Date
Marco Neumann f4693e36c0 refactor: `catalog_checkpoint_interval` => `catalog_transactions_until_checkpoint` 2021-06-14 10:34:32 +02:00
Marco Neumann 2eb2aca091 fix: fix discrepancy of ckpting config over CLI and protobuf 2021-06-14 10:27:47 +02:00
Marco Neumann 88ec1ef0cf test: enable checkpointing in catalog benchmark
This now creates a checkpoint every 10 transactions. To make the comparison a
bit fairer, increase the chunk count to 109, so we have some transactions
after the last checkpoint. With that we improve performance from 10.5s
to 1.2s (or even 0.3s if we kept the chunk count at 100).
2021-06-14 10:08:32 +02:00
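The chunk-count reasoning in the benchmark message above can be checked with a little arithmetic. This is a minimal sketch, not code from the repository: the function name `transactions_to_replay` is hypothetical, and it assumes one transaction per chunk and a checkpoint written after every `interval`-th transaction.

```rust
/// Number of transactions that still have to be replayed after the most
/// recent checkpoint.
///
/// Assumption (not taken from the repository): a checkpoint is written after
/// every `interval`-th transaction, and `interval == 0` disables checkpoints.
fn transactions_to_replay(total: u64, interval: u64) -> u64 {
    if interval == 0 {
        // No checkpoints at all: the whole history must be replayed.
        total
    } else {
        // Everything up to the last multiple of `interval` is covered by a
        // checkpoint; only the remainder is replayed.
        total % interval
    }
}
```

Under these assumptions, 109 transactions with an interval of 10 leave 9 transactions to replay after the checkpoint at transaction 100, while exactly 100 transactions leave none, which is consistent with the faster timing quoted for a chunk count of 100.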
Marco Neumann 2e6f51cbfb fix: fix `server_benchmarks::benches::catalog_persistence` 2021-06-14 10:08:32 +02:00
Marco Neumann 898c638630 feat: wire up catalog checkpointing
Closes #1381.
2021-06-14 10:08:32 +02:00
Marco Neumann df866f72e0 refactor: store parquet metadata in chunk
This will be useful for #1381.

At the moment we parse schema and stats eagerly and store them alongside
the parquet metadata in memory. Technically this is not required since
this is basically duplicate data. In the future we might trade-off some
of this memory against CPU consumption by parsing schema and stats on
demand.
2021-06-14 10:08:31 +02:00
Marco Neumann e6699ff15a test: ensure that `find_last_transaction_timestamp` considers checkpoints 2021-06-14 10:04:50 +02:00
Marco Neumann eae73591f3 feat: add `catalog_checkpoint_interval` lifecycle config 2021-06-14 10:04:50 +02:00
Nga Tran 11729b9aa7
test: select non-key from 2 chunks with different key/tag sets (#1703)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-11 18:52:36 +00:00
kodiakhq[bot] cf523e23b9
Merge pull request #1700 from influxdata/er/refactor/rb_chunk
refactor: export Read Buffer Chunk as RBChunk
2021-06-11 17:57:50 +00:00
Edd Robinson ff19beb0ad refactor: export rb chunk as RBChunk 2021-06-11 18:33:10 +01:00
kodiakhq[bot] d7428f568f
Merge pull request #1681 from influxdata/layeredtracing
feat: Implement LayeredTracing
2021-06-11 14:19:53 +00:00
kodiakhq[bot] a8759c8b7e
Merge branch 'main' into layeredtracing 2021-06-11 14:15:03 +00:00
kodiakhq[bot] 80db086426
Merge pull request #1693 from influxdata/ntran/dedupe_final_union
feat: add UnionExec on top of the scan activities
2021-06-11 13:50:43 +00:00
Nga Tran 736cf1ff6f
Merge branch 'main' into ntran/dedupe_final_union 2021-06-11 09:45:54 -04:00
Nga Tran 7dd0416960 refactor: address review comments 2021-06-11 09:43:39 -04:00
Andrew Lamb 4224b693d9
refactor: combine preservation.rs and persistence.rs (#1692)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-11 11:33:14 +00:00
Nga Tran e34d157f28 fix: comments 2021-06-11 07:30:49 -04:00
kodiakhq[bot] ef76b350db
Merge pull request #1690 from influxdata/crepererum/inline_parquet_table_struct
refactor: inline `Table` into `parquet_file::chunk::Chunk`
2021-06-11 11:26:50 +00:00
kodiakhq[bot] 71e2a8fbaa
Merge branch 'main' into crepererum/inline_parquet_table_struct 2021-06-11 11:22:48 +00:00
Nga Tran ea9edef716 fix: testing option 2021-06-11 07:18:33 -04:00
Nga Tran fb639ee54f feat: add UnionExec on top of the scan activities 2021-06-11 07:06:08 -04:00
kodiakhq[bot] 42d5076772
Merge pull request #1688 from influxdata/fixjaeger
fix: Expose jaeger knobs and default max-packet-size to something that works everywhere
2021-06-11 11:02:39 +00:00
kodiakhq[bot] 95cb5bce8b
Merge branch 'main' into fixjaeger 2021-06-11 10:58:17 +00:00
Andrew Lamb 0cbe74dbde
fix: persistence to parquet by swapping order of arguments (#1687)
* fix: fix order of arguments

* test: for persistence
2021-06-11 10:55:40 +00:00
Marco Neumann f8a518bbed refactor: inline `Table` into `parquet_file::chunk::Chunk`
Note that the resulting size estimations are different because we were
double-counting `Table`. `mem::size_of::<Self>()` is recursive for
non-boxed types since the child will be part of the parent structure.

Issue: #1295.
2021-06-11 11:54:31 +02:00
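The double-counting mentioned in the message above follows from how `std::mem::size_of` treats inline fields. A minimal sketch, using hypothetical stand-in types (`Table`, `InlineChunk`, `BoxedChunk`) rather than the real `parquet_file` structs:

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the inlined table struct.
#[allow(dead_code)]
struct Table {
    rows: u64,
    bytes: u64,
}

/// The child is stored inline, so its bytes are part of the parent.
#[allow(dead_code)]
struct InlineChunk {
    id: u64,
    table: Table,
}

/// The child lives on the heap; the parent only holds a pointer to it.
#[allow(dead_code)]
struct BoxedChunk {
    id: u64,
    table: Box<Table>,
}

/// Correct estimate: `size_of::<InlineChunk>()` already covers `Table`.
fn inline_size() -> usize {
    size_of::<InlineChunk>()
}

/// Buggy estimate: adds the inline child a second time.
fn double_counted_size() -> usize {
    size_of::<InlineChunk>() + size_of::<Table>()
}

/// For a boxed child, the pointee must be added explicitly.
fn boxed_size() -> usize {
    size_of::<BoxedChunk>() + size_of::<Table>()
}
```

Here `double_counted_size()` overshoots `inline_size()` by exactly `size_of::<Table>()`, whereas for `BoxedChunk` adding the child's size is required because `size_of` only accounts for the pointer.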
Marko Mikulicic 5a68abaa53
feat: Implement LayeredTracing
__Rationale__

We currently use the `tracing` framework to output to both log outputs (e.g. stdout for k8s) and distributed tracing collectors (e.g. opentelemetry jaeger).

However, due to a limitation in the `tracing` SDK, we can only have one "filter" level that applies
to both logs and tracing outputs. This is impractical because tracing collectors are designed
to receive high verbosity data (which will then be sampled within the opentelemetry library),
while logs generally are limited to the DEBUG level on production.

This PR adds a `FilteredLayer` tracing subscriber layer that wraps a subscriber layer with an independent
filter, which can filter events going to the wrapped subscriber layer more aggressively than the global layer.

This will allow us to emit logs at INFO or DEBUG level while passing all events to opentelemetry at TRACE
level (and opentelemetry SDK will then sample the events so that only a small part will be sent to the
ot collector)

__Note__

This PR just implements the `FilteredLayer` and a test. Another PR will integrate this with
our log/tracing setup code.
2021-06-11 04:32:47 +02:00
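The idea of a per-layer filter described in the message above can be illustrated without the `tracing` crates. This is a standard-library-only sketch of the concept, not the PR's implementation: the `Level` enum, the `Layer` trait, `Collector`, and `demo_counts` are all hypothetical, and only the name `FilteredLayer` comes from the commit message.

```rust
/// Verbosity levels, ordered from least to most verbose.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical stand-in for a subscriber layer: anything that receives events.
trait Layer {
    fn on_event(&mut self, level: Level, message: &str);
}

/// Sink that records every event it is given.
#[derive(Default)]
struct Collector {
    events: Vec<String>,
}

impl Layer for Collector {
    fn on_event(&mut self, level: Level, message: &str) {
        self.events.push(format!("{:?}: {}", level, message));
    }
}

/// Wraps another layer and forwards only events at or below `max_level`.
struct FilteredLayer<L: Layer> {
    max_level: Level,
    inner: L,
}

impl<L: Layer> Layer for FilteredLayer<L> {
    fn on_event(&mut self, level: Level, message: &str) {
        if level <= self.max_level {
            self.inner.on_event(level, message);
        }
    }
}

/// Sends the same four events to a log layer (INFO) and a trace layer (TRACE)
/// and returns how many events each one kept.
fn demo_counts() -> (usize, usize) {
    let mut logs = FilteredLayer { max_level: Level::Info, inner: Collector::default() };
    let mut traces = FilteredLayer { max_level: Level::Trace, inner: Collector::default() };
    let events = [
        (Level::Error, "disk full"),
        (Level::Info, "server started"),
        (Level::Debug, "cache miss"),
        (Level::Trace, "entering function"),
    ];
    for (level, message) in events {
        logs.on_event(level, message);
        traces.on_event(level, message);
    }
    (logs.inner.events.len(), traces.inner.events.len())
}
```

Because each wrapper carries its own `max_level`, the log sink keeps only the ERROR and INFO events while the trace sink receives all four, which is the separation the rationale asks for.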
Marko Mikulicic 369c2237f6
fix: Expose jaeger knobs and default max-packet-size to something that works everywhere
`--traces-exporter-jaeger-max-packet-size` is important also when you run the jaeger collector
on "localhost" by running `docker run jaegertracing/all-in-one ....` which on mac doesn't really
work on the real localhost but has a few hops between tunneling interfaces, so you'd get mysteriously
dropped packets that can easily drive you to doubt your own sanity on an otherwise calm Thursday evening.
2021-06-11 01:42:26 +02:00
Andrew Lamb 13dd4b23fd
fix: make pruning debug log less confusing (#1684) 2021-06-10 18:35:04 +00:00
kodiakhq[bot] 1c7b13f0a5
Merge pull request #1676 from influxdata/ntran/dedup_merge_exec
feat: hook SortPreservingMergeExec into deduplication framework
2021-06-10 17:18:00 +00:00
kodiakhq[bot] 16b268402e
Merge branch 'main' into ntran/dedup_merge_exec 2021-06-10 17:13:49 +00:00
Nga Tran 46d4ab1f2a refactor: address review comments 2021-06-10 13:13:02 -04:00
Raphael Taylor-Davies 11b25b3aaf
refactor: swap order of partition and table in in-memory catalog (#1678)
* refactor: swap order of partition and table in in-memory catalog

* chore: review feedback

* chore: validate panic message

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-10 16:40:30 +00:00
Marco Neumann 13bb290a7c
chore: enforce `clippy::future_not_send` for `server` + top-level crate (#1679)
* chore: enforce `clippy::future_not_send` for `server`

* chore: enforce `clippy::future_not_send` for top-level crate
2021-06-10 15:01:12 +00:00
kodiakhq[bot] ab92c0c321
Merge pull request #1668 from influxdata/crepererum/issue1381
feat: catalog checkpointing infrastructure
2021-06-10 14:05:46 +00:00
Marco Neumann 28d1dc4da1 chore: bump preserved catalog version 2021-06-10 16:01:13 +02:00
Marco Neumann 80ee36cd1a refactor: slightly streamline path parsing code in pres. catalog 2021-06-10 15:59:28 +02:00
Marco Neumann 7e7332c9ce refactor: make comparison a bit less confusing 2021-06-10 15:42:21 +02:00
Marco Neumann fd581e2ec9 docs: fix confusing wording in `CatalogState::files` 2021-06-10 15:42:21 +02:00
Marco Neumann edbaaedfc3 docs: clarify behavior of unspecified transaction encoding 2021-06-10 15:42:21 +02:00
Marco Neumann be9b3a4853 fix: protobuf lint fixes 2021-06-10 15:42:21 +02:00
Marco Neumann 294c304491 feat: impl catalog checkpointing infrastructure
This implements a way to add checkpoints to the preserved catalog and
speed up replay.

Note: This leaves the "hook it up into the actual DB" for a future PR.

Issue: #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann 188cacec54 refactor: use `Arc` to pass `ParquetFileMetaData`
This will be handy when the catalog state must be able to return
metadata objects so that we can create checkpoints, esp. when we use
multi-chunk parquet files in some midterm future.
2021-06-10 15:42:21 +02:00
Marco Neumann c7412740e4 refactor: prepare to read and write multiple file types for catalog
Prepares #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann 33e364ed78 feat: add encoding info to transaction protobuf
This should help with #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann 876f642860 test: add benchmark for catalog loading 2021-06-10 15:42:21 +02:00
Andrew Lamb 29dadba4f3
chore: update dependencies (#1673)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-10 13:01:31 +00:00
kodiakhq[bot] fd8fbeb89d
Merge pull request #1677 from influxdata/crepererum/clippy_future_not_send_part2
chore: enforce `clippy::future_not_send` for `influxdb_iox_client,influxdb2_client,query,tracker`
2021-06-10 12:54:47 +00:00
Marco Neumann 7bacef6835 chore: enforce `clippy::future_not_send` for `tracker` 2021-06-10 09:52:57 +02:00
Marco Neumann 7b1106ff64 chore: enforce `clippy::future_not_send` for `query` 2021-06-10 09:48:35 +02:00