Commit Graph

450 Commits (32e5f8c7154e31bdb30faa301973601ca8a242b2)

Author SHA1 Message Date
Nga Tran 40a5d7d4ba chore: Merge branch 'main' into tran/pushdown_parquet 2021-05-24 16:31:06 -04:00
Nga Tran e72ae81a8e feat: support predicate pushdown for parquet files 2021-05-24 16:22:52 -04:00
kodiakhq[bot] db96286ed7
Merge branch 'main' into er/refactor/scalar_comp 2021-05-24 17:02:14 +00:00
Andrew Lamb c464ffadad
refactor: remove special case timestamp_range in parquet chunk (#1543)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-24 16:19:44 +00:00
Andrew Lamb 14ba25f86d
chore: Update datafusion and use released version of arrow crates (#1546)
* chore: Update datafusion and use released version of arrow crate

* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Edd Robinson abe64c6edc test: uncomment tests to fix 2021-05-24 16:18:53 +01:00
Carol (Nichols || Goulding) 5c5064bdac
fix: Set default line timestamp and default partition time to same value (#1512)
* refactor: Rearrange to allow injection of the current time in tests

* test: Failing test showing a point can be in the wrong partition

* fix: Only get the default time once per ShardedEntry creation, in router
2021-05-24 14:55:11 +00:00
Andrew Lamb 27e5b8fabf
refactor: Remove multiple table support from Parquet Chunk (#1541) 2021-05-24 08:40:31 -04:00
Nga Tran 1f70d1f9c8 chore: remove a couple more comments 2021-05-21 17:06:53 -04:00
Nga Tran f113abacb5 feat: more unit & e2e tests plus cleanup and addressing review comments of Andrew and Edd 2021-05-21 16:48:43 -04:00
Nga Tran 1093542578 fix: now all tests pass. Next step is cleaning up and addressing review comments 2021-05-21 13:29:20 -04:00
Nga Tran 784ef88fcd chore: merge main to branch and add more tests that expose a wrong result bug on unsigned int 2021-05-21 12:38:06 -04:00
Nga Tran 93afc9c213 chore: more tests 2021-05-21 11:39:12 -04:00
Raphael Taylor-Davies 5b619733d9
refactor: split lifecycle tracking from chunk state (#1361) (#1099) (#1397)
* refactor: split lifecycle tracking from chunk state (#1361) (#1099)

* chore: namespace internal errors

* chore: fix logical conflict

* chore: don't remove moving chunk size metric
2021-05-21 09:27:44 +00:00
Nga Tran e44a3a87db feat: now predicate is actually pushed down to RUB but there are bugs and it is not working yet 2021-05-20 16:56:15 -04:00
kodiakhq[bot] f028a356f4
Merge branch 'main' into crepererum/issue1382-c 2021-05-20 15:51:47 +00:00
kodiakhq[bot] aac00d4fa6
Merge branch 'main' into crepererum/remove_snapshotting 2021-05-20 14:14:58 +00:00
Marco Neumann 0e37d500eb feat: remove snapshot feature
The parquet files produced by this code path are only semi-specified and
will miss many important metadata aspects that we will require for data
lineage.
2021-05-20 14:59:04 +02:00
Marko Mikulicic 462a5590c6
fix: fmt 2021-05-20 14:58:50 +02:00
Marko Mikulicic c908cf0f98
fix: review suggestion
Co-authored-by: Edd Robinson <me@edd.io>
2021-05-20 14:40:02 +02:00
Marko Mikulicic aa90329c1f
feat: Add remote_template for simpler remote configuration 2021-05-20 12:45:08 +02:00
Marco Neumann 7e55544eef fix: correctly track chunk ID counter during catalog replay 2021-05-20 10:32:40 +02:00
Marco Neumann 93251f22c7 feat: read preserved catalog during DB startup
Closes #1382.
2021-05-20 10:28:31 +02:00
Marko Mikulicic 91d7189e6d
feat: Log cached connections 2021-05-20 10:27:20 +02:00
Raphael Taylor-Davies 37880ee89a
refactor: store chunk IDs only in catalog (#1521)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-20 04:07:14 +00:00
Nga Tran 00dacb5394 feat: add tests to verify the correctness as well as the explain of the plan 2021-05-19 17:31:16 -04:00
Nga Tran 11561111d5 chore: merge main to branch 2021-05-19 15:11:15 -04:00
Nga Tran 087d61f229 feat: Part 1 of predicate push down - Send predicates to MUB, RUB, and Parquet File. Note that MUB has not handled predicates yet 2021-05-19 13:59:51 -04:00
Marko Mikulicic ce2f8351be
fix: Cache outbound gRPC connections 2021-05-19 18:28:45 +02:00
Marco Neumann 8db26485a4 refactor: empty transaction during catalog creation
That involves some refactoring which we are going to need anyway for
hooking up the "read" path of the catalog into the DB startup, namely:

- make `Db::new` require a preserved catalog
- introduce a helper function that can provide that
- as a consequence, all test-creations of a Db are now async

This prepares for #1382.
2021-05-18 17:42:07 +02:00
kodiakhq[bot] c3cc58b2ff
Merge branch 'main' into crepererum/issue1382 2021-05-17 17:57:26 +00:00
Raphael Taylor-Davies 4f0e46bcd5
refactor: track ingest metrics in one place (#1503)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-17 16:25:01 +00:00
Marco Neumann 18f0a7f614 docs: reference open issue 2021-05-17 14:01:51 +02:00
Marco Neumann cdf0ada6a6 test: test preserved catalog <-> Db write wiring 2021-05-17 13:57:31 +02:00
Raphael Taylor-Davies 91a45fd380
feat: simplify shutdown (#1502)
* feat: simplify shutdown

* chore: fix lint

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-17 11:50:14 +00:00
Marco Neumann 4299371cf2 refactor: remove some code 2021-05-17 12:32:48 +02:00
Marco Neumann 840c11dab2 feat: wire up catalog preservation write path
Required a bit of refactoring:

- Add an extra layer between DB and catalog which is the "preserved
  catalog" wrapper. This is required to make the ownership model
  somewhat sane, because during the read operations the "preserved
  catalog" is going to act on the in-mem catalog.
- Move "parquet file written" logic into binding `preserved catalog <->
  catalog state`, so we have a single place where new parquet files are
  announced. For now this only works for chunks that are already known
  (i.e. the writing->written transition when coming from read buffer),
  however in the next PR this will be extended to also handle totally
  new parquet files during transaction playback.

**NOTE: This does NOT include the read path yet!**

Issue: #1382.
2021-05-17 11:33:22 +02:00
Andrew Lamb 07db4932ee
refactor: rename data_types/src/chunk.rs -> data_types/src/chunk_metadata.rs (#1500) 2021-05-15 10:18:01 +00:00
Raphael Taylor-Davies f9178dbb5f
feat: push metrics into catalog (#1488)
* feat: push metrics into catalog

* chore: minor cleanup

* fix: include db labels in chunk metric domains

* chore: fmt

* fix: don't allow dropping moving chunks

* chore: further tweaks

* chore: review feedback

* feat: use new_unregistered() for metric instruments instead of default

* chore: use &[KeyValue] instead of &Vec<KeyValue>

* refactor: make GauageValue non default constructible
2021-05-14 17:37:39 +00:00
kodiakhq[bot] fdc8461c7f
Merge branch 'main' into cn/wb-clock 2021-05-14 13:00:06 +00:00
Marko Mikulicic 35c2ca17fc
fix: Add ingest_fields_total
ingest_lines_total counts lines (which apparently are the same as points, quite confusingly)

No yaks harmed in the making of this PR.

(NOTE: the code around metric, especially dealing with happy and error paths is very painful;
to be done in another PR)
2021-05-13 17:55:07 +02:00
Nga Tran 9583636748 feat: we now can read parquet files from all kinds of object stores 2021-05-12 18:05:34 -04:00
Carol (Nichols || Goulding) 8be95856ab test: Add a test with multiple threads using a process clock 2021-05-12 13:31:26 -04:00
Carol (Nichols || Goulding) cecb4afc58 docs: Add some documentation on the assumptions around this design 2021-05-12 13:31:26 -04:00
Carol (Nichols || Goulding) b3fb61a0b3 refactor: Rename now_nanos to system_clock_now for clarity 2021-05-12 13:31:26 -04:00
Carol (Nichols || Goulding) 425aacc391 refactor: Extract ProcessClock into its own type 2021-05-12 13:31:26 -04:00
Carol (Nichols || Goulding) b749353d21 refactor: Use a compare_exchange loop instead of Arc Mutex 2021-05-12 10:58:08 -04:00
Carol (Nichols || Goulding) 5dfd152549 test: Use the now_nanos helper function more in tests 2021-05-12 10:58:08 -04:00
Carol (Nichols || Goulding) f28c9ae04c docs: Add unit and semantic information about the process clock 2021-05-12 10:58:08 -04:00
Carol (Nichols || Goulding) 513d4731be feat: Add a process clock to Db and use it for Sequenced Entries
Connects to #1157.
2021-05-12 10:58:06 -04:00
Carol (Nichols || Goulding) f98807936d test: Some tests don't call await, so they don't need to be async 2021-05-12 10:57:05 -04:00
Edd Robinson 696e4e0cfd fix: ensure metrics not overwriting 2021-05-11 20:57:31 +01:00
Raphael Taylor-Davies 4409d2c8af
feat: instrument catalog locks (#1464)
* feat: instrument catalog locks (#1355)

* chore: add metrics test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-11 18:59:11 +00:00
Andrew Lamb 9d0c3a2b1a
refactor: Remove multi-table per chunk code in MUB (#1471)
* refactor: Remove multi-table per chunk code in MUB

* fix: clippy

* fix: bench build

* fix: merge conflicts
2021-05-11 17:49:07 +00:00
Raphael Taylor-Davies d1da954fe4
feat: don't store encoded strings twice in RLE dictionaries (#1469) 2021-05-11 15:22:25 +00:00
Edd Robinson 3622a92c8b feat: wire in rb column metrics 2021-05-11 13:00:52 +01:00
Marco Neumann 795f5bfcb7 refactor: make `StatValues::{min,max}` optional + handle NaNs
This will allow us to:

- handle all-NULL columns correctly
- be in-line with Parquet (where min/max are optional)
- handle NaNs at least somewhat sanely (they do not "poison" stats
  anymore)
2021-05-10 17:12:25 +02:00
Andrew Lamb f037c1281a
feat: Calculate all system tables "on demand" (#1452)
* feat: compute system.columns table on demand

* feat: compute system.chunk_columns on demand

* feat: compute system.operations on demand

* fix: fixup schemas

* fix: Log errors

* fix: clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-10 14:43:55 +00:00
Marko Mikulicic 9f5350a6c5
fix: Load only databases for which a config exists
Closes #1450
2021-05-10 13:14:22 +02:00
Nga Tran c6b933eb63 chore: merge main to branch 2021-05-07 18:40:17 -04:00
Nga Tran 971500681f refactor: address Andrew's and Carol's comment 2021-05-07 17:33:19 -04:00
Nga Tran ba015ee4df refactor: clean up and add comments 2021-05-07 09:31:41 -04:00
Edd Robinson eae3fec571 feat: wire up regex UDF as predicate filter expr 2021-05-07 13:44:51 +01:00
Andrew Lamb b5ea71f45f
feat: Expose the storage usage for each column in system.chunk_columns (#1441)
* feat: Expose the storage usage for each column in system.chunk_columns

* fix: fixup logical conflicts

* refactor: move coalesce logic into the read buffer

* fix: Update system_tables to not use coalesce

* fix: Improve comments

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-07 12:36:49 +00:00
Raphael Taylor-Davies 9320f59de0
feat: add shard sink indirection (#1447)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-07 11:04:51 +00:00
Andrew Lamb d7253c72c0
feat: Only calculate system.chunks table "on demand" (#1446)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-07 10:26:56 +00:00
Carol (Nichols || Goulding) febc1538ff
chore: Update Rust version (#1445)
* chore: Update Rust version

* refactor: Make struct constructor field orderings consistent

Sometimes I changed the struct definition, sometimes changed the struct
construction instance, depending on consistency with code around each
(other similar structs, function argument orders, etc)

More info: https://rust-lang.github.io/rust-clippy/master/index.html#inconsistent_struct_constructor

* refactor: Use flatten where appropriate

One instance is a false positive with a clippy bug.

More info:

- https://rust-lang.github.io/rust-clippy/master/index.html#filter_map_identity
- https://rust-lang.github.io/rust-clippy/master/index.html#manual_flatten

* refactor: Use Option map instead of match

More info: https://rust-lang.github.io/rust-clippy/master/index.html#manual_map

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 22:07:10 +00:00
Nga Tran 55bf848bd2 feat: Now we can query directly from files in object store 2021-05-06 18:02:17 -04:00
Raphael Taylor-Davies 7f6b11266d
feat: instrument catalog locks (#1355) (#1439)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 17:09:52 +00:00
Raphael Taylor-Davies 44de42906f
refactor: use Arc<str> instead of Arc<String> (#1442) 2021-05-06 17:05:08 +00:00
Raphael Taylor-Davies 49c0b8b90c
feat: pull-based metrics (#1355) (#1414)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 15:54:30 +00:00
Raphael Taylor-Davies 216903a949
refactor: move protobuf conversion logic to generated_types (#1437)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 15:49:27 +00:00
Andrew Lamb 884baf7329
feat: add column_type and influxdb_column_type, remove row_count from system.columns (#1415)
* feat: add column_type and influxdb_column_type, remove row_count from system.columns

* fix: update tests

* fix: more test update

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fmt

* fix: copy/paste type conversion to avoid cross dependency between data_types and internal_types

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-06 12:59:30 +00:00
Marko Mikulicic 578dc0db25
feat: Add more logs to shed light on the curious incident with missing metrics in the nighttime 2021-05-06 14:42:48 +02:00
Raphael Taylor-Davies 10f89a3e8d
refactor: split entry out into separate crate (#1428)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 11:36:23 +00:00
Nga Tran a5c92fae8a chore: merge main to branch 2021-05-05 13:48:42 -04:00
Raphael Taylor-Davies 411cf134e9
refactor: explode arrow_deps (#1425)
* refactor: explode arrow_deps

* chore: workaround doctest bug
2021-05-05 16:59:12 +00:00
kodiakhq[bot] 4395ede244
Merge branch 'main' into debug-chunk-metrics 2021-05-05 15:43:32 +00:00
Marko Mikulicic 2b0d7cfb91
feat: Add debug to update_chunk_state metrics 2021-05-05 17:37:57 +02:00
Nga Tran fcb37a0b1d feat: more testing scenarios for querying parquet files 2021-05-05 10:57:02 -04:00
Carol (Nichols || Goulding) 4a64e22e64 refactor: Use trait object and deref instead of cloning Arc in tests 2021-05-05 10:55:12 -04:00
Carol (Nichols || Goulding) e32fa43a53 docs: Add note about implication of write buffer errors 2021-05-05 10:55:12 -04:00
Carol (Nichols || Goulding) 7d5c988fba feat: Actually route SequencedEntry to the Write Buffer, if present
Connects to #1157.

Rearrange some code and comments to be consistent with the design. Make
some more places not care whether they're getting an owned or borrowed
SequencedEntry.
2021-05-05 10:55:11 -04:00
Carol (Nichols || Goulding) 54c5f984d5 fix: Use stdlib's path manipulation rather than format
The syntax highlighting in my editor broke because of the unmatched
double quote, which got me to look a bit closer at this test. These
tests would have failed on Windows.
2021-05-05 10:55:11 -04:00
Carol (Nichols || Goulding) 231abd221f refactor: Extract a TestDbBuilder 2021-05-05 10:55:11 -04:00
Carol (Nichols || Goulding) 62dfb47825 refactor: Reorganize test imports 2021-05-05 10:55:11 -04:00
Marco Neumann 9e61b470e7 feat: change MemoryStream to accept multiple record batches 2021-05-05 13:29:16 +02:00
Marco Neumann 34754ebcdb refactor: move MemoryStream to arrow_deps 2021-05-05 13:29:16 +02:00
Edd Robinson 733d502350 refactor: fix tests 2021-05-04 18:38:42 +01:00
Edd Robinson 9aa144e0f4 feat: add per-stage current chunk storage 2021-05-04 17:43:53 +01:00
Andrew Lamb 3b7c5ac350
fix(storage rpc): do not send back tags with empty values (#1403) 2021-05-04 10:35:24 +00:00
Marko Mikulicic b579ef8646
feat: Add jemalloc stats 2021-05-03 12:10:48 +02:00
kodiakhq[bot] 3c5595d046
Merge branch 'main' into ntran/unload_chunks 2021-04-30 22:02:38 +00:00
Paul Dix 979f5f9347 refactor: write buffer to use sequenced entry and new segment
This refactors the write buffer to use the sequenced entry structure and the new segment definition. It removes the old replicated write and write_buffer.fbs.

Finally, it updates the SequencedEntry wrapper type around the Flatbuffer structure to be a trait so that SequencedEntry can be initialized from a borrowed Flatbuffer or an owned Vec<u8>.

How writes go into segments in the buffer and any kind of validation will likely have to be updated based on what kinds of guarantees we want to make in the buffer. However, that should probably come after we've rethought the design a bit around the new layout of chunks in the Parquet persistence.
2021-04-30 17:00:23 -04:00
Raphael Taylor-Davies a967ebfabd
refactor: rename closing to closed (#1396)
* refactor: rename closing to closed

* refactor: further renames
2021-04-30 20:59:45 +00:00
Nga Tran 34a3388a49 feat: unload chunks from read buffer but keep them in object store 2021-04-30 16:12:02 -04:00
Raphael Taylor-Davies c2f7e7efea
feat: warn on dropping from open partition (#1395)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-30 18:56:49 +00:00
Edd Robinson ebe671e59e refactor: include default labels 2021-04-30 14:03:09 +01:00
Andrew Lamb 40b9b09cdc
refactor: rename assert_table_eq to assert_batches_eq (#1368) 2021-04-30 10:51:08 +00:00
Nga Tran c9b33c6b7d chore: Merge branch 'main' into ntran/test_query_parquets 2021-04-29 14:22:34 -04:00