Commit Graph

138 Commits (a449d5ef7433fcadcffe5991971c28849be69541)

Author SHA1 Message Date
Marco Neumann 9f451423d5 feat: log files that are deleted 2021-05-26 12:49:44 +02:00
Marco Neumann 24ec1a472e fix: do NOT delete parquet files that are reachable by time travel 2021-05-26 12:38:54 +02:00
Marco Neumann 5983336366 refactor: rename `parquet_file::{utils => test_utils}` 2021-05-26 11:09:29 +02:00
Marco Neumann d7e3bc569e refactor: shorten time we hold the transaction lock during clean-up 2021-05-26 11:04:57 +02:00
Marco Neumann 18f5dd9ae1 test: ensure transaction lock exists during cleanup planning 2021-05-26 11:04:57 +02:00
Marco Neumann b55eae98da fix: do not delete non-parquet files during catalog-driven cleanup 2021-05-26 11:04:57 +02:00
Marco Neumann 5ed16ff294 refactor: improve error message in `parquet_file::cleanup` 2021-05-26 11:04:57 +02:00
Marco Neumann 14fdf3b7c7 feat: implement object store cleanup core routine 2021-05-26 11:02:40 +02:00
Marco Neumann cc78b5317d feat: add method to get all parquet files from catalog state 2021-05-26 11:02:40 +02:00
Marco Neumann 953114af2e feat: add method to abort catalog transaction 2021-05-26 11:02:40 +02:00
Marco Neumann 92fcd7e940 feat: add a way to get OS, server ID and DB name from catalog 2021-05-26 11:02:40 +02:00
Marco Neumann 9daa4d00d6 test: re-organize `parquet_file` test utils a bit 2021-05-26 11:02:39 +02:00
Marco Neumann 38183928c8 refactor: extract path generator for data location 2021-05-26 10:59:40 +02:00
Marco Neumann 19a2733d30 feat: preserve transaction metadata in parquets 2021-05-25 09:56:12 +02:00
Marco Neumann fe8e6301fe refactor: move `read_schema_from_parquet_metadata` back to `parquet_file::metadata`
Let us pool all metadata handling in a single module, which makes it
easier to review.
2021-05-25 09:37:53 +02:00
Marco Neumann ac83d99f66 feat: add a way to get current revision and UUID from transaction handle 2021-05-25 09:37:53 +02:00
Marco Neumann fdc553b257 refactor: replace unwrap with expect 2021-05-25 09:37:53 +02:00
Andrew Lamb c464ffadad
refactor: remove special case timestamp_range in parquet chunk (#1543)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-24 16:19:44 +00:00
Andrew Lamb 14ba25f86d
chore: Update datafusion and use released version of arrow crates (#1546)
* chore: Update datafusion and use released version of arrow crate

* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Andrew Lamb 27e5b8fabf
refactor: Remove multiple table support from Parquet Chunk (#1541) 2021-05-24 08:40:31 -04:00
Marco Neumann 8bdddfd475 docs: mention that catalog wiping does not delete parquet files 2021-05-20 10:22:20 +02:00
Marco Neumann b1a06246d6 feat: implement function to wipe a preserved catalog 2021-05-20 10:22:20 +02:00
Marco Neumann 6c405aa6f9 feat: check if preserved catalog exists when creating an empty one 2021-05-20 10:22:20 +02:00
Marco Neumann c6a6005f65 feat: add `PreservedCatalog.exists` 2021-05-20 10:22:20 +02:00
Raphael Taylor-Davies 37880ee89a
refactor: store chunk IDs only in catalog (#1521)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-20 04:07:14 +00:00
Marco Neumann 8db26485a4 refactor: empty transaction during catalog creation
That involves some refactoring which we are going to need anyway for
hooking up the "read" path of the catalog into the DB startup, namely:

- make `Db::new` require a preserved catalog
- introduce a helper function that can provide that
- as a consequence, all test-creations of a Db are now async

This prepares for #1382.
2021-05-18 17:42:07 +02:00
Marco Neumann cdf0ada6a6 test: test preserved catalog <-> Db write wiring 2021-05-17 13:57:31 +02:00
Marco Neumann 68729dd5ee refactor: avoid string allocation 2021-05-17 12:32:34 +02:00
Marco Neumann adcd8132e7 docs: more comments regarding catalog transaction handling 2021-05-17 12:05:08 +02:00
Marco Neumann a99d53e771 docs: document `OpenTransaction::handle_action*` 2021-05-17 11:48:51 +02:00
Marco Neumann 4fb800c7a6 refactor: make PreservedCatalog easier to integrate 2021-05-17 11:33:22 +02:00
Marco Neumann f4d7154746 fix: table summaries must include timestamp as well 2021-05-17 11:33:22 +02:00
Marco Neumann 7cced3242f feat: add a way to parse infos from parquet paths 2021-05-17 11:33:22 +02:00
Marco Neumann 5969caccb0 feat: return parquet metadata from `write_to_object_store` 2021-05-17 11:33:22 +02:00
Raphael Taylor-Davies f9178dbb5f
feat: push metrics into catalog (#1488)
* feat: push metrics into catalog

* chore: minor cleanup

* fix: include db labels in chunk metric domains

* chore: fmt

* fix: don't allow dropping moving chunks

* chore: further tweaks

* chore: review feedback

* feat: use new_unregistered() for metric instruments instead of default

* chore: use &[KeyValue] instead of &Vec<KeyValue>

* refactor: make GauageValue non default constructible
2021-05-14 17:37:39 +00:00
Nga Tran 9583636748 feat: we can now read parquet files from all kinds of object stores 2021-05-12 18:05:34 -04:00
Marco Neumann 795f5bfcb7 refactor: make `StatValues::{min,max}` optional + handle NaNs
This will allow us to:

- handle all-NULL columns correctly
- be in-line with Parquet (where min/max are optional)
- handle NaNs at least somewhat sanely (they do not "poison" stats
  anymore)
2021-05-10 17:12:25 +02:00
Nga Tran c6b933eb63 chore: merge main to branch 2021-05-07 18:40:17 -04:00
Nga Tran f2c19ec080 refactor: further address Carol's comment 2021-05-07 17:40:40 -04:00
Nga Tran 971500681f refactor: address Andrew's and Carol's comment 2021-05-07 17:33:19 -04:00
Carol (Nichols || Goulding) e2cc4634bf fix: Use PathBuf rather than debug formatting and back to String
This is the same fix I made in 54c5f98, just found a few more spots :)
2021-05-07 15:58:11 -04:00
Nga Tran 31d49db0ed chore: a little more cleanup 2021-05-07 09:38:41 -04:00
Nga Tran ba015ee4df refactor: clean up and add comments 2021-05-07 09:31:41 -04:00
Marco Neumann 1a998d4116 feat: preserve parquet metadata in catalog
Closes #1380.
2021-05-07 09:51:44 +02:00
Marco Neumann c3d523fc4f refactor: add col prefixes to make_chunk & Co 2021-05-07 09:51:44 +02:00
Marco Neumann 5db504300d refactor: use parsed paths instead of raw strings for catalog paths 2021-05-07 09:51:44 +02:00
Nga Tran 55bf848bd2 feat: Now we can query directly from files in object store 2021-05-06 18:02:17 -04:00
Andrew Lamb 884baf7329
feat: add column_type and influxdb_column_type, remove row_count from system.columns (#1415)
* feat: add column_type and influxdb_column_type, remove row_count from system.columns

* fix: update tests

* fix: more test update

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fmt

* fix: copy/paste type conversion to avoid cross dependency between data_types and internal_types

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-06 12:59:30 +00:00
Andrew Lamb 86771ea629
chore: update arrow/datafusion deps (#1433)
* chore: update datafusion deps

* chore: update arrow deps

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-05 22:37:31 +00:00
Nga Tran a5c92fae8a chore: merge main to branch 2021-05-05 13:48:42 -04:00
Nga Tran 3bdb451529 chore: merge main to branch 2021-05-05 13:18:39 -04:00
Raphael Taylor-Davies 411cf134e9
refactor: explode arrow_deps (#1425)
* refactor: explode arrow_deps

* chore: workaround doctest bug
2021-05-05 16:59:12 +00:00
Nga Tran 2b46f51e5b chore: address Dom's comment 2021-05-05 12:55:41 -04:00
Nga Tran a1f3413c89 refactor: move private test helpers to utils module to be used by many modules 2021-05-05 11:41:46 -04:00
Nga Tran fcb37a0b1d feat: more testing scenarios for querying parquet files 2021-05-05 10:57:02 -04:00
Marco Neumann 1f42eb89cd feat: implement parquet metadata handling
Closes #1379 and contributes to #1380.
2021-05-05 13:29:16 +02:00
Marco Neumann 056c29aaa2 feat: add a way to retrieve timestamp range from parquet chunk 2021-05-05 13:29:16 +02:00
Marco Neumann c54109113e feat: add a way to retrieve storage path from parquet chunks 2021-05-05 13:29:16 +02:00
Marco Neumann 136c35cb88 feat: implement transaction handling for catalog
Closes #1253.
2021-05-03 10:04:35 +02:00
Nga Tran 34a3388a49 feat: unload chunks from read buffer but keep them in object store 2021-04-30 16:12:02 -04:00
Nga Tran e87973babe refactor: address review comments 2021-04-29 13:15:43 -04:00
Nga Tran 402d9c748c chore: cargo fmt 2021-04-28 16:52:52 -04:00
Nga Tran 2a2760bd18 feat: complete tests where data in both RUB and OS 2021-04-28 16:14:07 -04:00
Nga Tran 140d96dbea feat: tests for loading data to object store and making sure we still query read buffer 2021-04-28 15:59:17 -04:00
Marco Neumann eddc9319ff docs: deny broken intradoc links 2021-04-27 13:22:28 +02:00
Carol (Nichols || Goulding) 272cdb85ce fix: Use the ServerId type everywhere, for writing, querying, anything 2021-04-26 18:44:32 +00:00
Carol (Nichols || Goulding) b8face3335 refactor: Organize use statements 2021-04-26 18:44:32 +00:00
Jake Goulding 67f5ad841d refactor: Introduce ServerId and CurrentServerId types 2021-04-26 18:44:32 +00:00
Nga Tran 657bfa1b20 refactor: address Andrew's comments 2021-04-16 17:44:46 -04:00
Nga Tran b3e110a241 refactor: address Jake's comment 2021-04-16 17:27:40 -04:00
Nga Tran 4c23ca8888 feat: full implementation of parquet's read_filter for review 2021-04-16 16:03:24 -04:00
Andrew Lamb e226b5a820
feat: Use TimestampNanosecondArray for timestamps in IOx (#1230)
* refactor: Create Arrow arrays using iterators

* feat: use Timestamp64(TimeUnit::Nanosecond) for timestamps

* feat: add support for timestamp array

* fix: update more tests

* fix: remove unnecessary code

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-16 15:55:33 +00:00
Nga Tran 231ebb54d4 chore: fix a format 2021-04-14 16:32:25 -04:00
Nga Tran 4e2d59d9a5 feat: implement a few more functions as part of supporting query from parquet files 2021-04-14 16:06:47 -04:00
Nga Tran 05bf28ce85 feat: Add 2 main functions table_schema and table_names for Parquet Chunk to lay a foundation for querying it 2021-04-13 18:23:55 -04:00
Nga Tran 4a6d6bd7ad feat: initial work for querying data from parquet file in object store 2021-04-13 13:57:46 -04:00
Raphael Taylor-Davies 1997324344
feat: mutable buffer snapshotting (#1179)
* feat: mutable buffer snapshotting

* chore: review feedback
2021-04-13 12:14:54 +00:00
Nga Tran 453aeaf1a0 feat: Add tests for writing RB chunks to Object Store 2021-04-09 17:39:23 -04:00
Nga Tran f501a74aea refactor: Address review comments 2021-04-07 21:28:03 -04:00
Nga Tran be6e1e48e4 feat: add writer_id and object_store in Db 2021-04-07 18:36:07 -04:00
Raphael Taylor-Davies c2355aca6d
feat: add basic memory tracking (#1125)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-07 15:38:24 +00:00
Nga Tran 6e01fbc382 feat: use TableSummary as metadata for parquet chunk's tables and read buffer's read_filter to get data 2021-04-05 15:37:34 -04:00
Nga Tran 4bdf8963e6 feat: continue building foundation for writing RB chunks to parquet files 2021-04-02 16:06:25 -04:00
Nga Tran 49267114d3 chore: merge main into branch and resolve conflicts 2021-04-01 13:22:49 -04:00
Nga Tran 1463c6645f feat: Add ChunkState::ObjectStore and rename ParquetChunk to Chunk 2021-04-01 11:53:03 -04:00
Nga Tran 19a453a483 feat: finally have some framework with clear todos for writing a chunk into parquet files 2021-03-31 16:21:53 -04:00
Nga Tran cd409b471f feat: continue the implementation 2021-03-30 21:31:51 -04:00
Nga Tran 0bcd52d5c9 feat: Add more changes 2021-03-30 18:31:09 -04:00