Marco Neumann
9f451423d5
feat: log files that are deleted
2021-05-26 12:49:44 +02:00
Marco Neumann
24ec1a472e
fix: do NOT delete parquet files that are reachable by time travel
2021-05-26 12:38:54 +02:00
Marco Neumann
5983336366
refactor: rename `parquet_file::{utils => test_utils}`
2021-05-26 11:09:29 +02:00
Marco Neumann
d7e3bc569e
refactor: shorten time we hold the transaction lock during clean-up
2021-05-26 11:04:57 +02:00
Marco Neumann
18f5dd9ae1
test: ensure transaction lock exists during cleanup planning
2021-05-26 11:04:57 +02:00
Marco Neumann
b55eae98da
fix: do not delete non-parquet files during catalog-driven cleanup
2021-05-26 11:04:57 +02:00
Marco Neumann
5ed16ff294
refactor: improve error message in `parquet_file::cleanup`
2021-05-26 11:04:57 +02:00
Marco Neumann
14fdf3b7c7
feat: implement object store cleanup core routine
2021-05-26 11:02:40 +02:00
Marco Neumann
cc78b5317d
feat: add method to get all parquet files from catalog state
2021-05-26 11:02:40 +02:00
Marco Neumann
953114af2e
feat: add method to abort catalog transaction
2021-05-26 11:02:40 +02:00
Marco Neumann
92fcd7e940
feat: add a way to get OS, server ID and DB name from catalog
2021-05-26 11:02:40 +02:00
Marco Neumann
9daa4d00d6
test: re-organize `parquet_file` test utils a bit
2021-05-26 11:02:39 +02:00
Marco Neumann
38183928c8
refactor: extract path generator for data location
2021-05-26 10:59:40 +02:00
Marco Neumann
19a2733d30
feat: preserve transaction metadata in parquets
2021-05-25 09:56:12 +02:00
Marco Neumann
fe8e6301fe
refactor: move `read_schema_from_parquet_metadata` back to `parquet_file::metadata`
...
Let us pool all metadata handling in a single module, which makes it
easier to review.
2021-05-25 09:37:53 +02:00
Marco Neumann
ac83d99f66
feat: add a way to get current revision and UUID from transaction handle
2021-05-25 09:37:53 +02:00
Marco Neumann
fdc553b257
refactor: replace unwrap with expect
2021-05-25 09:37:53 +02:00
Andrew Lamb
c464ffadad
refactor: remove special case timestamp_range in parquet chunk (#1543)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-24 16:19:44 +00:00
Andrew Lamb
14ba25f86d
chore: Update datafusion and use released version of arrow crates (#1546)
...
* chore: Update datafusion and use released version of arrow crate
* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Andrew Lamb
27e5b8fabf
refactor: Remove multiple table support from Parquet Chunk (#1541)
2021-05-24 08:40:31 -04:00
Marco Neumann
8bdddfd475
docs: mention that catalog wiping does not delete parquet files
2021-05-20 10:22:20 +02:00
Marco Neumann
b1a06246d6
feat: implement function to wipe a preserved catalog
2021-05-20 10:22:20 +02:00
Marco Neumann
6c405aa6f9
feat: check if preserved catalog exists when creating an empty one
2021-05-20 10:22:20 +02:00
Marco Neumann
c6a6005f65
feat: add `PreservedCatalog.exists`
2021-05-20 10:22:20 +02:00
Raphael Taylor-Davies
37880ee89a
refactor: store chunk IDs only in catalog (#1521)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-20 04:07:14 +00:00
Marco Neumann
8db26485a4
refactor: empty transaction during catalog creation
...
That involves some refactoring which we are going to need anyway for
hooking up the "read" path of the catalog into the DB startup, namely:
- make `Db::new` require a preserved catalog
- introduce a helper function that can provide that
- as a consequence, all test-creations of a Db are now async
This prepares for #1382.
2021-05-18 17:42:07 +02:00
Marco Neumann
cdf0ada6a6
test: test preserved catalog <-> Db write wiring
2021-05-17 13:57:31 +02:00
Marco Neumann
68729dd5ee
refactor: avoid string allocation
2021-05-17 12:32:34 +02:00
Marco Neumann
adcd8132e7
docs: more comments regarding catalog transaction handling
2021-05-17 12:05:08 +02:00
Marco Neumann
a99d53e771
docs: document `OpenTransaction::handle_action*`
2021-05-17 11:48:51 +02:00
Marco Neumann
4fb800c7a6
refactor: make PreservedCatalog easier to integrate
2021-05-17 11:33:22 +02:00
Marco Neumann
f4d7154746
fix: table summaries must include timestamp as well
2021-05-17 11:33:22 +02:00
Marco Neumann
7cced3242f
feat: add a way to parse infos from parquet paths
2021-05-17 11:33:22 +02:00
Marco Neumann
5969caccb0
feat: return parquet metadata from `write_to_object_store`
2021-05-17 11:33:22 +02:00
Raphael Taylor-Davies
f9178dbb5f
feat: push metrics into catalog (#1488)
...
* feat: push metrics into catalog
* chore: minor cleanup
* fix: include db labels in chunk metric domains
* chore: fmt
* fix: don't allow dropping moving chunks
* chore: further tweaks
* chore: review feedback
* feat: use new_unregistered() for metric instruments instead of default
* chore: use &[KeyValue] instead of &Vec<KeyValue>
* refactor: make GauageValue non default constructible
2021-05-14 17:37:39 +00:00
Nga Tran
9583636748
feat: we can now read parquet files from all kinds of object stores
2021-05-12 18:05:34 -04:00
Marco Neumann
795f5bfcb7
refactor: make `StatValues::{min,max}` optional + handle NaNs
...
This will allow us to:
- handle all-NULL columns correctly
- be in-line with Parquet (where min/max are optional)
- handle NaNs at least somewhat sanely (they do not "poison" stats
anymore)
2021-05-10 17:12:25 +02:00
Nga Tran
c6b933eb63
chore: merge main to branch
2021-05-07 18:40:17 -04:00
Nga Tran
f2c19ec080
refactor: further address Carol's comment
2021-05-07 17:40:40 -04:00
Nga Tran
971500681f
refactor: address Andrew's and Carol's comment
2021-05-07 17:33:19 -04:00
Carol (Nichols || Goulding)
e2cc4634bf
fix: Use PathBuf rather than debug formatting and back to String
...
This is the same fix I made in 54c5f98, just found a few more spots :)
2021-05-07 15:58:11 -04:00
Nga Tran
31d49db0ed
chore: a little more cleanup
2021-05-07 09:38:41 -04:00
Nga Tran
ba015ee4df
refactor: clean up and add comments
2021-05-07 09:31:41 -04:00
Marco Neumann
1a998d4116
feat: preserve parquet metadata in catalog
...
Closes #1380.
2021-05-07 09:51:44 +02:00
Marco Neumann
c3d523fc4f
refactor: add col prefixes to make_chunk & Co
2021-05-07 09:51:44 +02:00
Marco Neumann
5db504300d
refactor: use parsed paths instead of raw strings for catalog paths
2021-05-07 09:51:44 +02:00
Nga Tran
55bf848bd2
feat: Now we can query directly from files in object store
2021-05-06 18:02:17 -04:00
Andrew Lamb
884baf7329
feat: add column_type and influxdb_column_type, remove row_count from system.columns (#1415)
...
* feat: add column_type and influxdb_column_type, remove row_count from system.columns
* fix: update tests
* fix: more test update
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fmt
* fix: copy/paste type conversion to avoid cross dependency between data_types and internal_types
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-06 12:59:30 +00:00
Andrew Lamb
86771ea629
chore: update arrow/datafusion deps (#1433)
...
* chore: update datafusion deps
* chore: update arrow deps
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-05 22:37:31 +00:00
Nga Tran
a5c92fae8a
chore: merge main to branch
2021-05-05 13:48:42 -04:00
Nga Tran
3bdb451529
chore: merge main to branch
2021-05-05 13:18:39 -04:00
Raphael Taylor-Davies
411cf134e9
refactor: explode arrow_deps (#1425)
...
* refactor: explode arrow_deps
* chore: workaround doctest bug
2021-05-05 16:59:12 +00:00
Nga Tran
2b46f51e5b
chore: address Dom's comment
2021-05-05 12:55:41 -04:00
Nga Tran
a1f3413c89
refactor: move private test helpers to utils module to be used by many modules
2021-05-05 11:41:46 -04:00
Nga Tran
fcb37a0b1d
feat: more testing scenarios for querying parquet files
2021-05-05 10:57:02 -04:00
Marco Neumann
1f42eb89cd
feat: implement parquet metadata handling
...
Closes #1379 and contributes to #1380.
2021-05-05 13:29:16 +02:00
Marco Neumann
056c29aaa2
feat: add a way to retrieve timestamp range from parquet chunk
2021-05-05 13:29:16 +02:00
Marco Neumann
c54109113e
feat: add a way to retrieve storage path from parquet chunks
2021-05-05 13:29:16 +02:00
Marco Neumann
136c35cb88
feat: implement transaction handling for catalog
...
Closes #1253.
2021-05-03 10:04:35 +02:00
Nga Tran
34a3388a49
feat: unload chunks from read buffer but keep them in object store
2021-04-30 16:12:02 -04:00
Nga Tran
e87973babe
refactor: address review comments
2021-04-29 13:15:43 -04:00
Nga Tran
402d9c748c
chore: cargo fmt
2021-04-28 16:52:52 -04:00
Nga Tran
2a2760bd18
feat: complete tests where data in both RUB and OS
2021-04-28 16:14:07 -04:00
Nga Tran
140d96dbea
feat: tests for loading data to object store and make sure we still query read buffer
2021-04-28 15:59:17 -04:00
Marco Neumann
eddc9319ff
docs: deny broken intradoc links
2021-04-27 13:22:28 +02:00
Carol (Nichols || Goulding)
272cdb85ce
fix: Use the ServerId type everywhere, for writing, querying, anything
2021-04-26 18:44:32 +00:00
Carol (Nichols || Goulding)
b8face3335
refactor: Organize use statements
2021-04-26 18:44:32 +00:00
Jake Goulding
67f5ad841d
refactor: Introduce ServerId and CurrentServerId types
2021-04-26 18:44:32 +00:00
Nga Tran
657bfa1b20
refactor: address Andrew's comments
2021-04-16 17:44:46 -04:00
Nga Tran
b3e110a241
refactor: address Jake's comment
2021-04-16 17:27:40 -04:00
Nga Tran
4c23ca8888
feat: full implementation of parquet's read_filter for review
2021-04-16 16:03:24 -04:00
Andrew Lamb
e226b5a820
feat: Use TimestampNanosecondArray for timestamps in IOx (#1230)
...
* refactor: Create Arrow arrays using iterators
* feat: use Timestamp64(TimeUnit::Nanosecond) for timestamps
* feat: add support for timestamp array
* fix: update more tests
* fix: remove unnecessary code
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-16 15:55:33 +00:00
Nga Tran
231ebb54d4
chore: fix a format
2021-04-14 16:32:25 -04:00
Nga Tran
4e2d59d9a5
feat: implement a few more functions as part of supporting query from parquet files
2021-04-14 16:06:47 -04:00
Nga Tran
05bf28ce85
feat: Add 2 main functions table_schema and table_names for Parquet Chunk to lay a foundation for querying it
2021-04-13 18:23:55 -04:00
Nga Tran
4a6d6bd7ad
feat: initial work for querying data from parquet file in object store
2021-04-13 13:57:46 -04:00
Raphael Taylor-Davies
1997324344
feat: mutable buffer snapshotting (#1179)
...
* feat: mutable buffer snapshotting
* chore: review feedback
2021-04-13 12:14:54 +00:00
Nga Tran
453aeaf1a0
feat: Add tests for writing RB chunks to Object Store
2021-04-09 17:39:23 -04:00
Nga Tran
f501a74aea
refactor: Address review comments
2021-04-07 21:28:03 -04:00
Nga Tran
be6e1e48e4
feat: add writer_id and object_store in Db
2021-04-07 18:36:07 -04:00
Raphael Taylor-Davies
c2355aca6d
feat: add basic memory tracking (#1125)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-07 15:38:24 +00:00
Nga Tran
6e01fbc382
feat: use TableSummary as metadata for parquet chunk's tables and read buffer's read_filter to get data
2021-04-05 15:37:34 -04:00
Nga Tran
4bdf8963e6
feat: continue building foundation for writing RB chunks to parquet files
2021-04-02 16:06:25 -04:00
Nga Tran
49267114d3
chore: merge main into branch and resolve conflicts
2021-04-01 13:22:49 -04:00
Nga Tran
1463c6645f
feat: Add ChunkState::ObjectStore and rename ParquetChunk to Chunk
2021-04-01 11:53:03 -04:00
Nga Tran
19a453a483
feat: finally have some framework with clear todos for writing a chunk into parquet files
2021-03-31 16:21:53 -04:00
Nga Tran
cd409b471f
feat: continue the implementation
2021-03-30 21:31:51 -04:00
Nga Tran
0bcd52d5c9
feat: Add more changes
2021-03-30 18:31:09 -04:00