Paul Dix
d95b5df03e
refactor: move cache to ObjectStore
...
Since the consumers of ObjectStore always use the concrete type rather than the ObjectStoreApi trait, it makes more sense to just change the concrete type to have a pointer to the cache. This removes the cache from the ObjectStoreApi trait and changes the ObjectStore to be a regular struct rather than a tuple around the ObjectStoreIntegration. Future work will have the server configure the cache on the ObjectStore struct when its options are set.
2021-07-21 18:27:56 -04:00
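Here is a minimal Rust sketch of the new shape: `ObjectStore` becomes a regular struct that holds the backend plus an optional pointer to a cache. The `Cache` trait, the `set_cache`/`cache` methods and the trimmed-down `ObjectStoreIntegration` enum are hypothetical stand-ins. The log does not show the real API.

```rust
use std::sync::Arc;

/// Hypothetical, trimmed-down stand-in for the wrapped backends.
#[derive(Debug)]
pub enum ObjectStoreIntegration {
    InMemory,
    File(String),
}

/// Hypothetical cache interface; the real file cache API is not shown in this log.
pub trait Cache: Send + Sync {
    fn name(&self) -> &str;
}

/// Before: `pub struct ObjectStore(pub ObjectStoreIntegration);`
/// After: a regular struct that can also carry a pointer to a cache.
pub struct ObjectStore {
    pub integration: ObjectStoreIntegration,
    cache: Option<Arc<dyn Cache>>,
}

impl ObjectStore {
    pub fn new(integration: ObjectStoreIntegration) -> Self {
        Self { integration, cache: None }
    }

    /// Hypothetical hook for the "server configures the cache" future work.
    pub fn set_cache(&mut self, cache: Arc<dyn Cache>) {
        self.cache = Some(cache);
    }

    pub fn cache(&self) -> Option<&Arc<dyn Cache>> {
        self.cache.as_ref()
    }
}
```

Callers that already use the concrete type keep working, and no trait method is needed to reach the cache.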
Paul Dix
d0ea812041
feat: add skeleton for object store file cache
2021-07-21 18:27:56 -04:00
Marco Neumann
57a9d5ade0
refactor: correctly track "seen" ranges in persistence checkpoints
...
Now we can handle all these cases:
There are two partitions w/ a single write each. Case 1:
1. A reads sequence number 1
2. B reads sequence number 2
3. we persist A which only knows the sequences up until 1
=> the DB checkpoint needs the global max, otherwise we forget sequences
during replay (2 in this case, so B would be gone)
Case 2:
1. B reads sequence number 1
2. A reads sequence number 2
3. we persist A which (w/o this commit) would not track the sequencer at
all in this checkpoint (since there is nothing to replay)
=> we MUST also remember that we already read up until 2, otherwise we'll
re-read 2 after replay
=> the partition checkpoint needs the local seen max (no matter if there's
something to persist)
2021-07-21 19:19:49 +02:00
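Here is a minimal sketch of the partition-side rule from case 2. It assumes sequencer IDs are `u32` and sequence numbers are `u64`. `PartitionRange` and `partition_checkpoint` are hypothetical names, not the crate's API. The point is that a sequencer stays in the checkpoint with its seen maximum even when nothing is left to replay.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-sequencer entry of a partition checkpoint.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PartitionRange {
    /// Smallest sequence number that still needs replay (`None` = nothing left).
    pub unpersisted_min: Option<u64>,
    /// Largest sequence number this partition has seen, persisted or not.
    pub seen_max: u64,
}

/// Build the checkpoint for a partition that is being persisted.
///
/// `seen`: sequencer ID -> max sequence number seen by this partition.
/// `unpersisted`: sequencer ID -> min sequence number still unpersisted.
///
/// Every sequencer in `seen` gets an entry, even if nothing is left to
/// replay, so replay does not re-read sequence numbers that were already seen.
pub fn partition_checkpoint(
    seen: &BTreeMap<u32, u64>,
    unpersisted: &BTreeMap<u32, u64>,
) -> BTreeMap<u32, PartitionRange> {
    seen.iter()
        .map(|(&sequencer, &seen_max)| {
            let range = PartitionRange {
                unpersisted_min: unpersisted.get(&sequencer).copied(),
                seen_max,
            };
            (sequencer, range)
        })
        .collect()
}
```

In case 2, persisting A yields `{unpersisted_min: None, seen_max: 2}` for the sequencer rather than no entry at all.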
Marco Neumann
a5fc1c7d38
fix: collect min AND max in database checkpoints
...
This is required to correctly handle the following case:
1. There are two partitions A and B w/ a single write each (from the same
sequencer).
2. We persist A:
- The partition checkpoint for A will be empty because after persistence
there will be nothing to replay (the single write is persisted and
we're ready).
- The database checkpoint that contains the global minimum of all ranges
recognizes that for the sequencer there is indeed something left (the
minimum sequence number from B).
3. DB restart happens, replay starts
4. We scan all persisted files, figure out that we have a DB checkpoint
with a sequence minimum but (w/o the change in this commit) there is no
maximum. Only partition checkpoints contain maxima, and the only partition
checkpoint that was persisted was the one for partition A and that one was
empty (see above).
5. So now how do we recover partition B?
2021-07-21 14:48:29 +02:00
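The fix can be sketched as a fold over all partitions. For each sequencer, the database checkpoint keeps the global minimum of what is still unpersisted and the global maximum that was seen. `PartitionState`, `DbRange` and `db_checkpoint` are hypothetical names. Sequencer IDs and sequence numbers are assumed to be `u32` and `u64`.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-partition state for one sequencer.
#[derive(Debug, Clone, Copy)]
pub struct PartitionState {
    /// Min sequence number not yet persisted (`None` = nothing left to replay).
    pub unpersisted_min: Option<u64>,
    /// Max sequence number seen by the partition.
    pub seen_max: u64,
}

/// Hypothetical DB checkpoint entry: global min AND global max.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DbRange {
    pub min: Option<u64>,
    pub max: u64,
}

/// Fold all partitions into one range per sequencer.
pub fn db_checkpoint(partitions: &[BTreeMap<u32, PartitionState>]) -> BTreeMap<u32, DbRange> {
    let mut out: BTreeMap<u32, DbRange> = BTreeMap::new();
    for partition in partitions {
        for (&sequencer, state) in partition {
            let entry = out
                .entry(sequencer)
                .or_insert(DbRange { min: None, max: state.seen_max });
            // Global maximum over everything any partition has seen.
            entry.max = entry.max.max(state.seen_max);
            // Global minimum over what is still unpersisted, if anything.
            entry.min = match (entry.min, state.unpersisted_min) {
                (Some(a), Some(b)) => Some(a.min(b)),
                (a, b) => a.or(b),
            };
        }
    }
    out
}
```

Take the scenario above, where A persisted sequence 1 and B still holds sequence 2. The checkpoint then carries `min = Some(2)` and `max = 2`, so replay can recover B even though A's partition checkpoint is empty.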
Andrew Lamb
4da8a16c18
chore: update to arrow 5.0 and master datafusion (#2049)
...
* chore: update to arrow 5.0 and master datafusion
* fix: Update test for change in object size
2021-07-19 12:49:51 +00:00
Jake Goulding
42b56ad657
refactor: Use SNAFU's context instead of `ok_or_else`
2021-07-16 09:59:54 -04:00
Jake Goulding
939d15a21f
perf: Avoid clone when an error doesn't occur
2021-07-16 09:59:54 -04:00
Marco Neumann
f57ba6afdb
fix: use fixed-size timestamps for parquet metadata (#2032)
...
This fixes flaky tests that rely on predictable file sizes.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-16 13:14:02 +00:00
Andrew Lamb
0c86d1dccf
feat: Record parquet bytes size in catalog / parquet_file (#2006)
...
* feat: Store object store size in parquet_file
* fix: update TRANSACTION_VERSION to 8
* refactor: rename os_bytes --> file_size_bytes
2021-07-15 12:07:11 +00:00
Marco Neumann
40047a76bc
refactor: `remove_parquet` cannot fail
2021-07-15 12:07:56 +02:00
Raphael Taylor-Davies
1d00fa2fd8
refactor: track memory metrics in catalog (#1995)
...
* refactor: track memory metrics in catalog
* chore: update comment
2021-07-14 16:23:00 +00:00
Andrew Lamb
d35b74c226
fix: Fix doc build warnings (#1945)
...
* fix: Fix doc build warnings
* refactor: add deny bare_urls to crates
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 08:03:42 +00:00
Andrew Lamb
670826daf9
refactor: make object_store construction interface consistent (#1944)
...
* refactor: make object_store construction interface consistent
* fix: benchmarks
* fix: doc build
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-12 12:56:36 +00:00
Marco Neumann
18893e76e0
refactor: convert some table name and part. key String to Arcs
...
This has the (somewhat nice) side effect that it shrinks the in-mem
catalog a bit as well, because `ParquetChunk` is now a bit smaller, which
makes the chunk stage enum smaller as well.
2021-07-08 14:34:28 +02:00
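The size effect is easy to check. A `String` stores a pointer, a capacity and a length (three words). An `Arc<str>` is a fat pointer (pointer and length, two words) into a shared, reference-counted allocation. The two chunk structs below are hypothetical stand-ins, not the real `ParquetChunk`.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical chunk metadata using owned strings.
pub struct ChunkOwned {
    pub table_name: String,
    pub partition_key: String,
}

/// The same metadata with shared, reference-counted strings.
pub struct ChunkShared {
    pub table_name: Arc<str>,
    pub partition_key: Arc<str>,
}

/// Inline bytes saved per chunk (heap allocations are not counted).
pub fn inline_bytes_saved() -> usize {
    size_of::<ChunkOwned>() - size_of::<ChunkShared>()
}
```

An enum is at least as large as its largest variant, so a smaller chunk type also shrinks any enum that stores it inline. Cloning an `Arc<str>` only increments a reference count and does not copy the bytes.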
Marco Neumann
b528ac2b55
feat: store schemas per table
...
This way we can:
- check for schema matches even for writes going into different
partitions
- solve #1768 and #1884 in some future PR
Closes #1897.
2021-07-08 09:18:09 +02:00
Andrew Lamb
e6d995cbd8
chore: Update to Rust 1.53.0 (#1922)
...
* chore: Update to Rust 1.53.0
* fix: Update to latest clippy standards
* fix: bad refactor
* fix: Update escaping
* test: update test output
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-07 18:02:03 +00:00
Marco Neumann
4ca2d3e148
chore: move persistence windows related code into own crate
...
The entire persistence windows data structures (including the
checkpoints) have nothing to do with the mutable buffer per se, so let's
move them into their own crate. This also means `parquet_file` no
longer depends on `mutable_buffer`.
2021-07-05 10:23:58 +02:00
Marco Neumann
d96e15c3f7
docs: explain why we store checkpoints in parquet files
2021-07-05 09:42:46 +02:00
Marco Neumann
cdab1bed05
feat: persist part+db checkpoint in parquets and catalog
...
This will be required for replay on server startup.
2021-07-05 09:42:46 +02:00
Jacob Marble
0779b0d9bd
feat: add gRPC listener for new write protocol (#1842)
...
* feat: add gRPC listener for new write protocol
* chore: clippy happy
* chore: lint
* chore: cargo fmt --all
* chore: cargo clippy
* chore: protobuf-lint
* chore: more formatting
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
Marco Neumann
4204127b05
refactor: use protobuf for in-parquet metadata
2021-06-30 16:51:37 +02:00
Marco Neumann
ddc9cd49ca
chore: bump preserved catalog version
2021-06-29 14:23:06 +02:00
Marco Neumann
3ebb6a3037
refactor: do not capture txn-specific information in parquet files
...
This helps with #1821.
2021-06-29 14:22:36 +02:00
kodiakhq[bot]
eda9532eb2
Merge branch 'main' into crepererum/issue1821-cleanup-lock
2021-06-29 10:48:43 +00:00
Marco Neumann
48df13de05
refactor: use parking lot for catalog cleanup
2021-06-29 12:47:29 +02:00
Marco Neumann
f824f235b4
fix: fix info log message
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-29 12:35:05 +02:00
Marco Neumann
778a611fb8
docs: add clarifying comment for rebuild test
2021-06-29 11:58:19 +02:00
Marco Neumann
17f89ea8d0
docs: fix comment about lock downgrade
2021-06-29 11:53:55 +02:00
Marco Neumann
2cd5ce98be
refactor: do not pass locks around for catalog cleanup
2021-06-29 10:21:41 +02:00
Marco Neumann
730a23faa3
refactor: improve locking around the parquet file cleanup
...
Instead of (ab)using the transaction lock to prevent the cleanup job
from removing just-written parquet files, use a dedicated lock. This
will later allow us to write parquet files before starting a transaction
(i.e. w/o holding the transaction lock).
This will help with #1821.
2021-06-29 10:20:03 +02:00
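Here is a minimal sketch of the dedicated-lock idea using `std::sync::RwLock`. Writers hold a shared guard from writing a parquet file until the catalog references it. Cleanup takes the exclusive guard, so it never sees a file that was just written but is not yet referenced. `Store` and its fields and methods are hypothetical. The real implementation's types are not shown in this log.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, RwLock};

/// Hypothetical store: files on "object storage" plus catalog references.
pub struct Store {
    /// Dedicated cleanup lock: shared for writers, exclusive for cleanup.
    cleanup_lock: RwLock<()>,
    files: Mutex<HashSet<String>>,
    referenced: Mutex<HashSet<String>>,
}

impl Store {
    pub fn new() -> Self {
        Self {
            cleanup_lock: RwLock::new(()),
            files: Mutex::new(HashSet::new()),
            referenced: Mutex::new(HashSet::new()),
        }
    }

    /// Write a file, then reference it in the catalog. The shared guard covers
    /// the window in which the file exists but is not yet referenced; the
    /// catalog transaction itself does not need the cleanup lock.
    pub fn write_and_commit(&self, name: &str) {
        let _guard = self.cleanup_lock.read().unwrap();
        self.files.lock().unwrap().insert(name.to_string());
        self.referenced.lock().unwrap().insert(name.to_string());
    }

    /// Simulate a leftover file that no catalog entry points to.
    pub fn add_orphan(&self, name: &str) {
        self.files.lock().unwrap().insert(name.to_string());
    }

    /// Delete every unreferenced file; returns how many were deleted.
    pub fn cleanup(&self) -> usize {
        let _guard = self.cleanup_lock.write().unwrap();
        let referenced = self.referenced.lock().unwrap();
        let mut files = self.files.lock().unwrap();
        let before = files.len();
        files.retain(|f| referenced.contains(f));
        before - files.len()
    }
}
```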
Marco Neumann
6ec24353bf
refactor: only rebuild a single txn for pres. catalogs
...
Stop relying on in-parquet transaction information during catalog
rebuilds. This has some downsides (no fork detection, only a single
transaction hence no time travel) but will allow us to remove
transaction information from parquet files, so that we can finally move
the actual parquet file storage out of the transaction lock.
This will help with #1821.
2021-06-28 15:10:44 +02:00
Andrew Lamb
0a03605bbc
refactor: pull Channel --> Stream adapter into its own module (#1793)
...
* refactor: pull Channel --> Stream adapter into its own module
* docs: Update query/src/exec/stream.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-06-24 10:35:45 +00:00
kodiakhq[bot]
59993e8b8f
Merge branch 'main' into crepererum/issue1623
2021-06-23 12:40:05 +00:00
Marco Neumann
c395409b51
feat: include UUIDv4 into parquet file names
...
Change schema from
```text
<server_id>/<db_name>/data/<part_key>/<chunk_id>/<table_name>.parquet
```
to
```text
<server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet
```
So parquet files will NEVER be overwritten. This is especially helpful
when dealing with old catalog leftovers (i.e. a parquet file that
belonged to an old but wiped catalog). It also simplifies the reasoning
about file references in the future and follows what other dataset
formats are usually doing (i.e. never replace files).
Also use `ChunkAddr` where it makes sense.
2021-06-23 14:30:28 +02:00
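The new layout can be sketched as a small path builder. The function name and argument types are hypothetical. The UUID is passed in as a string; in practice it could come from the `uuid` crate's `Uuid::new_v4()`, which sits behind that crate's `v4` feature. Each write gets a fresh UUID, so two writes of the same chunk never produce the same path.

```rust
/// Hypothetical helper for the layout
/// `<server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet`.
pub fn parquet_file_path(
    server_id: u32,
    db_name: &str,
    table_name: &str,
    part_key: &str,
    chunk_id: u32,
    uuid: &str,
) -> String {
    format!(
        "{}/{}/data/{}/{}/{}.{}.parquet",
        server_id, db_name, table_name, part_key, chunk_id, uuid
    )
}
```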
kodiakhq[bot]
70817a474c
Merge branch 'main' into crepererum/issue1740-d
2021-06-23 12:29:54 +00:00
Raphael Taylor-Davies
5cd911c74a
fix: correct row count for object store chunks ( #1789 )
2021-06-23 12:06:49 +00:00
Marco Neumann
1636f47565
refactor: remove dead code
2021-06-23 10:51:22 +02:00
Marco Neumann
cf55df68b5
refactor: remove some `Arc`s around the in-mem catalog
...
This is for #1740.
2021-06-23 10:51:22 +02:00
Marco Neumann
e36b6f9c7a
docs: fix intra-doc link
2021-06-23 10:25:05 +02:00
Marco Neumann
67508094b4
fix: double ref
...
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
2021-06-23 10:25:05 +02:00
Marco Neumann
d2be641864
refactor: make checkpointing easier to use
...
Don't mix commit+checkpoint in a single call, because then the caller has
to reason about the error type and which of the two operations failed.
Splitting it also makes it easier to create the correct checkpoint data.
2021-06-23 10:25:05 +02:00
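Here is a minimal sketch of the split API. The caller commits first and then checkpoints, and each step has its own error type so the caller can tell which operation failed. Every name here (`Catalog`, `commit`, `create_checkpoint`, the error enums) is hypothetical and only shows the shape of the change.

```rust
/// Hypothetical error of the commit step.
#[derive(Debug, PartialEq)]
pub enum CommitError {
    Conflict,
}

/// Hypothetical error of the checkpoint step.
#[derive(Debug, PartialEq)]
pub enum CheckpointError {
    Storage(String),
}

/// Hypothetical handle for a committed transaction.
#[derive(Debug, PartialEq)]
pub struct CommittedTransaction {
    pub revision: u64,
}

/// Hypothetical catalog that only tracks revisions and checkpoints.
#[derive(Default)]
pub struct Catalog {
    revision: u64,
    checkpoints: Vec<u64>,
}

impl Catalog {
    /// Step 1: commit. Can only fail with a `CommitError`.
    pub fn commit(&mut self) -> Result<CommittedTransaction, CommitError> {
        self.revision += 1;
        Ok(CommittedTransaction { revision: self.revision })
    }

    /// Step 2: checkpoint an already committed transaction. A failure here is
    /// a `CheckpointError` and does not undo the commit.
    pub fn create_checkpoint(&mut self, tx: &CommittedTransaction) -> Result<(), CheckpointError> {
        self.checkpoints.push(tx.revision);
        Ok(())
    }

    pub fn checkpoint_revisions(&self) -> &[u64] {
        &self.checkpoints
    }
}
```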
Marco Neumann
4a961694ec
refactor: make caller sync mem<>OS view during catalog transactions
...
This is for #1740. Greatly simplifies the integration of the persisted
catalog into the DB.
2021-06-23 10:25:05 +02:00
Marco Neumann
d1db0dfaeb
refactor: remove type parameter from preserved catalog
...
For #1740.
2021-06-22 10:53:10 +02:00
Marco Neumann
ff60627500
refactor: make preserved catalog NOT own the in-mem catalog
...
Works towards #1740.
2021-06-21 18:39:43 +02:00
Marco Neumann
881729bd23
refactor: make caller responsible to create checkpoint data
...
This decouples the in-mem and preserved catalog a bit and works
towards #1740.
2021-06-21 18:33:23 +02:00
Marco Neumann
aba973a6e1
refactor: make catalog `wipe` a freestanding function
...
It does not interact with the `CatalogState` so users can call this
function without that type.
2021-06-21 09:31:23 +02:00
Andrew Lamb
258a6b1956
chore: remove more dead code (#1760)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-18 21:28:22 +00:00
Andrew Lamb
de67bd3efe
refactor: Remove PartitionChunk::table_schema (#1756)
...
* refactor: Remove PartitionChunk::table_schema
* docs: update comments
2021-06-18 16:13:16 +00:00
Raphael Taylor-Davies
f6dbc8d6f2
refactor: add ChunkAddr to describe location of chunk in catalog (#1745)
...
* refactor: add ChunkPath to describe location of chunk in catalog
* refactor: rename ChunkPath to ChunkAddr
* chore: further renames
* chore: even more renames
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-17 12:04:37 +00:00
Marco Neumann
e056d97cf6
test: always test transaction aborts
2021-06-16 11:01:14 +02:00
Marco Neumann
caaf95c6ec
refactor: remove lock from `TestCatalogState`
2021-06-16 10:51:15 +02:00
Marco Neumann
c8c412f6fe
refactor: rework catalog state interface
...
This now allows not only for copy-based transaction handling but also
for eager exec and rollbacks. This will be useful to properly implement
transaction aborts for the "real" catalog.
2021-06-16 10:51:15 +02:00
Marco Neumann
e064a6bbba
test: add test suite for `CatalogState` impls
...
This makes it easier to check whether `CatalogState` impls correctly
implement all features, including transaction aborting.
2021-06-16 10:50:47 +02:00
Andrew Lamb
b756e09904
refactor: Rename parquet_file::Chunk --> ParquetChunk (#1722)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 11:21:49 +00:00
Marco Neumann
64c815dd50
fix: bump catalog version (#1726)
...
This should have been done in #1714. Also add a note so that future devs
might hopefully not forget. In any case though the code also works w/o
this bump, it's just that the error message is a bit less nice ("cannot
parse IOxMetadata" instead of "unsupported catalog version").
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 10:26:30 +00:00
Marco Neumann
55fc5e564b
refactor: remove serverID and DB name args from catalog state
...
They are no longer required.
2021-06-15 09:35:41 +02:00
Marco Neumann
776b6c011c
feat: remove path parsing functionality
...
Paths to parquet files are an implementation detail and should not be
parsed.
Closes #1506.
2021-06-14 16:24:50 +02:00
Marco Neumann
250ccdcdcd
refactor: use `IOxMetadata` instead of path parsing for parquet chunks
2021-06-14 16:24:50 +02:00
Marco Neumann
d51e7a127c
feat: include table name, partition key, and chunk ID in `IoxMetadata`
2021-06-14 16:24:50 +02:00
kodiakhq[bot]
b57f397057
Merge branch 'main' into crepererum/checkpoint_during_restore
2021-06-14 13:54:03 +00:00
Marco Neumann
0a7dcc3779
test: adjust read-write parquet test to newest test data
2021-06-14 14:24:24 +02:00
Marco Neumann
d6f6ddfdaa
fix: fix NULL handling in parquet stats
2021-06-14 14:24:09 +02:00
Marco Neumann
eae56630fb
test: add test for all-NULL float column metadata
2021-06-14 13:48:34 +02:00
Marco Neumann
3f9bcf7cd9
fix: fix NaN handling in parquet stats
2021-06-14 13:44:52 +02:00
Marco Neumann
ea96210e98
test: enable unblocked test
2021-06-14 13:44:52 +02:00
Marco Neumann
518f7c6f15
refactor: wrap upstream parquet MD into struct + clean up interface
...
This means users of `parquet_file::metadata` no longer need to depend on
`parquet` directly. Furthermore, they don't need to import dozens of
functions and can instead just use `IoxParquetMetaData` directly.
2021-06-14 13:17:01 +02:00
Marco Neumann
030d0d2b9a
feat: create checkpoint during catalog rebuild
2021-06-14 10:55:56 +02:00
Marco Neumann
df866f72e0
refactor: store parquet metadata in chunk
...
This will be useful for #1381.
At the moment we parse schema and stats eagerly and store them alongside
the parquet metadata in memory. Technically this is not required since
this is basically duplicate data. In the future we might trade-off some
of this memory against CPU consumption by parsing schema and stats on
demand.
2021-06-14 10:08:31 +02:00
Marco Neumann
e6699ff15a
test: ensure that `find_last_transaction_timestamp` considers checkpoints
2021-06-14 10:04:50 +02:00
Marco Neumann
f8a518bbed
refactor: inline `Table` into `parquet_file::chunk::Chunk`
...
Note that the resulting size estimations are different because we were
double-counting `Table`. `mem::size_of::<Self>()` is recursive for
non-boxed types since the child will be part of the parent structure.
Issue: #1295.
2021-06-11 11:54:31 +02:00
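The double-counting can be reproduced with hypothetical `Table`/`Chunk` structs. `mem::size_of` of a parent already includes every field stored inline, while a boxed child contributes only its pointer. `size_of` never counts heap allocations, such as a `String`'s buffer.

```rust
use std::mem::size_of;

/// Hypothetical child type (formerly a separate `Table`).
pub struct Table {
    pub name: String,
    pub row_count: u64,
}

/// Parent that stores `Table` inline (not boxed).
pub struct Chunk {
    pub table: Table,
    pub id: u32,
}

/// Parent that stores `Table` behind a `Box`.
pub struct BoxedChunk {
    pub table: Box<Table>,
    pub id: u32,
}

/// Wrong: `size_of::<Chunk>()` already contains the inline `Table`.
pub fn double_counted() -> usize {
    size_of::<Chunk>() + size_of::<Table>()
}

/// Right for inline children: the parent's size is enough.
pub fn inline_size() -> usize {
    size_of::<Chunk>()
}

/// Right for boxed children: add the pointee, since only the pointer is inline.
pub fn boxed_size() -> usize {
    size_of::<BoxedChunk>() + size_of::<Table>()
}
```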
Marco Neumann
28d1dc4da1
chore: bump preserved catalog version
2021-06-10 16:01:13 +02:00
Marco Neumann
80ee36cd1a
refactor: slightly streamline path parsing code in pres. catalog
2021-06-10 15:59:28 +02:00
Marco Neumann
7e7332c9ce
refactor: make comparison a bit less confusing
2021-06-10 15:42:21 +02:00
Marco Neumann
fd581e2ec9
docs: fix confusing wording in `CatalogState::files`
2021-06-10 15:42:21 +02:00
Marco Neumann
be9b3a4853
fix: protobuf lint fixes
2021-06-10 15:42:21 +02:00
Marco Neumann
294c304491
feat: impl catalog checkpointing infrastructure
...
This implements a way to add checkpoints to the preserved catalog and
speed up replay.
Note: This leaves the "hook it up into the actual DB" for a future PR.
Issue: #1381.
2021-06-10 15:42:21 +02:00
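Here is a sketch of how a checkpoint speeds up replay. The log mixes delta transactions with full-state checkpoints, and replay starts at the newest checkpoint instead of the first transaction. `Action`, `Entry` and `replay` are hypothetical and much simpler than the preserved catalog's real protobuf format.

```rust
use std::collections::BTreeSet;

/// Hypothetical catalog action.
#[derive(Debug, Clone)]
pub enum Action {
    AddFile(String),
    RemoveFile(String),
}

/// Hypothetical log entry: a delta transaction or a full checkpoint.
#[derive(Debug, Clone)]
pub enum Entry {
    Transaction(Vec<Action>),
    /// Full set of files as of this point in the log.
    Checkpoint(BTreeSet<String>),
}

/// Replay from the newest checkpoint (or from scratch if there is none).
/// Returns the resulting file set and the number of transactions applied.
pub fn replay(log: &[Entry]) -> (BTreeSet<String>, usize) {
    let (mut files, rest) = log
        .iter()
        .enumerate()
        .rev()
        .find_map(|(i, e)| match e {
            Entry::Checkpoint(state) => Some((state.clone(), &log[i + 1..])),
            Entry::Transaction(_) => None,
        })
        .unwrap_or_else(|| (BTreeSet::new(), log));

    let mut applied = 0;
    for entry in rest {
        if let Entry::Transaction(actions) = entry {
            applied += 1;
            for action in actions {
                match action {
                    Action::AddFile(f) => {
                        files.insert(f.clone());
                    }
                    Action::RemoveFile(f) => {
                        files.remove(f);
                    }
                }
            }
        }
    }
    (files, applied)
}
```

Only the transactions after the last checkpoint are read and applied, no matter how long the history before it is.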
Marco Neumann
188cacec54
refactor: use `Arc` to pass `ParquetFileMetaData`
...
This will be handy when the catalog state must be able to return
metadata objects so that we can create checkpoints, esp. when we use
multi-chunk parquet files in the medium-term future.
2021-06-10 15:42:21 +02:00
Marco Neumann
c7412740e4
refactor: prepare to read and write multiple file types for catalog
...
Prepares #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann
33e364ed78
feat: add encoding info to transaction protobuf
...
This should help with #1381.
2021-06-10 15:42:21 +02:00
Marco Neumann
4fe2d7af9c
chore: enforce `clippy::future_not_send` for `parquet_file`
2021-06-09 18:18:27 +02:00
Andrew Lamb
ab0aed0f2e
refactor: Remove a layer of channels in parquet read stream (#1648)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-07 16:47:04 +00:00
Raphael Taylor-Davies
1e7ef193a6
refactor: use field metadata to store influx types (#1642)
...
* refactor: use field metadata to store influx types
make SchemaBuilder non-consuming
* chore: remove unused variants
* chore: fix lints
2021-06-07 13:26:39 +00:00
Marco Neumann
c830542464
feat: add info log when cleanup limit is reached
2021-06-04 11:12:29 +02:00
Marco Neumann
91df8a30e7
feat: limit number of files during storage cleanup
...
Since the number of parquet files is potentially unbounded (i.e. very
very large) and we do not want to hold the transaction lock for too
long and also want to limit memory consumption of the cleanup routine,
let's limit the number of files that we collect for cleanup.
2021-06-03 17:43:11 +02:00
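Here is a sketch of the candidate selection, built on a hypothetical `files_to_delete` helper. It skips non-parquet files and anything the catalog still references, and it stops after `max_files` entries so that memory use and lock hold time stay bounded. In this sketch, files beyond the limit are left for a later cleanup run.

```rust
/// Hypothetical: pick at most `max_files` unreferenced parquet files.
pub fn files_to_delete<'a>(
    all_files: &[&'a str],
    is_referenced: impl Fn(&str) -> bool,
    max_files: usize,
) -> Vec<&'a str> {
    all_files
        .iter()
        .copied()
        // Never touch non-parquet files.
        .filter(|f| f.ends_with(".parquet"))
        // Keep files the catalog still points to.
        .filter(|f| !is_referenced(*f))
        // Bound the work done (and memory used) per cleanup run.
        .take(max_files)
        .collect()
}
```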
Marco Neumann
85139abbbb
fix: use structured logging for cleanup logs
2021-06-03 11:23:29 +02:00
Andrew Lamb
32c6ed1f34
refactor: More cleanup related to multi-table chunks (#1604)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-02 17:00:23 +00:00
Marco Neumann
e5b65e10ac
test: ensure that `find_last_transaction_timestamp` indeed returns the last timestamp
2021-06-02 10:15:06 +02:00
Marco Neumann
98e413d5a9
fix: do not unwrap broken timestamps in serialized catalog
2021-06-02 10:15:06 +02:00
Marco Neumann
fc0a74920f
fix: use clearer error text
2021-06-02 09:41:19 +02:00
Marco Neumann
2a0b2698c6
fix: use structured logging
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-02 09:41:19 +02:00
Marco Neumann
64bf8c5182
docs: add code comment explaining why we parse transaction timestamps
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-02 09:41:19 +02:00
Marco Neumann
77aeb5ca5d
refactor: use protobuf-native Timestamp instead of string
2021-06-02 09:41:19 +02:00
Marco Neumann
9b9400803b
refactor!: bump transaction version to 2
2021-06-02 09:41:19 +02:00
Marco Neumann
5f77b7b92b
feat: add `parquet_file::catalog::find_last_transaction_timestamp`
2021-06-02 09:41:19 +02:00
Marco Neumann
9aee961e2a
test: test loading catalogs from broken protobufs
2021-06-02 09:41:19 +02:00
Marco Neumann
0a625b50e6
feat: store transaction timestamp in preserved catalog
2021-06-02 09:41:19 +02:00
Andrew Lamb
d8fbb7b410
refactor: Remove last vestiges of multi-table chunks from PartitionChunk API (#1588)
...
* refactor: Remove last vestiges of multi-table chunks from PartitionChunk API
* fix: remove test that can no longer fail
* fix: update tests + code review comments
* fix: clippy
* fix: clippy
* fix: restore test_measurement_fields_error test
2021-06-01 16:12:33 +00:00
Andrew Lamb
d3711a5591
refactor: Use ParquetExec from DataFusion to read parquet files (#1580)
...
* refactor: use ParquetExec to read parquet files
* fix: test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-01 14:44:07 +00:00
Andrew Lamb
64328dcf1c
feat: cache schema on catalog chunks too (#1575)
2021-06-01 12:42:46 +00:00
Andrew Lamb
00e735ef0d
chore: remove unused dependencies (#1583)
2021-05-29 10:31:57 +00:00
Raphael Taylor-Davies
db432de137
feat: add distinct count to StatValues (#1568)
2021-05-28 17:41:34 +00:00
kodiakhq[bot]
6098c7cd00
Merge branch 'main' into crepererum/issue1376
2021-05-28 07:13:15 +00:00
Andrew Lamb
f3bec93ef1
feat: Cache TableSummary in Catalog rather than computing it on demand (#1569)
...
* feat: Cache `TableSummary` in catalog Chunks
* refactor: use consistent table summary
2021-05-27 16:03:05 +00:00
Marco Neumann
dd2a976907
feat: add a flag to ignore metadata errors during catalog rebuild
2021-05-27 13:10:14 +02:00
Marco Neumann
bc7389dc38
fix: fix typo
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-27 12:51:01 +02:00
Marco Neumann
48307e4ab2
docs: adjust error description to reflect internal errors
...
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-05-27 12:51:01 +02:00
Marco Neumann
d6f0dc7059
feat: implement catalog rebuilding from files
...
Closes #1376.
2021-05-27 12:51:01 +02:00
Marco Neumann
024323912a
docs: explain what `PreservedCatalog::wipe` offers
2021-05-27 12:48:41 +02:00
Raphael Taylor-Davies
4fcc04e6c9
chore: enable arrow prettyprint feature (#1566)
2021-05-27 10:28:14 +00:00
Marco Neumann
9f451423d5
feat: log files that are deleted
2021-05-26 12:49:44 +02:00
Marco Neumann
24ec1a472e
fix: do NOT delete parquet files that are reachable by time travel
2021-05-26 12:38:54 +02:00
Marco Neumann
5983336366
refactor: rename `parquet_file::{utils => test_utils}`
2021-05-26 11:09:29 +02:00
Marco Neumann
d7e3bc569e
refactor: shorten time we hold the transaction lock during clean-up
2021-05-26 11:04:57 +02:00
Marco Neumann
18f5dd9ae1
test: ensure transaction lock exists during cleanup planning
2021-05-26 11:04:57 +02:00
Marco Neumann
b55eae98da
fix: do not delete non-parquet files during catalog-driven cleanup
2021-05-26 11:04:57 +02:00
Marco Neumann
5ed16ff294
refactor: improve error message in `parquet_file::cleanup`
2021-05-26 11:04:57 +02:00
Marco Neumann
14fdf3b7c7
feat: implement object store cleanup core routine
2021-05-26 11:02:40 +02:00
Marco Neumann
cc78b5317d
feat: add method to get all parquet files from catalog state
2021-05-26 11:02:40 +02:00
Marco Neumann
953114af2e
feat: add method to abort catalog transaction
2021-05-26 11:02:40 +02:00
Marco Neumann
92fcd7e940
feat: add a way to get OS, server ID and DB name from catalog
2021-05-26 11:02:40 +02:00
Marco Neumann
9daa4d00d6
test: re-organize `parquet_file` test utils a bit
2021-05-26 11:02:39 +02:00
Marco Neumann
38183928c8
refactor: extract path generator for data location
2021-05-26 10:59:40 +02:00
Marco Neumann
19a2733d30
feat: preserve transaction metadata in parquets
2021-05-25 09:56:12 +02:00
Marco Neumann
fe8e6301fe
refactor: move `read_schema_from_parquet_metadata` back to `parquet_file::metadata`
...
Let us pool all metadata handling in a single module, which makes it
easier to review.
2021-05-25 09:37:53 +02:00
Marco Neumann
ac83d99f66
feat: add a way to get current revision and UUID from transaction handle
2021-05-25 09:37:53 +02:00
Marco Neumann
fdc553b257
refactor: replace unwrap with expect
2021-05-25 09:37:53 +02:00
Andrew Lamb
c464ffadad
refactor: remove special case timestamp_range in parquet chunk (#1543)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-24 16:19:44 +00:00
Andrew Lamb
14ba25f86d
chore: Update datafusion and use released version of arrow crates (#1546)
...
* chore: Update datafusion and use released version of arrow crate
* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Andrew Lamb
27e5b8fabf
refactor: Remove multiple table support from Parquet Chunk (#1541)
2021-05-24 08:40:31 -04:00
Marco Neumann
8bdddfd475
docs: mention that catalog wiping does not delete parquet files
2021-05-20 10:22:20 +02:00
Marco Neumann
b1a06246d6
feat: implement function to wipe a preserved catalog
2021-05-20 10:22:20 +02:00
Marco Neumann
6c405aa6f9
feat: check if preserved catalog exists when creating an empty one
2021-05-20 10:22:20 +02:00
Marco Neumann
c6a6005f65
feat: add `PreservedCatalog.exists`
2021-05-20 10:22:20 +02:00
Raphael Taylor-Davies
37880ee89a
refactor: store chunk IDs only in catalog (#1521)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-20 04:07:14 +00:00
Marco Neumann
8db26485a4
refactor: empty transaction during catalog creation
...
That involves some refactoring which we are going to need anyway for
hooking up the "read" path of the catalog into the DB startup, namely:
- make `Db::new` require a preserved catalog
- introduce a helper function that can provide that
- as a consequence, all test-creations of a Db are now async
This prepares for #1382.
2021-05-18 17:42:07 +02:00
Marco Neumann
cdf0ada6a6
test: test preserved catalog <-> Db write wiring
2021-05-17 13:57:31 +02:00
Marco Neumann
68729dd5ee
refactor: avoid string allocation
2021-05-17 12:32:34 +02:00
Marco Neumann
adcd8132e7
docs: more comments regarding catalog transaction handling
2021-05-17 12:05:08 +02:00
Marco Neumann
a99d53e771
docs: document `OpenTransaction::handle_action*`
2021-05-17 11:48:51 +02:00
Marco Neumann
4fb800c7a6
refactor: make PreservedCatalog easier to integrate
2021-05-17 11:33:22 +02:00
Marco Neumann
f4d7154746
fix: table summaries must include timestamp as well
2021-05-17 11:33:22 +02:00
Marco Neumann
7cced3242f
feat: add a way to parse infos from parquet paths
2021-05-17 11:33:22 +02:00
Marco Neumann
5969caccb0
feat: return parquet metadata from `write_to_object_store`
2021-05-17 11:33:22 +02:00
Raphael Taylor-Davies
f9178dbb5f
feat: push metrics into catalog (#1488)
...
* feat: push metrics into catalog
* chore: minor cleanup
* fix: include db labels in chunk metric domains
* chore: fmt
* fix: don't allow dropping moving chunks
* chore: further tweaks
* chore: review feedback
* feat: use new_unregistered() for metric instruments instead of default
* chore: use &[KeyValue] instead of &Vec<KeyValue>
* refactor: make GaugeValue non default constructible
2021-05-14 17:37:39 +00:00
Nga Tran
9583636748
feat: we now can read parquet files from all kinds of object stores
2021-05-12 18:05:34 -04:00
Marco Neumann
795f5bfcb7
refactor: make `StatValues::{min,max}` optional + handle NaNs
...
This will allow us to:
- handle all-NULL columns correctly
- be in-line with Parquet (where min/max are optional)
- handle NaNs at least somewhat sanely (they no longer "poison" the
stats)
2021-05-10 17:12:25 +02:00
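Here is a minimal sketch of optional min/max for a float column, using a hypothetical `StatValues` with an `update` method (the real type is not shown in this log). NULLs and NaNs are counted but never become the min or max. An all-NULL column therefore ends up with `None` for both, which matches parquet's optional statistics.

```rust
/// Hypothetical, simplified float statistics with optional min/max.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct StatValues {
    /// `None` while no non-NULL, non-NaN value has been seen.
    pub min: Option<f64>,
    pub max: Option<f64>,
    /// Hypothetical: total rows seen, including NULLs and NaNs.
    pub count: u64,
}

impl StatValues {
    /// Add one value; `None` is a NULL. NaNs are skipped for min/max, so a
    /// single NaN does not "poison" the statistics.
    pub fn update(&mut self, value: Option<f64>) {
        self.count += 1;
        let v = match value {
            Some(v) if !v.is_nan() => v,
            _ => return,
        };
        self.min = Some(self.min.map_or(v, |m| m.min(v)));
        self.max = Some(self.max.map_or(v, |m| m.max(v)));
    }
}
```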
Nga Tran
c6b933eb63
chore: merge main to branch
2021-05-07 18:40:17 -04:00
Nga Tran
f2c19ec080
refactor: further address Carol's comment
2021-05-07 17:40:40 -04:00
Nga Tran
971500681f
refactor: address Andrew's and Carol's comment
2021-05-07 17:33:19 -04:00
Carol (Nichols || Goulding)
e2cc4634bf
fix: Use PathBuf rather than debug formatting and back to String
...
This is the same fix I made in 54c5f98, just found a few more spots :)
2021-05-07 15:58:11 -04:00
Nga Tran
31d49db0ed
chore: a little more cleanup
2021-05-07 09:38:41 -04:00
Nga Tran
ba015ee4df
refactor: clean up and add comments
2021-05-07 09:31:41 -04:00
Marco Neumann
1a998d4116
feat: preserve parquet metadata in catalog
...
Closes #1380.
2021-05-07 09:51:44 +02:00
Marco Neumann
c3d523fc4f
refactor: add col prefixes to make_chunk & Co
2021-05-07 09:51:44 +02:00
Marco Neumann
5db504300d
refactor: use parsed paths instead of raw strings for catalog paths
2021-05-07 09:51:44 +02:00
Nga Tran
55bf848bd2
feat: Now we can query directly from files in object store
2021-05-06 18:02:17 -04:00
Andrew Lamb
884baf7329
feat: add column_type and influxdb_column_type, remove row_count from system.columns (#1415)
...
* feat: add column_type and influxdb_column_type, remove row_count from system.columns
* fix: update tests
* fix: more test update
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fmt
* fix: copy/paste type conversion to avoid cross dependency between data_types and internal_types
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-05-06 12:59:30 +00:00
Andrew Lamb
86771ea629
chore: update arrow/datafusion deps (#1433)
...
* chore: update datafusion deps
* chore: update arrow deps
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-05 22:37:31 +00:00
Nga Tran
a5c92fae8a
chore: merge main to branch
2021-05-05 13:48:42 -04:00
Nga Tran
3bdb451529
chore: merge main to branch
2021-05-05 13:18:39 -04:00
Raphael Taylor-Davies
411cf134e9
refactor: explode arrow_deps (#1425)
...
* refactor: explode arrow_deps
* chore: workaround doctest bug
2021-05-05 16:59:12 +00:00
Nga Tran
2b46f51e5b
chore: address Dom's comment
2021-05-05 12:55:41 -04:00
Nga Tran
a1f3413c89
refactor: move private test helpers to utils module to be used by many modules
2021-05-05 11:41:46 -04:00
Nga Tran
fcb37a0b1d
feat: more testing scenarios for querying parquet files
2021-05-05 10:57:02 -04:00
Marco Neumann
1f42eb89cd
feat: implement parquet metadata handling
...
Closes #1379 and contributes to #1380.
2021-05-05 13:29:16 +02:00
Marco Neumann
056c29aaa2
feat: add a way to retrieve timestamp range from parquet chunk
2021-05-05 13:29:16 +02:00
Marco Neumann
c54109113e
feat: add a way to retrieve storage path from parquet chunks
2021-05-05 13:29:16 +02:00
Marco Neumann
136c35cb88
feat: implement transaction handling for catalog
...
Closes #1253.
2021-05-03 10:04:35 +02:00
Nga Tran
34a3388a49
feat: unload chunks from read buffer but keep them in object store
2021-04-30 16:12:02 -04:00
Nga Tran
e87973babe
refactor: address review comments
2021-04-29 13:15:43 -04:00
Nga Tran
402d9c748c
chore: cargo fmt
2021-04-28 16:52:52 -04:00
Nga Tran
2a2760bd18
feat: complete tests where data in both RUB and OS
2021-04-28 16:14:07 -04:00
Nga Tran
140d96dbea
feat: tests for loading data to object store and making sure we still query the read buffer
2021-04-28 15:59:17 -04:00
Marco Neumann
eddc9319ff
docs: deny broken intradoc links
2021-04-27 13:22:28 +02:00
Carol (Nichols || Goulding)
272cdb85ce
fix: Use the ServerId type everywhere, for writing, querying, anything
2021-04-26 18:44:32 +00:00
Carol (Nichols || Goulding)
b8face3335
refactor: Organize use statements
2021-04-26 18:44:32 +00:00
Jake Goulding
67f5ad841d
refactor: Introduce ServerId and CurrentServerId types
2021-04-26 18:44:32 +00:00
Nga Tran
657bfa1b20
refactor: address Andrew's comments
2021-04-16 17:44:46 -04:00
Nga Tran
b3e110a241
refactor: address Jake's comment
2021-04-16 17:27:40 -04:00
Nga Tran
4c23ca8888
feat: full implementation of parquet's read_filter for review
2021-04-16 16:03:24 -04:00
Andrew Lamb
e226b5a820
feat: Use TimestampNanosecondArray for timestamps in IOx (#1230)
...
* refactor: Create Arrow arrays using iterators
* feat: use Timestamp64(TimeUnit::Nanosecond) for timestamps
* feat: add support for timestamp array
* fix: update more tests
* fix: remove unnecessary code
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-16 15:55:33 +00:00
Nga Tran
231ebb54d4
chore: fix a format
2021-04-14 16:32:25 -04:00
Nga Tran
4e2d59d9a5
feat: implement a few more functions as part of supporting queries from parquet files
2021-04-14 16:06:47 -04:00
Nga Tran
05bf28ce85
feat: Add 2 main functions, table_schema and table_names, for Parquet Chunk to lay a foundation for querying it
2021-04-13 18:23:55 -04:00
Nga Tran
4a6d6bd7ad
feat: initial work for querying data from parquet file in object store
2021-04-13 13:57:46 -04:00
Raphael Taylor-Davies
1997324344
feat: mutable buffer snapshotting (#1179)
...
* feat: mutable buffer snapshotting
* chore: review feedback
2021-04-13 12:14:54 +00:00
Nga Tran
453aeaf1a0
feat: Add tests for writing RB chunks to Object Store
2021-04-09 17:39:23 -04:00
Nga Tran
f501a74aea
refactor: Address review comments
2021-04-07 21:28:03 -04:00
Nga Tran
be6e1e48e4
feat: add writer_id and object_store in Db
2021-04-07 18:36:07 -04:00
Raphael Taylor-Davies
c2355aca6d
feat: add basic memory tracking (#1125)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-07 15:38:24 +00:00
Nga Tran
6e01fbc382
feat: use TableSummary as metadata for parquet chunk's tables and read buffer's read_filter to get data
2021-04-05 15:37:34 -04:00
Nga Tran
4bdf8963e6
feat: continue building foundation for writing RB chunks to parquet files
2021-04-02 16:06:25 -04:00
Nga Tran
49267114d3
chore: merge main into branch and resolve conflicts
2021-04-01 13:22:49 -04:00
Nga Tran
1463c6645f
feat: Add ChunkState::ObjectStore and rename ParquetChunk to Chunk
2021-04-01 11:53:03 -04:00
Nga Tran
19a453a483
feat: finally have some framework with clear todos for writing a chunk into parquet files
2021-03-31 16:21:53 -04:00
Nga Tran
cd409b471f
feat: continue the implementation
2021-03-30 21:31:51 -04:00
Nga Tran
0bcd52d5c9
feat: Add more changes
2021-03-30 18:31:09 -04:00