Commit Graph

190 Commits (42d4ad61e1aaad2f903cb1e75b451d724d84a5b7)

Author SHA1 Message Date
Carol (Nichols || Goulding) d0707725cf Merge remote-tracking branch 'origin/main' into pd-mutable-buffer-data-eviction 2021-02-22 10:21:59 -05:00
Edd Robinson 92eb8b9e85 refactor: make certain Database method sync
A couple of methods don't seem to have any await points in their
implementations, so it feels like they could just be `sync`.
2021-02-19 17:14:17 +00:00
Andrew Lamb 9b91e0624c
feat: implement field_columns plan (#819)
* feat: implement field_columns plan

* fix: fix doc tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-02-17 20:43:24 +00:00
Andrew Lamb 94a93e56ff
feat: implement `tag_keys` in gRPC planner and across mutable buffer (#795)
* feat: move tag_column_names into rpc planner

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: compile error

* refactor: remove PassThrough error type

* fix: Avoid extra layers of errors in mutable buffer chunk

* fix: use HashMap::get rather than values() and find

* fix: push filtering down to chunk in gRPC planner

* fix: fixup trait bounds to be non-silly

* fix: remove incorrect comment

* fix: remove cruft

* fix: clippy + fmt

* fix: correct comment

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-02-15 16:47:52 +00:00
Edd Robinson 8a85158a98 refactor: add arc clone lint 2021-02-15 12:40:19 +00:00
Paul Dix dc465e5d02 feat: Add function to check db size and drop partitions
Adds functionality to the server Db to check the mutable buffer size and drop partitions based on the database rules.
2021-02-13 17:19:40 -06:00
Paul Dix 83bfa6d949 feat: Add created_at, last_write_at tracking to partition and sorting
This commit adds created_at and last_write_at instants to partitions in the mutable buffer. It adds a method on the mutable buffer database to get back the partitions in sorted order based on either the created_at or last_write_at instants. Ordering based on the summary stats from a column is still left to do.

Finally, it modifies the helper function to create replicated write to take a Partitioner trait that can generate partition keys based on lines, rather than taking the DatabaseRules struct directly. This makes it easier to write test cases where data is split into multiple partitions in the mutable buffer.
2021-02-13 17:19:40 -06:00
Marko Mikulicic 9e39e91139 chore: Cleaning things in prep for rust 2021
Also remove a NUL byte in a test string literal; some editors drop them.
2021-02-12 16:48:17 +00:00
Andrew Lamb a316b16960
feat: Change table_names to return either Some(set) or None, rather than a plan (try 2) (#776)
* feat: Change table_names to return either Some(set) or None, rather than a plan

* docs: improve comments

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: merge conflict

* fix: don't clone a string unless needed

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-02-09 12:20:59 -05:00
Paul Dix e5da2ab589
feat: add ability to roll up summaries from multiple chunks (#763) 2021-02-08 18:11:21 -05:00
Paul Dix 47bc28460e
refactor: rename partition, table, and column in parition_meta for clarity (#757)
* refactor: rename partition, table, and column in parition_meta for clarity
2021-02-05 08:00:22 -05:00
Paul Dix de7bc7d645
feat: add column name to the partition metadata summaries (#755) 2021-02-05 07:20:16 -05:00
Andrew Lamb 3ec483b769
refactor: Reduce async in mutable buffer, use std::sync (#749)
* refactor: Reduce async in mutable buffer, use std::sync

* fix: logical conflict with new code
2021-02-05 06:47:40 -05:00
Carol (Nichols || Goulding) fbf776c6b3
chore: Clean up Cargo.tomls (#754)
* fix: test_helpers crate should only be a dev-dep

* fix: object_store no longer has a build script, so no longer needs a build dep

* chore: Alphabetize all Cargo.tomls
2021-02-04 18:56:02 -05:00
Paul Dix 5c3661dd91 chore: refactor how columns are kept in the table in the mutable buffer
This one is a bit of a yak shave in advance of adding column names to the summary statistics. I needed the column and its name (or identifier) to be together, rather than the id to index map that existed before. I think the table_id and column_id stuff should be refactored out over time since they add a ton of complexity to the code and don't add much value. Having those as Strings would be much easier and probably be a drop in the bucket for memory usage. Basically, I don't think they need to be interned. But that would be an even more massive refactor touching so many things in the MutableBuffer, so I leave it as a later exercise.

Hopefully this makes the code simpler and cleaner in the interim and it gives me the column_id with the column so that I can easily look up the name when generating the summary statistics for a chunk.
2021-02-04 16:31:55 -05:00
Paul Dix 1f8043a3f8 feat: add approximate memory size tracking to mutable buffer
This updates the mutable buffer, partitions, chunks, dictionary, tables, and individual columns to be able to return their approximate memory size used. This doesn't aim to be exact. There are spots where I'm not counting table or column pointers or the partition key. My expectation is that the data size will dominate and a few pointers here and there won't matter.
2021-02-04 13:50:43 -05:00
Andrew Lamb d66eae1a44
feat: Implement TableProvider trait for `Db` (#730)
* feat: Implement TableProvider for Db

Gets us selection pushdown in plans, sets us up for predicate pushdown

Includes: SendableRecordBatchStreams for mutable buffer and read buffer results

fixup snapshots

* docs: comments
2021-02-03 14:18:47 -05:00
Andrew Lamb abc26a33c1
chore: Update dependencies (again) (#718)
* chore: Update dependencies (again)

* refactor: update for changes in DataFusion API

* fix: fmt

* fix: clippy
2021-02-02 18:33:01 -05:00
Andrew Lamb 288861e646
feat: implement table_schema in partition chunk, mutable buffer, read buffer (#705)
fix: sort output schema by name

fix: Update data_types/src/schema.rs

Co-authored-by: Edd Robinson <me@edd.io>

refactor: Update read_buffer/src/lib.rs

Co-authored-by: Edd Robinson <me@edd.io>

Co-authored-by: Edd Robinson <me@edd.io>
2021-02-01 13:54:58 -05:00
Andrew Lamb f3bd8bd0e3
chore: update deps (tokio 1.0 and ecosystem) (#707)
* chore: Update arrow + tokio deps

* chore: Use bleeding edge azure

* chore: Update aws + other deps

* fix: fmt

* fix: Switch to in-house version of routerify

* fix: Upgrade to hyper 0.14

The hyper::error module is now private; hyper::Error is the public
re-export

* fix: Upgrade cloud storage to get tokio upgrade

* fix: Upgrade open_telemetry

* fix: Do not call `panic::set_hook` during another panic

Doing so leads to a double panic which aborts the process.

* fix: new h2 error who dis

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
2021-01-29 16:11:55 -05:00
Andrew Lamb 2282a68e65
refactor: Move selection to the data_types crate and remove redundant implementation (#704) 2021-01-29 13:35:07 -05:00
Andrew Lamb efb1e0f8ae
feat: Add selection interface to mutable buffer and query interface (#700)
* feat: Add selection interface to mutable buffer and query interface

* docs: Update mutable_buffer/src/table.rs

* refactor: rename for consistency

* refactor: use map and filter_map rather than fold
2021-01-27 14:31:10 -05:00
Andrew Lamb 504ca67532
test: revamp rpc query testing so it works in multiple chunk scenarios (#696)
* test: revamp testing so it works in multiple scenarios, fix bug found by same

* fix: Update docs in server/src/db.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: use tsp rather than different functions

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-01-25 16:34:19 -05:00
Andrew Lamb c3b0371c84
feat: Initial RPC Query Frontend (#692)
* feat: Initial RPC Query Frontend

* docs: s/immutable buffer/mutable buffer

* docs: Correct type in docstring
2021-01-25 08:33:39 -05:00
Andrew Lamb 75b0a62fa5
refactor: Remove dead code (#686) 2021-01-21 19:20:39 -05:00
Andrew Lamb 747b96d801
chore: Upgrade arrow dependencies, reduce duplication with upstream (#676) 2021-01-21 08:58:11 -05:00
Andrew Lamb 7969808f09
feat: Chunk Migration APIs and query data in the read buffer via SQL (#668)
* feat: Chunk Migration APIs and query data in the read buffer via SQL

* fix: Make code more consistent

* fix: fmt / clippy

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: Remove unnecessary Result and make chunks() infallible

* chore: Apply more suggestions from code review

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Edd Robinson <me@edd.io>
2021-01-19 13:28:26 -05:00
Andrew Lamb 71627120b9
refactor: consolidate line protocol schema creation into data_types and port code to use it (#663)
* refactor: consolidate line protocol schema creation into data_types, and port code to use it

refactor: Port mutable buffer to use SchemaBuilder

* fix: doctest

* refactor: remove unnecessary clippyisms

* docs: Improve comments via suggestions from code review

Co-authored-by: Edd Robinson <me@edd.io>

* refactor: use more idiomatic try_ naming and TryInto trait

* docs: Change from line protocol data model to InfluxDB data model

* refactor: rename LP --> Influx in code

* feat: add support for UInteger type

Co-authored-by: Edd Robinson <me@edd.io>
2021-01-15 17:29:30 -05:00
Hu Ming 99605b27d7
chore: rename (#660) 2021-01-14 12:49:03 -05:00
Andrew Lamb a5240af080
docs: Document desired crate dependencies in comments (#638)
* docs: Document the desire for read buffer and mutable buffer to be independent of query layer

* docs: Document desire for the query layer to not depend on storage systems

* fix: Apply suggestions from code review

Co-authored-by: Edd Robinson <me@edd.io>

Co-authored-by: Edd Robinson <me@edd.io>
2021-01-12 17:49:03 -05:00
Andrew Lamb 6376891da3
feat: implement query planning in terms of chunks (#647) 2021-01-12 16:04:45 -05:00
Andrew Lamb 2938c8f8fc
feat: implement chunk listing and snapshotting in mutable buffer (#641)
* feat: implement chunk listing and snapshotting in mutable buffer

* fix: update to use latest version of string interner and remove custom clone

* docs: fix comment
2021-01-12 12:46:18 -05:00
Andrew Lamb fd28d8a01b
refactor: Use u32 for Chunk ids consistently (#639) 2021-01-11 16:07:22 -05:00
Andrew Lamb a4be6f74c7
refactor: Remove partition key from the Chunk trait (#622) 2021-01-08 06:11:07 -05:00
Andrew Lamb c672bb341d
feat: Extract SQL planning out of databases (#618) 2021-01-07 13:13:30 -05:00
Andrew Lamb 654b520005
feat: Interface for writing and querying mutable buffer, read buffer and parquet (#615)
* refactor: Create database with mutable buffer, read buffer and parquet files

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: rename planners to clarify what they are

* refactor: simplify traits

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-01-06 17:25:46 -05:00
Andrew Lamb 0de73ec341
refactor: consistently name open/closed chunks in mutable_buffer (#612)
* refactor: consistently name open/closed chunks in mutable_buffer

* docs: update some comments

* docs: more comment tweaks

* fix: fix test
2021-01-04 11:37:42 -05:00
Andrew Lamb 08d52ea043
feat: implement partition chunk rollover + ids and timestamps (#601)
* feat: implement partition chunk rollover + ids and timestamps

* feat: add last_write_timestamp

* refactor: Use DateTime<Utc> rather than Instant

* refactor: avoid use of structure to generate ids
2020-12-29 11:00:18 -05:00
Andrew Lamb 5fa77c32cc
feat: Add "Chunks" to the Mutable Buffer (#596)
* refactor: Update docs, remove unused field

* refactor: rename partition -> chunk

* feat: Introduce new partition, which is a holder for Chunks

* refactor: Remove use of wal from mutable database

* refactor: cleanups, remove last direct use of chunks

* fix: delete old benchmarks

* fix: clippy sacrifice

* docs: tidy up comments

* refactor: remove unused error types

* chore: remove commented out tests
2020-12-28 07:10:25 -05:00
Andrew Lamb 48c43b136c
refactor: rename write_buffer --> mutable_buffer (#595)
* refactor: git mv write_buffer mutable_buffer

* refactor: update crate name references

* refactor: update some more references
2020-12-22 10:49:53 -05:00