Fixes #904
The line protocol parser was lacking the unsigned integer type, which
suffixes values with `u`. This adds unsigned integer support to the line
protocol parser, and fills a few corresponding gaps in the mutable
buffer.
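
A minimal sketch of suffix-based numeric parsing, for illustration only. The `FieldValue` enum and `parse_numeric_field` function are hypothetical names, not the parser's actual API; the sketch assumes `i` marks a signed integer, `u` an unsigned integer, and no suffix a float.

```rust
/// Hypothetical field value type; the real parser's types may differ.
#[derive(Debug, PartialEq)]
enum FieldValue {
    I64(i64),
    U64(u64),
    F64(f64),
}

/// Parse a numeric field value by its suffix: `i` for signed integers,
/// `u` for unsigned integers, and no suffix for floats.
fn parse_numeric_field(raw: &str) -> Result<FieldValue, String> {
    if let Some(digits) = raw.strip_suffix('i') {
        digits
            .parse::<i64>()
            .map(FieldValue::I64)
            .map_err(|e| format!("invalid integer {:?}: {}", raw, e))
    } else if let Some(digits) = raw.strip_suffix('u') {
        // Negative input such as `-1u` is rejected by the u64 parser.
        digits
            .parse::<u64>()
            .map(FieldValue::U64)
            .map_err(|e| format!("invalid unsigned integer {:?}: {}", raw, e))
    } else {
        raw.parse::<f64>()
            .map(FieldValue::F64)
            .map_err(|e| format!("invalid float {:?}: {}", raw, e))
    }
}
```

With this sketch, `42u` parses as an unsigned value while `-1u` is reported as an error rather than silently wrapping.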
* refactor: Create `ChunkPredicate` predicate in the Server crate
* refactor: move chunk predicate out of chunk
* fix: remove comment in server/src/db/pred.rs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: make read_group and read_window_aggregate work across chunks
* refactor: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: Update query/src/frontend/influxrpc.rs
Improve logic and use strings directly
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fmt
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit adds created_at and last_write_at instants to partitions in the mutable buffer. It adds a method on the mutable buffer database to get back the partitions in sorted order based on either the created_at or last_write_at instants. Ordering based on the summary stats from a column is still left to do.
Finally, it modifies the helper function that creates a replicated write to take a Partitioner trait that can generate partition keys based on lines, rather than taking the DatabaseRules struct directly. This makes it easier to write test cases where data is split into multiple partitions in the mutable buffer.
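
A rough sketch of the two ideas above, under stated assumptions. `Partition`, `PartitionSort`, `sorted_partitions`, and `MeasurementPartitioner` are hypothetical stand-ins, and the `Partitioner` method signature shown here is assumed for illustration; the real types in the mutable buffer may look different.

```rust
use std::time::Instant;

/// Hypothetical stand-in for a mutable buffer partition.
#[derive(Debug)]
struct Partition {
    key: String,
    created_at: Instant,
    last_write_at: Instant,
}

/// Which instant to order partitions by.
#[derive(Debug, Clone, Copy)]
enum PartitionSort {
    CreatedAt,
    LastWriteAt,
}

/// Return references to the partitions, oldest first, by the chosen instant.
fn sorted_partitions(partitions: &[Partition], sort: PartitionSort) -> Vec<&Partition> {
    let mut sorted: Vec<&Partition> = partitions.iter().collect();
    match sort {
        PartitionSort::CreatedAt => sorted.sort_by_key(|p| p.created_at),
        PartitionSort::LastWriteAt => sorted.sort_by_key(|p| p.last_write_at),
    }
    sorted
}

/// Assumed shape of a trait that derives a partition key from a line.
trait Partitioner {
    fn partition_key(&self, line: &str) -> String;
}

/// Test-only partitioner: use the measurement name as the partition key.
/// Simplification: escaped commas and spaces in the measurement are ignored.
struct MeasurementPartitioner;

impl Partitioner for MeasurementPartitioner {
    fn partition_key(&self, line: &str) -> String {
        line.split(|c: char| c == ',' || c == ' ')
            .next()
            .unwrap_or("")
            .to_string()
    }
}
```

A test can pass a small partitioner like this instead of building a full DatabaseRules value, which keeps the multi-partition setup short.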
* feat: Change table_names to return either Some(set) or None, rather than a plan
* docs: improve comments
* docs: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: merge conflict
* fix: don't clone a string unless needed
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: test_helpers crate should only be a dev-dep
* fix: object_store no longer has a build script, so no longer needs a build dep
* chore: Alphabetize all Cargo.tomls
This one is a bit of a yak shave in advance of adding column names to the summary statistics. I needed the column and its name (or identifier) to be together, rather than the id to index map that existed before. I think the table_id and column_id stuff should be refactored out over time since they add a ton of complexity to the code and don't add much value. Having those as Strings would be much easier and would probably be a drop in the bucket for memory usage. Basically, I don't think they need to be interned. But that would be an even more massive refactor touching many things in the MutableBuffer, so I leave it as a later exercise.
Hopefully this makes the code simpler and cleaner in the interim and it gives me the column_id with the column so that I can easily look up the name when generating the summary statistics for a chunk.
This updates the mutable buffer, partitions, chunks, dictionary, tables, and individual columns to be able to return their approximate memory size used. This doesn't aim to be exact. There are spots where I'm not counting table or column pointers or the partition key. My expectation is that the data size will dominate and a few pointers here and there won't matter.
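
A minimal sketch of how approximate sizes might roll up from columns to chunks. The `Column`, `Table`, and `Chunk` structs and their `size` methods are hypothetical simplifications, not the mutable buffer's real types; as in the description above, container pointers and keys are deliberately not counted.

```rust
use std::mem;

/// Hypothetical column holding optional 64-bit values.
struct Column {
    values: Vec<Option<i64>>,
}

impl Column {
    /// Approximate size in bytes: the struct itself plus its stored values.
    fn size(&self) -> usize {
        mem::size_of::<Self>() + self.values.len() * mem::size_of::<Option<i64>>()
    }
}

/// Hypothetical table: a collection of columns.
struct Table {
    columns: Vec<Column>,
}

impl Table {
    /// Sum of the column sizes; the table's own pointers are not counted.
    fn size(&self) -> usize {
        self.columns.iter().map(Column::size).sum()
    }
}

/// Hypothetical chunk: a string dictionary plus tables.
struct Chunk {
    dictionary: Vec<String>,
    tables: Vec<Table>,
}

impl Chunk {
    /// Dictionary string bytes plus the size of every table.
    fn size(&self) -> usize {
        let dictionary: usize = self.dictionary.iter().map(|s| s.len()).sum();
        let tables: usize = self.tables.iter().map(Table::size).sum();
        dictionary + tables
    }
}
```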
* feat: Implement TableProvider for Db
Gets us selection pushdown in plans, sets us up for predicate pushdown
Includes: SendableRecordBatchStreams for mutable buffer and read buffer results
fixup snapshots
* docs: comments
* chore: Update arrow + tokio deps
* chore: Use bleeding edge azure
* chore: Update aws + other deps
* fix: fmt
* fix: Switch to in-house version of routerify
* fix: Upgrade to hyper 0.14
The hyper::error module is now private; hyper::Error is the public
re-export
* fix: Upgrade cloud storage to get tokio upgrade
* fix: Upgrade open_telemetry
* fix: Do not call `panic::set_hook` during another panic
Doing so leads to a double panic which aborts the process.
* fix: new h2 error who dis
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
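
One way to avoid touching the panic hook during a panic, sketched under assumptions: the `QuietPanics` guard is a hypothetical example, not the code changed here. It relies on the documented behavior that `std::panic::set_hook` and `std::panic::take_hook` panic when called from a panicking thread, and checks `std::thread::panicking()` before restoring the hook.

```rust
use std::panic;
use std::thread;

/// Hypothetical guard that silences panic output while it is alive.
struct QuietPanics;

impl QuietPanics {
    /// Install a hook that prints nothing when a panic occurs.
    fn install() -> Self {
        panic::set_hook(Box::new(|_| {
            // Intentionally empty: suppress the default panic message.
        }));
        QuietPanics
    }
}

impl Drop for QuietPanics {
    fn drop(&mut self) {
        // If this drop runs during unwinding, modifying the hook would panic
        // again, and a second panic while unwinding aborts the process.
        if !thread::panicking() {
            // Unregister the quiet hook; the default hook is back in effect.
            let _ = panic::take_hook();
        }
    }
}
```

The trade-off in this sketch is that the quiet hook stays installed if the guard is dropped during a panic, which seems preferable to aborting.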
* feat: Add selection interface to mutable buffer and query interface
* docs: Update mutable_buffer/src/table.rs
* refactor: rename for consistency
* refactor: use map and filter_map rather than fold
* test: revamp testing so it works in multiple scenarios, fix bug found by same
* fix: Update docs in server/src/db.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: use tsp rather than different functions
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* feat: Chunk Migration APIs and query data in the read buffer via SQL
* fix: Make code more consistent
* fix: fmt / clippy
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: Remove unnecessary Result and make chunks() infallible
* chore: Apply more suggestions from code review
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: consolidate line protocol schema creation into data_types, and port code to use it
refactor: Port mutable buffer to use SchemaBuilder
* fix: doctest
* refactor: remove unnecessary clippyisms
* docs: Improve comments via suggestions from code review
Co-authored-by: Edd Robinson <me@edd.io>
* refactor: use more idiomatic try_ naming and TryInto trait
* docs: Change from line protocol data model to InfluxDB data model
* refactor: rename LP --> Influx in code
* feat: add support for UInteger type
Co-authored-by: Edd Robinson <me@edd.io>
* docs: Document the desire for read buffer and mutable buffer to be independent of query layer
* docs: Document desire for the query layer to not depend on storage systems
* fix: Apply suggestions from code review
Co-authored-by: Edd Robinson <me@edd.io>