Commit Graph

1095 Commits (8e9dd227534f57e607c7d6b89fad055bde6e02fe)

Author SHA1 Message Date
Edd Robinson d7073efa72 feat: add support for query pred -> read buffer pred 2021-01-14 13:46:17 +00:00
Paul Dix e53a65b49a chore: fix failing tests after rebase 2021-01-13 18:15:35 -05:00
Paul Dix 9377dc1943 chore: address pr feedback 2021-01-13 18:15:35 -05:00
Paul Dix 39cc833ec7 chore: change wal buffer to use sync mutex instead of tokio 2021-01-13 18:15:35 -05:00
Paul Dix 26c4003e43 feat: add wal segment persistence to object store
This hooks the WAL buffer up to the server. When segments go over their max size, they will be persisted if that configuration was set. A follow-on commit will add persistence based on time (how long the segment has been open).
2021-01-13 18:15:35 -05:00
Paul Dix 3b1f045f44 feat: add serialization to wal buffer segments
Adds serialization with compression and checksum for WAL buffer segments.

This required a weird structure where the flatbuffer bytes of ReplicatedWrite were kept as a raw payload. I did this because otherwise each of the replicated writes would have been rebuilt in the segment.

The other thing that isn't ideal is that deserializing a segment actually marshals it into a Rust struct as opposed to keeping the entire thing as raw flatbuffers. We could update this later to have a concept of an open segment (a regular Rust struct) and closed segments that are just the flatbuffers.
2021-01-13 18:15:34 -05:00
Paul Dix 29444ad06b feat: add function to get segment object store location 2021-01-13 18:15:34 -05:00
Carol (Nichols || Goulding) 6b67498533
Merge branch 'main' into cn/better-osp-api 2021-01-13 15:05:08 -05:00
Andrew Lamb 6376891da3
feat: implement query planning in terms of chunks (#647) 2021-01-12 16:04:45 -05:00
Dom 1a1a14308d feat: API set writer ID endpoint
Adds a new endpoint /iox/api/v1/id that accepts a JSON object in the form:

    {"id":42}

And calls into the server's set_id method to assign the writer ID to the server.
2021-01-12 15:16:59 +00:00
Dom e60906c5d5 refactor: remove async from ID methods
Both set / get ID methods are not async, so this removes the annotation &
awaits.
2021-01-12 15:01:51 +00:00
Dom e38df076b4
fix: do not overwrite databases (#644)
* fix: do not overwrite databases

Do not overwrite an existing database when attempting to create a DB with an
existing name.

This should probably update the existing database config, but for the time being
this prevents the existing database from being silently dropped.

Fixes #643

* test: ensure duplicate DB names return an error

Covers #643 with a test case.
2021-01-12 08:55:29 -05:00
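The duplicate-name check described in commit e38df076b4 above can be sketched roughly as follows. This is a minimal illustration, not the IOx implementation: `Databases`, `DatabaseRules`, `CreateError` and `create` are hypothetical names, and the real server stores considerably more state per database.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical stand-in for a database's configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct DatabaseRules {
    pub name: String,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum CreateError {
    DatabaseAlreadyExists(String),
}

/// Hypothetical registry of databases keyed by name.
#[derive(Default)]
pub struct Databases {
    by_name: BTreeMap<String, DatabaseRules>,
}

impl Databases {
    /// Creates a database, refusing to replace one that already exists.
    pub fn create(&mut self, name: &str) -> Result<(), CreateError> {
        match self.by_name.entry(name.to_string()) {
            // An existing entry is left untouched and reported as an error
            // instead of being silently dropped and replaced.
            Entry::Occupied(_) => Err(CreateError::DatabaseAlreadyExists(name.to_string())),
            Entry::Vacant(slot) => {
                slot.insert(DatabaseRules { name: name.to_string() });
                Ok(())
            }
        }
    }

    /// Number of databases currently registered.
    pub fn len(&self) -> usize {
        self.by_name.len()
    }
}
```

Using the map's entry API keeps the existence check and the insert in a single lookup, so there is no window in which a second insert could overwrite the first.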
Carol (Nichols || Goulding) 06f1358e2d feat: Change ObjectStorePath API to be more explicit
Now you have to designate whether you're adding a directory or a file
name, with some assumptions based on paths coming from a cloud object
storage or the file system.

A notable difference: checking to see if "apple/b" is a prefix of
"apple/bear/cow.json" will now say no; only whole directories are
matched.
2021-01-11 16:57:37 -05:00
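The whole-directory prefix behaviour noted in commit 06f1358e2d above can be illustrated with a small sketch. It assumes paths are plain `/`-delimited strings; `dir_prefix_matches` is a hypothetical helper and not part of the `ObjectStorePath` API.

```rust
/// Returns true only when every directory in `prefix` matches the
/// corresponding leading directory of `path` exactly.
///
/// Hypothetical helper: the real ObjectStorePath type models directories
/// and file names explicitly rather than splitting strings.
pub fn dir_prefix_matches(prefix: &str, path: &str) -> bool {
    // Split both sides into components, ignoring empty ones such as those
    // produced by a trailing '/'.
    let prefix_parts: Vec<&str> = prefix.split('/').filter(|p| !p.is_empty()).collect();
    let path_parts: Vec<&str> = path.split('/').filter(|p| !p.is_empty()).collect();

    // Compare whole components, not characters.
    path_parts.starts_with(&prefix_parts)
}
```

With this definition, `dir_prefix_matches("apple/b", "apple/bear/cow.json")` is false while `dir_prefix_matches("apple/bear", "apple/bear/cow.json")` is true, whereas a plain string `starts_with` would accept both.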
Andrew Lamb fd28d8a01b
refactor: Use u32 for Chunk ids consistently (#639) 2021-01-11 16:07:22 -05:00
Carol (Nichols || Goulding) 2be992b2f6
Merge branch 'main' into cn/fewer-servers 2021-01-08 12:41:48 -05:00
Andrew Lamb 6d0c538eca
refactor: pull DBChunk into its own module (#623)
* refactor: pull DBChunk into its own module

* refactor: consolidate impl blocks
2021-01-08 12:27:21 -05:00
Carol (Nichols || Goulding) cd03f39280 refactor: Move code in the server::server module to the root
This gets rid of the redundant server::server imports.
2021-01-08 12:19:58 -05:00
Andrew Lamb a4be6f74c7
refactor: Remove partition key from the Chunk trait (#622) 2021-01-08 06:11:07 -05:00
Andrew Lamb 8219403fab
feat: Instantiate ReadBuffer as part of server creation (#620)
* feat: Instantiate ReadBuffer as part of server creation

* refactor: remove Store from read_buffer
2021-01-07 13:25:42 -05:00
Andrew Lamb c672bb341d
feat: Extract SQL planning out of databases (#618) 2021-01-07 13:13:30 -05:00
Paul Dix d17ef800c5
feat: add create and get database to API (#619)
* feat: add create and get database to API

This commit is the start of the IOx-specific API. It puts everything under /iox/api/v1 as this is the beginning of the IOx API. Creating a database is done with a PUT, and a GET request can retrieve the DatabaseRules details.

* feat: add defaults for DatabaseRules for create_database

* feat: add create and get database to API

This commit is the start of the IOx-specific API. It puts everything under /iox/api/v1 as this is the beginning of the IOx API. Creating a database is done with a PUT, and a GET request can retrieve the DatabaseRules details.
2021-01-07 12:25:37 -05:00
Carol (Nichols || Goulding) 18ee1b561b feat: Use ObjectStorePath everywhere to feel out the API needed 2021-01-07 10:48:22 -05:00
Paul Dix d1ab5c0ee9 chore: refactor object_store crate
This pulls the different backing implementations into their own modules. They're about to get more complex so it felt like it was time to separate them out rather than building towards a single multi-thousand line lib.rs. The error type is only defined in lib and imported by the individual modules, which I think makes it easier to work with.
2021-01-07 09:19:07 -05:00
Andrew Lamb 654b520005
feat: Interface for writing and querying mutable buffer, read buffer and parquet (#615)
* refactor: Create database with mutable buffer, read buffer and parquet files

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: rename planners to clarify what they are

* refactor: simplify traits

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-01-06 17:25:46 -05:00
Andrew Lamb 08d52ea043
feat: implement partition chunk rollover + ids and timestamps (#601)
* feat: implement partition chunk rollover + ids and timestamps

* feat: add last_write_timestamp

* refactor: Use DateTime<Utc> rather than Instant

* refactor: avoid use of structure to generate ids
2020-12-29 11:00:18 -05:00
Andrew Lamb 0d0ec0ce69
chore: Upgrade arrow dependencies (#603)
* chore: Update arrow dependencies to latest

* refactor: Update code to conform to new arrow api
2020-12-28 16:08:09 -05:00
Andrew Lamb 5fa77c32cc
feat: Add "Chunks" to the Mutable Buffer (#596)
* refactor: Update docs, remove unused field

* refactor: rename partition -> chunk

* feat: Introduce new partition, which is a holder for Chunks

* refactor: Remove use of wal from mutable database

* refactor: cleanups, remove last direct use of chunks

* fix: delete old benchmarks

* fix: clippy sacrifice

* docs: tidy up comments

* refactor: remove unused error types

* chore: remove commented out tests
2020-12-28 07:10:25 -05:00
Paul Dix 1d200c5c77 chore: move http API over to Routerify
This moves the HTTP API over to Routerify, which has the basic route parsing logic that will enable the API design for IOx.

I had a little trouble with the error handling in Routerify so I ended up creating a macro for constructing error responses in the HTTP API. I'm not sure what I think of this pattern so I'm interested in what others think. Another option would be to have two functions for each API endpoint: one, x_handler, with a Routerify function signature, and another, x, with the Result<Response<Body>, ApplicationError> return type, which would make the ? operator work in those functions. That would eliminate the need for the return_err macro.

I'm happy to refactor to that if people prefer it.
2020-12-24 16:45:20 -06:00
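The two-function alternative floated in commit 1d200c5c77 above might look something like the sketch below. It deliberately avoids Routerify and hyper: `Response`, `ApplicationError`, `set_id` and `set_id_handler` are simplified, hypothetical stand-ins, and a real handler would take a request and be async.

```rust
/// Simplified stand-in for an HTTP response (not hyper's Response<Body>).
#[derive(Debug, PartialEq)]
pub struct Response {
    pub status: u16,
    pub body: String,
}

/// Hypothetical application error for this sketch.
#[derive(Debug, PartialEq)]
pub enum ApplicationError {
    MissingParam(&'static str),
    InvalidId(String),
}

impl ApplicationError {
    /// Converts an error into an error response in one place.
    fn into_response(self) -> Response {
        let body = match self {
            ApplicationError::MissingParam(p) => format!("missing parameter: {}", p),
            ApplicationError::InvalidId(v) => format!("invalid id: {}", v),
        };
        Response { status: 400, body }
    }
}

/// The inner function ("x"): returns a Result, so `?` works here.
pub fn set_id(raw_id: Option<&str>) -> Result<Response, ApplicationError> {
    let raw = raw_id.ok_or(ApplicationError::MissingParam("id"))?;
    let id: u32 = raw
        .parse()
        .map_err(|_| ApplicationError::InvalidId(raw.to_string()))?;
    Ok(Response { status: 200, body: format!("id set to {}", id) })
}

/// The outer function ("x_handler"): has the signature the router expects
/// and maps any error from `set_id` into a response.
pub fn set_id_handler(raw_id: Option<&str>) -> Response {
    set_id(raw_id).unwrap_or_else(ApplicationError::into_response)
}
```

The trade-off is one extra function per endpoint in exchange for ordinary `?`-based error propagation and no macro.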
Andrew Lamb 48c43b136c
refactor: rename write_buffer --> mutable_buffer (#595)
* refactor: git mv write_buffer mutable_buffer

* refactor: update crate name references

* refactor: update some more references
2020-12-22 10:49:53 -05:00
Dom 558cf6fe1a perf: avoid locking to load server ID
Moves the server ID value outside of the mutex-wrapped `Config` type and
into the `Server` struct, synchronising threads using an atomic instead.

This prevents any calls to `require_id()` from having to contend for a
read lock on the config mutex while maintaining linearisation of ID
values - all threads see the change as soon as it is set.

On x86 this synchronisation is actually completely "free" - memory
accesses are totally ordered, so this turns into a compiler fence to
prevent LLVM changing program order at compile time with no runtime
overhead at all.

Previously the ID value was written as part of the config, but this is no
longer the case. It should be easy to add back in, but it appears to be
redundant as the ID is required to construct the path to the config when
loading in the first place.
2020-12-18 11:12:41 +00:00
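The approach in commit 558cf6fe1a above, keeping the server ID in an atomic rather than behind the config mutex, can be sketched as follows. The field layout, the use of 0 as an "unset" sentinel, the `Option` return type and the Acquire/Release orderings are assumptions made for illustration; only the names `Server` and `require_id()` come from the commit message.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Assumed sentinel meaning "no ID has been set yet".
const ID_NOT_SET: u32 = 0;

/// Minimal sketch of a server that stores its ID in an atomic.
pub struct Server {
    id: AtomicU32,
}

impl Server {
    pub fn new() -> Self {
        Server { id: AtomicU32::new(ID_NOT_SET) }
    }

    /// Publishes the ID. The Release store pairs with the Acquire load in
    /// `require_id`, so a thread that observes the new ID also observes
    /// writes made before it was set.
    pub fn set_id(&self, id: u32) {
        self.id.store(id, Ordering::Release);
    }

    /// Reads the ID without taking any lock.
    pub fn require_id(&self) -> Option<u32> {
        match self.id.load(Ordering::Acquire) {
            ID_NOT_SET => None,
            id => Some(id),
        }
    }
}
```

Readers calling `require_id()` never contend on a mutex; they perform a single atomic load.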
Paul Dix 70ff4267ce
chore: Connect server crate to http routes and server command (#569)
* chore: Connect server crate to http routes and server command

This updates the http_routes and main server to use the Server crate. This is the first step in a larger effort to start hooking up the initial IOx API and get things running end to end with in-memory database, WAL buffer, and object storage.

For the time being, this disables the previous disk based WAL. Or rather, it uses the WriteBufferDb without it. That means that this IOx server has no persistence until later. Because of this, the restart in the end-to-end test was removed.

Later PRs will add the WAL buffer and restart logic that loads from object store. We can opt to bring the local disk based WAL back later, but it will likely require some refactoring to work with how the WAL Buffer will operate.
2020-12-17 18:48:41 -05:00
Dom 6f473984d0 style: wrap comments
Runs rustfmt with the new config.
2020-12-11 18:22:26 +00:00
Paul Dix fa3ecbd4ed
feat: Implement write buffer to Parquet snapshotting (#526)
* feat: Implement write buffer to Parquet snapshotting

This introduces snapshot to the server packages to manage snapshotting. It also introduces a new trait for representing a Partition. There is a very crude API wired up in http_routes for testing purposes. Follow-on work will bring the server package into http_routes and rework the snapshot API.
2020-12-08 14:20:43 -05:00
Dom 87573256a7 chore: fmt 2020-12-03 16:10:16 +00:00
Dom 13f391e2b9 refactor: ignore destructured fields
I temporarily forgot I can do this.

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-12-03 16:10:16 +00:00
Dom 234df612ec refactor: avoid clones for errors
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2020-12-03 16:10:16 +00:00
Dom b03de0e7ef refactor: remove needless lifetimes 2020-12-03 16:10:15 +00:00
Dom f90a95fd80 fix: unambiguous bucket/org to DB mappings
Previously $ORG and $BUCKET were joined as:

	$ORG + "_" + $BUCKET

Which is fine unless either $ORG or $BUCKET includes a "_", such as:

	$ORG = "org_a"
	$BUCKET = "bucket"

	and

	$ORG = "org"
	$BUCKET = "a_bucket"

This change continues to join $ORG and $BUCKET with an underscore, but
disallows underscores in either $ORG or $BUCKET. It appears these values
are non-zero u64s in the gRPC protocol converted to their base-10 string
representations for the DB name, so this seems safe to enforce.

In addition, this change introduces a `DatabaseName` type to avoid
passing bare strings around, and allow consuming code to ensure only
valid database names are provided at compile time. This type works with
both owned & borrowed content so doesn't force a string copy where we
can avoid it, and derefs to `str` to make it easier to use with existing
code.

I've been minimally invasive in pushing the `DatabaseName` through the
existing code and figured I'd see what the sentiment is first.
Candidates for conversion from `str` to `DatabaseName` that seem to make
sense to me include:

	- `DatabaseStore` trait
	- `RemoteServer` trait
	- Others? Basically anywhere other than the "edge" API inputs

Fixes #436 (thanks @zeebo)
2020-12-03 16:10:15 +00:00
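The mapping and the `DatabaseName` type described in commit f90a95fd80 above can be sketched as below. This is an illustration under assumptions, not the IOx code: the `Cow`-based representation, the `NameError` enum, the non-empty check and the function name `org_and_bucket_to_database` are all hypothetical choices made here to show owned and borrowed content plus a deref to `str`.

```rust
use std::borrow::Cow;
use std::ops::Deref;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum NameError {
    Empty,
    UnderscoreInOrgOrBucket,
}

/// A database name; holds either borrowed or owned string content.
#[derive(Debug, Clone, PartialEq)]
pub struct DatabaseName<'a>(Cow<'a, str>);

impl<'a> DatabaseName<'a> {
    /// Accepts `&str` (no copy) or `String` (owned).
    /// Only non-emptiness is checked here, as an assumed minimal rule.
    pub fn new<T: Into<Cow<'a, str>>>(name: T) -> Result<Self, NameError> {
        let name = name.into();
        if name.is_empty() {
            return Err(NameError::Empty);
        }
        Ok(DatabaseName(name))
    }
}

/// Deref to `str` so existing code taking `&str` keeps working.
impl Deref for DatabaseName<'_> {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

/// Joins org and bucket with an underscore, rejecting underscores in
/// either part so that two different pairs can never collide.
pub fn org_and_bucket_to_database(
    org: &str,
    bucket: &str,
) -> Result<DatabaseName<'static>, NameError> {
    if org.contains('_') || bucket.contains('_') {
        return Err(NameError::UnderscoreInOrgOrBucket);
    }
    DatabaseName::new(format!("{}_{}", org, bucket))
}
```

Under the old rule, both pairs from the commit message ("org_a", "bucket") and ("org", "a_bucket") would have produced "org_a_bucket"; in this sketch both are rejected.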
Andrew Lamb ecc4eee8e1
refactor: Move SQL functions into its own trait (#511)
* refactor: remove unneeded function table_to_arrow from Trait

* refactor: Move SQL functions into its own trait
2020-12-02 08:23:37 -05:00
Andrew Lamb 5ef499bb63
refactor: rename Database --> TSDatabase to better reflect its purpose (#510)
* refactor: rename Database --> TSDatabase to better reflect its purpose

* refactor: rename field_columns to field_column_names

* fix: clippy?
2020-12-01 12:37:11 -05:00
Dom 37fea83ca9 test: add newline test case 2020-11-30 11:57:48 +00:00
Dom f08846ca50 chore: validate database names
Ensure database names contain only {alphanumeric, -, _} characters.

Fixes #278.
2020-11-30 11:27:33 +00:00
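The character rule from commit f08846ca50 above can be expressed as a short check. This is a sketch: the function name is hypothetical, and treating "alphanumeric" as ASCII-only and rejecting the empty string are assumptions not stated in the commit message.

```rust
/// Returns true if `name` is non-empty and contains only
/// ASCII alphanumeric characters, '-' or '_'.
///
/// Hypothetical helper; ASCII-only and non-empty are assumed rules.
pub fn is_valid_database_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```

A name containing a newline, such as the case covered by the follow-up test commit 37fea83ca9, fails this check because a newline is neither alphanumeric nor one of the two permitted punctuation characters.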
Matt Freitas-Stavola 7e2df1fc59
chore(server): add logs for dropped WAL segments (#478)
* chore(server): add logs for dropped WAL segments

Added logging for dropped writes and old segments in rollover scenarios

Also including a dep on tracing and dev-dep on test_helpers

Refs: #466

* chore(server): Add more context to logs

Minor cleanup around remove_oldest_segment usage

Suggestions from @alamb's review
2020-11-24 16:37:09 -05:00
Andrew Lamb cdb26e60e4
refactor: rename `storage` crate to `query` to better reflect what it is (#475)
* refactor: rename storage --> query

* refactor: update a few more references
2020-11-24 14:19:29 -05:00
Paul Dix 59e2024d01 chore: rename cluster crate to server 2020-11-20 12:14:12 -05:00