So that they can be deserialized, without parsing, to create a new
iox object store from the location listed in the server config.
Notably, the locations serialized don't start with the object storage's
prefix like "s3:" or "file:". The location refers to the same object storage as
the server configuration that was just read from object storage. Having
the server config on one type of object storage and the database files
on another type is not supported.
The implementation of list_with_delimiter for the in-memory object
storage assumed that any BTreeMap key that sorted after the prefix
given to list_with_delimiter, and for which prefix_matches returned
true, would also have parts after the prefix.
This didn't account for paths that start with the prefix but don't
have the delimiter immediately after it. For example,
prefix = 1/database_name
would match both of these in-memory paths:
1/database_name/0/rules.pb
1/database_name_and_another_thing/0/rules.pb
The first path *would* return some parts_after_prefix, but the
second would not, so the previously existing code would panic on the
second path, which is added to the list_with_delimiter test case.
* refactor: remove display methods, use fmt::Display instead.
Signed-off-by: Ning Sun <sunng@protonmail.com>
* refactor: update a few calls from .display to .to_string()
* fix: consistently use `Path` rather than occasionally `DirsAndFileName`
* fix: fixup for merge conflicts
* fix: update test
* fix: Catch another case or two
* fix: fmt
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
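For reference, the pattern this refactor moves to is implementing `std::fmt::Display` instead of a bespoke `display` method, which provides `to_string()` and `format!` support for free. The `Path` struct below (directories plus an optional file name, joined with `/`) is a hypothetical stand-in, not the crate's real type.

```rust
use std::fmt;

/// Hypothetical path type: directory parts plus an optional file name.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Path {
    dirs: Vec<String>,
    file_name: Option<String>,
}

impl Path {
    pub fn new(dirs: &[&str], file_name: Option<&str>) -> Self {
        Self {
            dirs: dirs.iter().map(|d| d.to_string()).collect(),
            file_name: file_name.map(|f| f.to_string()),
        }
    }
}

impl fmt::Display for Path {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Join the directories with `/`, then append the file name if present.
        let mut parts: Vec<&str> = self.dirs.iter().map(String::as_str).collect();
        if let Some(name) = &self.file_name {
            parts.push(name.as_str());
        }
        write!(f, "{}", parts.join("/"))
    }
}
```

Call sites that used `.display()` can then call `.to_string()` or use the path directly in `format!("{}", path)`.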
Addresses the API aspect of #818
Adds a utility module that helps compute the length of a stream while buffering it
for later replay (in memory or by spilling it to a temporary file).
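A synchronous, std-only sketch of the idea, assuming the stream is an iterator of `io::Result<Vec<u8>>` chunks and a caller-chosen in-memory threshold. The names `buffer_stream`, `BufferedStream`, `Buffered`, `TempFile`, and the temp-file naming scheme are all hypothetical and do not reproduce the module's actual API.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical temp file that is deleted (best effort) when dropped.
pub struct TempFile {
    file: File,
    path: PathBuf,
}

static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

impl TempFile {
    fn create() -> io::Result<Self> {
        // Hypothetical naming scheme: process id plus a per-process counter.
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let name = format!("stream-buffer-{}-{}", std::process::id(), id);
        let path = std::env::temp_dir().join(name);
        let file = OpenOptions::new().read(true).write(true).create_new(true).open(&path)?;
        Ok(TempFile { file, path })
    }
}

impl Drop for TempFile {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}

/// Where the buffered bytes ended up.
pub enum Buffered {
    Memory(Vec<u8>),
    File(TempFile),
}

/// Total stream length plus the buffered contents.
pub struct BufferedStream {
    pub len: usize,
    pub data: Buffered,
}

/// Drains `chunks`, counting bytes and keeping them in memory until more than
/// `max_in_memory` bytes would be held, then spilling everything to a temp file.
pub fn buffer_stream<I>(chunks: I, max_in_memory: usize) -> io::Result<BufferedStream>
where
    I: IntoIterator<Item = io::Result<Vec<u8>>>,
{
    let mut len = 0;
    let mut memory = Vec::new();
    let mut spilled: Option<TempFile> = None;

    for chunk in chunks {
        let chunk = chunk?;
        len += chunk.len();
        if spilled.is_none() && memory.len() + chunk.len() > max_in_memory {
            // Move what we have so far to disk; if a write fails,
            // dropping `tmp` removes the file again.
            let mut tmp = TempFile::create()?;
            tmp.file.write_all(&memory)?;
            memory = Vec::new();
            spilled = Some(tmp);
        }
        match spilled.as_mut() {
            Some(tmp) => tmp.file.write_all(&chunk)?,
            None => memory.extend_from_slice(&chunk),
        }
    }

    let data = match spilled {
        Some(tmp) => Buffered::File(tmp),
        None => Buffered::Memory(memory),
    };
    Ok(BufferedStream { len, data })
}

impl BufferedStream {
    /// Replays the buffered bytes from the start; may be called repeatedly.
    pub fn replay(&mut self) -> io::Result<Vec<u8>> {
        match &mut self.data {
            Buffered::Memory(bytes) => Ok(bytes.clone()),
            Buffered::File(tmp) => {
                tmp.file.seek(SeekFrom::Start(0))?;
                let mut out = Vec::with_capacity(self.len);
                tmp.file.read_to_end(&mut out)?;
                Ok(out)
            }
        }
    }
}
```

Counting while buffering means the total length is known before the bytes are replayed, which helps when a consumer needs the length up front.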
This is the promised cleanup. This structure gets rid of a lot of
intermediate structures and encodes through associated types how the
object stores and path types are related.
The enums are still necessary to avoid having generics leak all over
the place, but the object store variants and path variants should always
match because they'll always come from the object store trait
implementations that use the associated types.
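A compact sketch of that shape, with two toy stores standing in for real backends. Everything here (`ObjectStoreApi`, `PartsStore`, `StringStore`, and the `Path`/`ObjectStore` enums) is hypothetical: each store names its path type through an associated `type Path`, and the enums dispatch to the matching variant, treating a mismatch as unreachable because paths are only created by the store they belong to.

```rust
use std::collections::HashMap;

/// Hypothetical trait: each store declares the path type it understands.
trait ObjectStoreApi {
    type Path;
    fn new_path(&self, parts: &[&str]) -> Self::Path;
    fn put(&mut self, location: &Self::Path, bytes: Vec<u8>);
    fn get(&self, location: &Self::Path) -> Option<Vec<u8>>;
}

/// Toy store keyed by a list of parts.
#[derive(Default)]
struct PartsStore(HashMap<Vec<String>, Vec<u8>>);

impl ObjectStoreApi for PartsStore {
    type Path = Vec<String>;
    fn new_path(&self, parts: &[&str]) -> Self::Path {
        parts.iter().map(|p| p.to_string()).collect()
    }
    fn put(&mut self, location: &Self::Path, bytes: Vec<u8>) {
        self.0.insert(location.clone(), bytes);
    }
    fn get(&self, location: &Self::Path) -> Option<Vec<u8>> {
        self.0.get(location).cloned()
    }
}

/// Toy store keyed by a `/`-joined string.
#[derive(Default)]
struct StringStore(HashMap<String, Vec<u8>>);

impl ObjectStoreApi for StringStore {
    type Path = String;
    fn new_path(&self, parts: &[&str]) -> Self::Path {
        parts.join("/")
    }
    fn put(&mut self, location: &Self::Path, bytes: Vec<u8>) {
        self.0.insert(location.clone(), bytes);
    }
    fn get(&self, location: &Self::Path) -> Option<Vec<u8>> {
        self.0.get(location).cloned()
    }
}

/// Enums keep generics from leaking; variants mirror each other one-to-one.
enum Path {
    Parts(<PartsStore as ObjectStoreApi>::Path),
    Str(<StringStore as ObjectStoreApi>::Path),
}

enum ObjectStore {
    Parts(PartsStore),
    Str(StringStore),
}

impl ObjectStore {
    fn new_path(&self, parts: &[&str]) -> Path {
        match self {
            ObjectStore::Parts(s) => Path::Parts(s.new_path(parts)),
            ObjectStore::Str(s) => Path::Str(s.new_path(parts)),
        }
    }
    fn put(&mut self, location: &Path, bytes: Vec<u8>) {
        match (self, location) {
            (ObjectStore::Parts(s), Path::Parts(p)) => s.put(p, bytes),
            (ObjectStore::Str(s), Path::Str(p)) => s.put(p, bytes),
            _ => unreachable!("path variant does not match object store variant"),
        }
    }
    fn get(&self, location: &Path) -> Option<Vec<u8>> {
        match (self, location) {
            (ObjectStore::Parts(s), Path::Parts(p)) => s.get(p),
            (ObjectStore::Str(s), Path::Str(p)) => s.get(p),
            _ => unreachable!("path variant does not match object store variant"),
        }
    }
}
```

Because `new_path` is the only way to obtain a `Path` here, the `unreachable!` arms encode the invariant described above rather than a runtime check callers must handle.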
Now you have to designate whether you're adding a directory or a file
name, with some assumptions based on whether paths come from cloud object
storage or the file system.
A notable difference: checking to see if "apple/b" is a prefix of
"apple/bear/cow.json" will now say no; only whole directories are
matched.
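A sketch of whole-directory prefix matching, assuming a hypothetical `PathParts` type with separate `push_dir` and `set_file_name` methods (the crate's real path type and API may differ). Comparing whole directory parts, rather than raw string prefixes, is what makes `apple/b` fail to match `apple/bear/cow.json`.

```rust
/// Hypothetical path: directories and an optional file name, kept separate.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct PathParts {
    directories: Vec<String>,
    file_name: Option<String>,
}

impl PathParts {
    fn push_dir(&mut self, dir: &str) {
        self.directories.push(dir.to_string());
    }

    fn set_file_name(&mut self, name: &str) {
        self.file_name = Some(name.to_string());
    }

    /// True only if every directory of `self` equals the corresponding whole
    /// directory of `other`; a file name on `self` must match exactly too.
    fn prefix_matches(&self, other: &Self) -> bool {
        let n = self.directories.len();
        if n > other.directories.len() || self.directories[..] != other.directories[..n] {
            return false;
        }
        match &self.file_name {
            None => true,
            Some(name) => n == other.directories.len() && other.file_name.as_ref() == Some(name),
        }
    }
}
```

Here `apple` + `b` is not a prefix of `apple` + `bear` + `cow.json`, while `apple` + `bear` is.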
Now that the code is separated into modules, we don't need the modules
inside the test modules. So before this commit, the test names looked
like this:
```
test aws::tests::amazon_s3::s3_test_put_nonexistent_bucket ... ok
test gcp::test::google_cloud_storage::gcs_test ... ok
test disk::tests::file::length_mismatch_is_an_error ... ok
test memory::tests::in_memory::length_mismatch_is_an_error ... ok
```
and after this commit, the test names look like this:
```
test aws::tests::s3_test_put_nonexistent_bucket ... ok
test gcp::test::gcs_test ... ok
test disk::tests::length_mismatch_is_an_error ... ok
test memory::tests::length_mismatch_is_an_error ... ok
```
This adds the list_with_delimiter function to the in-memory object store. It also updates the function signature to require a prefix, since callers always want to list either the objects in a directory or its common prefixes.
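A minimal sketch of a delimiter listing with a required prefix, assuming string keys in a `BTreeMap`, `/` as the delimiter, and a hypothetical `ListResult` with `objects` and `common_prefixes` fields. The prefix is treated as a directory (a trailing `/` is added), so only keys inside it are visited, and anything more than one level down collapses into a common prefix.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical result: objects directly under the prefix, plus the
/// next-level "subdirectories" (common prefixes).
#[derive(Debug, Default, PartialEq, Eq)]
struct ListResult {
    objects: Vec<String>,
    common_prefixes: Vec<String>,
}

/// Sketch only: the prefix is required (not an `Option`); `""` lists the root.
fn list_with_delimiter(store: &BTreeMap<String, Vec<u8>>, prefix: &str) -> ListResult {
    // Normalize "a/b" to "a/b/" so only whole directories match.
    let dir = if prefix.is_empty() || prefix.ends_with('/') {
        prefix.to_string()
    } else {
        format!("{}/", prefix)
    };

    let mut objects = Vec::new();
    let mut common_prefixes = BTreeSet::new();

    // Keys under `dir` are contiguous in sorted order, starting at `dir`.
    for key in store.range(dir.clone()..).map(|(k, _)| k) {
        let rest = match key.strip_prefix(dir.as_str()) {
            Some(rest) => rest,
            None => break, // past the last key under `dir`
        };
        match rest.split_once('/') {
            // No further delimiter: an object directly in this directory.
            None => objects.push(key.clone()),
            // Deeper key: record only the next directory level.
            Some((next_dir, _)) => {
                common_prefixes.insert(format!("{}{}/", dir, next_dir));
            }
        }
    }

    ListResult {
        objects,
        common_prefixes: common_prefixes.into_iter().collect(),
    }
}
```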
This pulls the different backing implementations into their own modules. They're about to get more complex, so it felt like time to separate them out rather than build toward a single multi-thousand-line lib.rs. The error type is defined only in lib.rs and imported by the individual modules, which I think makes it easier to work with.
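A single-file sketch of that layout, with an inline `mod memory` standing in for a separate `memory.rs`. The `Error` enum, its `NotFound` variant, and `InMemory` are hypothetical; the point is that the backend imports the shared error type from the crate root (`use super::Error` here, `use crate::Error` in a real crate) instead of defining its own.

```rust
use std::fmt;

/// Shared error type, defined once at the crate root (hypothetical variant).
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    NotFound { location: String },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NotFound { location } => write!(f, "object not found at {}", location),
        }
    }
}

impl std::error::Error for Error {}

/// Stand-in for `memory.rs`: the backend reuses the root error type.
pub mod memory {
    use super::Error;
    use std::collections::BTreeMap;

    #[derive(Default)]
    pub struct InMemory {
        storage: BTreeMap<String, Vec<u8>>,
    }

    impl InMemory {
        pub fn put(&mut self, location: &str, bytes: Vec<u8>) {
            self.storage.insert(location.to_string(), bytes);
        }

        pub fn get(&self, location: &str) -> Result<Vec<u8>, Error> {
            self.storage
                .get(location)
                .cloned()
                .ok_or_else(|| Error::NotFound { location: location.to_string() })
        }
    }
}
```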