Rusoto's ChainProvider swallows the error message produced by the underlying InstanceMetadataProvider
(see d59d716f09/rusoto/credential/src/lib.rs (L397)), making it hard for us to know why it's not working in our staging cluster.
so that, if not present, the AWS client library can use built-in auth providers,
such as the InstanceMetadataProvider, which is commonly used to obtain the credentials
granted to the AWS VM via cloud-native mechanisms.
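For illustration, a minimal sketch of exercising that fallback chain (assuming rusoto_credential ~0.4x and a tokio runtime); it also shows where the underlying provider's error message gets lost:

```rust
use rusoto_credential::{ChainProvider, ProvideAwsCredentials};

#[tokio::main]
async fn main() {
    // ChainProvider tries environment variables, the credentials file,
    // and finally the instance metadata endpoint.
    let provider = ChainProvider::new();
    match provider.credentials().await {
        Ok(creds) => println!("resolved key id: {}", creds.aws_access_key_id()),
        // When every provider fails, the chain reports its own generic
        // error; the InstanceMetadataProvider's message is dropped, which
        // is the problem described above.
        Err(e) => eprintln!("credential lookup failed: {}", e),
    }
}
```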
Empty directory names are silently ignored and can lead to very surprising effects
such as directory layouts missing a level. This makes it hard to reason about directory structures.
A sane object store path API should either disallow empty names or deal with them gracefully.
Since we already have to escape file/directory names using the lowest-common-denominator character
set valid across known cloud providers, it feels quite natural to treat the empty dir/file name problem as an encoding problem.
Addresses the API aspect of #818
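For illustration, a minimal sketch of what such an encoding could look like (the `%00` escape and the helper names are assumptions, not the crate's actual scheme):

```rust
/// Encode one path segment so it survives a round trip; an empty name
/// becomes an explicit escape instead of silently collapsing a level.
fn encode_part(part: &str) -> String {
    if part.is_empty() {
        "%00".to_string() // hypothetical escape for the empty name
    } else {
        // Escape the escape character and the delimiter itself.
        part.replace('%', "%25").replace('/', "%2F")
    }
}

fn encode_path(parts: &[&str]) -> String {
    parts
        .iter()
        .map(|p| encode_part(p))
        .collect::<Vec<_>>()
        .join("/")
}

fn main() {
    // Without encoding, ["a", "", "b"] would flatten to "a/b" and the
    // directory layout would silently lose a level.
    assert_eq!(encode_path(&["a", "", "b"]), "a/%00/b");
}
```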
Adds a utility module that helps compute the length of a stream while buffering it
for later replay (in memory, or spilled to a temporary file).
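A minimal sketch of the idea, leaning on `tempfile::SpooledTempFile` for the spill behavior (the module's actual names and types are not shown here, so these are assumptions):

```rust
use std::io::{Seek, SeekFrom, Write};
use tempfile::SpooledTempFile;

/// Count the bytes of a stream while buffering it for replay; the data
/// stays in memory until the threshold, then spills to a temp file.
fn buffer_and_measure(
    chunks: impl Iterator<Item = Vec<u8>>,
) -> std::io::Result<(u64, SpooledTempFile)> {
    let mut spool = SpooledTempFile::new(1024 * 1024); // 1 MiB in-memory cap
    let mut len = 0u64;
    for chunk in chunks {
        len += chunk.len() as u64;
        spool.write_all(&chunk)?;
    }
    // Rewind so the caller can replay the buffered bytes from the start.
    spool.seek(SeekFrom::Start(0))?;
    Ok((len, spool))
}
```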
* fix: test_helpers crate should only be a dev-dep
* fix: object_store no longer has a build script, so no longer needs a build dep
* chore: Alphabetize all Cargo.tomls
I didn't want the object store lib Error to have to know about walkdir,
but I feel better about it now that this error type is scoped to the
disk module. The walkdir errors might have a bit more information.
This test was invalid because there are cases in which we rely on the
assumption that all file names in object storage end with
`.json`, `.parquet`, or `.segment`.
This is the promised cleanup. This structure gets rid of a lot of
intermediate structures and encodes, through associated types, how the
object stores and path types are related.
The enums are still necessary to avoid having generics leak all over
the place, but the object store variants and path variants should always
match because they'll always come from the object store trait
implementations that use the associated types.
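A minimal sketch of the pattern, with trait and type names modeled on the description rather than the crate's exact API:

```rust
/// Each store declares which path type it works with; passing any other
/// path type is a compile-time error rather than a runtime surprise.
trait ObjectStoreApi {
    type Path;

    /// Produce an empty path of the type this store understands.
    fn new_path(&self) -> Self::Path;
}

struct AmazonS3;
struct CloudPath;

impl ObjectStoreApi for AmazonS3 {
    type Path = CloudPath;

    fn new_path(&self) -> CloudPath {
        CloudPath
    }
}

/// Generic code stays free of concrete path types.
fn put<S: ObjectStoreApi>(store: &S, location: &S::Path) {
    let _ = (store, location);
}

fn main() {
    let store = AmazonS3;
    let path = store.new_path();
    put(&store, &path); // a file-system path here would not compile
}
```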
This is the start of using the type system to enforce that only
CloudPaths will be used with S3, GCS, and Azure.
Still some mess in here, cleanup coming.
* chore: Update arrow + tokio deps
* chore: Use bleeding edge azure
* chore: Update aws + other deps
* fix: fmt
* fix: Switch to in-house version of routerify
* fix: Upgrade to hyper 0.14
The hyper::error module is now private; hyper::Error is the public
re-export
* fix: Upgrade cloud storage to get tokio upgrade
* fix: Upgrade open_telemetry
* fix: Do not call `panic::set_hook` during another panic
Doing so leads to a double panic, which aborts the process (see the sketch after this list).
* fix: Handle the new h2 error
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
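A minimal sketch of the double-panic failure mode described in the `set_hook` fix above; the `panicking()` guard is one way to avoid it, shown for illustration rather than as the exact change:

```rust
use std::panic;

fn main() {
    // Install the hook exactly once, before anything can panic:
    // `panic::set_hook` itself panics when called from a panicking
    // thread, so installing it during unwinding becomes a double panic,
    // and a double panic aborts the process.
    panic::set_hook(Box::new(|info| {
        eprintln!("panic observed: {}", info);
    }));

    // Code that might run while a panic is in flight can guard itself:
    if !std::thread::panicking() {
        panic::set_hook(Box::new(|info| eprintln!("replaced hook: {}", info)));
    }
}
```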
* refactor: Remove import of unimplemented macro that's in the prelude
* refactor: Remove allowing of dead code that isn't dead anymore
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Now you have to designate whether you're adding a directory or a file
name, with some assumptions based on whether paths come from cloud object
storage or the file system.
A notable difference: checking to see if "apple/b" is a prefix of
"apple/bear/cow.json" will now say no; only whole directories are
matched.
Closes #528
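A self-contained sketch of the whole-directory matching rule (the helper is hypothetical; the real path types differ):

```rust
/// Treat a prefix as a sequence of whole directory names.
fn is_dir_prefix(prefix: &str, path: &str) -> bool {
    let prefix_parts: Vec<&str> = prefix.split('/').collect();
    let path_parts: Vec<&str> = path.split('/').collect();
    prefix_parts.len() <= path_parts.len()
        && prefix_parts.iter().zip(&path_parts).all(|(a, b)| a == b)
}

fn main() {
    // Partial directory names no longer match...
    assert!(!is_dir_prefix("apple/b", "apple/bear/cow.json"));
    // ...but whole ones still do.
    assert!(is_dir_prefix("apple/bear", "apple/bear/cow.json"));
}
```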
This patch adds support for Microsoft Azure Blob storage. The
implementation requires an account, a key, and a container name. These can
be configured via the environment variables `AZURE_STORAGE_ACCOUNT`,
`AZURE_STORAGE_MASTER_KEY`, and `AZURE_STORAGE_CONTAINER`.
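A minimal sketch of reading that configuration; the constructor that consumes these values is not described here, so it is elided:

```rust
use std::env;

fn main() {
    let account = env::var("AZURE_STORAGE_ACCOUNT").expect("AZURE_STORAGE_ACCOUNT must be set");
    let key = env::var("AZURE_STORAGE_MASTER_KEY").expect("AZURE_STORAGE_MASTER_KEY must be set");
    let container = env::var("AZURE_STORAGE_CONTAINER").expect("AZURE_STORAGE_CONTAINER must be set");

    // Hand these to the Azure client here (elided).
    println!("connecting to container {} on account {}", container, account);
    let _ = key; // never log secrets
}
```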
I had leftover objects in my Google storage from when I was manually testing
how paths are handled; without this extra failure text it was hard to see
that they were the problem.
Now that the code is separated into modules, we don't need the modules
inside the test modules. So before this commit, the test names looked
like this:
```
test aws::tests::amazon_s3::s3_test_put_nonexistent_bucket ... ok
test gcp::test::google_cloud_storage::gcs_test ... ok
test disk::tests::file::length_mismatch_is_an_error ... ok
test memory::tests::in_memory::length_mismatch_is_an_error ... ok
```
and after this commit, the test names look like this:
```
test aws::tests::s3_test_put_nonexistent_bucket ... ok
test gcp::test::gcs_test ... ok
test disk::tests::length_mismatch_is_an_error ... ok
test memory::tests::length_mismatch_is_an_error ... ok
```
This adds the list_with_delimiter function to the in-memory object store. It also updates the function signature to require a prefix, since callers will only ever want to list either the objects in a directory or the common prefixes.
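A self-contained sketch of the delimiter logic over an in-memory key map (names are illustrative, not the crate's actual API):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// List keys directly under `prefix`; deeper keys collapse into their
/// common prefix up to the next delimiter, mimicking a directory listing.
fn list_with_delimiter(
    keys: &BTreeMap<String, Vec<u8>>,
    prefix: &str,
) -> (Vec<String>, BTreeSet<String>) {
    let mut objects = Vec::new();
    let mut common_prefixes = BTreeSet::new();
    for key in keys.keys().filter(|k| k.starts_with(prefix)) {
        let rest = &key[prefix.len()..];
        match rest.find('/') {
            // A delimiter past the prefix means a deeper "directory".
            Some(i) => {
                common_prefixes.insert(format!("{}{}", prefix, &rest[..=i]));
            }
            None => objects.push(key.clone()),
        }
    }
    (objects, common_prefixes)
}

fn main() {
    let mut keys = BTreeMap::new();
    for k in ["a/1.json", "a/b/2.json", "a/b/3.json"] {
        keys.insert(k.to_string(), Vec::new());
    }
    let (objects, prefixes) = list_with_delimiter(&keys, "a/");
    assert_eq!(objects, vec!["a/1.json".to_string()]);
    assert!(prefixes.contains("a/b/"));
}
```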
This adds a new function, list_with_delimiter, to the object store. This commit contains just the implementation for S3, leaving the others to be completed in follow-on commits.
This uses a fixed delimiter to ensure a directory structure is created. The delimiter should really depend on the platform and which object store is used: for any of the cloud object stores, or in memory, it should be `/`; for the future disk-based implementation it should depend on whether you're running on Windows or Linux.
I didn't use Stream for the return type because I found it difficult to work with, and I don't think it actually added anything useful. The returned ListResult struct has the next token, and I prefer that the caller explicitly makes the calls that go over the network so they're more aware of what's going on, whereas a Stream abstracts that away behind the scenes. We can easily add a Stream-based version on top of this existing API if we want.
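A minimal sketch of the calling pattern this enables (struct and function names are assumptions based on the description):

```rust
/// One page of results, as returned by a single network call.
struct ListResult {
    /// Continuation token; `None` once everything has been listed.
    next_token: Option<String>,
    objects: Vec<String>,
}

/// Stand-in for the real request to the object store.
fn list_page(prefix: &str, token: Option<&str>) -> ListResult {
    let _ = (prefix, token);
    ListResult { next_token: None, objects: Vec::new() }
}

fn main() {
    let mut token: Option<String> = None;
    loop {
        // Each iteration is one explicit, visible network round trip.
        let page = list_page("data/", token.as_deref());
        for object in &page.objects {
            println!("{}", object);
        }
        match page.next_token {
            Some(t) => token = Some(t),
            None => break,
        }
    }
}
```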
This pulls the different backing implementations into their own modules. They're about to get more complex, so it felt like it was time to separate them out rather than building toward a single multi-thousand-line lib.rs. The error type is defined only in lib.rs and imported by the individual modules, which I think makes it easier to work with.
* feat: Implement write buffer to Parquet snapshotting
This introduces a snapshot module to the server package to manage snapshotting. It also introduces a new trait for representing a Partition. A very crude API is wired up in http_routes for testing purposes. Follow-on work will bring the server package into http_routes and rework the snapshot API.