Commit Graph

191 Commits (5d66cd0a81d2533ee174e1bfb56bab86f81c1fed)

Author SHA1 Message Date
Carol (Nichols || Goulding) a373c90415
refactor: Extract the list_all function to object store
I'm about to use this in a third file, so time to extract this.

Make it clear that this is appropriate for tests only.
2022-03-29 08:15:24 -04:00
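A minimal sketch of such a test-only helper, assuming a hypothetical paged `list` stream plus the `futures` and `tokio` crates; the names and signatures below are illustrative, not the real IOx API:

```rust
use futures::stream::{self, BoxStream};
use futures::TryStreamExt;

type Result<T> = std::result::Result<T, std::io::Error>;

/// Hypothetical stand-in for the store's streaming `list` API, which yields
/// pages of paths.
fn list(prefix: &str) -> BoxStream<'static, Result<Vec<String>>> {
    let pages = vec![
        Ok(vec![format!("{prefix}/0/rules.pb")]),
        Ok(vec![format!("{prefix}/0/000.parquet")]),
    ];
    Box::pin(stream::iter(pages))
}

/// Test-only helper: drain the whole listing into memory so assertions are
/// easy. Production code should keep consuming the stream page by page.
async fn list_all(prefix: &str) -> Result<Vec<String>> {
    list(prefix).try_concat().await
}

#[tokio::main]
async fn main() -> Result<()> {
    let paths = list_all("1/database_name").await?;
    assert_eq!(paths.len(), 2);
    println!("{paths:?}");
    Ok(())
}
```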
Dom Dwyer 5585dd3c21 refactor: switch to using DynObjectStore
Changes all consumers of the object store to use the dynamically
dispatched DynObjectStore type, instead of using a hardcoded concrete
implementation type.
2022-03-15 16:32:52 +00:00
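Roughly the shape of that change, as a sketch with stand-in definitions (the trait body and alias below are assumptions, not the actual IOx code): consumers hold the dynamically dispatched alias instead of a concrete store type.

```rust
use std::sync::Arc;

trait ObjectStore: Send + Sync {
    fn name(&self) -> &'static str;
}

/// Alias consumers depend on instead of naming a concrete implementation.
type DynObjectStore = dyn ObjectStore;

struct InMemory;

impl ObjectStore for InMemory {
    fn name(&self) -> &'static str {
        "memory"
    }
}

/// A consumer that works with any backend through dynamic dispatch.
struct Catalog {
    store: Arc<DynObjectStore>,
}

fn main() {
    let catalog = Catalog { store: Arc::new(InMemory) };
    println!("backend = {}", catalog.store.name());
}
```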
Dom Dwyer b727d26dab refactor: path_from_dirs_and_filename trait method
Moves the path_from_dirs_and_filename from an ObjectStoreImpl method to
a trait method, completing the abstraction over all object store
backends.
2022-03-15 16:29:43 +00:00
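A hedged sketch of what moving the constructor onto the trait looks like; the trait and signature below are stand-ins for illustration, not the real abstraction:

```rust
trait ObjectStoreApi {
    type Path;

    /// Build a full path from directory segments plus a file name, in whatever
    /// representation this backend uses.
    fn path_from_dirs_and_filename(&self, dirs: &[&str], filename: &str) -> Self::Path;
}

struct FileSystemStore;

impl ObjectStoreApi for FileSystemStore {
    type Path = std::path::PathBuf;

    fn path_from_dirs_and_filename(&self, dirs: &[&str], filename: &str) -> Self::Path {
        let mut path: std::path::PathBuf = dirs.iter().copied().collect();
        path.push(filename);
        path
    }
}

fn main() {
    let store = FileSystemStore;
    let p = store.path_from_dirs_and_filename(&["1", "database_name", "0"], "rules.pb");
    println!("{}", p.display());
}
```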
Dom Dwyer 1d5066c421 refactor: rename ObjectStore -> ObjectStoreImpl
Frees up the name so we can use `dyn ObjectStore` throughout the
code instead of `ObjectStoreApi`.
2022-03-15 16:29:43 +00:00
Dom Dwyer 7a7dfade94 feat(catalog): instrument list() ops
List (but not list_with_delimiter?!) returns a stream of ListResult
which previously wasn't instrumented - this commit uses the
StreamMetricRecorder to record the wall clock duration of the entire
list operation.
2022-03-15 14:23:11 +00:00
Dom Dwyer ee01225fec refactor: polymorphic stream recorder impl
Changes the StreamMetricRecorder to be generic over the Ok types in the
result stream, invoking a T-specific delegate when Ok(T) is observed.
This enables the stream instrumentation to be reused across different
stream types while keeping the hairy state checks DRY.

This will allow the StreamMetricRecorder to decorate the streams
returned by both the get() and list() operations, but this commit causes
no functional change.
2022-03-15 14:13:17 +00:00
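A sketch of the decorator idea under assumed names (this is not the real `StreamMetricRecorder`): a wrapper generic over the stream's Ok type that forwards each successfully yielded item to a per-type delegate and records the wall-clock duration once the stream finishes, which is what lets the same wrapper instrument both get() and list() streams.

```rust
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Instant;

use futures::{Stream, StreamExt};

/// Per-type hook invoked for every item the stream yields successfully.
trait MetricDelegate<T> {
    fn observe_ok(&mut self, item: &T);
}

struct StreamMetricRecorder<S, D> {
    // Boxed + pinned so we can poll the inner stream without pin projection.
    inner: Pin<Box<S>>,
    delegate: D,
    started_at: Instant,
    done: bool,
}

impl<S, D> StreamMetricRecorder<S, D> {
    fn new(inner: S, delegate: D) -> Self {
        Self { inner: Box::pin(inner), delegate, started_at: Instant::now(), done: false }
    }
}

impl<S, T, E, D> Stream for StreamMetricRecorder<S, D>
where
    S: Stream<Item = Result<T, E>>,
    D: MetricDelegate<T> + Unpin,
{
    type Item = Result<T, E>;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // Every field is Unpin, so a plain &mut Self is fine here.
        let this = self.get_mut();
        match this.inner.as_mut().poll_next(cx) {
            Poll::Ready(Some(Ok(item))) => {
                this.delegate.observe_ok(&item);
                Poll::Ready(Some(Ok(item)))
            }
            Poll::Ready(None) => {
                if !this.done {
                    this.done = true;
                    // Real code would report this to a metric registry instead.
                    eprintln!("stream finished after {:?}", this.started_at.elapsed());
                }
                Poll::Ready(None)
            }
            other => other,
        }
    }
}

/// Example delegate that counts the paths seen in a listing stream.
struct PathCounter(usize);

impl MetricDelegate<Vec<String>> for PathCounter {
    fn observe_ok(&mut self, page: &Vec<String>) {
        self.0 += page.len();
    }
}

#[tokio::main]
async fn main() {
    let pages = futures::stream::iter(vec![
        Ok::<_, std::io::Error>(vec!["a".to_string()]),
        Ok(vec!["b".to_string(), "c".to_string()]),
    ]);
    let mut recorded = StreamMetricRecorder::new(pages, PathCounter(0));
    while let Some(_page) = recorded.next().await {}
    println!("observed {} paths", recorded.delegate.0);
}
```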
Dom Dwyer 1aaf193299 refactor: take last_yielded_at 2022-03-14 11:54:02 +00:00
Dom Dwyer 3802401b4f docs: explicit stream duration semantics 2022-03-14 11:54:02 +00:00
Dom Dwyer 1aee8402bf refactor: suffix duration vars with _ms 2022-03-14 11:54:02 +00:00
Dom f01382f578
docs: fix typos
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-03-14 11:36:49 +00:00
Dom Dwyer dc8a708fb7 refactor: use DummyObjectStore for error tests 2022-03-11 14:36:34 +00:00
Dom Dwyer bfc473b8cd feat: object store instrumentation
Implements a decorator of the ObjectStoreAPI trait, recording:

    * Bytes uploaded / downloaded through the instrumented API
    * Call latencies, broken down by operation & success / error state

All the current implementations that return a Stream from the get
operation actually return a "fake" stream containing all the data in one
go rather than streaming chunks from the upstream. I've instrumented the
Stream anyway to future-proof any actual streaming impls.
2022-03-11 13:38:48 +00:00
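A simplified, synchronous sketch of the decorator approach (the trait shape, field names, and metrics below are assumptions, not the actual instrumentation): the wrapper times each call, splits the latency by success / error state, counts the bytes involved, and otherwise delegates to the inner store.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

trait ObjectStoreApi {
    fn put(&self, path: &str, bytes: &[u8]) -> Result<(), String>;
}

/// Decorator recording bytes written and per-call latency before delegating.
struct ObjectStoreMetrics<T> {
    inner: T,
    bytes_uploaded: AtomicU64,
    put_success_ms: AtomicU64,
    put_error_ms: AtomicU64,
}

impl<T: ObjectStoreApi> ObjectStoreApi for ObjectStoreMetrics<T> {
    fn put(&self, path: &str, bytes: &[u8]) -> Result<(), String> {
        let started = Instant::now();
        let res = self.inner.put(path, bytes);
        let elapsed_ms = started.elapsed().as_millis() as u64;
        // Break the latency down by success / error state.
        match &res {
            Ok(_) => {
                self.bytes_uploaded.fetch_add(bytes.len() as u64, Ordering::Relaxed);
                self.put_success_ms.fetch_add(elapsed_ms, Ordering::Relaxed);
            }
            Err(_) => {
                self.put_error_ms.fetch_add(elapsed_ms, Ordering::Relaxed);
            }
        }
        res
    }
}

/// Trivial inner implementation so the example runs.
struct InMemory;

impl ObjectStoreApi for InMemory {
    fn put(&self, _path: &str, _bytes: &[u8]) -> Result<(), String> {
        Ok(())
    }
}

fn main() {
    let store = ObjectStoreMetrics {
        inner: InMemory,
        bytes_uploaded: AtomicU64::new(0),
        put_success_ms: AtomicU64::new(0),
        put_error_ms: AtomicU64::new(0),
    };
    store.put("1/db/0/rules.pb", b"hello").unwrap();
    println!("uploaded {} bytes", store.bytes_uploaded.load(Ordering::Relaxed));
}
```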
Marco Neumann cf0c238ae4
feat: allow unencrypted HTTP connections to AWS via flag (#3916)
This is required for the test bench.
2022-03-03 17:01:03 +00:00
Dom Dwyer 46bb107be4 refactor: use rustls
Removes openssl as a dependency, switching to rustls[1] as the TLS
implementation throughout.

It is important to note that this change brings with it a significant
behavioural difference - rustls does not currently support IP SANs in
certificates (instead only supporting fully-qualified names / DNS) and
this will manifest as a failure to connect to IP endpoints over TLS.
This might be a blocker that prevents us from using rustls exclusively, but
there's no easy way to know without trying it. Fortunately the rustls
project has received funding to work on IP SAN support[2].

[1]: https://github.com/rustls/rustls
[2]: https://www.abetterinternet.org/post/preparing-rustls-for-wider-adoption/
2022-03-03 11:05:20 +00:00
Dom Dwyer 4785a7d028 refactor: bump Azure SDK
Update to the latest Azure SDK to pick up rustls support.
2022-03-03 11:02:20 +00:00
Marco Neumann f3f6f335a9
chore: upgrade to snafu 0.7 (#3440) 2022-01-11 19:22:36 +00:00
kodiakhq[bot] 099d9dc6e1
Merge branch 'main' into crepererum/issue3226 2021-12-17 16:37:16 +00:00
Carol (Nichols || Goulding) 24fd2e549b
docs: Fix some typos and outdated comments 2021-12-17 11:33:36 -05:00
Marco Neumann 9f2694bf1b test: `object_store` azure support via Azurite 2021-12-17 09:30:21 +01:00
Marco Neumann 61ee70d222 refactor: make object store prefixes segment based
Don't use plain string prefixes; treat paths as segments: e.g. `foo/bar` is
a segment prefix of `foo/bar/x` but NOT of `foo/bar_baz/y`.

This also removes some heuristics during the cloud storage parsing that
assumed that file names always contain a dot but directories don't.

Technically we should now always be able to know whether a path points
to a file or a directory:

- Rust (manually constructed): we use `DirsAndFileName` which knows the
  difference (i.e. if `file_name` is set)
- in-mem store: we also use `DirsAndFileName`
- file system: this was fixed by #1523
  (ccd094dfcf and 464667d8b8)
- cloud: cloud doesn't know about directories. So all paths that these
  APIs return and that end with a `/` are directories (can only occur in
  `list_with_delimiter`); everything else is a file

Path string representations now act accordingly (i.e. they always end
with a `/` if they point to a directory).

Fixes #3226.
2021-12-16 14:40:45 +01:00
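A small sketch of segment-based prefix matching (not the real `DirsAndFileName` implementation) showing how it differs from plain string prefixes:

```rust
/// Split a path into its non-empty segments.
fn segments(path: &str) -> Vec<&str> {
    path.split('/').filter(|s| !s.is_empty()).collect()
}

/// Compare whole segments instead of raw characters.
fn is_segment_prefix(prefix: &str, path: &str) -> bool {
    let prefix = segments(prefix);
    let path = segments(path);
    path.len() >= prefix.len() && path[..prefix.len()] == prefix[..]
}

fn main() {
    assert!(is_segment_prefix("foo/bar", "foo/bar/x"));
    assert!(!is_segment_prefix("foo/bar", "foo/bar_baz/y"));
    // A plain string comparison gets the second case wrong:
    assert!("foo/bar_baz/y".starts_with("foo/bar"));
    println!("segment-based prefix checks behave as expected");
}
```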
Marco Neumann f02d846389 test: `object_store` aws test using localstack
This removes 3 "nonexistent region" tests that were testing very
specific error behavior that no local emulator (minio or localstack)
replicates and that don't add much value. It's better to test our AWS
code at all than to be too picky.
2021-12-16 10:10:06 +01:00
Marco Neumann 876a9af35a fix: limit number of S3 connections
Otherwise the whole thing blows up when starting a server that has many
DBs registered, because we potentially create 1 connection per DB (e.g.
to read out the preserved catalog).

Fixes #3336.
2021-12-08 19:06:02 +01:00
Raphael Taylor-Davies bca561366b
feat: don't copy parquet files out of disk object store (#3282) (#3293)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-12-05 16:31:40 +00:00
Carol (Nichols || Goulding) 5d0fd1c603
fix: Allow dead code on fields that are now detected as never read 2021-12-02 11:52:01 -05:00
Marco Neumann f0136e9791 ci: run clippy for all features
This indeed fixes some issues in our object store implementation.

Closes #537.
2021-11-11 10:48:21 +01:00
Carol (Nichols || Goulding) 4de7dce8ba feat: Add a 404 NotFound error to the object store errors for GET
This applies to the object store implementations that can detect when a
GET request returns 404 Not Found. Azure doesn't have great support for
this.
2021-10-22 15:55:59 -04:00
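A hedged sketch of the idea (the variant and function names are assumptions): give GET a dedicated NotFound variant so callers can distinguish a missing object from other failures.

```rust
#[derive(Debug)]
enum Error {
    NotFound { path: String },
    UnableToGetData { path: String, message: String },
}

/// Hypothetical GET against a backend that reports HTTP status codes.
fn get(path: &str, status: u16, body: Vec<u8>) -> Result<Vec<u8>, Error> {
    match status {
        200 => Ok(body),
        404 => Err(Error::NotFound { path: path.to_string() }),
        other => Err(Error::UnableToGetData {
            path: path.to_string(),
            message: format!("unexpected status {other}"),
        }),
    }
}

fn main() {
    match get("1/db/0/rules.pb", 404, Vec::new()) {
        Err(Error::NotFound { path }) => println!("{path} does not exist"),
        other => println!("{other:?}"),
    }
}
```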
Carol (Nichols || Goulding) 6ae42bb774 fix: Put both Debug and Display of AWS errors in error messages 2021-10-22 13:59:48 -04:00
Carol (Nichols || Goulding) a2b81cdb20 fix: Put Debug format of rusoto errors in error messages
Fixes #2942
2021-10-22 13:16:37 -04:00
Marco Neumann f7ca80e29f
test: ensure query cancellation (somewhat) works (#2931)
* feat: enable reconfiguration of in-use throttled store

This is handy for tests in which one part should run normally and another
part should be throttled/blocked.

* feat: keep track of the number of tasks within a `DedicatedExecutor`

* test: ensure query cancellation (somewhat) works

We cannot really test that query cancellation finishes all subtasks
because _tokio_ doesn't provide sufficient stats / inspection, at least
as long as we don't want to rely heavily on _tokio_ tracing. So let's at
least check that tasks from the dedicated executors are pruned properly.

For all other regressions we need to add unit tests to the affected
components. See for example:

- https://github.com/apache/arrow-datafusion/issues/1103
- https://github.com/apache/arrow-datafusion/pull/1105
- https://github.com/apache/arrow-datafusion/pull/1112
- https://github.com/apache/arrow-datafusion/pull/1121

Closes #2027.
2021-10-21 19:10:58 +00:00
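A sketch of one way such task accounting can work (this is not the real `DedicatedExecutor`, and it assumes tokio with the rt and time features): an atomic counter is bumped per spawned task and decremented by a drop guard, so a test can assert that finished or cancelled queries leave no tasks behind.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

#[derive(Clone, Default)]
struct TaskCounter(Arc<AtomicUsize>);

struct TaskGuard(Arc<AtomicUsize>);

impl Drop for TaskGuard {
    fn drop(&mut self) {
        // Runs even if the task is cancelled, keeping the count accurate.
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
}

impl TaskCounter {
    fn register(&self) -> TaskGuard {
        self.0.fetch_add(1, Ordering::SeqCst);
        TaskGuard(Arc::clone(&self.0))
    }

    fn active(&self) -> usize {
        self.0.load(Ordering::SeqCst)
    }
}

#[tokio::main]
async fn main() {
    let counter = TaskCounter::default();

    let guard = counter.register();
    let handle = tokio::spawn(async move {
        let _guard = guard; // held for the task's whole lifetime
        tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    });

    assert_eq!(counter.active(), 1);
    handle.await.unwrap();
    assert_eq!(counter.active(), 0);
    println!("all executor tasks pruned");
}
```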
Carol (Nichols || Goulding) 1dda568d28 test: Add a case for a path without a filename 2021-10-18 08:46:00 -04:00
Carol (Nichols || Goulding) d5ab29711e fix: Serialize relative db object store paths to the server config
So that they can be deserialized, without parsing, to create a new
iox object store from the location listed in the server config.

Notably, the locations serialized don't start with the object storage's
prefix like "s3:" or "file:". The locations are assumed to be on the same
object storage as the server configuration they were just read from. Having
the server config on one type of object storage and the database files
on another type is not supported.
2021-10-18 08:37:36 -04:00
Carol (Nichols || Goulding) fb0e6ad5b1 docs: Document the s3_request generics and how they fit together 2021-09-29 08:19:33 -04:00
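A hedged sketch of how those generics could fit together (names, bounds, and constants below are assumptions, not the exact IOx signatures): the caller passes a closure that rebuilds the request future on every attempt, the helper stays generic over the success type, retries only retryable errors up to a documented limit, and backs off exponentially between attempts.

```rust
use std::future::Future;
use std::time::Duration;

/// Assumed name for the documented retry limit constant.
pub const MAX_NUM_RETRIES: usize = 3;

#[derive(Debug)]
struct StoreError {
    /// Only server-side (5xx) failures are worth retrying.
    retryable: bool,
}

/// `F` rebuilds the request on every attempt (which is why put's byte stream
/// became a closure), `G` is the future it produces, and `R` is whatever the
/// underlying call returns on success.
async fn s3_request<F, G, R>(mut build_request: F) -> Result<R, StoreError>
where
    F: FnMut() -> G,
    G: Future<Output = Result<R, StoreError>>,
{
    let mut wait = Duration::from_millis(100);
    let mut attempts = 0;
    loop {
        match build_request().await {
            Ok(r) => return Ok(r),
            Err(e) if e.retryable && attempts < MAX_NUM_RETRIES => {
                attempts += 1;
                tokio::time::sleep(wait).await;
                wait *= 2; // the "exponential" part of exponential backoff
            }
            Err(e) => return Err(e),
        }
    }
}

#[tokio::main]
async fn main() {
    // The same helper serves put, delete, list, ... because `R` is generic.
    let object_count: usize = s3_request(|| async {
        Ok::<_, StoreError>(42) // stand-in for an actual S3 call
    })
    .await
    .unwrap();
    println!("{object_count}");
}
```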
Carol (Nichols || Goulding) 2b9612a033 fix: Remove some of the futures in futures layers
I think the FuturesRetry crate needed this, but we're no longer using
the crate so we can get rid of some of the complexity.
2021-09-29 08:19:33 -04:00
Carol (Nichols || Goulding) ad1405dcdf fix: Reduce number of AWS S3 request retries from 10 to 3 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) 5857edb19a fix: cloud_storage crate returns a different error type in this case 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) 92583aee82 fix: Remove streaming API since we're not streaming anyway 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) edd6c12e93 fix: Add logging when retrying AWS requests 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) ea4cb9a6c1 refactor: Extract the maximum number of retries into a documented public constant 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) 9cf343db08 refactor: Use loop instead of futures_retry 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) 9cdccae49d fix: Actually do the exponential part of exponential backoff 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) d69472e114 refactor: Use s3_request in s3 delete 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) d05528bcfd refactor: Use s3_request for put requests
This meant we also needed to change the byte stream to be a closure
that can generate a byte stream.
2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) dd3b405727 fix: Only retry s3 requests if error is 500 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) 936fba24b9 refactor: Make s3_request generic over the return type 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) 5c78b7d5ae refactor: Extract an s3_request method that handles retries 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) fd3a027ca8 refactor: Use the extracted aws list function in list_with_delimiter 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) a1a8bc01bf refactor: Extract a method just to stream aws list requests 2021-09-29 08:19:32 -04:00
Marco Neumann ef5ab67c77 chore: upgrade azure dependencies 2021-09-21 13:35:11 +02:00
Raphael Taylor-Davies 4382bc0b71
feat: disable Snafu futures feature (#2573)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-19 11:36:24 +00:00
Carol (Nichols || Goulding) fbf18813ac fix: Correct a mistaken assumption in in-memory object storage
The implementation of list_with_delimiter for the in-memory object
storage assumed that paths returned from the BTreeMap keys that sorted
greater than the prefix given to list_with_delimiter, and for which
prefix_matches returned true, would also have parts after the prefix.

This didn't account for paths that started with the prefix but didn't
immediately have the delimiter after the prefix: that is,

prefix = 1/database_name

would match the in-memory paths:

1/database_name/0/rules.pb
1/database_name_and_another_thing/0/rules.pb

The first path here *would* return some parts_after_prefix, but the
second path would not, and the previously existing code would panic for
the added path in the list_with_delimiter test case.
2021-09-02 14:57:27 -04:00
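A sketch of the corrected check under assumed names (not the real in-memory store code): a key only counts as being under the prefix if the prefix is followed by the delimiter, so the second path above is rejected instead of causing a panic.

```rust
/// Return the path parts after `prefix`, or None if `key` is not actually
/// inside the `prefix` "directory".
fn parts_after_prefix<'a>(key: &'a str, prefix: &str) -> Option<Vec<&'a str>> {
    let rest = key.strip_prefix(prefix)?;
    // Only a real child: the prefix must be followed by the `/` delimiter.
    let rest = rest.strip_prefix('/')?;
    Some(rest.split('/').collect())
}

fn main() {
    let prefix = "1/database_name";

    // Genuinely under the prefix: yields the parts after it.
    assert_eq!(
        parts_after_prefix("1/database_name/0/rules.pb", prefix),
        Some(vec!["0", "rules.pb"])
    );

    // Same string prefix, but a different directory: correctly rejected.
    assert_eq!(
        parts_after_prefix("1/database_name_and_another_thing/0/rules.pb", prefix),
        None
    );

    println!("list_with_delimiter prefix handling behaves");
}
```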