Changes all consumers of the object store to use the dynamically
dispatched DynObjectStore type, instead of using a hardcoded concrete
implementation type.
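A minimal sketch of the shape of the change (the `DynObjectStore` alias mirrors the one used here; the trait and its method are reduced to a stub):

```rust
use std::sync::Arc;

// Simplified stand-in for the real object store trait.
trait ObjectStoreApi: Send + Sync {
    fn name(&self) -> &'static str;
}

// One alias; the concrete implementation is picked at runtime.
type DynObjectStore = dyn ObjectStoreApi;

struct InMemory;

impl ObjectStoreApi for InMemory {
    fn name(&self) -> &'static str {
        "memory"
    }
}

// Before: consumers took a concrete type such as `Arc<InMemory>`.
// After: any implementation can be injected.
fn print_store_kind(store: Arc<DynObjectStore>) {
    println!("using object store: {}", store.name());
}

fn main() {
    print_store_kind(Arc::new(InMemory));
}
```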
List (but not list_with_delimiter?!) returns a stream of ListResult
which previously wasn't instrumented - this commit uses the
StreamMetricRecorder to record the wall clock duration of the entire
list operation.
Changes the StreamMetricRecorder to be generic over the Ok types in the
result stream, invoking a T-specific delegate when Ok(T) is observed.
This enables the stream instrumentation to be reused across different
stream types while keeping the hairy state checks DRY.
This will allow the StreamMetricRecorder to decorate the streams
returned by both the get() and list() operations, but this commit causes
no functional change.
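A simplified sketch of that generic shape (not the real recorder: the `on_ok` closure stands in for the T-specific delegate, the print stands in for the metric observation, and early drops are ignored here):

```rust
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Instant;

use futures::Stream;

struct StreamMetricRecorder<S, F> {
    inner: S,
    started: Instant,
    on_ok: F, // T-specific delegate, e.g. count bytes or list entries
    done: bool,
}

impl<S, T, E, F> Stream for StreamMetricRecorder<S, F>
where
    S: Stream<Item = Result<T, E>> + Unpin,
    F: FnMut(&T) + Unpin,
{
    type Item = Result<T, E>;

    fn poll_next(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item>> {
        let this = self.as_mut().get_mut();
        let polled = Pin::new(&mut this.inner).poll_next(cx);
        match &polled {
            // The shared state machine lives here once; only the
            // delegate differs per stream type.
            Poll::Ready(Some(Ok(item))) => (this.on_ok)(item),
            Poll::Ready(None) if !this.done => {
                this.done = true;
                // Wall-clock duration of the entire operation.
                println!("stream completed after {:?}", this.started.elapsed());
            }
            _ => {}
        }
        polled
    }
}
```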
Implements a decorator of the ObjectStoreAPI trait, recording:
* Bytes uploaded / downloaded through the instrumented API
* Call latencies, broken down by operation & success / error state
All the current implementations that return a Stream from the get
operation actually return a "fake" stream containing all the data in one
go rather than streaming chunks from the upstream. I've instrumented the
Stream anyway, so any genuinely streaming impls are covered in the future.
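A hedged sketch of the decorator shape (the trait is cut down to a single method and the metric sink is a print; real code would feed histograms and counters):

```rust
use std::time::Instant;

use async_trait::async_trait;

#[async_trait]
trait ObjectStoreApi: Send + Sync {
    async fn get(&self, location: &str) -> Result<Vec<u8>, String>;
}

// Wraps any store and records per-call metrics transparently.
struct ObjectStoreMetrics<T> {
    inner: T,
}

#[async_trait]
impl<T: ObjectStoreApi> ObjectStoreApi for ObjectStoreMetrics<T> {
    async fn get(&self, location: &str) -> Result<Vec<u8>, String> {
        let start = Instant::now();
        let res = self.inner.get(location).await;
        // Latency broken down by operation & success / error state.
        let outcome = if res.is_ok() { "success" } else { "error" };
        println!("op=get outcome={outcome} duration={:?}", start.elapsed());
        if let Ok(data) = &res {
            // Bytes downloaded through the instrumented API.
            println!("bytes_downloaded={}", data.len());
        }
        res
    }
}
```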
Removes openssl as a dependency, switching to rustls[1] as the TLS
implementation throughout.
It is important to note that this change brings with it a significant
behavioural difference - rustls does not currently support IP SANs in
certificates (instead only supporting fully-qualified names / DNS) and
this will manifest as a failure to connect to IP endpoints over TLS.
This might be a blocker that prevents us using rustls exclusively, but
there's no easy way to know without trying it. Fortunately the rustls
project has received funding to work on IP SAN support[2].
[1]: https://github.com/rustls/rustls
[2]: https://www.abetterinternet.org/post/preparing-rustls-for-wider-adoption/
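For illustration, this is roughly where the failure surfaces (assuming the `webpki` crate that rustls builds on; exact type names vary by version):

```rust
fn main() {
    // Host names validate fine...
    assert!(webpki::DNSNameRef::try_from_ascii_str("example.com").is_ok());
    // ...but an IP address is not a valid DNS name, so connecting to an
    // IP endpoint over TLS fails at this step.
    assert!(webpki::DNSNameRef::try_from_ascii_str("198.51.100.1").is_err());
}
```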
Don't use string prefixes when matching paths: `foo/bar` is a path
prefix of `foo/bar/x` but NOT of `foo/bar_baz/y`, even though the latter
starts with the string `foo/bar`.
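A minimal sketch of the component-wise check this implies (helper name hypothetical):

```rust
// A path prefix matches whole path components, not raw characters.
fn is_path_prefix(prefix: &str, path: &str) -> bool {
    let prefix: Vec<&str> = prefix.split('/').filter(|p| !p.is_empty()).collect();
    let path: Vec<&str> = path.split('/').filter(|p| !p.is_empty()).collect();
    path.len() >= prefix.len() && path[..prefix.len()] == prefix[..]
}

fn main() {
    assert!(is_path_prefix("foo/bar", "foo/bar/x"));
    // A string prefix, but not a path prefix:
    assert!(!is_path_prefix("foo/bar", "foo/bar_baz/y"));
}
```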
This also removes some heuristics during the cloud storage parsing that
assumed that file names always contain a dot but directories don't.
Technically we should now always be able to know whether a path points
to a file or a directory:
- Rust (manually constructed): we use `DirsAndFileName` which knows the
difference (i.e. if `file_name` is set)
- in-mem store: we also use `DirsAndFileName`
- file system: this was fixed by #1523
(ccd094dfcf and 464667d8b8)
- cloud: cloud doesn't know about directories. So all paths that these
APIs return and that end with a `/` are directories (this can only occur
in `list_with_delimiter`); everything else is a file
Path string representations now act accordingly (i.e. they always end
with a `/` if they point to a directory).
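In other words (tiny sketch, helper name hypothetical):

```rust
// Directory paths always end with `/` in string form; everything else
// is a file.
fn is_directory(path: &str) -> bool {
    path.ends_with('/')
}

fn main() {
    assert!(is_directory("1/database_name/"));
    assert!(!is_directory("1/database_name/0/rules.pb"));
}
```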
Fixes #3226.
This removes 3 "nonexisting region" tests that were testing very
specific error behavior that no local emulator (minio and localstack)
replicates and that don't add much value. It's better to test our AWS
code at all than to be too picky.
Otherwise the whole thing blows up when starting a server that has many
DBs registered, because we potentially create 1 connection per DB (e.g.
to read out the preserved catalog).
Fixes #3336.
* feat: enable reconfiguration of in-use throttled store
This is handy for tests in which one part should run "normal" and
another should be throttled/blocked.
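A hedged sketch of how such reconfiguration can work (not the real store's API; the point is just that the config lives behind a shared handle):

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

#[derive(Clone, Copy, Default)]
struct ThrottleConfig {
    wait_get_per_call: Duration,
}

struct ThrottledStore<T> {
    inner: T,
    config: Arc<Mutex<ThrottleConfig>>,
}

impl<T> ThrottledStore<T> {
    // Tests keep this handle and can reconfigure the store while it is
    // in use, e.g. run one phase unthrottled and then block the next.
    fn config_handle(&self) -> Arc<Mutex<ThrottleConfig>> {
        Arc::clone(&self.config)
    }

    async fn get(&self, _location: &str) {
        let wait = self.config.lock().unwrap().wait_get_per_call;
        tokio::time::sleep(wait).await;
        // ... then delegate to `self.inner`.
    }
}
```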
* feat: keep track of the number of tasks within a `DedicatedExecutor`
* test: ensure query cancellation (somewhat) works
We cannot really test that query cancellation finishes all subtasks
because _tokio_ doesn't provide sufficient stats / inspection, at least
as long as we don't want to rely heavily on _tokio_ tracing. So let's at
least check that tasks from the dedicated executors are pruned properly.
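A minimal sketch of the task accounting that makes such a check possible (names hypothetical; the drop guard matters because cancelled tasks never run to completion):

```rust
use std::future::Future;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

#[derive(Clone, Default)]
struct TaskCounter(Arc<AtomicUsize>);

// Decrements on drop, so the count shrinks even when a task is
// cancelled rather than run to completion.
struct CounterGuard(Arc<AtomicUsize>);

impl Drop for CounterGuard {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
}

impl TaskCounter {
    fn tasks(&self) -> usize {
        self.0.load(Ordering::SeqCst)
    }

    fn spawn<F>(&self, fut: F) -> tokio::task::JoinHandle<F::Output>
    where
        F: Future + Send + 'static,
        F::Output: Send + 'static,
    {
        self.0.fetch_add(1, Ordering::SeqCst);
        let guard = CounterGuard(Arc::clone(&self.0));
        tokio::spawn(async move {
            let _guard = guard;
            fut.await
        })
    }
}
```

A test can then cancel a query and assert that `tasks()` eventually drops back to zero.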
For all other regressions we need to add unit tests to the affected
components. See for example:
- https://github.com/apache/arrow-datafusion/issues/1103
- https://github.com/apache/arrow-datafusion/pull/1105
- https://github.com/apache/arrow-datafusion/pull/1112
- https://github.com/apache/arrow-datafusion/pull/1121
Closes #2027.
So that they can be deserialized, without parsing, to create a new
iox object store from the location listed in the server config.
Notably, the serialized locations don't start with the object storage's
prefix like "s3:" or "file:". Each location is in the same object
storage as the server configuration that was just read from object
storage; having the server config on one type of object storage and the
database files on another type is not supported.
The implementation of list_with_delimiter for the in-memory object
storage assumed that paths taken from the BTreeMap keys that sorted
greater than the prefix given to list_with_delimiter, and for which
prefix_matches returned true, would also have parts after the prefix.
This didn't account for paths that start with the prefix but don't
immediately have the delimiter after it: that is,
prefix = 1/database_name
would match the in-memory paths:
1/database_name/0/rules.pb
1/database_name_and_another_thing/0/rules.pb
The first path here *would* return some parts_after_prefix, but the
second path would not, and the previously existing code would panic on
the path added to the list_with_delimiter test case.
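A self-contained sketch of just the fixed scan (the real list_with_delimiter also separates objects from common prefixes; helper name hypothetical):

```rust
use std::collections::BTreeMap;

// Collect the first path part after `prefix` for every matching key.
fn common_prefixes(map: &BTreeMap<String, Vec<u8>>, prefix: &str) -> Vec<String> {
    let mut out = Vec::new();
    for key in map.keys() {
        // String-prefix match, as the old code effectively did.
        let Some(rest) = key.strip_prefix(prefix) else { continue };
        // The fix: a key like `1/database_name_and_another_thing/...`
        // matches the prefix string but has no delimiter immediately
        // after it, i.e. no "parts after the prefix" -- skip it instead
        // of panicking.
        let Some(after) = rest.strip_prefix('/') else { continue };
        if let Some(first) = after.split('/').next() {
            out.push(format!("{prefix}/{first}"));
        }
    }
    out.dedup();
    out
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert("1/database_name/0/rules.pb".to_string(), vec![]);
    map.insert("1/database_name_and_another_thing/0/rules.pb".to_string(), vec![]);
    // Only the first key contributes; the second is skipped, not a panic.
    assert_eq!(common_prefixes(&map, "1/database_name"), vec!["1/database_name/0"]);
}
```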