This function largely duplicates namespace_to_proto, and the other
responses in this file don't use helper functions to construct the
response type, so make Create more consistent with the other actions.
So that the same conversion can happen in the tests and a single
assert_eq! can check everything, rather than repeating a per-field
assertion in every test.
So that we can use PartialEq rather than comparing each field
individually.
Also take a reference to a namespace; this function doesn't need
ownership.
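As a rough illustration (the field sets and test are invented; only namespace_to_proto and the borrow are from the change itself), the shared conversion plus derived PartialEq lets one assert_eq! cover every field:

```rust
#[derive(Debug, Clone, PartialEq)]
struct Namespace {
    id: i64,
    name: String,
}

mod proto {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Namespace {
        pub id: i64,
        pub name: String,
    }
}

// Borrow the namespace; the conversion does not need ownership.
fn namespace_to_proto(namespace: &Namespace) -> proto::Namespace {
    proto::Namespace {
        id: namespace.id,
        name: namespace.name.clone(),
    }
}

#[test]
fn create_response_matches() {
    let ns = Namespace { id: 42, name: "bananas".to_string() };
    // One assertion checks every field via PartialEq.
    assert_eq!(
        namespace_to_proto(&ns),
        proto::Namespace { id: 42, name: "bananas".to_string() }
    );
}
```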
When gap-filling, make the output time array have the same timezone
as the input time array.
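A sketch of the idea with arrow-rs (the helper name and the elided fill logic are hypothetical): read the timezone off the input array's data type and stamp it onto the output:

```rust
use arrow::array::{Array, TimestampNanosecondArray};
use arrow::datatypes::DataType;

// Hypothetical helper: build the gap-filled output with the same
// timezone annotation as the input time array.
fn gap_fill_times(input: &TimestampNanosecondArray) -> TimestampNanosecondArray {
    // Pull the (optional) timezone off the input's data type.
    let tz = match input.data_type() {
        DataType::Timestamp(_, tz) => tz.clone(),
        _ => None,
    };

    // Real gap-filling elided; just copy the values here.
    let values: Vec<i64> = input.iter().flatten().collect();

    // Stamp the output with the input's timezone (or lack thereof).
    TimestampNanosecondArray::from(values).with_timezone_opt(tz)
}
```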
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This moves the error handling up to the file-level replay loop, being
stricter about which files are considered "replayed" when they are
truncated. Any file other than the most recent segment file that
encounters an unexpected error is not considered safe to replay and
discard.
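A minimal sketch of the loop shape this describes (all names hypothetical):

```rust
#[derive(Debug)]
enum ReplayError {
    UnexpectedEof,
    Corrupt(String),
}

struct SegmentFile; // stand-in for a WAL segment file handle

fn replay_file(_file: &SegmentFile) -> Result<(), ReplayError> {
    Ok(()) // stub: read and apply the segment's operations
}

fn replay_all(files: &[SegmentFile]) -> Result<(), ReplayError> {
    let last = files.len().saturating_sub(1);
    for (i, file) in files.iter().enumerate() {
        match replay_file(file) {
            Ok(()) => {}
            // Only the newest segment may legitimately be cut short by
            // a crash mid-write; stop replaying there.
            Err(ReplayError::UnexpectedEof) if i == last => break,
            // Any older file hitting an unexpected error is not marked
            // as replayed and is not safe to discard.
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```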
I was confused about whether validate_or_insert_schema should return all
columns a table has in the catalog if another process has added some.
Dom explained that no, this is by design: the validate_or_insert_schema
function shouldn't be fetching any extra columns from the catalog, only
inserting missing columns from the diff set being processed during a
write.
The NamespaceCache/gossip system takes care of eventually converging
schemas at a higher level.
To save anyone else from having to go down the same path to
understanding that I just did, encode this expected behavior in a test
for future reference.
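A self-contained model of that behaviour (this is not the real catalog API, just the shape of the test):

```rust
use std::collections::{BTreeMap, BTreeSet};

// Model: insert only the columns in the write's diff set; never fetch
// other columns that appeared in the catalog out-of-band.
fn validate_or_insert_schema(
    write_columns: &BTreeSet<String>,
    cached_schema: &BTreeSet<String>,
    catalog: &mut BTreeMap<String, ()>,
) -> Option<BTreeSet<String>> {
    // Diff set: columns in the write that the cached schema lacks.
    let missing: BTreeSet<String> =
        write_columns.difference(cached_schema).cloned().collect();
    if missing.is_empty() {
        return None; // nothing to change
    }
    for col in &missing {
        catalog.entry(col.clone()).or_insert(());
    }
    Some(cached_schema.union(&missing).cloned().collect())
}

#[test]
fn external_columns_are_not_returned() {
    let mut catalog = BTreeMap::new();
    // Another process adds a column directly to the catalog.
    catalog.insert("external_col".to_string(), ());

    let cached: BTreeSet<_> = ["time".to_string()].into();
    let write: BTreeSet<_> = ["time".to_string(), "local_col".to_string()].into();

    let got = validate_or_insert_schema(&write, &cached, &mut catalog).unwrap();
    assert!(got.contains("local_col"));
    // By design, the externally added column is not fetched.
    assert!(!got.contains("external_col"));
}
```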
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Update the selector functions to output the selected time in the
same timezone as the input time array. This will not have any effect
on the rest of the system yet, as timezones are not used anywhere.
This change is being made in preparation for making use of timezones.
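A minimal sketch (helper name hypothetical) of deriving the output time type from the input so any timezone annotation survives:

```rust
use arrow::datatypes::DataType;

// Reuse the input's unit and timezone rather than always emitting a
// fixed Timestamp(Nanosecond, None) for the selected time.
fn selector_time_output_type(input: &DataType) -> DataType {
    match input {
        DataType::Timestamp(unit, tz) => DataType::Timestamp(unit.clone(), tz.clone()),
        other => other.clone(),
    }
}
```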
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes the default ingester configuration to assign half the logical
cores to datafusion for persist execution. Prior to this commit,
datafusion always used 4 threads by default.
In situations where the ingesters are configured with 4 logical cores or
fewer, the periodic persist can start enough persist jobs to keep the 4
threads assigned to datafusion busy. Because there are enough threads to
saturate all CPU cores, these CPU-heavy persist threads can impact write
latency by stealing CPU time from the tokio runtime threads.
This change assigns exactly half the threads to DF by default, ensuring
there are always N/2 cores to service I/O-heavy API requests.
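A sketch of the default (helper name hypothetical), using std's core count:

```rust
use std::thread;

// Hand DataFusion exactly half the logical cores, leaving the other
// half for the tokio runtime threads servicing I/O-heavy API requests.
fn default_persist_executor_threads() -> usize {
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(2);
    // Never fewer than one thread, even on single-core machines.
    (cores / 2).max(1)
}
```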
This changes the per-namespace buffered partition limiter to only
consider non-empty partitions when enforcing the partition limit.
Empty partitions cost a small amount of RAM, but are not added to
the persist queue - only non-empty partitions will need persisting, so
the limiter only needs to limit non-empty partitions.
This commit also significantly improves the consistency properties of
the limiter - the limit no longer suffers from a small window of
"overrun" due to non-atomic updates w.r.t. partition creation - the
limit is now exact.
As an optimisation, partitions are not created at all if the limit has
been reached, preventing an accumulation of empty partitions whilst the
limit is being enforced.
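A minimal sketch of the "exact" property (not the actual type): a slot is reserved atomically before a partition becomes non-empty, so concurrent writers cannot overrun the limit, and nothing is created when acquisition fails:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct PartitionLimiter {
    max: usize,
    non_empty: AtomicUsize,
}

impl PartitionLimiter {
    /// Try to reserve a slot for a partition about to become non-empty;
    /// returns false (and the caller creates nothing) at the limit.
    fn try_acquire(&self) -> bool {
        self.non_empty
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                // Atomic check-and-increment: no window in which two
                // writers both observe "room left" and both proceed.
                (n < self.max).then_some(n + 1)
            })
            .is_ok()
    }

    /// Release a slot once a partition is persisted and becomes empty.
    fn release(&self) {
        self.non_empty.fetch_sub(1, Ordering::AcqRel);
    }
}
```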
Use the PartitionDataBuilder in the MockPartitionProvider, allowing the
test caller to specify any necessary parameters while still allowing the
mock provider to inject the arguments it was called with.
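A self-contained sketch of the pattern (shapes assumed; only the type names come from the change):

```rust
#[derive(Default, Clone)]
struct PartitionDataBuilder {
    sort_key: Option<String>,
    table_id: Option<i64>,
}

impl PartitionDataBuilder {
    // Caller-specified parameter, set up-front by the test.
    fn with_sort_key(mut self, key: &str) -> Self {
        self.sort_key = Some(key.to_string());
        self
    }
    // Call argument, injected later by the mock provider.
    fn with_table_id(mut self, id: i64) -> Self {
        self.table_id = Some(id);
        self
    }
}

struct MockPartitionProvider {
    builder: PartitionDataBuilder,
}

impl MockPartitionProvider {
    // The pre-configured builder is completed with the arguments the
    // provider was actually called with.
    fn get_partition(&self, table_id: i64) -> PartitionDataBuilder {
        self.builder.clone().with_table_id(table_id)
    }
}
```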
This adds a level of assurance that multiple set error states are
ignored when they are all present in the exceptions, while disjoint
error states and exceptions return an error. Arbitrary sets could be
covered, but that would likely require taking a non-const array for
`read_with_exceptions`.
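A sketch of the signature this alludes to (the error variants and logic are assumed):

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ErrorState {
    Crc,
    Truncated,
    Overflow,
}

// The const array fixes the exception-set size at compile time; covering
// arbitrary sets would mean taking a slice instead.
fn read_with_exceptions<const N: usize>(
    set_states: &[ErrorState],
    exceptions: [ErrorState; N],
) -> Result<(), ErrorState> {
    match set_states.iter().find(|s| !exceptions.contains(*s)) {
        // Any set state outside the exception set fails the read.
        Some(&state) => Err(state),
        // All set states are covered by the exceptions: ignore them.
        None => Ok(()),
    }
}

// e.g. read_with_exceptions(&[ErrorState::Crc], [ErrorState::Crc]) is
// Ok(()), while read_with_exceptions(&[ErrorState::Overflow],
// [ErrorState::Crc]) returns Err(ErrorState::Overflow).
```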
* refactor: make partition key parsing more flexible
* feat: decode time portion of the partition key
Helpful for #8705 because we can prune partitions earlier during query
planning w/o having to consider their parquet files at all (see the
sketch below).
* refactor: "projected schema" cache inputs must be normalized
Normalizing under the hood and returning normalized schemas w/o the user
knowing about it is a good source of subtle bugs.
* refactor: do not normalize projected schema by name
Normalizing makes it harder to predict the output and potentially
requires additional string lookups just to work with the schema.
* fix: typos
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Martin Hilton <mhilton@influxdata.com>
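As referenced above, a sketch of decoding the time portion (assuming the default daily "%Y-%m-%d" partition key template; the helper is hypothetical):

```rust
use chrono::{NaiveDate, TimeZone, Utc};

// Decode the day encoded in a partition key into a [min, max)
// nanosecond range so the planner can prune whole partitions before
// touching their parquet files.
fn partition_key_time_range(key: &str) -> Option<(i64, i64)> {
    let day = NaiveDate::parse_from_str(key, "%Y-%m-%d").ok()?;
    let start = Utc.from_utc_datetime(&day.and_hms_opt(0, 0, 0)?);
    let end = start + chrono::Duration::days(1);
    Some((start.timestamp_nanos_opt()?, end.timestamp_nanos_opt()?))
}
```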
This commit adds an optional (disabled by default) limit on the number
of partitions that may be buffered for a namespace at any one time.
The exact value is configurable by setting
INFLUXDB_IOX_MAX_PARTITIONS_PER_NAMESPACE to a non-zero value; the
limit is disabled unless specified.
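A sketch of the knob's shape (clap-style; field and flag names are assumed, only the env var is from the change):

```rust
use std::num::NonZeroUsize;

#[derive(Debug, clap::Parser)]
struct IngesterConfig {
    /// Maximum number of buffered partitions per namespace at any one
    /// time; the limit is disabled unless set.
    #[clap(
        long = "max-partitions-per-namespace",
        env = "INFLUXDB_IOX_MAX_PARTITIONS_PER_NAMESPACE"
    )]
    // Option makes the limit opt-in; NonZeroUsize rejects 0 at parse time.
    max_partitions_per_namespace: Option<NonZeroUsize>,
}
```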
In an ArcMap, an init() function is called exactly once; this sentence
was supposed to convey that threads race to call init(), but instead it
sounds like they race to initialise a V (via init()) and put it in the
map before the other thread, which is incorrect.
There is no need to hash a hash.
Found while investigating https://github.com/influxdata/EAR/issues/4505,
where the hashing code turned up in the profile. In general, hashing IDs
should be pretty cheap.
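A sketch of the underlying idea (assuming the IDs are already well distributed, e.g. derived from a hash): an identity Hasher passes the ID through instead of hashing it again:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, _bytes: &[u8]) {
        unreachable!("only fixed-width integer keys are supported")
    }
    fn write_i64(&mut self, id: i64) {
        // The ID is already uniformly distributed; use it verbatim.
        self.0 = id as u64;
    }
}

// A map keyed by IDs that skips the default (SipHash) re-hashing.
type IdMap<V> = HashMap<i64, V, BuildHasherDefault<IdentityHasher>>;
```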
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>