The XML parsing lib used by the Azure SDK is unmaintained and reportedly
contains integer overflow / panic issues in its parsing functionality.
Low risk, so we ignore it, as it is only used when talking to Azure. The
Azure SDK is in the process of being removed as a dependency.
* test: workaround for time > a number
* chore: cargo update
* chore: Revert "chore: cargo update"
This reverts commit 0798e4e14674267ddd2308b12a25031fc35de8b6.
* feat: Parse InfluxQL identifiers
Closes #5424
* chore: Add common derive implementations
* feat: Implement fmt::Display trait for Identifier
* feat: Display implementation, nouns for combinator functions
* chore: Update docs
* chore: Double quoted are identifiers, single quoted are literals
Single quoted strings will be parsed in a separate module.
* feat: upsert partition & update sort key for each day in bulk ingest
feat: import schema now supports earliest/latest time merging
chore: tests & tidying up for bulk ingest catalog update
* fix: always sort time last in PK in import schema update catalog
* chore: additional test for computing sort key in bulk ingest
* chore: bulk import catalog update gets sequencer from sharder service
chore: import update schema tests refactor using sharder svc mock
* chore: dead code fix
* chore: import schema sequencer lookup test
* chore: clarifying comment in import schema catalog update
Implements the ShardService to expose the shard mapping produced by
routers to external clients.
This impl uses an internal cache to eliminate unnecessary Catalog
queries, as the Kafka partition/Sequencer/Shard mapping is static once a
router has initialised.
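The exact cache shape isn't spelled out here; the following is a minimal sketch of the read-through idea under the assumption of a table-name-to-shard map (`ShardCache` and `load_mapping_from_catalog` are illustrative names, not the real IOx types):
```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical catalog lookup; stands in for the real Catalog query.
fn load_mapping_from_catalog() -> HashMap<String, u32> {
    HashMap::from([("table_a".to_string(), 0), ("table_b".to_string(), 1)])
}

/// Read-through cache: the shard mapping is static once the router has
/// initialised, so the catalog is queried at most once per process.
struct ShardCache {
    mapping: RwLock<Option<HashMap<String, u32>>>,
}

impl ShardCache {
    fn new() -> Self {
        Self { mapping: RwLock::new(None) }
    }

    fn shard_for(&self, table: &str) -> Option<u32> {
        // Fast path: serve from the cached mapping without touching the catalog.
        if let Some(m) = self.mapping.read().unwrap().as_ref() {
            return m.get(table).copied();
        }
        // Slow path: populate once, then answer from memory.
        let mut guard = self.mapping.write().unwrap();
        let m = guard.get_or_insert_with(load_mapping_from_catalog);
        m.get(table).copied()
    }
}

fn main() {
    let cache = ShardCache::new();
    assert_eq!(cache.shard_for("table_a"), Some(0));
    // Second lookup hits only the in-memory map.
    assert_eq!(cache.shard_for("table_a"), Some(0));
}
```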
This doesn't really need to be fallible but forces propagation of a ton
of error handling - no shards is always a sign of something being very
wrong, and can be caught in the caller if it's for some reason an
acceptable state / can be recovered from.
Removes the input oneof - a shard caller MUST always provide a
table/namespace, and MAY provide an optional payload (which in the
future will enable sharding using column values/etc). As there is
currently no payload-based sharding, this simplifies the RPC message.
Changes the returned types to better reflect the types we use internally
- this should avoid type juggling for both server & client.
* feat: add `file_count_threshold` for compacting cold partitions, making it consistent with the hot case and easier to avoid OOMs
* chore: remove unnecessary comments
The Sequencer (which will be renamed shortly) is a type that represents
a single sequencer/shard/kafka partition in the router.
In order to minimise confusion with all the various IDs floating around,
we have a KafkaPartition - this commit changes the Sequencer to return
the Kafka partition index as a typed value, rather than a usize, to help
eliminate any inconsistencies.
As a side effect of these conversion changes, I've tightened up the
casting to ensure we assert on any overflows - we juggle a lot of
numeric types!
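As a rough illustration of the approach (the real `KafkaPartition` type lives in IOx's data_types crate and may differ in detail), a newtype plus a checked conversion might look like this:
```rust
/// Illustrative newtype for the Kafka partition index.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct KafkaPartition(i32);

impl KafkaPartition {
    /// Convert from a usize, asserting on overflow instead of silently
    /// truncating -- mirrors the "assert on any overflows" approach.
    fn from_index(index: usize) -> Self {
        let v = i32::try_from(index).expect("partition index overflows i32");
        Self(v)
    }

    fn get(&self) -> i32 {
        self.0
    }
}

fn main() {
    let p = KafkaPartition::from_index(42);
    assert_eq!(p.get(), 42);
    // KafkaPartition::from_index(usize::MAX) would panic rather than wrap.
}
```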
Previously aggregated writes were merged into a single Kafka Record -
this meant that all merged ops would be placed into the same Record, and
therefore receive the same sequence number once published to Kafka.
The new aggregator batches at the Record level, therefore aggregated
writes now get their own distinct sequence number. This commit updates
the batching tests to reflect this new sequence number assignment
behaviour.
Reduce the precision of timestamps in tests before comparing the DML
metadata objects.
This allows tests to accept different timestamp precisions, such as when
ops pass "through" Kafka vs. files, etc.
The previous aggregator impl would assert that writes had been
partitioned before aggregating them (or rather, that the DML write had a
partition key assigned).
This should be true for all writes passing through the write buffer,
irrespective of which aggregator is used, therefore this assert is moved
"up" into the write buffer itself.
* feat: refresh policy for caches
For #5318 we want to have a policy that refreshes keys before they are
too old. I initially tried to fold both TTL and the refresh system into
a single policy but then decided that they would basically be two
policies in one with a harder-to-test interface. Semantically TTL and
refresh are also a bit different (but will usually be used together):
- **TTL:** Prevents a user from getting data that is too old. It is some kind
of "soft correctness". In some sense this is related to the "remove
if" policy where some part of the system knows for sure (or with
reasonable likelihood) that a cache entry is outdated. Note that TTL's
primary job is NOT to clean up memory from old keys (even though it
indirectly does that). There is no reason cached entries should be
removed except for correctness (TTL and remove-if) or resource
pressure -- and the latter is handled by the LRU policy.
- **Refresh:** While TTL is some kind of deadline, we often have good
reason to refresh the key before we pull the plug, namely when an
entry is used and a bit old (but not too old). The concrete mechanism
to achieve this is flexible. At the moment the policy is rather
simple -- just start a refresh task if a key is old and we receive a
GET request -- but can be extended in the future (see the sketch after
this list).
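A minimal, synchronous sketch of the refresh-on-GET idea, assuming a simple string-keyed cache (`RefreshingCache` and its fields are illustrative, not the actual policy-backend types):
```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Entries older than `refresh_after` are still served, but a refresh is
/// triggered so the value is renewed before the (separate) TTL would evict it.
struct RefreshingCache {
    refresh_after: Duration,
    entries: HashMap<String, (String, Instant)>,
}

impl RefreshingCache {
    fn get(&mut self, key: &str, load: impl Fn(&str) -> String) -> Option<String> {
        let (value, written) = self.entries.get(key)?.clone();
        if written.elapsed() > self.refresh_after {
            // In the real policy this would spawn a background task; here we
            // refresh inline to keep the sketch synchronous.
            let fresh = load(key);
            self.entries.insert(key.to_string(), (fresh.clone(), Instant::now()));
            return Some(fresh);
        }
        Some(value)
    }
}

fn main() {
    let mut cache = RefreshingCache {
        refresh_after: Duration::from_secs(60),
        entries: HashMap::from([("ns".to_string(), ("schema-v1".to_string(), Instant::now()))]),
    };
    // Entry is still young, so the cached value is returned as-is.
    assert_eq!(cache.get("ns", |_| "schema-v2".to_string()), Some("schema-v1".to_string()));
}
```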
This also adds some integration tests for TTL+refresh. There will be
follow-up changes to test the interaction with LRU as well, although I am
pretty certain that there won't be any surprises due to the excessive
testing we have in place for the policy backend itself as well as all
the policies.
This change also does NOT integrate the refresh with the querier for the
sake of keeping the changeset "small" (i.e. it is already rather large).
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Replaces the DmlAggregator with the simpler RecordAggregator.
Metrics gathered as part of #5323 shows there is practically no benefit
to the additional complexity of the DmlAggregator over the simpler
RecordAggregator impl.
This commit adds a new write buffer aggregator used by rskafka to
increase the size of Kafka messages on the wire. The Kafka write buffer
impl is the only impl to perform aggregation.
This Aggregator impl maps IOx-specific DML operations to rskafka Records
with no additional processing - it can be thought of as an IOx-specific
adaptor over rskafka's RecordAggregator.
By delegating batching of Record instances to rskafka's simple
RecordAggregator, we minimise code complexity / bug surface area / LoC.
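A rough sketch of that adaptor shape, using stand-in types rather than the real IOx DML types or rskafka's aggregator trait:
```rust
/// Illustrative stand-ins; the real code uses IOx DML types and
/// rskafka's Record / RecordAggregator.
struct DmlOperation {
    payload: Vec<u8>,
}

struct Record {
    value: Vec<u8>,
}

trait RecordBatcher {
    /// Returns false if the record did not fit into the current batch.
    fn try_push(&mut self, record: Record) -> bool;
}

/// The adaptor only maps a DML op to a Record and hands it to the inner
/// record-level batcher, which owns all batching decisions.
struct DmlAdaptor<A> {
    inner: A,
}

impl<A: RecordBatcher> DmlAdaptor<A> {
    fn push(&mut self, op: DmlOperation) -> bool {
        // Encode the DML op as a plain Record with no additional processing.
        self.inner.try_push(Record { value: op.payload })
    }
}

/// Trivial batcher used only so the sketch runs end to end.
struct VecBatcher(Vec<Record>);

impl RecordBatcher for VecBatcher {
    fn try_push(&mut self, record: Record) -> bool {
        self.0.push(record);
        true
    }
}

fn main() {
    let mut adaptor = DmlAdaptor { inner: VecBatcher(Vec::new()) };
    assert!(adaptor.push(DmlOperation { payload: b"write 1".to_vec() }));
    assert_eq!(adaptor.inner.0.len(), 1);
}
```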
```
bumpalo v3.10.0 -> v3.11.0
either v1.7.0 -> v1.8.0
iana-time-zone v0.1.45 -> v0.1.46
rustix v0.35.8 -> v0.35.9
```
The `rustix` bump is important because `0.35.8` was yanked.
* chore: struct for overrides of import schema conflicts
* chore: import schema override shouldn't support tags
* feat: import schema merge can take an override schema
* fix: schema override in test had superfluous tag
* chore: test for batch schema merge with override in import schema
* feat: import schema merge now takes override schema