* feat: Update replication.proto
* Remove the PartitionId in the replicate request as a single replicate request can have the data for many partitions.
* Add namespace_id and table_id to persist complete request to make data easier to lookup in buffer.
* feat: Initial ingest_replica skeleton
A bunch of copy pasta here from ingester2, but this takes out a ton of stuff that isn't used in replicas.
Also lays the groundwork for the simpler buffer structure to keep the data and a basic cache for catalog information that will be required.
* feat: update replication.proto GetPartitionBufferResponse
* chore: PR cleanup
* chore: PR cleanup
* feat: optimize wal with batching
Simplified the wal writer so that it batches up write operations. Currently it waits 10ms between fsync calls. We can pull this out to a config variable later if we want, but I think this is good enough for now.
Also updated the reader to be a more simple blocking reader without the extra tasks and channels as that wasn't really getting us anything that I know of.
* chore: cleanup wal code for PR feedback
* feat: Introduce InfluxQL to Flight
All InfluxQL queries will fail with an error
* chore: Temper protobuf lint
* chore: Finalize flight.proto changes; fix tests
* chore: Add tests for InfluxQL planner
* chore: Update docs
* chore: Update docs
* chore: Rename back to original
* chore: Use .into() rather than cast
* chore: Use function rather than field
* chore: Improved InfluxQL planner name
* chore: Restore `impl Into<String>` argument
* chore: Add a comment that Go clients are unable to execute InfluxQL
* chore: Add a test for the `--lang` argument and InfluxQL
* feat: compactor ignores max file count for first file
chore: typo in comment in compactor
* feat: restore special first file in partition compaction logic; add limit
* fix: calculation in compaction max file count
chore: clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidently removed in rebase
* chore: remove mem catalogs default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: marshal InfluxDbError into status details
* chore: address feedback and CI issues
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: move ns api from querier to router
* chore: add explanatory comment in querier about moved namespace API
* fix: add namespace service to router
* fix: querier returns unimplemented error for ns retention, not panic
* chore: reuse namespace -> proto in router ns api
* chore: grpc namespace - consume ns to avoid clone
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: NS+table ID (instead of name) in querier<>ingester
* feat(ingester): use IDs for query API
Changes the ingester to utilise the ID fields (instead of names) sent
over the query wire message wrapped within the Flight API.
BREAKING: this changes the "query-ingester" CLI command arguments which
now expects the namespace & table IDs, rather than their names.
* refactor(ingester): add more query logging context
Updates the log messages during query execution to include more context
fields.
* style: remove unused import
Co-authored-by: Marco Neumann <marco@crepererum.net>
* refactor: make namespace folder for all namesapce's commands
* feat: WIP for add command to set retention period
* feat: more on updating retention period
* feat: grpc for update namespace retention period
* test: end to end test fpr namespace retention
* fix: lint proto
* chore: cleanup
* chore: kick CI run again
* fix: command hierachy
* chore: fix comments
Changes the DmlDelete to contain the NamespaceId for which it should be
applied, propagating this value over the wire.
Like the existing IDs within the DmlWrite, these values are marked
unsafe to use due to avoid the consumers utilising them accidentally
during deployment. Unlike DmlWrite, the DmlDelete is completely unused,
so this is less of an issue.
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.
In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
This commit removes tombstone support from the ingester, and deletes
associated code/helpers/tests. This commit does NOT remove tombstone
support from any other service, but MAY include removing overlapping
test coverage.
This also removes the tombstone support from the Ingester -> Querier RPC
response message.
This has the nice side effect of removing a whole lot of thread spawning
in the ingester tests for the Executor, speeding everything up!
* refactor: Remove grpc WriteService
* fix: update end to end test
* fix: Update generated_types/protos/influxdata/pbdata/v1/influxdb_pb_data_protocol.proto
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Removes the input oneof - a shard caller MUST always provide a
table/namespace, and MAY provide an optional payload (which in the
future will enable sharding using column valuess/etc). As there is
currently no payload-based sharding, this simplifies the RPC message.
Changes the returned types to better reflect the types we use internally
- this should avoid type juggling for both server & client.
* refactor: remove min_sequnce_number
* fix: typos
* fix: remove min_sequencer_number from new files from merging main
* fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier
* test: add test_compactor_collision back but modify the input to make it work woth new changes
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
There is no way a user can filter for partition keys (neither via
InfluxRPC nor via SQL) and the query engine doesn't use this field at
all. So let's remove it.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Previously the column data type was exposed using an internal i32 value.
This commit changes the Schema API to use a self-descriptive proto enum
for the column data type.
* refactor: store per-file column set in catalog
Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.
The querier will use this to directly load parquet files into the read
buffer.
**WARNING: This requires a catalog wipe!**
Ref #4124.
* refactor: use proper `ColumnSet` type