When adding many series using offline tooling, it's likely that every
series involves an entry being appended to a LogFile. Typically an entry
is 11 or 12 bytes, but the default bufio.Writer buffer size is only 4K.
This means by default a write of 10,000 new series would involve ~30
buffer flushes.
This commit makes the buffer configurable, and sets the value in
`buildtsi` such that it reflects the number of series being written to
the LogFile.
When running offline tooling, flushing buffers and syncing files on
every write to a `LogFile` is not necessary. Were a hard exit
with data loss to occur, the tooling can simply be run again.
TSI LogFile compactions occasionally race with insert and delete
operations because the index partition FileSet is retained needlessly by
the method that calls Partition.CheckLogFile.
In this change:
- TSI LogFile compaction respects enable/disable compactions
- Partition FileSet.Release before log compaction is triggered
An alternative to the second step is to handle log file compaction in a
new goroutine. Log file compaction errors would be logged and not
returned to the caller.
After this change, `DELETE FROM /regex/` does not deadlock; performance:
- 30s to delete 100 measurements
- 5m30s to delete 1000 measurements
* Check for errors from binary.Uvarint when reading TSI logs
* also check len(parsed) == len(input)
* wrap binary.Uvarint
* make uvarint() more generally useful/used
This commit adds the `max-index-log-file-size` configuration flag so
that users can restrict the maximum size of log files before compaction.
The default limit was also lowered from `5MB` to `1MB`. The original
size was set before we partitioned the index so the change reflects this.
There was a check in inmem TagSets to see if a series was assigned
to a shard to prevent cursors for non-existent series getting created.
This check was lost during TSI development because inmem Series tracking
was removed and then replaced with bitsets. The bitsets were not
re-incorporated as before. This adds the functionality back using
the bitsets.
This commit improves the startup time when using the `inmem` index by
ensuring that the series are created in the index and series file in
batches of 10000, rather than individually.
Fixes#9486.
When a max series per data limit was in place (or 0), we would create
series one at a time which really affects throughput. This does it
in bulk which is less accurate, but more performant.
This was added for preventing concurrent writes and deletes to the
same series. This is not handled by the bitsets for both tsi and
inmme. The time.Now() calls shows up in profiles and is not needed.
This commit adds initial empty sketches back to the tsi1 index, as well
as ensuring that ephemeral sketches in the index `LogFile` are updated
accordingly.
The commit also adds a test that verifies that the merged sketches at
the store level produce the correct results under writes, deletions and
re-opening of the store.
This commit does not provide working sketches for post-compaction on the
tsi1 index.
If all the series in a measurement were tombstone, MeasurementHasSeries
would return true because the ok var was re-used from a prior check
earlier in the func. This caused it to be true all the time unless
the measurment was actually tombstoned.
This separates out the dropping of a measurement from the series
to avoid frequent checks to see if a measurement still has series.
The series are dropped individually and we keep track of which
measurements are involved and then delete each measurment afterwards.
Now that each shard-local index is maintaining a bitset of series ids,
tracking the series present in the local shard's tsm engine, there is no
need to track shards in the `inmem` index.
This commit removes the methods associated with tracking those
series/shard relationships.
There are two series key formats: the `models` package format, which is
also line-protocol format, and the `tsdb` package format, which is used
by the series file when serialising series keys.
When writing to a series, rather than taking a `models` format key from
the `coordinator` package and then converting it to a `tsdb` package
format, it would be cheaper to keep the key in the `models` format
before hashing it to determine which partition the key lives in.
This commit adds the ability to correctly mark a series as deleted in
the global series file. Whenever a shard engine determines that a series
should be deleted, it checks with each shard's bitset for series that
are to be deleted and are no longer contained in any shard-local
bitsets.
These series are then removed from the series file.
This commit adds a bitset into each shard's in-memory index, to be used to
track undeleted series ids. Currently tsi1 support is not implemented.
When new series are added to the shard, the series id is added
to the bitset. When series are deleted from the shard, the series
ids are removed from the bitset.
Becasue each shard shares the same inmem index reference, the bitset
is stored in the `ShardIndex`, which is local to each shard, and then
different references are passed into the shared `Index` object, depending
on which shard is writing the series.
* only call ParseTags when necessary
* remove dependency on inmem.Series in tsdb test package
* Measurement and Series are no longer exported. Their use is restricted
to the inmem package
* improve Measurement and Series types by exporting immutable
fields and removing unnecessary APIs and locks
Reduced startup time from 28s to 17s. Overall improvement including
#9162 reduces startup from 46s to 17s for 1MM series across 14 shards.
This commit ensures that the series file should work appropriately on
32-bit architecturs. It does this by reducing the maximum size of a
series file to 512MB on 32-bit systems, which should be fully
addressable.
It further updates tests so that the series file size can be reduced
further when running many tests in parallel on 32-bit architectures.
This commit firstly ensures that a shard's size on disk is accurately
reported when using the tsi1 index, by including the on-disk size of the
tsi1 index in the calculation.
Secondly, this commit add support for shard streaming/copying when using
the tsi1 index. Prior to this, a tsi1 index would not be correctly
restored when streaming shards.