This PR adds a configuration option that can be used to inform the
kernel that we intend to page in much of the TSM files.
This madvise value has been problematic in the past when it's been set,
so this option defaults to off. It may be useful to some users with slow
disks.
This commit adds the `max-index-log-file-size` configuration flag so
that users can restrict the maximum size of log files before compaction.
The default limit was also lowered from `5MB` to `1MB`. The original
size was set before we partitioned the index so the change reflects this.
With the recent changes to compactions and snapshotting, the current
default can create lots of small level 1 TSM files. This increases
the default in order to create larger level 1 files and less disk
utilization.
Update support in the `toml` package for parsing human-readable byte sizes.
Supported size suffixes are "k" or "K" for kibibytes, "m" or "M" for
mebibytes, and "g" or "G" for gibibytes. If a size suffix isn't specified
then bytes are assumed.
In the config, `cache-max-memory-size` and `cache-snapshot-memory-size` are
now typed as `toml.Size` and support the new syntax.
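As a rough illustration of the suffix rules above, the following Go sketch
parses a size string into bytes. `parseSize` is a hypothetical helper written
for this note, not the `toml` package's actual `Size` implementation, and it
assumes the suffix is a single trailing character.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parseSize converts text such as "512", "64k", "25m" or "1G" into a byte
// count. Hypothetical helper for illustration only.
func parseSize(text string) (uint64, error) {
	if text == "" {
		return 0, fmt.Errorf("size was empty")
	}

	// No suffix means the value is already in bytes.
	mult := uint64(1)
	digits := text

	switch text[len(text)-1] {
	case 'k', 'K': // kibibytes
		mult, digits = 1<<10, text[:len(text)-1]
	case 'm', 'M': // mebibytes
		mult, digits = 1<<20, text[:len(text)-1]
	case 'g', 'G': // gibibytes
		mult, digits = 1<<30, text[:len(text)-1]
	}

	n, err := strconv.ParseUint(digits, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", text, err)
	}
	// Reject values that would overflow uint64 once scaled.
	if n > math.MaxUint64/mult {
		return 0, fmt.Errorf("size %q overflows uint64", text)
	}
	return n * mult, nil
}

func main() {
	for _, s := range []string{"512", "64k", "25m", "1G"} {
		n, err := parseSize(s)
		fmt.Println(s, n, err)
	}
}
```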
This limit allows the number of concurrent level and full compactions
to be throttled. Snapshot compactions are not affected by this limit
as they need to run continuously.
This limit can be used to control how much CPU is consumed by compactions.
The default is to limit to the number of CPUs available.
Fsyncs to the WAL can cause higher IO with lots of small writes or
slower disks. This reworks the previous WAL fsyncing to remove the
extra goroutine and remove the hard-coded 100ms delay. Writes to
the WAL still maintain the invariant that they do not return to the
caller until the write is fsync'd.
This also adds a new config option, `wal-fsync-delay` (default 0s),
which can be increased if a delay is desired. This is somewhat useful
for systems with slower disks, but the current default works well as
is.
These were all b1/bz1 settings that no longer have any effect:
- {Default,}MaxWALSize
- {Default,}WALFlushInterval
- {Default,}WALPartitionFlushDelay
- {Default,WAL}ReadySeriesSize
- {Default,WAL}CompactionThreshold
- {Default,WAL}MaxSeriesSize
- {Default,WAL}FlushColdInterval
- {Default,WAL}PartitionSizeThreshold
This has a few changes in it (unfortunately). The main change is to run compactions
concurrently. While implementing this, a few query and performance bugs showed up that
are also fixed by this commit.
* Update Plan to do a full compaction if cold for writes
* Remove MaxFileSize as a config variable from Compactor. Should be a set constant
* Update Plan to keep track of if the last check was fully compacted so we can skip future planning calls
* Update compact min file count to 3 so that compactions run more frequently
* remove rolloverTSMFileSize constant that is no longer used
* remove the maxGenerationFileCount since it is no longer a limitation that's necessary with the new compaction scheme. We no longer read WAL segments as part of the compaction so memory is only used as we read in each individual key
* remove minFileCount and switch to a user configurable variable
* remove the mutex from WALSegmentWriter. There's never more than one open in the WAL at one time and it's not exported through any function so the lock on the WAL should be used. This simplified keeping track of the last write time and removed a bunch of unnecessary locks.
* update WALSegmentWriter.Write to take the compressed bytes so that encoding and compression can occur before the call to write (while we don't hold the WAL lock)
* remove a bunch of unnecessary locking in WAL.writeToLog
* Add check for TSM file magic number and version
* Remove old tsm, log, and unused cursor code
* Remove references to tsm1dev everywhere except in the inspector
* Clean up config options for compaction and snapshotting
* Remove old TSM configuration options
* Update the config.sample.toml with TSM options
* Update WAL compact to force if it has been cold for writes for a configurable period of time (1h by default)
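For the magic number and version check listed above, a minimal Go sketch might
look like the following. The constant values, the 5-byte big-endian header
layout, and the `checkHeader` name are assumptions made for illustration and
should be checked against the real TSM file format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	magicNumber uint32 = 0x16D116D1 // assumed value, for illustration
	version     byte   = 1          // assumed value, for illustration
	headerSize         = 5          // 4-byte magic number + 1-byte version
)

// checkHeader validates the leading bytes of a file against the expected
// magic number and version. Hypothetical helper.
func checkHeader(hdr []byte) error {
	if len(hdr) < headerSize {
		return fmt.Errorf("header too short: got %d bytes, want %d", len(hdr), headerSize)
	}
	if got := binary.BigEndian.Uint32(hdr[:4]); got != magicNumber {
		return fmt.Errorf("bad magic number: got %#x, want %#x", got, magicNumber)
	}
	if got := hdr[4]; got != version {
		return fmt.Errorf("unsupported version: got %d, want %d", got, version)
	}
	return nil
}

func main() {
	good := []byte{0x16, 0xD1, 0x16, 0xD1, 1}
	fmt.Println(checkHeader(good))

	bad := []byte{0, 0, 0, 0, 1}
	fmt.Println(checkHeader(bad))
}
```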
100MB is easy to hit even with a basic stress test config. Don't set
a limit by default so that an operator can size it appropriately based
on their hardware.