A write lock was being taken to read the memory size to determine if writes
should be paused, but reading the size only requires a read lock. Because the
write lock is exclusive, writers could get blocked indefinitely while trying
to acquire it, which made writes pause (or stop) for long periods of time.
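A minimal sketch of the fix, assuming a hypothetical engine type and field names; only the read lock is needed for this check:

```go
package engine

import "sync"

// Engine is a hypothetical stand-in for the type that tracks in-memory size.
type Engine struct {
	mu         sync.RWMutex
	memorySize int64
}

// memSize reads the size under the read lock. The old code took the write
// lock here, which serialized every writer behind this bookkeeping read.
func (e *Engine) memSize() int64 {
	e.mu.RLock()
	defer e.mu.RUnlock()
	return e.memorySize
}
```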
The log was deferring the release of the read lock on the WAL. This had
the effect that the read lock was held until after the partition finished writing
(partitions maintain their own locks). The read lock is only needed around the call
to pointsToPartitions so it can get a consistent copy of the points to write. After
that call returns, the lock is no longer needed, so release it immediately.
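Roughly, the change looks like the sketch below; the `Log`, `Point`, and `Partition` shapes are simplified assumptions, while `pointsToPartitions` is the method named above:

```go
// Log guards shared WAL state with a read/write mutex.
type Log struct {
	mu sync.RWMutex
}

// WritePoints releases the read lock as soon as pointsToPartitions has
// produced a consistent copy, instead of deferring it past the writes.
func (l *Log) WritePoints(points []Point) error {
	l.mu.RLock()
	partitionsToWrite := l.pointsToPartitions(points)
	l.mu.RUnlock() // the snapshot is taken; no reason to hold the lock

	for p, pts := range partitionsToWrite {
		if err := p.Write(pts); err != nil { // partitions lock themselves
			return err
		}
	}
	return nil
}
```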
addToCache is called in a goroutine and can panic if the server is closed while opening. If
part of the open func errors, it returns an error and immediately calls close. close sets
p.cache to nil, which causes the goroutine trying to initialize the cache to panic as well. The
goroutine should run under a write lock to avoid this race/panic.
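A minimal sketch with assumed field and type names; close() is presumed to take the same write lock before setting the cache to nil:

```go
// Partition guards its cache with mu; close() takes mu, then nils cache.
type Partition struct {
	mu    sync.RWMutex
	cache map[string][][]byte
}

// addToCache is still launched with `go p.addToCache(...)` from open(), but
// now does its work under the write lock so it cannot race with close().
func (p *Partition) addToCache(points map[string][][]byte) {
	p.mu.Lock()
	defer p.mu.Unlock()

	// close() already ran and tore the cache down; nothing to initialize.
	if p.cache == nil {
		return
	}
	for key, values := range points {
		p.cache[key] = append(p.cache[key], values...)
	}
}
```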
If LoadMetadataIndex() tries to log an error, it causes a panic because the
logger is not set until Open() is called, which happens after LoadMetadataIndex() returns.
Instead, just set the logger up when the WAL is created.
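Something along these lines, with the constructor and field names assumed:

```go
import (
	"log"
	"os"
)

// NewLog assigns the logger at construction time, so methods that run
// before Open(), such as LoadMetadataIndex(), can log without panicking.
// (Constructor and field names here are assumptions.)
func NewLog(path string) *Log {
	return &Log{
		path:   path,
		logger: log.New(os.Stderr, "[wal] ", log.LstdFlags),
	}
}
```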
This commit changes the default block size for bz1 from 64KB to 4KB.
With the larger default, small blocks were being uncompressed, merged
with new points, recompressed, and reinserted for a large portion of
updates. That rewrite cycle grew slower and slower as a block approached
the 64KB threshold. Moving the threshold down to 4KB lowers the impact
of this recompression.
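The change itself amounts to lowering a default constant; the exact name is an assumption:

```go
const (
	// DefaultBlockSize is the default amount of points data allowed in a
	// block before a new block is started (name assumed; was 64 * 1024).
	DefaultBlockSize = 4 * 1024
)
```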
The buffer allocation in bz1 was unused, and I'm fairly certain it would
have been harmful to performance if used. For queries that run through a bz1
block, holding on to a 64KB buffer is expensive. Better to churn
on the allocator and have the blocks released as soon as they are unused
than to have 64KB hanging around for each series regardless of its size.
Thanks to @jwilder for brainstorming this issue with me.
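For illustration, a per-read allocation in this style lets the GC reclaim each block once the query drops it, using the snappy package bz1 compresses with:

```go
import "github.com/golang/snappy"

// readBlock decodes one compressed block into a freshly sized slice.
// Passing nil lets snappy allocate exactly what this block needs, and the
// GC reclaims it when the query is done, instead of a 64KB scratch buffer
// living on per series.
func readBlock(compressed []byte) ([]byte, error) {
	return snappy.Decode(nil, compressed)
}
```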
By using preallocated buffers for marshaling WAL entries, we can
reduce the amount of memory we allocate.
On a run of `influx_stress -series 10000 -points 1000` this cuts
total allocations from 18684.15MB to 15200.73MB
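A rough sketch of the pattern, with a hypothetical entry layout; the caller keeps feeding the returned slice back in so the backing array is reused:

```go
import "encoding/binary"

// marshalEntry appends a length-prefixed key and its data to buf and
// returns the grown slice. Reusing buf's backing array across entries is
// what avoids the per-entry allocations. (The entry layout is assumed.)
func marshalEntry(buf, key, data []byte) []byte {
	buf = buf[:0] // keep capacity, drop contents
	var n [4]byte
	binary.BigEndian.PutUint32(n[:], uint32(len(key)))
	buf = append(buf, n[:]...)
	buf = append(buf, key...)
	buf = append(buf, data...)
	return buf
}
```

Callers hold one buffer and write `buf = marshalEntry(buf, key, data)` per entry, so steady-state marshaling stops allocating once the buffer reaches its working size.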
* Update the store to remove the WAL directories associated with a shard or database when they are deleted.
* Fix the Store so that it creates separate WAL directories for databases and retention policies.
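A sketch of the resulting layout, with assumed names for the store fields and helpers:

```go
import (
	"os"
	"path/filepath"
	"strconv"
)

// walPath namespaces each shard's WAL by database and retention policy.
func (s *Store) walPath(database, policy string, shardID uint64) string {
	return filepath.Join(s.walRoot, database, policy, strconv.FormatUint(shardID, 10))
}

// deleteDatabaseWAL removes all WAL state for a database in one call,
// which the per-database directory layout makes possible.
func (s *Store) deleteDatabaseWAL(database string) error {
	return os.RemoveAll(filepath.Join(s.walRoot, database))
}
```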
This commit changes the bz1 append to check for a small
ending block first. If that block is below the block-size
threshold, it is rewritten with the new data points merged in,
instead of a new block being written.
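In sketch form, with snappy standing in for the block codec (whether the threshold applies to the compressed or uncompressed length is simplified here):

```go
import "github.com/golang/snappy"

// appendToSeries merges new points into a small trailing block, or starts
// a new block when the tail is already at or above the threshold.
func appendToSeries(blocks [][]byte, newPoints []byte, threshold int) ([][]byte, error) {
	if n := len(blocks); n > 0 && len(blocks[n-1]) < threshold {
		prev, err := snappy.Decode(nil, blocks[n-1])
		if err != nil {
			return nil, err
		}
		// Rewrite the small ending block with the new points folded in.
		blocks[n-1] = snappy.Encode(nil, append(prev, newPoints...))
		return blocks, nil
	}
	return append(blocks, snappy.Encode(nil, newPoints)), nil
}
```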
If a flush is happening and you bring up a cursor for a series, and that series has no data in the cache (written after the flush started), the cursor would return no data. It should instead return the data in the flush cache, which is held in a separate area of memory until it is committed to the index.
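A minimal sketch with assumed fields; the cursor consults the flush cache alongside the live cache:

```go
// cursorValues is a simplified sketch: flushCache holds points that were in
// the cache when the flush began and are not yet committed to the index.
func (p *Partition) cursorValues(key string) [][]byte {
	p.mu.RLock()
	defer p.mu.RUnlock()

	var values [][]byte
	values = append(values, p.flushCache[key]...) // mid-flush, not yet indexed
	values = append(values, p.cache[key]...)      // written after the flush began
	return values
}
```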