The monitor goroutine calls enable compactions every 10s to spin down
(or start up) goroutines for cold shards. This frequent Lock may be
causing lock contention for writes and queries which get blocked trying
to acquire an RLock.
The go RWMutex says that new RLock calls will block if there is a
pending Lock call that is blocked. Switching the common path to use
an RLock should avoid the Lock and reduce lock contention for writes
and queries.
This commit adds a new environment variable INFLUXDB_PANIC_CRASH, which
when set to a truthy value, e.g., true, TRUE, 1, will prevent the server
from recovering from a panic.
Recover currently occurs in two places: the HTTP handler and the
QueryExecutor. INFLUXDB_PANIC_CRASH will control both.
Further, this commit adds _internal stats that will monitor the
occurrence of panics all the time (regardless of if INFLUXDB_PANIC_CRASH
has been set to true or not).
The recovered panic frequency can be inspected with the following
queries:
SELECT "recoveredPanics" FROM "_internal"."monitor"."httpd";
SELECT "recoveredPanics" FROM "_internal"."monitor"."queryExecutor";
Currently two write locks in `inmem` are obtained and then
manually unlocked at function exit points. However, we have
reports that the `inmem` index is hanging on a write lock and
cannot track the issue down to anything else besides a lock
that could have been left unlocked because of a panic.
This commit changes the two locks to always defer their unlocks
to prevent these hangs.
This dependency appears to only be used when building for Solaris (mmap
dependency). It looks like this wasn't caught earlier due to gdm using
the current build tags to determine which files to process when saving
dependencies.
The Points channel is nil until after the subscriber service is opened.
If it is append before it's opened, the PointsWriter holds onto the
old reference.
This fixes the case where log files are compacted out of order
and cause non-contiguous sets of index files to be compacted.
Previously, the compaction planner would fetch a list of index files
for each level and compact them in order starting with the oldest
ones. This can be a problem for level 1 because level 0 (log files)
are compacted individually and in some cases a log file can finish
compacting before older log files are finished compacting. This
causes there to be a gap in the list of level 1 files that is
ignored when fetching a list of index files.
Now, the planner reads the list of index files starting from the
oldest but stops once it hits a log file. This prevents that gap
from being ignored.