This calculates the start and end time along with any time zones shifts
so that continuous queries are run at the correct time when a time zone
is included in the query.
Removing the forced `Connection: close` header from the `/query`
endpoint. This was originally added because of golang/go#13165, but it
seems like it's possible to use pipelining with go 1.8 and http 1.1,
just not recommended.
After some testing, it appears that the channel returned by
`ResponseWriter.CloseNotify()` will not send a value if the connection
was not interrupted. We already account for this in /query by exiting
from the goroutine if the request has finished by signaling another
channel.
Since the handler already accounts for the possibility that the channel
will not signal and since `CloseNotify()` doesn't interfere with a
pipelined request, we can remove the forced `Connection: close` that was
added to force clients to establish a new connection.
The Go timestamp leads Truncate to start a week on Monday, but the query
engine uses unix timestamps which has the week start on a Thursday.
Updating the service so it uses a custom truncate method that uses the
unix timestamp instead of `time.Time`.
Fixes#8569.
The sortedSeriesIds slice was not getting reset to 0 which caused
the same series ids to exist in the slice more than once. Since
the size of the slice never matched the size of the seriesID map,
it kept appendending to the slice and sorting it which cause multiple
cursor to get created for the same series.
Fixes#8531
There was a race in the WAL writeToLog and scheduleSync which could
lead to a writing goroutine blocking indefinitely on its syncErr channel.
The issue was that the clearing of the syncCount happenend after the
wal was unlock. If a goroutine was able to lock, write and call scheduleSync
before the existing scheduleSync goroutine returns and ran the defer to
clear the syncCount, then a new scheduleSync goroutine would not get started.
This left the writing goroutine block with nothing to signal it.
While in this state, a RLock on the engine was held. If a Lock was requested
on the engine during this time, all future writes and queries would block waiting
on the blocked wal writer.
The fix is to move the atomic clearing of syncCount before the Lock is released.
The min key was not used in OverlapsKeyRange which caused it to return
false when it should be true. This causes a bug where deletes would not
write tombstones for files that actually contained the data it was supposed
to delete.
The in-memory index can get out of sync when deletes and writes
to the same measurement are running concurrently. The index is
updated independently from data on disk and it's possible for the
index to unassign a shard when data still exists on disk. What happens
is that there are TSM files on disk, but the index does not know that
the series that exist in those files still are in the shard. Restarting
the server reloads the index and the data is visible again. From and
end user perspective, this can look like more data is deleted than should
have been or that deleted data re-appears after a restart or writes to the
shard occur again.
There isn't an easy way to resolve this since the index and storage
are not transactional resources and we cannot atomically commit or
rollback changes to both at once.
As a workaround, after new TSM files are installed, we refresh the
index with series keys that exist in the new tsm files as well as
any lingering data still in the cache. There is a small window of time
when the index may be missing series, but it will re-appear after the refresh
completes.
This adds a v3 format that is a gzip compressed version of the v2
format. It reduces the size of tombstone files substantially without
having to support a more feature rich file format for tombstones.
The monitor goroutine calls enable compactions every 10s to spin down
(or start up) goroutines for cold shards. This frequent Lock may be
causing lock contention for writes and queries which get blocked trying
to acquire an RLock.
The go RWMutex says that new RLock calls will block if there is a
pending Lock call that is blocked. Switching the common path to use
an RLock should avoid the Lock and reduce lock contention for writes
and queries.
The react library was removed when the admin gui was removed;
however, a reference to the library lingered in the third
party license file.
closesinfluxdata/plutonium#1426
This commit adds a new environment variable INFLUXDB_PANIC_CRASH, which
when set to a truthy value, e.g., true, TRUE, 1, will prevent the server
from recovering from a panic.
Recover currently occurs in two places: the HTTP handler and the
QueryExecutor. INFLUXDB_PANIC_CRASH will control both.
Further, this commit adds _internal stats that will monitor the
occurrence of panics all the time (regardless of if INFLUXDB_PANIC_CRASH
has been set to true or not).
The recovered panic frequency can be inspected with the following
queries:
SELECT "recoveredPanics" FROM "_internal"."monitor"."httpd";
SELECT "recoveredPanics" FROM "_internal"."monitor"."queryExecutor";
Currently two write locks in `inmem` are obtained and then
manually unlocked at function exit points. However, we have
reports that the `inmem` index is hanging on a write lock and
cannot track the issue down to anything else besides a lock
that could have been left unlocked because of a panic.
This commit changes the two locks to always defer their unlocks
to prevent these hangs.
This dependency appears to only be used when building for Solaris (mmap
dependency). It looks like this wasn't caught earlier due to gdm using
the current build tags to determine which files to process when saving
dependencies.