There was a change to speed up deleting and dropping measurements
that executed the deletes in parallel for all shards at once. #7015
When TSI was merged in #7618, the series keys passed into Shard.DeleteMeasurement
were removed and were expanded lower down. This causes memory to blow up
when a delete across many shards occurs as we now expand the set of series
keys N times instead of just once as before.
While running the deletes in parallel would be ideal, there have been a number
of optimizations in the delete path that make running deletes serially pretty
good. This change just limits the concurrency of the deletes which keeps memory
more stable.
When snapshots and compactions are disabled, the check to see if
the compaction should be aborted occurs in between writing to the
next TSM file. If a large compaction is running, it might take
a while for the file to be finished writing causing long delays.
This now interrupts compactions while iterating over the blocks to
write which allows them to abort immediately.
When a time zone shift happened at the edge of a time bucket, the logic
was incorrect. When we added an hour to the day with a time grouping of
2 hours, we took the hour away from the previous bucket instead of the
next bucket so the first bucket of the day would be from midnight to 1
AM and the second bucket would include 1 AM to 4 AM (2 AM to 3 AM does
not exist when shifting forwards). The correct buckets should have been
12 AM to 2 AM for the first bucket of the day and 3 AM to 4 AM for the
second (since 2 AM to 3 AM does not exist).
This also modifies the tests to use table tests and sub tests.
* introduced UnsignedValue type
* leveraged existing int64 compression algorithms (RLE, Simple 8B)
* tsm and WAL can read and write UnsignedValue
* compaction is aware of UnsignedValue
* unsigned support to model, cursors and write points
NOTE: there is no support to create unsigned points, as the line
protocol has not been modified.
This calculates the start and end time along with any time zones shifts
so that continuous queries are run at the correct time when a time zone
is included in the query.
Removing the forced `Connection: close` header from the `/query`
endpoint. This was originally added because of golang/go#13165, but it
seems like it's possible to use pipelining with go 1.8 and http 1.1,
just not recommended.
After some testing, it appears that the channel returned by
`ResponseWriter.CloseNotify()` will not send a value if the connection
was not interrupted. We already account for this in /query by exiting
from the goroutine if the request has finished by signaling another
channel.
Since the handler already accounts for the possibility that the channel
will not signal and since `CloseNotify()` doesn't interfere with a
pipelined request, we can remove the forced `Connection: close` that was
added to force clients to establish a new connection.
The Go timestamp leads Truncate to start a week on Monday, but the query
engine uses unix timestamps which has the week start on a Thursday.
Updating the service so it uses a custom truncate method that uses the
unix timestamp instead of `time.Time`.
Fixes#8569.
The sortedSeriesIds slice was not getting reset to 0 which caused
the same series ids to exist in the slice more than once. Since
the size of the slice never matched the size of the seriesID map,
it kept appendending to the slice and sorting it which cause multiple
cursor to get created for the same series.
Fixes#8531
There was a race in the WAL writeToLog and scheduleSync which could
lead to a writing goroutine blocking indefinitely on its syncErr channel.
The issue was that the clearing of the syncCount happenend after the
wal was unlock. If a goroutine was able to lock, write and call scheduleSync
before the existing scheduleSync goroutine returns and ran the defer to
clear the syncCount, then a new scheduleSync goroutine would not get started.
This left the writing goroutine block with nothing to signal it.
While in this state, a RLock on the engine was held. If a Lock was requested
on the engine during this time, all future writes and queries would block waiting
on the blocked wal writer.
The fix is to move the atomic clearing of syncCount before the Lock is released.
The min key was not used in OverlapsKeyRange which caused it to return
false when it should be true. This causes a bug where deletes would not
write tombstones for files that actually contained the data it was supposed
to delete.
The in-memory index can get out of sync when deletes and writes
to the same measurement are running concurrently. The index is
updated independently from data on disk and it's possible for the
index to unassign a shard when data still exists on disk. What happens
is that there are TSM files on disk, but the index does not know that
the series that exist in those files still are in the shard. Restarting
the server reloads the index and the data is visible again. From and
end user perspective, this can look like more data is deleted than should
have been or that deleted data re-appears after a restart or writes to the
shard occur again.
There isn't an easy way to resolve this since the index and storage
are not transactional resources and we cannot atomically commit or
rollback changes to both at once.
As a workaround, after new TSM files are installed, we refresh the
index with series keys that exist in the new tsm files as well as
any lingering data still in the cache. There is a small window of time
when the index may be missing series, but it will re-appear after the refresh
completes.