Commit Graph

760 Commits (7374e4e8a457350ab4ba8cb89bdc3cd7d5c2f905)

Author SHA1 Message Date
Edd Robinson 101af89987 Update CHANGELOG 2017-07-05 16:35:41 +01:00
Edd Robinson 0748d28986 Ensure tmp files cleaned up when compaction disabled 2017-07-04 20:04:23 +01:00
Stuart Carnie 46796d932f add database to index, engine and shard; call AuthorizeSeriesRead 2017-05-26 13:21:50 -07:00
Joe LeGasse 815f740f4c initial fga work
wip

wip

fix tests / build
2017-05-26 13:16:27 -07:00
Jason Wilder 208ef09f87 Prevent writing series keys that exceed max key size
WriteBlock was missing the check for the max series keys which allowed
series keys to be written that were larger than the 2 bytes allocated
to store their length.  When this occurred, the TSM can fail to load.
2017-05-24 13:41:09 -06:00
Jason Wilder 29e4287fd2 Preven masking root errors when compactions are in progress
The root error when creating a tmp file when writing a snapshot
was hidden making it difficult to determine why snapshots were
failing.
2017-05-23 12:09:36 -06:00
Jason Wilder bd6d0681e9 Ensure planned files are released
The defer was never executed because the planning happens in a
long running goroutine that loops.  The plans need to be released
immediately after applying them.
2017-05-23 12:08:25 -06:00
Jason Wilder 4e582f297a Fix race in findGenerations
It was possible that the findGenerations could get stuck returning
no files even when generations existed on disk.
2017-05-23 12:05:47 -06:00
Jason Wilder 1833475c09 Fix TSM tmp files leaking
TMP files could leak when compactions failed for various reasons. They
were also being deleted inadvertently when compactions were disabled causing
other errors to be reported in the logs.
2017-05-22 14:51:18 -06:00
Stuart Carnie c863923e68 cache MarshalSize 2017-05-12 14:05:25 -06:00
Stuart Carnie 0151afe31c check size and allocate once 2017-05-12 14:05:25 -06:00
Stuart Carnie 096d6f65b4 explicit sizes 2017-05-12 14:05:24 -06:00
Jason Wilder 4d002bb370 Limit concurrent compactions within a shard
This changes full compactions within a shard to run sequentially
instead of running all the compaction groups in parallel.  Normally,
there is only 1 full compaction group to run.  At times, there could
be several which causes instability if they are all running concurrently
as they tie up a cpu for long periods of time.

Level compactions are also capped to a max of 4 concurrently running for each level
in a shard.  This prevents sudden spikes in CPU and disk usage due to a large backlog
of tsm files at a given level.
2017-05-12 14:05:24 -06:00
Jason Wilder 2cac46ebbc Convert usage of strings to []byte
Measurement name and field were converted between []byte and string
repetively causing lots of garbage.  This switches the code to use
[]byte in the write path.
2017-05-12 14:05:19 -06:00
Jason Wilder 503d41a08f Add LimitedBytePool for wal buffers
This pool was previously a pool.Bytes to avoid repetitive allocations.
It was recently switchted to a sync.Pool because pool.Bytes held onto
very larger buffers at times which were never released.  sync.Pool is
showing up in allocation profiles quite frequently.

This switches the pool to a new pool that limits how many buffers are
in the pool as well as the max size of each buffer in the pool.  This
provides better bounds on allocations.
2017-05-11 11:27:00 -06:00
Jason Wilder e17be9f4ba Merge pull request #8377 from influxdata/jw-encoders
Speed up time encoding/decoding
2017-05-11 10:38:27 -06:00
Joe LeGasse 087d9f4670 tsm: fixed test to not require sorted backup tarball 2017-05-11 12:00:19 -04:00
Jason Wilder b150a6293c Merge pull request #8380 from influxdata/jw-wal-buffer
Use buffer writer for wal segments
2017-05-11 08:34:44 -06:00
Jason Wilder b81ac21bcb Merge pull request #8378 from influxdata/jw-snapshot-disable
Don't disable snapshots when snapshot compactions are disabled
2017-05-10 12:00:27 -06:00
Jason Wilder e102fcca9c Use buffer writer for wal segments 2017-05-10 11:42:32 -06:00
Jason Wilder 39a829c1ae Speed up time encoding/decoding
This speeds up time encoding and decoding by skipping the divisor
scaling if scaling by 1.  Since division and multiplication are expensive
cpu and scaling by 1 has no effect, this just slows encoding and decoding
down.
2017-05-10 11:12:35 -06:00
Jason Wilder 4e3e707abc Fix packed time encoded benchmark 2017-05-10 10:35:44 -06:00
Jason Wilder 29c2b1958e Fix deletes triggering unnecessary compactions
Tombstone files would be written to all TSM files even if the deleted
keys or timerange did not exist in the TSM file.  This had the side
effect of causing shards to get recompacted back to the same state. If
any shards or large numbers of TSM files existed, disk usage and CPU
utilization would spike causing issues.

This prevents tombstones being written for TSM files that could not
possiby contain the series keys being deleted or if the delted time
range is outside the range of the file.
2017-05-08 14:52:28 -06:00
Jason Wilder c0c6ad6880 Don't disable snapshots when snapshot compactions are disabled
Snapshot compactions can be disabled independently of snapshotting
capability.  This prevents taking backups of shards that have compactions
disabled.
2017-05-05 14:15:45 -06:00
Jason Wilder bc639c5982 Make disableLevelCompactions lighter weight
Since this is called more frequently now, the cleanup func was invoked
quite a bit which makes several syscalls per shard.  This should only
be called the first time compactions are disabled.
2017-05-04 09:56:15 -06:00
Jason Wilder b4ea523910 Include snapshot size in the total cache size
This was causing a shard to appear idle when in fact a snapshot compaction
was running.  If the time was write, the compactions would be disabled and
the snapshot compaction would be aborted.
2017-05-03 16:31:58 -06:00
Jason Wilder 88848a9426 Remove per shard monitor goroutine
The monitor goroutine ran for each shard and updated disk stats
as well as logged cardinality warnings.  This goroutine has been
removed by making the disks stats more lightweight and callable
direclty from Statisics and move the logging to the tsdb.Store.  The
latter allows one goroutine to handle all shards.
2017-05-03 16:31:57 -06:00
Jason Wilder f87fd7c7ed Stop background compaction goroutines when shard is cold
Each shard has a number of goroutines for compacting different levels
of TSM files.  When a shard goes cold and is fully compacted, these
goroutines are still running.

This change will stop background shard goroutines when the shard goes
cold and start them back up if new writes arrive.
2017-05-03 16:31:57 -06:00
Jason Wilder 3d1c0cd981 Don't return compaction plans for files already part of a plan
The compactor prevents the same file from being compacted by different
compaction runs, but it can result in warning errors in the logs that
are confusing.

This adds compaction plan tracking to the planner so that files are
only part of one plan at a given time.
2017-05-03 16:31:57 -06:00
Jason Wilder 8fc9853ed8 Add max-concurrent-compactions limit
This limit allows the number of concurrent level and full compactions
to be throttled.  Snapshot compactions are not affected by this limit
as then need to run continously.

This limit can be used to control how much CPU is consumed by compactions.
The default is to limit to the number of CPU available.
2017-05-03 16:31:57 -06:00
Jason Wilder 3c130cd39c Expose TSMWriter.Flush
Allows flushing the writer so we don't always need to close and
re-open the file handle.
2017-04-28 14:00:50 -06:00
Jason Wilder 141f0d71cd Update index when import files 2017-04-28 14:00:45 -06:00
Jason Wilder a76146e34a Add Store.Import capability
This allows the contents of a backup to be imported into a shard without
requiring the whole shard to be replaced.
2017-04-28 13:30:46 -06:00
Jason Wilder 3839fe34ea Remove FileStore.Add/Remove
Can use Replace which handles files in-use and stats correctly.
2017-04-28 13:20:55 -06:00
Jason Wilder 137d0c0d09 Rename WAL.WritePoints to WAL.WriteMulti
To match Cache.WriteMulti
2017-04-28 13:20:55 -06:00
Jason Wilder 28422f2fec Use consistent receiver var name for Value types 2017-04-28 13:20:55 -06:00
Jason Wilder 1bc4936336 Export Reader.ReadBytes 2017-04-28 13:20:55 -06:00
Jason Wilder d88604f6f2 Move repetive loop checks outside of values loop 2017-04-20 13:45:04 -06:00
Jason Wilder 888689f5d3 Move values loop under type switch
All the values read must be of the same type so repeatedly using
the type switch is confusing and less efficiient.
2017-04-20 13:39:49 -06:00
Jason Wilder b0988511bf Use fixed size array instead of slice 2017-04-20 13:38:33 -06:00
Jason Wilder da6bdfdda8 Use bufio.Reader when reading wal segments
Reduces disk IO due to small reads.
2017-04-20 13:33:42 -06:00
Jason Wilder 8e9cbd7ffc Simplify WALSegmentReader.UnmarshalBinary
There were two loops over nvals which created some extra allocation
which coudl be replaced with a simplet slice capacity and append.
2017-04-20 13:33:42 -06:00
Jason Wilder ef65ee77f4 Switch WAL byte pools to sync/pool
The current bytes.Pool will hold onto byte slices indefinitely. Large
writes can cause the pool to hold onto very large buffers over time.
Testing w/ sync/pool seems to perform similarly now so using a sync/pool
will allow these buffers to be GC'd when necessary.
2017-04-20 12:28:42 -06:00
Jason Wilder d155d37ca8 Reduce TSM write buffer
When many TSM files are being compacted, the buffers can add up fairly
quickly.
2017-04-20 12:28:42 -06:00
Jason Wilder d7c5dd0a3e Reduce wal sync goroutine churn
Under high write load, the sync goroutine would startup, and end
very frequently.  Starting a new goroutine so frequently adds a small
amount of latency which causes writes to take long and sometimes timeout.

This changes the goroutine to loop until there are no more waiters which
reduce the churn and latency.
2017-04-20 12:28:34 -06:00
Jason Wilder aa9925621b Fix deadlock in wal
If the sync waiters channel was full, it would block sending to the
channel while holding a the wal write lock.  The sync goroutine would
then be stuck acquiring the write lock and could not drain the channel.

This increases the buffer to 1024 which would require a very high write
load to fill as well as retuns and error if the channel is full to prevent
the blocking.
2017-04-19 11:33:13 -06:00
Jason Wilder 5c51ae7319 Merge branch '1.2' into jw-merge-123 2017-04-14 14:36:54 -06:00
Jason Wilder ff1270dfeb Fix dropping fields created data corruption
The Point is intended to be immutable after being parsed since it
is shared by several goroutines.  When dropping a field (e.g. time),
corrupted data can result if one goroutine is delete the field
while another is marshaling the underlying byte slices.

To avoid this, the shard will just skip invalid fields and series
instead of trying to mutate them by deleting them.
2017-04-07 12:58:42 -06:00
Ben Johnson 9c97cd8601
Merge remote-tracking branch 'upstream/master' into tsi 2017-04-04 12:46:09 -06:00
Jason Wilder 5fa8073fc2 Merge branch '1.2' into jw-merge-123 2017-04-04 11:12:06 -06:00