Commit Graph

137 Commits (e6aa5023eb4e54dbde7e8133ea426aec46fda9c2)

Author SHA1 Message Date
Jonathan A. Sternberg e415d0b10f Merge pull request #8835 from influxdata/js-uint-write-protocol
Add uint support into the write protocol
2017-10-09 10:55:16 -05:00
Jonathan A. Sternberg 2f47c3d28f Add support for uint64 in the clients 2017-10-05 09:35:06 -05:00
Jonathan A. Sternberg ff7b576389 Add uint support into the write protocol
This is currently protected behind a conditional compilation flag. Use
`-tags uint` or `-tags uint64` to enable this.
2017-09-19 10:44:26 -05:00
Jason Wilder a8d9eeef36 Reduce lock contention when deleting high cardinality series
Deleting high cardinality series could take a very long time, cause
write timeouts as well as dead lock the process.  This fixes these
issue to by changing the approach for cleaning up the indexes and
reducing lock contention.

The prior approach delete each series and updated every index (inmem)
during the delete.  This was very slow and cause the index to be locked
while it items in a slice were removed one by one.  This has been changed
to mark series as deleted and then rebuild the index asynchronously which
speeds up the process.

There was also a dead lock that could occur when deleing the field set.
Deleting the field set held a write lock and the function it invoked under
the lock could try to take a read lock on the field set.  This would then
deadlock.  This approach was also very slow and caused time out for writes.
It now uses faster approach that checks for the existing of the measurment
in the cache and filestore which does not take write locks.
2017-09-07 11:36:02 -06:00
Ben Johnson 60ab1282ea
Refactor system iterators.
Previously pseudo iterators could be created for meta data such
as series, measurement, and tag data. These iterators were created
at a higher level and lacked a lot of the power of the query engine.

This commit moves system iterators down to the series level and
supports the following:

	- _name
	- _seriesKey
	- _tagKey
	- _tagValue
	- _fieldKey

These can be used as normal fields such as:

	SELECT _seriesKey FROM cpu

This will return all the series keys for `cpu`.
2017-08-16 09:27:29 -06:00
Stuart Carnie eec80692c4 Taught tsm1 storage engine how to read and write uint64 values
* introduced UnsignedValue type
  * leveraged existing int64 compression algorithms (RLE, Simple 8B)
* tsm and WAL can read and write UnsignedValue
* compaction is aware of UnsignedValue
* unsigned support to model, cursors and write points

NOTE: there is no support to create unsigned points, as the line
protocol has not been modified.
2017-07-24 09:03:22 -07:00
lrita 72fcf6283e optimize point split, which reduce unnecessary allocate 2017-06-15 16:28:49 +08:00
Joe LeGasse 815f740f4c initial fga work
wip

wip

fix tests / build
2017-05-26 13:16:27 -07:00
Jason Wilder 5372db6327 Fix point validation to include field key length
The series key stored in TSM files includes the field.  We validated
the series length using only the measurement and tag set which allowed
very large field names to overflow.  This now checks the series key
as the measurement + tagset + field + the tsm field key separator size.
2017-05-24 14:39:54 -06:00
Jason Wilder 2cac46ebbc Convert usage of strings to []byte
Measurement name and field were converted between []byte and string
repetively causing lots of garbage.  This switches the code to use
[]byte in the write path.
2017-05-12 14:05:19 -06:00
Stuart Carnie 2770a17095 Merge pull request #8207 from stuartcarnie/master
Fixes issue #8199
2017-04-26 15:32:09 -07:00
Jason Wilder 5c51ae7319 Merge branch '1.2' into jw-merge-123 2017-04-14 14:36:54 -06:00
Jason Wilder ff1270dfeb Fix dropping fields created data corruption
The Point is intended to be immutable after being parsed since it
is shared by several goroutines.  When dropping a field (e.g. time),
corrupted data can result if one goroutine is delete the field
while another is marshaling the underlying byte slices.

To avoid this, the shard will just skip invalid fields and series
instead of trying to mutate them by deleting them.
2017-04-07 12:58:42 -06:00
Jason Wilder c3e0748bd9 Optimize Point.NewPointFromBytes
There was a check to ensure that fields exists when unmarshalBinary
is called.  This created a map and other garbage just to see if any
fields exist.

This changes it to use a FieldIterator that does not allocate as
much as the other method.
2017-04-06 12:51:45 -06:00
Jason Wilder 1a4b1b3109 Fix delete time fields creating unparseable points
If a field was named time was written and was subsequently dropped,
it could leave a trailing comma in the series key causing it to fail
to be parseable in other parts of the code.
2017-04-04 16:37:51 -06:00
Jason Wilder abaf42fbab Merge pull request #8251 from influxdata/jw-time-delete
Fix delete time fields creating unparseable points
2017-04-04 18:37:00 -04:00
Jason Wilder ca55fff12c Fix delete time fields creating unparseable points
If a field was named time was written and was subsequently dropped,
it could leave a trailing comma in the series key causing it to fail
to be parseable in other parts of the code.
2017-04-04 16:19:11 -06:00
Stuart Carnie 9e69bdbef5 Fixes issue #8199
Benchmarks

```
benchmark              old ns/op     new ns/op     delta
BenchmarkMarshal-8     1216          1007          -17.19%

benchmark              old allocs     new allocs     delta
BenchmarkMarshal-8     4              2              -50.00%

benchmark              old bytes     new bytes     delta
BenchmarkMarshal-8     416           256           -38.46%
```
2017-03-25 13:33:36 -07:00
Ben Johnson 358b1e0b05
Merge remote-tracking branch 'upstream/master' into tsi 2017-03-15 10:13:32 -06:00
Jason Wilder 675d7c9d65 Merge branch '1.2' into jw-merge12 2017-03-06 11:09:05 -07:00
Ben Johnson dffd12319c
Add point.UnmarshalBinary() bounds checking. 2017-03-01 12:01:25 -07:00
Jason Wilder a024003f2c Merge branch '1.2' into jw-merge-12 2017-02-22 12:13:29 -07:00
Ben Johnson 78a9bb2527 Remove Tags.shouldCopy, replace with forceCopy on series creation.
Previously, tags had a `shouldCopy` flag to indicate if those tags
referenced an underlying buffer and should be copied to allow GC.
Unfortunately, this prevented tags from being copied that were
created and referenced the mmap which caused segfaults.

This change removes the `shouldCopy` flag and replaces it with a
`forceCopy` argument in `CreateSeriesIfNotExists()`. This allows
the write path to indicate that tags must be cloned on insert.
2017-02-21 11:13:35 -07:00
Edd Robinson 4fbba8234e Add Size to models.Tags 2017-02-08 18:44:48 +00:00
Ben Johnson 047c21f4d9
Merge remote-tracking branch 'upstream/master' into tsi 2017-01-24 09:28:58 -07:00
Edd Robinson fb7388cdfc Remove dead code from various pkgs 2017-01-17 09:47:34 -08:00
Joe LeGasse cd00085e9e Adjust Tags cloning
This change delays Tag cloning until a new series is found, and will
only clone Tags acquired from `ParsePoints...` and not those referencing
the mmap-ed files (TSM) that are created on startup.
2017-01-13 13:15:36 -05:00
Mark Rushakoff cdbdd156f3 Fix memory leak of retained HTTP write payloads
This leak seems to have been introduced in 8aa224b22d,
present in 1.1.0 and 1.1.1.

When points were parsed from HTTP payloads, their tags and fields
referred to subslices of the request body; if any tag set introduced a
new series, then those tags then were stored in the in-memory series
index objects, preventing the HTTP body from being garbage collected. If
there were no new series in the payload, then the request body would be
garbage collected as usual.

Now, we clone the tags before we store them in the index. This is an
imperfect fix because the Point still holds references to the original
tags, and the Point's field iterator also refers to the payload buffer.
However, the current write code path does not retain references to the
Point or its fields; and this change will likely be obsoleted when TSI
is introduced.

This change likely fixes #7827, #7810, #7778, and perhaps others.
2017-01-12 16:16:54 -08:00
Mark Rushakoff a135906b43 Merge pull request #7747 from influxdata/mr-lint-cleanup
Miscellaneous lint cleanup
2017-01-10 08:22:00 -08:00
Ben Johnson 62d2b3ebe9
Series filtering. 2017-01-05 10:02:42 -07:00
Ben Johnson 62269c3cea
intermediate 2017-01-05 10:02:41 -07:00
Ben Johnson 8863e3c0f3
Refactor tsi1 merge iterators, finish multi-file compaction. 2017-01-05 10:01:25 -07:00
Ben Johnson 2a81351992
Implement tsdb.Index interface on tsi1.Index. 2017-01-05 10:00:43 -07:00
Ben Johnson ac9c6a0207
Add TSI index benchmark. 2017-01-05 09:34:37 -07:00
Ben Johnson 8d40ceb00c
TSI1 Index 2017-01-05 09:34:36 -07:00
Mark Rushakoff 6a94d200c8 Merge remote-tracking branch 'influx/master' into mr-godoc 2017-01-04 13:27:36 -08:00
Mark Rushakoff 4aedd29b02 Merge pull request #7788 from influxdata/mr-point-data-methods
Remove unused methods from Point: Data, SetData
2017-01-04 13:15:38 -08:00
Mark Rushakoff 3e473dd262 Remove unused methods from Point: Data, SetData
These are not called anywhere in the TICK stack that I can see.
2017-01-03 16:11:40 -08:00
Cory LaNou 3c518f8927
panicing is bad -> error returns are good 2017-01-03 14:28:29 -06:00
Mark Rushakoff 07b87f2630 Miscellaneous lint cleanup 2017-01-03 09:47:32 -08:00
Mark Rushakoff 7b5b3189dd Update godoc for package models 2016-12-30 18:02:52 -08:00
Mark Rushakoff b896c56b5a Add test for marshalling a point without fields 2016-12-28 12:09:30 -08:00
Mark Rushakoff f6a3ffecaf Require fields when marshalling Point
I haven't been able to reproduce creating a point without any fields,
but we've seen points in the wild that have been marshalled with no
fields - that is, the length header for fields is uint32(0) and a
well-formed encoded time follows.

Attempting to unmarshal points via NewPointFromBytes returns
ErrPointMustHaveAField, so it seems better to fail earlier with the same
error, rather than allowing those points to be serialized in the first
place.
2016-12-23 19:43:15 -08:00
kun de4436e9d9 fix scan tag value panic 2016-12-20 08:39:18 +08:00
Mark Rushakoff 15ba7958f8 Use strings.Replacer to escape string field
benchmark                                        old ns/op     new ns/op     delta
BenchmarkEscapeStringField_Plain-4               167           65.3          -60.90%
BenchmarkEscapeString_Quotes-4                   167           165           -1.20%
BenchmarkEscapeString_Backslashes-4              211           184           -12.80%
BenchmarkEscapeString_QuotesAndBackslashes-4     413           397           -3.87%

BenchmarkExportTSMStrings_100s_250vps-4     33833611      27381442           -19.07%
BenchmarkExportWALStrings_100s_250vps-4     34977761      29222717           -16.45%

benchmark                                        old allocs     new allocs     delta
BenchmarkEscapeStringField_Plain-4               4              1              -75.00%
BenchmarkEscapeString_Quotes-4                   4              3              -25.00%
BenchmarkEscapeString_Backslashes-4              5              3              -40.00%
BenchmarkEscapeString_QuotesAndBackslashes-4     9              5              -44.44%

BenchmarkExportTSMStrings_100s_250vps-4     201605          76938              -61.84%
BenchmarkExportWALStrings_100s_250vps-4     225371         100728              -55.31%

benchmark                                        old bytes     new bytes     delta
BenchmarkEscapeStringField_Plain-4               56            16            -71.43%
BenchmarkEscapeString_Quotes-4                   56            48            -14.29%
BenchmarkEscapeString_Backslashes-4              104           80            -23.08%
BenchmarkEscapeString_QuotesAndBackslashes-4     208           160           -23.08%

BenchmarkExportTSMStrings_100s_250vps-4     10872629       6062048           -44.24%
BenchmarkExportWALStrings_100s_250vps-4     10094933       5269980           -47.80%
2016-12-17 23:46:58 -08:00
Jason Wilder 2f776ea9e1 Fix string fields w/ trailing slashes
A string field w/ a trailing slash before the quote would parse incorrectly
because the quote would be seen as escaped.  We have to treat \\ as an
escape sequence within strings in order to handle this.
2016-12-01 15:24:11 -07:00
Jonathan A. Sternberg b4db76cee2 Introduce syntax for marking a partial response with chunking
The `partial` tag has been added to the JSON response of a series and
the result so that a client knows when more of the series or result will
be sent in a future JSON chunk.

This helps interactive clients who don't want to wait for all of the
data to know if it is done processing the current series or the current
result. Previously, the client had to guess if the next chunk would
refer to the same result or a new result and it had to match the name
and tags of the two series to know if they were the same series. Now,
the client just needs to check the `partial` field included with the
response to know if it should expect more.

Fixed `max-row-limit` so it counts rows instead of results and it
truncates the response when the `max-row-limit` is reached.
2016-11-22 11:16:22 -06:00
Jason Wilder 8fce6bba48 Add tag value cardinality limit 2016-10-10 11:42:15 -06:00
Jason Wilder 2ae6b5e1ed Replace uses of newFieldsFromBinary with FieldIterator 2016-10-03 16:30:21 -06:00
Joe LeGasse 743946fafb models: Add FieldIterator type
The FieldIterator is used to scan over the fields of a point, providing
information, and delaying parsing/decoding the value until it is needed.
This change uses this new type to avoid the allocation of a map for the
fields which is then thrown away as soon as the points get converted
into columns within the datastore.
2016-10-03 16:30:21 -06:00