Commit Graph

84 Commits (7374e4e8a457350ab4ba8cb89bdc3cd7d5c2f905)

Author SHA1 Message Date
Jason Wilder 77afe50f7e Fix panic in ForEachMeasurementTagKey
If a shard was closed, ForEachMeasurementTagKey and TagKeyCardinality
would panic because the engine was nil.
2017-06-13 12:04:32 -06:00
Stuart Carnie 46796d932f add database to index, engine and shard; call AuthorizeSeriesRead 2017-05-26 13:21:50 -07:00
Stuart Carnie 8097e817f6 prefix partial write errors with `partial write:`
NOTE: parser errors (via http API) are also transformed into
PartialWriteError
2017-04-28 11:00:14 -07:00
Jason Wilder 5c51ae7319 Merge branch '1.2' into jw-merge-123 2017-04-14 14:36:54 -06:00
Jason Wilder ff1270dfeb Fix dropping fields created data corruption
The Point is intended to be immutable after being parsed since it
is shared by several goroutines.  When dropping a field (e.g. time),
corrupted data can result if one goroutine is delete the field
while another is marshaling the underlying byte slices.

To avoid this, the shard will just skip invalid fields and series
instead of trying to mutate them by deleting them.
2017-04-07 12:58:42 -06:00
Ben Johnson 9fb8f1ec1d
Fix database and tag limits. 2017-03-24 09:48:10 -06:00
Ben Johnson 358b1e0b05
Merge remote-tracking branch 'upstream/master' into tsi 2017-03-15 10:13:32 -06:00
Jason Wilder 675d7c9d65 Merge branch '1.2' into jw-merge12 2017-03-06 11:09:05 -07:00
Jason Wilder 29f8d8de76 Fix race in WALEntry.Encode and Value.Deduplicate
Under high query load, a race exists in the cache and the WAL.  Since
writes currently hit the cache first, they are availble for query before
they hit the WAL.  If the WAL is writing and accessign the Value slice
at the same time that a query is run that needs to dedup the same slice,
a race occurs.

To fix this, the cache now just copies the values instead of storing the
slice passed in.  Another way to fix this might be to have the writes go
to the wal before the cache.  I think the latter would be better, but it
introduces some larger write path issues that we'd need to also address.
e.g. if the cache was full, writes to the WAL would need to be rejected
to avoid filling the disk.

Copying the slice in the cache is simpler for now and does not appear to
dramatically affect performance.
2017-03-06 09:38:22 -07:00
Jason Wilder a024003f2c Merge branch '1.2' into jw-merge-12 2017-02-22 12:13:29 -07:00
Ben Johnson 78a9bb2527 Remove Tags.shouldCopy, replace with forceCopy on series creation.
Previously, tags had a `shouldCopy` flag to indicate if those tags
referenced an underlying buffer and should be copied to allow GC.
Unfortunately, this prevented tags from being copied that were
created and referenced the mmap which caused segfaults.

This change removes the `shouldCopy` flag and replaces it with a
`forceCopy` argument in `CreateSeriesIfNotExists()`. This allows
the write path to indicate that tags must be cloned on insert.
2017-02-21 11:13:35 -07:00
Mark Rushakoff 601cbcd084 Merge branch '1.2' into mr-merge-12 2017-02-17 16:14:22 -08:00
Jonathan A. Sternberg 2fe48d6781 Rename zap import back to github.com/uber-go/zap
They rebased a revision we were previously relying upon that allowed us
to use the vanity name so we are reverting back to an older version with
the old import path.
2017-02-17 17:17:22 -06:00
Ben Johnson 047c21f4d9
Merge remote-tracking branch 'upstream/master' into tsi 2017-01-24 09:28:58 -07:00
Edd Robinson feb7a2842c Use unbuffered error channels in tests 2017-01-17 10:53:15 -08:00
Edd Robinson 292b30b82b Fix subtle bugs and remove dead code from tsdb 2017-01-17 09:47:34 -08:00
Jonathan A. Sternberg d7c8c7ca4f Support subquery execution in the query language
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

    SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

    SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This would let you find the maximum minimum value of each host.

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

    SELECT mean(value) FROM cpu
    SELECT mean(value) FROM (SELECT value FROM cpu)

Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
2017-01-07 13:00:48 -06:00
Ben Johnson f9efcb3365
Re-add shared in-memory index. 2017-01-05 10:17:09 -07:00
Edd Robinson 0f9b2bfe6a
Fix tests 2017-01-05 10:16:15 -07:00
Ben Johnson 9f8b206b51
Fix measurement system queries. 2017-01-05 10:15:34 -07:00
Ben Johnson cb93f10120
Remove per-shard in-memory index. 2017-01-05 10:11:09 -07:00
Ben Johnson 409b0165f5
shared in-memory index 2017-01-05 10:09:57 -07:00
Ben Johnson a812502ea3
reintegrating in-memory index 2017-01-05 10:07:35 -07:00
Ben Johnson 1ac067e53b
intermediate 2017-01-05 10:03:09 -07:00
Ben Johnson 62d2b3ebe9
Series filtering. 2017-01-05 10:02:42 -07:00
Ben Johnson 62269c3cea
intermediate 2017-01-05 10:02:41 -07:00
Edd Robinson da63b349a4
Fix bad rebase 2017-01-05 09:59:44 -07:00
Edd Robinson 149b1cef1d
Fix 32bit overflow; limit capacity 2017-01-05 09:59:10 -07:00
Edd Robinson d19fbf5ab4
Wire in HLL estimator 2017-01-05 09:54:03 -07:00
Edd Robinson 05bc4dec00
Refactor 2017-01-05 09:50:23 -07:00
Edd Robinson 2171d9471b
Initialise index in shards 2017-01-05 09:42:48 -07:00
Jonathan A. Sternberg ec57108520 Use proper uber-go/zap import path
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
2016-12-15 08:54:14 -06:00
Jonathan A. Sternberg 21502a39e8 Switch logging to use structured logging everywhere
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
2016-12-14 10:45:15 -06:00
Edd Robinson 28ba8ced74 Fixes #7625 2016-11-17 16:31:36 +00:00
Jason Wilder 0b6f5441b9 Add config option to messages when limits exceeded
When a limit is exceeded, we return errors and sometimes log (if appropriate)
that a limit was exceeded.  The messages don't always provide an indication
as to where or how they are configured.

Instead, return the config option (easily searchable for) as well as the limit
currently set and the value that exceeded it when possible.
2016-10-28 14:54:45 -06:00
Jason Wilder 873189e0c2 Fix panic: interface conversion: tsm1.Value is *tsm1.FloatValue, not *tsm1.StringValue
If concurrent writes to the same shard occur, it's possible for different types to
be added to the cache for the same series.  The way the measurementFields map on the
shard is updated is racy in this scenario which would normally prevent this from occurring.
When this occurs, the snapshot compaction panics because it can't encode different types
in the same series.

To prevent this, we have the cache return an error a different type is added to existing
values in the cache.

Fixes #7498
2016-10-28 12:15:50 -06:00
Jason Wilder bbecb3f03d Drop points that would execeed limits
This changes the behavior of the max-series-per-database and
max-values-per-tag limits to drop points that would exceed the limits
and allow the remaining points to be written.  Previously, the whole
batch would fail and return and 500 error to the client.

This now will write the allow points and return a `partial write`
error indicating some of the points were dropped, how many were
dropped and one of the problem measureent and tags.
2016-10-10 11:42:15 -06:00
Ben Johnson 8aa224b22d
reduce memory allocations in index
This commit changes the index to point to index data in the shards
instead of keeping it in-memory on the heap.
2016-08-16 14:09:00 -06:00
Jonathan A. Sternberg 9621bee195 Drop time when used as a tag or field key
The "time" field and tags are unqueryable so we prevent those from being
written so we don't have unreadable data.
2016-08-10 10:02:01 -05:00
David Norton 0c4559722c feat #6679: add series limit config setting 2016-08-01 08:28:46 -04:00
Michael Desa 517d8d5881 Move benchmarks beneath other NewSeries 2016-06-23 10:15:37 -07:00
Michael Desa 0c867e4b2c Fix benchmark test names
Previously the test names included an `s` for the name of a singular
component.
2016-06-16 08:45:36 -07:00
Michael Desa 9dfaa182a7 Add additional benchmarks for various schemas
Anecdotally, the relationship between memory consumption and series
cardinality was thought to be exponential. I suspect that this is false.
The intent of the added benchmarks is to verify my suspicion. Eventually
the these benchmarks will run nightly to serve as a basis to evualuate
the memory performance in a controlled environment.

https://github.com/influxdata/docs.influxdata.com/issues/392
2016-06-15 14:54:14 -07:00
Jason Wilder 1ff8ecf4fb Add ability to disable shards
Disabling a shard causes all writes and queries to a shard to return
an error.  This also disables compactions for the shard.
2016-05-31 10:51:54 -06:00
Jonathan A. Sternberg 5e7e0bd19b Filter out sources that do not match the shard database/retention policy
If you use a statement like this:

    SELECT value FROM one..cpu, two..cpu

It will access both the `one` and `two` databases as if you had selected
the `cpu` measurement twice for both of them. Updated the `tsdb.Shard`
create iterator function to filter out any sources that do not apply to
that shard so this duplication doesn't happen.

Fixes #6701.
2016-05-23 17:05:33 -04:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Jason Wilder 2bd5880d7a Remove series from index when shard is closed
When a shard is closed and removed due to retention policy enforcement,
the series contained in the shard would still exists in the index causing
a memory leak.  Restarting the server would cause them not to be loaded.

Fixes #6457
2016-04-28 12:34:46 -06:00
Jonathan A. Sternberg 7ec2a991d5 Modify all of the iterators to allow returning an error on Next()
This also switches the remaining iterators to be lazy so they can return
errors properly. They needed to be converted to lazy initialization
anyway, which has the side effect of making it much easier for us to
propagate the underlying error during initialization.

Updated the Emitter to return errors when it cannot read properly from
the iterators.
2016-04-18 11:17:55 -04:00
Jonathan A. Sternberg 94ec92d669 Handle nil values from the tsm1 cursor correctly
Send nil values from the tsm1 cursor at the end of the cursor. After the
cursor reached tsm1, the `nextAt()` call would always return the default
value rather than a nil value.

Descending also didn't work correctly because the seeking functionality
for tsm1 iterators would always act like they were ascending instead of
descending when choosing which value to select. This resulted in very
strange output from the emitter since it couldn't figure out if it was
ascending or descending.

Fixes #6206.
2016-04-06 09:27:02 -04:00
Edd Robinson 8e2d1e48c7 Check if engine closed. Fixes #6140 2016-03-31 15:59:04 +01:00