Commit Graph

129 Commits (88d6487f4a95dfd4d699bdc155f1fff57a792efb)

Author SHA1 Message Date
Ben Johnson 9fb8f1ec1d
Fix database and tag limits. 2017-03-24 09:48:10 -06:00
Ben Johnson 358b1e0b05
Merge remote-tracking branch 'upstream/master' into tsi 2017-03-15 10:13:32 -06:00
Jason Wilder 675d7c9d65 Merge branch '1.2' into jw-merge12 2017-03-06 11:09:05 -07:00
Jason Wilder 29f8d8de76 Fix race in WALEntry.Encode and Value.Deduplicate
Under high query load, a race exists in the cache and the WAL.  Since
writes currently hit the cache first, they are availble for query before
they hit the WAL.  If the WAL is writing and accessign the Value slice
at the same time that a query is run that needs to dedup the same slice,
a race occurs.

To fix this, the cache now just copies the values instead of storing the
slice passed in.  Another way to fix this might be to have the writes go
to the wal before the cache.  I think the latter would be better, but it
introduces some larger write path issues that we'd need to also address.
e.g. if the cache was full, writes to the WAL would need to be rejected
to avoid filling the disk.

Copying the slice in the cache is simpler for now and does not appear to
dramatically affect performance.
2017-03-06 09:38:22 -07:00
Jason Wilder a024003f2c Merge branch '1.2' into jw-merge-12 2017-02-22 12:13:29 -07:00
Ben Johnson 78a9bb2527 Remove Tags.shouldCopy, replace with forceCopy on series creation.
Previously, tags had a `shouldCopy` flag to indicate if those tags
referenced an underlying buffer and should be copied to allow GC.
Unfortunately, this prevented tags from being copied that were
created and referenced the mmap which caused segfaults.

This change removes the `shouldCopy` flag and replaces it with a
`forceCopy` argument in `CreateSeriesIfNotExists()`. This allows
the write path to indicate that tags must be cloned on insert.
2017-02-21 11:13:35 -07:00
Mark Rushakoff 601cbcd084 Merge branch '1.2' into mr-merge-12 2017-02-17 16:14:22 -08:00
Jonathan A. Sternberg 2fe48d6781 Rename zap import back to github.com/uber-go/zap
They rebased a revision we were previously relying upon that allowed us
to use the vanity name so we are reverting back to an older version with
the old import path.
2017-02-17 17:17:22 -06:00
Ben Johnson 047c21f4d9
Merge remote-tracking branch 'upstream/master' into tsi 2017-01-24 09:28:58 -07:00
Edd Robinson feb7a2842c Use unbuffered error channels in tests 2017-01-17 10:53:15 -08:00
Edd Robinson 292b30b82b Fix subtle bugs and remove dead code from tsdb 2017-01-17 09:47:34 -08:00
Jonathan A. Sternberg d7c8c7ca4f Support subquery execution in the query language
This adds query syntax support for subqueries and adds support to the
query engine to execute queries on subqueries.

Subqueries act as a source for another query. It is the equivalent of
writing the results of a query to a temporary database, executing
a query on that temporary database, and then deleting the database
(except this is all performed in-memory).

The syntax is like this:

    SELECT sum(derivative) FROM (SELECT derivative(mean(value)) FROM cpu GROUP BY *)

This will execute derivative and then sum the result of those derivatives.
Another example:

    SELECT max(min) FROM (SELECT min(value) FROM cpu GROUP BY host)

This would let you find the maximum minimum value of each host.

There is complete freedom to mix subqueries with auxiliary fields. The only
caveat is that the following two queries:

    SELECT mean(value) FROM cpu
    SELECT mean(value) FROM (SELECT value FROM cpu)

Have different performance characteristics. The first will calculate
`mean(value)` at the shard level and will be faster, especially when it comes to
clustered setups. The second will process the mean at the top level and will not
include that optimization.
2017-01-07 13:00:48 -06:00
Ben Johnson f9efcb3365
Re-add shared in-memory index. 2017-01-05 10:17:09 -07:00
Edd Robinson 0f9b2bfe6a
Fix tests 2017-01-05 10:16:15 -07:00
Ben Johnson 9f8b206b51
Fix measurement system queries. 2017-01-05 10:15:34 -07:00
Ben Johnson cb93f10120
Remove per-shard in-memory index. 2017-01-05 10:11:09 -07:00
Ben Johnson 409b0165f5
shared in-memory index 2017-01-05 10:09:57 -07:00
Ben Johnson a812502ea3
reintegrating in-memory index 2017-01-05 10:07:35 -07:00
Ben Johnson 1ac067e53b
intermediate 2017-01-05 10:03:09 -07:00
Ben Johnson 62d2b3ebe9
Series filtering. 2017-01-05 10:02:42 -07:00
Ben Johnson 62269c3cea
intermediate 2017-01-05 10:02:41 -07:00
Edd Robinson da63b349a4
Fix bad rebase 2017-01-05 09:59:44 -07:00
Edd Robinson 149b1cef1d
Fix 32bit overflow; limit capacity 2017-01-05 09:59:10 -07:00
Edd Robinson d19fbf5ab4
Wire in HLL estimator 2017-01-05 09:54:03 -07:00
Edd Robinson 05bc4dec00
Refactor 2017-01-05 09:50:23 -07:00
Edd Robinson 2171d9471b
Initialise index in shards 2017-01-05 09:42:48 -07:00
Jonathan A. Sternberg ec57108520 Use proper uber-go/zap import path
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
2016-12-15 08:54:14 -06:00
Jonathan A. Sternberg 21502a39e8 Switch logging to use structured logging everywhere
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
2016-12-14 10:45:15 -06:00
Edd Robinson 28ba8ced74 Fixes #7625 2016-11-17 16:31:36 +00:00
Jason Wilder 0b6f5441b9 Add config option to messages when limits exceeded
When a limit is exceeded, we return errors and sometimes log (if appropriate)
that a limit was exceeded.  The messages don't always provide an indication
as to where or how they are configured.

Instead, return the config option (easily searchable for) as well as the limit
currently set and the value that exceeded it when possible.
2016-10-28 14:54:45 -06:00
Jason Wilder 873189e0c2 Fix panic: interface conversion: tsm1.Value is *tsm1.FloatValue, not *tsm1.StringValue
If concurrent writes to the same shard occur, it's possible for different types to
be added to the cache for the same series.  The way the measurementFields map on the
shard is updated is racy in this scenario which would normally prevent this from occurring.
When this occurs, the snapshot compaction panics because it can't encode different types
in the same series.

To prevent this, we have the cache return an error a different type is added to existing
values in the cache.

Fixes #7498
2016-10-28 12:15:50 -06:00
Jason Wilder bbecb3f03d Drop points that would execeed limits
This changes the behavior of the max-series-per-database and
max-values-per-tag limits to drop points that would exceed the limits
and allow the remaining points to be written.  Previously, the whole
batch would fail and return and 500 error to the client.

This now will write the allow points and return a `partial write`
error indicating some of the points were dropped, how many were
dropped and one of the problem measureent and tags.
2016-10-10 11:42:15 -06:00
Ben Johnson 8aa224b22d
reduce memory allocations in index
This commit changes the index to point to index data in the shards
instead of keeping it in-memory on the heap.
2016-08-16 14:09:00 -06:00
Jonathan A. Sternberg 9621bee195 Drop time when used as a tag or field key
The "time" field and tags are unqueryable so we prevent those from being
written so we don't have unreadable data.
2016-08-10 10:02:01 -05:00
David Norton 0c4559722c feat #6679: add series limit config setting 2016-08-01 08:28:46 -04:00
Michael Desa 517d8d5881 Move benchmarks beneath other NewSeries 2016-06-23 10:15:37 -07:00
Michael Desa 0c867e4b2c Fix benchmark test names
Previously the test names included an `s` for the name of a singular
component.
2016-06-16 08:45:36 -07:00
Michael Desa 9dfaa182a7 Add additional benchmarks for various schemas
Anecdotally, the relationship between memory consumption and series
cardinality was thought to be exponential. I suspect that this is false.
The intent of the added benchmarks is to verify my suspicion. Eventually
the these benchmarks will run nightly to serve as a basis to evualuate
the memory performance in a controlled environment.

https://github.com/influxdata/docs.influxdata.com/issues/392
2016-06-15 14:54:14 -07:00
Jason Wilder 1ff8ecf4fb Add ability to disable shards
Disabling a shard causes all writes and queries to a shard to return
an error.  This also disables compactions for the shard.
2016-05-31 10:51:54 -06:00
Jonathan A. Sternberg 5e7e0bd19b Filter out sources that do not match the shard database/retention policy
If you use a statement like this:

    SELECT value FROM one..cpu, two..cpu

It will access both the `one` and `two` databases as if you had selected
the `cpu` measurement twice for both of them. Updated the `tsdb.Shard`
create iterator function to filter out any sources that do not apply to
that shard so this duplication doesn't happen.

Fixes #6701.
2016-05-23 17:05:33 -04:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Jason Wilder 2bd5880d7a Remove series from index when shard is closed
When a shard is closed and removed due to retention policy enforcement,
the series contained in the shard would still exists in the index causing
a memory leak.  Restarting the server would cause them not to be loaded.

Fixes #6457
2016-04-28 12:34:46 -06:00
Jonathan A. Sternberg 7ec2a991d5 Modify all of the iterators to allow returning an error on Next()
This also switches the remaining iterators to be lazy so they can return
errors properly. They needed to be converted to lazy initialization
anyway, which has the side effect of making it much easier for us to
propagate the underlying error during initialization.

Updated the Emitter to return errors when it cannot read properly from
the iterators.
2016-04-18 11:17:55 -04:00
Jonathan A. Sternberg 94ec92d669 Handle nil values from the tsm1 cursor correctly
Send nil values from the tsm1 cursor at the end of the cursor. After the
cursor reached tsm1, the `nextAt()` call would always return the default
value rather than a nil value.

Descending also didn't work correctly because the seeking functionality
for tsm1 iterators would always act like they were ascending instead of
descending when choosing which value to select. This resulted in very
strange output from the emitter since it couldn't figure out if it was
ascending or descending.

Fixes #6206.
2016-04-06 09:27:02 -04:00
Edd Robinson 8e2d1e48c7 Check if engine closed. Fixes #6140 2016-03-31 15:59:04 +01:00
Mark Rushakoff cdcb079769 Tag TSM stats with database, retention policy
... by extracting the db/rp from the given path.

Now that the code has "standardized" on extracting db/rp this way, the
ShardLocation struct is no longer necessary and thus has been removed.
We're back on the previous style of passing the path and walPath to
NewShard.
2016-02-29 09:17:34 -08:00
Mark Rushakoff 40a98e0d55 Add database, RP as tags on shard stats
This commit updates tsdb.Shard to contain a ShardConfig and updates
tsdb.Store to directly reference a map of tsdb.Shard rather than the
previous tsdb.shardLocation abstraction.
2016-02-25 13:41:55 -08:00
Mark Rushakoff fb83374389 Track stats for number of series, measurements
Per database: track number of series and measurements
Per measurement: track number of series
2016-02-24 08:10:16 -08:00
Ben Johnson f7e04abef7 remove NaN from query engine
This commit removes `math.NaN` returns from float iterators.
2016-02-17 14:11:31 -07:00
Ben Johnson 5a0d1ab7c1 rename influxdb/influxdb to influxdata/influxdb
This commit changes all the import and URL references from:

    github.com/influxdb/influxdb

to:

    github.com/influxdata/influxdb
2016-02-10 10:26:18 -07:00
Ben Johnson 47c2bab74b add SHOW TAG KEYS support 2016-02-10 09:40:28 -07:00
Ben Johnson 00806de9b8 refactor query engine 2016-02-10 09:40:25 -07:00
Ben Johnson cde973f409 refactor query engine 2016-02-10 09:40:24 -07:00
Jason Wilder 0926b19e6b Prevent creating points with NaN float values
Float values are not supported in the existing engine and the tsm1
engines.  This changes NewPoint to return an error if a field value
contains a NaN field.  It also allows us to validate fields to prevent
other unsupported types from sneaking in through other input plugins.
2015-10-27 17:12:52 -06:00
Cory LaNou d19a510ad2 refactor Points and Rows to dedicated packages 2015-09-16 15:33:08 -05:00
Jason Wilder a4c1d9a9a7 Remove unused Database index names and sorting
Writes could timeout and when adding new measurement names to the
index if the sort took a long time.  The names slice was never
actually used (except a test) so keeping it in index wastes memory
and sort it wastes CPU and increases lock contention.  The sorting
was happening while the shard held a write-lock.

Fixes #3869
2015-08-27 11:57:20 -06:00
Paul Dix 73f3dc1e14 Update store to properly manage WAL create/delete.
* Update the store to remove the WAL directories associated with a shard or database when they are deleted.
* Fix the Store so that it creates separate WAL directories for databases and retention policies.
2015-08-21 11:22:04 -04:00
Paul Dix 9df3b7d828 Add WAL configuration options 2015-08-18 16:59:54 -04:00
Paul Dix 3348dab4e0 Fix bug with new shards not getting series data persisted. 2015-08-16 15:45:09 -04:00
Paul Dix b583b896ce Integrate WAL and BZ1 and make BZ1 the default engine. 2015-08-16 12:46:50 -04:00
Ben Johnson a7f50ae03c refactor storage to engine 2015-07-22 11:08:10 -06:00
Ben Johnson de1f9a3736 refactor tsdb tests into test package 2015-07-22 11:07:06 -06:00
Philip O'Toole ca86fa2633 Allow WAL inter-flush time to be configurable 2015-07-02 10:40:26 -04:00
Joseph Crail 5fccee3d16 Fix spelling errors in comments and strings. 2015-06-28 02:54:34 -04:00
Ben Johnson b574e2f755 Add write ahead log
This commit adds a write ahead log to the shard. Entries are cached
in memory and periodically flushed back into the index. The WAL and
the cache are both partitioned into buckets so that flushing doesn't
stop the world as long.
2015-06-25 15:47:13 -06:00
Jason Wilder bc7e1f6fd6 Fix panic when adding new fields
Fixes #2869

When adding a new field to an existing measurment, Shard.validateSeriesAndFields
would also encode the fields as a side-effect.  In the case of a new field
that needed to be created, the encoding would fail because the field type
had not been created for the measurement yet.  The fields are re-encoded
after validateSeriesAndFields returns and after the field encoding have been
setup properly so this additional encoding during
validation isn't necessary.
2015-06-10 10:30:14 -06:00
Todd Persen 0ee71b9755 Merge pull request #2743 from influxdb/tsdb-benchmarks
add shard & index benchmarks
2015-06-05 13:15:38 -07:00
Paul Dix 408bc3f81e Ensure proper locking of index structures on writes and queries. 2015-06-04 14:50:32 -04:00
David Norton 938ad2ef85 add Store Open benchmark test 2015-06-03 10:09:50 -04:00
David Norton 31bb8e70a9 don't build index before benchmarking WritePoints 2015-06-02 17:17:31 -04:00
David Norton 97c84a6d4f add benchmark tests for shard WritePoints 2015-06-02 17:00:25 -04:00
Jason Wilder 9a9bb736f7 Add text protocol parsing and serialzation for points
This changes the implementation of point to minimize the extra
processing needed to parse and marshal point data though the system.
2015-05-29 11:18:40 -06:00
Jason Wilder 21bfb150a1 Wire up new write path
This allows the new write path to be hooked up if you start the
server with `INFLUXDB_ALPHA1=1`.  When set, writes will go though
the coordinator and be stubbed out to write to a single local data
node with one shards.  The write will be logged and written to
disk .

The env var is used so that the current write path is not completely
broken which would break many of the tests that depend on writes.

Note that queries are not currently working w/ the this change.
2015-05-26 12:07:56 -06:00
Paul Dix 6c80108f63 Change Database to DatabaseIndex, remove leftover warn statement 2015-05-24 07:39:45 -04:00
Paul Dix c3ab88a715 Make the metadata index shared across shards while keeping field types and encoding local to each shard. 2015-05-23 18:06:07 -04:00
Jason Wilder 1076153a00 Convert Point to interface
Should be possible to replace the implementation with a more
optimized version now.
2015-05-22 15:39:55 -06:00
Jason Wilder f8d599cda9 Convert Point.Tags to Point.Tags() 2015-05-22 15:12:34 -06:00
Jason Wilder 5dcab443dc Move data.Point to tsdb.Point 2015-05-22 15:00:51 -06:00
Paul Dix 8f937cae87 Initial implementation for writing data to a shard. 2015-05-22 16:11:18 -04:00