Commit Graph

1023 Commits (cba956943a848f8db2b0dd93fc4e5ac644c255b8)

Author SHA1 Message Date
Edd Robinson 9c5c1c7001 Optimisation for expressions with single measurement 2018-07-18 12:21:54 +01:00
Stuart Carnie d977c0ac24 fix(tsdb): Fix existing Prometheus tests based on batch cursors 2018-07-16 08:55:37 -07:00
Stuart Carnie 497fc42779 pr(tsdb): Feedback items from megacheck
* batch cursors and cursorIterator will be removed in a follow up
  PR using Arrow array data structures
2018-07-16 08:55:37 -07:00
Stuart Carnie 910d0fe5e6 feat(tsm1): ArrayCursor interfaces and implementations
Array cursors are enabled for storage RPC calls

tsm1:

* Implemented cursors that utilize Array decoders

storage:

* Abstractions to easily switch to Array cursors
2018-07-16 08:55:37 -07:00
Gunnar e9f0dc48ca
Merge pull request #10051 from influxdata/ga-config
Update example config with UDP precision option
2018-07-10 08:16:28 -07:00
Gunnar Aasen 8870512e6b Update example config with UDP precision option
Also add a test for precision in the UDP configuration.
2018-07-09 10:07:05 -07:00
Jeff Wendling 07e5465cb8 httpd: fix flaky test in timeout handler
there were two problems with this code:

1. the send on pending did not imply that the handler was running
2. there was a race starting the handler with timing out

1 is fixed by sending to a begin channel inside the handler. it is
then guaranteed that the timeout handler code has been entered.

2 is fixed by attempting to acquire the semaphore channel once before
checking the timeout channel. in this way, if there is capacity, which
in this test there is known to be, it is guaranteed to be taken. if
we check with the timer at the same time and the timer has already
fired, there is a pseudorandom chance the timer will be taken even
if there is capacity.
2018-07-05 12:09:09 -06:00
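The second fix in 07e5465cb8 relies on how Go's `select` behaves: when several cases are ready, one is chosen pseudorandomly. A minimal sketch of the "try the semaphore once first" pattern follows. The names `acquire` and `expired` are hypothetical and are not taken from the httpd package; this illustrates the idea and is not the actual handler code.

```go
package main

import "time"

// acquire tries to take a slot from sem, giving up once timeout fires.
// Hypothetical helper for illustration only.
func acquire(sem chan struct{}, timeout <-chan time.Time) bool {
	// First attempt: take a slot if one is free, without looking at the
	// timer. With free capacity this always succeeds, even if the timer
	// has already fired.
	select {
	case sem <- struct{}{}:
		return true
	default:
	}

	// No capacity right now: wait for either a slot or the timeout.
	select {
	case sem <- struct{}{}:
		return true
	case <-timeout:
		return false
	}
}

// expired returns a timeout channel that has already fired, which is the
// worst case for the race described in the commit message.
func expired() <-chan time.Time {
	ch := make(chan time.Time, 1)
	ch <- time.Now()
	return ch
}
```

With a single `select` over both channels, a free slot and an already-fired timer would both be ready, and the runtime could pick either one. The non-blocking first attempt removes that ambiguity.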
Jonathan A. Sternberg 88b81941ac Modify the storage service to expose a grpc interface instead of yarpc 2018-06-28 14:03:09 -05:00
Edd Robinson 8fd00853e7 Ensure read service regexes get optimised 2018-06-22 19:32:17 +01:00
Jonathan A. Sternberg 87d2469877
Merge pull request #9964 from influxdata/js-enable-storage-service
Enable the storage service by default
2018-06-13 16:28:27 -05:00
Edd Robinson 3cb9e13d58 Address PR feedback 2018-06-13 17:41:50 +01:00
Jonathan A. Sternberg 17ca220f33 Enable the storage service by default 2018-06-13 10:56:50 -05:00
Edd Robinson 8a78e64868 Make zero results work properly 2018-06-12 23:49:04 +01:00
Edd Robinson 28b6df7afb Ensure remote read can handle no data in time 2018-06-12 23:10:18 +01:00
Edd Robinson 522e509709 Add further tests 2018-06-12 15:54:18 +01:00
Edd Robinson 524f400836 Make testing of handler easier 2018-06-12 15:54:18 +01:00
Edd Robinson 806464d9e7 Add storage package mocks 2018-06-12 15:54:18 +01:00
Edd Robinson 40fa4eddc0 Fix overflow 2018-06-12 15:54:18 +01:00
Paul Dix 4f7b93342c Update Prometheus read/write to use new storage query layer.
* Update Prometheus remote write to use metric name as measurement name and value as the field name.
* Update Prometheus remote read to use the storage.Read method to bypass the InfluxQL query engine.
2018-06-12 15:54:18 +01:00
Jeff Wendling 5ec7b901bb
Merge pull request #9847 from influxdata/jmw-docker-on-windows
fix(tsdb): attempt to work on docker on windows
2018-06-01 15:16:51 -06:00
Jeff Wendling e6aec771b0 fix(tsdb): attempt to work on docker on windows
multiple users have attempted to run influxdb in a docker container
with a windows host and a volume mounted from windows. that causes
problems because it apparently uses samba/cifs which does not
support fsync on directories. this patchset ignores an EINVAL returned
by a directory fsync, which is what appears to happen on samba/cifs.
this should help.

fixes #9833.
fixes #9630.
2018-06-01 14:57:18 -06:00
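The approach in e6aec771b0 can be sketched roughly as below. The function names `syncDir` and `ignorableSyncError` are hypothetical, and the real tsdb code may be structured differently; the only assumption carried over from the commit message is that an EINVAL from a directory fsync is treated as non-fatal.

```go
package main

import (
	"errors"
	"os"
	"syscall"
)

// ignorableSyncError reports whether err is (or wraps) EINVAL, which the
// commit message says samba/cifs appears to return for a directory fsync.
func ignorableSyncError(err error) bool {
	return errors.Is(err, syscall.EINVAL)
}

// syncDir opens the directory at path and fsyncs it, ignoring EINVAL.
// Any other error is returned to the caller.
func syncDir(path string) error {
	d, err := os.Open(path)
	if err != nil {
		return err
	}
	defer d.Close()

	if err := d.Sync(); err != nil && !ignorableSyncError(err) {
		return err
	}
	return nil
}
```

Errors other than EINVAL still propagate, so a real I/O failure on a filesystem that does support directory fsync is not hidden.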
Stuart Carnie e4b3204328 fix(services/storage): Don't serialize empty group frames 2018-06-01 13:28:34 -07:00
Stuart Carnie 4b7a4868a9 fix(services/storage): Add nil checks to prevent panics
Fixes #9925
2018-05-31 14:22:19 -07:00
Stuart Carnie d42abde836 refactor(services/storage): Enhanced group support for Read RPC 2018-05-30 11:32:20 -07:00
Ben Johnson 8b44e3142c
Add optional pprof http endpoint immediately on startup.
This commit adds `debug-pprof-enabled` which will start the default
`net/http/pprof` endpoint and bind against `localhost:6060`. This
will help to debug startup performance issues.
2018-05-23 14:32:15 -06:00
Ben Johnson 8a74c6759f
Add http write throttling.
This commit adds throttling to the HTTP write endpoints based on
queue depth and, optionally, timeout. Two queues exist: `enqueued`
and `current`. The `current` queue is the number of concurrent
requests that can be processed. The `enqueued` queue limits the
maximum number of requests that can be waiting to be processed.

If the timeout is exceeded or the `enqueued` queue is full then
a `"503 Service unavailable"` code is returned and the error is
logged.

By default these options are turned off.
2018-05-21 13:08:24 -06:00
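The two-queue throttling described in 8a74c6759f could look something like the following sketch. The type `throttler` and its methods are hypothetical names chosen for illustration, and the capacity of `enqueued` (running plus waiting requests) is an assumption; the commit message only states that `current` bounds concurrent requests, `enqueued` bounds waiting requests, and a 503 is returned when the queue is full or the timeout is exceeded.

```go
package main

import (
	"net/http"
	"time"
)

// throttler limits concurrent and waiting requests using two buffered
// channels as counting semaphores. Hypothetical type for illustration.
type throttler struct {
	current  chan struct{} // requests being processed
	enqueued chan struct{} // requests being processed or waiting (assumed)
	timeout  time.Duration // zero means wait without a deadline
}

func newThrottler(concurrent, queued int, timeout time.Duration) *throttler {
	return &throttler{
		current:  make(chan struct{}, concurrent),
		enqueued: make(chan struct{}, concurrent+queued),
		timeout:  timeout,
	}
}

// admit waits for a processing slot. It returns a release function and
// true on success, or nil and false if the queue is full or the wait
// timed out.
func (t *throttler) admit() (release func(), ok bool) {
	// Reject immediately when the queue is full.
	select {
	case t.enqueued <- struct{}{}:
	default:
		return nil, false
	}

	// A nil channel never fires, so no timeout means wait indefinitely.
	var timer <-chan time.Time
	if t.timeout > 0 {
		timer = time.After(t.timeout)
	}

	select {
	case t.current <- struct{}{}:
		return func() { <-t.current; <-t.enqueued }, true
	case <-timer:
		<-t.enqueued // give the queue slot back
		return nil, false
	}
}

// wrap returns a handler that answers 503 when a request is not admitted.
func (t *throttler) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		release, ok := t.admit()
		if !ok {
			http.Error(w, "Service unavailable", http.StatusServiceUnavailable)
			return
		}
		defer release()
		next.ServeHTTP(w, r)
	})
}
```

A write endpoint would then be registered as `t.wrap(writeHandler)`, where `writeHandler` stands in for whatever handler serves writes.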
Jeff Wendling e89438f7c2 fix flaky subscriber tests
Fixes #9554.
2018-04-24 12:34:45 -06:00
Jonathan A. Sternberg a7e1da5f86 Add suppress-write-log option to disable the write log when the log is enabled 2018-04-23 12:45:48 -05:00
Stuart Carnie a8692a9e24 services/meta: improve readability of Contains function, add unit tests 2018-04-19 18:05:55 -07:00
Jacob Marble 232be14aef respect rp parameter in /query 2018-04-19 08:31:43 -07:00
Jacob Marble 321ae4ff04
update CircleCI config to 2.0 syntax (#9711)
* enable flaky test, see if CircleCI fails

* Use CircleCI 2.0 with docker layer caching

* update CONTRIBUTING
2018-04-16 12:00:44 -07:00
Jonathan A. Sternberg 8334693b47 Properly track the response bytes written for queries in all format types
The number of bytes written for CSV responses and probably MessagePack
responses was incorrect.
2018-04-09 12:42:58 -05:00
Jonathan A. Sternberg 243ed2ea5e Avoid a panic when using show diagnostics with text/csv
If the columns change between series, it will now act as if it was a new
statement id and reprint the headers. This only happens with show
diagnostics at the moment and we shouldn't add this functionality
anywhere else anyway.
2018-04-09 09:09:42 -05:00
Jonathan A. Sternberg 1b738d3991 Allow customizing the unix socket group and permissions created by the server 2018-04-05 14:40:12 -05:00
Ben Johnson 1fe9abd66f
Delete deleted shards in retention service. 2018-03-28 10:44:14 -06:00
Stuart Carnie ef2ba80ce2 don't overwrite `_measurement` for multi-tenant reads 2018-03-23 13:59:42 -07:00
Stuart Carnie 813cb1a2f6 to var () or not to var (), that is the question 2018-03-23 12:26:55 -07:00
Stuart Carnie ee3e2ad67f rename Tenant -> OrgID 2018-03-23 12:26:55 -07:00
Stuart Carnie 2cc1f5137e support for tenant+bucket
NOTE: to match storage service, values for database and rp are
hard-coded to `db` and `rp` respectively
2018-03-23 12:26:55 -07:00
Stuart Carnie aa61359cc7 Storage RPC API improvements. See PR for details
* reduce # allocations (115M -> 22M)
* reduce size allocations (53GB -> 1.3GB)
* reduce RPC query time (45s -> 12.9s)
2018-03-21 13:46:09 -07:00
Edd Robinson bdd61298ea Skip flaky test 2018-03-12 16:18:35 +00:00
Jonathan A. Sternberg 733d842812 Turn the ExecutionContext into a context.Context
Along with modifying ExecutionContext to be a context and have the
TaskManager return the context itself, this also creates a Monitor
interface and exposes the Monitor through the Context. This way, we can
access the monitor from within the query.Select method and keep all of
the limits inside of the query package instead of leaking them into the
statement executor.

An eventual goal is to remove the InterruptCh from the IteratorOptions
and use the Context instead, but for now, we'll just assign the done
channel from the Context to the IteratorOptions so at least they refer
to the same channel.
2018-03-08 14:03:20 -06:00
Jonathan A. Sternberg de4390ae83 Rename some of the structs and interfaces in the query package
Remove the `Query` prefix from some structs and interfaces. They were
there so when the query engine was in the same package as influxql,
these would be differentiated. Now that the package name is query, the
extra prefix seems redundant.
2018-03-02 09:44:12 -06:00
Stuart Carnie a74d296200 use underscore vs period, fix doc comment, add database name to CQ 2018-02-26 10:08:43 -07:00
Stuart Carnie 0a5a07dc3a series keys are produced in ascending order 2018-02-22 13:08:36 -07:00
Stuart Carnie d135aecf02 Generate trace logs for a number of significant influx operations
* tsdb Store.Open traces all events related to opening files
    * op.name : tsdb.open
* retention policy shard deletions
    * op.name : retention.delete_check
* all TSM compaction strategies
    * op.name : tsm1.compact_group
* series file compactions
    * op.name : series_partition.compaction
* continuous query execution (if logging enabled)
    * op.name : continuous_querier.execute
* TSI log file compaction
    * op.name: index.tsi.compact_log_file
* TSI level compaction
    * op.name: index.tsi.compact_to_level
2018-02-21 15:08:49 -07:00
Jonathan A. Sternberg d38413a849
Merge pull request #9454 from influxdata/js-structured-logging
Update logging calls to take advantage of structured logging
2018-02-21 09:14:40 -06:00
Jonathan A. Sternberg 2bbd96768d Update logging calls to take advantage of structured logging
Includes a style guide that details the basics of how to log.
2018-02-20 10:04:19 -06:00
Stuart Carnie 584e7ac09a Added option to write HTTP request logs to separate file. 2018-02-14 23:11:01 -07:00
Mark Rushakoff f7fc6a6501 Remove IsAdmin() method from meta.User interface
The method was only called in tests.
2018-02-13 16:59:22 -08:00
Stuart Carnie 8f978068f9
Merge pull request #9415 from influxdata/sgc-storage
restore `MetaClient`, which is needed by store
2018-02-09 07:51:36 -07:00
Andrew Hare d21ebfe531 Do not report an error when dropping a CQ on a non-existent DB/RP
This makes the behavior similar to other places in the DB where we
don't return an error when we try to drop an object from a non-
existent database.
2018-02-08 13:28:47 -06:00
Stuart Carnie 41dc96ca91 restore `MetaClient`, which is needed by store
* Switch from an anonymous type to avoid false positives with
  `megacheck`
2018-02-08 12:13:13 -07:00
Edd Robinson f19588360e
Merge pull request #9349 from influxdata/er-the-purge
Cleanup of codebase using static analysis tools
2018-01-25 17:11:53 -08:00
Patrick Hemmer 2dc2c53093 fix nil err panic in msgpack httpd WriteResponse 2018-01-23 19:54:00 -05:00
Edd Robinson 6a66b5faf0 Cleanup services package 2018-01-21 10:52:37 -08:00
Adam 938db68198
Update restore functionality to run in online mode, consume Enterprise backup files. (#9207)
* Live Restore + Enterprise data format compatibility

* Extended ImportData to import all DB's if no db name given

* Added a new enterprise data test, and backup command now prints the backup file paths at conclusion

* Added whole-system backup test

* Update to use protobuf in all enterprise data cases

* Update to test to do cross-testing with enterprise version

* incremental enterprise backup format support
2018-01-10 13:59:18 -05:00
Edd Robinson 1f3352efbd
Merge pull request #9153 from influxdata/er-prom-parsing
Fix Prometheus regex parsing
2018-01-02 18:39:46 +00:00
Stuart Carnie 5dfe3b2645 inmem startup improvements
* only call ParseTags when necessary
* remove dependency on inmem.Series in tsdb test package
* Measurement and Series are no longer exported. Their use is restricted
  to the inmem package
* improve Measurement and Series types by exporting immutable
  fields and removing unnecessary APIs and locks

Reduced startup time from 28s to 17s. Overall improvement including
#9162 reduces startup from 46s to 17s for 1MM series across 14 shards.
2017-12-29 07:58:52 -07:00
Ben Johnson d8b1d208c0
rebase 2017-12-20 15:13:34 -07:00
Edd Robinson c476a0b4a1 Merge branch 'master' into er-tsi-index-part 2017-12-15 18:31:24 +00:00
Jonathan A. Sternberg 5fcf57a764 Remove extraneous newlines from the log
The newlines were accidentally kept when changing the logger. They are
not necessary and mess up the log output.
2017-12-14 16:41:42 -06:00
Stuart Carnie 0d29dc1121 add Prometheus metrics HTTP endpoint 2017-12-11 08:51:40 -07:00
Edd Robinson 7d13bf3262 merge master 2017-12-08 17:21:58 +00:00
Edd Robinson f6835632e7 Merge master into branch 2017-12-08 17:11:07 +00:00
Adam a0b2195d6b
Pulled in backup-relevant code for review (#9193)
for issue #8879
2017-12-07 11:35:20 -05:00
Jonathan A. Sternberg 95e1e3b332
Merge pull request #8015 from influxdata/js-code-coverage
Expand code coverage for undercovered packages
2017-11-29 19:30:47 -06:00
Andrew Hare d7e328050c
Merge branch 'master' into ah-truncate-shards 2017-11-28 17:25:31 -07:00
Jonathan A. Sternberg b775ad3d5d Expand unit test code coverage in services that were undercovered
This expands code coverage for the following packages:
* monitor (3.5% -> 86.9%)
* services/precreator (31.6% -> 83.8%)
* services/retention (83.0% -> 84.9%)
* services/snapshotter (0.0% -> 82.1%)
* tcp (48.7% -> 60.0%)
2017-11-28 15:44:35 -06:00
Edd Robinson 8e3d29ec7a Fixes #9134.
This converts Prometheus' regex syntax for a condition value into InfluxDB's.
2017-11-27 11:19:01 +00:00
Stuart Carnie 7cdfd95966 initial opentrace implementation for ifql interface
NOTE: does not include a default tracer until configuration across
projects is standardized
2017-11-22 14:42:26 -07:00
Ben Johnson fc966a1b67
Add series file backup/restore. 2017-11-22 08:55:54 -07:00
Stuart Carnie 89877d7764 ifql: writer tracks estimated size (bytes) to limit memory between Send 2017-11-20 11:33:37 -07:00
Edd Robinson c098081c7d Don't initialise a new Authorizer each query 2017-11-17 11:06:43 +00:00
Jonathan A. Sternberg 97ab61addb
Merge pull request #9092 from influxdata/jenkinsfile
Initial jenkinsfile
2017-11-14 11:12:32 -06:00
Stuart Carnie 2e04e871c9 fix descending queries
* did not handle cached values correctly
* sort shards by time in either ascending or descending
  order depending on the RPC request ordering to ensure they
  are traversed in the correct order.
2017-11-13 17:14:36 -08:00
Jonathan A. Sternberg ca5a773c34 Initial jenkinsfile 2017-11-13 14:02:23 -06:00
Jonathan A. Sternberg 0b7c56bcd8 Update the zap logger dependency
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.

This updates the usage of the logger and adds a simple package for
constructing the base logger.

The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
2017-11-10 16:27:16 -06:00
Stuart Carnie 57677be010 don't panic here; nil cursor is handled for now. fixes #9090 2017-11-10 11:21:31 -07:00
Edd Robinson 126db1b5f9
Merge pull request #9068 from influxdata/er-show-query-perf
Add time support to some meta queries
2017-11-07 15:48:58 +00:00
Stuart Carnie 6ee0c6ee0c check and flush frames while streaming points for a series
TODO(sgc): implement `writer` type that handles all the details
of writing frames to the RPC stream. Additional responsibilities
of writer include

* point frame recycling to reduce memory pressure
* skip empty point frames
* skip series frames with no points
2017-11-06 13:00:57 -07:00
Edd Robinson 98d584b63f Use index for SHOW X meta queries
When a meta query does not include a time component then it can be
answered exclusively by the index. This should result in a much faster
query execution than if the TSM engine was engaged.

This commit rewrites the following queries such that they make use
of the index where no time component is present:

  - SHOW MEASUREMENTS
  - SHOW SERIES
  - SHOW TAG KEYS
  - SHOW FIELD KEYS
2017-11-06 19:15:00 +00:00
Stuart Carnie cf2227def1 add expected data type to series frame 2017-11-06 11:12:27 -07:00
Stuart Carnie 728f5cc6ac strip series frame if no points returned 2017-11-03 17:04:33 -07:00
Stuart Carnie 10a0bb8f73 don't send empty response 2017-11-02 16:27:05 -07:00
Stuart Carnie f3d45ba301 influxdata/influxdb/influxql -> influxdata/influxql 2017-10-30 14:40:26 -07:00
Stuart Carnie d99cabb5d2 handle nil *indexSeriesCursor 2017-10-26 13:32:05 -07:00
Stuart Carnie ab17e15caf check nil iterator; check nil cursor when no data 2017-10-26 12:54:59 -07:00
Edd Robinson 2ea2abb001 Remove possibility of race when dropping shards
Fixes #8819.

Previously, the process of dropping expired shards according to the
retention policy duration was managed by two independent goroutines in
the retention policy service. This behaviour was introduced in #2776,
at a time when there were both data and meta nodes in the OSS codebase.
The idea was that only the leader meta node would run the meta data
deletions in the first goroutine, and all other nodes would run the
local deletions in the second goroutine.

InfluxDB no longer operates in that way and so we ended up with two
independent goroutines carrying out actions that were really
dependent on each other.

If the second goroutine runs before the first then it may not see the
meta data changes indicating shards should be deleted and it won't
delete any shards locally. Shortly after this the first goroutine will
run and remove the meta data for the shard groups.

This results in a situation where it looks like the shards have gone,
but in fact they remain on disk (and importantly, their series within
the index) until the next time the second goroutine runs. By default
that's 30 minutes.

In the case where the shards to be removed would have removed the last
occurrences of some series, it's possible that, if the database was
already at its maximum series limit (or tag limit for that matter), no
further new series can be inserted.
2017-10-26 16:15:13 +01:00
Edd Robinson 77977af685 Add repro test for #8819 2017-10-26 14:47:30 +01:00
Edd Robinson 1629ec7f5f Add tests to Retention service 2017-10-26 14:47:30 +01:00
Stuart Carnie dc04eaa8f3 Amendments based on feedback
* Fprint* functions
* No nakedness
* clarify panic messages
* spacing between case statements
* remove break in favor of return
* remove goto in favor of for { continue }
2017-10-25 13:38:07 -07:00
Stuart Carnie 415ed14c53 storage service
* storage service is disabled by default
* default port 8082
* RPC interface generated using yarpc via service.proto
2017-10-25 13:38:07 -07:00
Andrew Hare 13c3808aff Add test 2017-10-19 15:57:16 -06:00
Jonathan A. Sternberg 83ecab6299 Prevent deadlock during collectd, graphite, opentsdb, and udp shutdown
All of these services start up goroutines and then wait for the
goroutines to finish. Each of them has a `tsdb.PointBatcher` that may
return a point during the shutdown sequence. During the shutdown
sequence, a lock was held. This lock may get accessed when attempting to
write the point that came back from the `tsdb.PointBatcher`. This caused
the read lock attempt to wait forever for the write lock to be unlocked
during `Close()`.

This modifies these methods so that the write lock is released while
waiting for goroutines to finish in these services.
2017-10-19 15:57:05 -05:00
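The locking change in 83ecab6299 can be illustrated with a small sketch. The `service` type below is hypothetical and much simpler than the collectd, graphite, opentsdb, or udp services; it only shows why `Close()` should release the write lock before waiting on goroutines that may still need the read lock.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// service is a hypothetical stand-in for a listener service with one
// background goroutine that may write a final point during shutdown.
type service struct {
	mu     sync.RWMutex
	wg     sync.WaitGroup
	done   chan struct{}
	writes int64
}

func open() *service {
	s := &service{done: make(chan struct{})}
	s.wg.Add(1)
	go s.loop()
	return s
}

// loop waits for shutdown, then flushes one last point, mimicking a
// batcher that hands back a point while the service is closing.
func (s *service) loop() {
	defer s.wg.Done()
	<-s.done
	s.writePoint()
}

// writePoint needs the read lock, so it blocks while Close holds the
// write lock.
func (s *service) writePoint() {
	s.mu.RLock()
	defer s.mu.RUnlock()
	atomic.AddInt64(&s.writes, 1)
}

// Close signals shutdown under the write lock, then releases the lock
// before waiting. Holding the lock across wg.Wait() would deadlock,
// because loop could never acquire the read lock in writePoint.
func (s *service) Close() {
	s.mu.Lock()
	close(s.done)
	s.mu.Unlock()

	s.wg.Wait()
}

// flushed reports how many points were written.
func (s *service) flushed() int64 {
	return atomic.LoadInt64(&s.writes)
}
```

Moving `s.wg.Wait()` above `s.mu.Unlock()` in `Close` reproduces the hang described in the commit message.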
Edd Robinson 9b55ee2b90 Merge pull request #8935 from posquit0/patch-1
Fix mis-typing in README.md of UDP Service
2017-10-18 15:57:50 +01:00
Byungjin Park f7d8ad50e2 Update README.md 2017-10-18 22:46:44 +09:00
Andrew Hare e6aa5023eb Create a command to truncate shard groups 2017-10-16 20:34:26 -06:00
Mark Rushakoff 4ed2e6f21e Minor cleanup 2017-10-13 17:28:24 -07:00
Mark Rushakoff f3f1cc1064 Initial integration tests for config settings 2017-10-11 17:16:42 -07:00