* feat(task): add limit function for task concurrency
The new task executor handles limits differently than the old executor.
Instead of front-loading limits by creating a runner for every task that might run,
the new executor has a large worker pool and a queue. This allows us to have unlimited
concurrency per task and helps us avoid a backlog of task executions caused by an
arbitrary execution limit. This adds the ability to set an optional task execution limit
so a user can still have the advantages of limiting concurrency.
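A minimal sketch of how an optional per-task limit could sit in front of a shared worker pool. The `limiter` type and its `acquire`/`release` methods are hypothetical names used only for illustration, not the executor's actual API; a limit of 0 is assumed to mean "unlimited".

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is a hypothetical helper that caps how many runs of a single task
// may execute at once. A limit of 0 is treated as "unlimited".
type limiter struct {
	mu      sync.Mutex
	running map[string]int // task ID -> runs currently executing
}

func newLimiter() *limiter {
	return &limiter{running: make(map[string]int)}
}

// acquire reports whether another run of taskID may start under limit,
// and counts the run as started when it may.
func (l *limiter) acquire(taskID string, limit int) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if limit > 0 && l.running[taskID] >= limit {
		return false
	}
	l.running[taskID]++
	return true
}

// release marks one run of taskID as finished.
func (l *limiter) release(taskID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.running[taskID] > 0 {
		l.running[taskID]--
	}
}

func main() {
	l := newLimiter()
	fmt.Println(l.acquire("task-a", 1)) // first run is admitted
	fmt.Println(l.acquire("task-a", 1)) // second run exceeds the limit
	l.release("task-a")
	fmt.Println(l.acquire("task-a", 1)) // admitted again after release
	fmt.Println(l.acquire("task-b", 0)) // no limit configured
}
```

Because the check happens when a run is about to start, tasks without a limit never pay for runners they do not use.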
We needed the coordinator to be able to execute manual runs and resume runs.
These two functions have been added, but we also needed to allow for the executor to be
mocked out. To do that we needed to return a Promise interface instead of an actual
struct. Both these changes are to facilitate coordinator work and testing.
I chose to add an execute function that allows the task executor to match expectations from
the scheduler, but I left in the existing executor method that returns promises. This is
because I like to have accountability and visibility into what's happening
with each execution even though the promise isn't required for the scheduler. This function signature
will be used by the coordinator and potentially others that want to ensure an 'execution' is completed.
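A rough sketch of why returning an interface makes the executor mockable. The `Promise` and `Executor` shapes below are simplified assumptions for illustration and may not match the real method sets; `mockExecutor`, `mockPromise`, and `waitFor` are hypothetical test helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// Promise is a hypothetical, simplified view of a pending execution.
type Promise interface {
	Done() <-chan struct{} // closed when the execution finishes
	Error() error          // result of the execution; read it after Done is closed
}

// Executor returns the interface rather than a concrete struct,
// so tests can substitute their own implementation.
type Executor interface {
	PromisedExecute(taskID string) (Promise, error)
}

// mockPromise is an already-completed promise for use in tests.
type mockPromise struct {
	done chan struct{}
	err  error
}

func (p *mockPromise) Done() <-chan struct{} { return p.done }
func (p *mockPromise) Error() error          { return p.err }

// mockExecutor completes every execution immediately with a fixed result.
type mockExecutor struct{ err error }

func (m mockExecutor) PromisedExecute(taskID string) (Promise, error) {
	done := make(chan struct{})
	close(done)
	return &mockPromise{done: done, err: m.err}, nil
}

// waitFor blocks until the promise completes and returns its result.
func waitFor(p Promise) error {
	<-p.Done()
	return p.Error()
}

// errQueryFailed is an example failure used by the demo below.
var errQueryFailed = errors.New("query failed")

func main() {
	var ex Executor = mockExecutor{err: errQueryFailed}
	p, _ := ex.PromisedExecute("task-a")
	fmt.Println(waitFor(p)) // query failed
}
```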
The http error schema has been changed to simplify the outward facing
API. The `op` and `error` attributes have been dropped because they
confused people. The `error` attribute will likely be re-added in some
form in the future, but only as additional context and will not be
required or even suggested for the UI to use.
Errors are now output differently both when they are serialized to JSON
and when they are output as strings. The `op` is no longer used if it is
present. It will only appear as an optional attribute if at all. The
`message` attribute for an error is always output and it will be the
prefix for any nested error. When this is serialized to JSON, the
message is automatically flattened, so a nested error such as:
    influxdb.Error{
        Msg: errors.New("something bad happened"),
        Err: io.EOF,
    }
would be written to the message as:
    something bad happened: EOF
This matches a developer's expectations much more easily as most
programmers assume that wrapping an error will act as a prefix for the
inner error.
This is flattened when written out to HTTP in order to make this logic
immaterial to a frontend developer.
The code is still present and plays an important role in categorizing
the error type. On the other hand, the code will not be output as part
of the message as it commonly plays a redundant and confusing role when
humans read it. The human-readable message usually gives more context,
and a message with the code acting as a prefix is generally not
desired. However, the code plays a very important role in helping to
identify categories of errors, so it remains an important part of the
return response.
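The flattening described above can be sketched roughly as follows. This `Error` type is a simplified stand-in written for illustration (with `Msg` as a plain string), not the actual `influxdb.Error` definition, and the JSON field names `code` and `message` are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
)

// Error is a hypothetical, simplified error type with a category code,
// a human-readable message, and an optional nested error.
type Error struct {
	Code string
	Msg  string
	Err  error
}

// Error flattens the message and the nested error into one string.
// The code is deliberately left out of the text.
func (e *Error) Error() string {
	switch {
	case e.Msg != "" && e.Err != nil:
		return e.Msg + ": " + e.Err.Error()
	case e.Msg != "":
		return e.Msg
	case e.Err != nil:
		return e.Err.Error()
	}
	return "unknown error"
}

// MarshalJSON emits only the code and the flattened message, so a
// frontend never has to walk nested errors.
func (e *Error) MarshalJSON() ([]byte, error) {
	return json.Marshal(struct {
		Code    string `json:"code"`
		Message string `json:"message"`
	}{Code: e.Code, Message: e.Error()})
}

func main() {
	err := &Error{Code: "internal error", Msg: "something bad happened", Err: io.EOF}
	fmt.Println(err) // something bad happened: EOF
	b, _ := json.Marshal(err)
	fmt.Println(string(b)) // {"code":"internal error","message":"something bad happened: EOF"}
}
```

Keeping the code as its own field lets clients branch on the category while showing the flattened message to humans.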
Implementations of the backend.Executor produce errors limited to
querying the KV store. The remainder of the errors will be processed
in the implementation of a `RunPromise`.
Fixes #15161
The current behavior is that the update is pushed into the scheduler,
and the scheduler cherry-picks what it needs. This leaves the task itself out,
meaning any logging the scheduler did was not going to have the new task information in it.
When a task is told to execute, it can be enqueued while waiting for a worker.
This statistic will be superior to the existing delta based on scheduled for;
the current system can be affected by a user having slow queries or a long "delay" on the task.
This new way of measuring the same thing should allow us to accurately measure when it is the task system's fault.
If we are caching runs in the kv storage system, it is possible to get
the cached version from the kv store and the recently completed run
from the analytical store. We only need to show the analytical result if
we find a duplicate.
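One way to picture the de-duplication, as a sketch only: the `Run` type and `mergeRuns` function are hypothetical, and the result order here (analytical runs first) is an arbitrary choice of the example.

```go
package main

import "fmt"

// Run is a hypothetical, minimal run record.
type Run struct {
	ID     string
	Status string
}

// mergeRuns combines runs from the kv cache with runs from the analytical
// store. When both contain the same run ID, the analytical copy wins.
func mergeRuns(cached, analytical []Run) []Run {
	seen := make(map[string]bool, len(cached)+len(analytical))
	out := make([]Run, 0, len(cached)+len(analytical))
	// Take every analytical run first so it shadows any cached duplicate.
	for _, r := range analytical {
		if seen[r.ID] {
			continue
		}
		seen[r.ID] = true
		out = append(out, r)
	}
	// Add cached runs only when the analytical store has not reported them.
	for _, r := range cached {
		if seen[r.ID] {
			continue
		}
		seen[r.ID] = true
		out = append(out, r)
	}
	return out
}

func main() {
	cached := []Run{{ID: "r1", Status: "started"}, {ID: "r2", Status: "started"}}
	analytical := []Run{{ID: "r1", Status: "success"}}
	fmt.Println(mergeRuns(cached, analytical)) // [{r1 success} {r2 started}]
}
```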
fix(notification/check): include tags in check object in generated flux
Closes https://github.com/influxdata/influxdb/issues/14769
fix(notification/check): use selected field in threshold functions
Closes https://github.com/influxdata/influxdb/issues/14776
fix(testing): add selected field for check tests
fix(check): use real flux for threshold check
feat(notification/check): generate flux for deadman checks
chore(endpoint): rename webhook endpoint to http endpoint
fix(notification/rule): fetch url for flux script off of endpoint
fix(notification/rule): clean up slack and http rules
fix(notification/rule): change MessageTemp to MessageTemplate
fix(rules): pass endpoint in to rule during create
fix(ui): rename webhook to http
feat(notification/check): namespace deadman under alerts
fix(notification/check): nest tags under tags key in data object in flux
wip
feat(kv): log error if urm cannot be deleted for notification rule
fix(notification/rule): remove name from notify call in slack rule
chore(ui/cypress/e2e): skip rule create test
To have checks and notifications happen transactionally we need to be
able to alert the task system when a new task was created using the checks and notifications systems.
These two new middlewares allow us to inform the task system of an update
to a task that was created through the check or notification systems.
* feat(task): Remove tokens from task structures
We had previously removed tokens from the task API but left the token in place in several locations in the stack.
Now we can cleanly remove the extra tokens.
* feat(task): impersonate user on task execution
Passing tokens to tasks is cumbersome and we needed a way to more easily create tasks. With this change we no longer need a token on task create. We take the user that created the task and pass that in as the "owner". As far as the task is concerned the owner is the source of permissions.
This is done by adding an additional field on task create that is OwnerID. We will no longer respect the token passed in and it will be deprecated soon.
Things to do still:
Task updates need to allow for owners to be set.
The current behavior is that the first execution of a task happens based on the create time
of the task when using an 'every' schedule. If you create a task at 12:02 and want
the task to run every 15m, the first execution would happen at 12:17 and the 2nd would happen
at 12:30.
To fix this behavior I refactored the kv task to give a single source of knowledge.
We now have one function for finding exactly what the last scheduled task was.
We also now have a single method that calculates when the next schedule is due.
By unifying the logic, it should always work the same way whether you're asking when to run
or creating a task.
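A small sketch of a single "next due" calculation, assuming 'every' schedules are meant to line up on multiples of the interval (so a task created at 12:02 with a 15m interval first runs at 12:15). The `nextDue` and `at` helpers are hypothetical and are not the functions added in this change.

```go
package main

import (
	"fmt"
	"time"
)

// every is the example interval from the text above.
const every = 15 * time.Minute

// at returns a UTC time on an arbitrary fixed day, for the examples.
func at(hour, min int) time.Time {
	return time.Date(2019, time.January, 1, hour, min, 0, 0, time.UTC)
}

// nextDue returns the first interval boundary strictly after last.
// Assumption: boundaries are multiples of the interval, which is what
// time.Truncate computes relative to the zero time.
func nextDue(last time.Time, interval time.Duration) time.Time {
	return last.Truncate(interval).Add(interval)
}

func main() {
	created := at(12, 2)
	first := nextDue(created, every)
	second := nextDue(first, every)
	fmt.Println(first.Format("15:04"), second.Format("15:04")) // 12:15 12:30
}
```

Using the same function for task creation and for rescheduling is what keeps the two paths from drifting apart.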
* fix(tasks): Add a log message for run transition clarity
On occasion we see a run in Chronograf with missing run data.
We need to find out whether we are submitting incomplete data or whether we submit full data and something else is happening.
* Report errors found when iterating over flux query in task
* Add failing test for tasks executor result iterator exhaust failure
* Ensure errors exhausting tasks query result iterator are surfaced as task failure
* Update CHANGELOG with task result iteration error surfacing fix
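As an illustration of surfacing iteration errors, the sketch below drains an iterator, releases it, and returns whatever error the iteration recorded. The `resultIterator` interface is a trimmed-down stand-in defined here for the example (it yields strings rather than real query results), and `exhaust` and `fakeIterator` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// resultIterator is a hypothetical, simplified query result iterator.
type resultIterator interface {
	More() bool   // reports whether another result is available
	Next() string // returns the next result
	Release()     // frees resources held by the iterator
	Err() error   // reports any error hit while iterating
}

// exhaust drains the iterator, always releases it, and returns any error
// found during iteration so the caller can mark the run as failed.
func exhaust(it resultIterator) error {
	defer it.Release()
	for it.More() {
		_ = it.Next()
	}
	// Err is read before the deferred Release runs.
	return it.Err()
}

// fakeIterator yields n results and then reports err.
type fakeIterator struct {
	n        int
	err      error
	released bool
}

func (f *fakeIterator) More() bool   { return f.n > 0 }
func (f *fakeIterator) Next() string { f.n--; return "result" }
func (f *fakeIterator) Release()     { f.released = true }
func (f *fakeIterator) Err() error   { return f.err }

// errIteration is an example failure used by the demo below.
var errIteration = errors.New("iteration failed")

func main() {
	it := &fakeIterator{n: 3, err: errIteration}
	fmt.Println(exhaust(it), it.released) // iteration failed true
}
```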
The controller implementation is primarily used by influxdb so it
shouldn't be part of the flux repository. This copies the code from flux
to influxdb so it can be removed from the next flux release.
Now that the run status updates are transactional actions,
we no longer have to add a timer to keep things on track.
The timer was causing a problem where some runs show up without a start or stop time if the system is busy.
I would rather have the scheduler hang on the update than leave a run action without required fields.
* task(fix): Tasks should no longer have inaccurate response data
tasks should be able to pull from a table with both success and failed results
Co-authored-by: AlirieGray <alirie@influxdata.com>
Co-authored-by: docmerlin <emrys@influxdata.com>
BucketsAccessed doesn't work currently with a private flux.Spec.
See this issue: https://github.com/influxdata/influxdb/issues/13278
This set of changes just allows code to compile until #13278 is fixed.
Note that preauthorization is not working in the meantime.
Fixes #13275.
This replaces usages of the spec compiler with the ast compiler and it
removes the error message referencing the spec compiler as an available
input.
It does not remove any of the code using the spec compiler that is
involved for proxying requests and it does not remove it from the API.
* Update task servicetest to move dependency to the new TaskControlService
Closes #12724
We will now have the capability to write new task services that don't have to implement the backend.Store, LogReader, or LogWriter interfaces.
Task updates now attempt to keep the existing runners working.
This causes the system to be slightly slower after a task update and caused a flaky test.
The synchronous executor was missing a call to ResultIterator.Release.
The asynchronous executor wasn't even calling Query.Statistics.
Also add a test that the scheduler records the statistics to the run
log, and that the statistics are visible from the launcher test. The
launcher test is the most likely place to catch if something goes wrong
in the full stack.
The late measurement filter, after a pivot, had the potential to result
in empty groups without a runID, which would cause a runtime error,
which would cause the whole query to fail.
Experimentation has shown that those empty tables will no longer arrive
by filtering early on measurement.
This should considerably simplify debugging when things go wrong with
the tasks, as this error can be displayed from the UI or CLI. Prior to
this change, you would have to view the console output from influxd.
Fixes #12548.
In the platform adapter, we ask the URM for a list of tasks the user
owns, and then we look up each task individually.
The task service tests uncovered a legitimate bug where FindTasks would
return a "task not found" error, originating from looking up a task that
was present when we interrogated the URM but was deleted before we could
find it in the task store.
This change also removes duplicated URM logic from the HTTP handler
which has since been pushed down into the platform adapter.
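One plausible way to tolerate the race described above is to skip tasks that disappear between the URM listing and the individual lookup. This is a sketch under that assumption; `taskStore`, `findTasks`, and `errTaskNotFound` are hypothetical names, and tasks are reduced to plain strings.

```go
package main

import (
	"errors"
	"fmt"
)

// errTaskNotFound is a hypothetical sentinel for a missing task.
var errTaskNotFound = errors.New("task not found")

// taskStore is a hypothetical in-memory task store keyed by task ID.
type taskStore map[string]string

func (s taskStore) findByID(id string) (string, error) {
	name, ok := s[id]
	if !ok {
		return "", errTaskNotFound
	}
	return name, nil
}

// findTasks looks up each ID reported by the URM. A task that was deleted
// after the URM listing is skipped; any other error is returned.
func findTasks(s taskStore, ids []string) ([]string, error) {
	tasks := make([]string, 0, len(ids))
	for _, id := range ids {
		t, err := s.findByID(id)
		if errors.Is(err, errTaskNotFound) {
			continue
		}
		if err != nil {
			return nil, err
		}
		tasks = append(tasks, t)
	}
	return tasks, nil
}

func main() {
	store := taskStore{"1": "cpu-rollup", "3": "mem-rollup"}
	// ID "2" was listed by the URM but has since been deleted.
	tasks, err := findTasks(store, []string{"1", "2", "3"})
	fmt.Println(tasks, err) // [cpu-rollup mem-rollup] <nil>
}
```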
* feat(kv:inmem:bolt): implement user service in a kv
* refactor(kv): use consistent func receiver name
* feat(kv): add initial basic auth service
* refactor(passwords): move auth interface into own file
* refactor(passwords): rename basic auth files to passwords
* refactor(passwords): rename from BasicAuth to Passwords
* refactor(kv): copy bolt user test into kv
Co-authored-by: Michael Desa <mjdesa@gmail.com>
* feat(kv): add inmem testing to kv store
* fix(kv): remove extra user index initialization
* feat(kv): attempt at making errors nice
* fix(http): return not found error if filter is invalid
* fix(http): s/platform/influxdb/ for user service
* fix(http): s/platform/influxdb/ for user service
* feat(kv): initial port of telegraf configs to kv
* feat(kv): first pass at migrating bolt org service to kv
* feat(kv): first pass at bucket service
* feat(kv): first pass at migrating kvlog to kv package
* feat(kv): add resource op logs
* feat(kv): first pass at user resource mapping migration
* feat(kv): add urm usage to bucket and org services
* feat(kv): first pass at kv authz service
* feat(kv): add cascading auth delete for users
* feat(kv): first pass d authorizer.OrganizationService in kv
* feat(cmd/influxd/launcher): user kv services where appropriate
* fix(kv): initialize authorizations
* fix(influxdb): use same buckets while slowly migrating stuff
* fix(kv): make staticcheck pass
* feat(kv): add dashboards to kv
review: make suggestions from pr review
fix: use common bucket names for bolt/kv stores
* test(kv): add complete password test coverage
* chore(kv): fixes for staticcheck
* feat(kv): implement labels generically on kv
* feat(kv): implement macro service
* feat(kv): add source service
* feat(kv): add session service
* feat(kv): add kv secret service
* refactor(kv): update telegraf and urm with error messages
* feat(kv): add lookup service
* feat(kv): add kv onboarding service
* refactor(kv): update telegraf to avoid repetition
* feat(cmd/influxd): use kv lookup service
* feat(kv): add telegraf to lookup service
* feat(cmd/influxd): use kv telegraf service
* feat(kv): initial port of scrapers in bolt to kv
* feat(kv): update scraper error messaging
* feat(cmd/influxd): add kv scraper
* feat(kv): add inmem backend tests
* refactor(kv): copy paste errors
* refactor(kv): add code to password errors
* fix(testing): update error messages for incorrect passwords
* feat(kv:inmem:bolt): implement user service in a kv
* refactor(kv): use consistent func receiver name
* refactor(kv): copy bolt user test into kv
Co-authored-by: Michael Desa <mjdesa@gmail.com>
* feat(kv): add inmem testing to kv store
* fix(kv): remove extra user index initialization
* feat(kv): attempt at making errors nice
* fix(http): return not found error if filter is invalid
* fix(http): s/platform/influxdb/ for user service
* feat(kv): first pass at migrating bolt org service to kv
* feat(kv): first pass at bucket service
* feat(kv): first pass at migrating kvlog to kv package
* feat(kv): add resource op logs
* feat(kv): first pass at user resource mapping migration
* feat(kv): add urm usage to bucket and org services
* feat(kv): first pass at kv authz service
* feat(kv): add cascading auth delete for users
* feat(kv): first pass d authorizer.OrganizationService in kv
* feat(cmd/influxd/launcher): user kv services where appropriate
* feat(kv): add initial basic auth service
* refactor(passwords): move auth interface into own file
* refactor(passwords): rename basic auth files to passwords
* fix(http): s/platform/influxdb/ for user service
* fix(kv): initialize authorizations
* fix(influxdb): use same buckets while slowly migrating stuff
* fix(kv): make staticcheck pass
* feat(kv): add dashboards to kv
review: make suggestions from pr review
fix: use common bucket names for bolt/kv stores
* feat(kv): implement labels generically on kv
* refactor(passwords): rename from BasicAuth to Passwords
* test(kv): add complete password test coverage
* chore(kv): fixes for staticcheck
* feat(kv): implement macro service
* feat(kv): add source service
* feat(kv): add session service
* feat(kv): initial port of telegraf configs to kv
* feat(kv): initial port of scrapers in bolt to kv
* feat(kv): add kv secret service
* refactor(kv): update telegraf and urm with error messages
* feat(kv): add lookup service
* feat(kv): add kv onboarding service
* refactor(kv): update telegraf to avoid repetition
* feat(cmd/influxd): use kv lookup service
* feat(kv): add telegraf to lookup service
* feat(cmd/influxd): use kv telegraf service
* feat(kv): update scraper error messaging
* feat(cmd/influxd): add kv scraper
* feat(kv): add inmem backend tests
* refactor(kv): copy paste errors
* refactor(kv): add code to password errors
* fix(testing): update error messages for incorrect passwords
* feat(http): initial support for flushing all key/values from kv store
* feat(kv): rename macro to variable
* feat(cmd/influxd/launcher): user kv services where appropriate
* refactor(passwords): rename from BasicAuth to Passwords
* feat(kv): implement macro service
* test(ui): introduce cypress
* test(ui): introduce first typescript test
* test(ui/e2e): add ci job
* chore: update gitignore to ignore test outputs
* feat(inmem): in memory influxdb
* test(e2e): adding pinger that checks if influxdb is alive
* hackathon
* hack
* hack
* hack
* hack
* Revert "feat(inmem): in memory influxdb"
This reverts commit 30ddf032003e704643b07ce80df61c3299ea7295.
* hack
* hack
* hack
* hack
* hack
* hack
* hack
* hack
* hack
* hack
* hack
* hack
* hack
* chore: lint ignore node_modules
* hack
* hack
* hack
* add user and flush
* hack
* remove unused vars
* hack
* hack
* ci(circle): prefix e2e artifacts
* change test to testid
* update cypress
* moar testid
* fix npm warnings
* remove absolute path
* chore(ci): remove /home/circleci proto mkdir hack
* wip: crud resources e2e
* fix(inmem): use inmem kv store services
* test(dashboard): add first dashboard crud tests
* hack
* undo hack
* fix: use response from setup for orgID
* chore: wip
* add convenience getByTitle function
* test(e2e): ui can create orgs
* test(e2e): add test for org deletion and update
* test(e2e): introduce task creation test
* test(e2e): create and update of buckets on org view
* chore: move types to declaration file
* chore: use route fixture in dashboard tests
* chore(ci): hack back
* test(ui): update snapshots
* chore: package-lock
* chore: remove macros
* fix: launcher rebase issues
* fix: compile errors
* fix: compile errors
* feat(cmd/influxdb): add explicit testing, asset-path, and store flags
Co-authored-by: Andrew Watkins <watts@influxdb.com>
* fix(cmd/influxd): set default HTTP handler and flags
Co-authored-by: Andrew Watkins <watts@influxdb.com>
* build(Makefile): add run-e2e and PHONY
* feat(kv:inmem:bolt): implement user service in a kv
* refactor(kv): use consistent func receiver name
* feat(kv): add initial basic auth service
* refactor(passwords): move auth interface into own file
* refactor(passwords): rename basic auth files to passwords
* refactor(passwords): rename from BasicAuth to Passwords
* refactor(kv): copy bolt user test into kv
Co-authored-by: Michael Desa <mjdesa@gmail.com>
* feat(kv): add inmem testing to kv store
* fix(kv): remove extra user index initialization
* feat(kv): attempt at making errors nice
* fix(http): return not found error if filter is invalid
* fix(http): s/platform/influxdb/ for user service
* fix(http): s/platform/influxdb/ for user service
* feat(kv): initial port of telegraf configs to kv
* feat(kv): initial port of scrapers in bolt to kv
* feat(kv): first pass at migrating bolt org service to kv
* feat(kv): first pass at bucket service
* feat(kv): first pass at migrating kvlog to kv package
* feat(kv): add resource op logs
* feat(kv): first pass at user resource mapping migration
* feat(kv): add urm usage to bucket and org services
* feat(kv): first pass at kv authz service
* feat(kv): add cascading auth delete for users
* feat(kv): first pass d authorizer.OrganizationService in kv
* feat(cmd/influxd/launcher): user kv services where appropriate
* fix(kv): initialize authorizations
* fix(influxdb): use same buckets while slowly migrating stuff
* fix(kv): make staticcheck pass
* feat(kv): add dashboards to kv
review: make suggestions from pr review
fix: use common bucket names for bolt/kv stores
* test(kv): add complete password test coverage
* chore(kv): fixes for staticcheck
* feat(kv): implement labels generically on kv
* feat(kv): implement macro service
* feat(kv): add source service
* feat(kv): add session service
* feat(kv): add kv secret service
* refactor(kv): update telegraf and urm with error messages
* feat(kv): add lookup service
* feat(kv): add kv onboarding service
* refactor(kv): update telegraf to avoid repetition
* feat(cmd/influxd): use kv lookup service
* feat(kv): add telegraf to lookup service
* feat(cmd/influxd): use kv telegraf service
* feat(kv): update scraper error messaging
* feat(cmd/influxd): add kv scraper
* feat(kv): add inmem backend tests
* refactor(kv): copy paste errors
* refactor(kv): add code to password errors
* fix(testing): update error messages for incorrect passwords
* feat(kv): rename macro to variable
* refactor(kv): auth/bucket/org/user unique checks return errors now
* feat(inmem): add way to get all bucket names from store
* feat(inmem): Buckets to return slice of bytes rather than strings
* feat(inmem): add locks around Buckets to avoid races
* feat(cmd/influx): check for unauthorized error in wrapCheckSetup
* chore(e2e): add video and screenshot artifacts to gitignore
* docs(ci): add build instructions for e2e tests
* feat(kv): add id lookup for authorized resources
Task ID is now a required value on run and log filters. It was
effectively required by all implementations before anyway, so now those
types reflect that requirement.
Organization ID was removed from those same fields. The TaskService
looks up the organization ID via the task in cases where we need it at a
lower layer.
This was a missed case from #11817.
This case currently occurs when creating a task through the UI, using a
session rather than a full-fledged authorization. It doesn't fix that
case yet, but at least it will log an informative message.
Immediately before the executor calls out to the query service, the
executor loads the authorizer associated with the task, and associates
that authorizer with the context used to execute the query.
Accept token when creating or updating a task, but only report back the
authorization ID.
This means the executor and the platform adapter are now both aware of
an Authorization Service.
With the ongoing authorization work, creation arguments will differ from
what's returned on reads. More specifically, creation will accept a
token, but reads will report back a token ID.
This refactor facilitates that authorization work, and also brings the
code closer to the swagger definition, for the TaskCreateRequest type in
particular.
The earliest time needed to be captured before creating the task, not
after; otherwise a test running close to a second boundary would fail.
Also make the failure message slightly more readable.
This switches run status from a tag to a field. This is likely a
breaking change to existing task logs.
Using a one-off local query, for 250 records, the previous approach took
around 10 seconds and the new approach is about 30 milliseconds. At 1000
records, the previous approach was roughly 110 seconds and the new
approach is around 70 milliseconds.
filter out resources that have missing IDs
fix(influxdb): simplify auth check in PermissionAllowed
review(platform): update as noted in review
fix(influxdb): ensure permission has valid org id
I did this with a dumb editor macro, so some comments changed too.
Also rename root package from platform to influxdb.
In interest of minimizing risk, anyone importing the root package has
now aliased it to "platform" so that no changes beyond imports were
necessary in those files.
Lastly, replace the old platform module to local path /dev/null so that
nobody can accidentally reintroduce a platform dependency while
migrating platform code to influxdb.
CreateTasks now checks that the user has the write permission to the
tasks resource belonging to an organization. This change comes after
https://github.com/influxdata/platform/pull/2157 modified the structure
of authorization.
Also rename RetryAlreadyQueuedError by running:
gorename -from '"github.com/influxdata/platform/task/backend".RetryAlreadyQueuedError' -to RequestStillQueuedError
and some further manual cleanup for comments.
A standard Makefile is used now in all subdirs that run go generate.
Make will only generate the file if its source files changed.
The checkgenerate target runs clean to ensure all targets are generated
fresh.
Previously, the WithTicker option would call TickScheduler.Tick every
time the underlying time.Ticker sent a time on its channel. This meant
we used a 1s period, which meant that in the worst case, we would see a
tick at about 999ms after the second rollover.
This change increases the underlying time.Ticker frequency, but only
calls TickScheduler.Tick after a second rolls over. Since we now use a
tick frequency of 100ms, during normal operation, TickScheduler.Tick
will be called within 0.1s after the second rolls over.
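The rollover rule can be sketched as below: poll at a short period, but only invoke the callback when the wall-clock second changes. `secondRolledOver`, `runTicker`, and `atMillis` are hypothetical names for this example and do not reproduce the actual WithTicker implementation.

```go
package main

import (
	"fmt"
	"time"
)

// atMillis builds a timestamp ms milliseconds after the Unix epoch, for the examples.
func atMillis(ms int64) time.Time {
	return time.UnixMilli(ms)
}

// secondRolledOver reports whether now falls in a later wall-clock second than prev.
func secondRolledOver(prev, now time.Time) bool {
	return now.Unix() > prev.Unix()
}

// runTicker polls at the given period and calls tick once per second
// rollover until stop is closed.
func runTicker(period time.Duration, tick func(time.Time), stop <-chan struct{}) {
	t := time.NewTicker(period)
	defer t.Stop()
	prev := time.Now()
	for {
		select {
		case now := <-t.C:
			if secondRolledOver(prev, now) {
				tick(now)
			}
			prev = now
		case <-stop:
			return
		}
	}
}

func main() {
	// Deterministic checks of the rollover rule.
	fmt.Println(secondRolledOver(atMillis(900), atMillis(999)))  // false: same second
	fmt.Println(secondRolledOver(atMillis(999), atMillis(1050))) // true: crossed a boundary

	// Live run: with a 100ms period the first tick should land shortly
	// after the next second boundary.
	stop := make(chan struct{})
	ticks := make(chan time.Time, 4)
	go runTicker(100*time.Millisecond, func(now time.Time) { ticks <- now }, stop)
	now := <-ticks
	close(stop)
	fmt.Println("tick fired", now.Sub(now.Truncate(time.Second)), "after the second boundary")
}
```

A shorter polling period tightens the worst-case delay after the boundary at the cost of more wakeups.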