core/homeassistant/components/recorder/util.py

"""SQLAlchemy util functions."""
from contextlib import contextmanager
import logging
import time

from .const import DATA_INSTANCE

_LOGGER = logging.getLogger(__name__)

RETRIES = 3
QUERY_RETRY_WAIT = 0.1


@contextmanager
def session_scope(*, hass=None, session=None):
    """Provide a transactional scope around a series of operations."""
    if session is None and hass is not None:
        session = hass.data[DATA_INSTANCE].get_session()

    if session is None:
        raise RuntimeError('Session required')

    try:
        yield session
        session.commit()
    except Exception as err:  # pylint: disable=broad-except
        _LOGGER.error("Error executing query: %s", err)
        session.rollback()
        raise
    finally:
        session.close()


def commit(session, work):
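    """Commit work, retrying on operational errors.

    `work` is either a model instance to add to the session or a callable
    that takes the session. Returns True on success, False once all
    retries fail.
    """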
    import sqlalchemy.exc
    for _ in range(RETRIES):
        try:
            if callable(work):
                work(session)
            else:
                session.add(work)
            session.commit()
            return True
        except sqlalchemy.exc.OperationalError as err:
            _LOGGER.error("Error executing query: %s", err)
            session.rollback()
            time.sleep(QUERY_RETRY_WAIT)
    return False


def execute(qry):
    """Query the database and convert the objects to HA native form.

    This function also retries a few times in case of stale connections.
    """
    from sqlalchemy.exc import SQLAlchemyError

    for tryno in range(RETRIES):
        try:
            timer_start = time.perf_counter()
            # Convert each row to its native object, dropping any row
            # whose conversion returns None.
            result = [
                row for row in
                (row.to_native() for row in qry)
                if row is not None]

            if _LOGGER.isEnabledFor(logging.DEBUG):
                elapsed = time.perf_counter() - timer_start
                _LOGGER.debug('converting %d rows to native objects took %fs',
                              len(result),
                              elapsed)

            return result
        except SQLAlchemyError as err:
            _LOGGER.error("Error executing query: %s", err)

            # Out of retries: re-raise the last error to the caller.
            if tryno == RETRIES - 1:
                raise
            time.sleep(QUERY_RETRY_WAIT)