===============
Running Gnocchi
===============

Once Gnocchi is properly installed, you need to launch it. Simply run the
HTTP server and the metric daemon::

    gnocchi-api
    gnocchi-metricd

You can run these services as background daemons.

Running API As A WSGI Application
=================================

To run the Gnocchi API, you can use the provided `gnocchi-api` script. It
wraps around `uwsgi`, so make sure that `uWSGI`_ is installed.

If one Gnocchi API server is not enough, you can spawn any number of new API
servers to scale Gnocchi out, even on different machines.

Since the Gnocchi API tier runs as a WSGI application, it can alternatively
be run using `Apache httpd`_ and `mod_wsgi`_, or any other HTTP daemon.

uWSGI
-----

If you want to deploy using `uWSGI`_ yourself, the following uWSGI
configuration file can be used as a base::

    [uwsgi]
    http = localhost:8041
    # Set the correct path depending on your installation
    wsgi-file = /usr/local/bin/gnocchi-api
    master = true
    die-on-term = true
    threads = 32
    # Adjust based on the number of CPUs
    processes = 32
    enable-threads = true
    thunder-lock = true
    plugins = python
    buffer-size = 65535
    lazy-apps = true
    add-header = Connection: close

You should configure the number of processes according to the number of CPUs
you have, usually around 1.5 × the number of CPUs.

Once written to `/etc/gnocchi/uwsgi.ini`, it can be launched this way::

    uwsgi /etc/gnocchi/uwsgi.ini

Apache mod_wsgi
---------------

If you want to use Apache httpd `mod_wsgi`_, here's an example configuration
file::

    WSGIDaemonProcess gnocchi user=gnocchi processes=4 threads=32 display-name=%{GROUP}
    WSGIProcessGroup gnocchi
    WSGIScriptAlias / /usr/local/bin/gnocchi-api
    WSGIPassAuthorization On
    WSGIApplicationGroup %{GLOBAL}

    <Directory /usr/local/bin>
        Require all granted
    </Directory>

.. _Apache httpd: http://httpd.apache.org/
.. _mod_wsgi: https://modwsgi.readthedocs.org/
.. _uWSGI: https://uwsgi-docs.readthedocs.org/

How to define archive policies
==============================

The |archive policies| define how the |metrics| are aggregated and how long
they are stored. Each |archive policy| definition is expressed as the number
of points over a |timespan|.

If your |archive policy| defines a policy of 10 points with a |granularity|
of 1 second, the |time series| archive will keep up to 10 seconds of data,
each point representing an aggregation over 1 second. This means the
|time series| will at maximum retain 10 seconds of data between the most
recent point and the oldest point. That does not mean it will be 10
consecutive seconds: there might be a gap if data is fed irregularly.

**There is no expiry of data relative to the current timestamp. Data is only
expired according to timespan.**

Each |archive policy| also defines which |aggregation methods| will be used.
The default is set by the `default_aggregation_methods` configuration
option, which is itself set to *mean*, *min*, *max*, *sum*, *std* and
*count* by default.

Therefore, both the |archive policy| and the |granularity| entirely depend
on your use case. Depending on the usage of your data, you can define
several |archive policies|. A typical low-grained use case could be::

    1440 points with a granularity of 1 minute = 24 hours

The worst case scenario for storing compressed data points is 8.04 bytes per
point, whereas the best case scenario can compress down to 0.05 bytes per
point. Knowing that, it is possible to compute the worst case scenario for
storage in order to plan for data storage capacity.

An |archive policy| of 1440 points would need 1440 points × 8.04 bytes =
11.3 KiB per |aggregation method|. If you use the 6 standard
|aggregation methods| provided by Gnocchi, your |metric| will take up to
6 × 11.3 KiB = 67.8 KiB of disk space per metric.
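To make this arithmetic easy to repeat for your own policies, here is a
minimal Python sketch of the worst-case computation above. The helper name
and the rounding are illustrative; the 8.04 bytes per point figure and the
1440-point example come from this documentation::

    # Worst-case storage estimate for one metric, following the formula
    # above: number of points × 8.04 bytes × number of aggregation methods.
    WORST_CASE_BYTES_PER_POINT = 8.04

    def worst_case_size_kib(points, aggregation_methods=6):
        # Worst-case on-disk size, in KiB, for one metric.
        return points * WORST_CASE_BYTES_PER_POINT * aggregation_methods / 1024

    # 1440 points (1 minute granularity over 24 hours), 6 aggregation methods:
    print(round(worst_case_size_kib(1440), 1))  # 67.8 KiB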
Be aware that the more definitions you set in an |archive policy|, the more
CPU it will consume. Therefore, creating an |archive policy| with 2
definitions (e.g. 1 second granularity for 1 day and 1 minute granularity
for 1 month) may consume twice as much CPU as just one definition (e.g. just
1 second granularity for 1 day).

Default archive policies
------------------------

By default, 4 |archive policies| are created when calling `gnocchi-upgrade`:
*bool*, *low*, *medium* and *high*. Their names describe both the storage
space and the CPU usage needs.

The `bool` |archive policy| is designed to store only boolean values (i.e. 0
and 1). It only stores one data point for each second (using the `last`
|aggregation method|), with a one year retention period. The maximum
optimistic storage size is estimated based on the assumption that no values
other than 0 and 1 are sent as |measures|. If other values are sent, the
maximum pessimistic storage size is taken into account.

- low

  * 5 minutes granularity over 30 days
  * aggregation methods used: `default_aggregation_methods`
  * maximum estimated size per metric: 406 KiB

- medium

  * 1 minute granularity over 7 days
  * 1 hour granularity over 365 days
  * aggregation methods used: `default_aggregation_methods`
  * maximum estimated size per metric: 887 KiB

- high

  * 1 second granularity over 1 hour
  * 1 minute granularity over 1 week
  * 1 hour granularity over 1 year
  * aggregation methods used: `default_aggregation_methods`
  * maximum estimated size per metric: 1 057 KiB

- bool

  * 1 second granularity over 1 year
  * aggregation methods used: *last*
  * maximum optimistic size per metric: 1 539 KiB
  * maximum pessimistic size per metric: 277 172 KiB

How to plan for Gnocchi’s storage
=================================

Gnocchi uses a custom file format based on its *Carbonara* library. In
Gnocchi, a |time series| is a collection of points, where a point is a given
|aggregate|, or sample, in the lifespan of the |time series|.

The storage format is compressed using various techniques, therefore the
size of a |time series| can be estimated in the **worst** case scenario with
the following formula::

    number of points × 8 bytes = size in bytes

The number of points you want to keep is usually determined by the following
formula::

    number of points = timespan ÷ granularity

For example, if you want to keep a year of data with a one minute
resolution::

    number of points = (365 days × 24 hours × 60 minutes) ÷ 1 minute
    number of points = 525 600

Then::

    size in bytes = 525 600 points × 8 bytes = 4 204 800 bytes = 4 106 KiB

This is just for a single aggregated |time series|. If your |archive policy|
uses the 6 default |aggregation methods| (mean, min, max, sum, std, count)
with the same "one year, one minute aggregations" resolution, the space used
will go up to a maximum of 6 × 4.1 MiB = 24.6 MiB.

Metricd
=======

Metricd is the daemon responsible for processing measures, computing their
aggregates and storing them into the aggregate storage. It also handles a
few other cleanup tasks, such as deleting metrics marked for deletion.

Metricd is therefore responsible for most of the CPU usage and I/O in
Gnocchi. The |archive policy| of each |metric| will influence how fast it
performs.

In order to process new measures, metricd checks the incoming storage for
new measures from time to time. The delay between each check can be
configured by changing the `[metricd]metric_processing_delay` configuration
option.

Some incoming drivers (currently only Redis) are able to inform metricd that
new measures are available for processing. In that case, metricd will not
respect the `[metricd]metric_processing_delay` parameter and will start
processing the new measures right away. This behaviour can be disabled by
turning off the `[metricd]greedy` option.
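For instance, a `[metricd]` section in your Gnocchi configuration file could
look like the following sketch. The option names come from this
documentation, but the values shown are illustrative, not recommendations::

    [metricd]
    # Number of seconds to wait between two checks of the incoming
    # storage for new measures to process.
    metric_processing_delay = 60
    # Set to false if you do not want metricd to bypass the delay above
    # and process new measures immediately when the incoming driver
    # (e.g. Redis) reports them.
    greedy = true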
How many metricd workers do I need to run
-----------------------------------------

By default, the `gnocchi-metricd` daemon spawns enough workers to use all
your CPU power in order to maximize CPU utilisation when computing |metric|
aggregations.

You can use the `gnocchi status` command to query the HTTP API and get the
cluster status for |metric| processing. It will show you the number of
|metrics| to process, known as the processing backlog for `gnocchi-metricd`.
As long as this backlog is not continuously increasing, `gnocchi-metricd` is
able to cope with the amount of |metrics| being sent. If the number of
|measures| to process is continuously increasing, you will need to (perhaps
temporarily) increase the number of `gnocchi-metricd` daemons. You can run
any number of metricd daemons on any number of servers.

How to scale measure processing
-------------------------------

Measurement data pushed to Gnocchi is divided into "sacks" for better
distribution. Incoming |metrics| are pushed to specific sacks and each sack
is assigned to one or more `gnocchi-metricd` daemons for processing.

The number of sacks should be set based on the number of active |metrics|
the system will capture. Additionally, the number of sacks should be higher
than the total number of active `gnocchi-metricd` workers.

In general, use the following equation to determine the appropriate `sacks`
value to set::

    sacks value = number of **active** metrics / 300

If the estimated number of |metrics| is the absolute maximum, divide the
value by 500 instead. If the estimated number of active |metrics| is
conservative and expected to grow, divide the value by 100 instead to
accommodate growth.

How do we change sack size
--------------------------

In the event your system grows to capture significantly more |metrics| than
originally anticipated, the number of sacks can be changed to maintain good
distribution. To avoid any loss of data when modifying the number of
`sacks`, the value should be changed in the following order:

1. Stop all input services (api, statsd).
2. Stop all metricd services once the backlog is cleared.
3. Run ``gnocchi-change-sack-size <number of sacks>`` to set the new sack
   size. Note that the sack value can only be changed if the backlog is
   empty.
4. Restart all Gnocchi services (api, statsd, metricd) with the new
   configuration.

Alternatively, to minimize API downtime:

1. Run `gnocchi-upgrade` but use a new incoming storage target, such as a
   new Ceph pool, file path, etc. Additionally, set the |aggregate| storage
   to a new target as well.
2. Run ``gnocchi-change-sack-size <number of sacks>`` against the new
   target.
3. Stop all input services (api, statsd).
4. Restart all input services, but target the newly created incoming
   storage.
5. When done clearing the backlog from the original incoming storage, switch
   all metricd daemons to target the new incoming storage but maintain the
   original |aggregate| storage.
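When picking a new sack size to pass to ``gnocchi-change-sack-size``, the
sizing rule from `How to scale measure processing`_ applies. Here is a
minimal Python sketch of that rule; the helper name and metric counts are
purely illustrative::

    # Pick a sack count from the expected number of active metrics.
    # divisor: 300 for a steady estimate, 500 if the estimate is an
    # absolute maximum, 100 if the workload is expected to grow.
    def sack_count(active_metrics, divisor=300):
        return max(1, round(active_metrics / divisor))

    print(sack_count(60000))               # 200 sacks, steady workload
    print(sack_count(60000, divisor=100))  # 600 sacks, growth expected

Whatever value you pick, remember that it should stay higher than the total
number of active `gnocchi-metricd` workers.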
How to monitor Gnocchi
======================

The `/v1/status` endpoint of the HTTP API returns various information, such
as the number of |measures| to process (the |measures| backlog), which you
can easily monitor (see `How many metricd workers do I need to run`_). The
Gnocchi client can show this output by running `gnocchi status`.

Making sure that the HTTP server and the `gnocchi-metricd` daemon are
running and are not writing anything alarming in their logs is a sign of
good health of the overall system.

The total of |measures| reported by the backlog status may not accurately
reflect the number of points to be processed when |measures| are submitted
via batch.

How to backup and restore Gnocchi
=================================

In order to be able to recover from an unfortunate event, you need to back
up both the index and the storage. That means creating a database dump
(PostgreSQL or MySQL) and doing snapshots or copies of your data storage
(Ceph, S3, Swift or your file system).

The procedure to restore is no more complicated than the initial
deployment: restore your index and storage backups, reinstall Gnocchi if
necessary, and restart it.

How to clear Gnocchi data
=========================

If you ever want to start fresh or need to clean Gnocchi data, this can be
easily done. You need to clean the measures (incoming), the aggregates
(storage) and the indexer data storage. Once that is done, if you want to
re-initialize Gnocchi, you need to call `gnocchi-upgrade` so it
re-initializes the different drivers.

Index storage
-------------

Both the MySQL and PostgreSQL drivers use a single database. Delete the
database. If you want to install Gnocchi again, recreate a database with the
same name before calling `gnocchi-upgrade`.

Incoming data
-------------

Depending on the driver you use, the data is stored in different places:

* **Ceph**: delete the `gnocchi-config` object and the objects whose names
  start with `incoming` in the Ceph pool. Alternatively, you can delete the
  Ceph pool (and recreate it if needed).
* **OpenStack Swift**: delete the `gnocchi-config` container and the
  containers whose names start with `incoming` in the Swift account.
* **Redis**: delete the `gnocchi-config` key and the keys whose names start
  with `incoming`.
* **File**: delete `${incoming.file_basepath}/tmp` and the directories whose
  names start with `${incoming.file_basepath}/incoming`.
* **Amazon S3**: delete the buckets whose names start with `incoming`.

Storage data
------------

Depending on the driver you use, the data is stored in different places:

* **Ceph**: delete the objects whose names start with `gnocchi_` in the Ceph
  pool. Alternatively, you can delete the Ceph pool (and recreate it if
  needed).
* **OpenStack Swift**: delete the containers whose names start with
  `$storage.swift_container_prefix` in the Swift account.
* **Redis**: delete the keys whose names start with `timeseries`.
* **File**: delete the directories whose names are UUIDs under
  `$storage.file_basepath`.
* **Amazon S3**: delete the buckets whose names start with
  `$storage.s3_bucket_prefix`.

.. include:: include/term-substitution.rst