📈 Re-add updated and fixed stats role. Goodbye influx, hello Prometheus

David Stephens 2022-08-23 00:56:02 +01:00
parent ddc9bb1d87
commit 53b17c8811
15 changed files with 2854 additions and 102 deletions

@ -36,7 +36,8 @@ If you have a spare domain name you can configure applications to be accessible
  * [Gitea](https://gitea.io/en-us/) - Simple self-hosted GitHub clone
  * [GitLab](https://about.gitlab.com/features/) - Self-hosted GitHub clone of the highest order
  * [Glances](https://nicolargo.github.io/glances/) - for seeing the state of your system via a web browser
- * [Gotify](https://gotify.net/) Self-hosted server for sending push notifications
+ * [Gotify](https://gotify.net/) - Self-hosted server for sending push notifications
+ * [Grafana](https://grafana.com/) - Query, visualize, alert on, and understand your data no matter where it's stored (via stats role).
  * [Guacamole](https://guacamole.apache.org/) - Web based remote desktop gateway, supports VNC, RDP and SSH
  * [healthchecks.io](https://healthchecks.io/) - Ensure your NAS is online and get notified otherwise
  * [Heimdall](https://heimdall.site/) - Home server dashboard
@ -70,6 +71,7 @@ If you have a spare domain name you can configure applications to be accessible
  * [Piwigo](https://piwigo.org/) - Photo Gallery Software
  * [Plex](https://www.plex.tv/) - Plex Media Server
  * [Portainer](https://portainer.io/) - for managing Docker and running custom images
+ * [Prometheus](https://prometheus.io/) - Time series database and monitoring system (via stats role).
  * [Prowlarr](https://github.com/Prowlarr/Prowlarr) - Indexer aggregator for Sonarr, Radarr, Lidarr, etc.
  * [pyLoad](https://pyload.net/) - A download manager with a friendly web-interface
  * [PyTivo](http://pytivo.org) - An HMO and GoBack server for TiVos.
@ -80,6 +82,7 @@ If you have a spare domain name you can configure applications to be accessible
  * [Sickchill](https://sickchill.github.io/) - for managing TV episodes
  * [Sonarr](https://sonarr.tv/) - for downloading and managing TV episodes
  * [Speedtest-Tracker](https://github.com/henrywhitaker3/Speedtest-Tracker) - Continuously track your internet speed
+ * Stats - Monitor and visualise metrics about your NAS and internet connection using Grafana, Prometheus, Telegraf and more.
  * [Syncthing](https://syncthing.net/) - sync directories with another device
  * [Tautulli](http://tautulli.com/) - Monitor Your Plex Media Server
  * [The Lounge](https://thelounge.chat) - Web based always-on IRC client

@ -0,0 +1,14 @@
# Stats
The stats role uses Prometheus, Grafana, Telegraf and a number of metrics exporters to collect and record lots of metrics about your NAS.
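Telegraf also exposes an InfluxDB-compatible listener (port 8086 by default, set by `stats_telegraf_influxdb_port`) for applications that need to write their metrics to InfluxDB.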
## Usage
Set `stats_enabled: true` in your `inventories/<your_inventory>/nas.yml` file. If you want to gather metrics on your internet connection, enable `stats_internet_speed_test_enabled` too.
To access Grafana externally, set `stats_grafana_available_externally: true` in your `inventories/<your_inventory>/nas.yml` file. To access Prometheus externally, set `stats_prometheus_available_externally: true` in the same file.
The Grafana web interface can be found at <http://ansible_nas_host_or_ip:3000> and Prometheus at <http://ansible_nas_host_or_ip:9090>.
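
As a minimal sketch, the settings described above might look like this in your inventory file. Which options you enable is up to you; all four default to `false`, and the combination shown is only an illustration.

```yaml
# inventories/<your_inventory>/nas.yml
stats_enabled: true

# Optional: also gather metrics on your internet connection
stats_internet_speed_test_enabled: true

# Optional: expose the web interfaces externally via Traefik
stats_grafana_available_externally: true
stats_prometheus_available_externally: false
```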

@ -365,6 +365,11 @@
          - speedtest-tracker
        when: (speedtest_tracker_enabled | default(False))

+     - role: stats
+       tags:
+         - stats
+       when: (stats_enabled | default(False))
+
      - role: syncthing
        tags:
          - syncthing

@ -0,0 +1,34 @@
---
stats_enabled: false
stats_internet_speed_test_enabled: false
stats_prometheus_available_externally: false
stats_grafana_available_externally: false
# directories
stats_telegraf_config_directory: "{{ docker_home }}/stats/telegraf/config"
stats_prometheus_data_directory: "{{ docker_home }}/stats/prometheus/data"
stats_prometheus_config_directory: "{{ docker_home }}/stats/prometheus/config"
stats_grafana_data_directory: "{{ docker_home }}/stats/grafana/data"
stats_grafana_config_directory: "{{ docker_home }}/stats/grafana/config"
# network
stats_prometheus_port: "9090"
stats_telegraf_port: "9273"
stats_telegraf_influxdb_port: "8086"
stats_prometheus_smartctl_port: "9902"
stats_speedtest_exporter_port: "9798"
stats_prometheus_hostname: "prometheus"
stats_grafana_port: "3000"
stats_grafana_hostname: "grafana"
# specs
stats_telegraf_memory: 1g
stats_prometheus_memory: 1g
stats_prometheus_smartctl_memory: 1g
stats_speedtest_exporter_memory: 256m
stats_grafana_memory: 1g
# config
stats_prometheus_retention_time: 365d
stats_prometheus_retention_size: 30GB
stats_collection_interval: 15s
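
Any of these defaults can be overridden from the inventory. The sketch below assumes you want shorter retention and a different Grafana port; the values are illustrative, not recommendations.

```yaml
# inventories/<your_inventory>/nas.yml (illustrative overrides)
stats_prometheus_retention_time: 90d    # keep samples for 90 days
stats_prometheus_retention_size: 10GB   # cap the TSDB size on disk
stats_collection_interval: 30s          # used by both Telegraf and Prometheus
stats_grafana_port: "3001"              # host port mapped to Grafana's 3000
```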

File diff suppressed because it is too large

@ -0,0 +1,9 @@
apiVersion: 1
providers:
  - name: dashboards
    type: file
    updateIntervalSeconds: 60
    options:
      path: /etc/dashboards
      foldersFromFilesStructure: true

@ -0,0 +1,22 @@
---
- name: Smartctl Exporter Docker Container
  docker_container:
    name: stats-smartctl
    image: matusnovak/prometheus-smartctl:latest
    pull: true
    privileged: true
    ports:
      - "{{ stats_prometheus_smartctl_port }}:9902"
    restart_policy: unless-stopped
    memory: "{{ stats_prometheus_smartctl_memory }}"

- name: Speedtest Exporter Docker Container
  docker_container:
    name: stats-speedtest
    image: miguelndecarvalho/speedtest-exporter
    pull: true
    ports:
      - "{{ stats_speedtest_exporter_port }}:9798"
    restart_policy: unless-stopped
    memory: "{{ stats_speedtest_exporter_memory }}"
  when: stats_internet_speed_test_enabled

@ -0,0 +1,52 @@
---
- name: Create Grafana Directories
  file:
    path: "{{ item }}"
    state: directory
    owner: "472"
    recurse: yes
  with_items:
    - "{{ stats_grafana_data_directory }}"
    - "{{ stats_grafana_config_directory }}"
    - "{{ stats_grafana_config_directory }}/dashboards"
    - "{{ stats_grafana_config_directory }}/provisioning/datasources"
    - "{{ stats_grafana_config_directory }}/provisioning/dashboards"

- name: Template Grafana data source
  template:
    src: datasources/ansible-nas.yml
    dest: "{{ stats_grafana_config_directory }}/provisioning/datasources/ansible-nas.yml"

- name: Copy Grafana dashboards configuration
  copy:
    src: dashboards/ansible-nas.yml
    dest: "{{ stats_grafana_config_directory }}/provisioning/dashboards/ansible-nas.yml"

- name: Copy Grafana Ansible-NAS dashboard
  copy:
    src: dashboards/ansible-nas-overview.json
    dest: "{{ stats_grafana_config_directory }}/dashboards/ansible-nas-overview.json"

- name: Grafana Docker Container
  docker_container:
    name: grafana
    image: grafana/grafana
    pull: true
    volumes:
      - "{{ stats_grafana_data_directory }}:/var/lib/grafana:rw"
      - "{{ stats_grafana_config_directory }}/provisioning:/etc/grafana/provisioning:ro"
      - "{{ stats_grafana_config_directory }}/dashboards:/etc/dashboards:ro"
    ports:
      - "{{ stats_grafana_port }}:3000"
    env:
      GF_PLUGINS_ENABLE_ALPHA: "true"
      GF_UNIFIED_ALERTING_ENABLED: "true"
    restart_policy: unless-stopped
    memory: "{{ stats_grafana_memory }}"
    labels:
      traefik.enable: "{{ stats_grafana_available_externally | string }}"
      traefik.http.routers.grafana.rule: "Host(`{{ stats_grafana_hostname }}.{{ ansible_nas_domain }}`)"
      traefik.http.routers.grafana.tls.certresolver: "letsencrypt"
      traefik.http.routers.grafana.tls.domains[0].main: "{{ ansible_nas_domain }}"
      traefik.http.routers.grafana.tls.domains[0].sans: "*.{{ ansible_nas_domain }}"
      traefik.http.services.grafana.loadbalancer.server.port: "3000"

@ -0,0 +1,5 @@
---
- import_tasks: prometheus.yml
- import_tasks: telegraf.yml
- import_tasks: exporters.yml
- import_tasks: grafana.yml

@ -0,0 +1,46 @@
---
- name: Create Prometheus Config Directory
  file:
    path: "{{ item }}"
    state: directory
  with_items:
    - "{{ stats_prometheus_config_directory }}"

# World-writable so the container's non-root user can write its time series data
- name: Create Prometheus Data Directory
  file:
    path: "{{ item }}"
    state: directory
    mode: "0777"
  with_items:
    - "{{ stats_prometheus_data_directory }}"

- name: Template Prometheus config
  template:
    src: prometheus.yml
    dest: "{{ stats_prometheus_config_directory }}/prometheus.yml"
  register: prometheus_config

- name: Prometheus Docker Container
  docker_container:
    name: stats-prometheus
    image: prom/prometheus
    pull: true
    volumes:
      - "{{ stats_prometheus_config_directory }}/prometheus.yml:/etc/prometheus/prometheus.yml:ro"
      - "{{ stats_prometheus_data_directory }}:/prometheus:rw"
      - "/etc/timezone:/etc/timezone:ro"
    ports:
      - "{{ stats_prometheus_port }}:9090"
    restart_policy: unless-stopped
    memory: "{{ stats_prometheus_memory }}"
    # Restart the container only when the templated config has changed
    restart: "{{ prometheus_config is changed }}"
    command: "--config.file=/etc/prometheus/prometheus.yml --storage.tsdb.retention.size={{ stats_prometheus_retention_size }} --storage.tsdb.retention.time={{ stats_prometheus_retention_time }}"
    labels:
      traefik.enable: "{{ stats_prometheus_available_externally | string }}"
      traefik.http.routers.prometheus.rule: "Host(`{{ stats_prometheus_hostname }}.{{ ansible_nas_domain }}`)"
      traefik.http.routers.prometheus.tls.certresolver: "letsencrypt"
      traefik.http.routers.prometheus.tls.domains[0].main: "{{ ansible_nas_domain }}"
      traefik.http.routers.prometheus.tls.domains[0].sans: "*.{{ ansible_nas_domain }}"
      traefik.http.services.prometheus.loadbalancer.server.port: "9090"
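
One way to confirm the container came up is to poll Prometheus' `/-/healthy` endpoint after the task above. The task below is a hypothetical follow-up that is not part of this commit; the registered variable name and the retry counts are assumptions.

```yaml
# Hypothetical verification task (not part of the role)
- name: Wait for Prometheus to report healthy
  ansible.builtin.uri:
    url: "http://{{ ansible_default_ipv4.address }}:{{ stats_prometheus_port }}/-/healthy"
    status_code: 200
  register: prometheus_health
  # Retry for roughly 30 seconds while the container starts
  until: prometheus_health.status == 200
  retries: 10
  delay: 3
```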

@ -0,0 +1,49 @@
---
- name: Create Telegraf Directory
  file:
    path: "{{ item }}"
    state: directory
  with_items:
    - "{{ stats_telegraf_config_directory }}"

- name: Template telegraf.conf
  template:
    src: telegraf.conf.j2
    dest: "{{ stats_telegraf_config_directory }}/telegraf.conf"
  register: telegraf_config

# '%g' returns the group ID that owns the Docker socket, so Telegraf can read it
- name: Get Docker socket group ID
  command: stat -c '%g' /var/run/docker.sock
  register: docker_gid
  changed_when: false

- name: Telegraf Docker Container
  docker_container:
    name: stats-telegraf
    image: telegraf
    pull: true
    privileged: true
    ipc_mode: host
    ports:
      - "{{ stats_telegraf_influxdb_port }}:8086"
      - "{{ stats_telegraf_port }}:9273"
    user: "telegraf:{{ docker_gid.stdout }}"
    volumes:
      - "{{ stats_telegraf_config_directory }}/telegraf.conf:/etc/telegraf/telegraf.conf:ro"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "/:/hostfs:ro"
      - "/etc:/hostfs/etc:ro"
      - "/proc:/hostfs/proc:ro"
      - "/sys:/hostfs/sys:ro"
      - "/var:/hostfs/var:ro"
      - "/run:/hostfs/run:ro"
    env:
      HOST_ETC: "/hostfs/etc"
      HOST_PROC: "/hostfs/proc"
      HOST_SYS: "/hostfs/sys"
      HOST_VAR: "/hostfs/var"
      HOST_RUN: "/hostfs/run"
      HOST_MOUNT_PREFIX: "/hostfs"
    restart_policy: unless-stopped
    memory: "{{ stats_telegraf_memory }}"
    # Restart the container only when the templated config has changed
    restart: "{{ telegraf_config is changed }}"

@ -0,0 +1,18 @@
---
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}:{{ stats_prometheus_port }}
    uid: ansible_nas
    isDefault: true
    version: 4

deleteDatasources:
  - name: "InfluxDB"
    orgId: 1
  - name: "Alertmanager"
    orgId: 1

@ -0,0 +1,45 @@
# my global config
global:
  scrape_interval: {{ stats_collection_interval }} # Scrape targets at the configured collection interval (15s unless overridden). Prometheus' own default is 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations: Prometheus itself, plus the Telegraf, smartctl, Traefik and speedtest endpoints.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      # Inside the container Prometheus always listens on 9090, whatever host port is mapped to it.
      - targets: ["localhost:9090"]

  - job_name: "telegraf"
    static_configs:
      - targets: [
          "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}:{{ stats_telegraf_port }}",
        ]

  - job_name: "smartctl"
    static_configs:
      - targets: [
          "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}:{{ stats_prometheus_smartctl_port }}"
        ]

  - job_name: "traefik"
    static_configs:
      - targets: [
          "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}:8083"
        ]

  # Speed tests are slow and use bandwidth, so this job runs hourly with a long timeout.
  - job_name: "speedtest"
    scrape_interval: 1h
    scrape_timeout: 5m
    static_configs:
      - targets: [
          "{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}:{{ stats_speedtest_exporter_port }}"
        ]
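
To make the templating concrete, here is a sketch of what part of the rendered file could look like. It assumes the role defaults and a host whose default IPv4 address is 192.168.1.10; that address is hypothetical and only there for illustration.

```yaml
# Rendered excerpt of prometheus.yml (assumed host IP: 192.168.1.10)
global:
  scrape_interval: 15s        # from stats_collection_interval
  evaluation_interval: 15s

scrape_configs:
  - job_name: "telegraf"
    static_configs:
      - targets: ["192.168.1.10:9273"]   # stats_telegraf_port

  - job_name: "speedtest"
    scrape_interval: 1h
    scrape_timeout: 5m
    static_configs:
      - targets: ["192.168.1.10:9798"]   # stats_speedtest_exporter_port
```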

@ -25,7 +25,7 @@
  # Configuration for telegraf agent
  [agent]
  ## Default data collection interval for all inputs
- interval = "{{ stat_collection_interval }}"
+ interval = "{{ stats_collection_interval }}"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true
@ -82,70 +82,9 @@
  # OUTPUT PLUGINS #
  ###############################################################################
- # Configuration for sending metrics to InfluxDB
- [[outputs.influxdb]]
- ## The full HTTP or UDP URL for your InfluxDB instance.
- ##
- ## Multiple URLs can be specified for a single cluster, only ONE of the
- ## urls will be written to each interval.
- # urls = ["unix:///var/run/influxdb.sock"]
- # urls = ["udp://127.0.0.1:8089"]
- urls = ["http://{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}:8086"]
- ## The target database for metrics; will be created as needed.
- database = "telegraf"
- ## If true, no CREATE DATABASE queries will be sent. Set to true when using
- ## Telegraf with a user without permissions to create databases or when the
- ## database already exists.
- # skip_database_creation = false
- ## Name of existing retention policy to write to. Empty string writes to
- ## the default retention policy.
- # retention_policy = ""
- ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
- # write_consistency = "any"
- ## Timeout for HTTP messages.
- # timeout = "5s"
- ## HTTP Basic Auth
- # username = "telegraf"
- # password = "metricsmetricsmetricsmetrics"
- ## HTTP User-Agent
- # user_agent = "telegraf"
- ## UDP payload size is the maximum packet size to send.
- # udp_payload = 512
- ## Optional SSL Config
- # ssl_ca = "/etc/telegraf/ca.pem"
- # ssl_cert = "/etc/telegraf/cert.pem"
- # ssl_key = "/etc/telegraf/key.pem"
- ## Use SSL but skip chain & host verification
- # insecure_skip_verify = false
- ## HTTP Proxy override, if unset values the standard proxy environment
- ## variables are consulted to determine which proxy, if any, should be used.
- # http_proxy = "http://corporate.proxy:3128"
- ## Additional HTTP headers
- # http_headers = {"X-Special-Header" = "Special-Value"}
- ## HTTP Content-Encoding for write request body, can be set to "gzip" to
- ## compress body or "identity" to apply no encoding.
- # content_encoding = "identity"
- ## When true, Telegraf will output unsigned integers as unsigned values,
- ## i.e.: "42u". You will need a version of InfluxDB supporting unsigned
- ## integer values. Enabling this option will result in field type errors if
- ## existing data has been written.
- # influx_uint_support = false
+ [[outputs.prometheus_client]]
+ ## Address to listen on.
+ listen = ":9273"
  ###############################################################################
  # INPUT PLUGINS #
@ -271,17 +210,17 @@
  # insecure_skip_verify = false
- # Monitor disks' temperatures using hddtemp
- [[inputs.hddtemp]]
- ## By default, telegraf gathers temps data from all disks detected by the
- ## hddtemp.
- ##
- ## Only collect temps from the selected disks.
- ##
- ## A * as the device name will return the temperature values of all disks.
- ##
- address = "hddtemp:7634"
- devices = ["*"]
+ # # Monitor disks' temperatures using hddtemp
+ # [[inputs.hddtemp]]
+ # ## By default, telegraf gathers temps data from all disks detected by the
+ # ## hddtemp.
+ # ##
+ # ## Only collect temps from the selected disks.
+ # ##
+ # ## A * as the device name will return the temperature values of all disks.
+ # ##
+ # address = "hddtemp:7634"
+ # devices = ["*"]
  # Read metrics about network interface usage
  [[inputs.net]]
@ -420,3 +359,7 @@
  ## By default, don't gather zpool stats
  poolMetrics = true
+ [[inputs.influxdb_v2_listener]]
+ ## Address and port to host InfluxDB listener on
+ ## (Double check the port. Could be 9999 if using OSS Beta)
+ service_address = ":8086"

@ -1,25 +0,0 @@
---
apiVersion: 1
datasources:
- name: InfluxDB
type: influxdb
access: proxy
orgId: 1
url: http://{{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}:8086
password:
user:
database: telegraf
basicAuth:
basicAuthUser:
basicAuthPassword:
withCredentials:
isDefault: true
jsonData:
timeInterval: "15s"
secureJsonData:
tlsCACert: "..."
tlsClientCert: "..."
tlsClientKey: "..."
version: 1
editable: false