Merge pull request #2 from paulfantom/init

initial migration of roles from cloudalchemy
Ben Kochie 2022-09-23 13:31:17 +02:00 committed by GitHub
commit 0be6b43394
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
125 changed files with 4826 additions and 0 deletions

.ansible-lint

@ -0,0 +1,6 @@
---
skip_list:
- '106'
- '204'
- '208'
- '602'

CONTRIBUTING.md

@ -0,0 +1,98 @@
# Contributor Guideline
This document provides an overview of how you can participate in improving this project or extending it. We are
grateful for all your help: bug reports and fixes, code contributions, documentation, or ideas. Feel free to join; we
appreciate your support!
## Communication
### GitHub repositories
Many of the issues, goals, and ideas are tracked in the respective projects on GitHub. Please use this channel to report
bugs, ask questions, and request new features.
## git and GitHub
In order to contribute code please:
1. Fork the project on GitHub
2. Clone the project
3. Add changes (and tests)
4. Commit and push
5. Create a merge-request
To have your code merged, see the expectations listed below.
You can find a well-written guide [here](https://help.github.com/articles/fork-a-repo).
Please follow common commit best-practices. Be explicit, have a short summary, a well-written description and
references. This is especially important for the merge-request.
Some great guidelines can be found [here](https://wiki.openstack.org/wiki/GitCommitMessages) and
[here](http://robots.thoughtbot.com/5-useful-tips-for-a-better-commit-message).
## Releases
We try to stick to semantic versioning, and our releases are automated. A release is created by adding a keyword (similar
to the [`[ci skip]`](https://docs.travis-ci.com/user/customizing-the-build#Skipping-a-build) keyword used by CI services)
to a commit with merge request. Available keywords are (square brackets are important!):
* `[patch]`, `[fix]`, `[bugfix]` - for PATCH version release
* `[minor]`, `[feature]`, `[feat]` - for MINOR version release
* `[major]`, `[breaking change]` - for MAJOR version release
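
The keyword-to-version mapping above can be sketched in a few lines of Python. This is only an illustration, not the project's release automation: the function names, the case-insensitive matching, and the rule that the highest-impact keyword wins are all assumptions.

```python
# Keyword groups are copied from the list above; everything else is hypothetical.
KEYWORDS = {
    "major": ("[major]", "[breaking change]"),
    "minor": ("[minor]", "[feature]", "[feat]"),
    "patch": ("[patch]", "[fix]", "[bugfix]"),
}


def release_level(message):
    """Return 'major', 'minor', 'patch', or None for a commit message."""
    lowered = message.lower()
    # Assumption: if several keywords are present, the largest bump wins.
    for level in ("major", "minor", "patch"):
        if any(keyword in lowered for keyword in KEYWORDS[level]):
            return level
    return None


def bump(version, level):
    """Apply a semantic-versioning bump to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version  # no keyword found: no release


print(bump("1.0.0", release_level("Add HA cluster support [feature]")))
```

A message without any keyword leaves the version unchanged in this sketch, which matches the idea that a release only happens when a keyword is present.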
## Changelog
The changelog is generated automatically during the release process, and all information is taken from GitHub issues, PRs, and
labels.
## Expectations
### Keep it simple
We try to provide production-ready Ansible roles that need as little configuration as possible, but this is no reason to
overcomplicate things. Just follow [KISS](https://en.wikipedia.org/wiki/KISS_principle).
### Be explicit
* Please avoid using nonsensical property and variable names.
* Use self-describing attribute names for user configuration.
* In case of failures, communicate what happened and why a failure occurs to the user. Make it easy to track the code
or action that produced the error. Try to catch and handle errors if possible to provide improved failure messages.
### Add tests
We strive to have at least two test scenarios located in the [/molecule](molecule) directory. The first one
([default](molecule/default)) tests the default configuration without any additional variables; the second one
([alternative](molecule/alternative)) tests what happens when many variables from
[/defaults/main.yml](defaults/main.yml) are changed. When adding new functionality, please add tests to the proper
scenarios. Tests are written with the testinfra framework and are located in the `/tests` subdirectory of each scenario
directory (for example, the default tests are in [/molecule/default/tests](molecule/default/tests)).
More information about:
- [testinfra](http://testinfra.readthedocs.io/en/latest/index.html)
- [molecule](https://molecule.readthedocs.io/en/latest/index.html)
### Follow best practices
Please follow [ansible best practices](http://docs.ansible.com/ansible/latest/playbooks_best_practices.html) and
especially provide meaningful names to tasks and even comments where needed.
Our test framework automatically lints code with [`yamllint`](https://github.com/adrienverge/yamllint),
[`ansible-lint`](https://github.com/willthames/ansible-lint), and [`flake8`](https://gitlab.com/pycqa/flake8) programs
so be sure to follow their rules.
Remember: Code is generally read much more often than written.
### Use Markdown
Wherever possible, please refrain from any other formats and stick to simple markdown.
## Requirements regarding roles design
We are trying to create the best and most secure installation method for non-containerized Prometheus stack components.
To accomplish this, all roles need to support:
- the current and at least one previous Ansible version
- systemd as the only available process manager
- at least the latest Debian and CentOS distributions

galaxy.yml

@ -0,0 +1,43 @@
### REQUIRED
# The namespace of the collection. This can be a company/brand/organization or product namespace under which all
# content lives. May only contain alphanumeric lowercase characters and underscores. Namespaces cannot start with
# underscores or numbers and cannot contain consecutive underscores
namespace: community
name: prometheus
version: 1.0.0
readme: README.md
authors:
- Ben Kochie (https://github.com/SuperQ)
- Paweł Krupa (https://github.com/paulfantom)
description: your collection description
license_file: LICENSE
tags:
- monitoring
- prometheus
- metrics
- alerts
- alerting
- molecule
- cloud
# Collections that this collection requires to be installed for it to be usable. The key of the dict is the
# collection label 'namespace.name'. The value is a version range
# L(specifiers,https://python-semanticversion.readthedocs.io/en/latest/#requirement-specification). Multiple version
# range specifiers can be set and are separated by ','
dependencies: {}
repository: https://github.com/prometheus-community/ansible
documentation: https://github.com/prometheus-community/ansible/blob/main/docs
homepage: https://prometheus.io
issues: https://github.com/prometheus-community/ansible/issues
# A list of file glob-like patterns used to filter any files or directories that should not be included in the build
# artifact. A pattern is matched from the relative path of the file or directory of the collection directory. This
# uses 'fnmatch' to match the files or directories. Some directories and files like 'galaxy.yml', '*.pyc', '*.retry',
# and '.git' are always filtered
build_ignore:
- 'tests/*'
- '*.tar.gz'
- 'docs/*'
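
The comment above says the `build_ignore` patterns are matched with `fnmatch` against paths relative to the collection directory. The following Python sketch only illustrates that matching behaviour with the three patterns from this file; it is not the `ansible-galaxy` build code, and the helper name `is_ignored` and the sample paths are made up for the example.

```python
from fnmatch import fnmatch

# Patterns copied from build_ignore above.
BUILD_IGNORE = ["tests/*", "*.tar.gz", "docs/*"]


def is_ignored(relative_path, patterns=BUILD_IGNORE):
    """Return True if a collection-relative path matches any ignore pattern."""
    return any(fnmatch(relative_path, pattern) for pattern in patterns)


# Hypothetical paths, used only to show which ones the patterns catch.
for path in (
    "tests/unit/test_example.py",
    "docs/index.md",
    "community-prometheus-1.0.0.tar.gz",
    "roles/alertmanager/README.md",
):
    print(path, is_ignored(path))
```

Note that `*` in `fnmatch` also matches `/`, so `tests/*` covers nested files under a top-level `tests` directory, while a `tests` directory deeper in the tree is not matched by that pattern.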

meta/runtime.yml

@ -0,0 +1 @@
requires_ansible: '>=2.9.10'

plugins/README.md

@ -0,0 +1,31 @@
# Collections Plugins Directory
This directory can be used to ship various plugins inside an Ansible collection. Each plugin is placed in a folder
named after its plugin type. The directory can also include the `module_utils` and `modules` directories, which
contain module utils and modules respectively.
Here is an example directory of the majority of plugins currently supported by Ansible:
```
└── plugins
    ├── action
    ├── become
    ├── cache
    ├── callback
    ├── cliconf
    ├── connection
    ├── filter
    ├── httpapi
    ├── inventory
    ├── lookup
    ├── module_utils
    ├── modules
    ├── netconf
    ├── shell
    ├── strategy
    ├── terminal
    ├── test
    └── vars
```
A full list of plugin types can be found at [Working With Plugins](https://docs.ansible.com/ansible-core/2.12/plugins/plugins.html).


@ -0,0 +1,99 @@
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1d/Human-dialog-warning.svg/2000px-Human-dialog-warning.svg.png" alt="alert logo" title="alert" align="right" height="60" /></p>
# Ansible Role: alertmanager
## Description
Deploy and manage Prometheus [alertmanager](https://github.com/prometheus/alertmanager) service using ansible.
## Requirements
- Ansible >= 2.7 (It might work on previous versions, but we cannot guarantee it)
It would be nice to have prometheus installed somewhere
## Role Variables
All variables which can be overridden are stored in [defaults/main.yml](defaults/main.yml) file as well as in table below.
| Name | Default Value | Description |
| -------------- | ------------- | -----------------------------------|
| `alertmanager_version` | 0.21.0 | Alertmanager package version. Also accepts `latest` as parameter. |
| `alertmanager_binary_local_dir` | "" | Allows using local packages instead of the ones distributed on GitHub. Takes a directory where the `alertmanager` AND `amtool` binaries are stored on the host on which ansible is run. This overrides the `alertmanager_version` parameter |
| `alertmanager_web_listen_address` | 0.0.0.0:9093 | Address on which alertmanager will be listening |
| `alertmanager_web_external_url` | http://localhost:9093/ | External address on which alertmanager is available. Useful when behind reverse proxy. Ex. example.org/alertmanager |
| `alertmanager_config_dir` | /etc/alertmanager | Path to directory with alertmanager configuration |
| `alertmanager_db_dir` | /var/lib/alertmanager | Path to directory with alertmanager database |
| `alertmanager_config_file` | alertmanager.yml.j2 | Variable used to provide custom alertmanager configuration file in form of ansible template |
| `alertmanager_config_flags_extra` | {} | Additional configuration flags passed to the alertmanager binary at startup |
| `alertmanager_template_files` | ['alertmanager/templates/*.tmpl'] | List of folders where ansible will look for template files which will be copied to `{{ alertmanager_config_dir }}/templates/`. Files must have `*.tmpl` extension |
| `alertmanager_resolve_timeout` | 3m | Time after which an alert is declared resolved |
| `alertmanager_smtp` | {} | SMTP (email) configuration |
| `alertmanager_http_config` | {} | Http config for using custom webhooks |
| `alertmanager_slack_api_url` | "" | Slack webhook url |
| `alertmanager_pagerduty_url` | "" | Pagerduty webhook url |
| `alertmanager_opsgenie_api_key` | "" | Opsgenie webhook key |
| `alertmanager_opsgenie_api_url` | "" | Opsgenie webhook url |
| `alertmanager_victorops_api_key` | "" | VictorOps webhook key |
| `alertmanager_victorops_api_url` | "" | VictorOps webhook url |
| `alertmanager_hipchat_api_url` | "" | Hipchat webhook url |
| `alertmanager_hipchat_auth_token` | "" | Hipchat authentication token |
| `alertmanager_wechat_url` | "" | Enterprise WeChat webhook url |
| `alertmanager_wechat_secret` | "" | Enterprise WeChat secret token |
| `alertmanager_wechat_corp_id` | "" | Enterprise WeChat corporation id |
| `alertmanager_cluster` | {listen-address: ""} | HA cluster network configuration. Disabled by default. More information in [alertmanager readme](https://github.com/prometheus/alertmanager#high-availability) |
| `alertmanager_receivers` | [] | A list of notification receivers. Configuration same as in [official docs](https://prometheus.io/docs/alerting/configuration/#<receiver>) |
| `alertmanager_inhibit_rules` | [] | List of inhibition rules. Same as in [official docs](https://prometheus.io/docs/alerting/configuration/#inhibit_rule) |
| `alertmanager_route` | {} | Alert routing. More in [official docs](https://prometheus.io/docs/alerting/configuration/#<route>) |
| `alertmanager_amtool_config_file` | amtool.yml.j2 | Template for amtool config |
| `alertmanager_amtool_config_alertmanager_url` | `alertmanager_web_external_url` | URL of the alertmanager |
| `alertmanager_amtool_config_output` | extended | Extended output, use `""` for simple output. |
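
As a rough illustration of how `alertmanager_config_flags_extra` interacts with the other variables: the role's preflight tasks fail when the extra flags contain `config.file`, `storage.path`, `web.listen-address`, or `web.external-url`, because dedicated variables already cover those. The Python sketch below mirrors that check. The function names are hypothetical, and the `--name=value` rendering is an assumption about how the flags end up on the command line, not something taken from the role's service template.

```python
# Flags that the preflight tasks reject as duplicates of role variables.
RESERVED_FLAGS = (
    "config.file",
    "storage.path",
    "web.listen-address",
    "web.external-url",
)


def check_extra_flags(extra_flags):
    """Raise ValueError if an extra flag duplicates a dedicated role variable."""
    duplicates = sorted(flag for flag in extra_flags if flag in RESERVED_FLAGS)
    if duplicates:
        raise ValueError("duplicate configuration entry: " + ", ".join(duplicates))


def render_flags(extra_flags):
    """Render a flag dictionary as command-line arguments (assumed format)."""
    check_extra_flags(extra_flags)
    return [f"--{name}={value}" for name, value in extra_flags.items()]


print(render_flags({"data.retention": "120h"}))
```

Passing `{"config.file": "/tmp/custom.yml"}` to `render_flags` raises `ValueError` in this sketch, which corresponds to the "Detected duplicate configuration entry" failure in the role.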
## Example
### Playbook
```yaml
---
- hosts: all
  roles:
    - ansible-alertmanager
  vars:
    alertmanager_version: latest
    alertmanager_slack_api_url: "http://example.com"
    alertmanager_receivers:
      - name: slack
        slack_configs:
          - send_resolved: true
            channel: '#alerts'
    alertmanager_route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: slack
```
### Demo site
We provide a demo site for a full monitoring solution based on prometheus and grafana. The repository with code and links to running instances is [available on github](https://github.com/prometheus/demo-site) and the site is hosted on [DigitalOcean](https://digitalocean.com).
## Local Testing
The preferred way of locally testing the role is to use Docker and [molecule](https://github.com/ansible-community/molecule) (v3.x). You will have to install Docker on your system. See "Get started" for a Docker package suitable for your system. Running your tests is as simple as executing `molecule test`.
## Continuous Integration
Combining molecule and circle CI allows us to test how new PRs will behave when used with multiple ansible versions and multiple operating systems. This also allows us to create test scenarios for different role configurations. As a result we have a quite large test matrix which can take more time than local testing, so please be patient.
## Contributing
See [contributor guideline](CONTRIBUTING.md).
## Troubleshooting
See [troubleshooting](TROUBLESHOOTING.md).
## License
This project is licensed under MIT License. See [LICENSE](/LICENSE) for more details.


@ -0,0 +1,124 @@
---
alertmanager_version: 0.21.0
alertmanager_binary_local_dir: ''
alertmanager_config_dir: /etc/alertmanager
alertmanager_db_dir: /var/lib/alertmanager
alertmanager_config_file: 'alertmanager.yml.j2'
alertmanager_template_files:
- alertmanager/templates/*.tmpl
alertmanager_web_listen_address: '0.0.0.0:9093'
alertmanager_web_external_url: 'http://localhost:9093/'
alertmanager_http_config: {}
alertmanager_resolve_timeout: 3m
alertmanager_config_flags_extra: {}
# alertmanager_config_flags_extra:
# data.retention: 10
# SMTP default params
alertmanager_smtp: {}
# alertmanager_smtp:
# from: ''
# smarthost: ''
# auth_username: ''
# auth_password: ''
# auth_secret: ''
# auth_identity: ''
# require_tls: "True"
# Default values you can see here -> https://prometheus.io/docs/alerting/configuration/
alertmanager_slack_api_url: ''
alertmanager_pagerduty_url: ''
alertmanager_opsgenie_api_key: ''
alertmanager_opsgenie_api_url: ''
alertmanager_victorops_api_key: ''
alertmanager_victorops_api_url: ''
alertmanager_hipchat_api_url: ''
alertmanager_hipchat_auth_token: ''
alertmanager_wechat_url: ''
alertmanager_wechat_secret: ''
alertmanager_wechat_corp_id: ''
# First read: https://github.com/prometheus/alertmanager#high-availability
alertmanager_cluster:
listen-address: ""
# alertmanager_cluster:
# listen-address: "{{ ansible_default_ipv4.address }}:6783"
# peers:
# - "{{ ansible_default_ipv4.address }}:6783"
# - "demo.cloudalchemy.org:6783"
alertmanager_receivers: []
# alertmanager_receivers:
# - name: slack
# slack_configs:
# - send_resolved: true
# channel: '#alerts'
alertmanager_inhibit_rules: []
# alertmanager_inhibit_rules:
# - target_match:
# label: value
# source_match:
# label: value
# equal: ['dc', 'rack']
# - target_match_re:
# label: value1|value2
# source_match_re:
# label: value3|value5
alertmanager_route: {}
# alertmanager_route:
# group_by: ['alertname', 'cluster', 'service']
# group_wait: 30s
# group_interval: 5m
# repeat_interval: 4h
# receiver: slack
# # This routes performs a regular expression match on alert labels to
# # catch alerts that are related to a list of services.
# - match_re:
# service: ^(foo1|foo2|baz)$
# receiver: team-X-mails
# # The service has a sub-route for critical alerts, any alerts
# # that do not match, i.e. severity != critical, fall-back to the
# # parent node and are sent to 'team-X-mails'
# routes:
# - match:
# severity: critical
# receiver: team-X-pager
# - match:
# service: files
# receiver: team-Y-mails
# routes:
# - match:
# severity: critical
# receiver: team-Y-pager
# # This route handles all alerts coming from a database service. If there's
# # no team to handle it, it defaults to the DB team.
# - match:
# service: database
# receiver: team-DB-pager
# # Also group alerts by affected database.
# group_by: [alertname, cluster, database]
# routes:
# - match:
# owner: team-X
# receiver: team-X-pager
# - match:
# owner: team-Y
# receiver: team-Y-pager
# The template for amtool's configuration
alertmanager_amtool_config_file: 'amtool.yml.j2'
# Location (URL) of the alertmanager
alertmanager_amtool_config_alertmanager_url: "{{ alertmanager_web_external_url }}"
# Extended output of `amtool` commands, use '' for less verbosity
alertmanager_amtool_config_output: 'extended'


@ -0,0 +1,13 @@
---
- name: restart alertmanager
  become: true
  systemd:
    daemon_reload: true
    name: alertmanager
    state: restarted

- name: reload alertmanager
  become: true
  systemd:
    name: alertmanager
    state: reloaded


@ -0,0 +1,31 @@
---
galaxy_info:
author: Prometheus Community
description: Prometheus Alertmanager service
license: Apache
company: none
min_ansible_version: "2.7"
platforms:
- name: Ubuntu
versions:
- bionic
- xenial
- name: Debian
versions:
- stretch
- buster
- name: EL
versions:
- 7
- 8
- name: Fedora
versions:
- 30
- 31
galaxy_tags:
- monitoring
- prometheus
- alerting
- alert
dependencies: []


@ -0,0 +1,70 @@
---
dependency:
name: galaxy
driver:
name: docker
platforms:
- name: bionic
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:ubuntu-18.04
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: xenial
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:ubuntu-16.04
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: stretch
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:debian-9
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: buster
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:debian-10
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: centos7
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:centos-7
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: centos8
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:centos-8
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
groups:
- python3
- name: fedora
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:fedora-30
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
groups:
- python3
provisioner:
name: ansible
playbooks:
prepare: prepare.yml
converge: playbook.yml
inventory:
group_vars:
python3:
ansible_python_interpreter: /usr/bin/python3
verifier:
name: testinfra


@ -0,0 +1,34 @@
---
- hosts: all
  any_errors_fatal: true
  roles:
    - cloudalchemy.alertmanager
  vars:
    alertmanager_binary_local_dir: '/tmp/alertmanager-linux-amd64'
    alertmanager_config_dir: /opt/am/etc
    alertmanager_db_dir: /opt/am/lib
    alertmanager_web_listen_address: '127.0.0.1:9093'
    alertmanager_web_external_url: 'http://localhost:9093/alertmanager'
    alertmanager_resolve_timeout: 10m
    alertmanager_slack_api_url: "http://example.com"
    alertmanager_receivers:
      - name: slack
        slack_configs:
          - send_resolved: true
            api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
            channel: '#alerts'
    alertmanager_route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: slack
      routes:
        - match_re:
            service: ^(foo1|foo2|baz)$
          receiver: slack
    alertmanager_mesh:
      listen-address: "127.0.0.1:6783"
      peers:
        - "127.0.0.1:6783"
        - "demo.cloudalchemy.org:6783"


@ -0,0 +1,37 @@
---
- name: Prepare
hosts: localhost
gather_facts: false
vars:
# Version seeds to be specified here as molecule doesn't have access to ansible_version at this stage
version: 0.19.0
tasks:
- name: download alertmanager binary to local folder
become: false
get_url:
url: "https://github.com/prometheus/alertmanager/releases/download/v{{ version }}/alertmanager-{{ version }}.linux-amd64.tar.gz"
dest: "/tmp/alertmanager-{{ version }}.linux-amd64.tar.gz"
register: _download_archive
until: _download_archive is succeeded
retries: 5
delay: 2
run_once: true
check_mode: false
- name: unpack alertmanager binaries
become: false
unarchive:
src: "/tmp/alertmanager-{{ version }}.linux-amd64.tar.gz"
dest: "/tmp"
creates: "/tmp/alertmanager-{{ version }}.linux-amd64/alertmanager"
run_once: true
check_mode: false
- name: link to alertmanager binaries directory
become: false
file:
src: "/tmp/alertmanager-{{ version }}.linux-amd64"
dest: "/tmp/alertmanager-linux-amd64"
state: link
run_once: true
check_mode: false


@ -0,0 +1,43 @@
import pytest
import os
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


@pytest.mark.parametrize("dirs", [
    "/opt/am/etc",
    "/opt/am/etc/templates",
    "/opt/am/lib"
])
def test_directories(host, dirs):
    d = host.file(dirs)
    assert d.is_directory
    assert d.exists


@pytest.mark.parametrize("files", [
    "/usr/local/bin/alertmanager",
    "/usr/local/bin/amtool",
    "/opt/am/etc/alertmanager.yml",
    "/etc/systemd/system/alertmanager.service"
])
def test_files(host, files):
    f = host.file(files)
    assert f.exists
    assert f.is_file


def test_service(host):
    s = host.service("alertmanager")
    # assert s.is_enabled
    assert s.is_running


@pytest.mark.parametrize("sockets", [
    "tcp://127.0.0.1:9093",
    "tcp://127.0.0.1:6783"
])
def test_socket(host, sockets):
    assert host.socket(sockets).is_listening


@ -0,0 +1,70 @@
---
dependency:
name: galaxy
driver:
name: docker
platforms:
- name: bionic
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:ubuntu-18.04
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: xenial
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:ubuntu-16.04
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: stretch
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:debian-9
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: buster
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:debian-10
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: centos7
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:centos-7
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: centos8
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:centos-8
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
groups:
- python3
- name: fedora
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:fedora-30
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
groups:
- python3
provisioner:
name: ansible
playbooks:
prepare: prepare.yml
converge: playbook.yml
inventory:
group_vars:
python3:
ansible_python_interpreter: /usr/bin/python3
verifier:
name: testinfra


@ -0,0 +1,18 @@
---
- hosts: all
  any_errors_fatal: true
  roles:
    - cloudalchemy.alertmanager
  vars:
    alertmanager_slack_api_url: "http://example.com"
    alertmanager_receivers:
      - name: slack
        slack_configs:
          - send_resolved: true
            channel: '#alerts'
    alertmanager_route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: slack


@ -0,0 +1,5 @@
---
- name: Prepare
  hosts: all
  gather_facts: false
  tasks: []


@ -0,0 +1,39 @@
import pytest
import os
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


@pytest.mark.parametrize("dirs", [
    "/etc/alertmanager",
    "/etc/alertmanager/templates",
    "/var/lib/alertmanager"
])
def test_directories(host, dirs):
    d = host.file(dirs)
    assert d.is_directory
    assert d.exists


@pytest.mark.parametrize("files", [
    "/usr/local/bin/alertmanager",
    "/usr/local/bin/amtool",
    "/etc/alertmanager/alertmanager.yml",
    "/etc/systemd/system/alertmanager.service"
])
def test_files(host, files):
    f = host.file(files)
    assert f.exists
    assert f.is_file


def test_service(host):
    s = host.service("alertmanager")
    # assert s.is_enabled
    assert s.is_running


def test_socket(host):
    assert host.socket("tcp://0.0.0.0:9093").is_listening


@ -0,0 +1,35 @@
---
dependency:
name: galaxy
driver:
name: docker
platforms:
- name: buster
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:debian-10
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: fedora
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:fedora-30
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
groups:
- python3
provisioner:
name: ansible
playbooks:
create: ../default/create.yml
prepare: ../default/prepare.yml
converge: playbook.yml
destroy: ../default/destroy.yml
inventory:
group_vars:
python3:
ansible_python_interpreter: /usr/bin/python3
verifier:
name: testinfra


@ -0,0 +1,20 @@
---
- name: Run role
  hosts: all
  any_errors_fatal: true
  roles:
    - cloudalchemy.alertmanager
  vars:
    alertmanager_version: latest
    alertmanager_slack_api_url: "http://example.com"
    alertmanager_receivers:
      - name: slack
        slack_configs:
          - send_resolved: true
            channel: '#alerts'
    alertmanager_route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: slack


@ -0,0 +1,28 @@
import pytest
import os
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


@pytest.mark.parametrize("files", [
    "/etc/systemd/system/alertmanager.service",
    "/usr/local/bin/alertmanager",
    "/usr/local/bin/amtool"
])
def test_files(host, files):
    f = host.file(files)
    assert f.exists
    assert f.is_file


def test_service(host):
    s = host.service("alertmanager")
    # assert s.is_enabled
    assert s.is_running


def test_socket(host):
    s = host.socket("tcp://0.0.0.0:9093")
    assert s.is_listening


@ -0,0 +1,43 @@
---
- name: copy amtool config
template:
force: true
src: "{{ alertmanager_amtool_config_file }}"
dest: "{{ _alertmanager_amtool_config_dir }}/config.yml"
owner: alertmanager
group: alertmanager
mode: 0644
- name: copy alertmanager config
template:
force: true
src: "{{ alertmanager_config_file }}"
dest: "{{ alertmanager_config_dir }}/alertmanager.yml"
owner: alertmanager
group: alertmanager
mode: 0644
validate: "{{ _alertmanager_binary_install_dir }}/amtool check-config %s"
notify:
- restart alertmanager
- name: create systemd service unit
template:
src: alertmanager.service.j2
dest: /etc/systemd/system/alertmanager.service
owner: root
group: root
mode: 0644
notify:
- restart alertmanager
- name: copy alertmanager template files
copy:
src: "{{ item }}"
dest: "{{ alertmanager_config_dir }}/templates/"
force: true
owner: alertmanager
group: alertmanager
mode: 0644
with_fileglob: "{{ alertmanager_template_files }}"
notify:
- restart alertmanager


@ -0,0 +1,80 @@
---
- name: create alertmanager system group
group:
name: alertmanager
system: true
state: present
- name: create alertmanager system user
user:
name: alertmanager
system: true
shell: "/usr/sbin/nologin"
group: alertmanager
createhome: false
- name: create alertmanager directories
file:
path: "{{ item }}"
state: directory
owner: alertmanager
group: alertmanager
mode: 0755
with_items:
- "{{ alertmanager_config_dir }}"
- "{{ alertmanager_config_dir }}/templates"
- "{{ alertmanager_db_dir }}"
- "{{ _alertmanager_amtool_config_dir }}"
- block:
- name: download alertmanager binary to local folder
become: false
get_url:
url: "https://github.com/prometheus/alertmanager/releases/download/v{{ alertmanager_version }}/alertmanager-{{ alertmanager_version }}.linux-{{ go_arch }}.tar.gz"
dest: "/tmp/alertmanager-{{ alertmanager_version }}.linux-{{ go_arch }}.tar.gz"
checksum: "sha256:{{ __alertmanager_checksum }}"
register: _download_archive
until: _download_archive is succeeded
retries: 5
delay: 2
# run_once: true # <-- this can't be set due to multi-arch support
delegate_to: localhost
check_mode: false
- name: unpack alertmanager binaries
become: false
unarchive:
src: "/tmp/alertmanager-{{ alertmanager_version }}.linux-{{ go_arch }}.tar.gz"
dest: "/tmp"
mode: 0755
creates: "/tmp/alertmanager-{{ alertmanager_version }}.linux-{{ go_arch }}/alertmanager"
delegate_to: localhost
check_mode: false
- name: propagate official alertmanager and amtool binaries
copy:
src: "/tmp/alertmanager-{{ alertmanager_version }}.linux-{{ go_arch }}/{{ item }}"
dest: "{{ _alertmanager_binary_install_dir }}/{{ item }}"
mode: 0755
owner: root
group: root
with_items:
- alertmanager
- amtool
notify:
- restart alertmanager
when: alertmanager_binary_local_dir | length == 0
- name: propagate locally distributed alertmanager and amtool binaries
copy:
src: "{{ alertmanager_binary_local_dir }}/{{ item }}"
dest: "{{ _alertmanager_binary_install_dir }}/{{ item }}"
mode: 0755
owner: root
group: root
with_items:
- alertmanager
- amtool
when: alertmanager_binary_local_dir | length > 0
notify:
- restart alertmanager
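
The download step in this file fetches a release archive and verifies it against a SHA-256 checksum that the preflight tasks pick out of the release's `sha256sums.txt`. The Python sketch below walks through the same idea outside of Ansible. It is a simplified illustration under stated assumptions: the function names are hypothetical, the checksum file is assumed to contain lines of the form `<hex digest>  <file name>`, and no network access is performed.

```python
import hashlib


def archive_url(version, go_arch):
    """Build the release archive URL the same way the get_url task does."""
    return (
        "https://github.com/prometheus/alertmanager/releases/download/"
        f"v{version}/alertmanager-{version}.linux-{go_arch}.tar.gz"
    )


def pick_checksum(sha256sums_lines, go_arch):
    """Return the digest of the line naming the linux archive for go_arch."""
    suffix = f"linux-{go_arch}.tar.gz"
    checksum = None
    for line in sha256sums_lines:
        if suffix in line:
            # The digest is the first space-separated field on the line.
            checksum = line.split(" ")[0]
    return checksum


def verify_archive(data, expected_sha256):
    """Compare the SHA-256 digest of the downloaded bytes with the expected one."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


print(archive_url("0.21.0", "amd64"))
```

If `pick_checksum` returns `None`, no archive for the requested architecture is listed, and a caller would want to stop before downloading anything.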


@ -0,0 +1,35 @@
---
- include: preflight.yml
  tags:
    - alertmanager_install
    - alertmanager_configure
    - alertmanager_run

- include: install.yml
  become: true
  tags:
    - alertmanager_install

- import_tasks: selinux.yml
  become: true
  when: ansible_selinux.status == "enabled"
  tags:
    - alertmanager_configure

- include: configure.yml
  become: true
  tags:
    - alertmanager_configure

- name: ensure alertmanager service is started and enabled
  become: true
  systemd:
    daemon_reload: true
    name: alertmanager
    state: started
    enabled: true
  tags:
    - alertmanager_run

- name: Flush alertmanager handlers after run.
  meta: flush_handlers


@ -0,0 +1,135 @@
---
- name: Assert usage of systemd as an init system
assert:
that: ansible_service_mgr == 'systemd'
msg: "This module only works with systemd"
- name: Get systemd version
command: systemctl --version
changed_when: false
check_mode: false
register: __systemd_version
tags:
- skip_ansible_lint
- name: Set systemd version fact
set_fact:
alertmanager_systemd_version: "{{ __systemd_version.stdout_lines[0].split(' ')[-1] }}"
- block:
- name: Get latest release
uri:
url: "https://api.github.com/repos/prometheus/alertmanager/releases/latest"
method: GET
return_content: true
status_code: 200
body_format: json
user: "{{ lookup('env', 'GH_USER') | default(omit) }}"
password: "{{ lookup('env', 'GH_TOKEN') | default(omit) }}"
no_log: "{{ not lookup('env', 'MOLECULE_DEBUG') | bool }}"
register: _latest_release
until: _latest_release.status == 200
retries: 5
- name: "Set alertmanager version to {{ _latest_release.json.tag_name[1:] }}"
set_fact:
alertmanager_version: "{{ _latest_release.json.tag_name[1:] }}"
alertmanager_checksum_url: "https://github.com/prometheus/alertmanager/releases/download/v{{ alertmanager_version }}/sha256sums.txt"
when:
- alertmanager_version == "latest"
- alertmanager_binary_local_dir | length == 0
- block:
- name: "Get checksum list"
set_fact:
__alertmanager_checksums: "{{ lookup('url', 'https://github.com/prometheus/alertmanager/releases/download/v' + alertmanager_version + '/sha256sums.txt', wantlist=True) | list }}"
run_once: true
- name: "Get checksum for {{ go_arch }} architecture"
set_fact:
__alertmanager_checksum: "{{ item.split(' ')[0] }}"
with_items: "{{ __alertmanager_checksums }}"
when:
- "('linux-' + go_arch + '.tar.gz') in item"
delegate_to: localhost
when:
- alertmanager_binary_local_dir | length == 0
- name: Fail when extra config flags are duplicating ansible variables
fail:
msg: "Detected duplicate configuration entry. Please check your ansible variables and role README.md."
when:
(alertmanager_config_flags_extra['config.file'] is defined) or
(alertmanager_config_flags_extra['storage.path'] is defined) or
(alertmanager_config_flags_extra['web.listen-address'] is defined) or
(alertmanager_config_flags_extra['web.external-url'] is defined)
- name: Fail when there are no receivers defined
fail:
msg: "Configure alert receivers (`alertmanager_receivers`). Otherwise alertmanager won't know where to send alerts."
when:
- alertmanager_config_file == 'alertmanager.yml.j2'
- alertmanager_receivers == []
- name: Fail when there is no alert route defined
fail:
msg: "Configure alert routing (`alertmanager_route`). Otherwise alertmanager won't know how to send alerts."
when:
- alertmanager_config_file == 'alertmanager.yml.j2'
- alertmanager_route == {}
- name: "DEPRECATION WARNING: alertmanager versions 0.15 and earlier are no longer supported and will be dropped from future releases"
ignore_errors: true
fail:
msg: "Please use `alertmanager_version >= v0.16.0`"
when: alertmanager_version is version_compare('0.16.0', '<')
- block:
- name: Backward compatibility of variable [part 1]
set_fact:
alertmanager_config_flags_extra: "{{ alertmanager_cli_flags }}"
- name: "DEPRECATION WARNING: `alertmanager_cli_flags` is no longer supported and will be dropped from future releases"
ignore_errors: true
fail:
msg: "Please use `alertmanager_config_flags_extra` instead of `alertmanager_cli_flags`"
when: alertmanager_cli_flags is defined
- block:
- name: Backward compatibility of variable [part 2]
set_fact:
alertmanager_web_listen_address: "{{ alertmanager_listen_address }}"
- name: "DEPRECATION WARNING: `alertmanager_listen_address` is no longer supported and will be dropped from future releases"
ignore_errors: true
fail:
msg: "Please use `alertmanager_web_listen_address` instead of `alertmanager_listen_address`"
when: alertmanager_listen_address is defined
- block:
- name: Backward compatibility of variable [part 3]
set_fact:
alertmanager_web_external_url: "{{ alertmanager_external_url }}"
- name: "DEPRECATION WARNING: `alertmanager_external_url` is no longer supported and will be dropped from future releases"
ignore_errors: true
fail:
msg: "Please use `alertmanager_web_external_url` instead of `alertmanager_external_url`"
when: alertmanager_external_url is defined
- block:
- name: HA config compatibility with alertmanager<0.15.0
set_fact:
alertmanager_cluster: "{{ alertmanager_mesh }}"
- name: "DEPRECATION WARNING: `alertmanager_mesh` is no longer supported and will be dropped from future releases"
ignore_errors: true
fail:
msg: "Please use `alertmanager_cluster` instead of `alertmanager_mesh`"
when: alertmanager_mesh is defined
- name: "`alertmanager_child_routes` is no longer supported"
fail:
msg: "Please move content of `alertmanager_child_routes` to `alertmanager_route.routes` as the former variable is deprecated and will be removed in future versions."
when: alertmanager_child_routes is defined

View file

@ -0,0 +1,39 @@
---
- name: Install selinux python packages [RHEL]
package:
name:
- "{{ ( (ansible_facts.distribution_major_version | int) < 8) | ternary('libselinux-python','python3-libselinux') }}"
- "{{ ( (ansible_facts.distribution_major_version | int) < 8) | ternary('libselinux-python','python3-policycoreutils') }}"
state: present
register: _install_selinux_packages
until: _install_selinux_packages is success
retries: 5
delay: 2
when:
- (ansible_distribution | lower == "redhat") or
(ansible_distribution | lower == "centos")
- name: Install selinux python packages [Fedora]
package:
name:
- "{{ ( (ansible_facts.distribution_major_version | int) < 29) | ternary('libselinux-python','python3-libselinux') }}"
- "{{ ( (ansible_facts.distribution_major_version | int) < 29) | ternary('libselinux-python','python3-policycoreutils') }}"
state: present
register: _install_selinux_packages
until: _install_selinux_packages is success
retries: 5
delay: 2
when:
- ansible_distribution | lower == "fedora"
- name: Install selinux python packages [clearlinux]
package:
name: sysadmin-basic
state: present
register: _install_selinux_packages
until: _install_selinux_packages is success
retries: 5
delay: 2
when:
- ansible_distribution | lower == "clearlinux"

View file

@ -0,0 +1,65 @@
{%- if alertmanager_version is version_compare('0.13.0', '>=') %}
{%- set pre = '-' %}
{%- else %}
{%- set pre = '' %}
{%- endif %}
{%- if alertmanager_version is version_compare('0.15.0', '<') %}
{%- set cluster_flag = 'mesh' %}
{%- else %}
{%- set cluster_flag = 'cluster' %}
{%- endif %}
{{ ansible_managed | comment }}
[Unit]
Description=Prometheus Alertmanager
After=network-online.target
StartLimitInterval=0
StartLimitIntervalSec=0
[Service]
Type=simple
PIDFile=/var/run/alertmanager.pid
User=alertmanager
Group=alertmanager
ExecReload=/bin/kill -HUP $MAINPID
ExecStart={{ _alertmanager_binary_install_dir }}/alertmanager \
{% for option, value in (alertmanager_cluster.items() | sort) %}
{% if option == "peers" %}
{% for peer in value %}
{{ pre }}-{{ cluster_flag }}.peer={{ peer }} \
{% endfor %}
{% else %}
{{ pre }}-{{ cluster_flag }}.{{ option }}={{ value }} \
{% endif %}
{% endfor %}
{{ pre }}-config.file={{ alertmanager_config_dir }}/alertmanager.yml \
{{ pre }}-storage.path={{ alertmanager_db_dir }} \
{{ pre }}-web.listen-address={{ alertmanager_web_listen_address }} \
{{ pre }}-web.external-url={{ alertmanager_web_external_url }}{% for flag, flag_value in alertmanager_config_flags_extra.items() %} \
{{ pre }}-{{ flag }}={{ flag_value }}{% endfor %}
SyslogIdentifier=alertmanager
Restart=always
RestartSec=5
CapabilityBoundingSet=CAP_SET_UID
LockPersonality=true
NoNewPrivileges=true
MemoryDenyWriteExecute=true
PrivateTmp=true
ProtectHome=true
ReadWriteDirectories={{ alertmanager_db_dir }}
RemoveIPC=true
RestrictSUIDSGID=true
{% if alertmanager_systemd_version | int >= 232 %}
PrivateUsers=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=yes
ProtectSystem=strict
{% else %}
ProtectSystem=full
{% endif %}
[Install]
WantedBy=multi-user.target

View file

@ -0,0 +1,56 @@
{{ ansible_managed | comment }}
global:
resolve_timeout: {{ alertmanager_resolve_timeout | quote}}
{% for key, value in alertmanager_smtp.items() %}
smtp_{{ key }}: {{ value | quote }}
{% endfor %}
{% if alertmanager_slack_api_url | string | length %}
slack_api_url: {{ alertmanager_slack_api_url | quote }}
{% endif %}
{% if alertmanager_http_config | length %}
http_config:
{{ alertmanager_http_config | to_nice_yaml(indent=2) | indent(4, False)}}
{% endif %}
{% if alertmanager_pagerduty_url | string | length %}
pagerduty_url: {{ alertmanager_pagerduty_url | quote }}
{% endif %}
{% if alertmanager_opsgenie_api_key | string | length %}
opsgenie_api_key: {{ alertmanager_opsgenie_api_key | quote }}
{% endif %}
{% if alertmanager_opsgenie_api_url | string | length %}
opsgenie_api_url: {{ alertmanager_opsgenie_api_url | quote }}
{% endif %}
{% if alertmanager_victorops_api_key | string | length %}
victorops_api_key: {{ alertmanager_victorops_api_key | quote }}
{% endif %}
{% if alertmanager_victorops_api_url | string | length %}
victorops_api_url: {{ alertmanager_victorops_api_url | quote }}
{% endif %}
{% if alertmanager_hipchat_api_url | string | length %}
hipchat_api_url: {{ alertmanager_hipchat_api_url | quote }}
{% endif %}
{% if alertmanager_hipchat_auth_token | string | length %}
hipchat_auth_token: {{ alertmanager_hipchat_auth_token | quote }}
{% endif %}
{% if alertmanager_wechat_url | string | length %}
wechat_api_url: {{ alertmanager_wechat_url | quote }}
{% endif %}
{% if alertmanager_wechat_secret | string | length %}
wechat_api_secret: {{ alertmanager_wechat_secret | quote }}
{% endif %}
{% if alertmanager_wechat_corp_id | string | length %}
wechat_api_corp_id: {{ alertmanager_wechat_corp_id | quote }}
{% endif %}
templates:
- '{{ alertmanager_config_dir }}/templates/*.tmpl'
{% if alertmanager_receivers | length %}
receivers:
{{ alertmanager_receivers | to_nice_yaml(indent=2) }}
{% endif %}
{% if alertmanager_inhibit_rules | length %}
inhibit_rules:
{{ alertmanager_inhibit_rules | to_nice_yaml(indent=2) }}
{% endif %}
route:
{{ alertmanager_route | to_nice_yaml(indent=2) | indent(2, False) }}

View file

@ -0,0 +1,4 @@
alertmanager.url: "{{ alertmanager_amtool_config_alertmanager_url }}"
{% if alertmanager_amtool_config_output != "" %}
output: "{{ alertmanager_amtool_config_output }}"
{% endif %}

View file

@ -0,0 +1,13 @@
---
go_arch_map:
  i386: '386'
  x86_64: 'amd64'
  aarch64: 'arm64'
  armv7l: 'armv7'
  armv6l: 'armv6'

go_arch: "{{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}"
_alertmanager_binary_install_dir: '/usr/local/bin'

# The expected location of the amtool configuration file
_alertmanager_amtool_config_dir: '/etc/amtool'

View file

@ -0,0 +1,58 @@
<p><img src="http://jacobsmedia.com/wp-content/uploads/2015/08/black-box-edit.png" alt="blackbox logo" title="blackbox" align="right" height="60" /></p>
# Ansible Role: Blackbox Exporter
## Description
Deploy and manage [blackbox exporter](https://github.com/prometheus/blackbox_exporter) which allows blackbox probing of endpoints over HTTP, HTTPS, DNS, TCP and ICMP.
## Requirements
- Ansible >= 2.7 (It might work on previous versions, but we cannot guarantee it)
- gnu-tar on Mac deployer host (`brew install gnu-tar`)
## Role Variables
All variables which can be overridden are stored in [defaults/main.yml](defaults/main.yml) and are listed in the table below.
| Name | Default Value | Description |
| -------------- | ------------- | -----------------------------------|
| `blackbox_exporter_version` | 0.18.0 | Blackbox exporter package version |
| `blackbox_exporter_web_listen_address` | 0.0.0.0:9115 | Address on which blackbox exporter will be listening |
| `blackbox_exporter_cli_flags` | {} | Additional configuration flags passed to blackbox exporter binary at startup |
| `blackbox_exporter_configuration_modules` | http_2xx: { prober: http, timeout: 5s, http: '' } | |
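
As a rough illustration of how `blackbox_exporter_cli_flags` and `blackbox_exporter_web_listen_address` end up on the command line, the sketch below mirrors the flag loop of the role's systemd unit template in plain Python. The function name `render_exec_start` is hypothetical and exists only for this example; the binary and configuration paths are the ones the role uses.

```python
def render_exec_start(cli_flags, listen_address,
                      binary="/usr/local/bin/blackbox_exporter",
                      config_file="/etc/blackbox_exporter.yml"):
    """Build the ExecStart argument list in the same order as the unit template."""
    args = [binary, f"--config.file={config_file}"]
    # Each entry of blackbox_exporter_cli_flags becomes --<flag>=<value>.
    for flag, value in cli_flags.items():
        args.append(f"--{flag}={value}")
    # The listen address is always appended last.
    args.append(f"--web.listen-address={listen_address}")
    return args


if __name__ == "__main__":
    print(" ".join(render_exec_start({"log.level": "warn"}, "0.0.0.0:9115")))
```

Because the extra flags are rendered as-is, a key such as `log.level` needs to be a flag name the installed blackbox exporter version actually accepts.
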
## Example
### Playbook
```yaml
- hosts: all
  become: true
  roles:
    - cloudalchemy.blackbox-exporter
```
### Demo site
We provide a demo site for a full monitoring solution based on prometheus and grafana. The repository with code and links to running instances is [available on github](https://github.com/prometheus/demo-site) and the site is hosted on [DigitalOcean](https://digitalocean.com).
## Local Testing
The preferred way of locally testing the role is to use Docker and [molecule](https://github.com/ansible-community/molecule) (v3.x). You will have to install Docker on your system. See "Get started" for a Docker package suitable for your system. Running your tests is as simple as executing `molecule test`.
## Continuous Integration
Combining molecule and circle CI allows us to test how new PRs will behave when used with multiple ansible versions and multiple operating systems. This also allows us to create test scenarios for different role configurations. As a result we have quite a large test matrix which can take more time than local testing, so please be patient.
## Contributing
See [contributor guideline](CONTRIBUTING.md).
## Troubleshooting
See [troubleshooting](TROUBLESHOOTING.md).
## License
This project is licensed under MIT License. See [LICENSE](/LICENSE) for more details.

View file

@ -0,0 +1,63 @@
---
blackbox_exporter_version: 0.18.0
blackbox_exporter_web_listen_address: "0.0.0.0:9115"
blackbox_exporter_cli_flags: {}
# blackbox_exporter_cli_flags:
# log.level: "warn"
blackbox_exporter_configuration_modules:
http_2xx:
prober: http
timeout: 5s
http:
method: GET
valid_status_codes: []
# http_post_2xx:
# prober: http
# timeout: 5s
# http:
# method: POST
# basic_auth:
# username: "username"
# password: "mysecret"
# tcp_connect:
# prober: tcp
# timeout: 5s
# pop3s_banner:
# prober: tcp
# tcp:
# query_response:
# - expect: "^+OK"
# tls: true
# tls_config:
# insecure_skip_verify: false
# ssh_banner:
# prober: tcp
# timeout: 5s
# tcp:
# query_response:
# - expect: "^SSH-2.0-"
# irc_banner:
# prober: tcp
# timeout: 5s
# tcp:
# query_response:
# - send: "NICK prober"
# - send: "USER prober prober prober :prober"
# - expect: "PING :([^ ]+)"
# send: "PONG ${1}"
# - expect: "^:[^ ]+ 001"
# icmp_test:
# prober: icmp
# timeout: 5s
# icmp:
# preferred_ip_protocol: ip4
# dns_test:
# prober: dns
# timeout: 5s
# dns:
# preferred_ip_protocol: ip6
# validate_answer_rrs:
# fail_if_matches_regexp: [test]

View file

@ -0,0 +1,13 @@
---
- name: restart blackbox exporter
  become: true
  systemd:
    daemon_reload: true
    name: blackbox_exporter
    state: restarted

- name: reload blackbox exporter
  become: true
  systemd:
    name: blackbox_exporter
    state: reloaded

View file

@ -0,0 +1,33 @@
---
galaxy_info:
author: Prometheus Community
description: Prometheus Blackbox Exporter
license: Apache
company: none
min_ansible_version: "2.7"
platforms:
- name: Ubuntu
versions:
- bionic
- xenial
- name: Debian
versions:
- stretch
- buster
- name: EL
versions:
- 7
- 8
- name: Fedora
versions:
- 30
- 31
galaxy_tags:
- exporter
- monitoring
- prometheus
- metrics
- blackbox
- probe
dependencies: []

View file

@ -0,0 +1,70 @@
---
dependency:
name: galaxy
driver:
name: docker
platforms:
- name: bionic
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:ubuntu-18.04
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: xenial
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:ubuntu-16.04
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: stretch
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:debian-9
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: buster
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:debian-10
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: centos7
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:centos-7
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: centos8
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:centos-8
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
groups:
- python3
- name: fedora
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:fedora-30
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
groups:
- python3
provisioner:
name: ansible
playbooks:
prepare: prepare.yml
converge: playbook.yml
inventory:
group_vars:
python3:
ansible_python_interpreter: /usr/bin/python3
verifier:
name: testinfra

View file

@ -0,0 +1,14 @@
---
- name: Run role
  hosts: all
  any_errors_fatal: true
  roles:
    - ansible-blackbox-exporter
  vars:
    blackbox_exporter_web_listen_address: "127.0.0.1:9000"
    blackbox_exporter_cli_flags:
      log.level: "warn"
    blackbox_exporter_configuration_modules:
      tcp_connect:
        prober: tcp
        timeout: 5s

View file

@ -0,0 +1,5 @@
---
- name: Prepare
  hosts: all
  gather_facts: false
  tasks: []

View file

@ -0,0 +1,28 @@
import pytest
import os
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


@pytest.mark.parametrize("files", [
    "/etc/blackbox_exporter.yml",
    "/etc/systemd/system/blackbox_exporter.service",
    "/usr/local/bin/blackbox_exporter"
])
def test_files(host, files):
    f = host.file(files)
    assert f.exists
    assert f.is_file


def test_service(host):
    s = host.service("blackbox_exporter")
    assert s.is_running
    # assert s.is_enabled


def test_socket(host):
    s = host.socket("tcp://127.0.0.1:9000")
    assert s.is_listening

View file

@ -0,0 +1,70 @@
---
dependency:
name: galaxy
driver:
name: docker
platforms:
- name: bionic
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:ubuntu-18.04
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: xenial
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:ubuntu-16.04
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: stretch
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:debian-9
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: buster
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:debian-10
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: centos7
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:centos-7
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
- name: centos8
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:centos-8
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
groups:
- python3
- name: fedora
pre_build_image: true
image: quay.io/paulfantom/molecule-systemd:fedora-30
docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
privileged: true
volumes:
- /sys/fs/cgroup:/sys/fs/cgroup:ro
groups:
- python3
provisioner:
name: ansible
playbooks:
prepare: prepare.yml
converge: playbook.yml
inventory:
group_vars:
python3:
ansible_python_interpreter: /usr/bin/python3
verifier:
name: testinfra

View file

@ -0,0 +1,5 @@
---
- hosts: all
  any_errors_fatal: true
  roles:
    - ansible-blackbox-exporter

View file

@ -0,0 +1,5 @@
---
- name: Prepare
  hosts: all
  gather_facts: false
  tasks: []

View file

@ -0,0 +1,28 @@
import pytest
import os
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


@pytest.mark.parametrize("files", [
    "/etc/blackbox_exporter.yml",
    "/etc/systemd/system/blackbox_exporter.service",
    "/usr/local/bin/blackbox_exporter"
])
def test_files(host, files):
    f = host.file(files)
    assert f.exists
    assert f.is_file


def test_service(host):
    s = host.service("blackbox_exporter")
    assert s.is_running
    # assert s.is_enabled


def test_socket(host):
    s = host.socket("tcp://0.0.0.0:9115")
    assert s.is_listening

View file

@ -0,0 +1,20 @@
---
- name: create systemd service unit
  template:
    src: blackbox_exporter.service.j2
    dest: /etc/systemd/system/blackbox_exporter.service
    owner: root
    group: root
    mode: 0644
  notify:
    - restart blackbox exporter

- name: configure blackbox exporter
  template:
    src: blackbox_exporter.yml.j2
    dest: /etc/blackbox_exporter.yml
    owner: blackbox-exp
    group: blackbox-exp
    mode: 0644
  notify:
    - reload blackbox exporter

View file

@ -0,0 +1,60 @@
---
- name: create blackbox_exporter system group
group:
name: blackbox-exp
system: true
state: present
- name: create blackbox_exporter system user
user:
name: blackbox-exp
system: true
shell: "/usr/sbin/nologin"
group: blackbox-exp
createhome: false
- name: download blackbox exporter binary to local folder
become: false
unarchive:
src: "https://github.com/prometheus/blackbox_exporter/releases/download/v{{ blackbox_exporter_version }}/blackbox_exporter-{{ blackbox_exporter_version }}.linux-{{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}.tar.gz"
dest: "/tmp"
remote_src: true
creates: "/tmp/blackbox_exporter-{{ blackbox_exporter_version }}.linux-{{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}/blackbox_exporter"
register: _download_binary
until: _download_binary is succeeded
retries: 5
delay: 2
delegate_to: localhost
check_mode: false
- name: propagate blackbox exporter binary
copy:
src: "/tmp/blackbox_exporter-{{ blackbox_exporter_version }}.linux-{{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}/blackbox_exporter"
dest: "/usr/local/bin/blackbox_exporter"
mode: 0750
owner: blackbox-exp
group: blackbox-exp
notify:
- restart blackbox exporter
- name: Install libcap on Debian systems
package:
name: "libcap2-bin"
state: present
register: _download_packages
until: _download_packages is succeeded
retries: 5
delay: 2
when: ansible_os_family | lower == "debian"
- name: Ensure blackbox exporter binary has cap_net_raw capability
capabilities:
path: '/usr/local/bin/blackbox_exporter'
capability: cap_net_raw+ep
state: present
when: not ansible_check_mode
- name: Check Debug Message
debug:
msg: "The capabilities module is skipped during check mode, as the file may not exist, causing execution to fail."
when: ansible_check_mode

View file

@ -0,0 +1,26 @@
---
# Static imports; the bare `include` action is deprecated.
- import_tasks: preflight.yml
  tags:
    - blackbox_exporter_install
    - blackbox_exporter_configure
    - blackbox_exporter_run

- import_tasks: install.yml
  become: true
  tags:
    - blackbox_exporter_install

- import_tasks: configure.yml
  become: true
  tags:
    - blackbox_exporter_configure

- name: ensure blackbox_exporter service is started and enabled
  become: true
  systemd:
    daemon_reload: true
    name: blackbox_exporter
    state: started
    enabled: true
  tags:
    - blackbox_exporter_run

View file

@ -0,0 +1,22 @@
---
- name: Assert usage of systemd as an init system
  assert:
    that: ansible_service_mgr == 'systemd'
    msg: "This role only works with systemd"

- name: Get systemd version
  command: systemctl --version
  changed_when: false
  check_mode: false
  register: __systemd_version
  tags:
    - skip_ansible_lint

- name: Set systemd version fact
  set_fact:
    blackbox_exporter_systemd_version: "{{ __systemd_version.stdout_lines[0] | regex_replace('^systemd\\s(\\d+).*$', '\\1') }}"

- name: Naive assertion of proper listen address
  assert:
    that:
      - "':' in blackbox_exporter_web_listen_address"

View file

@ -0,0 +1,45 @@
{{ ansible_managed | comment }}
[Unit]
Description=Blackbox Exporter
After=network-online.target
StartLimitInterval=0
StartLimitIntervalSec=0
[Service]
Type=simple
User=blackbox-exp
Group=blackbox-exp
PermissionsStartOnly=true
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/blackbox_exporter \
--config.file=/etc/blackbox_exporter.yml \
{% for flag, flag_value in blackbox_exporter_cli_flags.items() -%}
--{{ flag }}={{ flag_value }} \
{% endfor -%}
--web.listen-address={{ blackbox_exporter_web_listen_address }}
SyslogIdentifier=blackbox_exporter
KillMode=process
Restart=always
RestartSec=5
LockPersonality=true
NoNewPrivileges=true
MemoryDenyWriteExecute=true
PrivateTmp=true
ProtectHome=true
RemoveIPC=true
RestrictSUIDSGID=true
AmbientCapabilities=CAP_NET_RAW
{% if blackbox_exporter_systemd_version | int >= 232 %}
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=yes
ProtectSystem=strict
{% else %}
ProtectSystem=full
{% endif %}
[Install]
WantedBy=multi-user.target

View file

@ -0,0 +1,2 @@
modules:
{{ blackbox_exporter_configuration_modules | to_nice_yaml(indent=2) | indent(2,False) }}

View file

@ -0,0 +1,7 @@
---
go_arch_map:
  i386: '386'
  x86_64: 'amd64'
  aarch64: 'arm64'
  armv7l: 'armv7'
  armv6l: 'armv6'

View file


@ -0,0 +1,99 @@
<p><img src="https://www.circonus.com/wp-content/uploads/2015/03/sol-icon-itOps.png" alt="graph logo" title="graph" align="right" height="60" /></p>
# Ansible Role: node exporter
## Warning
Due to limitations of galaxy.ansible.com we had to move the role to https://galaxy.ansible.com/cloudalchemy/node_exporter and use `_` instead of `-` in role name. This is a breaking change and unfortunately, it affects all versions of node_exporter role as ansible galaxy doesn't offer any form of redirection. We are sorry for the inconvenience.
## Description
Deploy prometheus [node exporter](https://github.com/prometheus/node_exporter) using ansible.
## Requirements
- Ansible >= 2.7 (It might work on previous versions, but we cannot guarantee it)
- gnu-tar on Mac deployer host (`brew install gnu-tar`)
- Passlib is required when using the basic authentication feature (`pip install passlib[bcrypt]`)
## Role Variables
All variables which can be overridden are stored in [defaults/main.yml](defaults/main.yml) and are listed in the table below.
| Name | Default Value | Description |
| -------------- | ------------- | -----------------------------------|
| `node_exporter_version` | 1.1.2 | Node exporter package version. Also accepts latest as parameter. |
| `node_exporter_binary_local_dir` | "" | Enables the use of local packages instead of those distributed on github. The parameter may be set to a directory where the `node_exporter` binary is stored on the host where ansible is run. This overrides the `node_exporter_version` parameter |
| `node_exporter_web_listen_address` | "0.0.0.0:9100" | Address on which node exporter will listen |
| `node_exporter_web_telemetry_path` | "/metrics" | Path under which to expose metrics |
| `node_exporter_enabled_collectors` | ```["systemd",{textfile: {directory: "{{node_exporter_textfile_dir}}"}}]``` | List of dicts defining additionally enabled collectors and their configuration. It adds collectors to [those enabled by default](https://github.com/prometheus/node_exporter#enabled-by-default). |
| `node_exporter_disabled_collectors` | [] | List of disabled collectors. By default node_exporter disables collectors listed [here](https://github.com/prometheus/node_exporter#disabled-by-default). |
| `node_exporter_textfile_dir` | "/var/lib/node_exporter" | Directory used by the [Textfile Collector](https://github.com/prometheus/node_exporter#textfile-collector). To get permissions to write metrics in this directory, users must be in `node-exp` system group. __Note__: More information in TROUBLESHOOTING.md guide. |
| `node_exporter_tls_server_config` | {} | Configuration for TLS authentication. Keys and values are the same as in [node_exporter docs](https://github.com/prometheus/node_exporter/blob/master/https/README.md#sample-config). |
| `node_exporter_http_server_config` | {} | Config for HTTP/2 support. Keys and values are the same as in [node_exporter docs](https://github.com/prometheus/node_exporter/blob/master/https/README.md#sample-config). |
| `node_exporter_basic_auth_users` | {} | Dictionary of users and password for basic authentication. Passwords are automatically hashed with bcrypt. |
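
The list-of-dicts shape of `node_exporter_enabled_collectors` can be easier to follow with a small example. The Python sketch below shows one plausible mapping from such a list onto node_exporter's `--collector.<name>`, `--collector.<name>.<option>=<value>` and `--no-collector.<name>` flags. The helper `collector_flags` is hypothetical and is not the role's actual template logic; treat it as an illustration under that assumption.

```python
def collector_flags(enabled, disabled):
    """Translate collector settings into node_exporter command-line flags."""
    flags = []
    for collector in enabled:
        if isinstance(collector, dict):
            # A dict entry names a collector and carries its options.
            for name, options in collector.items():
                flags.append(f"--collector.{name}")
                for option, value in (options or {}).items():
                    flags.append(f"--collector.{name}.{option}={value}")
        else:
            # A plain string simply enables the collector.
            flags.append(f"--collector.{collector}")
    # Disabled collectors use the --no-collector.<name> form.
    flags.extend(f"--no-collector.{name}" for name in disabled)
    return flags


if __name__ == "__main__":
    enabled = ["systemd", {"textfile": {"directory": "/var/lib/node_exporter"}}]
    for flag in collector_flags(enabled, ["diskstats"]):
        print(flag)
```
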
## Example
### Playbook
Use it in a playbook as follows:
```yaml
- hosts: all
  roles:
    - cloudalchemy.node_exporter
```
### TLS config
Before running node_exporter role, the user needs to provision their own certificate and key.
```yaml
- hosts: all
  pre_tasks:
    - name: Create node_exporter cert dir
      file:
        path: "/etc/node_exporter"
        state: directory
        owner: root
        group: root

    - name: Create cert and key
      openssl_certificate:
        path: /etc/node_exporter/tls.cert
        csr_path: /etc/node_exporter/tls.csr
        privatekey_path: /etc/node_exporter/tls.key
        provider: selfsigned
  roles:
    - cloudalchemy.node_exporter
  vars:
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/tls.cert
      key_file: /etc/node_exporter/tls.key
    node_exporter_basic_auth_users:
      randomuser: examplepassword
```
### Demo site
We provide an example site that demonstrates a full monitoring solution based on prometheus and grafana. The repository with code and links to running instances is [available on github](https://github.com/prometheus/demo-site) and the site is hosted on [DigitalOcean](https://digitalocean.com).
## Local Testing
The preferred way of locally testing the role is to use Docker and [molecule](https://github.com/ansible-community/molecule) (v3.x). You will have to install Docker on your system. See "Get started" for a Docker package suitable for your system. Running your tests is as simple as executing `molecule test`.
## Continuous Integration
Combining molecule and circle CI allows us to test how new PRs will behave when used with multiple ansible versions and multiple operating systems. This also allows us to create test scenarios for different role configurations. As a result we have quite a large test matrix which can take more time than local testing, so please be patient.
## Contributing
See [contributor guideline](CONTRIBUTING.md).
## Troubleshooting
See [troubleshooting](TROUBLESHOOTING.md).
## License
This project is licensed under MIT License. See [LICENSE](/LICENSE) for more details.

View file

@ -0,0 +1,43 @@
# Troubleshooting
## Bad requests (HTTP 400)
This role downloads checksums from the GitHub project to verify the integrity of artifacts installed on your servers. When downloading the checksums, a "bad request" error might occur.
This happens in environments which (knowingly or unknowingly) use the [netrc mechanism](https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html) to auto-login into servers.
Unless netrc is needed by your playbook and ansible roles, please unset the var like so:
```
$ NETRC= ansible-playbook ...
```
Or:
```
$ export NETRC=
$ ansible-playbook ...
```
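
To check whether a netrc file is likely in play on the control host before running the playbook, a small script along these lines may help. It is only a sketch: it assumes the common convention that the `NETRC` environment variable, when set, takes precedence over `~/.netrc`, and the helper name `netrc_candidate` is made up for this example.

```python
import os
from pathlib import Path


def netrc_candidate(environ=None, home=None):
    """Return the netrc file that would likely be consulted, or None."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    if "NETRC" in environ:
        # An explicitly set NETRC wins; an empty value points at no file.
        path = Path(environ["NETRC"]) if environ["NETRC"] else None
    else:
        # Assumed fallback: the per-user ~/.netrc file.
        path = home / ".netrc"
    return path if path is not None and path.is_file() else None


if __name__ == "__main__":
    found = netrc_candidate()
    print(f"netrc in effect: {found}" if found else "no netrc file in effect")
```

If the script reports a file and the playbook does not need it, running with `NETRC=` as shown above should make the lookup skip it.
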
## node_exporter doesn't report data from textfile collector
There are three potential reasons why node_exporter doesn't pick up data:
1. Duplicated metrics across multiple files.
2. File is not readable by node_exporter process.
3. Textfile collector is not enabled.
Fixing duplicated metrics is outside the scope of this role, because the data is created elsewhere. When creating that
data, ensure the files are readable by the `node-exp` user. To access the directory containing the files, your process
needs to be in the `node-exp` group.
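
The first two causes can be checked with a short script. The sketch below is an illustration only: it treats a series (metric name plus labels) that appears in more than one `*.prom` file as a duplicate, which is a simplification of what node_exporter actually validates, and the function name `check_textfile_dir` is hypothetical.

```python
import os
from collections import defaultdict
from pathlib import Path


def check_textfile_dir(directory):
    """Report unreadable *.prom files and series defined in several files."""
    seen = defaultdict(set)  # series (name plus labels) -> files defining it
    unreadable = []
    for path in sorted(Path(directory).glob("*.prom")):
        if not os.access(path, os.R_OK):
            unreadable.append(path.name)
            continue
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and HELP/TYPE comments
            # Everything before the last space is the series, the rest the value.
            series = line.rsplit(" ", 1)[0]
            seen[series].add(path.name)
    duplicates = {s: sorted(f) for s, f in seen.items() if len(f) > 1}
    return unreadable, duplicates


if __name__ == "__main__":
    unreadable, duplicates = check_textfile_dir("/var/lib/node_exporter")
    print("unreadable files:", unreadable)
    print("series present in several files:", duplicates)
```

Run it as the `node-exp` user so that the readability check reflects what the exporter process can actually see.
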
Lastly, a misconfigured role can also lead to data not being picked up. Check that the `node_exporter` textfile
collector is enabled in `node_exporter_enabled_collectors` as follows:
```yaml
node_exporter_enabled_collectors:
  - textfile:
      directory: "{{ node_exporter_textfile_dir }}"
```
__Note__: the `node_exporter_textfile_dir` variable only creates the directory; it does not enable the collector.

View file

@ -0,0 +1,28 @@
---
node_exporter_version: 1.1.2
node_exporter_binary_local_dir: ""
node_exporter_web_listen_address: "0.0.0.0:9100"
node_exporter_web_telemetry_path: "/metrics"

node_exporter_textfile_dir: "/var/lib/node_exporter"

node_exporter_tls_server_config: {}

node_exporter_http_server_config: {}

node_exporter_basic_auth_users: {}

node_exporter_enabled_collectors:
  - systemd
  - textfile:
      directory: "{{ node_exporter_textfile_dir }}"
#  - filesystem:
#      ignored-mount-points: "^/(sys|proc|dev)($|/)"
#      ignored-fs-types: "^(sys|proc|auto)fs$"

node_exporter_disabled_collectors: []

# Internal variables.
_node_exporter_binary_install_dir: "/usr/local/bin"
_node_exporter_system_group: "node-exp"
_node_exporter_system_user: "{{ _node_exporter_system_group }}"

View file

@ -0,0 +1,9 @@
---
- name: restart node_exporter
  become: true
  systemd:
    daemon_reload: true
    name: node_exporter
    state: restarted
  when:
    - not ansible_check_mode

View file

@ -0,0 +1,32 @@
---
galaxy_info:
author: Prometheus Community
description: Prometheus Node Exporter
license: Apache
company: none
min_ansible_version: "2.7"
platforms:
- name: Ubuntu
versions:
- bionic
- xenial
- name: Debian
versions:
- stretch
- buster
- name: EL
versions:
- 7
- 8
- name: Fedora
versions:
- 30
- 31
galaxy_tags:
- monitoring
- prometheus
- exporter
- metrics
- system
dependencies: []

View file

@ -0,0 +1,70 @@
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: bionic
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:ubuntu-18.04
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: xenial
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:ubuntu-16.04
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: stretch
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:debian-9
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: buster
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:debian-10
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: centos7
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:centos-7
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: centos8
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:centos-8
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    groups:
      - python3
  - name: fedora
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:fedora-30
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    groups:
      - python3
provisioner:
  name: ansible
  playbooks:
    prepare: prepare.yml
    converge: playbook.yml
  inventory:
    group_vars:
      python3:
        ansible_python_interpreter: /usr/bin/python3
verifier:
  name: testinfra


@ -0,0 +1,38 @@
---
- name: Run role
  hosts: all
  any_errors_fatal: true
  roles:
    - cloudalchemy.node_exporter
  pre_tasks:
    - name: Create node_exporter cert dir
      file:
        path: "{{ node_exporter_tls_server_config.cert_file | dirname }}"
        state: directory
        owner: root
        group: root
    - name: Copy cert and key
      copy:
        src: "{{ item.src }}"
        dest: "{{ item.dest }}"
      with_items:
        - src: "/tmp/tls.cert"
          dest: "{{ node_exporter_tls_server_config.cert_file }}"
        - src: "/tmp/tls.key"
          dest: "{{ node_exporter_tls_server_config.key_file }}"
  vars:
    node_exporter_binary_local_dir: "/tmp/node_exporter-linux-amd64"
    node_exporter_web_listen_address: "127.0.0.1:8080"
    node_exporter_textfile_dir: ""
    node_exporter_enabled_collectors:
      - entropy
    node_exporter_disabled_collectors:
      - diskstats
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/tls.cert
      key_file: /etc/node_exporter/tls.key
    node_exporter_http_server_config:
      http2: true
    node_exporter_basic_auth_users:
      randomuser: examplepassword


@ -0,0 +1,57 @@
---
- name: Prepare
  hosts: localhost
  gather_facts: false
  vars:
    go_arch: amd64
    node_exporter_version: 1.0.0
  tasks:
    - name: Download node_exporter binary to local folder
      become: false
      get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}.tar.gz"
        dest: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}.tar.gz"
      register: _download_binary
      until: _download_binary is succeeded
      retries: 5
      delay: 2
      run_once: true
      check_mode: false

    - name: Unpack node_exporter binary
      become: false
      unarchive:
        src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}.tar.gz"
        dest: "/tmp"
        creates: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}/node_exporter"
      run_once: true
      check_mode: false

    - name: link to node_exporter binaries directory
      become: false
      file:
        src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64"
        dest: "/tmp/node_exporter-linux-amd64"
        state: link
      run_once: true
      check_mode: false

    - name: install pyOpenSSL for certificate generation
      pip:
        name: "pyOpenSSL"

    - name: Create private key
      openssl_privatekey:
        path: "/tmp/tls.key"

    - name: Create CSR
      openssl_csr:
        path: "/tmp/tls.csr"
        privatekey_path: "/tmp/tls.key"

    - name: Create certificate
      openssl_certificate:
        path: "/tmp/tls.cert"
        csr_path: "/tmp/tls.csr"
        privatekey_path: "/tmp/tls.key"
        provider: selfsigned


@ -0,0 +1,29 @@
import os
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


def test_directories(host):
    dirs = [
        "/var/lib/node_exporter"
    ]
    for dir in dirs:
        d = host.file(dir)
        assert not d.exists


def test_service(host):
    s = host.service("node_exporter")
    # assert s.is_enabled
    assert s.is_running


def test_socket(host):
    sockets = [
        "tcp://127.0.0.1:8080"
    ]
    for socket in sockets:
        s = host.socket(socket)
        assert s.is_listening


@ -0,0 +1,70 @@
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: bionic
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:ubuntu-18.04
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: xenial
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:ubuntu-16.04
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: stretch
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:debian-9
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: buster
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:debian-10
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: centos7
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:centos-7
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: centos8
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:centos-8
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    groups:
      - python3
  - name: fedora
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:fedora-30
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    groups:
      - python3
provisioner:
  name: ansible
  playbooks:
    prepare: prepare.yml
    converge: playbook.yml
  inventory:
    group_vars:
      python3:
        ansible_python_interpreter: /usr/bin/python3
verifier:
  name: testinfra


@ -0,0 +1,7 @@
---
- hosts: all
  any_errors_fatal: true
  roles:
    - cloudalchemy.node_exporter
  vars:
    node_exporter_web_listen_address: "127.0.0.1:9100"


@ -0,0 +1,5 @@
---
- name: Prepare
  hosts: all
  gather_facts: false
  tasks: []


@ -0,0 +1,63 @@
import os
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


def test_directories(host):
    dirs = [
        "/var/lib/node_exporter"
    ]
    for dir in dirs:
        d = host.file(dir)
        assert d.is_directory
        assert d.exists


def test_files(host):
    files = [
        "/etc/systemd/system/node_exporter.service",
        "/usr/local/bin/node_exporter"
    ]
    for file in files:
        f = host.file(file)
        assert f.exists
        assert f.is_file


def test_permissions_didnt_change(host):
    dirs = [
        "/etc",
        "/root",
        "/usr",
        "/var"
    ]
    for file in dirs:
        f = host.file(file)
        assert f.exists
        assert f.is_directory
        assert f.user == "root"
        assert f.group == "root"


def test_user(host):
    assert host.group("node-exp").exists
    assert "node-exp" in host.user("node-exp").groups
    assert host.user("node-exp").shell == "/usr/sbin/nologin"
    assert host.user("node-exp").home == "/"


def test_service(host):
    s = host.service("node_exporter")
    # assert s.is_enabled
    assert s.is_running


def test_socket(host):
    sockets = [
        "tcp://127.0.0.1:9100"
    ]
    for socket in sockets:
        s = host.socket(socket)
        assert s.is_listening


@ -0,0 +1,35 @@
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: buster
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:debian-10
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: fedora
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:fedora-30
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    groups:
      - python3
provisioner:
  name: ansible
  playbooks:
    create: ../default/create.yml
    prepare: ../default/prepare.yml
    converge: playbook.yml
    destroy: ../default/destroy.yml
  inventory:
    group_vars:
      python3:
        ansible_python_interpreter: /usr/bin/python3
verifier:
  name: testinfra


@ -0,0 +1,8 @@
---
- name: Run role
  hosts: all
  any_errors_fatal: true
  roles:
    - cloudalchemy.node_exporter
  vars:
    node_exporter_version: latest


@ -0,0 +1,27 @@
import pytest
import os
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


@pytest.mark.parametrize("files", [
    "/etc/systemd/system/node_exporter.service",
    "/usr/local/bin/node_exporter"
])
def test_files(host, files):
    f = host.file(files)
    assert f.exists
    assert f.is_file


def test_service(host):
    s = host.service("node_exporter")
    # assert s.is_enabled
    assert s.is_running


def test_socket(host):
    s = host.socket("tcp://0.0.0.0:9100")
    assert s.is_listening


@ -0,0 +1,51 @@
---
- name: Copy the node_exporter systemd service file
  template:
    src: node_exporter.service.j2
    dest: /etc/systemd/system/node_exporter.service
    owner: root
    group: root
    mode: 0644
  notify: restart node_exporter

- block:
    - name: Create node_exporter config directory
      file:
        path: "/etc/node_exporter"
        state: directory
        owner: root
        group: root
        mode: u+rwX,g+rwX,o=rX

    - name: Copy the node_exporter config file
      template:
        src: config.yaml.j2
        dest: /etc/node_exporter/config.yaml
        owner: root
        group: root
        mode: 0644
      notify: restart node_exporter
  when:
    ( node_exporter_tls_server_config | length > 0 ) or
    ( node_exporter_http_server_config | length > 0 ) or
    ( node_exporter_basic_auth_users | length > 0 )

- name: Create textfile collector dir
  file:
    path: "{{ node_exporter_textfile_dir }}"
    state: directory
    owner: "{{ _node_exporter_system_user }}"
    group: "{{ _node_exporter_system_group }}"
    recurse: true
    mode: u+rwX,g+rwX,o=rX
  when: node_exporter_textfile_dir | length > 0

- name: Allow node_exporter port in SELinux on RedHat OS family
  seport:
    ports: "{{ node_exporter_web_listen_address.split(':')[-1] }}"
    proto: tcp
    setype: http_port_t
    state: present
  when:
    - ansible_version.full is version_compare('2.4', '>=')
    - ansible_selinux.status == "enabled"


@ -0,0 +1,63 @@
---
- name: Create the node_exporter group
  group:
    name: "{{ _node_exporter_system_group }}"
    state: present
    system: true
  when: _node_exporter_system_group != "root"

- name: Create the node_exporter user
  user:
    name: "{{ _node_exporter_system_user }}"
    groups: "{{ _node_exporter_system_group }}"
    append: true
    shell: /usr/sbin/nologin
    system: true
    create_home: false
    home: /
  when: _node_exporter_system_user != "root"

- block:
    - name: Download node_exporter binary to local folder
      become: false
      get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}.tar.gz"
        dest: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}.tar.gz"
        checksum: "sha256:{{ node_exporter_checksum }}"
        mode: '0644'
      register: _download_binary
      until: _download_binary is succeeded
      retries: 5
      delay: 2
      delegate_to: localhost
      check_mode: false

    - name: Unpack node_exporter binary
      become: false
      unarchive:
        src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}.tar.gz"
        dest: "/tmp"
        creates: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}/node_exporter"
      delegate_to: localhost
      check_mode: false

    - name: Propagate node_exporter binaries
      copy:
        src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}/node_exporter"
        dest: "{{ _node_exporter_binary_install_dir }}/node_exporter"
        mode: 0755
        owner: root
        group: root
      notify: restart node_exporter
      when: not ansible_check_mode
  when: node_exporter_binary_local_dir | length == 0

- name: propagate locally distributed node_exporter binary
  copy:
    src: "{{ node_exporter_binary_local_dir }}/node_exporter"
    dest: "{{ _node_exporter_binary_install_dir }}/node_exporter"
    mode: 0755
    owner: root
    group: root
  when: node_exporter_binary_local_dir | length > 0
  notify: restart node_exporter


@ -0,0 +1,39 @@
---
- import_tasks: preflight.yml
  tags:
    - node_exporter_install
    - node_exporter_configure
    - node_exporter_run

- import_tasks: install.yml
  become: true
  when:
    ( not __node_exporter_is_installed.stat.exists ) or
    ( (__node_exporter_current_version_output.stderr_lines | length > 0) and (__node_exporter_current_version_output.stderr_lines[0].split(" ")[2] != node_exporter_version) ) or
    ( (__node_exporter_current_version_output.stdout_lines | length > 0) and (__node_exporter_current_version_output.stdout_lines[0].split(" ")[2] != node_exporter_version) ) or
    ( node_exporter_binary_local_dir | length > 0 )
  tags:
    - node_exporter_install

- import_tasks: selinux.yml
  become: true
  when: ansible_selinux.status == "enabled"
  tags:
    - node_exporter_configure

- import_tasks: configure.yml
  become: true
  tags:
    - node_exporter_configure

- name: Ensure Node Exporter is enabled on boot
  become: true
  systemd:
    daemon_reload: true
    name: node_exporter
    enabled: true
    state: started
  when:
    - not ansible_check_mode
  tags:
    - node_exporter_run


@ -0,0 +1,111 @@
---
- name: Assert usage of systemd as an init system
  assert:
    that: ansible_service_mgr == 'systemd'
    msg: "This role only works with systemd"

- name: Get systemd version
  command: systemctl --version
  changed_when: false
  check_mode: false
  register: __systemd_version
  tags:
    - skip_ansible_lint

- name: Set systemd version fact
  set_fact:
    node_exporter_systemd_version: "{{ __systemd_version.stdout_lines[0] | regex_replace('^systemd\\s(\\d+).*$', '\\1') }}"

- name: Naive assertion of proper listen address
  assert:
    that:
      - "':' in node_exporter_web_listen_address"

- name: Assert collectors are not both disabled and enabled at the same time
  assert:
    that:
      - "item not in node_exporter_enabled_collectors"
  with_items: "{{ node_exporter_disabled_collectors }}"

- block:
    - name: Assert that TLS key and cert path are set
      assert:
        that:
          - "node_exporter_tls_server_config.cert_file is defined"
          - "node_exporter_tls_server_config.key_file is defined"

    - name: Check existence of TLS cert file
      stat:
        path: "{{ node_exporter_tls_server_config.cert_file }}"
      register: __node_exporter_cert_file

    - name: Check existence of TLS key file
      stat:
        path: "{{ node_exporter_tls_server_config.key_file }}"
      register: __node_exporter_key_file

    - name: Assert that TLS key and cert are present
      assert:
        that:
          - "{{ __node_exporter_cert_file.stat.exists }}"
          - "{{ __node_exporter_key_file.stat.exists }}"
  when: node_exporter_tls_server_config | length > 0

- name: Check if node_exporter is installed
  stat:
    path: "{{ _node_exporter_binary_install_dir }}/node_exporter"
  register: __node_exporter_is_installed
  check_mode: false
  tags:
    - node_exporter_install

- name: Gather currently installed node_exporter version (if any)
  command: "{{ _node_exporter_binary_install_dir }}/node_exporter --version"
  args:
    warn: false
  changed_when: false
  register: __node_exporter_current_version_output
  check_mode: false
  when: __node_exporter_is_installed.stat.exists
  tags:
    - node_exporter_install
    - skip_ansible_lint

- block:
    - name: Get latest release
      uri:
        url: "https://api.github.com/repos/prometheus/node_exporter/releases/latest"
        method: GET
        return_content: true
        status_code: 200
        body_format: json
        user: "{{ lookup('env', 'GH_USER') | default(omit) }}"
        password: "{{ lookup('env', 'GH_TOKEN') | default(omit) }}"
      no_log: "{{ not lookup('env', 'MOLECULE_DEBUG') | bool }}"
      register: _latest_release
      until: _latest_release.status == 200
      retries: 5

    - name: "Set node_exporter version to {{ _latest_release.json.tag_name[1:] }}"
      set_fact:
        node_exporter_version: "{{ _latest_release.json.tag_name[1:] }}"
  when:
    - node_exporter_version == "latest"
    - node_exporter_binary_local_dir | length == 0
  delegate_to: localhost
  run_once: true

- block:
    - name: Get checksum list from github
      set_fact:
        _checksums: "{{ lookup('url', 'https://github.com/prometheus/node_exporter/releases/download/v' + node_exporter_version + '/sha256sums.txt', wantlist=True) | list }}"
      run_once: true

    - name: "Get checksum for {{ go_arch }} architecture"
      set_fact:
        node_exporter_checksum: "{{ item.split(' ')[0] }}"
      with_items: "{{ _checksums }}"
      when:
        - "('linux-' + go_arch + '.tar.gz') in item"
  delegate_to: localhost
  when: node_exporter_binary_local_dir | length == 0


@ -0,0 +1,39 @@
---
- name: Install selinux python packages [RHEL]
  package:
    name:
      - "{{ ( (ansible_facts.distribution_major_version | int) < 8) | ternary('libselinux-python','python3-libselinux') }}"
      - "{{ ( (ansible_facts.distribution_major_version | int) < 8) | ternary('policycoreutils-python','python3-policycoreutils') }}"
    state: present
  register: _install_selinux_packages
  until: _install_selinux_packages is success
  retries: 5
  delay: 2
  when:
    - (ansible_distribution | lower == "redhat") or
      (ansible_distribution | lower == "centos")

- name: Install selinux python packages [Fedora]
  package:
    name:
      - "{{ ( (ansible_facts.distribution_major_version | int) < 29) | ternary('libselinux-python','python3-libselinux') }}"
      - "{{ ( (ansible_facts.distribution_major_version | int) < 29) | ternary('policycoreutils-python','python3-policycoreutils') }}"
    state: present
  register: _install_selinux_packages
  until: _install_selinux_packages is success
  retries: 5
  delay: 2
  when:
    - ansible_distribution | lower == "fedora"

- name: Install selinux python packages [clearlinux]
  package:
    name: sysadmin-basic
    state: present
  register: _install_selinux_packages
  until: _install_selinux_packages is success
  retries: 5
  delay: 2
  when:
    - ansible_distribution | lower == "clearlinux"


@ -0,0 +1,18 @@
---
{{ ansible_managed | comment }}
{% if node_exporter_tls_server_config | length > 0 %}
tls_server_config:
{{ node_exporter_tls_server_config | to_nice_yaml | indent(2, true) }}
{% endif %}
{% if node_exporter_http_server_config | length > 0 %}
http_server_config:
{{ node_exporter_http_server_config | to_nice_yaml | indent(2, true) }}
{% endif %}
{% if node_exporter_basic_auth_users | length > 0 %}
basic_auth_users:
{% for k, v in node_exporter_basic_auth_users.items() %}
  {{ k }}: {{ v | password_hash('bcrypt', ('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' | shuffle(seed=inventory_hostname) | join)[:22], rounds=9) }}
{% endfor %}
{% endif %}


@ -0,0 +1,54 @@
{{ ansible_managed | comment }}
[Unit]
Description=Prometheus Node Exporter
After=network-online.target
[Service]
Type=simple
User={{ _node_exporter_system_user }}
Group={{ _node_exporter_system_group }}
ExecStart={{ _node_exporter_binary_install_dir }}/node_exporter \
{% for collector in node_exporter_enabled_collectors -%}
{% if not collector is mapping %}
--collector.{{ collector }} \
{% else -%}
{% set name, options = (collector.items()|list)[0] -%}
--collector.{{ name }} \
{% for k,v in options|dictsort %}
--collector.{{ name }}.{{ k }}={{ v | quote }} \
{% endfor -%}
{% endif -%}
{% endfor -%}
{% for collector in node_exporter_disabled_collectors %}
--no-collector.{{ collector }} \
{% endfor %}
{% if node_exporter_tls_server_config | length > 0 or node_exporter_http_server_config | length > 0 or node_exporter_basic_auth_users | length > 0 %}
--web.config=/etc/node_exporter/config.yaml \
{% endif %}
--web.listen-address={{ node_exporter_web_listen_address }} \
--web.telemetry-path={{ node_exporter_web_telemetry_path }}
SyslogIdentifier=node_exporter
Restart=always
RestartSec=1
StartLimitInterval=0
{% for m in ansible_mounts if m.mount == '/home' %}
ProtectHome=read-only
{% else %}
ProtectHome=yes
{% endfor %}
NoNewPrivileges=yes
{% if node_exporter_systemd_version | int >= 232 %}
ProtectSystem=strict
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=yes
{% else %}
ProtectSystem=full
{% endif %}
[Install]
WantedBy=multi-user.target


@ -0,0 +1,9 @@
---
go_arch_map:
  i386: '386'
  x86_64: 'amd64'
  aarch64: 'arm64'
  armv7l: 'armv7'
  armv6l: 'armv6'
go_arch: "{{ go_arch_map[ansible_architecture] | default(ansible_architecture) }}"

153
roles/prometheus/README.md Normal file

@ -0,0 +1,153 @@
<p><img src="https://cdn.worldvectorlogo.com/logos/prometheus.svg" alt="prometheus logo" title="prometheus" align="right" height="60" /></p>
# Ansible Role: prometheus
## Description
Deploy [Prometheus](https://github.com/prometheus/prometheus) monitoring system using ansible.
### Upgradability notice
When upgrading this role from version <= 2.4.0 to >= 2.4.1, please turn off your Prometheus instance. See the [2.4.1 release notes](https://github.com/cloudalchemy/ansible-prometheus/releases/tag/2.4.1) for details.
## Requirements
- Ansible >= 2.7 (It might work on previous versions, but we cannot guarantee it)
- jmespath on deployer machine. If you are using Ansible from a Python virtualenv, install *jmespath* to the same virtualenv via pip.
- gnu-tar on Mac deployer host (`brew install gnu-tar`)
## Role Variables
All variables which can be overridden are stored in the [defaults/main.yml](defaults/main.yml) file and listed in the table below.
| Name | Default Value | Description |
| -------------- | ------------- | -----------------------------------|
| `prometheus_version` | 2.27.0 | Prometheus package version. Also accepts `latest` as parameter. Only prometheus 2.x is supported |
| `prometheus_skip_install` | false | Prometheus installation tasks get skipped when set to true. |
| `prometheus_binary_local_dir` | "" | Allows using local packages instead of the ones distributed on GitHub. Takes a directory in which the `prometheus` AND `promtool` binaries are stored on the host on which Ansible is run. This overrides the `prometheus_version` parameter |
| `prometheus_config_dir` | /etc/prometheus | Path to directory with prometheus configuration |
| `prometheus_db_dir` | /var/lib/prometheus | Path to directory with prometheus database |
| `prometheus_read_only_dirs`| [] | Additional paths that Prometheus is allowed to read (useful for SSL certs outside of the config directory) |
| `prometheus_web_listen_address` | "0.0.0.0:9090" | Address on which prometheus will be listening |
| `prometheus_web_config` | {} | A Prometheus [web config yaml](https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md) for configuring TLS and auth. |
| `prometheus_web_external_url` | "" | External address on which prometheus is available. Useful when behind reverse proxy. Ex. `http://example.org/prometheus` |
| `prometheus_storage_retention` | "30d" | Data retention period |
| `prometheus_storage_retention_size` | "0" | Size-based data retention: maximum number of bytes that can be stored for blocks. Supported units: KB, MB, GB, TB, PB |
| `prometheus_config_flags_extra` | {} | Additional configuration flags passed to prometheus binary at startup |
| `prometheus_alertmanager_config` | [] | Configuration responsible for pointing where alertmanagers are. This should be specified as list in yaml format. It is compatible with official [<alertmanager_config>](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#alertmanager_config) |
| `prometheus_alert_relabel_configs` | [] | Alert relabeling rules. This should be specified as list in yaml format. It is compatible with the official [<alert_relabel_configs>](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#alert_relabel_configs) |
| `prometheus_global` | { scrape_interval: 15s, scrape_timeout: 10s, evaluation_interval: 15s } | Prometheus global config. Compatible with [official configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#configuration-file) |
| `prometheus_remote_write` | [] | Remote write. Compatible with [official configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) |
| `prometheus_remote_read` | [] | Remote read. Compatible with [official configuration](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_read) |
| `prometheus_external_labels` | environment: "{{ ansible_fqdn \| default(ansible_host) \| default(inventory_hostname) }}" | Provide map of additional labels which will be added to any time series or alerts when communicating with external systems |
| `prometheus_targets` | {} | Targets which will be scraped. Better example is provided in our [demo site](https://github.com/cloudalchemy/demo-site/blob/2a8a56fc10ce613d8b08dc8623230dace6704f9a/group_vars/all/vars#L8) |
| `prometheus_scrape_configs` | [defaults/main.yml#L58](https://github.com/cloudalchemy/ansible-prometheus/blob/ff7830d06ba57be1177f2b6fca33a4dd2d97dc20/defaults/main.yml#L47) | Prometheus scrape jobs provided in same format as in [official docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) |
| `prometheus_config_file` | "prometheus.yml.j2" | Variable used to provide custom prometheus configuration file in form of ansible template |
| `prometheus_alert_rules` | [defaults/main.yml#L81](https://github.com/cloudalchemy/ansible-prometheus/blob/73d6df05a775ee5b736ac8f28d5605f2a975d50a/defaults/main.yml#L85) | Full list of alerting rules which will be copied to `{{ prometheus_config_dir }}/rules/ansible_managed.rules`. Alerting rules can be also provided by other files located in `{{ prometheus_config_dir }}/rules/` which have `*.rules` extension |
| `prometheus_alert_rules_files` | [defaults/main.yml#L78](https://github.com/cloudalchemy/ansible-prometheus/blob/73d6df05a775ee5b736ac8f28d5605f2a975d50a/defaults/main.yml#L78) | List of folders where ansible will look for files containing alerting rules which will be copied to `{{ prometheus_config_dir }}/rules/`. Files must have `*.rules` extension |
| `prometheus_static_targets_files` | [defaults/main.yml#L78](https://github.com/cloudalchemy/ansible-prometheus/blob/73d6df05a775ee5b736ac8f28d5605f2a975d50a/defaults/main.yml#L81) | List of folders where ansible will look for files containing custom static target configuration files which will be copied to `{{ prometheus_config_dir }}/file_sd/`. |
### Relation between `prometheus_scrape_configs` and `prometheus_targets`
#### Short version
`prometheus_targets` is just a map used to create multiple files located in the `{{ prometheus_config_dir }}/file_sd` directory. The file names are composed from the top-level keys of that map with a `.yml` suffix. These files store [file_sd scrape targets data](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config) and they need to be read in `prometheus_scrape_configs`.
#### Long version
The part of the *prometheus.yml* configuration file which describes what is scraped by Prometheus is stored in `prometheus_scrape_configs`. This variable accepts the same configuration options as described in the [prometheus docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).
Meanwhile `prometheus_targets` is our way of adopting the [prometheus scrape type `file_sd`](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#file_sd_config). It defines a map of files with their content. The top-level keys are the base names of files, each of which needs its own scrape job in `prometheus_scrape_configs`, and the values are the content of those files.
All this means that you CAN use custom `prometheus_scrape_configs` with `prometheus_targets` set to `{}`. However, when you set anything in `prometheus_targets` it needs to be mapped to `prometheus_scrape_configs`. If it isn't, you'll get an error in preflight checks.
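The mapping requirement can be sketched in Python. This is only an illustration of the idea, not the role's actual preflight check: `unmapped_targets` is a hypothetical helper, it assumes the default `prometheus_config_dir` of `/etc/prometheus`, and it compares file paths literally (no glob matching).

```python
def unmapped_targets(prometheus_targets, scrape_configs, config_dir="/etc/prometheus"):
    """Return target base names whose file_sd file is not loaded by any scrape job."""
    loaded = {
        path
        for job in scrape_configs
        for sd in job.get("file_sd_configs", [])
        for path in sd.get("files", [])
    }
    return sorted(
        name
        for name in prometheus_targets
        if f"{config_dir}/file_sd/{name}.yml" not in loaded
    )


targets = {
    "node": [{"targets": ["localhost:9100"]}],
    "blackbox": [{"targets": ["localhost:9115"]}],  # hypothetical extra entry
}
scrape_configs = [
    {"job_name": "prometheus", "static_configs": [{"targets": ["localhost:9090"]}]},
    {"job_name": "node", "file_sd_configs": [{"files": ["/etc/prometheus/file_sd/node.yml"]}]},
]

missing = unmapped_targets(targets, scrape_configs)
print(missing)  # ['blackbox']
```

An empty `prometheus_targets` map always passes such a check, which matches the statement that custom scrape configs can be used with `prometheus_targets` set to `{}`.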
#### Example
Let's look at our default configuration, which shows all features. By default we have this `prometheus_targets`:
```
prometheus_targets:
  node:  # This is a base file name. File is located in "{{ prometheus_config_dir }}/file_sd/<<BASENAME>>.yml"
    - targets:              #
        - localhost:9100    # All this is a targets section in file_sd format
      labels:               #
        env: test           #
```
Such a config results in one file named `node.yml` being created in the `{{ prometheus_config_dir }}/file_sd` directory.
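As a minimal sketch of that naming rule, assuming the default `prometheus_config_dir` of `/etc/prometheus`: each top-level key of the map becomes one `<key>.yml` file under `file_sd`. `file_sd_paths` is a hypothetical helper used only for illustration.

```python
def file_sd_paths(prometheus_targets, config_dir="/etc/prometheus"):
    """Map each top-level key of prometheus_targets to the file_sd file it produces."""
    return {
        name: f"{config_dir}/file_sd/{name}.yml"
        for name in prometheus_targets
    }


# The example map from this section, written as a Python dict.
prometheus_targets = {
    "node": [
        {"targets": ["localhost:9100"], "labels": {"env": "test"}},
    ],
}

paths = file_sd_paths(prometheus_targets)
print(paths)  # {'node': '/etc/prometheus/file_sd/node.yml'}
```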
Next, this file needs to be loaded into the scrape config. Here is a modified version of our default `prometheus_scrape_configs`:
```
prometheus_scrape_configs:
  - job_name: "prometheus"    # Custom scrape job, here using `static_config`
    metrics_path: "/metrics"
    static_configs:
      - targets:
          - "localhost:9090"
  - job_name: "example-node-file-servicediscovery"
    file_sd_configs:
      - files:
          - "{{ prometheus_config_dir }}/file_sd/node.yml" # This line loads file created from `prometheus_targets`
```
## Example
### Playbook
```yaml
---
- hosts: all
  roles:
    - cloudalchemy.prometheus
  vars:
    prometheus_targets:
      node:
        - targets:
            - localhost:9100
            - demo.cloudalchemy.org:9100
          labels:
            env: demosite
```
### Demo site
The Prometheus organization provides a demo site for a full monitoring solution based on Prometheus and Grafana. The repository with code and links to running instances is [available on GitHub](https://github.com/prometheus/demo-site).
### Defining alerting rules files
Alerting rules are defined in the `prometheus_alert_rules` variable. The format is almost identical to the one defined in the [Prometheus 2.0 documentation](https://prometheus.io/docs/prometheus/latest/configuration/template_examples/).
Because the Ansible and Prometheus templating engines use similar syntax, every template should be wrapped in `{% raw %}` and `{% endraw %}` statements. An example is provided in the [defaults/main.yml](defaults/main.yml) file.
## Local Testing
The preferred way of locally testing the role is to use Docker and [molecule](https://github.com/metacloud/molecule) (v2.x). You will have to install Docker on your system. See "Get started" for a Docker package suitable for your system.
We are using tox to simplify the process of testing on multiple Ansible versions. To install tox, execute:
```sh
pip3 install tox
```
To run tests on all Ansible versions (WARNING: this can take some time):
```sh
tox
```
To run a custom molecule command on a custom environment with only the default test scenario:
```sh
tox -e py35-ansible28 -- molecule test -s default
```
For more information about molecule go to their [docs](http://molecule.readthedocs.io/en/latest/).
If you would like to run tests on a remote Docker host, just set the `DOCKER_HOST` variable before running the tox tests.
## CircleCI
Combining molecule and CircleCI allows us to test how new PRs will behave when used with multiple Ansible versions and multiple operating systems. This also allows us to create test scenarios for different role configurations. As a result we have quite a large test matrix, which takes more time than local testing, so please be patient.
## Contributing
See [contributor guideline](CONTRIBUTING.md).
## Troubleshooting
See [troubleshooting](TROUBLESHOOTING.md).
## License
This project is licensed under MIT License. See [LICENSE](/LICENSE) for more details.


@ -0,0 +1,25 @@
#!/usr/bin/env bash
#
# Description: Generate the next release version
set -uo pipefail
latest_tag="$(git semver)"
if [[ -z "${latest_tag}" ]]; then
  # Send the error message to stderr.
  echo "ERROR: Couldn't get latest tag from git semver, try 'pip install git-semver'" >&2
  exit 1
fi
# Use HEAD if CIRCLE_SHA1 is not set.
now="${CIRCLE_SHA1:-HEAD}"
new_tag='none'
git_log="$(git log --format=%B "${latest_tag}..${now}")"
case "${git_log}" in
  *"[major]"*|*"[breaking change]"* ) new_tag=$(git semver --next-major) ;;
  *"[minor]"*|*"[feat]"*|*"[feature]"* ) new_tag=$(git semver --next-minor) ;;
  *"[patch]"*|*"[fix]"*|*"[bugfix]"* ) new_tag=$(git semver --next-patch) ;;
esac
echo "NEW_TAG=${new_tag}"


@ -0,0 +1,219 @@
---
prometheus_version: 2.27.0
prometheus_binary_local_dir: ''
prometheus_skip_install: false

prometheus_config_dir: /etc/prometheus
prometheus_db_dir: /var/lib/prometheus
prometheus_read_only_dirs: []

prometheus_web_listen_address: "0.0.0.0:9090"
prometheus_web_external_url: ''
# See https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md
prometheus_web_config:
  tls_server_config: {}
  http_server_config: {}
  basic_auth_users: {}

prometheus_storage_retention: "30d"
# Available since Prometheus 2.7.0
# [EXPERIMENTAL] Maximum number of bytes that can be stored for blocks. Units
# supported: KB, MB, GB, TB, PB.
prometheus_storage_retention_size: "0"

prometheus_config_flags_extra: {}
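# prometheus_config_flags_extra:
#   alertmanager.timeout: 10s
# Retention is configured with prometheus_storage_retention; the preflight
# checks reject storage.tsdb.retention in this dictionary.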

prometheus_alertmanager_config: []
# prometheus_alertmanager_config:
#   - scheme: https
#     path_prefix: alertmanager/
#     basic_auth:
#       username: user
#       password: pass
#     static_configs:
#       - targets: ["127.0.0.1:9093"]
#     proxy_url: "127.0.0.2"

prometheus_alert_relabel_configs: []
# prometheus_alert_relabel_configs:
#   - action: labeldrop
#     regex: replica

prometheus_global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s

prometheus_remote_write: []
# prometheus_remote_write:
#   - url: https://dev.kausal.co/prom/push
#     basic_auth:
#       password: FOO

prometheus_remote_read: []
# prometheus_remote_read:
#   - url: https://demo.cloudalchemy.org:9201/read
#     basic_auth:
#       password: FOO

prometheus_external_labels:
  environment: "{{ ansible_fqdn | default(ansible_host) | default(inventory_hostname) }}"

prometheus_targets: {}
#  node:
#    - targets:
#        - localhost:9100
#      labels:
#        env: test

prometheus_scrape_configs:
  - job_name: "prometheus"
    metrics_path: "{{ prometheus_metrics_path }}"
    static_configs:
      - targets:
          - "{{ ansible_fqdn | default(ansible_host) | default('localhost') }}:9090"
  - job_name: "node"
    file_sd_configs:
      - files:
          - "{{ prometheus_config_dir }}/file_sd/node.yml"

# Alternative config file name, searched in ansible templates path.
prometheus_config_file: 'prometheus.yml.j2'

prometheus_alert_rules_files:
  - prometheus/rules/*.rules

prometheus_static_targets_files:
  - prometheus/targets/*.yml
  - prometheus/targets/*.json

prometheus_alert_rules:
  - alert: Watchdog
    expr: vector(1)
    for: 10m
    labels:
      severity: warning
    annotations:
      description: "This is an alert meant to ensure that the entire alerting pipeline is functional.\nThis alert is always firing, therefore it should always be firing in Alertmanager\nand always fire against a receiver. There are integrations with various notification\nmechanisms that send a notification when this alert is not firing. For example the\n\"DeadMansSnitch\" integration in PagerDuty."
      summary: 'Ensure entire alerting pipeline is functional'
  - alert: InstanceDown
    expr: 'up == 0'
    for: 5m
    labels:
      severity: critical
    annotations:
      description: '{% raw %}{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.{% endraw %}'
      summary: '{% raw %}Instance {{ $labels.instance }} down{% endraw %}'
  - alert: RebootRequired
    expr: 'node_reboot_required > 0'
    labels:
      severity: warning
    annotations:
      description: '{% raw %}{{ $labels.instance }} requires a reboot.{% endraw %}'
      summary: '{% raw %}Instance {{ $labels.instance }} - reboot required{% endraw %}'
  - alert: NodeFilesystemSpaceFillingUp
    annotations:
      description: '{% raw %}Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left and is filling up.{% endraw %}'
      summary: 'Filesystem is predicted to run out of space within the next 24 hours.'
    expr: "(\n  node_filesystem_avail_bytes{job=\"node\",fstype!=\"\"} / node_filesystem_size_bytes{job=\"node\",fstype!=\"\"} * 100 < 40\nand\n  predict_linear(node_filesystem_avail_bytes{job=\"node\",fstype!=\"\"}[6h], 24*60*60) < 0\nand\n  node_filesystem_readonly{job=\"node\",fstype!=\"\"} == 0\n)\n"
    for: 1h
    labels:
      severity: warning
  - alert: NodeFilesystemSpaceFillingUp
    annotations:
      description: '{% raw %}Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left and is filling up fast.{% endraw %}'
      summary: 'Filesystem is predicted to run out of space within the next 4 hours.'
    expr: "(\n  node_filesystem_avail_bytes{job=\"node\",fstype!=\"\"} / node_filesystem_size_bytes{job=\"node\",fstype!=\"\"} * 100 < 20\nand\n  predict_linear(node_filesystem_avail_bytes{job=\"node\",fstype!=\"\"}[6h], 4*60*60) < 0\nand\n  node_filesystem_readonly{job=\"node\",fstype!=\"\"} == 0\n)\n"
    for: 1h
    labels:
      severity: critical
  - alert: NodeFilesystemAlmostOutOfSpace
    annotations:
      description: '{% raw %}Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left.{% endraw %}'
      summary: 'Filesystem has less than 5% space left.'
    expr: "(\n  node_filesystem_avail_bytes{job=\"node\",fstype!=\"\"} / node_filesystem_size_bytes{job=\"node\",fstype!=\"\"} * 100 < 5\nand\n  node_filesystem_readonly{job=\"node\",fstype!=\"\"} == 0\n)\n"
    for: 1h
    labels:
      severity: warning
  - alert: NodeFilesystemAlmostOutOfSpace
    annotations:
      description: '{% raw %}Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available space left.{% endraw %}'
      summary: 'Filesystem has less than 3% space left.'
    expr: "(\n  node_filesystem_avail_bytes{job=\"node\",fstype!=\"\"} / node_filesystem_size_bytes{job=\"node\",fstype!=\"\"} * 100 < 3\nand\n  node_filesystem_readonly{job=\"node\",fstype!=\"\"} == 0\n)\n"
    for: 1h
    labels:
      severity: critical
  - alert: NodeFilesystemFilesFillingUp
    annotations:
      description: '{% raw %}Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left and is filling up.{% endraw %}'
      summary: 'Filesystem is predicted to run out of inodes within the next 24 hours.'
    expr: "(\n  node_filesystem_files_free{job=\"node\",fstype!=\"\"} / node_filesystem_files{job=\"node\",fstype!=\"\"} * 100 < 40\nand\n  predict_linear(node_filesystem_files_free{job=\"node\",fstype!=\"\"}[6h], 24*60*60) < 0\nand\n  node_filesystem_readonly{job=\"node\",fstype!=\"\"} == 0\n)\n"
    for: 1h
    labels:
      severity: warning
  - alert: NodeFilesystemFilesFillingUp
    annotations:
      description: '{% raw %}Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left and is filling up fast.{% endraw %}'
      summary: 'Filesystem is predicted to run out of inodes within the next 4 hours.'
    expr: "(\n  node_filesystem_files_free{job=\"node\",fstype!=\"\"} / node_filesystem_files{job=\"node\",fstype!=\"\"} * 100 < 20\nand\n  predict_linear(node_filesystem_files_free{job=\"node\",fstype!=\"\"}[6h], 4*60*60) < 0\nand\n  node_filesystem_readonly{job=\"node\",fstype!=\"\"} == 0\n)\n"
    for: 1h
    labels:
      severity: critical
  - alert: NodeFilesystemAlmostOutOfFiles
    annotations:
      description: '{% raw %}Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left.{% endraw %}'
      summary: 'Filesystem has less than 5% inodes left.'
    expr: "(\n  node_filesystem_files_free{job=\"node\",fstype!=\"\"} / node_filesystem_files{job=\"node\",fstype!=\"\"} * 100 < 5\nand\n  node_filesystem_readonly{job=\"node\",fstype!=\"\"} == 0\n)\n"
    for: 1h
    labels:
      severity: warning
  - alert: NodeFilesystemAlmostOutOfFiles
    annotations:
      description: '{% raw %}Filesystem on {{ $labels.device }} at {{ $labels.instance }} has only {{ printf "%.2f" $value }}% available inodes left.{% endraw %}'
      summary: 'Filesystem has less than 3% inodes left.'
    expr: "(\n  node_filesystem_files_free{job=\"node\",fstype!=\"\"} / node_filesystem_files{job=\"node\",fstype!=\"\"} * 100 < 3\nand\n  node_filesystem_readonly{job=\"node\",fstype!=\"\"} == 0\n)\n"
    for: 1h
    labels:
      severity: critical
  - alert: NodeNetworkReceiveErrs
    annotations:
      description: '{% raw %}{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%.0f" $value }} receive errors in the last two minutes.{% endraw %}'
      summary: 'Network interface is reporting many receive errors.'
    expr: "increase(node_network_receive_errs_total[2m]) > 10\n"
    for: 1h
    labels:
      severity: warning
  - alert: NodeNetworkTransmitErrs
    annotations:
      description: '{% raw %}{{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf "%.0f" $value }} transmit errors in the last two minutes.{% endraw %}'
      summary: 'Network interface is reporting many transmit errors.'
    expr: "increase(node_network_transmit_errs_total[2m]) > 10\n"
    for: 1h
    labels:
      severity: warning
  - alert: NodeHighNumberConntrackEntriesUsed
    annotations:
      description: '{% raw %}{{ $value | humanizePercentage }} of conntrack entries are used{% endraw %}'
      summary: 'Number of conntrack entries is getting close to the limit'
    expr: "(node_nf_conntrack_entries / node_nf_conntrack_entries_limit) > 0.75\n"
    labels:
      severity: warning
  - alert: NodeClockSkewDetected
    annotations:
      message: '{% raw %}Clock on {{ $labels.instance }} is out of sync by more than 0.05s. Ensure NTP is configured correctly on this host.{% endraw %}'
      summary: 'Clock skew detected.'
    expr: "(\n  node_timex_offset_seconds > 0.05\nand\n  deriv(node_timex_offset_seconds[5m]) >= 0\n)\nor\n(\n  node_timex_offset_seconds < -0.05\nand\n  deriv(node_timex_offset_seconds[5m]) <= 0\n)\n"
    for: 10m
    labels:
      severity: warning
  - alert: NodeClockNotSynchronising
    annotations:
      message: '{% raw %}Clock on {{ $labels.instance }} is not synchronising. Ensure NTP is configured on this host.{% endraw %}'
      summary: 'Clock not synchronising.'
    expr: "min_over_time(node_timex_sync_status[5m]) == 0\n"
    for: 10m
    labels:
      severity: warning
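
The `NodeFilesystem*FillingUp` rules above combine a free-space threshold with `predict_linear`. The Python sketch below approximates that logic under stated assumptions: it fits a least-squares line to the samples and extrapolates from the last sample's timestamp (Prometheus extrapolates from the evaluation time), and the helper names `predict_linear` and `filling_up` are illustrative, not Prometheus internals.

```python
def predict_linear(samples, horizon_seconds):
    """Least-squares extrapolation over (timestamp, value) samples.

    Returns the value predicted `horizon_seconds` after the last sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + horizon_seconds)


def filling_up(avail_samples, size_bytes, readonly, threshold_percent, horizon_seconds):
    """Mirror the three conditions joined by `and` in the alert expression."""
    percent_free = avail_samples[-1][1] / size_bytes * 100
    return (
        percent_free < threshold_percent
        and predict_linear(avail_samples, horizon_seconds) < 0
        and not readonly
    )
```

For example, a 100 GB filesystem that drops from 36 GB to 24 GB free over six hours would satisfy the warning rule (`< 40` percent, 24 hour horizon) but not the critical one, because 24 percent free is still above the 20 percent threshold.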


@ -0,0 +1,13 @@
---
- name: restart prometheus
  become: true
  systemd:
    daemon_reload: true
    name: prometheus
    state: restarted

- name: reload prometheus
  become: true
  systemd:
    name: prometheus
    state: reloaded


@ -0,0 +1,34 @@
---
galaxy_info:
  author: Prometheus Community
  description: Prometheus monitoring system configuration and management
  license: Apache
  company: none
  min_ansible_version: "2.7"
  platforms:
    - name: Ubuntu
      versions:
        - bionic
        - xenial
    - name: Debian
      versions:
        - stretch
        - buster
    - name: EL
      versions:
        - 7
        - 8
    - name: Fedora
      versions:
        - 30
        - 31
  galaxy_tags:
    - monitoring
    - prometheus
    - metrics
    - alerts
    - alerting
    - molecule
    - cloud

dependencies: []


@ -0,0 +1,70 @@
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: bionic
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:ubuntu-18.04
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: xenial
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:ubuntu-16.04
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: stretch
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:debian-9
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: buster
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:debian-10
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: centos7
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:centos-7
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: centos8
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:centos-8
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    groups:
      - python3
  - name: fedora
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:fedora-30
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    groups:
      - python3
provisioner:
  name: ansible
  playbooks:
    prepare: prepare.yml
    converge: playbook.yml
  inventory:
    group_vars:
      python3:
        ansible_python_interpreter: /usr/bin/python3
verifier:
  name: testinfra


@ -0,0 +1,89 @@
---
- name: Run role
  hosts: all
  any_errors_fatal: true
  roles:
    - cloudalchemy.prometheus
  vars:
    prometheus_binary_local_dir: '/tmp/prometheus-linux-amd64'
    prometheus_config_dir: /opt/prom/etc
    prometheus_db_dir: /opt/prom/lib
    prometheus_web_listen_address: "127.0.0.1:9090"
    prometheus_web_external_url: "http://127.0.0.1:9090/prometheus"
    prometheus_read_only_dirs:
      - /etc
    prometheus_storage_retention: "60d"
    prometheus_storage_retention_size: "1GB"
    prometheus_config_flags_extra:
      alertmanager.timeout: 10s
      web.enable-admin-api:
      enable-feature:
        - promql-at-modifier
        - remote-write-receiver
    prometheus_alertmanager_config:
      - scheme: https
        path_prefix: /alertmanager
        basic_auth:
          username: user
          password: pass
        static_configs:
          - targets: ["127.0.0.1:9090"]
        proxy_url: "127.0.0.2"
    prometheus_alert_relabel_configs:
      - action: labeldrop
        regex: replica
    prometheus_global:
      scrape_interval: 3s
      scrape_timeout: 2s
      evaluation_interval: 10s
    prometheus_remote_write:
      - url: http://influx.cloudalchemy.org:8086/api/v1/prom/write?db=test
        basic_auth:
          username: prometheus
          password: SuperSecret
    prometheus_remote_read:
      - url: http://influx.cloudalchemy.org:8086/api/v1/prom/read?db=cloudalchemy
    prometheus_external_labels:
      environment: "alternative"
    prometheus_targets:
      node:
        - targets:
            - demo.cloudalchemy.org:9100
            - influx.cloudalchemy.org:9100
          labels:
            env: cloudalchemy
      docker:
        - targets:
            - demo.cloudalchemy.org:8080
            - influx.cloudalchemy.org:8080
          labels:
            env: cloudalchemy
    prometheus_scrape_configs:
      - job_name: "prometheus"
        metrics_path: "{{ prometheus_metrics_path }}"
        static_configs:
          - targets:
              - "{{ ansible_fqdn | default(ansible_host) | default('localhost') }}:9090"
      - job_name: "node"
        file_sd_configs:
          - files:
              - "{{ prometheus_config_dir }}/file_sd/node.yml"
      - job_name: "docker"
        file_sd_configs:
          - files:
              - "{{ prometheus_config_dir }}/file_sd/docker.yml"
      - job_name: 'blackbox'
        metrics_path: /probe
        params:
          module: [http_2xx]
        static_configs:
          - targets:
              - http://demo.cloudalchemy.org:9100
              - http://influx.cloudalchemy.org:9100
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: 127.0.0.1:9115  # Blackbox exporter.
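
The three `relabel_configs` entries of the `blackbox` job above can be read as a small sequence of label rewrites. The following Python sketch traces them for one target; the function name `apply_blackbox_relabeling` is hypothetical, and the sketch assumes the default relabel action (`replace`) with the default regex, which copies the source value unchanged.

```python
def apply_blackbox_relabeling(target, exporter_address="127.0.0.1:9115"):
    """Trace the blackbox relabeling steps for a single static target."""
    labels = {"__address__": target}
    # Step 1: copy the probed URL into the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the probed URL as the `instance` label on the metrics.
    labels["instance"] = labels["__param_target"]
    # Step 3: send the actual scrape to the blackbox exporter instead.
    labels["__address__"] = exporter_address
    return labels
```

The effect is that Prometheus scrapes the exporter at `127.0.0.1:9115/probe`, while the resulting series stay labelled with the URL that was probed.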


@ -0,0 +1,38 @@
---
- name: Prepare
  hosts: localhost
  gather_facts: false
  vars:
    # This is meant to test a local prepared binary. It needs to be updated to support the minimum
    # flag features in the systemd service file.
    version: 2.25.2
  tasks:
    - name: download prometheus binary to local folder
      become: false
      get_url:
        url: "https://github.com/prometheus/prometheus/releases/download/v{{ version }}/prometheus-{{ version }}.linux-amd64.tar.gz"
        dest: "/tmp/prometheus-{{ version }}.linux-amd64.tar.gz"
      register: _download_archive
      until: _download_archive is succeeded
      retries: 5
      delay: 2
      run_once: true
      check_mode: false

    - name: unpack prometheus binaries
      become: false
      unarchive:
        src: "/tmp/prometheus-{{ version }}.linux-amd64.tar.gz"
        dest: "/tmp"
        creates: "/tmp/prometheus-{{ version }}.linux-amd64/prometheus"
      run_once: true
      check_mode: false

    - name: link to prometheus binaries directory
      become: false
      file:
        src: "/tmp/prometheus-{{ version }}.linux-amd64"
        dest: "/tmp/prometheus-linux-amd64"
        state: link
      run_once: true
      check_mode: false


@ -0,0 +1,58 @@
import pytest
import os
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


@pytest.mark.parametrize("dirs", [
    "/opt/prom/etc",
    "/opt/prom/etc/rules",
    "/opt/prom/etc/file_sd",
    "/opt/prom/lib"
])
def test_directories(host, dirs):
    d = host.file(dirs)
    assert d.is_directory
    assert d.exists


@pytest.mark.parametrize("files", [
    "/opt/prom/etc/prometheus.yml",
    "/opt/prom/etc/rules/ansible_managed.rules",
    "/opt/prom/etc/file_sd/node.yml",
    "/opt/prom/etc/file_sd/docker.yml",
    "/usr/local/bin/prometheus",
    "/usr/local/bin/promtool"
])
def test_files(host, files):
    f = host.file(files)
    assert f.exists
    assert f.is_file


@pytest.mark.parametrize('file, content', [
    ("/etc/systemd/system/prometheus.service",
     "ReadOnly.*=/etc"),
    ("/etc/systemd/system/prometheus.service",
     "enable-feature=promql-at-modifier"),
    ("/etc/systemd/system/prometheus.service",
     "enable-feature=remote-write-receiver"),
])
def test_file_contents(host, file, content):
    f = host.file(file)
    assert f.exists
    assert f.is_file
    assert f.contains(content)


def test_service(host):
    s = host.service("prometheus")
    # assert s.is_enabled
    assert s.is_running


def test_socket(host):
    s = host.socket("tcp://127.0.0.1:9090")
    assert s.is_listening


@ -0,0 +1,75 @@
---
dependency:
  name: galaxy
driver:
  name: docker
# lint: |
#   set -e
#   yamllint .
#   ansible-lint
#   flake8
platforms:
  - name: bionic
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:ubuntu-18.04
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: xenial
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:ubuntu-16.04
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: stretch
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:debian-9
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: buster
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:debian-10
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: centos7
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:centos-7
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: centos8
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:centos-8
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    groups:
      - python3
  - name: fedora
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:fedora-30
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    groups:
      - python3
provisioner:
  name: ansible
  playbooks:
    prepare: prepare.yml
    converge: playbook.yml
  inventory:
    group_vars:
      python3:
        ansible_python_interpreter: /usr/bin/python3
verifier:
  name: testinfra


@ -0,0 +1,6 @@
---
- name: Run role
  hosts: all
  any_errors_fatal: true
  roles:
    - cloudalchemy.prometheus


@ -0,0 +1,5 @@
---
- name: Prepare
  hosts: all
  gather_facts: false
  tasks: []


@ -0,0 +1,73 @@
import pytest
import os
import yaml
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


@pytest.fixture()
def AnsibleDefaults():
    with open("defaults/main.yml", 'r') as stream:
        # safe_load needs no explicit Loader argument, which newer PyYAML
        # releases require for yaml.load().
        return yaml.safe_load(stream)


@pytest.mark.parametrize("dirs", [
    "/etc/prometheus",
    "/etc/prometheus/console_libraries",
    "/etc/prometheus/consoles",
    "/etc/prometheus/rules",
    "/etc/prometheus/file_sd",
    "/var/lib/prometheus"
])
def test_directories(host, dirs):
    d = host.file(dirs)
    assert d.is_directory
    assert d.exists


@pytest.mark.parametrize("files", [
    "/etc/prometheus/prometheus.yml",
    "/etc/prometheus/console_libraries/prom.lib",
    "/etc/prometheus/consoles/prometheus.html",
    "/etc/prometheus/web.yml",
    "/etc/systemd/system/prometheus.service",
    "/usr/local/bin/prometheus",
    "/usr/local/bin/promtool"
])
def test_files(host, files):
    f = host.file(files)
    assert f.exists
    assert f.is_file


# Despite the test name, the default prometheus_alert_rules are non-empty,
# so the role is expected to render this rules file.
@pytest.mark.parametrize("files", [
    "/etc/prometheus/rules/ansible_managed.rules"
])
def test_absent(host, files):
    f = host.file(files)
    assert f.exists


def test_user(host):
    assert host.group("prometheus").exists
    assert host.user("prometheus").exists


def test_service(host):
    s = host.service("prometheus")
    # assert s.is_enabled
    assert s.is_running


def test_socket(host):
    s = host.socket("tcp://0.0.0.0:9090")
    assert s.is_listening


def test_version(host, AnsibleDefaults):
    version = os.getenv('PROMETHEUS', AnsibleDefaults['prometheus_version'])
    run = host.run("/usr/local/bin/prometheus --version")
    out = run.stdout+run.stderr
    assert "prometheus, version " + version in out


@ -0,0 +1,35 @@
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: buster
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:debian-10
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
  - name: fedora
    pre_build_image: true
    image: quay.io/paulfantom/molecule-systemd:fedora-30
    docker_host: "${DOCKER_HOST:-unix://var/run/docker.sock}"
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    groups:
      - python3
provisioner:
  name: ansible
  playbooks:
    create: ../default/create.yml
    prepare: ../default/prepare.yml
    converge: playbook.yml
    destroy: ../default/destroy.yml
  inventory:
    group_vars:
      python3:
        ansible_python_interpreter: /usr/bin/python3
verifier:
  name: testinfra


@ -0,0 +1,8 @@
---
- name: Run role
  hosts: all
  any_errors_fatal: true
  roles:
    - cloudalchemy.prometheus
  vars:
    prometheus_version: latest


@ -0,0 +1,28 @@
import pytest
import os
import testinfra.utils.ansible_runner

testinfra_hosts = testinfra.utils.ansible_runner.AnsibleRunner(
    os.environ['MOLECULE_INVENTORY_FILE']).get_hosts('all')


@pytest.mark.parametrize("files", [
    "/etc/systemd/system/prometheus.service",
    "/usr/local/bin/prometheus",
    "/usr/local/bin/promtool"
])
def test_files(host, files):
    f = host.file(files)
    assert f.exists
    assert f.is_file


def test_service(host):
    s = host.service("prometheus")
    # assert s.is_enabled
    assert s.is_running


def test_socket(host):
    s = host.socket("tcp://0.0.0.0:9090")
    assert s.is_listening


@ -0,0 +1,69 @@
---
- name: alerting rules file
  template:
    src: "alert.rules.j2"
    dest: "{{ prometheus_config_dir }}/rules/ansible_managed.rules"
    owner: root
    group: prometheus
    mode: 0640
    validate: "{{ _prometheus_binary_install_dir }}/promtool check rules %s"
  when:
    - prometheus_alert_rules != []
  notify:
    - reload prometheus

- name: copy custom alerting rule files
  copy:
    src: "{{ item }}"
    dest: "{{ prometheus_config_dir }}/rules/"
    owner: root
    group: prometheus
    mode: 0640
    validate: "{{ _prometheus_binary_install_dir }}/promtool check rules %s"
  with_fileglob: "{{ prometheus_alert_rules_files }}"
  notify:
    - reload prometheus

- name: configure prometheus
  template:
    src: "{{ prometheus_config_file }}"
    dest: "{{ prometheus_config_dir }}/prometheus.yml"
    force: true
    owner: root
    group: prometheus
    mode: 0640
    validate: "{{ _prometheus_binary_install_dir }}/promtool check config %s"
  notify:
    - reload prometheus

- name: configure Prometheus web
  copy:
    content: "{{ prometheus_web_config | to_nice_yaml(indent=2,sort_keys=False) }}"
    dest: "{{ prometheus_config_dir }}/web.yml"
    force: true
    owner: root
    group: prometheus
    mode: 0640

- name: configure prometheus static targets
  copy:
    content: |
      #jinja2: lstrip_blocks: True
      {{ item.value | to_nice_yaml(indent=2,sort_keys=False) }}
    dest: "{{ prometheus_config_dir }}/file_sd/{{ item.key }}.yml"
    force: true
    owner: root
    group: prometheus
    mode: 0640
  with_dict: "{{ prometheus_targets }}"
  when: prometheus_targets != {}

- name: copy prometheus custom static targets
  copy:
    src: "{{ item }}"
    dest: "{{ prometheus_config_dir }}/file_sd/"
    force: true
    owner: root
    group: prometheus
    mode: 0640
  with_fileglob: "{{ prometheus_static_targets_files }}"
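
Each key of `prometheus_targets` is written to `<prometheus_config_dir>/file_sd/<key>.yml` by the static targets task above, and the role's preflight tasks require that path to appear under `file_sd_configs` in `prometheus_scrape_configs`. The Python sketch below shows one way to express that consistency check; `missing_file_sd_jobs` is a hypothetical helper, written with plain loops instead of the `json_query` expression the role uses.

```python
def missing_file_sd_jobs(config_dir, targets, scrape_configs):
    """Return target groups whose file_sd file no scrape config references."""
    referenced = [
        path
        for job in scrape_configs
        for sd_config in job.get("file_sd_configs", [])
        for path in sd_config.get("files", [])
    ]
    return [
        name for name in targets
        if f"{config_dir}/file_sd/{name}.yml" not in referenced
    ]
```

With `node` and `docker` target groups but only a `node` scrape job, the helper would report `docker`, which corresponds to the case where the role stops with a failure.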


@ -0,0 +1,137 @@
---
- name: create prometheus system group
  group:
    name: prometheus
    system: true
    state: present

- name: create prometheus system user
  user:
    name: prometheus
    system: true
    shell: "/usr/sbin/nologin"
    group: prometheus
    createhome: false
    home: "{{ prometheus_db_dir }}"

- name: create prometheus data directory
  file:
    path: "{{ prometheus_db_dir }}"
    state: directory
    owner: prometheus
    group: prometheus
    mode: 0755

- name: create prometheus configuration directories
  file:
    path: "{{ item }}"
    state: directory
    owner: root
    group: prometheus
    mode: 0770
  with_items:
    - "{{ prometheus_config_dir }}"
    - "{{ prometheus_config_dir }}/rules"
    - "{{ prometheus_config_dir }}/file_sd"

- block:
    - name: download prometheus binary to local folder
      become: false
      get_url:
        url: "https://github.com/prometheus/prometheus/releases/download/v{{ prometheus_version }}/prometheus-{{ prometheus_version }}.linux-{{ go_arch }}.tar.gz"
        dest: "/tmp/prometheus-{{ prometheus_version }}.linux-{{ go_arch }}.tar.gz"
        checksum: "sha256:{{ __prometheus_checksum }}"
      register: _download_archive
      until: _download_archive is succeeded
      retries: 5
      delay: 2
      # run_once: true # <-- this cannot be set due to multi-arch support
      delegate_to: localhost
      check_mode: false

    - name: unpack prometheus binaries
      become: false
      unarchive:
        src: "/tmp/prometheus-{{ prometheus_version }}.linux-{{ go_arch }}.tar.gz"
        dest: "/tmp"
        creates: "/tmp/prometheus-{{ prometheus_version }}.linux-{{ go_arch }}/prometheus"
      delegate_to: localhost
      check_mode: false

    - name: propagate official prometheus and promtool binaries
      copy:
        src: "/tmp/prometheus-{{ prometheus_version }}.linux-{{ go_arch }}/{{ item }}"
        dest: "{{ _prometheus_binary_install_dir }}/{{ item }}"
        mode: 0755
        owner: root
        group: root
      with_items:
        - prometheus
        - promtool
      notify:
        - restart prometheus

    - name: propagate official console templates
      copy:
        src: "/tmp/prometheus-{{ prometheus_version }}.linux-{{ go_arch }}/{{ item }}/"
        dest: "{{ prometheus_config_dir }}/{{ item }}/"
        mode: 0644
        owner: root
        group: root
      with_items:
        - console_libraries
        - consoles
      notify:
        - restart prometheus
  when:
    - prometheus_binary_local_dir | length == 0
    - not prometheus_skip_install

- name: propagate locally distributed prometheus and promtool binaries
  copy:
    src: "{{ prometheus_binary_local_dir }}/{{ item }}"
    dest: "{{ _prometheus_binary_install_dir }}/{{ item }}"
    mode: 0755
    owner: root
    group: root
  with_items:
    - prometheus
    - promtool
  when:
    - prometheus_binary_local_dir | length > 0
    - not prometheus_skip_install
  notify:
    - restart prometheus

- name: create systemd service unit
  template:
    src: prometheus.service.j2
    dest: /etc/systemd/system/prometheus.service
    owner: root
    group: root
    mode: 0644
  notify:
    - restart prometheus

- name: Install SELinux dependencies
  package:
    name: "{{ item }}"
    state: present
  with_items: "{{ prometheus_selinux_packages }}"
  register: _install_packages
  until: _install_packages is succeeded
  retries: 5
  delay: 2
  when:
    - ansible_version.full is version('2.4', '>=')
    - ansible_selinux.status == "enabled"

- name: Allow prometheus to bind to port in SELinux
  seport:
    ports: "{{ prometheus_web_listen_address.split(':')[1] }}"
    proto: tcp
    setype: http_port_t
    state: present
  when:
    - ansible_version.full is version('2.4', '>=')
    - ansible_selinux.status == "enabled"
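
The download task in this file verifies the archive against `__prometheus_checksum`, which the role's preflight tasks extract from the release's `sha256sums.txt` by matching the `linux-<go_arch>.tar.gz` suffix. A minimal Python sketch of that extraction follows; `checksum_for_arch` is a hypothetical name, and the hash values in the usage example are placeholders rather than real checksums.

```python
def checksum_for_arch(checksum_lines, go_arch):
    """Pick the sha256 for the linux archive of the given Go architecture."""
    suffix = f"linux-{go_arch}.tar.gz"
    checksum = None
    for line in checksum_lines:
        # Each line has the form "<sha256>  <file name>"; the hash is the
        # first space-separated field. As with set_fact in a loop, the
        # last matching line wins.
        if suffix in line:
            checksum = line.split(" ")[0]
    return checksum


# Placeholder data, for illustration only.
lines = [
    "aaa111  prometheus-2.27.0.darwin-amd64.tar.gz",
    "bbb222  prometheus-2.27.0.linux-amd64.tar.gz",
    "ccc333  prometheus-2.27.0.linux-arm64.tar.gz",
]
print(checksum_for_arch(lines, "arm64"))
```

Because the lookup depends on `go_arch`, hosts with different architectures can resolve different checksums, which fits the comment in the download task about `run_once` not being usable with multi-arch support.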


@ -0,0 +1,38 @@
---
- name: Gather variables for each operating system
  include_vars: "{{ item }}"
  with_first_found:
    - "{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version }}.yml"
    - "{{ ansible_distribution | lower }}.yml"
    - "{{ ansible_os_family | lower }}-{{ ansible_distribution_major_version | lower }}.yml"
    - "{{ ansible_os_family | lower }}.yml"
  tags:
    - prometheus_configure
    - prometheus_install
    - prometheus_run

- include: preflight.yml
  tags:
    - prometheus_configure
    - prometheus_install
    - prometheus_run

- include: install.yml
  become: true
  tags:
    - prometheus_install

- include: configure.yml
  become: true
  tags:
    - prometheus_configure

- name: ensure prometheus service is started and enabled
  become: true
  systemd:
    daemon_reload: true
    name: prometheus
    state: started
    enabled: true
  tags:
    - prometheus_run
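
The first task of this file loads the most specific variables file that exists, falling back from distribution plus version down to the OS family. The Python sketch below reproduces that search order; the helper names and the file names in the usage note are hypothetical, since this part of the diff does not show which files the role ships, and returning `None` is a simplification of what happens when nothing matches.

```python
def vars_file_candidates(distribution, major_version, os_family):
    """Build the candidate list in the order used by with_first_found."""
    return [
        f"{distribution.lower()}-{major_version}.yml",
        f"{distribution.lower()}.yml",
        f"{os_family.lower()}-{major_version.lower()}.yml",
        f"{os_family.lower()}.yml",
    ]


def first_found(candidates, available):
    """Return the first candidate that exists, or None if none does."""
    for name in candidates:
        if name in available:
            return name
    return None
```

For an Ubuntu 18 host, and assuming only `debian.yml` and `redhat.yml` were available, the search would fall through to `debian.yml`.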


@ -0,0 +1,114 @@
---
- name: Assert usage of systemd as an init system
  assert:
    that: ansible_service_mgr == 'systemd'
    msg: "This role only works with systemd"

- name: Get systemd version
  command: systemctl --version
  changed_when: false
  check_mode: false
  register: __systemd_version
  tags:
    - skip_ansible_lint

# The second field of the first output line is the numeric version. Some
# systemd releases append a "(...)" build suffix, so the last field is not reliable.
- name: Set systemd version fact
  set_fact:
    prometheus_systemd_version: "{{ __systemd_version.stdout_lines[0].split(' ')[1] }}"

- name: Assert no duplicate config flags
  assert:
    that:
      - prometheus_config_flags_extra['config.file'] is not defined
      - prometheus_config_flags_extra['storage.tsdb.path'] is not defined
      - prometheus_config_flags_extra['storage.local.path'] is not defined
      - prometheus_config_flags_extra['web.listen-address'] is not defined
      - prometheus_config_flags_extra['web.external-url'] is not defined
    msg: "Detected duplicate configuration entry. Please check your ansible variables and role README.md."

- name: Assert external_labels aren't configured twice
  assert:
    that: prometheus_global.external_labels is not defined
    msg: "Use prometheus_external_labels to define external labels"

- name: Set prometheus external metrics path
  set_fact:
    prometheus_metrics_path: "/{{ ( prometheus_web_external_url + '/metrics' ) | regex_replace('^(.*://)?(.*?)/') }}"

- name: Fail when prometheus_config_flags_extra duplicates parameters set by other variables
  fail:
    msg: >
      Whooops. You are duplicating configuration. Please look at your prometheus_config_flags_extra
      and check against other variables in defaults/main.yml
  with_items:
    - 'storage.tsdb.retention'
    - 'storage.tsdb.path'
    - 'storage.local.retention'
    - 'storage.local.path'
    - 'config.file'
    - 'web.listen-address'
    - 'web.external-url'
  when: item in prometheus_config_flags_extra.keys()

- name: Get all file_sd files from scrape_configs
  set_fact:
    file_sd_files: "{{ prometheus_scrape_configs | json_query('[*][].file_sd_configs[*][].files[]') }}"

- name: Fail when file_sd targets are not defined in scrape_configs
  fail:
    msg: >
      Oh, snap! `{{ item.key }}` couldn't be found in your scrape configs. Please ensure you provided
      all targets from prometheus_targets in prometheus_scrape_configs
  when: not prometheus_config_dir + "/file_sd/" + item.key + ".yml" in file_sd_files
  #  when: not item | basename | splitext | difference(['.yml']) | join('') in prometheus_targets.keys()
  with_dict: "{{ prometheus_targets }}"

- name: Alert when prometheus_alertmanager_config is empty, but prometheus_alert_rules is specified
  debug:
    msg: >
      No alertmanager configuration was specified. If you want your alerts to be sent make sure to
      specify a prometheus_alertmanager_config in defaults/main.yml.
  when:
    - prometheus_alertmanager_config == []
    - prometheus_alert_rules != []

- block:
    - name: Get latest release
      uri:
        url: "https://api.github.com/repos/prometheus/prometheus/releases/latest"
        method: GET
        return_content: true
        status_code: 200
        body_format: json
        validate_certs: false
        # lookup('env', ...) returns an empty string for unset variables, so
        # default() needs its second argument to fall back to omit.
        user: "{{ lookup('env', 'GH_USER') | default(omit, true) }}"
        password: "{{ lookup('env', 'GH_TOKEN') | default(omit, true) }}"
      no_log: "{{ not lookup('env', 'ANSIBLE_DEBUG') | bool }}"
      register: _latest_release
      until: _latest_release.status == 200
      retries: 5

    - name: "Set prometheus version to {{ _latest_release.json.tag_name[1:] }}"
      set_fact:
        prometheus_version: "{{ _latest_release.json.tag_name[1:] }}"
  when:
    - prometheus_version == "latest"
    - prometheus_binary_local_dir | length == 0
    - not prometheus_skip_install

- block:
    - name: "Get checksum list"
      set_fact:
        __prometheus_checksums: "{{ lookup('url', 'https://github.com/prometheus/prometheus/releases/download/v' + prometheus_version + '/sha256sums.txt', wantlist=True) | list }}"
      run_once: true

    - name: "Get checksum for {{ go_arch }} architecture"
      set_fact:
        __prometheus_checksum: "{{ item.split(' ')[0] }}"
      with_items: "{{ __prometheus_checksums }}"
      when:
        - "('linux-' + go_arch + '.tar.gz') in item"
  delegate_to: localhost
  when:
    - prometheus_binary_local_dir | length == 0
    - not prometheus_skip_install
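
The `prometheus_metrics_path` fact set earlier in this file strips the scheme and host from `prometheus_web_external_url` so that only the path prefix plus `/metrics` remains. The sketch below replays that expression with Python's `re.sub`, on the assumption that `regex_replace` without a replacement argument substitutes an empty string; the function name `metrics_path` is hypothetical.

```python
import re


def metrics_path(external_url):
    """Derive the scrape path from an external URL, as the set_fact task does."""
    # Drop an optional "scheme://" and everything up to the first "/" after it.
    remainder = re.sub(r"^(.*://)?(.*?)/", "", external_url + "/metrics")
    return "/" + remainder
```

With the default empty URL this yields `/metrics`; with the alternative scenario's `http://127.0.0.1:9090/prometheus` it yields `/prometheus/metrics`, which is the path the `prometheus` scrape job then uses.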


@ -0,0 +1,6 @@
{{ ansible_managed | comment }}

groups:
- name: ansible managed alert rules
  rules:
  {{ prometheus_alert_rules | to_nice_yaml(indent=2,sort_keys=False) | indent(2,False) }}


@ -0,0 +1,85 @@
{{ ansible_managed | comment }}

[Unit]
Description=Prometheus
After=network-online.target
Requires=local-fs.target
After=local-fs.target

[Service]
Type=simple
Environment="GOMAXPROCS={{ ansible_processor_vcpus|default(ansible_processor_count) }}"
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart={{ _prometheus_binary_install_dir }}/prometheus \
  --storage.tsdb.path={{ prometheus_db_dir }} \
{% if prometheus_version is version('2.7.0', '>=') %}
  --storage.tsdb.retention.time={{ prometheus_storage_retention }} \
  --storage.tsdb.retention.size={{ prometheus_storage_retention_size }} \
{% else %}
  --storage.tsdb.retention={{ prometheus_storage_retention }} \
{% endif %}
{% if prometheus_version is version('2.24.0', '>=') %}
  --web.config.file={{ prometheus_config_dir }}/web.yml \
{% endif %}
  --web.console.libraries={{ prometheus_config_dir }}/console_libraries \
  --web.console.templates={{ prometheus_config_dir }}/consoles \
  --web.listen-address={{ prometheus_web_listen_address }} \
  --web.external-url={{ prometheus_web_external_url }} \
{% for flag, flag_value in prometheus_config_flags_extra.items() %}
{% if not flag_value %}
  --{{ flag }} \
{% elif flag_value is string %}
  --{{ flag }}={{ flag_value }} \
{% elif flag_value is sequence %}
{% for flag_value_item in flag_value %}
  --{{ flag }}={{ flag_value_item }} \
{% endfor %}
{% endif %}
{% endfor %}
  --config.file={{ prometheus_config_dir }}/prometheus.yml
CapabilityBoundingSet=CAP_SET_UID
LimitNOFILE=65000
LockPersonality=true
NoNewPrivileges=true
MemoryDenyWriteExecute=true
PrivateDevices=true
PrivateTmp=true
ProtectHome=true
RemoveIPC=true
RestrictSUIDSGID=true
#SystemCallFilter=@signal @timer
{% if prometheus_systemd_version | int >= 231 %}
ReadWritePaths={{ prometheus_db_dir }}
{% for path in prometheus_read_only_dirs %}
ReadOnlyPaths={{ path }}
{% endfor %}
{% else %}
ReadWriteDirectories={{ prometheus_db_dir }}
{% for path in prometheus_read_only_dirs %}
ReadOnlyDirectories={{ path }}
{% endfor %}
{% endif %}
{% if prometheus_systemd_version | int >= 232 %}
PrivateUsers=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectSystem=strict
{% else %}
ProtectSystem=full
{% endif %}
{% if http_proxy is defined %}
Environment="HTTP_PROXY={{ http_proxy }}"{% if https_proxy is defined %} "HTTPS_PROXY={{ https_proxy }}"{% endif %}
{% endif %}
SyslogIdentifier=prometheus
Restart=always
[Install]
WantedBy=multi-user.target

@ -0,0 +1,34 @@
#jinja2: trim_blocks: True, lstrip_blocks: True
{{ ansible_managed | comment }}
# http://prometheus.io/docs/operating/configuration/
global:
  {{ prometheus_global | to_nice_yaml(indent=2,sort_keys=False) | indent(2, False) }}
  external_labels:
    {{ prometheus_external_labels | to_nice_yaml(indent=2,sort_keys=False) | indent(4, False) }}
{% if prometheus_remote_write != [] %}
remote_write:
  {{ prometheus_remote_write | to_nice_yaml(indent=2,sort_keys=False) | indent(2, False) }}
{% endif %}
{% if prometheus_remote_read != [] %}
remote_read:
  {{ prometheus_remote_read | to_nice_yaml(indent=2,sort_keys=False) | indent(2, False) }}
{% endif %}
rule_files:
  - {{ prometheus_config_dir }}/rules/*.rules
{% if prometheus_alertmanager_config | length > 0 %}
alerting:
  alertmanagers:
  {{ prometheus_alertmanager_config | to_nice_yaml(indent=2,sort_keys=False) | indent(2,False) }}
  {% if prometheus_alert_relabel_configs | length > 0 %}
  alert_relabel_configs:
  {{ prometheus_alert_relabel_configs | to_nice_yaml(indent=2,sort_keys=False) | indent(2,False) }}
  {% endif %}
{% endif %}
scrape_configs:
  {{ prometheus_scrape_configs | to_nice_yaml(indent=2,sort_keys=False) | indent(2,False) }}

@ -0,0 +1,4 @@
---
prometheus_selinux_packages:
  - python3-libselinux
  - python3-policycoreutils

@ -0,0 +1,4 @@
---
prometheus_selinux_packages:
  - libselinux-python
  - policycoreutils-python

@ -0,0 +1,4 @@
---
prometheus_selinux_packages:
  - python-selinux
  - policycoreutils

Some files were not shown because too many files have changed in this diff.