Prometheus: Alerting with Alertmanager

Environment

OS: CentOS 7
Server version: prometheus-2.22.0
Alertmanager version: alertmanager-0.21.0.linux-amd64

Required ports

Prometheus server: 9090
Alertmanager: 9093
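On CentOS 7, firewalld is enabled by default, so these ports may have to be opened before the web UIs can be reached from another machine. A minimal sketch, assuming firewalld with its default zone; the helper name `open_ports` and the `DRY_RUN` switch are illustrative and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: open TCP ports in firewalld (default zone).
# Set DRY_RUN=1 to print the firewall-cmd commands instead of running them.
open_ports() {
  local run=""
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"
  local port
  for port in "$@"; do
    $run firewall-cmd --permanent --add-port="${port}/tcp"
  done
  # --permanent rules only take effect after a reload
  $run firewall-cmd --reload
}
```

Usage: `open_ports 9090 9093` as root, or `DRY_RUN=1 open_ports 9090 9093` to preview the commands first.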

Software deployment

GitHub: Alertmanager-0.21.0
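The tarball can be fetched from the project's GitHub releases page. A minimal sketch, assuming the usual Prometheus release URL layout (`releases/download/v<version>/alertmanager-<version>.<os>-<arch>.tar.gz`) and that `wget` is installed; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the GitHub release URL for an Alertmanager version/platform.
am_release_url() {
  local version="${1:-0.21.0}" platform="${2:-linux-amd64}"
  echo "https://github.com/prometheus/alertmanager/releases/download/v${version}/alertmanager-${version}.${platform}.tar.gz"
}
```

Usage: `wget "$(am_release_url 0.21.0)"` from the directory where the software is kept.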

Deploy Alertmanager

[root@prometheus software]# tar -zxf alertmanager-0.21.0.linux-amd64.tar.gz -C /usr/local/
[root@prometheus software]# mv /usr/local/alertmanager-0.21.0.linux-amd64 /usr/local/alertmanager
[root@prometheus software]# cd /usr/local/alertmanager/
[root@prometheus alertmanager]# ls
alertmanager  alertmanager.yml  alertmanager.yml.bak  amtool  data  email.tmpl  LICENSE  NOTICE
[root@prometheus alertmanager]# vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: 'im@lian.st'
  smtp_smarthost: 'smtp.<address>:465'     # SMTP server host:port
  smtp_auth_username: '<sender username>'
  smtp_auth_password: '<sender password>'
  smtp_require_tls: false
  smtp_hello: 'lian.st'
#templates: # custom email template
  #- '/usr/local/alertmanager/email.tmpl'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '9763307@qq.com' # recipient address
    #html: '{{ template "email.to.html" . }}'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
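The release tarball ships `amtool`, which can validate this file before the service is (re)started. A small sketch, assuming the install directory used above; the wrapper name `check_am_config` is hypothetical, while `amtool check-config` is the real subcommand.

```shell
#!/usr/bin/env bash
# Validate alertmanager.yml with the bundled amtool before (re)starting the service.
# AM_HOME is assumed to be the install directory used in this article.
AM_HOME="${AM_HOME:-/usr/local/alertmanager}"

check_am_config() {
  local cfg="${1:-$AM_HOME/alertmanager.yml}"
  if [ ! -r "$cfg" ]; then
    echo "cannot read config: $cfg" >&2
    return 1
  fi
  # amtool exits non-zero if the file fails to parse or validate
  "$AM_HOME/amtool" check-config "$cfg"
}
```

Usage: `check_am_config && systemctl restart alertmanager`, so a typo in the YAML never takes the running instance down.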
 

Running as a systemd service

[root@prometheus alertmanager]# vim /usr/lib/systemd/system/alertmanager.service 
[Unit]
Description=alertmanager.server
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
[root@prometheus alertmanager]# systemctl daemon-reload
[root@prometheus alertmanager]# systemctl enable alertmanager.service
[root@prometheus alertmanager]# systemctl start alertmanager.service

Check the process / port

[root@prometheus alertmanager]# ps aux | grep alertmanage
root     1517645  0.0  1.7 723836 32048 ?        Ssl  Nov02   0:33 /usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
root     1698866  0.0  0.0  12112  1088 pts/0    R+   10:01   0:00 grep --color=auto alertmanage
[root@prometheus alertmanager]# ss -anptu | grep 9093
tcp  LISTEN    0      128                        *:9093                       *:*            users:(("alertmanager",pid=1517645,fd=9))  
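Besides `ps` and `ss`, Alertmanager exposes `/-/healthy` and `/-/ready` HTTP endpoints, which are convenient in scripts. A sketch that polls the readiness endpoint until it answers with a success status or a timeout (in seconds) expires; the function name and default URL are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll Alertmanager's /-/ready endpoint until it returns a
# 2xx/3xx status or the timeout (seconds) runs out. Returns 0 when ready, 1 on timeout.
wait_am_ready() {
  local url="${1:-http://localhost:9093}" timeout="${2:-30}"
  local waited=0
  # curl -f makes HTTP errors (>= 400) count as failures
  until curl -fsS -o /dev/null "${url}/-/ready"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Alertmanager at ${url} not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Alertmanager at ${url} is ready"
}
```

Usage: `systemctl start alertmanager && wait_am_ready http://localhost:9093 30`.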

Test access

[Screenshot: Alertmanager web UI]

Define alerting rules

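[root@prometheus alertmanager]# cd /usr/local/prometheus/rules/
[root@prometheus rules]# vim node-up.yml   # file name is illustrative; it must match a rule_files pattern in prometheus.yml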
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node"} == 0
    for: 5s   # how long the condition must hold; it is only re-checked every evaluation_interval
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has stopped running!"
      description: "{{ $labels.instance }} was detected as unexpectedly down! Please pay close attention!!!"
[root@prometheus rules]# systemctl restart prometheus_server
[root@prometheus rules]# systemctl restart alertmanager
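Two checks help here. `promtool` (shipped with Prometheus) validates rule file syntax, and Prometheus's HTTP API lists the Alertmanagers it sends alerts to at `/api/v1/alertmanagers`. Port 9093 only appears there if `prometheus.yml` has an `alerting.alertmanagers` entry targeting it and a `rule_files` entry covering the rules directory; that part of `prometheus.yml` is not shown in this article. The sketch assumes Prometheus lives under /usr/local/prometheus and listens on localhost:9090; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed Prometheus install dir (matches the rules directory used above).
PROM_HOME="${PROM_HOME:-/usr/local/prometheus}"

# Validate one or more rule files with promtool before restarting Prometheus.
check_rules() {
  "$PROM_HOME/promtool" check rules "$@"
}

# Read the JSON returned by /api/v1/alertmanagers on stdin and succeed only if
# an *active* Alertmanager on the given port (default 9093) is listed.
am_registered() {
  local port="${1:-9093}" active
  # Strip whitespace, then keep only the contents of the activeAlertmanagers array
  active="$(tr -d ' \n' | sed -n 's/.*"activeAlertmanagers":\[\([^]]*\)].*/\1/p')"
  printf '%s\n' "$active" | grep -q "\"url\":\"[^\"]*:${port}/"
}
```

Usage: `check_rules "$PROM_HOME"/rules/*.yml` before the restart, then `curl -s http://localhost:9090/api/v1/alertmanagers | am_registered && echo "Prometheus is sending alerts to :9093"`. Dropped Alertmanagers are ignored on purpose, since Prometheus does not send to them.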

Test the alert

Based on the rule above, we stop node_exporter on one machine to trigger the alert.

[root@pwd ~]# systemctl stop node_exporter
[root@pwd ~]# systemctl status node_exporter
● node_exporter.service - node_exporter
   Loaded: loaded (/usr/lib/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=timex
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=udp_queues
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=uname
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=vmstat
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=xfs
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=zfs
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:191 msg="Listening...s=:9100
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=tls_config.go:170 msg="TLS is disab...2=false
Nov 02 21:07:42 pwd.lian.st systemd[1]: Stopping node_exporter...
Nov 02 21:07:42 pwd.lian.st systemd[1]: Stopped node_exporter.
Hint: Some lines were ellipsized, use -l to show in full.
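Stopping an exporter exercises the whole chain. To test only the Alertmanager-to-email leg, an alert can also be posted straight to Alertmanager's v2 API (`POST /api/v2/alerts`, which takes a JSON array of alerts). A minimal sketch; the label values and helper names are illustrative. Since no `endsAt` is set, Alertmanager should treat the alert as resolved once `resolve_timeout` (5m in the config above) passes without it being re-sent, and with `send_resolved: true` a resolved email should follow.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sending a hand-made test alert to Alertmanager.

# Build a one-element JSON array in the shape accepted by POST /api/v2/alerts.
# Arguments are assumed to contain no double quotes or backslashes.
build_test_alert() {
  local name="${1:-ManualTest}" instance="${2:-test-host}" severity="${3:-warning}"
  printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"%s"},"annotations":{"summary":"%s"}}]' \
    "$name" "$instance" "$severity" "manual test alert for $instance"
}

# POST the alert; AM_URL defaults to the local Alertmanager used in this article.
send_test_alert() {
  local am_url="${AM_URL:-http://localhost:9093}"
  build_test_alert "$@" |
    curl -fsS -X POST -H 'Content-Type: application/json' --data-binary @- "${am_url}/api/v2/alerts"
}
```

Usage: `send_test_alert ManualTest pwd.lian.st`, then `amtool alert query --alertmanager.url=http://localhost:9093` to confirm it is active.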

Alert email

[Screenshot: alert email]

Resolved email

[Screenshot: resolved email]

TODO

Alert channels: DingTalk, WeChat, SMS, Telegram (TG);
Alerting rules: common rules (Linux server, Nginx, Apache, MySQL, Redis, JVM)


Last updated: 2020-11-03 15:28:04
