Prometheus Deployment Tutorial (Part 3)

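Goal: create two Feishu groups, one for normal alerts and one for critical alerts. When CPU usage is high, the alert goes to the normal group; when a server goes down, the alert goes to the critical group.

I. Add alerting rules to Prometheus:

1. Add an alerting rule file under the Prometheus installation directory

[root@localhost rules]# vim node_exporter.rules

For how to write rules, see the official Prometheus documentation: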

Alerting rules | Prometheus
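The contents of node_exporter.rules are not reproduced in this tutorial. The following is a minimal sketch of what such a file could look like, written as a shell heredoc so it can be pasted on the Prometheus host. The alert names, the 1m durations, and the description texts are assumptions, and the CPU expression assumes the node_cpu_seconds_total metric exported by node_exporter. The level label and the description annotation are the fields read by the PrometheusAlert template later in this tutorial, and the level values shown are the ones set in the Alertmanager section.

```shell
# Write an example rule file into the current directory (the tutorial edits
# the file from inside the Prometheus rules directory).
RULES_FILE="${RULES_FILE:-./node_exporter.rules}"

# The quoted 'EOF' keeps the shell from expanding $labels and $value.
cat > "$RULES_FILE" <<'EOF'
groups:
  - name: node_exporter
    rules:
      # Fires when the non-idle CPU share, averaged per host, stays above 90%.
      - alert: HostHighCpuUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100) > 90
        for: 1m
        labels:
          level: warning
        annotations:
          description: "CPU usage on {{ $labels.instance }} is above 90% (current: {{ $value }})"
      # Fires when a scrape target has been unreachable for 1m.
      - alert: HostDown
        expr: up == 0
        for: 1m
        labels:
          level: critical
        annotations:
          description: "{{ $labels.instance }} is unreachable"
EOF

# Validate the file with promtool when it is on PATH (it ships with Prometheus).
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$RULES_FILE"
fi
```

Thresholds and durations should be tuned to the environment; a longer "for" value reduces alerts caused by short CPU spikes.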

2. Add the rule file path to the Prometheus configuration, then restart the prometheus service

[root@localhost prometheus]# vim prometheus.yml
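The edit to prometheus.yml is not shown here. As a rough sketch, the fragment below is what the rule path entry might look like. It assumes the rule files live in /etc/prometheus/rules (the path used later in this tutorial) and that a *.rules glob is acceptable. It is written to a separate example file so that an existing prometheus.yml is not overwritten.

```shell
# Write the fragment to an example file; merge it by hand into the top level
# of prometheus.yml (the default file already has a rule_files key).
cat > ./rule_files.example.yml <<'EOF'
rule_files:
  - "/etc/prometheus/rules/*.rules"
EOF

cat ./rule_files.example.yml

# After merging, the whole configuration can be checked before restarting
# (the systemd unit name "prometheus" is an assumption):
#   promtool check config /etc/prometheus/prometheus.yml
#   systemctl restart prometheus
```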

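3. The Prometheus web UI now shows the alerting rules it has loaded:

4. Test:

(1) Stress-testing tool: cpuburn

When CPU usage exceeds 90%, an alert appears on the page.

(2) Shutdown test: power off the host, and an alert appears on the page:

If both tests behave as described, the alerting rules are correct.

II. Create two Feishu groups

1. In Feishu, create two groups, normal and critical, so that alerts can be sent to one or the other depending on severity.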

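2. Add an IP whitelist. The IP to add is found by running curl ifconfig.me on the host that sends the alerts. A dynamic IP may change, so add the whole C segment (/24) of that IP rather than the single address.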
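To make the "C segment" step concrete, here is a small sketch that turns an IPv4 address into its /24 network. The helper name to_c_segment is hypothetical, and the sketch assumes ifconfig.me returns an IPv4 address; check which notation the Feishu whitelist field accepts before pasting the result.

```shell
# Turn an IPv4 address into its /24 ("C segment") network in CIDR form.
to_c_segment() {
  # ${1%.*} strips the last octet: 203.0.113.45 -> 203.0.113
  printf '%s.0/24\n' "${1%.*}"
}

# Example with a documentation address (RFC 5737), not a real host.
to_c_segment 203.0.113.45

# On the alerting host, the live value could be derived like this:
#   to_c_segment "$(curl -s ifconfig.me)"
```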

III. Install Alertmanager:

1. Download and install from the official download page: Download | Prometheus

2. Edit the configuration file

[root@localhost alertmanager]# vim alertmanager.yml

3. Edit the Prometheus configuration file and add the Alertmanager address and port

vim /etc/prometheus/prometheus.yml
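The alerting block itself is not shown. A sketch of what it might contain is below, assuming Alertmanager listens on 192.168.43.168:9093, the address used in the connectivity check later in this section. As before, the fragment goes to an example file instead of overwriting prometheus.yml.

```shell
# Fragment to merge into the top level of /etc/prometheus/prometheus.yml.
cat > ./alerting.example.yml <<'EOF'
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "192.168.43.168:9093"
EOF

cat ./alerting.example.yml
```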

4. Set the level of the CPU-over-90% rule to warning and the level of the host-down rule to critical

vim /etc/prometheus/rules/node_exporter.rules

After making the changes above, restart prometheus

5. Start Alertmanager

[root@localhost ~]# vim /etc/systemd/system/alertmanager.service

[Unit]

Description=AlertManager Service

[Service]

Restart=always

ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml

[Install]

WantedBy=multi-user.target

[root@localhost ~]# systemctl daemon-reload

[root@localhost ~]# systemctl start alertmanager

[root@localhost ~]# systemctl enable alertmanager

6. Open the web page:

7. CPU stress test: open 192.168.43.168:9093. If the CPU alert is listed there, Prometheus and Alertmanager are connected.

IV. Install PrometheusAlert

1. Download and install

[root@localhost ~]# wget -P /usr/local/src .8.1/linux.zip

[root@localhost ~]# cd /usr/local/src

[root@localhost src]# unzip linux.zip

[root@localhost src]# mv linux /usr/local/prometheusAlert

[root@localhost src]# cd /usr/local/prometheusAlert/

[root@localhost prometheusAlert]# ls

conf db logs PrometheusAlert PrometheusAlertVoice static user.csv views zabbix

2. Edit the configuration file

[root@localhost prometheusAlert]# vim conf/app.conf

Default account and password for logging in to PrometheusAlert:

Add the address of the Feishu alert bot:
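The relevant lines of conf/app.conf are not reproduced in this tutorial. The sketch below lists the keys that, as far as I recall from the PrometheusAlert default app.conf, hold these settings; the key names, default values, and port should be verified against your own copy, and the token placeholder in the webhook URL is hypothetical.

```shell
# Write an illustrative fragment; the real settings live in conf/app.conf.
cat > ./app.conf.example <<'EOF'
# Login account and password for the PrometheusAlert web UI
login_user=prometheusalert
login_password=prometheusalert
# Listening port of the web UI and the webhook endpoint
httpport = 8080
# Enable the Feishu channel (0 = off, 1 = on)
open-feishu=1
# Default Feishu bot webhook; <your-token> is a placeholder
fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/<your-token>
EOF

# Show the same keys from the real file for comparison, if it exists.
if [ -f conf/app.conf ]; then
  grep -E '^(login_user|login_password|httpport|open-feishu|fsurl)' conf/app.conf
fi
```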

3. Configure the service unit:

vim /etc/systemd/system/prometheusAlert.service

[Unit]

Description=Prometheus Alerting Service

After=network.target

[Service]

ExecStart=/usr/local/prometheusAlert/PrometheusAlert

WorkingDirectory=/usr/local/prometheusAlert

Restart=always

[Install]

WantedBy=multi-user.target

systemctl daemon-reload

systemctl start prometheusAlert

systemctl enable prometheusAlert

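4. Open the web page:

The port, account, and password are the listening port, login user, and login password set in the global section of the configuration file; you can change them as needed.

5. Modify the alert template

Replace the template content above with the following (the default template shows alert times 8 hours off from the actual time, which the GetCSTtime calls below correct):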

{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}{{if eq $v.status "resolved"}}**[Prometheus recovery notice]({{$v.generatorURL}})**

*[{{$v.labels.alertname}}]({{$var}})*

Alert level: {{$v.labels.level}}

Start time: {{GetCSTtime $v.startsAt}}

End time: {{GetCSTtime $v.endsAt}}

Faulty host IP: {{$v.labels.instance}}

**{{$v.annotations.description}}**{{else}}**[Prometheus alert notice]({{$v.generatorURL}})**

*[{{$v.labels.alertname}}]({{$var}})*

Alert level: {{$v.labels.level}}

Start time: {{GetCSTtime $v.startsAt}}

End time: {{GetCSTtime $v.endsAt}}

Faulty host IP: {{$v.labels.instance}}

**{{$v.annotations.description}}**{{end}}{{ end }}

{{ $urimsg:=""}}{{ range $key,$value:=.commonLabels }}{{$urimsg = print $urimsg $key "%3D%22" $value "%22%2C" }}{{end}}[*** Click to silence this alert]({{$var}}/#/silences/new?filter=%7B{{SplitString $urimsg 0 -3}}%7D)

6. Add an alert template under the alertmanager directory:

[root@localhost ~]# cd /usr/local/alertmanager

[root@localhost alertmanager]# mkdir template

[root@localhost alertmanager]# vim template/test.tmpl

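{{/* The template name "feishu.message" is an assumption; the original excerpt omits the define/range wrapper. */}}

{{ define "feishu.message" }}{{ range $i, $alert := .Alerts }}

[Alert name]:{{ index $alert.Labels "alertname" }}

[Alert host]:{{ index $alert.Labels "instance" }}

[Alert threshold]:{{ index $alert.Annotations "value" }}

[Start time]:{{ $alert.StartsAt }}

{{ end }}{{ end }}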

7. Edit the Alertmanager configuration file and restart the alertmanager service
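The final alertmanager.yml is not shown. One way the routing described in the goal could be expressed is sketched below: alerts with level="critical" go to one receiver and everything else to the default receiver, and both receivers call PrometheusAlert's webhook endpoint. Several parts are assumptions: PrometheusAlert running on 127.0.0.1:8080, the query parameters type=fs, tpl=prometheus-fs and fsurl as I recall them from the PrometheusAlert documentation, the receiver names, the timing values, the token placeholders, and the templates entry pointing at the directory created in the previous step. The matchers syntax needs a reasonably recent Alertmanager; older releases use match instead.

```shell
# Write an example config next to the real one instead of overwriting it.
cat > ./alertmanager.yml.example <<'EOF'
global:
  resolve_timeout: 5m

# Load the template files from the directory created in the previous step.
templates:
  - "/usr/local/alertmanager/template/*.tmpl"

route:
  # Default receiver: every alert not matched by a child route below.
  receiver: feishu-normal
  group_by: ["alertname", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  routes:
    # Alerts labelled level="critical" go to the critical group.
    - matchers:
        - level="critical"
      receiver: feishu-critical

receivers:
  - name: feishu-normal
    webhook_configs:
      - url: "http://127.0.0.1:8080/prometheusalert?type=fs&tpl=prometheus-fs&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/<normal-group-token>"
        send_resolved: true
  - name: feishu-critical
    webhook_configs:
      - url: "http://127.0.0.1:8080/prometheusalert?type=fs&tpl=prometheus-fs&fsurl=https://open.feishu.cn/open-apis/bot/v2/hook/<critical-group-token>"
        send_resolved: true
EOF

# Once merged into /usr/local/alertmanager/alertmanager.yml, the file can be
# checked and the service restarted:
#   amtool check-config /usr/local/alertmanager/alertmanager.yml
#   systemctl restart alertmanager
```

Routing on the level label keeps the rule file as the single place where severity is decided; adding a third group would only need one more child route and receiver.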

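V. Verify the result:

1. Start the stress test on the monitored host so that CPU usage exceeds 90%

Prometheus page:

Alertmanager page:

PrometheusAlert page:

The alert has been sent to Feishu:

2. Shut down the monitored host to test the host-down alert

Prometheus page:

Alertmanager page:

PrometheusAlert page:

Feishu shows: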
