Alerting based on metrics

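In this tutorial we will create alerts on the ping_request_count metric that we instrumented earlier in the tutorial on instrumenting an HTTP server written in Go.

For demonstration purposes, we will fire an alert when the ping_request_count metric is greater than 5. Consult real-world best practices to learn more about alerting principles.

Download the latest release of Alertmanager for your operating system.

Alertmanager supports various receivers, such as email, webhook, pagerduty, and slack, through which it can notify you when an alert fires. The Alertmanager configuration documentation lists the receivers and explains how to configure them. We will use a webhook as the receiver in this tutorial: go to webhook.site and copy the Webhook URL, which we will use later to configure Alertmanager.

First, let's configure Alertmanager with the webhook receiver.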

alertmanager.yml

global:
  resolve_timeout: 5m
route:
  receiver: webhook_receiver
receivers:
    - name: webhook_receiver
      webhook_configs:
        - url: '<INSERT-YOUR-WEBHOOK>'
          send_resolved: false

Replace <INSERT-YOUR-WEBHOOK> in alertmanager.yml with the Webhook URL we copied earlier, then run Alertmanager with the following command.

alertmanager --config.file=alertmanager.yml

Once Alertmanager is up and running, navigate to http://localhost:9093 to access it.

Now that Alertmanager is configured with the webhook receiver, let's add the rules to the Prometheus configuration.
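Before Prometheus is involved, it can be useful to confirm that Alertmanager actually forwards notifications to the webhook. The following is a minimal sketch, not part of the original tutorial, that builds a request body for Alertmanager's POST /api/v2/alerts endpoint and sends it only when a base URL is passed on the command line. The alert name WebhookSmokeTest, the summary text, and the helper names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// postableAlert holds only the alert fields this sketch sends.
type postableAlert struct {
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// testAlertBody builds the JSON array sent to POST /api/v2/alerts.
// The alert name is a hypothetical placeholder chosen by the caller.
func testAlertBody(name string) ([]byte, error) {
	return json.Marshal([]postableAlert{{
		Labels:      map[string]string{"alertname": name},
		Annotations: map[string]string{"summary": "manual test alert"},
	}})
}

// postAlerts sends the body to the Alertmanager at baseURL,
// for example http://localhost:9093.
func postAlerts(baseURL string, body []byte) error {
	resp, err := http.Post(baseURL+"/api/v2/alerts", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	body, err := testAlertBody("WebhookSmokeTest")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(string(body))

	// Only contact Alertmanager when a base URL is given as the first argument.
	if len(os.Args) > 1 {
		if err := postAlerts(os.Args[1], body); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```

Running it as `go run . http://localhost:9093` should make the test alert appear in the Alertmanager UI. Because the route above does not set group_wait, the webhook notification is expected only after the default grouping delay of about 30 seconds.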

prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 10s
rule_files:
  - rules.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: simple_server
    static_configs:
      - targets: ["localhost:8090"]

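Notice that the evaluation_interval, rule_files, and alerting sections have been added to the Prometheus configuration. evaluation_interval defines the interval at which rules are evaluated, rule_files accepts an array of YAML files that define the rules, and the alerting section defines the Alertmanager configuration. As mentioned at the beginning of this tutorial, we will create a basic rule that fires an alert when the value of ping_request_count is greater than 5.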

rules.yml

groups:
  - name: Count greater than 5
    rules:
      - alert: CountGreaterThan5
        expr: ping_request_count > 5
        for: 10s
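To make the interaction between `for: 10s` and `evaluation_interval: 10s` concrete, here is a simplified model in Go of how a single alert moves between inactive, pending, and firing. It is a sketch under stated assumptions, not Prometheus's implementation: the function name alertStates and the input sequence are hypothetical, and features such as keep_firing_for are ignored.

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from prometheus.yml and rules.yml in this tutorial.
const (
	evalInterval = 10 * time.Second // evaluation_interval
	holdFor      = 10 * time.Second // for: 10s
)

// alertStates is a simplified, hypothetical model of one alert's lifecycle.
// condition[i] reports whether the rule expression returned a result at the
// i-th evaluation; evaluations are spaced by interval.
func alertStates(condition []bool, interval, hold time.Duration) []string {
	states := make([]string, 0, len(condition))
	active := false
	var activeSince time.Duration

	for i, ok := range condition {
		now := time.Duration(i) * interval
		if !ok {
			// The expression no longer matches: the alert goes away.
			active = false
			states = append(states, "inactive")
			continue
		}
		if !active {
			// First evaluation at which the expression matches.
			active = true
			activeSince = now
		}
		if now-activeSince >= hold {
			states = append(states, "firing")
		} else {
			states = append(states, "pending")
		}
	}
	return states
}

func main() {
	// Assumed sequence: the counter passes 5 just before the second evaluation.
	condition := []bool{false, true, true, true, false}
	for i, s := range alertStates(condition, evalInterval, holdFor) {
		fmt.Printf("t=%v state=%s\n", time.Duration(i)*evalInterval, s)
	}
}
```

In this model the alert is pending at the first matching evaluation and firing at the next one, 10 seconds later. The final inactive step is included only to show the reset: a counter such as ping_request_count normally stays above the threshold until the server restarts.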

Now run Prometheus with the following command.

prometheus --config.file=./prometheus.yml

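Open http://localhost:9090/rules in your browser to see the rules. Next, run the instrumented ping server, visit the http://localhost:8090/ping endpoint, and refresh the page at least 6 times. You can check the ping count by navigating to the http://localhost:8090/metrics endpoint. To see the status of the alert, visit http://localhost:9090/alerts. The alert first appears as PENDING; once the condition ping_request_count > 5 has held for at least 10 seconds, the state changes to FIRING. If you now return to your webhook.site URL, you will see the alert message.

Similarly, Alertmanager can be configured with other receivers to notify you when an alert fires.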
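The message delivered to the webhook is a JSON document. As a rough illustration of how a receiver might read it, the sketch below decodes a subset of the payload fields and prints one line per alert. The sample payload is abbreviated and hypothetical (its timestamp is invented), and the names webhookPayload and summarize are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// samplePayload is an abbreviated, hypothetical example of the body that
// Alertmanager posts to a webhook receiver; real payloads carry more fields.
const samplePayload = `{
  "version": "4",
  "status": "firing",
  "receiver": "webhook_receiver",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "CountGreaterThan5",
        "instance": "localhost:8090",
        "job": "simple_server"
      },
      "annotations": {},
      "startsAt": "2024-01-01T00:00:00Z"
    }
  ]
}`

// webhookPayload mirrors only the fields this sketch reads.
type webhookPayload struct {
	Status   string `json:"status"`
	Receiver string `json:"receiver"`
	Alerts   []struct {
		Status   string            `json:"status"`
		Labels   map[string]string `json:"labels"`
		StartsAt string            `json:"startsAt"`
	} `json:"alerts"`
}

// summarize decodes a webhook body and returns "<alertname>: <status>"
// for each alert it contains.
func summarize(body []byte) ([]string, error) {
	var p webhookPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil, fmt.Errorf("decode webhook payload: %w", err)
	}
	lines := make([]string, 0, len(p.Alerts))
	for _, a := range p.Alerts {
		lines = append(lines, fmt.Sprintf("%s: %s", a.Labels["alertname"], a.Status))
	}
	return lines, nil
}

func main() {
	lines, err := summarize([]byte(samplePayload))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Because send_resolved is false in alertmanager.yml, a receiver in this setup should only ever see alerts whose status is firing.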
