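Configuration

Prometheus is configured via command-line flags and a configuration file. The command-line flags configure immutable system parameters (such as storage locations and the amount of data to keep on disk and in memory), while the configuration file defines everything related to scrape jobs and their instances, as well as which rule files to load.

To view all available command-line flags, run ./prometheus -h

Prometheus can reload its configuration at runtime. If the new configuration is not well-formed, the changes will not be applied. A configuration reload is triggered by sending a SIGHUP signal to the Prometheus process or by sending an HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is enabled). This also reloads any configured rule files.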

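Configuration file

To specify which configuration file to load, use the --config.file flag.

The file is written in YAML format, defined by the scheme described below. Brackets indicate that a parameter is optional. For non-list parameters, the value is set to the specified default.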

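Generic placeholders are defined as follows:

  • <boolean>: a boolean that can take the values true or false
  • <duration>: a duration matching the regular expression ((([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?|0), e.g. 1d, 1h30m, 5m, 10s
  • <filename>: a valid path in the current working directory
  • <float>: a floating-point number
  • <host>: a valid string consisting of a hostname or IP followed by an optional port number
  • <int>: an integer value
  • <labelname>: a string matching the regular expression [a-zA-Z_][a-zA-Z0-9_]*. Any other unsupported character in the source label should be converted to an underscore. For example, the label app.kubernetes.io/name should be written as app_kubernetes_io_name
  • <labelvalue>: a string of Unicode characters
  • <path>: a valid URL path
  • <scheme>: a string that can take the values http or https
  • <secret>: a regular string that is a secret, such as a password
  • <string>: a regular string
  • <size>: a size in bytes, e.g. 512MB. A unit is required. Supported units: B, KB, MB, GB, TB, PB, EB.
  • <tmpl_string>: a string that is template-expanded before usage

The other placeholders are specified separately.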
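As a brief illustration of how the placeholders above turn into concrete values, the fragment below is a sketch only. The label value `web` and the chosen numbers are assumptions made for the example, not recommended settings.

```yaml
global:
  # <duration>: one hour and thirty minutes.
  scrape_interval: 1h30m
  # <duration>: must not be greater than scrape_interval.
  scrape_timeout: 10s
  # <size>: the unit is required.
  body_size_limit: 512MB
  external_labels:
    # <labelname>: <labelvalue>. The source label app.kubernetes.io/name
    # is written with underscores to fit the <labelname> pattern.
    app_kubernetes_io_name: web
```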

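A valid example file can be found here.

The global configuration specifies parameters that are valid in all other configuration contexts. They also serve as defaults for other configuration sections.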

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  # It cannot be greater than the scrape interval.
  [ scrape_timeout: <duration> | default = 10s ]

  # The protocols to negotiate during a scrape with the client.
  # Supported values (case sensitive): PrometheusProto, OpenMetricsText0.0.1,
  # OpenMetricsText1.0.0, PrometheusText0.0.4.
  # The default value changes to [ PrometheusProto, OpenMetricsText1.0.0, OpenMetricsText0.0.1, PrometheusText0.0.4 ]
  # when native_histogram feature flag is set.
  [ scrape_protocols: [<string>, ...] | default = [ OpenMetricsText1.0.0, OpenMetricsText0.0.1, PrometheusText0.0.4 ] ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # Offset the rule evaluation timestamp of this particular group by the
  # specified duration into the past to ensure the underlying metrics have
  # been received. Metric availability delays are more likely to occur when
  # Prometheus is running as a remote write target, but can also occur when
  # there are anomalies with scraping.
  [ rule_query_offset: <duration> | default = 0s ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  # Environment variable references `${var}` or `$var` are replaced according
  # to the values of the current environment variables.
  # References to undefined variables are replaced by the empty string.
  # The `$` character can be escaped by using `$$`.
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

  # File to which scrape failures are logged.
  # Reloading the configuration will reopen the file.
  [ scrape_failure_log_file: <string> ]

  # An uncompressed response body larger than this many bytes will cause the
  # scrape to fail. 0 means no limit. Example: 100MB.
  # This is an experimental feature, this behaviour could
  # change or be removed in the future.
  [ body_size_limit: <size> | default = 0 ]

  # Per-scrape limit on the number of scraped samples that will be accepted.
  # If more than this number of samples are present after metric relabeling
  # the entire scrape will be treated as failed. 0 means no limit.
  [ sample_limit: <int> | default = 0 ]

  # Limit on the number of labels that will be accepted per sample. If more
  # than this number of labels are present on any sample post metric-relabeling,
  # the entire scrape will be treated as failed. 0 means no limit.
  [ label_limit: <int> | default = 0 ]

  # Limit on the length (in bytes) of each individual label name. If any label
  # name in a scrape is longer than this number post metric-relabeling, the
  # entire scrape will be treated as failed. Note that label names are UTF-8
  # encoded, and characters can take up to 4 bytes. 0 means no limit.
  [ label_name_length_limit: <int> | default = 0 ]

  # Limit on the length (in bytes) of each individual label value. If any label
  # value in a scrape is longer than this number post metric-relabeling, the
  # entire scrape will be treated as failed. Note that label values are UTF-8
  # encoded, and characters can take up to 4 bytes. 0 means no limit.
  [ label_value_length_limit: <int> | default = 0 ]

  # Limit per scrape config on number of unique targets that will be
  # accepted. If more than this number of targets are present after target
  # relabeling, Prometheus will mark the targets as failed without scraping them.
  # 0 means no limit. This is an experimental feature, this behaviour could
  # change in the future.
  [ target_limit: <int> | default = 0 ]

  # Limit per scrape config on the number of targets dropped by relabeling
  # that will be kept in memory. 0 means no limit.
  [ keep_dropped_targets: <int> | default = 0 ]

  # Specifies the validation scheme for metric and label names. Either blank or
  # "utf8" for full UTF-8 support, or "legacy" for letters, numbers, colons,
  # and underscores.
  [ metric_name_validation_scheme: <string> | default = "utf8" ]

  # Specifies whether to convert all scraped classic histograms into native
  # histograms with custom buckets.
  [ convert_classic_histograms_to_nhcb: <boolean> | default = false ]

  # Specifies whether to scrape a classic histogram, even if it is also exposed as a native
  # histogram (has no effect without --enable-feature=native-histograms).
  [ always_scrape_classic_histograms: <boolean> | default = false ]


runtime:
  # Configure the Go garbage collector GOGC parameter
  # See: https://tip.golang.org/doc/gc-guide#GOGC
  # Lowering this number increases CPU usage.
  [ gogc: <int> | default = 75 ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# Scrape config files specifies a list of globs. Scrape configs are read from
# all matching files and appended to the list of scrape configs.
scrape_config_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the OTLP receiver feature.
# See https://prometheus.ac.cn/docs/guides/opentelemetry/ for best practices.
otlp:
  [ promote_resource_attributes: [<string>, ...] | default = [ ] ]
  # Configures translation of OTLP metrics when received through the OTLP metrics
  # endpoint. Available values:
  # - "UnderscoreEscapingWithSuffixes" refers to commonly agreed normalization used
  #   by OpenTelemetry in https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/translator/prometheus
  # - "NoUTF8EscapingWithSuffixes" is a mode that relies on UTF-8 support in Prometheus.
  #   It preserves all special characters like dots, but still adds required metric name suffixes
  #   for units and _total, as UnderscoreEscapingWithSuffixes does.
  # - (EXPERIMENTAL) "NoTranslation" is a mode that relies on UTF-8 support in Prometheus.
  #   It preserves all special characters like dots and won't append special suffixes for metric
  #   unit and type.
  #
  #   WARNING: The "NoTranslation" setting has significant known risks and limitations (see https://prometheus.ac.cn/docs/practices/naming/
  #   for details):
  #       * Impaired UX when using PromQL in plain YAML (e.g. alerts, rules, dashboard, autoscaling configuration).
  #       * Series collisions which in the best case may result in OOO errors, in the worst case a silently malformed
  #         time series. For instance, you may end up in situation of ingesting `foo.bar` series with unit
  #         `seconds` and a separate series `foo.bar` with unit `milliseconds`.
  [ translation_strategy: <string> | default = "UnderscoreEscapingWithSuffixes" ]
  # Enables adding "service.name", "service.namespace" and "service.instance.id"
  # resource attributes to the "target_info" metric, on top of converting
  # them into the "instance" and "job" labels.
  [ keep_identifying_resource_attributes: <boolean> | default = false ]
  # Configures optional translation of OTLP explicit bucket histograms into native histograms with custom buckets.
  [ convert_histograms_to_nhcb: <boolean> | default = false ]

# Settings related to the remote read feature.
remote_read:
  [ - <remote_read> ... ]

# Storage related settings that are runtime reloadable.
storage:
  [ tsdb: <tsdb> ]
  [ exemplars: <exemplars> ]

# Configures exporting traces.
tracing:
  [ <tracing_config> ]
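To show how the top-level sections of the schema above fit together, here is a minimal sketch of a complete configuration file. The label `cluster`, the rule-file glob, and the target addresses are hypothetical values chosen for illustration.

```yaml
global:
  scrape_interval: 30s
  scrape_timeout: 10s          # must not exceed scrape_interval
  evaluation_interval: 30s
  external_labels:
    cluster: example-cluster   # hypothetical label

# Rules and alerts are read from every file matching these globs.
rule_files:
  - "rules/*.yml"

scrape_configs:
  # Prometheus scraping its own metrics endpoint.
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
```

Sections that are left out, such as remote_write or tracing, are simply not configured.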

<scrape_config>

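A scrape_config section specifies a set of targets and the parameters describing how to scrape them. In the general case, one scrape configuration specifies a single job. In advanced configurations, this may change.

Targets may be statically configured via the static_configs parameter or dynamically discovered using one of the supported service-discovery mechanisms.

Additionally, relabel_configs allow advanced modifications to any target and its labels before scraping.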
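A small sketch of a statically configured job combined with target relabeling might look as follows. The addresses, the `env` label, and the `host` target label are assumptions made for the example.

```yaml
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["10.0.0.1:9100", "10.0.0.2:9100"]
        labels:
          env: staging          # hypothetical static label
    relabel_configs:
      # Before scraping, copy the host part of __address__ into a
      # "host" label. The regex must match the whole joined value.
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: host
        replacement: '$1'
```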

# The job name assigned to scraped metrics by default.
job_name: <job_name>

# How frequently to scrape targets from this job.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

# Per-scrape timeout when scraping this job.
# It cannot be greater than the scrape interval.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]

# The protocols to negotiate during a scrape with the client.
# Supported values (case sensitive): PrometheusProto, OpenMetricsText0.0.1,
# OpenMetricsText1.0.0, PrometheusText0.0.4, PrometheusText1.0.0.
[ scrape_protocols: [<string>, ...] | default = <global_config.scrape_protocols> ]

# Fallback protocol to use if a scrape returns blank, unparseable, or otherwise
# invalid Content-Type.
# Supported values (case sensitive): PrometheusProto, OpenMetricsText0.0.1,
# OpenMetricsText1.0.0, PrometheusText0.0.4, PrometheusText1.0.0.
[ fallback_scrape_protocol: <string> ]

# Whether to scrape a classic histogram, even if it is also exposed as a native
# histogram (has no effect without --enable-feature=native-histograms).
[ always_scrape_classic_histograms: <boolean> | default = <global.always_scrape_classic_histograms> ]

# The HTTP resource path on which to fetch metrics from targets.
[ metrics_path: <path> | default = /metrics ]

# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels.
#
# Setting honor_labels to "true" is useful for use cases such as federation and
# scraping the Pushgateway, where all labels specified in the target should be
# preserved.
#
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.
[ honor_labels: <boolean> | default = false ]

# honor_timestamps controls whether Prometheus respects the timestamps present
# in scraped data.
#
# If honor_timestamps is set to "true", the timestamps of the metrics exposed
# by the target will be used.
#
# If honor_timestamps is set to "false", the timestamps of the metrics exposed
# by the target will be ignored.
[ honor_timestamps: <boolean> | default = true ]

# track_timestamps_staleness controls whether Prometheus tracks staleness of
# the metrics that have an explicit timestamps present in scraped data.
#
# If track_timestamps_staleness is set to "true", a staleness marker will be
# inserted in the TSDB when a metric is no longer present or the target
# is down.
[ track_timestamps_staleness: <boolean> | default = false ]

# Configures the protocol scheme used for requests.
[ scheme: <scheme> | default = http ]

# Optional HTTP URL parameters.
params:
  [ <string>: [<string>, ...] ]

# If enable_compression is set to "false", Prometheus will request uncompressed
# response from the scraped target.
[ enable_compression: <boolean> | default = true ]

# File to which scrape failures are logged.
# Reloading the configuration will reopen the file.
[ scrape_failure_log_file: <string> ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]

# List of DigitalOcean service discovery configurations.
digitalocean_sd_configs:
  [ - <digitalocean_sd_config> ... ]

# List of Docker service discovery configurations.
docker_sd_configs:
  [ - <docker_sd_config> ... ]

# List of Docker Swarm service discovery configurations.
dockerswarm_sd_configs:
  [ - <dockerswarm_sd_config> ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]

# List of Eureka service discovery configurations.
eureka_sd_configs:
  [ - <eureka_sd_config> ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]

# List of Hetzner service discovery configurations.
hetzner_sd_configs:
  [ - <hetzner_sd_config> ... ]

# List of HTTP service discovery configurations.
http_sd_configs:
  [ - <http_sd_config> ... ]


# List of IONOS service discovery configurations.
ionos_sd_configs:
  [ - <ionos_sd_config> ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

# List of Kuma service discovery configurations.
kuma_sd_configs:
  [ - <kuma_sd_config> ... ]

# List of Lightsail service discovery configurations.
lightsail_sd_configs:
  [ - <lightsail_sd_config> ... ]

# List of Linode service discovery configurations.
linode_sd_configs:
  [ - <linode_sd_config> ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]

# List of Nomad service discovery configurations.
nomad_sd_configs:
  [ - <nomad_sd_config> ... ]

# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]

# List of OVHcloud service discovery configurations.
ovhcloud_sd_configs:
  [ - <ovhcloud_sd_config> ... ]

# List of PuppetDB service discovery configurations.
puppetdb_sd_configs:
  [ - <puppetdb_sd_config> ... ]

# List of Scaleway service discovery configurations.
scaleway_sd_configs:
  [ - <scaleway_sd_config> ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]

# List of Uyuni service discovery configurations.
uyuni_sd_configs:
  [ - <uyuni_sd_config> ... ]

# List of labeled statically configured targets for this job.
static_configs:
  [ - <static_config> ... ]

# List of target relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]

# List of metric relabel configurations.
metric_relabel_configs:
  [ - <relabel_config> ... ]

# An uncompressed response body larger than this many bytes will cause the
# scrape to fail. 0 means no limit. Example: 100MB.
# This is an experimental feature, this behaviour could
# change or be removed in the future.
[ body_size_limit: <size> | default = 0 ]

# Per-scrape limit on the number of scraped samples that will be accepted.
# If more than this number of samples are present after metric relabeling
# the entire scrape will be treated as failed. 0 means no limit.
[ sample_limit: <int> | default = 0 ]

# Limit on the number of labels that will be accepted per sample. If more
# than this number of labels are present on any sample post metric-relabeling,
# the entire scrape will be treated as failed. 0 means no limit.
[ label_limit: <int> | default = 0 ]

# Limit on the length (in bytes) of each individual label name. If any label
# name in a scrape is longer than this number post metric-relabeling, the
# entire scrape will be treated as failed. Note that label names are UTF-8
# encoded, and characters can take up to 4 bytes. 0 means no limit.
[ label_name_length_limit: <int> | default = 0 ]

# Limit on the length (in bytes) of each individual label value. If any label
# value in a scrape is longer than this number post metric-relabeling, the
# entire scrape will be treated as failed. Note that label values are UTF-8
# encoded, and characters can take up to 4 bytes. 0 means no limit.
[ label_value_length_limit: <int> | default = 0 ]

# Limit per scrape config on number of unique targets that will be
# accepted. If more than this number of targets are present after target
# relabeling, Prometheus will mark the targets as failed without scraping them.
# 0 means no limit. This is an experimental feature, this behaviour could
# change in the future.
[ target_limit: <int> | default = 0 ]

# Limit per scrape config on the number of targets dropped by relabeling
# that will be kept in memory. 0 means no limit.
[ keep_dropped_targets: <int> | default = 0 ]

# Specifies the validation scheme for metric and label names. Either blank or
# "utf8" for full UTF-8 support, or "legacy" for letters, numbers, colons, and
# underscores.
[ metric_name_validation_scheme: <string> | default = "utf8" ]

# Specifies the character escaping scheme that will be requested when scraping
# for metric and label names that do not conform to the legacy Prometheus
# character set. Available options are:
#   * `allow-utf-8`: Full UTF-8 support, no escaping needed.
#   * `underscores`: Escape all legacy-invalid characters to underscores.
#   * `dots`: Escapes dots to `_dot_`, underscores to `__`, and all other
#     legacy-invalid characters to underscores.
#   * `values`: Prepend the name with `U__` and replace all invalid
#     characters with their unicode value, surrounded by underscores. Single
#     underscores are replaced with double underscores.
#     e.g. "U__my_2e_dotted_2e_name".
# If this value is left blank, Prometheus will default to `allow-utf-8` if the
# validation scheme for the current scrape config is set to utf8, or
# `underscores` if the validation scheme is set to `legacy`.
[ metric_name_escaping_scheme: <string> | default = "allow-utf-8" ]

# Limit on total number of positive and negative buckets allowed in a single
# native histogram. The resolution of a histogram with more buckets will be
# reduced until the number of buckets is within the limit. If the limit cannot
# be reached, the scrape will fail.
# 0 means no limit.
[ native_histogram_bucket_limit: <int> | default = 0 ]

# Lower limit for the growth factor of one bucket to the next in each native
# histogram. The resolution of a histogram with a lower growth factor will be
# reduced as much as possible until it is within the limit.
# To set an upper limit for the schema (equivalent to "scale" in OTel's
# exponential histograms), use the following factor limits:
#
# +----------------------------+----------------------------+
# |        growth factor       | resulting schema AKA scale |
# +----------------------------+----------------------------+
# |          65536             |             -4             |
# +----------------------------+----------------------------+
# |            256             |             -3             |
# +----------------------------+----------------------------+
# |             16             |             -2             |
# +----------------------------+----------------------------+
# |              4             |             -1             |
# +----------------------------+----------------------------+
# |              2             |              0             |
# +----------------------------+----------------------------+
# |              1.4           |              1             |
# +----------------------------+----------------------------+
# |              1.1           |              2             |
# +----------------------------+----------------------------+
# |              1.09          |              3             |
# +----------------------------+----------------------------+
# |              1.04          |              4             |
# +----------------------------+----------------------------+
# |              1.02          |              5             |
# +----------------------------+----------------------------+
# |              1.01          |              6             |
# +----------------------------+----------------------------+
# |              1.005         |              7             |
# +----------------------------+----------------------------+
# |              1.002         |              8             |
# +----------------------------+----------------------------+
#
# 0 results in the smallest supported factor (which is currently ~1.0027 or
# schema 8, but might change in the future).
[ native_histogram_min_bucket_factor: <float> | default = 0 ]

# Specifies whether to convert classic histograms into native histograms with
# custom buckets (has no effect without --enable-feature=native-histograms).
[ convert_classic_histograms_to_nhcb: <boolean> | default = <global.convert_classic_histograms_to_nhcb> ]

Where <job_name> must be unique across all scrape configurations.
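The honor_labels comments in the schema above mention federation and the Pushgateway as typical use cases. The following sketch shows both, together with a per-job sample limit. Hostnames, the limit value, and the `match[]` selector are assumptions for illustration.

```yaml
scrape_configs:
  # Pushgateway: keep the job/instance labels that clients pushed.
  - job_name: pushgateway
    scrape_interval: 15s
    scrape_timeout: 10s
    honor_labels: true
    sample_limit: 5000          # scrape fails if more samples remain after metric relabeling
    static_configs:
      - targets: ["pushgateway.example.internal:9091"]

  # Federation: pull selected series from another Prometheus server.
  - job_name: federate
    honor_labels: true
    metrics_path: /federate
    params:
      "match[]":
        - '{job="node"}'
    static_configs:
      - targets: ["source-prometheus.example.internal:9090"]
```

Each job_name is distinct, as required.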

<http_config>

An http_config allows configuring HTTP requests.

# Sets the `Authorization` header on every request with the
# configured username and password.
# username and username_file are mutually exclusive.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ username_file: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every request with
# the configured credentials.
authorization:
  # Sets the authentication type of the request.
  [ type: <string> | default = Bearer ]
  # Sets the credentials of the request. It is mutually exclusive with
  # `credentials_file`.
  [ credentials: <secret> ]
  # Sets the credentials of the request with the credentials read from the
  # configured file. It is mutually exclusive with `credentials`.
  [ credentials_file: <filename> ]

# Optional OAuth 2.0 configuration.
# Cannot be used at the same time as basic_auth or authorization.
oauth2:
  [ <oauth2> ]

# Configure whether requests follow HTTP 3xx redirects.
[ follow_redirects: <boolean> | default = true ]

# Whether to enable HTTP2.
[ enable_http2: <boolean> | default = true ]

# Configures the request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]
# Comma-separated string that can contain IPs, CIDR notation, domain names
# that should be excluded from proxying. IP and domain names can
# contain port numbers.
[ no_proxy: <string> ]
# Use proxy URL indicated by environment variables (HTTP_PROXY, http_proxy, HTTPS_PROXY, https_proxy, NO_PROXY, and no_proxy)
[ proxy_from_environment: <boolean> | default = false ]
# Specifies headers to send to proxies during CONNECT requests.
[ proxy_connect_header:
  [ <string>: [<secret>, ...] ] ]

# Custom HTTP headers to be sent along with each request.
# Headers that are set by Prometheus itself can't be overwritten.
http_headers:
  # Header name.
  [ <string>:
    # Header values.
    [ values: [<string>, ...] ]
    # Header values. Hidden in configuration page.
    [ secrets: [<secret>, ...] ]
    # Files to read header values from.
    [ files: [<string>, ...] ] ]
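Because http_config fields are embedded directly in the section that uses them, they appear at the same level as the other scrape settings. The sketch below assumes a hypothetical token file path, header name, and target.

```yaml
scrape_configs:
  - job_name: secured-app
    scheme: https
    # Sends "Authorization: Bearer <token read from file>".
    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/secrets/app-token
    follow_redirects: false
    # Picks up proxy settings from the process environment.
    proxy_from_environment: true
    http_headers:
      X-Example-Tenant:           # hypothetical custom header
        values: ["team-a"]
    static_configs:
      - targets: ["app.example.internal:8443"]
```

Here authorization is used on its own; oauth2 cannot be combined with basic_auth or authorization.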

<tls_config>

A tls_config allows configuring TLS connections.

# CA certificate to validate API server certificate with. At most one of ca and ca_file is allowed.
[ ca: <string> ]
[ ca_file: <filename> ]

# Certificate and key for client cert authentication to the server.
# At most one of cert and cert_file is allowed.
# At most one of key and key_file is allowed.
[ cert: <string> ]
[ cert_file: <filename> ]
[ key: <secret> ]
[ key_file: <filename> ]

# ServerName extension to indicate the name of the server.
# https://tools.ietf.org/html/rfc4366#section-3.1
[ server_name: <string> ]

# Disable validation of the server certificate.
[ insecure_skip_verify: <boolean> ]

# Minimum acceptable TLS version. Accepted values: TLS10 (TLS 1.0), TLS11 (TLS
# 1.1), TLS12 (TLS 1.2), TLS13 (TLS 1.3).
# If unset, Prometheus will use Go default minimum version, which is TLS 1.2.
# See MinVersion in https://pkg.go.dev/crypto/tls#Config.
[ min_version: <string> ]
# Maximum acceptable TLS version. Accepted values: TLS10 (TLS 1.0), TLS11 (TLS
# 1.1), TLS12 (TLS 1.2), TLS13 (TLS 1.3).
# If unset, Prometheus will use Go default maximum version, which is TLS 1.3.
# See MaxVersion in https://pkg.go.dev/crypto/tls#Config.
[ max_version: <string> ]
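As a sketch of client-certificate (mutual TLS) scraping with the options above, the following fragment assumes hypothetical certificate paths, server name, and target address.

```yaml
scrape_configs:
  - job_name: mtls-app
    scheme: https
    tls_config:
      # CA used to validate the server certificate.
      ca_file: /etc/prometheus/tls/ca.crt
      # Client certificate and key presented to the server.
      cert_file: /etc/prometheus/tls/client.crt
      key_file: /etc/prometheus/tls/client.key
      # Name expected in the server certificate when scraping by IP.
      server_name: app.example.internal
      min_version: TLS12
    static_configs:
      - targets: ["10.0.0.5:8443"]
```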

<oauth2>

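OAuth 2.0 authentication using the client credentials or password grant type. Prometheus fetches an access token from the specified endpoint with the given client access and secret keys.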

client_id: <string>
[ client_secret: <secret> ]

# Read the client secret from a file.
# It is mutually exclusive with `client_secret`.
[ client_secret_file: <filename> ]

# Scopes for the token request.
scopes:
  [ - <string> ... ]

# The URL to fetch the token from.
token_url: <string>

# Optional parameters to append to the token URL.
# To set 'password' grant type, add it to params:
# endpoint_params:
#   grant_type: 'password'
#   username: '[email protected]'
#   password: 'strongpassword'
endpoint_params:
  [ <string>: <string> ... ]

# Configures the token request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]
# Comma-separated string that can contain IPs, CIDR notation, domain names
# that should be excluded from proxying. IP and domain names can
# contain port numbers.
[ no_proxy: <string> ]
# Use proxy URL indicated by environment variables (HTTP_PROXY, http_proxy, HTTPS_PROXY, https_proxy, NO_PROXY, and no_proxy)
[ proxy_from_environment: <boolean> | default = false ]
# Specifies headers to send to proxies during CONNECT requests.
[ proxy_connect_header:
  [ <string>: [<secret>, ...] ] ]

# Custom HTTP headers to be sent along with each request.
# Headers that are set by Prometheus itself can't be overwritten.
http_headers:
  # Header name.
  [ <string>:
    # Header values.
    [ values: [<string>, ...] ]
    # Header values. Hidden in configuration page.
    [ secrets: [<secret>, ...] ]
    # Files to read header values from.
    [ files: [<string>, ...] ] ]
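A minimal sketch of a scrape job that authenticates with the client credentials grant could look like this. The client ID, file path, token URL, scope, and the `audience` parameter are all assumptions for illustration; the values your identity provider expects will differ.

```yaml
scrape_configs:
  - job_name: oauth-app
    scheme: https
    oauth2:
      client_id: prometheus
      # Read the secret from a file instead of inlining client_secret.
      client_secret_file: /etc/prometheus/secrets/oauth-client-secret
      token_url: https://auth.example.internal/oauth2/token
      scopes:
        - metrics.read
      endpoint_params:
        audience: app.example.internal
    static_configs:
      - targets: ["app.example.internal:8443"]
```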

<azure_sd_config>

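Azure SD configurations allow retrieving scrape targets from Azure VMs.

The discovery requires at least the following permissions:

  • Microsoft.Compute/virtualMachines/read: required for VM discovery
  • Microsoft.Network/networkInterfaces/read: required for VM discovery
  • Microsoft.Compute/virtualMachineScaleSets/virtualMachines/read: required for scale set (VMSS) discovery
  • Microsoft.Compute/virtualMachineScaleSets/virtualMachines/networkInterfaces/read: required for scale set (VMSS) discovery

The following meta labels are available on targets during relabeling:

  • __meta_azure_machine_id: the machine ID
  • __meta_azure_machine_location: the location the machine runs in
  • __meta_azure_machine_name: the machine name
  • __meta_azure_machine_computer_name: the machine computer name
  • __meta_azure_machine_os_type: the machine operating system
  • __meta_azure_machine_private_ip: the machine's private IP
  • __meta_azure_machine_public_ip: the machine's public IP, if it exists
  • __meta_azure_machine_resource_group: the machine's resource group
  • __meta_azure_machine_tag_<tagname>: each tag value of the machine
  • __meta_azure_machine_scale_set: the name of the scale set the VM is part of (this value is only set if you are using a scale set)
  • __meta_azure_machine_size: the machine size
  • __meta_azure_subscription_id: the subscription ID
  • __meta_azure_tenant_id: the tenant ID

See below for the configuration options for Azure discovery: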

# The information to access the Azure API.
# The Azure environment.
[ environment: <string> | default = AzurePublicCloud ]

# The authentication method, either OAuth, ManagedIdentity or SDK.
# See https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview
# SDK authentication method uses environment variables by default.
# See https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication
[ authentication_method: <string> | default = OAuth ]
# The subscription ID. Always required.
subscription_id: <string>
# Optional tenant ID. Only required with authentication_method OAuth.
[ tenant_id: <string> ]
# Optional client ID. Only required with authentication_method OAuth.
[ client_id: <string> ]
# Optional client secret. Only required with authentication_method OAuth.
[ client_secret: <secret> ]

# Optional resource group name. Limits discovery to this resource group.
[ resource_group: <string> ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 300s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
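The options and meta labels above can be combined as in the following sketch. The subscription ID is a placeholder, and the resource group name and the `monitoring` tag are hypothetical.

```yaml
scrape_configs:
  - job_name: azure-vms
    azure_sd_configs:
      - subscription_id: "00000000-0000-0000-0000-000000000000"  # placeholder
        # With ManagedIdentity, tenant_id/client_id/client_secret are not required.
        authentication_method: ManagedIdentity
        resource_group: monitoring-rg
        refresh_interval: 300s
        port: 9100
    relabel_configs:
      # Keep only VMs whose (hypothetical) "monitoring" tag is "enabled".
      - source_labels: [__meta_azure_machine_tag_monitoring]
        regex: enabled
        action: keep
      # Use the machine name as the instance label.
      - source_labels: [__meta_azure_machine_name]
        target_label: instance
```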

<consul_sd_config>

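Consul SD configurations allow retrieving scrape targets from Consul's Catalog API.

The following meta labels are available on targets during relabeling:

  • __meta_consul_address: the address of the target
  • __meta_consul_dc: the datacenter name for the target
  • __meta_consul_health: the health status of the service
  • __meta_consul_partition: the admin partition name where the service is registered
  • __meta_consul_metadata_<key>: each node metadata key value of the target
  • __meta_consul_node: the node name defined for the target
  • __meta_consul_service_address: the service address of the target
  • __meta_consul_service_id: the service ID of the target
  • __meta_consul_service_metadata_<key>: each service metadata key value of the target
  • __meta_consul_service_port: the service port of the target
  • __meta_consul_service: the name of the service the target belongs to
  • __meta_consul_tagged_address_<key>: each node tagged address key value of the target
  • __meta_consul_tags: the list of tags of the target, joined by the tag separator
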
# The information to access the Consul API. It is to be defined
# as the Consul documentation requires.
[ server: <host> | default = "localhost:8500" ]
# Prefix for URIs for when consul is behind an API gateway (reverse proxy).
[ path_prefix: <string> ]
[ token: <secret> ]
[ datacenter: <string> ]
# Namespaces are only supported in Consul Enterprise.
[ namespace: <string> ]
# Admin Partitions are only supported in Consul Enterprise.
[ partition: <string> ]
[ scheme: <string> | default = "http" ]
# The username and password fields are deprecated in favor of the basic_auth configuration.
[ username: <string> ]
[ password: <secret> ]

# A list of services for which targets are retrieved. If omitted, all services
# are scraped.
services:
  [ - <string> ]

# A Consul Filter expression used to filter the catalog results
# See https://www.consul.io/api-docs/catalog#list-services to know more
# about the filter expressions that can be used.
[ filter: <string> ]

# The `tags` and `node_meta` fields are deprecated in Consul in favor of `filter`.
# An optional list of tags used to filter nodes for a given service. Services must contain all tags in the list.
tags:
  [ - <string> ]

# Node metadata key/value pairs to filter nodes for a given service. As of Consul 1.14, consider `filter` instead.
[ node_meta:
  [ <string>: <string> ... ] ]

# The string by which Consul tags are joined into the tag label.
[ tag_separator: <string> | default = , ]

# Allow stale Consul results (see https://www.consul.io/api/features/consistency.html). Will reduce load on Consul.
[ allow_stale: <boolean> | default = true ]

# The time after which the provided names are refreshed.
# On large setup it might be a good idea to increase this value because the catalog will change all the time.
[ refresh_interval: <duration> | default = 30s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

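Note that the IP address and port used to scrape the targets are assembled as <__meta_consul_address>:<__meta_consul_service_port>. However, in some Consul setups, the relevant address is in __meta_consul_service_address. In those cases, you can use the relabel feature to replace the special __address__ label.

The relabeling phase is the preferred and more powerful way to filter services, or nodes for a service, based on arbitrary labels. For users with thousands of services it can be more efficient to use the Consul API directly, which has basic support for filtering nodes (currently by node metadata and a single tag).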
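
The address replacement described above can be sketched as follows. This is a minimal, illustrative fragment rather than a recommended setup: the job name and the service name are hypothetical, and the server value is simply the documented default.

```yaml
scrape_configs:
  - job_name: consul-services            # hypothetical job name
    consul_sd_configs:
      - server: "localhost:8500"         # documented default
        services: ["web"]                # hypothetical service name
    relabel_configs:
      # Rebuild __address__ from the service address and the service port.
      # If the service address is empty the regex does not match and the
      # default <__meta_consul_address>:<__meta_consul_service_port> is kept.
      - source_labels: [__meta_consul_service_address, __meta_consul_service_port]
        separator: ";"
        regex: "(.+);(.+)"
        target_label: __address__
        replacement: "${1}:${2}"
```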

<digitalocean_sd_config>

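DigitalOcean SD configurations allow retrieving scrape targets from DigitalOcean's Droplets API. This service discovery uses the public IPv4 address by default, but that can be changed with relabeling, as demonstrated in the Prometheus digitalocean-sd configuration file.

The following meta labels are available on targets during relabeling:

  • __meta_digitalocean_droplet_id: the ID of the droplet
  • __meta_digitalocean_droplet_name: the name of the droplet
  • __meta_digitalocean_image: the slug of the droplet's image
  • __meta_digitalocean_image_name: the display name of the droplet's image
  • __meta_digitalocean_private_ipv4: the private IPv4 of the droplet
  • __meta_digitalocean_public_ipv4: the public IPv4 of the droplet
  • __meta_digitalocean_public_ipv6: the public IPv6 of the droplet
  • __meta_digitalocean_region: the region of the droplet
  • __meta_digitalocean_size: the size of the droplet
  • __meta_digitalocean_status: the status of the droplet
  • __meta_digitalocean_features: the comma-separated list of features of the droplet
  • __meta_digitalocean_tags: the comma-separated list of tags of the droplet
  • __meta_digitalocean_vpc: the ID of the droplet's VPC

# The port to scrape metrics from.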
[ port: <int> | default = 80 ]

# The time after which the droplets are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
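
As an illustration of changing the default address through relabeling, the sketch below scrapes droplets over their private IPv4 address. The job name, the token placeholder, and port 9100 are assumptions made for the example; authorization is one of the generic <http_config> options.

```yaml
scrape_configs:
  - job_name: digitalocean-droplets      # hypothetical job name
    digitalocean_sd_configs:
      - authorization:
          credentials: "<digitalocean-api-token>"   # placeholder, not a real token
        port: 9100                       # assumed exporter port
    relabel_configs:
      # Use the private IPv4 address instead of the default public one.
      - source_labels: [__meta_digitalocean_private_ipv4]
        regex: "(.+)"
        target_label: __address__
        replacement: "${1}:9100"
```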

<docker_sd_config>

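Docker SD configurations allow retrieving scrape targets from Docker Engine hosts.

This SD discovers "containers" and creates a target for each network IP and port the container is configured to expose.

Available meta labels:

  • __meta_docker_container_id: the ID of the container
  • __meta_docker_container_name: the name of the container
  • __meta_docker_container_network_mode: the network mode of the container
  • __meta_docker_container_label_<labelname>: each label of the container, with any unsupported characters converted to an underscore
  • __meta_docker_network_id: the ID of the network
  • __meta_docker_network_name: the name of the network
  • __meta_docker_network_ingress: whether the network is ingress
  • __meta_docker_network_internal: whether the network is internal
  • __meta_docker_network_label_<labelname>: each label of the network, with any unsupported characters converted to an underscore
  • __meta_docker_network_scope: the scope of the network
  • __meta_docker_network_ip: the IP of the container in this network
  • __meta_docker_port_private: the port on the container
  • __meta_docker_port_public: the external port, if a port mapping exists
  • __meta_docker_port_public_ip: the public IP, if a port mapping exists

See below for the configuration options for Docker discovery: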

# Address of the Docker daemon.
host: <string>

# The port to scrape metrics from, when `role` is nodes, and for discovered
# tasks and services that don't have published ports.
[ port: <int> | default = 80 ]

# The host to use if the container is in host networking mode.
[ host_networking_host: <string> | default = "localhost" ]

# Sort all non-nil networks in ascending order based on network name and
# get the first network if the container has multiple networks defined,
# thus avoiding collecting duplicate targets.
[ match_first_network: <boolean> | default = true ]

# Optional filters to limit the discovery process to a subset of available
# resources.
# The available filters are listed in the upstream documentation:
# https://docs.docker.net.cn/engine/api/v1.40/#operation/ContainerList
[ filters:
  [ - name: <string>
      values: <string>, [...] ]

# The time after which the containers are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

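The relabeling phase is the preferred and more powerful way to filter containers. For users with thousands of containers it can be more efficient to use the Docker API directly, which has basic support for filtering containers (using filters).

See this example Prometheus configuration file for a detailed example of configuring Prometheus for Docker Engine.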
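
A minimal sketch of relabel-based container filtering is shown here. It assumes the Docker daemon is reachable on the local Unix socket and that containers to be scraped carry a hypothetical prometheus-job label; neither assumption comes from the text above.

```yaml
scrape_configs:
  - job_name: docker-containers          # hypothetical job name
    docker_sd_configs:
      - host: unix:///var/run/docker.sock   # assumed local daemon socket
    relabel_configs:
      # Keep only containers that carry a "prometheus-job" label
      # ("-" is converted to "_" in the meta label name).
      - source_labels: [__meta_docker_container_label_prometheus_job]
        regex: ".+"
        action: keep
      # Use the value of that label as the job name.
      - source_labels: [__meta_docker_container_label_prometheus_job]
        target_label: job
```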

<dockerswarm_sd_config>

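Docker Swarm SD configurations allow retrieving scrape targets from Docker Swarm engine.

One of the following roles can be configured to discover targets:

services

The services role discovers all Swarm services and exposes their ports as targets. For each published port of a service, a single target is generated. If a service has no published ports, a target per service is created using the port parameter defined in the SD configuration.

Available meta labels:

  • __meta_dockerswarm_service_id: the ID of the service
  • __meta_dockerswarm_service_name: the name of the service
  • __meta_dockerswarm_service_mode: the mode of the service
  • __meta_dockerswarm_service_endpoint_port_name: the name of the endpoint port, if available
  • __meta_dockerswarm_service_endpoint_port_publish_mode: the publish mode of the endpoint port
  • __meta_dockerswarm_service_label_<labelname>: each label of the service, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_service_task_container_hostname: the container hostname of the target, if available
  • __meta_dockerswarm_service_task_container_image: the container image of the target
  • __meta_dockerswarm_service_updating_status: the status of the service, if available
  • __meta_dockerswarm_network_id: the ID of the network
  • __meta_dockerswarm_network_name: the name of the network
  • __meta_dockerswarm_network_ingress: whether the network is ingress
  • __meta_dockerswarm_network_internal: whether the network is internal
  • __meta_dockerswarm_network_label_<labelname>: each label of the network, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_network_scope: the scope of the network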

tasks

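The tasks role discovers all Swarm tasks and exposes their ports as targets. For each published port of a task, a single target is generated. If a task has no published ports, a target per task is created using the port parameter defined in the SD configuration.

Available meta labels:

  • __meta_dockerswarm_container_label_<labelname>: each label of the container, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_task_id: the ID of the task
  • __meta_dockerswarm_task_container_id: the container ID of the task
  • __meta_dockerswarm_task_desired_state: the desired state of the task
  • __meta_dockerswarm_task_slot: the slot of the task
  • __meta_dockerswarm_task_state: the state of the task
  • __meta_dockerswarm_task_port_publish_mode: the publish mode of the task port
  • __meta_dockerswarm_service_id: the ID of the service
  • __meta_dockerswarm_service_name: the name of the service
  • __meta_dockerswarm_service_mode: the mode of the service
  • __meta_dockerswarm_service_label_<labelname>: each label of the service, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_network_id: the ID of the network
  • __meta_dockerswarm_network_name: the name of the network
  • __meta_dockerswarm_network_ingress: whether the network is ingress
  • __meta_dockerswarm_network_internal: whether the network is internal
  • __meta_dockerswarm_network_label_<labelname>: each label of the network, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_network_label: each label of the network, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_network_scope: the scope of the network
  • __meta_dockerswarm_node_id: the ID of the node
  • __meta_dockerswarm_node_hostname: the hostname of the node
  • __meta_dockerswarm_node_address: the address of the node
  • __meta_dockerswarm_node_availability: the availability of the node
  • __meta_dockerswarm_node_label_<labelname>: each label of the node, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_node_platform_architecture: the architecture of the node
  • __meta_dockerswarm_node_platform_os: the operating system of the node
  • __meta_dockerswarm_node_role: the role of the node
  • __meta_dockerswarm_node_status: the status of the node

The __meta_dockerswarm_network_* meta labels are not populated for ports which are published with mode=host.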

nodes

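The nodes role is used to discover Swarm nodes.

Available meta labels:

  • __meta_dockerswarm_node_address: the address of the node
  • __meta_dockerswarm_node_availability: the availability of the node
  • __meta_dockerswarm_node_engine_version: the version of the node engine
  • __meta_dockerswarm_node_hostname: the hostname of the node
  • __meta_dockerswarm_node_id: the ID of the node
  • __meta_dockerswarm_node_label_<labelname>: each label of the node, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_node_manager_address: the address of the manager component of the node
  • __meta_dockerswarm_node_manager_leader: the leadership status of the manager component of the node (true or false)
  • __meta_dockerswarm_node_manager_reachability: the reachability of the manager component of the node
  • __meta_dockerswarm_node_platform_architecture: the architecture of the node
  • __meta_dockerswarm_node_platform_os: the operating system of the node
  • __meta_dockerswarm_node_role: the role of the node
  • __meta_dockerswarm_node_status: the status of the node

See below for the configuration options for Docker Swarm discovery: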

# Address of the Docker daemon.
host: <string>

# Role of the targets to retrieve. Must be `services`, `tasks`, or `nodes`.
role: <string>

# The port to scrape metrics from, when `role` is nodes, and for discovered
# tasks and services that don't have published ports.
[ port: <int> | default = 80 ]

# Optional filters to limit the discovery process to a subset of available
# resources.
# The available filters are listed in the upstream documentation:
# Services: https://docs.docker.net.cn/engine/api/v1.40/#operation/ServiceList
# Tasks: https://docs.docker.net.cn/engine/api/v1.40/#operation/TaskList
# Nodes: https://docs.docker.net.cn/engine/api/v1.40/#operation/NodeList
[ filters:
  [ - name: <string>
      values: <string>, [...] ]

# The time after which the service discovery data is refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

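The relabeling phase is the preferred and more powerful way to filter tasks, services or nodes. For users with thousands of tasks it can be more efficient to use the Swarm API directly, which has basic support for filtering nodes (using filters).

See this example Prometheus configuration file for a detailed example of configuring Prometheus for Docker Swarm.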
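
The following sketch illustrates the tasks role combined with relabeling. The socket path and job name are assumptions for the example; the meta labels are the ones listed for the tasks role above.

```yaml
scrape_configs:
  - job_name: dockerswarm-tasks          # hypothetical job name
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock   # assumed local daemon socket
        role: tasks
    relabel_configs:
      # Keep only tasks that are supposed to be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
      # Use the Swarm service name as the job label.
      - source_labels: [__meta_dockerswarm_service_name]
        target_label: job
```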

<dns_sd_config>

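A DNS-based service discovery configuration allows specifying a set of DNS domain names which are periodically queried to discover a list of targets. The DNS servers to be contacted are read from /etc/resolv.conf.

This service discovery method only supports basic DNS A, AAAA, MX, NS and SRV record queries, but not the advanced DNS-SD approach specified in RFC6763.

The following meta labels are available on targets during relabeling:

  • __meta_dns_name: the record name that produced the discovered target
  • __meta_dns_srv_record_target: the target field of the SRV record
  • __meta_dns_srv_record_port: the port field of the SRV record
  • __meta_dns_mx_record_target: the target field of the MX record
  • __meta_dns_ns_record_target: the target field of the NS record

# A list of DNS domain names to be queried.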
names:
  [ - <string> ]

# The type of DNS query to perform. One of SRV, A, AAAA, MX or NS.
[ type: <string> | default = 'SRV' ]

# The port number used if the query type is not SRV.
[ port: <int>]

# The time after which the provided names are refreshed.
[ refresh_interval: <duration> | default = 30s ]
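
A short sketch of the two common query styles follows. All domain names and the port are hypothetical. The first entry relies on the default SRV type, where the port comes from the record itself; the second uses A records and therefore sets port explicitly.

```yaml
scrape_configs:
  - job_name: dns-discovered             # hypothetical job name
    dns_sd_configs:
      # SRV lookup (default type): host and port come from the SRV records.
      - names:
          - _metrics._tcp.example.internal
      # A lookup: records carry no port, so one has to be configured.
      - names:
          - app.example.internal
        type: A
        port: 9100
```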

<ec2_sd_config>

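EC2 SD configurations allow retrieving scrape targets from AWS EC2 instances. The private IP address is used by default, but may be changed to the public IP address with relabeling.

The IAM credentials used must have the ec2:DescribeInstances permission to discover scrape targets, and may optionally have the ec2:DescribeAvailabilityZones permission if you want the availability zone ID available as a label (see below).

The following meta labels are available on targets during relabeling:

  • __meta_ec2_ami: the EC2 Amazon Machine Image
  • __meta_ec2_architecture: the architecture of the instance
  • __meta_ec2_availability_zone: the availability zone in which the instance is running
  • __meta_ec2_availability_zone_id: the availability zone ID in which the instance is running (requires ec2:DescribeAvailabilityZones)
  • __meta_ec2_instance_id: the EC2 instance ID
  • __meta_ec2_instance_lifecycle: the lifecycle of the EC2 instance, set only for "spot" or "scheduled" instances, absent otherwise
  • __meta_ec2_instance_state: the state of the EC2 instance
  • __meta_ec2_instance_type: the type of the EC2 instance
  • __meta_ec2_ipv6_addresses: comma-separated list of IPv6 addresses assigned to the instance's network interfaces, if present
  • __meta_ec2_owner_id: the ID of the AWS account that owns the EC2 instance
  • __meta_ec2_platform: the operating system platform, set to "windows" on Windows servers, absent otherwise
  • __meta_ec2_primary_ipv6_addresses: comma-separated list of the primary IPv6 addresses of the instance, if present. The list is ordered by the position of each corresponding network interface in the attachment order.
  • __meta_ec2_primary_subnet_id: the subnet ID of the primary network interface, if available
  • __meta_ec2_private_dns_name: the private DNS name of the instance, if available
  • __meta_ec2_private_ip: the private IP address of the instance, if present
  • __meta_ec2_public_dns_name: the public DNS name of the instance, if available
  • __meta_ec2_public_ip: the public IP address of the instance, if available
  • __meta_ec2_region: the region of the instance
  • __meta_ec2_subnet_id: comma-separated list of subnet IDs in which the instance is running, if available
  • __meta_ec2_tag_<tagkey>: each tag value of the instance
  • __meta_ec2_vpc_id: the ID of the VPC in which the instance is running, if available

See below for the configuration options for EC2 discovery: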

# The information to access the EC2 API.

# The AWS region. If blank, the region from the instance metadata is used.
[ region: <string> ]

# Custom endpoint to be used.
[ endpoint: <string> ]

# The AWS API keys. If blank, the environment variables `AWS_ACCESS_KEY_ID`
# and `AWS_SECRET_ACCESS_KEY` are used.
[ access_key: <string> ]
[ secret_key: <secret> ]
# Named AWS profile used to connect to the API.
[ profile: <string> ]

# AWS Role ARN, an alternative to using AWS API keys.
[ role_arn: <string> ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# Filters can be used optionally to filter the instance list by other criteria.
# Available filter criteria can be found here:
# https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstances.html
# Filter API documentation: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_Filter.html
filters:
  [ - name: <string>
      values: <string>, [...] ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

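The relabeling phase is the preferred and more powerful way to filter targets based on arbitrary labels. For users with thousands of instances it can be more efficient to use the EC2 API directly, which has support for filtering instances.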
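
Both filtering approaches can be combined, as in the sketch below. The region, tag key and value, port, and job name are all illustrative assumptions. The API-side filter reduces the instance list before Prometheus sees it, and the relabel rule switches the target address to the public IP, which is why the port is repeated in the replacement.

```yaml
scrape_configs:
  - job_name: ec2-instances              # hypothetical job name
    ec2_sd_configs:
      - region: eu-west-1                # assumed region
        port: 9100                       # used with the default private IP
        filters:
          # Server-side filter on a hypothetical "Environment" tag.
          - name: "tag:Environment"
            values: [production]
    relabel_configs:
      # Scrape via the public IP; the port must be set here explicitly.
      - source_labels: [__meta_ec2_public_ip]
        regex: "(.+)"
        target_label: __address__
        replacement: "${1}:9100"
```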

<openstack_sd_config>

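OpenStack SD configurations allow retrieving scrape targets from OpenStack Nova instances.

One of the following <openstack_role> types can be configured to discover targets:

hypervisor

The hypervisor role discovers one target per Nova hypervisor node. The target address defaults to the host_ip attribute of the hypervisor.

The following meta labels are available on targets during relabeling:

  • __meta_openstack_hypervisor_host_ip: the hypervisor node's IP address.
  • __meta_openstack_hypervisor_hostname: the hypervisor node's name.
  • __meta_openstack_hypervisor_id: the hypervisor node's ID.
  • __meta_openstack_hypervisor_state: the hypervisor node's state.
  • __meta_openstack_hypervisor_status: the hypervisor node's status.
  • __meta_openstack_hypervisor_type: the hypervisor node's type.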

instance

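The instance role discovers one target per network interface of each Nova instance. The target address defaults to the private IP address of the network interface.

The following meta labels are available on targets during relabeling:

  • __meta_openstack_address_pool: the pool of the private IP.
  • __meta_openstack_instance_flavor: the flavor name of the OpenStack instance, or the flavor ID if the flavor name is not available.
  • __meta_openstack_instance_id: the OpenStack instance ID.
  • __meta_openstack_instance_image: the ID of the image the OpenStack instance is using.
  • __meta_openstack_instance_name: the OpenStack instance name.
  • __meta_openstack_instance_status: the status of the OpenStack instance.
  • __meta_openstack_private_ip: the private IP of the OpenStack instance.
  • __meta_openstack_project_id: the project (tenant) owning this instance.
  • __meta_openstack_public_ip: the public IP of the OpenStack instance.
  • __meta_openstack_tag_<key>: each metadata item of the instance, with any unsupported characters converted to an underscore.
  • __meta_openstack_user_id: the user account owning the tenant.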

loadbalancer

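The loadbalancer role discovers one target per Octavia load balancer with a PROMETHEUS listener. The target address defaults to the VIP address of the load balancer.

The following meta labels are available on targets during relabeling:

  • __meta_openstack_loadbalancer_availability_zone: the availability zone of the OpenStack load balancer.
  • __meta_openstack_loadbalancer_floating_ip: the floating IP of the OpenStack load balancer.
  • __meta_openstack_loadbalancer_id: the OpenStack load balancer ID.
  • __meta_openstack_loadbalancer_name: the OpenStack load balancer name.
  • __meta_openstack_loadbalancer_provider: the Octavia provider of the OpenStack load balancer.
  • __meta_openstack_loadbalancer_operating_status: the operating status of the OpenStack load balancer.
  • __meta_openstack_loadbalancer_provisioning_status: the provisioning status of the OpenStack load balancer.
  • __meta_openstack_loadbalancer_tags: comma-separated list of the tags of the OpenStack load balancer.
  • __meta_openstack_loadbalancer_vip: the VIP of the OpenStack load balancer.
  • __meta_openstack_project_id: the project (tenant) owning this load balancer.

See below for the configuration options for OpenStack discovery: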

# The information to access the OpenStack API.

# The OpenStack role of entities that should be discovered.
role: <openstack_role>

# The OpenStack Region.
region: <string>

# identity_endpoint specifies the HTTP endpoint that is required to work with
# the Identity API of the appropriate version. While it's ultimately needed by
# all of the identity services, it will often be populated by a provider-level
# function.
[ identity_endpoint: <string> ]

# username is required if using Identity V2 API. Consult with your provider's
# control panel to discover your account's username. In Identity V3, either
# userid or a combination of username and domain_id or domain_name are needed.
[ username: <string> ]
[ userid: <string> ]

# password for the Identity V2 and V3 APIs. Consult with your provider's
# control panel to discover your account's preferred method of authentication.
[ password: <secret> ]

# At most one of domain_id and domain_name must be provided if using username
# with Identity V3. Otherwise, either are optional.
[ domain_name: <string> ]
[ domain_id: <string> ]

# The project_id and project_name fields are optional for the Identity V2 API.
# Some providers allow you to specify a project_name instead of the project_id.
# Some require both. Your provider's authentication policies will determine
# how these fields influence authentication.
[ project_name: <string> ]
[ project_id: <string> ]

# The application_credential_id or application_credential_name fields are
# required if using an application credential to authenticate. Some providers
# allow you to create an application credential to authenticate rather than a
# password.
[ application_credential_name: <string> ]
[ application_credential_id: <string> ]

# The application_credential_secret field is required if using an application
# credential to authenticate.
[ application_credential_secret: <secret> ]

# Whether the service discovery should list all instances for all projects.
# It is only relevant for the 'instance' role and usually requires admin permissions.
[ all_tenants: <boolean> | default = false ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# The availability of the endpoint to connect to. Must be one of public, admin or internal.
[ availability: <string> | default = "public" ]

# TLS configuration.
tls_config:
  [ <tls_config> ]

<ovhcloud_sd_config>

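OVHcloud SD configurations allow retrieving scrape targets from OVHcloud's dedicated servers and VPS using their API. Prometheus periodically checks the REST endpoint and creates a target for every discovered server. The role tries to use the public IPv4 address as the default address; if there is none, it tries the IPv6 address. This may be changed with relabeling. For OVHcloud's public cloud instances you can use <openstack_sd_config>.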

VPS

  • __meta_ovhcloud_vps_cluster: the cluster of the server
  • __meta_ovhcloud_vps_datacenter: the datacenter of the server
  • __meta_ovhcloud_vps_disk: the disk of the server
  • __meta_ovhcloud_vps_display_name: the display name of the server
  • __meta_ovhcloud_vps_ipv4: the IPv4 of the server
  • __meta_ovhcloud_vps_ipv6: the IPv6 of the server
  • __meta_ovhcloud_vps_keymap: the KVM keyboard layout of the server
  • __meta_ovhcloud_vps_maximum_additional_ip: the maximum additional IPs of the server
  • __meta_ovhcloud_vps_memory_limit: the memory limit of the server
  • __meta_ovhcloud_vps_memory: the memory of the server
  • __meta_ovhcloud_vps_monitoring_ip_blocks: the monitoring IP blocks of the server
  • __meta_ovhcloud_vps_name: the name of the server
  • __meta_ovhcloud_vps_netboot_mode: the netboot mode of the server
  • __meta_ovhcloud_vps_offer_type: the offer type of the server
  • __meta_ovhcloud_vps_offer: the offer of the server
  • __meta_ovhcloud_vps_state: the state of the server
  • __meta_ovhcloud_vps_vcore: the number of virtual cores of the server
  • __meta_ovhcloud_vps_version: the version of the server
  • __meta_ovhcloud_vps_zone: the zone of the server

Dedicated servers

  • __meta_ovhcloud_dedicated_server_commercial_range: the commercial range of the server
  • __meta_ovhcloud_dedicated_server_datacenter: the datacenter of the server
  • __meta_ovhcloud_dedicated_server_ipv4: the IPv4 of the server
  • __meta_ovhcloud_dedicated_server_ipv6: the IPv6 of the server
  • __meta_ovhcloud_dedicated_server_link_speed: the link speed of the server
  • __meta_ovhcloud_dedicated_server_name: the name of the server
  • __meta_ovhcloud_dedicated_server_no_intervention: whether datacenter intervention is disabled for the server
  • __meta_ovhcloud_dedicated_server_os: the operating system of the server
  • __meta_ovhcloud_dedicated_server_rack: the rack of the server
  • __meta_ovhcloud_dedicated_server_reverse: the reverse DNS name of the server
  • __meta_ovhcloud_dedicated_server_server_id: the ID of the server
  • __meta_ovhcloud_dedicated_server_state: the state of the server
  • __meta_ovhcloud_dedicated_server_support_level: the support level of the server

See below for the configuration options for OVHcloud discovery:

# Access key to use. https://api.ovh.com
application_key: <string>
application_secret: <secret>
consumer_key: <secret>
# Service of the targets to retrieve. Must be `vps` or `dedicated_server`.
service: <string>
# API endpoint. https://github.com/ovh/go-ovh#supported-apis
[ endpoint: <string> | default = "ovh-eu" ]
# Refresh interval to re-read the resources list.
[ refresh_interval: <duration> | default = 60s ]
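
For orientation, a sketch of a VPS discovery job is given below. The job name and all three credential values are placeholders; the endpoint shown is the documented default.

```yaml
scrape_configs:
  - job_name: ovhcloud-vps               # hypothetical job name
    ovhcloud_sd_configs:
      - service: vps                     # or dedicated_server
        endpoint: ovh-eu                 # documented default
        application_key: "<application-key>"        # placeholder
        application_secret: "<application-secret>"  # placeholder
        consumer_key: "<consumer-key>"              # placeholder
        refresh_interval: 1m
```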

<puppetdb_sd_config>

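PuppetDB SD configurations allow retrieving scrape targets from PuppetDB resources.

This SD discovers resources and creates a target for each resource returned by the API.

The resource address is the certname of the resource and can be changed during relabeling.

The following meta labels are available on targets during relabeling:

  • __meta_puppetdb_query: the Puppet Query Language (PQL) query
  • __meta_puppetdb_certname: the name of the node associated with the resource
  • __meta_puppetdb_resource: a SHA-1 hash of the resource's type, title, and parameters, for identification
  • __meta_puppetdb_type: the resource type
  • __meta_puppetdb_title: the resource title
  • __meta_puppetdb_exported: whether the resource is exported ("true" or "false")
  • __meta_puppetdb_tags: comma-separated list of resource tags
  • __meta_puppetdb_file: the manifest file in which the resource was declared
  • __meta_puppetdb_environment: the environment of the node associated with the resource
  • __meta_puppetdb_parameter_<parametername>: the parameters of the resource

See below for the configuration options for PuppetDB discovery: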

# The URL of the PuppetDB root query endpoint.
url: <string>

# Puppet Query Language (PQL) query. Only resources are supported.
# https://puppet.com/docs/puppetdb/latest/api/query/v4/pql.html
query: <string>

# Whether to include the parameters as meta labels.
# Due to the differences between parameter types and Prometheus labels,
# some parameters might not be rendered. The format of the parameters might
# also change in future releases.
#
# Note: Enabling this exposes parameters in the Prometheus UI and API. Make sure
# that you don't have secrets exposed as parameters if you enable this.
[ include_parameters: <boolean> | default = false ]

# Refresh interval to re-read the resources list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from.
[ port: <int> | default = 80 ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

See this example Prometheus configuration file for a detailed example of configuring Prometheus with PuppetDB.
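
As a rough sketch, the job below turns every node on which a given package resource is declared into a target. The PuppetDB URL, the package title, and the port are hypothetical; the query must return resources, as noted in the options above.

```yaml
scrape_configs:
  - job_name: puppetdb-node-exporter     # hypothetical job name
    puppetdb_sd_configs:
      - url: https://puppetdb.example.com   # hypothetical PuppetDB root endpoint
        # PQL query selecting resources; each result becomes one target.
        query: 'resources { type = "Package" and title = "node_exporter" }'
        port: 9100
    relabel_configs:
      # The target address is the certname; expose it as a label as well.
      - source_labels: [__meta_puppetdb_certname]
        target_label: certname
```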

<file_sd_config>

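File-based service discovery provides a more generic way to configure static targets and serves as an interface to plug in custom service discovery mechanisms.

It reads a set of files containing a list of zero or more <static_config>s. Changes to all defined files are detected via disk watches and applied immediately.

While those individual files are watched for changes, the parent directory is also watched implicitly. This is to handle atomic renaming efficiently and to detect new files that match the configured globs. This may cause issues if the parent directory contains a large number of other files, as each of these files is watched too, even though the events related to them are not relevant.

Files may be provided in YAML or JSON format. Only changes resulting in well-formed target groups are applied.

Files must contain a list of static configs, using these formats: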

JSON

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]

YAML

- targets:
  [ - '<host>' ]
  labels:
    [ <labelname>: <labelvalue> ... ]

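As a fallback, the file contents are also re-read periodically at the specified refresh interval.

Each target has a meta label __meta_filepath during the relabeling phase. Its value is set to the filepath from which the target was extracted.

There is a list of integrations with this discovery mechanism.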

# Patterns for files from which target groups are extracted.
files:
  [ - <filename_pattern> ... ]

# Refresh interval to re-read the files.
[ refresh_interval: <duration> | default = 5m ]

Where <filename_pattern> may be a path ending in .json, .yml or .yaml. The last path segment may contain a single * that matches any character sequence, e.g. my/path/tg_*.json.
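
A minimal sketch of a scrape job that uses a glob in the last path segment is shown here. The directory, file prefix, and job name are hypothetical; the matched files are expected to follow the YAML or JSON target-group format described above.

```yaml
scrape_configs:
  - job_name: file-discovered            # hypothetical job name
    file_sd_configs:
      - files:
          # A single * is allowed in the last path segment only.
          - targets/tg_*.yaml
          - targets/tg_*.json
        refresh_interval: 5m             # documented default, shown for clarity
    relabel_configs:
      # Record which file each target came from.
      - source_labels: [__meta_filepath]
        target_label: sd_file
```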

<gce_sd_config>

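GCE SD configurations allow retrieving scrape targets from GCP GCE instances. The private IP address is used by default, but may be changed to the public IP address with relabeling.

The following meta labels are available on targets during relabeling:

  • __meta_gce_instance_id: the numeric ID of the instance
  • __meta_gce_instance_name: the name of the instance
  • __meta_gce_label_<labelname>: each GCE label of the instance, with any unsupported characters converted to an underscore
  • __meta_gce_machine_type: full or partial URL of the machine type of the instance
  • __meta_gce_metadata_<name>: each metadata item of the instance
  • __meta_gce_network: the network URL of the instance
  • __meta_gce_private_ip: the private IP address of the instance
  • __meta_gce_interface_ipv4_<name>: the IPv4 address of each named interface
  • __meta_gce_project: the GCP project in which the instance is running
  • __meta_gce_public_ip: the public IP address of the instance, if present
  • __meta_gce_subnetwork: the subnetwork URL of the instance
  • __meta_gce_tags: comma-separated list of instance tags
  • __meta_gce_zone: the GCE zone URL in which the instance is running

See below for the configuration options for GCE discovery: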

# The information to access the GCE API.

# The GCP Project
project: <string>

# The zone of the scrape targets. If you need multiple zones use multiple
# gce_sd_configs.
zone: <string>

# Filter can be used optionally to filter the instance list by other criteria
# Syntax of this filter string is described here in the filter query parameter section:
# https://cloud.google.com/compute/docs/reference/latest/instances/list
[ filter: <string> ]

# Refresh interval to re-read the instance list
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# The tag separator is used to separate the tags on concatenation
[ tag_separator: <string> | default = , ]

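Credentials are discovered by the Google Cloud SDK default client by looking in the following places, preferring the first location found:

  1. a JSON file specified by the GOOGLE_APPLICATION_CREDENTIALS environment variable
  2. a JSON file in the well-known path $HOME/.config/gcloud/application_default_credentials.json
  3. fetched from the GCE metadata server

If Prometheus is running within GCE, the service account associated with the instance it is running on should have at least read-only permissions to the compute resources. If running outside of GCE, make sure to create an appropriate service account and place the credential file in one of the expected locations.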
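
Because each configuration covers a single zone, discovering several zones means listing several entries, as in this sketch. The project ID, zones, port, and job name are hypothetical, and credentials are assumed to be found through one of the locations listed above.

```yaml
scrape_configs:
  - job_name: gce-instances              # hypothetical job name
    gce_sd_configs:
      # One entry per zone.
      - project: my-project              # hypothetical project ID
        zone: europe-west1-b
        port: 9100
      - project: my-project
        zone: europe-west1-c
        port: 9100
    relabel_configs:
      # Use the instance name instead of IP:port as the instance label.
      - source_labels: [__meta_gce_instance_name]
        target_label: instance
```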

<hetzner_sd_config>

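Hetzner SD configurations allow retrieving scrape targets from the Hetzner Cloud API and the Robot API. This service discovery uses the public IPv4 address by default, but that can be changed with relabeling, as demonstrated in the Prometheus hetzner-sd configuration file.

The following meta labels are available on all targets during relabeling:

  • __meta_hetzner_server_id: the ID of the server
  • __meta_hetzner_server_name: the name of the server
  • __meta_hetzner_server_status: the status of the server
  • __meta_hetzner_public_ipv4: the public IPv4 address of the server
  • __meta_hetzner_public_ipv6_network: the public IPv6 network (/64) of the server
  • __meta_hetzner_datacenter: the datacenter of the server

The labels below are only available for targets with role set to hcloud:

  • __meta_hetzner_hcloud_image_name: the image name of the server
  • __meta_hetzner_hcloud_image_description: the description of the server image
  • __meta_hetzner_hcloud_image_os_flavor: the OS flavor of the server image
  • __meta_hetzner_hcloud_image_os_version: the OS version of the server image
  • __meta_hetzner_hcloud_datacenter_location: the location of the server
  • __meta_hetzner_hcloud_datacenter_location_network_zone: the network zone of the server
  • __meta_hetzner_hcloud_server_type: the type of the server
  • __meta_hetzner_hcloud_cpu_cores: the CPU core count of the server
  • __meta_hetzner_hcloud_cpu_type: the CPU type of the server (shared or dedicated)
  • __meta_hetzner_hcloud_memory_size_gb: the amount of memory of the server (in GB)
  • __meta_hetzner_hcloud_disk_size_gb: the disk size of the server (in GB)
  • __meta_hetzner_hcloud_private_ipv4_<networkname>: the private IPv4 address of the server within a given network
  • __meta_hetzner_hcloud_label_<labelname>: each label of the server, with any unsupported characters converted to an underscore
  • __meta_hetzner_hcloud_labelpresent_<labelname>: true for each label of the server, with any unsupported characters converted to an underscore

The labels below are only available for targets with role set to robot:

  • __meta_hetzner_robot_product: the product of the server
  • __meta_hetzner_robot_cancelled: the server cancellation status

# The Hetzner role of entities that should be discovered.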
# One of robot or hcloud.
role: <string>

# The port to scrape metrics from.
[ port: <int> | default = 80 ]

# The time after which the servers are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# Label selector used to filter the servers when fetching them from the API. See https://docs.hetzner.cloud/#label-selector for more details.
# Only used when role is hcloud.
[ label_selector: <string> ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
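
A brief sketch of the hcloud role with an API-side label selector follows. The token placeholder, the selector, the port, and the job name are assumptions for illustration.

```yaml
scrape_configs:
  - job_name: hetzner-hcloud             # hypothetical job name
    hetzner_sd_configs:
      - role: hcloud
        authorization:
          credentials: "<hetzner-cloud-api-token>"   # placeholder
        # Only fetch servers carrying this (hypothetical) label.
        label_selector: "monitoring=enabled"
        port: 9100
    relabel_configs:
      - source_labels: [__meta_hetzner_server_name]
        target_label: instance
```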

<http_sd_config>

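HTTP-based service discovery provides a more generic way to configure static targets and serves as an interface to plug in custom service discovery mechanisms.

It fetches targets from an HTTP endpoint containing a list of zero or more <static_config>s. The target must reply with an HTTP 200 response. The HTTP header Content-Type must be application/json, and the body must be valid JSON.

Example response body: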

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]

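The endpoint is queried periodically at the specified refresh interval. The prometheus_sd_http_failures_total counter metric tracks the number of refresh failures.

Each target has a meta label __meta_url during the relabeling phase. Its value is set to the URL from which the target was extracted.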

# URL from which the targets are fetched.
url: <string>

# Refresh interval to re-query the endpoint.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
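
A minimal sketch of a job using HTTP-based discovery is shown below. The endpoint URL and job name are hypothetical; the endpoint is assumed to return a JSON body in the format shown above with Content-Type application/json.

```yaml
scrape_configs:
  - job_name: http-discovered            # hypothetical job name
    http_sd_configs:
      - url: http://sd.example.internal:8080/targets   # hypothetical endpoint
        refresh_interval: 1m             # documented default is 60s
    relabel_configs:
      # Record which discovery URL produced each target.
      - source_labels: [__meta_url]
        target_label: sd_url
```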

<ionos_sd_config>

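IONOS SD configurations allow retrieving scrape targets from the IONOS Cloud API. This service discovery uses the first NIC's IP address by default, but that can be changed with relabeling.

The following meta labels are available on all targets during relabeling:

  • __meta_ionos_server_availability_zone: the availability zone of the server
  • __meta_ionos_server_boot_cdrom_id: the ID of the CD-ROM the server is booted from
  • __meta_ionos_server_boot_image_id: the ID of the boot image or snapshot the server is booted from
  • __meta_ionos_server_boot_volume_id: the ID of the boot volume
  • __meta_ionos_server_cpu_family: the CPU family of the server
  • __meta_ionos_server_id: the ID of the server
  • __meta_ionos_server_ip: comma-separated list of all IPs assigned to the server
  • __meta_ionos_server_lifecycle: the lifecycle state of the server resource
  • __meta_ionos_server_name: the name of the server
  • __meta_ionos_server_nic_ip_<nic_name>: comma-separated list of IPs, grouped by the name of each NIC attached to the server
  • __meta_ionos_server_servers_id: the ID of the servers the server belongs to
  • __meta_ionos_server_state: the execution state of the server
  • __meta_ionos_server_type: the type of the server

# The unique ID of the data center.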
datacenter_id: <string>

# The port to scrape metrics from.
[ port: <int> | default = 80 ]

# The time after which the servers are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
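
For orientation, a sketch of an IONOS discovery job is given here. The data center ID, the token placeholder, the port, and the job name are all illustrative assumptions.

```yaml
scrape_configs:
  - job_name: ionos-servers              # hypothetical job name
    ionos_sd_configs:
      - datacenter_id: "<datacenter-uuid>"     # placeholder
        authorization:
          credentials: "<ionos-api-token>"     # placeholder
        port: 9100
    relabel_configs:
      # Keep the human-readable server name as the instance label.
      - source_labels: [__meta_ionos_server_name]
        target_label: instance
```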

<kubernetes_sd_config>

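Kubernetes SD configurations allow retrieving scrape targets from Kubernetes' REST API and always staying synchronized with the cluster state.

One of the following role types can be configured to discover targets:

node

The node role discovers one target per cluster node, with the address defaulting to the Kubelet's HTTP port. The target address defaults to the first existing address of the Kubernetes node object in the address type order of NodeInternalIP, NodeExternalIP, NodeLegacyHostIP, and NodeHostName.

Available meta labels:

  • __meta_kubernetes_node_name: the name of the node object.
  • __meta_kubernetes_node_provider_id: the cloud provider's name for the node object.
  • __meta_kubernetes_node_label_<labelname>: each label from the node object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_node_labelpresent_<labelname>: true for each label from the node object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_node_annotation_<annotationname>: each annotation from the node object.
  • __meta_kubernetes_node_annotationpresent_<annotationname>: true for each annotation from the node object.
  • __meta_kubernetes_node_address_<address_type>: the first address for each node address type, if it exists.

In addition, the instance label for the node is set to the node name as retrieved from the API server.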

service

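The service role discovers a target for each service port of each service. This is generally useful for blackbox monitoring of a service. The address is set to the Kubernetes DNS name of the service and the respective service port.

Available meta labels:

  • __meta_kubernetes_namespace: the namespace of the service object.
  • __meta_kubernetes_service_annotation_<annotationname>: each annotation from the service object.
  • __meta_kubernetes_service_annotationpresent_<annotationname>: "true" for each annotation of the service object.
  • __meta_kubernetes_service_cluster_ip: the cluster IP address of the service. (Does not apply to services of type ExternalName)
  • __meta_kubernetes_service_loadbalancer_ip: the IP address of the load balancer. (Applies to services of type LoadBalancer)
  • __meta_kubernetes_service_external_name: the DNS name of the service. (Applies to services of type ExternalName)
  • __meta_kubernetes_service_label_<labelname>: each label from the service object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_service_labelpresent_<labelname>: true for each label of the service object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_service_name: the name of the service object.
  • __meta_kubernetes_service_port_name: the name of the service port for the target.
  • __meta_kubernetes_service_port_number: the number of the service port for the target.
  • __meta_kubernetes_service_port_protocol: the protocol of the service port for the target.
  • __meta_kubernetes_service_type: the type of the service.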

pod

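The pod role discovers all pods and exposes their containers as targets. For each declared port of a container, a single target is generated. If a container has no specified ports, a port-free target per container is created so that a port can be added manually via relabeling.

Available meta labels:

  • __meta_kubernetes_namespace: the namespace of the pod object.
  • __meta_kubernetes_pod_name: the name of the pod object.
  • __meta_kubernetes_pod_ip: the pod IP of the pod object.
  • __meta_kubernetes_pod_label_<labelname>: each label from the pod object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_pod_labelpresent_<labelname>: true for each label from the pod object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_pod_annotation_<annotationname>: each annotation from the pod object.
  • __meta_kubernetes_pod_annotationpresent_<annotationname>: true for each annotation from the pod object.
  • __meta_kubernetes_pod_container_init: true if the container is an InitContainer.
  • __meta_kubernetes_pod_container_name: the name of the container the target address points to.
  • __meta_kubernetes_pod_container_id: the ID of the container the target address points to. The ID is in the form <type>://<container_id>.
  • __meta_kubernetes_pod_container_image: the image the container is using.
  • __meta_kubernetes_pod_container_port_name: the name of the container port.
  • __meta_kubernetes_pod_container_port_number: the number of the container port.
  • __meta_kubernetes_pod_container_port_protocol: the protocol of the container port.
  • __meta_kubernetes_pod_ready: set to true or false for the pod's ready state.
  • __meta_kubernetes_pod_phase: set to Pending, Running, Succeeded, Failed or Unknown in the lifecycle.
  • __meta_kubernetes_pod_node_name: the name of the node the pod is scheduled onto.
  • __meta_kubernetes_pod_host_ip: the current host IP of the pod object.
  • __meta_kubernetes_pod_uid: the UID of the pod object.
  • __meta_kubernetes_pod_controller_kind: the object kind of the pod controller.
  • __meta_kubernetes_pod_controller_name: the name of the pod controller.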
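
To show how the pod meta labels are typically consumed, the sketch below keeps only pods that opt in through an annotation and copies namespace and pod name onto the scraped series. The prometheus.io/scrape annotation is a common convention and is assumed here, not something Prometheus itself defines; the job name is hypothetical, and the empty SD entry assumes Prometheus runs inside the cluster.

```yaml
scrape_configs:
  - job_name: kubernetes-pods            # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      # ("." and "/" are converted to "_" in the meta label name).
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        regex: "true"
        action: keep
      # Drop pods that are not (or no longer) running.
      - source_labels: [__meta_kubernetes_pod_phase]
        regex: Pending|Succeeded|Failed
        action: drop
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```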

endpoints

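The endpoints role discovers targets from the listed endpoints of a service. For each endpoint address, one target is discovered per port. If the endpoint is backed by a pod, all additional container ports of the pod that are not bound to an endpoint port are discovered as targets as well.

Available meta labels:

  • __meta_kubernetes_namespace: the namespace of the endpoints object.
  • __meta_kubernetes_endpoints_name: the name of the endpoints object.
  • __meta_kubernetes_endpoints_label_<labelname>: each label from the endpoints object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_endpoints_labelpresent_<labelname>: true for each label from the endpoints object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_endpoints_annotation_<annotationname>: each annotation from the endpoints object.
  • __meta_kubernetes_endpoints_annotationpresent_<annotationname>: true for each annotation from the endpoints object.
  • For all targets discovered directly from the endpoints list (those not additionally inferred from underlying pods), the following labels are attached:
    • __meta_kubernetes_endpoint_hostname: the hostname of the endpoint.
    • __meta_kubernetes_endpoint_node_name: the name of the node hosting the endpoint.
    • __meta_kubernetes_endpoint_ready: set to true or false for the endpoint's ready state.
    • __meta_kubernetes_endpoint_port_name: the name of the endpoint port.
    • __meta_kubernetes_endpoint_port_protocol: the protocol of the endpoint port.
    • __meta_kubernetes_endpoint_address_target_kind: the kind of the endpoint address target.
    • __meta_kubernetes_endpoint_address_target_name: the name of the endpoint address target.
  • If the endpoints belong to a service, all labels of the role: service discovery are attached.
  • For all targets backed by a pod, all labels of the role: pod discovery are attached.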

endpointslice

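The endpointslice role discovers targets from existing endpointslices. For each endpoint address referenced in the endpointslice object, one target is discovered. If the endpoint is backed by a pod, all additional container ports of the pod that are not bound to an endpoint port are discovered as targets as well.

The role requires the discovery.k8s.io/v1 API version (available since Kubernetes v1.21).

Available meta labels:

  • __meta_kubernetes_namespace: the namespace of the endpoints object.
  • __meta_kubernetes_endpointslice_name: the name of the endpointslice object.
  • __meta_kubernetes_endpointslice_label_<labelname>: each label from the endpointslice object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_endpointslice_labelpresent_<labelname>: true for each label from the endpointslice object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_endpointslice_annotation_<annotationname>: each annotation from the endpointslice object.
  • __meta_kubernetes_endpointslice_annotationpresent_<annotationname>: true for each annotation from the endpointslice object.
  • For all targets discovered directly from the endpointslice list (those not additionally inferred from underlying pods), the following labels are attached:
    • __meta_kubernetes_endpointslice_address_target_kind: the kind of the referenced object.
    • __meta_kubernetes_endpointslice_address_target_name: the name of the referenced object.
    • __meta_kubernetes_endpointslice_address_type: the IP protocol family of the address of the target.
    • __meta_kubernetes_endpointslice_endpoint_conditions_ready: set to true or false for the referenced endpoint's ready state.
    • __meta_kubernetes_endpointslice_endpoint_conditions_serving: set to true or false for the referenced endpoint's serving state.
    • __meta_kubernetes_endpointslice_endpoint_conditions_terminating: set to true or false for the referenced endpoint's terminating state.
    • __meta_kubernetes_endpointslice_endpoint_topology_kubernetes_io_hostname: the name of the node hosting the referenced endpoint.
    • __meta_kubernetes_endpointslice_endpoint_topology_present_kubernetes_io_hostname: flag that shows whether the referenced object has a kubernetes.io/hostname annotation.
    • __meta_kubernetes_endpointslice_endpoint_hostname: the hostname of the referenced endpoint.
    • __meta_kubernetes_endpointslice_endpoint_node_name: the name of the node hosting the referenced endpoint.
    • __meta_kubernetes_endpointslice_endpoint_zone: the zone the referenced endpoint exists in.
    • __meta_kubernetes_endpointslice_port: the port of the referenced endpoint.
    • __meta_kubernetes_endpointslice_port_name: the named port of the referenced endpoint.
    • __meta_kubernetes_endpointslice_port_protocol: the protocol of the referenced endpoint.
  • If the endpoints belong to a service, all labels of the role: service discovery are attached.
  • For all targets backed by a pod, all labels of the role: pod discovery are attached.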

ingress

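The ingress role discovers a target for each path of each ingress. This is generally useful for blackbox monitoring of an ingress. The address will be set to the host specified in the ingress spec.

This role requires the networking.k8s.io/v1 API version (available since Kubernetes v1.19).

Available meta labels:

  • __meta_kubernetes_namespace: The namespace of the ingress object.
  • __meta_kubernetes_ingress_name: The name of the ingress object.
  • __meta_kubernetes_ingress_label_<labelname>: Each label from the ingress object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_ingress_labelpresent_<labelname>: true for each label from the ingress object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_ingress_annotation_<annotationname>: Each annotation from the ingress object.
  • __meta_kubernetes_ingress_annotationpresent_<annotationname>: true for each annotation from the ingress object.
  • __meta_kubernetes_ingress_class_name: Class name from the ingress spec, if present.
  • __meta_kubernetes_ingress_scheme: Protocol scheme of the ingress, https if a TLS config is set. Defaults to http.
  • __meta_kubernetes_ingress_path: Path from the ingress spec. Defaults to /.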
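
To illustrate the blackbox-monitoring use case for the ingress role, here is a minimal sketch that turns each discovered ingress path into a probe URL. It assumes a separately deployed Blackbox exporter; the exporter address blackbox-exporter.example.com:9115, the http_2xx module name, and the job name are hypothetical and not part of this reference.

```yaml
scrape_configs:
  - job_name: kubernetes-ingresses          # hypothetical job name
    metrics_path: /probe
    params:
      module: [http_2xx]                    # assumed Blackbox exporter module
    kubernetes_sd_configs:
      - role: ingress
    relabel_configs:
      # Build "<scheme>://<host><path>" from the ingress meta labels and
      # pass it to the exporter as the "target" URL parameter.
      - source_labels:
          - __meta_kubernetes_ingress_scheme
          - __address__
          - __meta_kubernetes_ingress_path
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      # Scrape the exporter instead of the ingress host itself.
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115   # hypothetical address
      # Keep the probed URL visible as the instance label.
      - source_labels: [__param_target]
        target_label: instance
```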

See below for the configuration options for Kubernetes discovery:

# The information to access the Kubernetes API.

# The API server addresses. If left empty, Prometheus is assumed to run inside
# of the cluster and will discover API servers automatically and use the pod's
# CA certificate and bearer token file at /var/run/secrets/kubernetes.io/serviceaccount/.
[ api_server: <host> ]

# The Kubernetes role of entities that should be discovered.
# One of endpoints, endpointslice, service, pod, node, or ingress.
role: <string>

# Optional path to a kubeconfig file.
# Note that api_server and kubeconfig_file are mutually exclusive.
[ kubeconfig_file: <filename> ]

# Optional namespace discovery. If omitted, all namespaces are used.
namespaces:
  own_namespace: <boolean>
  names:
    [ - <string> ]

# Optional label and field selectors to limit the discovery process to a subset of available resources.
# See https://kubernetes.ac.cn/docs/concepts/overview/working-with-objects/field-selectors/
# and https://kubernetes.ac.cn/docs/concepts/overview/working-with-objects/labels/ to learn more about the possible
# filters that can be used. The endpoints role supports pod, service and endpoints selectors.
# The pod role supports node selectors when configured with `attach_metadata: {node: true}`.
# Other roles only support selectors matching the role itself (e.g. node role can only contain node selectors).

# Note: When making decision about using field/label selector make sure that this
# is the best approach - it will prevent Prometheus from reusing single list/watch
# for all scrape configs. This might result in a bigger load on the Kubernetes API,
# because per each selector combination there will be additional LIST/WATCH. On the other hand,
# if you just want to monitor small subset of pods in large cluster it's recommended to use selectors.
# Decision, if selectors should be used or not depends on the particular situation.
[ selectors:
  [ - role: <string>
    [ label: <string> ]
    [ field: <string> ] ]]

# Optional metadata to attach to discovered targets. If omitted, no additional metadata is attached.
attach_metadata:
# Attaches node metadata to discovered targets. Valid for roles: pod, endpoints, endpointslice.
# When set to true, Prometheus must have permissions to get Nodes.
  [ node: <boolean> | default = false ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

See this example Prometheus configuration file for a detailed example of configuring Prometheus for Kubernetes.
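
As a much smaller sketch of how these options combine, the following scrape configuration uses the endpointslice role from inside the cluster (api_server is left empty) and keeps only ready endpoints on a named port. The job name, the monitoring namespace, and the port name metrics are assumptions chosen for illustration.

```yaml
scrape_configs:
  - job_name: kubernetes-endpointslices     # hypothetical job name
    kubernetes_sd_configs:
      - role: endpointslice
        namespaces:
          names:
            - monitoring                    # hypothetical namespace
    relabel_configs:
      # Keep only endpoints that report ready.
      - source_labels: [__meta_kubernetes_endpointslice_endpoint_conditions_ready]
        action: keep
        regex: "true"
      # Keep only the endpoint port named "metrics" (assumed port name).
      - source_labels: [__meta_kubernetes_endpointslice_port_name]
        action: keep
        regex: metrics
      # Copy selected meta labels onto the target before they are dropped.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_endpointslice_name]
        target_label: endpointslice
```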

You may wish to check out the third-party Prometheus Operator, which automates the Prometheus setup on top of Kubernetes.

<kuma_sd_config>

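Kuma SD configurations allow retrieving scrape targets from the Kuma control plane.

This SD discovers "monitoring assignments" based on Kuma Dataplane Proxies, via the MADS v1 (Monitoring Assignment Discovery Service) xDS API, and will create a target for each proxy inside a Prometheus-enabled mesh.

The following meta labels are available for each target:

  • __meta_kuma_mesh: the name of the proxy's Mesh
  • __meta_kuma_dataplane: the name of the proxy
  • __meta_kuma_service: the name of the proxy's associated Service
  • __meta_kuma_label_<tagname>: each tag of the proxy

See below for the configuration options for Kuma MonitoringAssignment discovery: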

# Address of the Kuma Control Plane's MADS xDS server.
server: <string>

# Client id is used by Kuma Control Plane to compute Monitoring Assignment for specific Prometheus backend.
# This is useful when migrating between multiple Prometheus backends, or having separate backend for each Mesh.
# When not specified, system hostname/fqdn will be used if available, if not `prometheus` will be used.
[ client_id: <string> ]

# The time to wait between polling update requests.
[ refresh_interval: <duration> | default = 30s ]

# The time after which the monitoring assignments are refreshed.
[ fetch_timeout: <duration> | default = 2m ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

The relabeling phase is the preferred and more powerful way to filter proxies and user-defined tags.
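
A minimal sketch of such filtering, assuming a control plane reachable at http://kuma-control-plane.kuma-system.svc:5676 and a mesh named default (both are placeholders, as is the job name):

```yaml
scrape_configs:
  - job_name: kuma-dataplanes               # hypothetical job name
    kuma_sd_configs:
      - server: http://kuma-control-plane.kuma-system.svc:5676   # assumed MADS address
    relabel_configs:
      # Keep only proxies that belong to the "default" mesh (assumed name).
      - source_labels: [__meta_kuma_mesh]
        action: keep
        regex: default
      # Expose the associated service name as a regular label.
      - source_labels: [__meta_kuma_service]
        target_label: service
```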

<lightsail_sd_config>

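Lightsail SD configurations allow retrieving scrape targets from AWS Lightsail instances. The private IP address is used by default, but may be changed to the public IP address with relabeling.

The following meta labels are available on targets during relabeling:

  • __meta_lightsail_availability_zone: the availability zone in which the instance is running
  • __meta_lightsail_blueprint_id: the Lightsail blueprint ID
  • __meta_lightsail_bundle_id: the Lightsail bundle ID
  • __meta_lightsail_instance_name: the name of the Lightsail instance
  • __meta_lightsail_instance_state: the state of the Lightsail instance
  • __meta_lightsail_instance_support_code: the support code of the Lightsail instance
  • __meta_lightsail_ipv6_addresses: comma-separated list of IPv6 addresses assigned to the instance's network interfaces, if present
  • __meta_lightsail_private_ip: the private IP address of the instance
  • __meta_lightsail_public_ip: the public IP address of the instance, if available
  • __meta_lightsail_region: the region of the instance
  • __meta_lightsail_tag_<tagkey>: each tag value of the instance

See below for the configuration options for Lightsail discovery: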

# The information to access the Lightsail API.

# The AWS region. If blank, the region from the instance metadata is used.
[ region: <string> ]

# Custom endpoint to be used.
[ endpoint: <string> ]

# The AWS API keys. If blank, the environment variables `AWS_ACCESS_KEY_ID`
# and `AWS_SECRET_ACCESS_KEY` are used.
[ access_key: <string> ]
[ secret_key: <secret> ]
# Named AWS profile used to connect to the API.
[ profile: <string> ]

# AWS Role ARN, an alternative to using AWS API keys.
[ role_arn: <string> ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
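
The switch from the private to the public IP address can be sketched as follows. The region us-east-1, port 9100, the job name, and the state value running are assumptions for illustration; because the public address is used, the port is written into the relabeling rule rather than taken from the port option.

```yaml
scrape_configs:
  - job_name: lightsail                     # hypothetical job name
    lightsail_sd_configs:
      - region: us-east-1                   # assumed region
    relabel_configs:
      # Keep only instances whose state is "running" (assumed state value).
      - source_labels: [__meta_lightsail_instance_state]
        action: keep
        regex: running
      # Use the public IP when one exists; (.+) leaves __address__
      # untouched for instances without a public IP.
      - source_labels: [__meta_lightsail_public_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9100
      - source_labels: [__meta_lightsail_instance_name]
        target_label: instance
```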

<linode_sd_config>

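Linode SD configurations allow retrieving scrape targets from Linode's Linode APIv4. This service discovery uses the public IPv4 address by default, but that can be changed with relabeling, as demonstrated in the Prometheus linode-sd configuration file.

The Linode APIv4 Token must be created with the scopes linodes:read_only, ips:read_only, and events:read_only.

The following meta labels are available on targets during relabeling:

  • __meta_linode_instance_id: the ID of the linode instance
  • __meta_linode_instance_label: the label of the linode instance
  • __meta_linode_image: the slug of the linode instance's image
  • __meta_linode_private_ipv4: the private IPv4 of the linode instance
  • __meta_linode_public_ipv4: the public IPv4 of the linode instance
  • __meta_linode_public_ipv6: the public IPv6 of the linode instance
  • __meta_linode_private_ipv4_rdns: the reverse DNS for the first private IPv4 of the linode instance
  • __meta_linode_public_ipv4_rdns: the reverse DNS for the first public IPv4 of the linode instance
  • __meta_linode_public_ipv6_rdns: the reverse DNS for the first public IPv6 of the linode instance
  • __meta_linode_region: the region of the linode instance
  • __meta_linode_type: the type of the linode instance
  • __meta_linode_status: the status of the linode instance
  • __meta_linode_tags: a list of tags of the linode instance joined by the tag separator
  • __meta_linode_group: the display group the linode instance is a member of
  • __meta_linode_gpus: the number of GPUs of the linode instance
  • __meta_linode_hypervisor: the virtualization software powering the linode instance
  • __meta_linode_backups: the backup service status of the linode instance
  • __meta_linode_specs_disk_bytes: the amount of storage space the linode instance has access to
  • __meta_linode_specs_memory_bytes: the amount of RAM the linode instance has access to
  • __meta_linode_specs_vcpus: the number of VCPUs this linode has access to
  • __meta_linode_specs_transfer_bytes: the amount of network transfer the linode instance is allotted each month
  • __meta_linode_extra_ips: a list of all extra IPv4 addresses assigned to the linode instance joined by the tag separator
  • __meta_linode_ipv6_ranges: a list of IPv6 ranges with mask assigned to the linode instance joined by the tag separator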

# Optional region to filter on.
[ region: <string> ]

# The port to scrape metrics from.
[ port: <int> | default = 80 ]

# The string by which Linode Instance tags are joined into the tag label.
[ tag_separator: <string> | default = , ]

# The time after which the linode instances are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
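
A minimal sketch of a Linode scrape configuration that filters on a tag and scrapes the private IPv4 address instead of the public one. The token file path, the tag name monitoring, port 9100, and the job name are hypothetical. The tag pattern follows the upstream linode-sd example and assumes the tag list is wrapped in the separator on both ends.

```yaml
scrape_configs:
  - job_name: linode                        # hypothetical job name
    linode_sd_configs:
      - authorization:
          credentials_file: /etc/prometheus/linode-token   # hypothetical path
        tag_separator: ","
    relabel_configs:
      # Keep only instances tagged "monitoring" (assumed tag name).
      - source_labels: [__meta_linode_tags]
        action: keep
        regex: .*,monitoring,.*
      # Scrape the private IPv4 address when the instance has one.
      - source_labels: [__meta_linode_private_ipv4]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9100
```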

<marathon_sd_config>

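Marathon SD configurations allow retrieving scrape targets using the Marathon REST API. Prometheus will periodically check the REST endpoint for currently running tasks and create a target group for every app that has at least one healthy task.

The following meta labels are available on targets during relabeling:

  • __meta_marathon_app: the name of the app (with slashes replaced by dashes)
  • __meta_marathon_image: the name of the Docker image used (if available)
  • __meta_marathon_task: the ID of the Mesos task
  • __meta_marathon_app_label_<labelname>: any Marathon labels attached to the app, with any unsupported characters converted to an underscore
  • __meta_marathon_port_definition_label_<labelname>: the port definition labels, with any unsupported characters converted to an underscore
  • __meta_marathon_port_mapping_label_<labelname>: the port mapping labels, with any unsupported characters converted to an underscore
  • __meta_marathon_port_index: the port index number (e.g. 1 for PORT1)

See below for the configuration options for Marathon discovery: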

# List of URLs to be used to contact Marathon servers.
# You need to provide at least one server URL.
servers:
  - <string>

# Polling interval
[ refresh_interval: <duration> | default = 30s ]

# Optional authentication information for token-based authentication
# https://docs.mesosphere.com/1.11/security/ent/iam-api/#passing-an-authentication-token
# It is mutually exclusive with `auth_token_file` and other authentication mechanisms.
[ auth_token: <secret> ]

# Optional authentication information for token-based authentication
# https://docs.mesosphere.com/1.11/security/ent/iam-api/#passing-an-authentication-token
# It is mutually exclusive with `auth_token` and other authentication mechanisms.
[ auth_token_file: <filename> ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

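By default every app listed in Marathon will be scraped by Prometheus. If not all of your services provide Prometheus metrics, you can use a Marathon label and Prometheus relabeling to control which instances will actually be scraped. See the Prometheus marathon-sd configuration file for a practical example of how to set up your Marathon app and your Prometheus configuration.

By default, all apps will show up as a single job in Prometheus (the one specified in the configuration file), which can also be changed using relabeling.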
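
Both adjustments can be sketched in one scrape configuration. The Marathon URL, the job name, and the Marathon app label prometheus=true are assumptions made for this example; any label of your choosing works the same way.

```yaml
scrape_configs:
  - job_name: marathon                      # hypothetical job name
    marathon_sd_configs:
      - servers:
          - http://marathon.example.com:8080   # hypothetical server URL
    relabel_configs:
      # Scrape only apps that carry the Marathon label prometheus=true
      # (an assumed opt-in convention).
      - source_labels: [__meta_marathon_app_label_prometheus]
        action: keep
        regex: "true"
      # Use the app name as the job instead of one shared job.
      - source_labels: [__meta_marathon_app]
        target_label: job
```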

<nerve_sd_config>

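Nerve SD configurations allow retrieving scrape targets from AirBnB's Nerve, which are stored in Zookeeper.

The following meta labels are available on targets during relabeling:

  • __meta_nerve_path: the full path to the endpoint node in Zookeeper
  • __meta_nerve_endpoint_host: the host of the endpoint
  • __meta_nerve_endpoint_port: the port of the endpoint
  • __meta_nerve_endpoint_name: the name of the endpoint

# The Zookeeper servers.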
servers:
  - <host>
# Paths can point to a single service, or the root of a tree of services.
paths:
  - <string>
[ timeout: <duration> | default = 10s ]

<nomad_sd_config>

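Nomad SD configurations allow retrieving scrape targets from Nomad's Service API.

The following meta labels are available on targets during relabeling:

  • __meta_nomad_address: the service address of the target
  • __meta_nomad_dc: the datacenter name for the target
  • __meta_nomad_namespace: the namespace of the target
  • __meta_nomad_node_id: the node name defined for the target
  • __meta_nomad_service: the name of the service the target belongs to
  • __meta_nomad_service_address: the service address of the target
  • __meta_nomad_service_id: the service ID of the target
  • __meta_nomad_service_port: the service port of the target
  • __meta_nomad_tags: the list of tags of the target joined by the tag separator

# The information to access the Nomad API. It is to be defined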
# as the Nomad documentation requires.
[ allow_stale: <boolean> | default = true ]
[ namespace: <string> | default = default ]
[ refresh_interval: <duration> | default = 60s ]
[ region: <string> | default = global ]
# The URL to connect to the API.
[ server: <string> ]
[ tag_separator: <string> | default = , ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
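
A minimal sketch of a Nomad scrape configuration, assuming a local Nomad agent at http://localhost:4646 and a datacenter named dc1 (both placeholders, as is the job name):

```yaml
scrape_configs:
  - job_name: nomad-services                # hypothetical job name
    nomad_sd_configs:
      - server: http://localhost:4646       # assumed Nomad API address
        namespace: default
        region: global
    relabel_configs:
      # Keep only services running in datacenter "dc1" (assumed name).
      - source_labels: [__meta_nomad_dc]
        action: keep
        regex: dc1
      # Expose the Nomad service name as a regular label.
      - source_labels: [__meta_nomad_service]
        target_label: service
```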

<serverset_sd_config>

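Serverset SD configurations allow retrieving scrape targets from Serversets, which are stored in Zookeeper. Serversets are commonly used by Finagle and Aurora.

The following meta labels are available on targets during relabeling:

  • __meta_serverset_path: the full path to the serverset member node in Zookeeper
  • __meta_serverset_endpoint_host: the host of the default endpoint
  • __meta_serverset_endpoint_port: the port of the default endpoint
  • __meta_serverset_endpoint_host_<endpoint>: the host of the given endpoint
  • __meta_serverset_endpoint_port_<endpoint>: the port of the given endpoint
  • __meta_serverset_shard: the shard number of the member
  • __meta_serverset_status: the status of the member

# The Zookeeper servers.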
servers:
  - <host>
# Paths can point to a single serverset, or the root of a tree of serversets.
paths:
  - <string>
[ timeout: <duration> | default = 10s ]

Serverset data must be in the JSON format; the Thrift format is not currently supported.
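
A minimal sketch of a serverset scrape configuration. The Zookeeper host, the path /aurora/jobs, the job name, and the status value ALIVE are assumptions for illustration; check the values your serverset members actually report before filtering on them.

```yaml
scrape_configs:
  - job_name: serverset                     # hypothetical job name
    serverset_sd_configs:
      - servers:
          - zk1.example.com:2181            # hypothetical Zookeeper server
        paths:
          - /aurora/jobs                    # hypothetical root of a serverset tree
        timeout: 10s
    relabel_configs:
      # Keep only members whose status is ALIVE (assumed status value).
      - source_labels: [__meta_serverset_status]
        action: keep
        regex: ALIVE
      # Record the shard number as a regular label.
      - source_labels: [__meta_serverset_shard]
        target_label: shard
```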

<triton_sd_config>

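Triton SD configurations allow retrieving scrape targets from Container Monitor discovery endpoints.

One of the following <triton_role> types can be configured to discover targets:

container

The container role discovers one target per "virtual machine" owned by the account. These are SmartOS zones or lx/KVM/bhyve branded zones.

The following meta labels are available on targets during relabeling:

  • __meta_triton_groups: the list of groups belonging to the target joined by a comma separator
  • __meta_triton_machine_alias: the alias of the target container
  • __meta_triton_machine_brand: the brand of the target container
  • __meta_triton_machine_id: the UUID of the target container
  • __meta_triton_machine_image: the target container's image type
  • __meta_triton_server_id: the UUID of the server the target container is running on

cn

The cn role discovers one target per compute node (also known as "server" or "global zone") making up the Triton infrastructure. The account must be a Triton operator and is currently required to own at least one container.

The following meta labels are available on targets during relabeling:

  • __meta_triton_machine_alias: the hostname of the target (requires triton-cmon 1.7.0 or newer)
  • __meta_triton_machine_id: the UUID of the target

See below for the configuration options for Triton discovery: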

# The information to access the Triton discovery API.

# The account to use for discovering new targets.
account: <string>

# The type of targets to discover, can be set to:
# * "container" to discover virtual machines (SmartOS zones, lx/KVM/bhyve branded zones) running on Triton
# * "cn" to discover compute nodes (servers/global zones) making up the Triton infrastructure
[ role: <string> | default = "container" ]

# The DNS suffix which should be applied to target.
dns_suffix: <string>

# The Triton discovery endpoint (e.g. 'cmon.us-east-3b.triton.zone'). This is
# often the same value as dns_suffix.
endpoint: <string>

# A list of groups for which targets are retrieved, only supported when `role` == `container`.
# If omitted all containers owned by the requesting account are scraped.
groups:
  [ - <string> ... ]

# The port to use for discovery and metric scraping.
[ port: <int> | default = 9163 ]

# The interval which should be used for refreshing targets.
[ refresh_interval: <duration> | default = 60s ]

# The Triton discovery API version.
[ version: <int> | default = 1 ]

# TLS configuration.
tls_config:
  [ <tls_config> ]
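
A minimal sketch of a Triton scrape configuration for the container role. The account name, certificate paths, and job name are hypothetical; the endpoint value reuses the example from the comment above, and the sketch assumes the same value is valid as dns_suffix.

```yaml
scrape_configs:
  - job_name: triton                        # hypothetical job name
    scheme: https
    tls_config:
      cert_file: /etc/prometheus/triton-cert.pem   # hypothetical path
      key_file: /etc/prometheus/triton-key.pem     # hypothetical path
    triton_sd_configs:
      - account: example-account            # hypothetical account
        role: container
        dns_suffix: cmon.us-east-3b.triton.zone
        endpoint: cmon.us-east-3b.triton.zone
        tls_config:
          cert_file: /etc/prometheus/triton-cert.pem
          key_file: /etc/prometheus/triton-key.pem
    relabel_configs:
      # Use the container alias as the instance label.
      - source_labels: [__meta_triton_machine_alias]
        target_label: instance
```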

<eureka_sd_config>

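Eureka SD configurations allow retrieving scrape targets using the Eureka REST API. Prometheus will periodically check the REST endpoint and create a target for every app instance.

The following meta labels are available on targets during relabeling:

  • __meta_eureka_app_name: the name of the app
  • __meta_eureka_app_instance_id: the ID of the app instance
  • __meta_eureka_app_instance_hostname: the hostname of the instance
  • __meta_eureka_app_instance_homepage_url: the homepage URL of the app instance
  • __meta_eureka_app_instance_statuspage_url: the status page URL of the app instance
  • __meta_eureka_app_instance_healthcheck_url: the health check URL of the app instance
  • __meta_eureka_app_instance_ip_addr: the IP address of the app instance
  • __meta_eureka_app_instance_vip_address: the VIP address of the app instance
  • __meta_eureka_app_instance_secure_vip_address: the secure VIP address of the app instance
  • __meta_eureka_app_instance_status: the status of the app instance
  • __meta_eureka_app_instance_port: the port of the app instance
  • __meta_eureka_app_instance_port_enabled: whether the port of the app instance is enabled
  • __meta_eureka_app_instance_secure_port: the secure port of the app instance
  • __meta_eureka_app_instance_secure_port_enabled: whether the secure port of the app instance is enabled
  • __meta_eureka_app_instance_country_id: the country ID of the app instance
  • __meta_eureka_app_instance_metadata_<metadataname>: app instance metadata
  • __meta_eureka_app_instance_datacenterinfo_name: the datacenter name of the app instance
  • __meta_eureka_app_instance_datacenterinfo_<metadataname>: the datacenter metadata

See below for the configuration options for Eureka discovery: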

# The URL to connect to the Eureka server.
server: <string>

# Refresh interval to re-read the app instance list.
[ refresh_interval: <duration> | default = 30s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

See the Prometheus eureka-sd configuration file for a practical example of how to set up your Eureka app and your Prometheus configuration.
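
As a smaller sketch, the following configuration keeps only instances that Eureka reports as UP and lets an instance override its metrics path through metadata. The server URL, the job name, and the metadata key prometheus.path (seen by relabeling as ..._metadata_prometheus_path) are assumptions for illustration.

```yaml
scrape_configs:
  - job_name: eureka                        # hypothetical job name
    eureka_sd_configs:
      - server: http://eureka.example.com:8761/eureka   # hypothetical server URL
    relabel_configs:
      # Keep only app instances whose status is UP.
      - source_labels: [__meta_eureka_app_instance_status]
        action: keep
        regex: UP
      # If the instance sets metadata "prometheus.path" (assumed key),
      # use it as the metrics path; otherwise leave the default.
      - source_labels: [__meta_eureka_app_instance_metadata_prometheus_path]
        regex: (.+)
        target_label: __metrics_path__
      - source_labels: [__meta_eureka_app_name]
        target_label: app
```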

<scaleway_sd_config>

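Scaleway SD configurations allow retrieving scrape targets from Scaleway instances and baremetal services.

The following meta labels are available on targets during relabeling:

Instance role

  • __meta_scaleway_instance_boot_type: the boot type of the server
  • __meta_scaleway_instance_hostname: the hostname of the server
  • __meta_scaleway_instance_id: the ID of the server
  • __meta_scaleway_instance_image_arch: the architecture of the server image
  • __meta_scaleway_instance_image_id: the ID of the server image
  • __meta_scaleway_instance_image_name: the name of the server image
  • __meta_scaleway_instance_location_cluster_id: the cluster ID of the server location
  • __meta_scaleway_instance_location_hypervisor_id: the hypervisor ID of the server location
  • __meta_scaleway_instance_location_node_id: the node ID of the server location
  • __meta_scaleway_instance_name: the name of the server
  • __meta_scaleway_instance_organization_id: the organization of the server
  • __meta_scaleway_instance_private_ipv4: the private IPv4 address of the server
  • __meta_scaleway_instance_project_id: the project ID of the server
  • __meta_scaleway_instance_public_ipv4: the public IPv4 address of the server
  • __meta_scaleway_instance_public_ipv6: the public IPv6 address of the server
  • __meta_scaleway_instance_public_ipv4_addresses: the list of public IPv4 addresses of the server
  • __meta_scaleway_instance_public_ipv6_addresses: the list of public IPv6 addresses of the server
  • __meta_scaleway_instance_region: the region of the server
  • __meta_scaleway_instance_security_group_id: the ID of the server's security group
  • __meta_scaleway_instance_security_group_name: the name of the server's security group
  • __meta_scaleway_instance_status: the status of the server
  • __meta_scaleway_instance_tags: the list of tags of the server joined by the tag separator
  • __meta_scaleway_instance_type: the commercial type of the server
  • __meta_scaleway_instance_zone: the zone of the server (e.g. fr-par-1; see the complete list here)

This role uses the first address it finds in the following order: private IPv4, public IPv4, public IPv6. This can be changed with relabeling, as demonstrated in the Prometheus scaleway-sd configuration file. Should an instance have no address before relabeling, it will not be added to the target list and you will not be able to relabel it.

Baremetal role

  • __meta_scaleway_baremetal_id: the ID of the server
  • __meta_scaleway_baremetal_public_ipv4: the public IPv4 address of the server
  • __meta_scaleway_baremetal_public_ipv6: the public IPv6 address of the server
  • __meta_scaleway_baremetal_name: the name of the server
  • __meta_scaleway_baremetal_os_name: the name of the server's operating system
  • __meta_scaleway_baremetal_os_version: the version of the server's operating system
  • __meta_scaleway_baremetal_project_id: the project ID of the server
  • __meta_scaleway_baremetal_status: the status of the server
  • __meta_scaleway_baremetal_tags: the list of tags of the server joined by the tag separator
  • __meta_scaleway_baremetal_type: the commercial type of the server
  • __meta_scaleway_baremetal_zone: the zone of the server (e.g. fr-par-1; see the complete list here)

This role uses the public IPv4 address by default. This can be changed with relabeling, as demonstrated in the Prometheus scaleway-sd configuration file.

See below for the configuration options for Scaleway discovery: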

# Access key to use. https://console.scaleway.com/project/credentials
access_key: <string>

# Secret key to use when listing targets. https://console.scaleway.com/project/credentials
# It is mutually exclusive with `secret_key_file`.
[ secret_key: <secret> ]

# Sets the secret key with the credentials read from the configured file.
# It is mutually exclusive with `secret_key`.
[ secret_key_file: <filename> ]

# Project ID of the targets.
project_id: <string>

# Role of the targets to retrieve. Must be `instance` or `baremetal`.
role: <string>

# The port to scrape metrics from.
[ port: <int> | default = 80 ]

# API URL to use when doing the server listing requests.
[ api_url: <string> | default = "https://api.scaleway.com" ]

# Zone is the availability zone of your targets (e.g. fr-par-1).
[ zone: <string> | default = fr-par-1 ]

# NameFilter specify a name filter (works as a LIKE) to apply on the server listing request.
[ name_filter: <string> ]

# TagsFilter specify a tag filter (a server needs to have all defined tags to be listed) to apply on the server listing request.
tags_filter:
[ - <string> ]

# Refresh interval to re-read the targets list.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
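
A minimal sketch of a Scaleway scrape configuration for the instance role that prefers the public IPv4 address over the default address order. The project ID, access key, secret file path, port 9100, and job name are placeholders.

```yaml
scrape_configs:
  - job_name: scaleway                      # hypothetical job name
    scaleway_sd_configs:
      - role: instance
        project_id: 11111111-1111-1111-1111-111111111111   # placeholder
        access_key: SCWXXXXXXXXXXXXXXXXX                   # placeholder
        secret_key_file: /etc/prometheus/scaleway-secret   # hypothetical path
        zone: fr-par-1
    relabel_configs:
      # Scrape the public IPv4 address when the instance has one;
      # (.+) keeps the default address for instances without it.
      - source_labels: [__meta_scaleway_instance_public_ipv4]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9100
      - source_labels: [__meta_scaleway_instance_name]
        target_label: instance
```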

<uyuni_sd_config>

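Uyuni SD configurations allow retrieving scrape targets from managed systems via the Uyuni API.

The following meta labels are available on targets during relabeling:

  • __meta_uyuni_endpoint_name: the name of the application endpoint
  • __meta_uyuni_exporter: the exporter exposing metrics for the target
  • __meta_uyuni_groups: the system groups of the target
  • __meta_uyuni_metrics_path: the metrics path for the target
  • __meta_uyuni_minion_hostname: the hostname of the Uyuni client
  • __meta_uyuni_primary_fqdn: the primary FQDN of the Uyuni client
  • __meta_uyuni_proxy_module: the module name, if an Exporter Exporter proxy is configured for the target
  • __meta_uyuni_scheme: the protocol scheme used for requests
  • __meta_uyuni_system_id: the system ID of the client

See below for the configuration options for Uyuni discovery: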

# The URL to connect to the Uyuni server.
server: <string>

# Credentials are used to authenticate the requests to Uyuni API.
username: <string>
password: <secret>

# The entitlement string to filter eligible systems.
[ entitlement: <string> | default = monitoring_entitled ]

# The string by which Uyuni group names are joined into the groups label.
[ separator: <string> | default = , ]

# Refresh interval to re-read the managed targets list.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

See the Prometheus uyuni-sd configuration file for a practical example of how to set up the Uyuni Prometheus configuration.

<vultr_sd_config>

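Vultr SD configurations allow retrieving scrape targets from Vultr.

This service discovery uses the main IPv4 address by default, which can be changed with relabeling, as demonstrated in the Prometheus vultr-sd configuration file.

The following meta labels are available on targets during relabeling:

  • __meta_vultr_instance_id: A unique ID for the Vultr instance.
  • __meta_vultr_instance_label: The user-supplied label for this instance.
  • __meta_vultr_instance_os: The operating system name.
  • __meta_vultr_instance_os_id: The operating system ID used by this instance.
  • __meta_vultr_instance_region: The ID of the region where the instance is located.
  • __meta_vultr_instance_plan: A unique ID for the plan.
  • __meta_vultr_instance_main_ip: The main IPv4 address.
  • __meta_vultr_instance_internal_ip: The private IP address.
  • __meta_vultr_instance_main_ipv6: The main IPv6 address.
  • __meta_vultr_instance_features: The list of features that are available to the instance.
  • __meta_vultr_instance_tags: The list of tags associated with the instance.
  • __meta_vultr_instance_hostname: The hostname for this instance.
  • __meta_vultr_instance_server_status: The server health status.
  • __meta_vultr_instance_vcpu_count: The number of vCPUs.
  • __meta_vultr_instance_ram_mb: The amount of RAM in MB.
  • __meta_vultr_instance_disk_gb: The size of the disk in GB.
  • __meta_vultr_instance_allowed_bandwidth_gb: The monthly bandwidth quota in GB.

# The port to scrape metrics from.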
[ port: <int> | default = 80 ]

# The time after which the instances are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

<static_config>

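A static_config allows specifying a list of targets and a common label set for them. It is the canonical way to specify static targets in a scrape configuration.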

# The targets specified by the static config.
targets:
  [ - '<host>' ]

# Labels assigned to all metrics scraped from the targets.
labels:
  [ <labelname>: <labelvalue> ... ]
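
For illustration, a static_config inside a scrape configuration might look like the following sketch. The host names, the job name, and the env label are hypothetical.

```yaml
scrape_configs:
  - job_name: node                          # hypothetical job name
    static_configs:
      - targets:
          - "node1.example.com:9100"        # hypothetical targets
          - "node2.example.com:9100"
        labels:
          env: production                   # attached to every series from both targets
```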

<relabel_config>

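Relabeling is a powerful tool to dynamically rewrite the label set of a target before it gets scraped. Multiple relabeling steps can be configured per scrape configuration. They are applied to the label set of each target in order of their appearance in the configuration file.

Initially, aside from the configured per-target labels, a target's job label is set to the job_name value of the respective scrape configuration. The __address__ label is set to the <host>:<port> address of the target. After relabeling, the instance label is set to the value of __address__ by default if it was not set during relabeling.

The __scheme__ and __metrics_path__ labels are set to the scheme and metrics path of the target respectively, as specified in scrape_config.

The __param_<name> label is set to the value of the first passed URL parameter called <name>, as defined in scrape_config.

The __scrape_interval__ and __scrape_timeout__ labels are set to the target's scrape interval and timeout, as specified in scrape_config.

Additional labels prefixed with __meta_ may be available during the relabeling phase. They are set by the service discovery mechanism that provided the target and vary between mechanisms.

Labels starting with __ will be removed from the label set after target relabeling is completed.

If a relabeling step needs to store a label value only temporarily (as the input to a subsequent relabeling step), use the __tmp label name prefix. This prefix is guaranteed to never be used by Prometheus itself.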

# The source_labels tells the rule what labels to fetch from the series. Any
# labels which do not exist get a blank value ("").  Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]

# Separator placed between concatenated source label values.
[ separator: <string> | default = ; ]

# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]

# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]

# Modulus to take of the hash of the source label values.
[ modulus: <int> ]

# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]

# Action to perform based on regex matching.
[ action: <relabel_action> | default = replace ]

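<regex> is any valid RE2 regular expression. It is required for the replace, keep, drop, labelmap, labeldrop and labelkeep actions. The regex is anchored on both ends. To un-anchor the regex, use .*<regex>.*.

<relabel_action> determines the relabeling action to take:

  • replace: Match regex against the concatenated source_labels. Then, set target_label to replacement, with match group references (${1}, ${2}, ...) in replacement substituted by their value. If regex does not match, no replacement takes place.
  • lowercase: Maps the concatenated source_labels to their lower case.
  • uppercase: Maps the concatenated source_labels to their upper case.
  • keep: Drop targets for which regex does not match the concatenated source_labels.
  • drop: Drop targets for which regex matches the concatenated source_labels.
  • keepequal: Drop targets for which the concatenated source_labels do not match target_label.
  • dropequal: Drop targets for which the concatenated source_labels do match target_label.
  • hashmod: Set target_label to the modulus of a hash of the concatenated source_labels.
  • labelmap: Match regex against all source label names, not just those specified in source_labels. Then copy the values of the matching labels to label names given by replacement, with match group references (${1}, ${2}, ...) in replacement substituted by their value.
  • labeldrop: Match regex against all label names. Any label that matches will be removed from the set of labels.
  • labelkeep: Match regex against all label names. Any label that does not match will be removed from the set of labels.

Care must be taken with labeldrop and labelkeep to ensure that metrics are still uniquely labeled once the labels are removed.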
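
Several of these actions can be sketched together in one relabeling pipeline. The example assumes Kubernetes pod discovery purely to have meta labels to work with; the job name, the pod_id label, and the choice of four shards are hypothetical.

```yaml
scrape_configs:
  - job_name: relabel-demo                  # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # replace: the source values are joined with the default ";" separator,
      # so namespace "shop" and pod "api-0" are matched as "shop;api-0"
      # and written to pod_id as "shop/api-0".
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name]
        regex: (.+);(.+)
        target_label: pod_id                # hypothetical label name
        replacement: ${1}/${2}
      # hashmod: store hash(__address__) mod 4 in a temporary label.
      - source_labels: [__address__]
        modulus: 4
        target_label: __tmp_hash
        action: hashmod
      # keep: this server scrapes only the targets in shard 0.
      - source_labels: [__tmp_hash]
        regex: "0"
        action: keep
      # labelmap: copy every pod label to a target label of the same name.
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # lowercase: write the lower-cased namespace to the namespace label.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
        action: lowercase
```

The __tmp_hash label never reaches the stored series, because labels starting with __ are removed once target relabeling completes.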

<metric_relabel_configs>

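Metric relabeling is applied to samples as the last step before ingestion. It has the same configuration format and actions as target relabeling. Metric relabeling does not apply to automatically generated time series such as up.

One use for this is to exclude time series that are too expensive to ingest.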
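
A minimal sketch of that use, assuming a target that exposes Go garbage-collector metrics you do not want to store. The target address, the job name, and the metric name pattern are hypothetical.

```yaml
scrape_configs:
  - job_name: app                           # hypothetical job name
    static_configs:
      - targets: ["app.example.com:8080"]   # hypothetical target
    metric_relabel_configs:
      # Drop every scraped series whose metric name starts with go_gc_
      # before it is ingested.
      - source_labels: [__name__]
        regex: go_gc_.*
        action: drop
```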

<alert_relabel_configs>

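Alert relabeling is applied to alerts before they are sent to the Alertmanager. It has the same configuration format and actions as target relabeling. Alert relabeling is applied after external labels.

One use for this is ensuring that an HA pair of Prometheus servers with different external labels sends identical alerts.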
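
A minimal sketch of the HA case, assuming the two servers differ only in an external label named replica. The label names, values, and Alertmanager address are hypothetical.

```yaml
global:
  external_labels:
    cluster: prod                           # hypothetical shared label
    replica: a                              # "b" on the second server of the pair

alerting:
  alert_relabel_configs:
    # Remove the replica label so both servers send identical alerts,
    # which lets Alertmanager deduplicate them.
    - regex: replica
      action: labeldrop
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager.example.com:9093"]   # hypothetical address
```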

<alertmanager_config>

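An alertmanager_config section specifies the Alertmanager instances the Prometheus server sends alerts to. It also provides parameters to configure how to communicate with these Alertmanagers.

Alertmanagers may be statically configured via the static_configs parameter or dynamically discovered using one of the supported service-discovery mechanisms.

Additionally, relabel_configs allow selecting Alertmanagers from discovered entities and provide advanced modifications to the used API path, which is exposed through the __alerts_path__ label.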

# Per-target Alertmanager timeout when pushing alerts.
[ timeout: <duration> | default = 10s ]

# The api version of Alertmanager.
[ api_version: <string> | default = v2 ]

# Prefix for the HTTP path alerts are pushed to.
[ path_prefix: <path> | default = / ]

# Configures the protocol scheme used for requests.
[ scheme: <scheme> | default = http ]

# Optionally configures AWS's Signature Version 4 signing process to sign requests.
# Cannot be set at the same time as basic_auth, authorization, oauth2, azuread or google_iam.
# To use the default credentials from the AWS SDK, use `sigv4: {}`.
sigv4:
  # The AWS region. If blank, the region from the default credentials chain
  # is used.
  [ region: <string> ]

  # The AWS API keys. If blank, the environment variables `AWS_ACCESS_KEY_ID`
  # and `AWS_SECRET_ACCESS_KEY` are used.
  [ access_key: <string> ]
  [ secret_key: <secret> ]

  # Named AWS profile used to authenticate.
  [ profile: <string> ]

  # AWS Role ARN, an alternative to using AWS API keys.
  [ role_arn: <string> ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]

# List of Eureka service discovery configurations.
eureka_sd_configs:
  [ - <eureka_sd_config> ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

# List of DigitalOcean service discovery configurations.
digitalocean_sd_configs:
  [ - <digitalocean_sd_config> ... ]

# List of Docker service discovery configurations.
docker_sd_configs:
  [ - <docker_sd_config> ... ]

# List of Docker Swarm service discovery configurations.
dockerswarm_sd_configs:
  [ - <dockerswarm_sd_config> ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]

# List of Hetzner service discovery configurations.
hetzner_sd_configs:
  [ - <hetzner_sd_config> ... ]

# List of HTTP service discovery configurations.
http_sd_configs:
  [ - <http_sd_config> ... ]

# List of IONOS service discovery configurations.
ionos_sd_configs:
  [ - <ionos_sd_config> ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

# List of Lightsail service discovery configurations.
lightsail_sd_configs:
  [ - <lightsail_sd_config> ... ]

# List of Linode service discovery configurations.
linode_sd_configs:
  [ - <linode_sd_config> ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]

# List of Nomad service discovery configurations.
nomad_sd_configs:
  [ - <nomad_sd_config> ... ]

# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]

# List of OVHcloud service discovery configurations.
ovhcloud_sd_configs:
  [ - <ovhcloud_sd_config> ... ]

# List of PuppetDB service discovery configurations.
puppetdb_sd_configs:
  [ - <puppetdb_sd_config> ... ]

# List of Scaleway service discovery configurations.
scaleway_sd_configs:
  [ - <scaleway_sd_config> ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]

# List of Uyuni service discovery configurations.
uyuni_sd_configs:
  [ - <uyuni_sd_config> ... ]

# List of Vultr service discovery configurations.
vultr_sd_configs:
  [ - <vultr_sd_config> ... ]

# List of labeled statically configured Alertmanagers.
static_configs:
  [ - <static_config> ... ]

# List of Alertmanager relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]

# List of alert relabel configurations.
alert_relabel_configs:
  [ - <relabel_config> ... ]
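
A minimal sketch of an alertmanager_config entry with two statically configured Alertmanagers served over HTTPS behind a path prefix. The host names and the /alertmanager prefix are hypothetical.

```yaml
alerting:
  alertmanagers:
    - scheme: https
      path_prefix: /alertmanager            # hypothetical prefix
      timeout: 10s
      api_version: v2
      static_configs:
        - targets:
            - alertmanager-0.example.com:9093   # hypothetical hosts
            - alertmanager-1.example.com:9093
```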

<remote_write>

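write_relabel_configs is relabeling applied to samples before sending them to the remote endpoint. Write relabeling is applied after external labels. This could be used to limit which samples are sent.

There is a small demo of how to use this functionality.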
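
As a minimal sketch of limiting which samples are sent, the following remote_write entry forwards only series whose metric name matches an allow-list. The endpoint URL, the config name, and the metric name patterns are hypothetical.

```yaml
remote_write:
  - url: https://remote-storage.example.com/api/v1/write   # hypothetical endpoint
    name: long-term-storage                                # hypothetical name
    remote_timeout: 30s
    write_relabel_configs:
      # Send only series whose metric name starts with node_ or http_;
      # everything else stays local.
      - source_labels: [__name__]
        regex: (node_.*|http_.*)
        action: keep
```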

# The URL of the endpoint to send samples to.
url: <string>

# protobuf message to use when writing to the remote write endpoint.
#
# * The `prometheus.WriteRequest` represents the message introduced in Remote Write 1.0, which
# will be deprecated eventually.
# * The `io.prometheus.write.v2.Request` was introduced in Remote Write 2.0 and replaces the former,
# by improving efficiency and sending metadata, created timestamp and native histograms by default.
#
# Before changing this value, consult with your remote storage provider (or test) what message it supports.
# Read more on https://prometheus.ac.cn/docs/specs/remote_write_spec_2_0/#io-prometheus-write-v2-request
[ protobuf_message: <prometheus.WriteRequest | io.prometheus.write.v2.Request> | default = prometheus.WriteRequest ]

# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]

# Custom HTTP headers to be sent along with each remote write request.
# Be aware that headers that are set by Prometheus itself can't be overwritten.
headers:
  [ <string>: <string> ... ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]

# Name of the remote write config, which if specified must be unique among remote write configs.
# The name will be used in metrics and logging in place of a generated value to help users distinguish between
# remote write configs.
[ name: <string> ]

# Enables sending of exemplars over remote write. Note that exemplar storage itself must be enabled for exemplars to be scraped in the first place.
[ send_exemplars: <boolean> | default = false ]

# Enables sending of native histograms, also known as sparse histograms, over remote write.
# For the `io.prometheus.write.v2.Request` message, this option is noop (always true).
[ send_native_histograms: <boolean> | default = false ]

# When enabled, remote-write will resolve the URL host name via DNS, choose one of the IP addresses at random, and connect to it.
# When disabled, remote-write relies on Go's standard behavior, which is to try to connect to each address in turn.
# The connection timeout applies to the whole operation, i.e. in the latter case it is spread over all attempts.
# This is an experimental feature, and its behavior might still change, or even get removed.
[ round_robin_dns: <boolean> | default = false ]

# Optionally configures AWS's Signature Version 4 signing process to
# sign requests. Cannot be set at the same time as basic_auth, authorization, oauth2, or azuread.
# To use the default credentials from the AWS SDK, use `sigv4: {}`.
sigv4:
  # The AWS region. If blank, the region from the default credentials chain
  # is used.
  [ region: <string> ]

  # The AWS API keys. If blank, the environment variables `AWS_ACCESS_KEY_ID`
  # and `AWS_SECRET_ACCESS_KEY` are used.
  [ access_key: <string> ]
  [ secret_key: <secret> ]

  # Named AWS profile used to authenticate.
  [ profile: <string> ]

  # AWS Role ARN, an alternative to using AWS API keys.
  [ role_arn: <string> ]

# Optional AzureAD configuration.
# Cannot be used at the same time as basic_auth, authorization, oauth2, sigv4 or google_iam.
azuread:
  # The Azure Cloud. Options are 'AzurePublic', 'AzureChina', or 'AzureGovernment'.
  [ cloud: <string> | default = AzurePublic ]

  # Azure User-assigned Managed identity.
  [ managed_identity:
      [ client_id: <string> ] ]

  # Azure OAuth.
  [ oauth:
      [ client_id: <string> ]
      [ client_secret: <string> ]
      [ tenant_id: <string> ] ]

  # Azure SDK auth.
  # See https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication
  [ sdk:
      [ tenant_id: <string> ] ]

# WARNING: Remote write is NOT SUPPORTED by Google Cloud. This configuration is reserved for future use.
# Optional Google Cloud Monitoring configuration.
# Cannot be used at the same time as basic_auth, authorization, oauth2, sigv4 or azuread.
# To use the default credentials from the Google Cloud SDK, use `google_iam: {}`.
google_iam:
  # Service account key with monitoring write permissions.
  credentials_file: <filename>

# Configures the queue used to write to remote storage.
queue_config:
  # Number of samples to buffer per shard before we block reading of more
  # samples from the WAL. It is recommended to have enough capacity in each
  # shard to buffer several requests to keep throughput up while processing
  # occasional slow remote requests.
  [ capacity: <int> | default = 10000 ]
  # Maximum number of shards, i.e. amount of concurrency.
  [ max_shards: <int> | default = 50 ]
  # Minimum number of shards, i.e. amount of concurrency.
  [ min_shards: <int> | default = 1 ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 2000 ]
  # Maximum time a sample will wait for a send. The sample might wait less
  # if the buffer is full. Further time might pass due to potential retries.
  [ batch_send_deadline: <duration> | default = 5s ]
  # Initial retry delay. Gets doubled for every retry.
  [ min_backoff: <duration> | default = 30ms ]
  # Maximum retry delay.
  [ max_backoff: <duration> | default = 5s ]
  # Retry upon receiving a 429 status code from the remote-write storage.
  # This is experimental and might change in the future.
  [ retry_on_http_429: <boolean> | default = false ]
  # If set, any sample that is older than sample_age_limit
  # will not be sent to the remote storage. The default value is 0s,
  # which means that all samples are sent.
  [ sample_age_limit: <duration> | default = 0s ]

# Configures the sending of series metadata to remote storage
# if the `prometheus.WriteRequest` message was chosen. When
# `io.prometheus.write.v2.Request` is used, metadata is always sent.
#
# Metadata configuration is subject to change at any point
# or be removed in future releases.
metadata_config:
  # Whether metric metadata is sent to remote storage or not.
  [ send: <boolean> | default = true ]
  # How frequently metric metadata is sent to remote storage.
  [ send_interval: <duration> | default = 1m ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 500 ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
# enable_http2 defaults to false for remote-write.
[ <http_config> ]
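The following fragment is an illustrative sketch of how several of the settings above fit together. The name and URL are hypothetical, and the queue values simply restate the documented defaults so the retry arithmetic is visible. The comment on the backoff sequence assumes the delay is capped at max_backoff once doubling would exceed it.

```yaml
remote_write:
  - name: "long-term-storage"                               # hypothetical name
    url: "https://remote-storage.example.com/api/v1/write"  # hypothetical endpoint
    remote_timeout: 30s
    queue_config:
      capacity: 10000
      max_shards: 50
      max_samples_per_send: 2000
      batch_send_deadline: 5s
      # The retry delay starts at min_backoff and doubles on every retry:
      # 30ms, 60ms, 120ms, ..., 3.84s, and then stays at max_backoff (5s).
      min_backoff: 30ms
      max_backoff: 5s
      # Samples older than two hours are not sent (0s would send everything).
      sample_age_limit: 2h
```

With these values, capacity holds five full batches per shard (10000 / 2000), which matches the recommendation to buffer several requests per shard.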

There is a list of integrations that support this feature.

<remote_read>

# The URL of the endpoint to query from.
url: <string>

# Name of the remote read config, which if specified must be unique among remote read configs.
# The name will be used in metrics and logging in place of a generated value to help users distinguish between
# remote read configs.
[ name: <string> ]

# An optional list of equality matchers which have to be
# present in a selector to query the remote read endpoint.
required_matchers:
  [ <labelname>: <labelvalue> ... ]

# Timeout for requests to the remote read endpoint.
[ remote_timeout: <duration> | default = 1m ]

# Custom HTTP headers to be sent along with each remote read request.
# Be aware that headers that are set by Prometheus itself can't be overwritten.
headers:
  [ <string>: <string> ... ]

# Whether reads should be made for queries for time ranges that
# the local storage should have complete data for.
[ read_recent: <boolean> | default = false ]

# Whether to use the external labels as selectors for the remote read endpoint.
[ filter_external_labels: <boolean> | default = true ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
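A minimal remote read sketch, assuming a hypothetical endpoint and a hypothetical job label value. With required_matchers set, only queries whose selector contains the equality matcher job="archived" are forwarded to the remote endpoint.

```yaml
remote_read:
  - name: "archive"                                        # hypothetical name
    url: "https://remote-storage.example.com/api/v1/read"  # hypothetical endpoint
    remote_timeout: 1m
    # Only selectors that include job="archived" query this endpoint.
    required_matchers:
      job: "archived"
    # Skip the remote endpoint for time ranges that local storage
    # should already cover completely.
    read_recent: false
```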

There is a list of integrations that support this feature.

<tsdb>

tsdb lets you configure the runtime-reloadable configuration settings of the TSDB.

# Configures how old an out-of-order/out-of-bounds sample can be w.r.t. the TSDB max time.
# An out-of-order/out-of-bounds sample is ingested into the TSDB as long as the timestamp
# of the sample is >= TSDB.MaxTime-out_of_order_time_window.
#
# When out_of_order_time_window is >0, the errors out-of-order and out-of-bounds are
# combined into a single error called 'too-old'; a sample is either (a) ingestible
# into the TSDB, i.e. it is an in-order sample or an out-of-order/out-of-bounds sample
# that is within the out-of-order window, or (b) too-old, i.e. not in-order
# and before the out-of-order window.
#
# When out_of_order_time_window is greater than 0, it also affects the experimental agent. It allows
# the agent's WAL to accept out-of-order samples that fall within the specified time window relative
# to the timestamp of the last appended sample for the same series.
[ out_of_order_time_window: <duration> | default = 0s ]
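As a short worked example of the rule above, the sketch below sets a 30-minute window; the value is arbitrary. It assumes the tsdb block sits under the top-level storage key, as in the upstream configuration layout.

```yaml
storage:
  tsdb:
    # If the TSDB max time is 12:00:00, a sample is ingested as long as its
    # timestamp is >= 11:30:00 (12:00:00 minus 30m). Anything older is
    # rejected as too-old.
    out_of_order_time_window: 30m
```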

<exemplars>

Note that exemplar storage is still considered experimental and must be enabled via --enable-feature=exemplar-storage.

# Configures the maximum size of the circular buffer used to store exemplars for all series. Resizable during runtime.
[ max_exemplars: <int> | default = 100000 ]
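A minimal sketch of resizing the exemplar buffer, assuming the exemplars block sits under the top-level storage key as in the upstream configuration layout. The value 50000 is arbitrary, and the setting only has an effect when Prometheus is started with --enable-feature=exemplar-storage.

```yaml
storage:
  exemplars:
    # One circular buffer shared by all series; it can be resized
    # by reloading the configuration at runtime.
    max_exemplars: 50000
```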

<tracing_config>

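tracing_config configures the export of traces from Prometheus to a tracing backend via the OTLP protocol. Tracing is currently an experimental feature and could change in the future.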

# Client used to export the traces. Options are 'http' or 'grpc'.
[ client_type: <string> | default = grpc ]

# Endpoint to send the traces to. Should be provided in format <host>:<port>.
[ endpoint: <string> ]

# Sets the probability a given trace will be sampled. Must be a float from 0 through 1.
[ sampling_fraction: <float> | default = 0 ]

# If disabled, the client will use a secure connection.
[ insecure: <boolean> | default = false ]

# Key-value pairs to be used as headers associated with gRPC or HTTP requests.
headers:
  [ <string>: <string> ... ]

# Compression key for supported compression types. Supported compression: gzip.
[ compression: <string> ]

# Maximum time the exporter will wait for each batch export.
[ timeout: <duration> | default = 10s ]

# TLS configuration.
tls_config:
  [ <tls_config> ]
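The fragment below sketches a tracing setup, assuming the block is placed under the top-level tracing key and that a collector is reachable at the hypothetical address shown. Because sampling_fraction defaults to 0, no traces are sampled unless it is raised.

```yaml
tracing:
  client_type: grpc
  endpoint: "otel-collector.example.com:4317"  # hypothetical <host>:<port>
  # Sample roughly one trace in ten.
  sampling_fraction: 0.1
  # Plaintext connection; leave this false and set tls_config
  # when the backend requires TLS.
  insecure: true
  compression: gzip
  timeout: 10s
```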
