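Configuration

Prometheus is configured via command-line flags and a configuration file. The command-line flags configure immutable system parameters (such as storage locations and the amount of data to keep on disk and in memory), while the configuration file defines everything related to scraping jobs and their instances, as well as which rule files to load.

To view all available command-line flags, run ./prometheus -h.

Prometheus can reload its configuration at runtime. If the new configuration is not well-formed, the changes are not applied. A configuration reload is triggered by sending a SIGHUP to the Prometheus process or by sending an HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is enabled). This also reloads any configured rule files.

Configuration file

To specify which configuration file to load, use the --config.file flag.

The file is written in YAML format, defined by the scheme described below. Brackets indicate that a parameter is optional. For non-list parameters the value is set to the specified default.

Generic placeholders are defined as follows: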

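  • <boolean>: a boolean that can take the values true or false
  • <duration>: a duration matching the regular expression ((([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?|0), e.g. 1d, 1h30m, 5m, 10s
  • <filename>: a valid path in the current working directory
  • <float>: a floating-point number
  • <host>: a valid string consisting of a hostname or IP address followed by an optional port number
  • <int>: an integer value
  • <labelname>: a string matching the regular expression [a-zA-Z_][a-zA-Z0-9_]*. Any other unsupported character in the source label should be converted to an underscore. For example, the label app.kubernetes.io/name should be written as app_kubernetes_io_name.
  • <labelvalue>: a string of Unicode characters
  • <path>: a valid URL path
  • <scheme>: a string that can take the values http or https
  • <secret>: a regular string that is a secret, such as a password
  • <string>: a regular string
  • <size>: a size in bytes, e.g. 512MB. A unit is required. Supported units: B, KB, MB, GB, TB, PB, EB.
  • <tmpl_string>: a string that is template-expanded before usage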

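Other placeholders are specified separately.

A valid example file is available in the Prometheus source repository.

The global configuration specifies parameters that are valid in all other configuration contexts. They also serve as defaults for other configuration sections.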
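To make the placeholder types above concrete, the following sketch annotates a few settings with the placeholder each value is meant to satisfy. The job name, label value, and target host are hypothetical and chosen only for illustration.

```yaml
# Illustrative values only; names and hosts are assumptions, not defaults.
global:
  scrape_interval: 1m30s                 # <duration>
  body_size_limit: 512MB                 # <size>, a unit is required
  external_labels:
    app_kubernetes_io_name: demo         # <labelname>: <labelvalue>

scrape_configs:
  - job_name: placeholder-demo           # hypothetical job name
    scheme: https                        # <scheme>
    metrics_path: /metrics               # <path>
    honor_labels: false                  # <boolean>
    static_configs:
      - targets: ["example.internal:9100"]   # <host> with optional port
```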

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # The protocols to negotiate during a scrape with the client.
  # Supported values (case sensitive): PrometheusProto, OpenMetricsText0.0.1,
  # OpenMetricsText1.0.0, PrometheusText0.0.4.
  # The default value changes to [ PrometheusProto, OpenMetricsText1.0.0, OpenMetricsText0.0.1, PrometheusText0.0.4 ]
  # when native_histogram feature flag is set.
  [ scrape_protocols: [<string>, ...] | default = [ OpenMetricsText1.0.0, OpenMetricsText0.0.1, PrometheusText0.0.4 ] ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # Offset the rule evaluation timestamp of this particular group by the
  # specified duration into the past to ensure the underlying metrics have
  # been received. Metric availability delays are more likely to occur when
  # Prometheus is running as a remote write target, but can also occur when
  # there are anomalies with scraping.
  [ rule_query_offset: <duration> | default = 0s ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager). 
  # Environment variable references `${var}` or `$var` are replaced according 
  # to the values of the current environment variables. 
  # References to undefined variables are replaced by the empty string.
  # The `$` character can be escaped by using `$$`.
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

  # File to which scrape failures are logged.
  # Reloading the configuration will reopen the file.
  [ scrape_failure_log_file: <string> ]

  # An uncompressed response body larger than this many bytes will cause the
  # scrape to fail. 0 means no limit. Example: 100MB.
  # This is an experimental feature, this behaviour could
  # change or be removed in the future.
  [ body_size_limit: <size> | default = 0 ]

  # Per-scrape limit on the number of scraped samples that will be accepted.
  # If more than this number of samples are present after metric relabeling
  # the entire scrape will be treated as failed. 0 means no limit.
  [ sample_limit: <int> | default = 0 ]

  # Limit on the number of labels that will be accepted per sample. If more
  # than this number of labels are present on any sample post metric-relabeling,
  # the entire scrape will be treated as failed. 0 means no limit.
  [ label_limit: <int> | default = 0 ]

  # Limit on the length (in bytes) of each individual label name. If any label
  # name in a scrape is longer than this number post metric-relabeling, the
  # entire scrape will be treated as failed. Note that label names are UTF-8
  # encoded, and characters can take up to 4 bytes. 0 means no limit.
  [ label_name_length_limit: <int> | default = 0 ]

  # Limit on the length (in bytes) of each individual label value. If any label
  # value in a scrape is longer than this number post metric-relabeling, the
  # entire scrape will be treated as failed. Note that label values are UTF-8
  # encoded, and characters can take up to 4 bytes. 0 means no limit.
  [ label_value_length_limit: <int> | default = 0 ]

  # Limit per scrape config on number of unique targets that will be
  # accepted. If more than this number of targets are present after target
  # relabeling, Prometheus will mark the targets as failed without scraping them.
  # 0 means no limit. This is an experimental feature, this behaviour could
  # change in the future.
  [ target_limit: <int> | default = 0 ]

  # Limit per scrape config on the number of targets dropped by relabeling
  # that will be kept in memory. 0 means no limit.
  [ keep_dropped_targets: <int> | default = 0 ]

  # Specifies the validation scheme for metric and label names. Either blank or
  # "utf8" for full UTF-8 support, or "legacy" for letters, numbers, colons,
  # and underscores.
  [ metric_name_validation_scheme: <string> | default = "utf8" ]

runtime:
  # Configure the Go garbage collector GOGC parameter
  # See: https://tip.golang.org/doc/gc-guide#GOGC
  # Lowering this number increases CPU usage.
  [ gogc: <int> | default = 75 ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# Scrape config files specifies a list of globs. Scrape configs are read from
# all matching files and appended to the list of scrape configs.
scrape_config_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the OTLP receiver feature.
# See https://prometheus.ac.cn/docs/guides/opentelemetry/ for best practices.
otlp:
  [ promote_resource_attributes: [<string>, ...] | default = [ ] ]
  # Configures translation of OTLP metrics when received through the OTLP metrics
  # endpoint. Available values:
  # - "UnderscoreEscapingWithSuffixes" refers to commonly agreed normalization used
  #   by OpenTelemetry in https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/translator/prometheus
  # - "NoUTF8EscapingWithSuffixes" is a mode that relies on UTF-8 support in Prometheus.
  #   It preserves all special characters like dots, but still adds required metric name suffixes
  #   for units and _total, as UnderscoreEscapingWithSuffixes does.
  [ translation_strategy: <string> | default = "UnderscoreEscapingWithSuffixes" ]
  # Enables adding "service.name", "service.namespace" and "service.instance.id"
  # resource attributes to the "target_info" metric, on top of converting
  # them into the "instance" and "job" labels.
  [ keep_identifying_resource_attributes: <boolean> | default = false ]

# Settings related to the remote read feature.
remote_read:
  [ - <remote_read> ... ]

# Storage related settings that are runtime reloadable.
storage:
  [ tsdb: <tsdb> ]
  [ exemplars: <exemplars> ]

# Configures exporting traces.
tracing:
  [ <tracing_config> ]
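As a rough illustration of how the top-level sections fit together, a small configuration file might look like the sketch below. The label value, rule file glob, and target addresses are assumptions made for the example, not recommended values.

```yaml
# Minimal sketch of a configuration file; all concrete values are assumptions.
global:
  scrape_interval: 30s        # default scrape frequency for every job
  scrape_timeout: 10s         # must not exceed scrape_interval
  evaluation_interval: 30s    # how often rules are evaluated
  external_labels:
    cluster: demo             # attached when talking to external systems

rule_files:
  - "rules/*.yml"             # glob; rules are read from all matching files

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
```

Sections that are left out, such as remote_write or storage, simply keep their defaults.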

<scrape_config>

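A scrape_config section specifies a set of targets and parameters describing how to scrape them. In the general case, one scrape configuration specifies a single job. In advanced configurations, this may change.

Targets may be statically configured via the static_configs parameter or dynamically discovered using one of the supported service discovery mechanisms.

Additionally, relabel_configs allow advanced modifications to any target and its labels before scraping.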

# The job name assigned to scraped metrics by default.
job_name: <job_name>

# How frequently to scrape targets from this job.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

# Per-scrape timeout when scraping this job.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]

# The protocols to negotiate during a scrape with the client.
# Supported values (case sensitive): PrometheusProto, OpenMetricsText0.0.1,
# OpenMetricsText1.0.0, PrometheusText0.0.4, PrometheusText1.0.0.
[ scrape_protocols: [<string>, ...] | default = <global_config.scrape_protocols> ]

# Fallback protocol to use if a scrape returns blank, unparseable, or otherwise
# invalid Content-Type.
# Supported values (case sensitive): PrometheusProto, OpenMetricsText0.0.1,
# OpenMetricsText1.0.0, PrometheusText0.0.4, PrometheusText1.0.0.
[ fallback_scrape_protocol: <string> ]

# Whether to scrape a classic histogram, even if it is also exposed as a native
# histogram (has no effect without --enable-feature=native-histograms).
[ always_scrape_classic_histograms: <boolean> | default = false ]

# The HTTP resource path on which to fetch metrics from targets.
[ metrics_path: <path> | default = /metrics ]

# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels.
#
# Setting honor_labels to "true" is useful for use cases such as federation and
# scraping the Pushgateway, where all labels specified in the target should be
# preserved.
#
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.
[ honor_labels: <boolean> | default = false ]

# honor_timestamps controls whether Prometheus respects the timestamps present
# in scraped data.
#
# If honor_timestamps is set to "true", the timestamps of the metrics exposed
# by the target will be used.
#
# If honor_timestamps is set to "false", the timestamps of the metrics exposed
# by the target will be ignored.
[ honor_timestamps: <boolean> | default = true ]

# track_timestamps_staleness controls whether Prometheus tracks staleness of
# the metrics that have an explicit timestamps present in scraped data.
#
# If track_timestamps_staleness is set to "true", a staleness marker will be
# inserted in the TSDB when a metric is no longer present or the target
# is down.
[ track_timestamps_staleness: <boolean> | default = false ]

# Configures the protocol scheme used for requests.
[ scheme: <scheme> | default = http ]

# Optional HTTP URL parameters.
params:
  [ <string>: [<string>, ...] ]

# If enable_compression is set to "false", Prometheus will request uncompressed
# response from the scraped target.
[ enable_compression: <boolean> | default = true ]

# File to which scrape failures are logged.
# Reloading the configuration will reopen the file.
[ scrape_failure_log_file: <string> ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]

# List of DigitalOcean service discovery configurations.
digitalocean_sd_configs:
  [ - <digitalocean_sd_config> ... ]

# List of Docker service discovery configurations.
docker_sd_configs:
  [ - <docker_sd_config> ... ]

# List of Docker Swarm service discovery configurations.
dockerswarm_sd_configs:
  [ - <dockerswarm_sd_config> ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]

# List of Eureka service discovery configurations.
eureka_sd_configs:
  [ - <eureka_sd_config> ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]

# List of Hetzner service discovery configurations.
hetzner_sd_configs:
  [ - <hetzner_sd_config> ... ]

# List of HTTP service discovery configurations.
http_sd_configs:
  [ - <http_sd_config> ... ]


# List of IONOS service discovery configurations.
ionos_sd_configs:
  [ - <ionos_sd_config> ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

# List of Kuma service discovery configurations.
kuma_sd_configs:
  [ - <kuma_sd_config> ... ]

# List of Lightsail service discovery configurations.
lightsail_sd_configs:
  [ - <lightsail_sd_config> ... ]

# List of Linode service discovery configurations.
linode_sd_configs:
  [ - <linode_sd_config> ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]

# List of Nomad service discovery configurations.
nomad_sd_configs:
  [ - <nomad_sd_config> ... ]

# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]

# List of OVHcloud service discovery configurations.
ovhcloud_sd_configs:
  [ - <ovhcloud_sd_config> ... ]

# List of PuppetDB service discovery configurations.
puppetdb_sd_configs:
  [ - <puppetdb_sd_config> ... ]

# List of Scaleway service discovery configurations.
scaleway_sd_configs:
  [ - <scaleway_sd_config> ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]

# List of Uyuni service discovery configurations.
uyuni_sd_configs:
  [ - <uyuni_sd_config> ... ]

# List of labeled statically configured targets for this job.
static_configs:
  [ - <static_config> ... ]

# List of target relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]

# List of metric relabel configurations.
metric_relabel_configs:
  [ - <relabel_config> ... ]

# An uncompressed response body larger than this many bytes will cause the
# scrape to fail. 0 means no limit. Example: 100MB.
# This is an experimental feature, this behaviour could
# change or be removed in the future.
[ body_size_limit: <size> | default = 0 ]

# Per-scrape limit on the number of scraped samples that will be accepted.
# If more than this number of samples are present after metric relabeling
# the entire scrape will be treated as failed. 0 means no limit.
[ sample_limit: <int> | default = 0 ]

# Limit on the number of labels that will be accepted per sample. If more
# than this number of labels are present on any sample post metric-relabeling,
# the entire scrape will be treated as failed. 0 means no limit.
[ label_limit: <int> | default = 0 ]

# Limit on the length (in bytes) of each individual label name. If any label
# name in a scrape is longer than this number post metric-relabeling, the
# entire scrape will be treated as failed. Note that label names are UTF-8
# encoded, and characters can take up to 4 bytes. 0 means no limit.
[ label_name_length_limit: <int> | default = 0 ]

# Limit on the length (in bytes) of each individual label value. If any label
# value in a scrape is longer than this number post metric-relabeling, the
# entire scrape will be treated as failed. Note that label values are UTF-8
# encoded, and characters can take up to 4 bytes. 0 means no limit.
[ label_value_length_limit: <int> | default = 0 ]

# Limit per scrape config on number of unique targets that will be
# accepted. If more than this number of targets are present after target
# relabeling, Prometheus will mark the targets as failed without scraping them.
# 0 means no limit. This is an experimental feature, this behaviour could
# change in the future.
[ target_limit: <int> | default = 0 ]

# Limit per scrape config on the number of targets dropped by relabeling
# that will be kept in memory. 0 means no limit.
[ keep_dropped_targets: <int> | default = 0 ]

# Specifies the validation scheme for metric and label names. Either blank or 
# "utf8" for full UTF-8 support, or "legacy" for letters, numbers, colons, and
# underscores.
[ metric_name_validation_scheme: <string> | default = "utf8" ]

# Limit on total number of positive and negative buckets allowed in a single
# native histogram. The resolution of a histogram with more buckets will be
# reduced until the number of buckets is within the limit. If the limit cannot
# be reached, the scrape will fail.
# 0 means no limit.
[ native_histogram_bucket_limit: <int> | default = 0 ]

# Lower limit for the growth factor of one bucket to the next in each native
# histogram. The resolution of a histogram with a lower growth factor will be
# reduced as much as possible until it is within the limit.
# To set an upper limit for the schema (equivalent to "scale" in OTel's
# exponential histograms), use the following factor limits:
# 
# +----------------------------+----------------------------+
# |        growth factor       | resulting schema AKA scale |
# +----------------------------+----------------------------+
# |          65536             |             -4             |
# +----------------------------+----------------------------+
# |            256             |             -3             |
# +----------------------------+----------------------------+
# |             16             |             -2             |
# +----------------------------+----------------------------+
# |              4             |             -1             |
# +----------------------------+----------------------------+
# |              2             |              0             |
# +----------------------------+----------------------------+
# |              1.4           |              1             |
# +----------------------------+----------------------------+
# |              1.1           |              2             |
# +----------------------------+----------------------------+
# |              1.09          |              3             |
# +----------------------------+----------------------------+
# |              1.04          |              4             |
# +----------------------------+----------------------------+
# |              1.02          |              5             |
# +----------------------------+----------------------------+
# |              1.01          |              6             |
# +----------------------------+----------------------------+
# |              1.005         |              7             |
# +----------------------------+----------------------------+
# |              1.002         |              8             |
# +----------------------------+----------------------------+
# 
# 0 results in the smallest supported factor (which is currently ~1.0027 or
# schema 8, but might change in the future).
[ native_histogram_min_bucket_factor: <float> | default = 0 ]

Here, <job_name> must be unique across all scrape configurations.
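The sketch below shows two scrape configurations side by side: one that sets honor_labels for a Pushgateway, as suggested in the honor_labels comment above, and one that overrides a few per-job settings. Host names, the sample limit, and the env label are hypothetical.

```yaml
scrape_configs:
  # Keep the labels exposed by the Pushgateway instead of renaming them
  # to exported_<label>.
  - job_name: pushgateway
    honor_labels: true
    scrape_interval: 15s
    static_configs:
      - targets: ["pushgateway.example.internal:9091"]   # hypothetical host

  # Per-job overrides of global defaults plus a static target label.
  - job_name: node
    metrics_path: /metrics
    scheme: http
    sample_limit: 50000          # scrape fails if more samples remain after relabeling
    static_configs:
      - targets: ["node1.example.internal:9100"]         # hypothetical host
        labels:
          env: staging
```

Each job_name appears only once, which satisfies the uniqueness requirement.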

<http_config>

An http_config allows configuring HTTP requests.

# Sets the `Authorization` header on every request with the
# configured username and password.
# username and username_file are mutually exclusive.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ username_file: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every request with
# the configured credentials.
authorization:
  # Sets the authentication type of the request.
  [ type: <string> | default = Bearer ]
  # Sets the credentials of the request. It is mutually exclusive with
  # `credentials_file`.
  [ credentials: <secret> ]
  # Sets the credentials of the request with the credentials read from the
  # configured file. It is mutually exclusive with `credentials`.
  [ credentials_file: <filename> ]

# Optional OAuth 2.0 configuration.
# Cannot be used at the same time as basic_auth or authorization.
oauth2:
  [ <oauth2> ]

# Configure whether requests follow HTTP 3xx redirects.
[ follow_redirects: <boolean> | default = true ]

# Whether to enable HTTP2.
[ enable_http2: <boolean> | default = true ]

# Configures the request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]
# Comma-separated string that can contain IPs, CIDR notation, domain names
# that should be excluded from proxying. IP and domain names can
# contain port numbers.
[ no_proxy: <string> ]
# Use proxy URL indicated by environment variables (HTTP_PROXY, http_proxy, HTTPS_PROXY, https_proxy, NO_PROXY, and no_proxy)
[ proxy_from_environment: <boolean> | default = false ]
# Specifies headers to send to proxies during CONNECT requests.
[ proxy_connect_header:
  [ <string>: [<secret>, ...] ] ]

# Custom HTTP headers to be sent along with each request.
# Headers that are set by Prometheus itself can't be overwritten.
http_headers:
  # Header name.
  [ <string>:
    # Header values.
    [ values: [<string>, ...] ]
    # Header values. Hidden in configuration page.
    [ secrets: [<secret>, ...] ]
    # Files to read header values from.
    [ files: [<string>, ...] ] ]
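Because the http_config fields are placed directly inside the section that uses them, a scrape job with basic authentication and a custom header could be sketched as follows. The file paths, user name, header name, and target are assumptions for illustration.

```yaml
scrape_configs:
  - job_name: secured-app                 # hypothetical job
    scheme: https
    basic_auth:
      username: prometheus
      # password and password_file are mutually exclusive; the file keeps
      # the secret out of the main configuration.
      password_file: /etc/prometheus/secrets/app-password
    follow_redirects: true
    enable_http2: true
    tls_config:
      ca_file: /etc/prometheus/tls/ca.crt
    http_headers:
      X-Scrape-Source:                    # hypothetical header name
        values: ["prometheus"]
    static_configs:
      - targets: ["app.example.internal:8443"]
```

Note that basic_auth, authorization, and oauth2 cannot be combined in the same http_config.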

<tls_config>

A tls_config allows configuring TLS connections.

# CA certificate to validate API server certificate with. At most one of ca and ca_file is allowed.
[ ca: <string> ]
[ ca_file: <filename> ]

# Certificate and key for client cert authentication to the server.
# At most one of cert and cert_file is allowed.
# At most one of key and key_file is allowed.
[ cert: <string> ]
[ cert_file: <filename> ]
[ key: <secret> ]
[ key_file: <filename> ]

# ServerName extension to indicate the name of the server.
# https://tools.ietf.org/html/rfc4366#section-3.1
[ server_name: <string> ]

# Disable validation of the server certificate.
[ insecure_skip_verify: <boolean> ]

# Minimum acceptable TLS version. Accepted values: TLS10 (TLS 1.0), TLS11 (TLS
# 1.1), TLS12 (TLS 1.2), TLS13 (TLS 1.3).
# If unset, Prometheus will use Go default minimum version, which is TLS 1.2.
# See MinVersion in https://pkg.go.dev/crypto/tls#Config.
[ min_version: <string> ]
# Maximum acceptable TLS version. Accepted values: TLS10 (TLS 1.0), TLS11 (TLS
# 1.1), TLS12 (TLS 1.2), TLS13 (TLS 1.3).
# If unset, Prometheus will use Go default maximum version, which is TLS 1.3.
# See MaxVersion in https://pkg.go.dev/crypto/tls#Config.
[ max_version: <string> ]
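A mutual TLS setup, where Prometheus presents a client certificate and validates the server against a private CA, might be sketched like this. All paths, the server name, and the target address are hypothetical.

```yaml
scrape_configs:
  - job_name: mtls-app                    # hypothetical job
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/tls/ca.crt         # CA used to verify the server
      cert_file: /etc/prometheus/tls/client.crt   # client certificate
      key_file: /etc/prometheus/tls/client.key    # client private key
      # Verify the certificate against this name, since the target is
      # addressed by IP.
      server_name: app.example.internal
      min_version: TLS12
    static_configs:
      - targets: ["10.0.0.12:8443"]
```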

<oauth2>

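OAuth 2.0 authentication using the client credentials grant type. Prometheus fetches an access token from the specified endpoint using the given client ID and client secret.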

client_id: <string>
[ client_secret: <secret> ]

# Read the client secret from a file.
# It is mutually exclusive with `client_secret`.
[ client_secret_file: <filename> ]

# Scopes for the token request.
scopes:
  [ - <string> ... ]

# The URL to fetch the token from.
token_url: <string>

# Optional parameters to append to the token URL.
endpoint_params:
  [ <string>: <string> ... ]

# Configures the token request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]
# Comma-separated string that can contain IPs, CIDR notation, domain names
# that should be excluded from proxying. IP and domain names can
# contain port numbers.
[ no_proxy: <string> ]
# Use proxy URL indicated by environment variables (HTTP_PROXY, http_proxy, HTTPS_PROXY, https_proxy, NO_PROXY, and no_proxy)
[ proxy_from_environment: <boolean> | default = false ]
# Specifies headers to send to proxies during CONNECT requests.
[ proxy_connect_header:
  [ <string>: [<secret>, ...] ] ]

# Custom HTTP headers to be sent along with each request.
# Headers that are set by Prometheus itself can't be overwritten.
http_headers:
  # Header name.
  [ <string>:
    # Header values.
    [ values: [<string>, ...] ]
    # Header values. Hidden in configuration page.
    [ secrets: [<secret>, ...] ]
    # Files to read header values from.
    [ files: [<string>, ...] ] ]
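A scrape job that obtains its token through the client credentials grant could look roughly like the sketch below. The client ID, token URL, scope, and audience parameter are invented for the example and depend entirely on the identity provider in use.

```yaml
scrape_configs:
  - job_name: oauth-app                   # hypothetical job
    scheme: https
    oauth2:
      client_id: prometheus-scraper       # assumed client ID
      # Mutually exclusive with client_secret.
      client_secret_file: /etc/prometheus/secrets/oauth-client-secret
      token_url: https://auth.example.internal/oauth2/token
      scopes:
        - metrics.read                    # assumed scope name
      endpoint_params:
        audience: metrics-api             # assumed provider-specific parameter
    static_configs:
      - targets: ["app.example.internal:8443"]
```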

<azure_sd_config>

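Azure SD configurations allow retrieving scrape targets from Azure VMs.

Discovery requires at least the following permissions:

  • Microsoft.Compute/virtualMachines/read: required for VM discovery
  • Microsoft.Network/networkInterfaces/read: required for VM discovery
  • Microsoft.Compute/virtualMachineScaleSets/virtualMachines/read: required for scale set (VMSS) discovery
  • Microsoft.Compute/virtualMachineScaleSets/virtualMachines/networkInterfaces/read: required for scale set (VMSS) discovery

The following meta labels are available on targets during relabeling:

  • __meta_azure_machine_id: the machine ID
  • __meta_azure_machine_location: the location the machine runs in
  • __meta_azure_machine_name: the machine name
  • __meta_azure_machine_computer_name: the machine computer name
  • __meta_azure_machine_os_type: the machine operating system
  • __meta_azure_machine_private_ip: the machine's private IP
  • __meta_azure_machine_public_ip: the machine's public IP, if it exists
  • __meta_azure_machine_resource_group: the machine's resource group
  • __meta_azure_machine_tag_<tagname>: each tag value of the machine
  • __meta_azure_machine_scale_set: the name of the scale set the VM belongs to (only set when scale sets are used)
  • __meta_azure_machine_size: the machine size
  • __meta_azure_subscription_id: the subscription ID
  • __meta_azure_tenant_id: the tenant ID

See below for the configuration options for Azure discovery: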

# The information to access the Azure API.
# The Azure environment.
[ environment: <string> | default = AzurePublicCloud ]

# The authentication method, either OAuth, ManagedIdentity or SDK.
# See https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview
# SDK authentication method uses environment variables by default.
# See https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication
[ authentication_method: <string> | default = OAuth ]
# The subscription ID. Always required.
subscription_id: <string>
# Optional tenant ID. Only required with authentication_method OAuth.
[ tenant_id: <string> ]
# Optional client ID. Only required with authentication_method OAuth.
[ client_id: <string> ]
# Optional client secret. Only required with authentication_method OAuth.
[ client_secret: <secret> ]

# Optional resource group name. Limits discovery to this resource group.
[ resource_group: <string> ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 300s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
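As a sketch of how these options combine, the job below discovers VMs in one resource group using a managed identity and copies the machine name into the instance label. The subscription ID is a placeholder, and the resource group name and port are assumptions.

```yaml
scrape_configs:
  - job_name: azure-vms                   # hypothetical job
    azure_sd_configs:
      - subscription_id: 00000000-0000-0000-0000-000000000000   # placeholder
        # With ManagedIdentity, tenant_id, client_id and client_secret
        # are not required.
        authentication_method: ManagedIdentity
        resource_group: monitoring-demo   # assumed resource group
        refresh_interval: 300s
        port: 9100                        # used with the private IP
    relabel_configs:
      # Use the VM name as the instance label instead of IP:port.
      - source_labels: [__meta_azure_machine_name]
        target_label: instance
```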

<consul_sd_config>

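Consul SD configurations allow retrieving scrape targets from Consul's Catalog API.

The following meta labels are available on targets during relabeling:

  • __meta_consul_address: the address of the target
  • __meta_consul_dc: the datacenter name for the target
  • __meta_consul_health: the health status of the service
  • __meta_consul_partition: the admin partition name where the service is registered
  • __meta_consul_metadata_<key>: each node metadata key value of the target
  • __meta_consul_node: the node name defined for the target
  • __meta_consul_service_address: the service address of the target
  • __meta_consul_service_id: the service ID of the target
  • __meta_consul_service_metadata_<key>: each service metadata key value of the target
  • __meta_consul_service_port: the service port of the target
  • __meta_consul_service: the name of the service the target belongs to
  • __meta_consul_tagged_address_<key>: each node tagged address key value of the target
  • __meta_consul_tags: the list of tags of the target, joined by the tag separator

# The information to access the Consul API. It is to be defined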
# as the Consul documentation requires.
[ server: <host> | default = "localhost:8500" ]
# Prefix for URIs for when consul is behind an API gateway (reverse proxy).
[ path_prefix: <string> ]
[ token: <secret> ]
[ datacenter: <string> ]
# Namespaces are only supported in Consul Enterprise.
[ namespace: <string> ]
# Admin Partitions are only supported in Consul Enterprise.
[ partition: <string> ]
[ scheme: <string> | default = "http" ]
# The username and password fields are deprecated in favor of the basic_auth configuration.
[ username: <string> ]
[ password: <secret> ]

# A list of services for which targets are retrieved. If omitted, all services
# are scraped.
services:
  [ - <string> ]

# A Consul Filter expression used to filter the catalog results
# See https://www.consul.io/api-docs/catalog#list-services to know more
# about the filter expressions that can be used.
[ filter: <string> ]

# The `tags` and `node_meta` fields are deprecated in Consul in favor of `filter`.
# An optional list of tags used to filter nodes for a given service. Services must contain all tags in the list.
tags:
  [ - <string> ]

# Node metadata key/value pairs to filter nodes for a given service. As of Consul 1.14, consider `filter` instead.
[ node_meta:
  [ <string>: <string> ... ] ]

# The string by which Consul tags are joined into the tag label.
[ tag_separator: <string> | default = , ]

# Allow stale Consul results (see https://www.consul.io/api/features/consistency.html). Will reduce load on Consul.
[ allow_stale: <boolean> | default = true ]

# The time after which the provided names are refreshed.
# On large setup it might be a good idea to increase this value because the catalog will change all the time.
[ refresh_interval: <duration> | default = 30s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

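Note that the IP address and port used to scrape the targets are assembled as <__meta_consul_address>:<__meta_consul_service_port>. However, in some Consul setups, the relevant address is in __meta_consul_service_address. In those cases, you can use the relabel feature to replace the special __address__ label.

The relabeling phase is the preferred and more powerful way to filter services based on arbitrary labels. For users with thousands of services, it can be more efficient to use the Consul API directly, which has basic support for filtering nodes (currently by node metadata and a single tag).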
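The address replacement described above can be sketched with a single relabel rule. The service name is hypothetical, and the rule assumes that an empty service address should leave the default __address__ untouched.

```yaml
scrape_configs:
  - job_name: consul-services             # hypothetical job
    consul_sd_configs:
      - server: "localhost:8500"
        services: ["web"]                 # assumed service name
    relabel_configs:
      # Join service address and port as "<address>:<port>". The regex is
      # fully anchored and requires a non-empty address, so targets without
      # a service address keep the default
      # <__meta_consul_address>:<__meta_consul_service_port>.
      - source_labels: [__meta_consul_service_address, __meta_consul_service_port]
        separator: ":"
        regex: '(.+):(\d+)'
        target_label: __address__
        replacement: '$1:$2'
```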

<digitalocean_sd_config>

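DigitalOcean SD configurations allow retrieving scrape targets from DigitalOcean's Droplets API. This service discovery uses the public IPv4 address by default, but that can be changed with relabeling, as demonstrated in the Prometheus digitalocean-sd configuration file.

The following meta labels are available on targets during relabeling:

  • __meta_digitalocean_droplet_id: the ID of the droplet
  • __meta_digitalocean_droplet_name: the name of the droplet
  • __meta_digitalocean_image: the slug of the droplet's image
  • __meta_digitalocean_image_name: the display name of the droplet's image
  • __meta_digitalocean_private_ipv4: the private IPv4 address of the droplet
  • __meta_digitalocean_public_ipv4: the public IPv4 address of the droplet
  • __meta_digitalocean_public_ipv6: the public IPv6 address of the droplet
  • __meta_digitalocean_region: the region of the droplet
  • __meta_digitalocean_size: the size of the droplet
  • __meta_digitalocean_status: the status of the droplet
  • __meta_digitalocean_features: the comma-separated list of features of the droplet
  • __meta_digitalocean_tags: the comma-separated list of tags of the droplet
  • __meta_digitalocean_vpc: the ID of the droplet's VPC

# The port to scrape metrics from.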
[ port: <int> | default = 80 ]

# The time after which the droplets are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
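Switching from the default public IPv4 address to the private one can be sketched with one relabel rule. The token file path and the exporter port are assumptions.

```yaml
scrape_configs:
  - job_name: droplets                    # hypothetical job
    digitalocean_sd_configs:
      - authorization:
          # API token read from a file (assumed path).
          credentials_file: /etc/prometheus/secrets/do-token
        port: 9100
        refresh_interval: 60s
    relabel_configs:
      # Scrape over the private network when a private IPv4 address exists;
      # droplets without one keep the default public address.
      - source_labels: [__meta_digitalocean_private_ipv4]
        regex: '(.+)'
        target_label: __address__
        replacement: '$1:9100'
```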

<docker_sd_config>

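Docker SD configurations allow retrieving scrape targets from Docker Engine hosts.

This SD discovers "containers" and creates a target for each network IP and port the container is configured to expose.

Available meta labels:

  • __meta_docker_container_id: the ID of the container
  • __meta_docker_container_name: the name of the container
  • __meta_docker_container_network_mode: the network mode of the container
  • __meta_docker_container_label_<labelname>: each label of the container, with any unsupported characters converted to an underscore
  • __meta_docker_network_id: the ID of the network
  • __meta_docker_network_name: the name of the network
  • __meta_docker_network_ingress: whether the network is ingress
  • __meta_docker_network_internal: whether the network is internal
  • __meta_docker_network_label_<labelname>: each label of the network, with any unsupported characters converted to an underscore
  • __meta_docker_network_scope: the scope of the network
  • __meta_docker_network_ip: the IP of the container in this network
  • __meta_docker_port_private: the port on the container
  • __meta_docker_port_public: the external port, if a port mapping exists
  • __meta_docker_port_public_ip: the public IP, if a port mapping exists

See below for the configuration options for Docker discovery: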

# Address of the Docker daemon.
host: <string>

# The port to scrape metrics from, when `role` is nodes, and for discovered
# tasks and services that don't have published ports.
[ port: <int> | default = 80 ]

# The host to use if the container is in host networking mode.
[ host_networking_host: <string> | default = "localhost" ]

# Sort all non-nil networks in ascending order based on network name and
# get the first network if the container has multiple networks defined, 
# thus avoiding collecting duplicate targets.
[ match_first_network: <boolean> | default = true ]

# Optional filters to limit the discovery process to a subset of available
# resources.
# The available filters are listed in the upstream documentation:
# https://docs.docker.net.cn/engine/api/v1.40/#operation/ContainerList
[ filters:
  [ - name: <string>
      values: <string>, [...] ]

# The time after which the containers are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

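The relabeling phase is the preferred and more powerful way to filter containers. For users with thousands of containers it can be more efficient to use the Docker API directly, which has basic support for filtering containers (using filters).

For a detailed example of configuring Prometheus for Docker Engine, see this example Prometheus configuration file.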
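
A minimal sketch of relabel-based filtering for Docker discovery follows. It assumes the Docker daemon is reachable on the local Unix socket and that containers to be scraped carry a container label named `prometheus-job`; both the socket path and the label name are illustrative assumptions.

```yaml
scrape_configs:
  - job_name: docker-containers            # hypothetical job name
    docker_sd_configs:
      - host: unix:///var/run/docker.sock  # assumed local daemon socket
        refresh_interval: 60s
    relabel_configs:
      # Keep only containers that carry the (hypothetical) label "prometheus-job".
      # The "-" in the label name is converted to "_" in the meta label.
      - source_labels: [__meta_docker_container_label_prometheus_job]
        regex: .+
        action: keep
      # Use the value of that container label as the job name of the target.
      - source_labels: [__meta_docker_container_label_prometheus_job]
        target_label: job
```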

<dockerswarm_sd_config>

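Docker Swarm SD configurations allow retrieving scrape targets from the Docker Swarm engine.

One of the following roles can be configured to discover targets:

services

The services role discovers all Swarm services and exposes their ports as targets. For each published port of a service, a single target is generated. If a service has no published ports, a target per service is created using the port parameter defined in the SD configuration.

Available meta labels:

  • __meta_dockerswarm_service_id: the ID of the service
  • __meta_dockerswarm_service_name: the name of the service
  • __meta_dockerswarm_service_mode: the mode of the service
  • __meta_dockerswarm_service_endpoint_port_name: the name of the endpoint port, if available
  • __meta_dockerswarm_service_endpoint_port_publish_mode: the publish mode of the endpoint port
  • __meta_dockerswarm_service_label_<labelname>: each label of the service, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_service_task_container_hostname: the container hostname of the target, if available
  • __meta_dockerswarm_service_task_container_image: the container image of the target
  • __meta_dockerswarm_service_updating_status: the status of the service, if available
  • __meta_dockerswarm_network_id: the ID of the network
  • __meta_dockerswarm_network_name: the name of the network
  • __meta_dockerswarm_network_ingress: whether the network is ingress
  • __meta_dockerswarm_network_internal: whether the network is internal
  • __meta_dockerswarm_network_label_<labelname>: each label of the network, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_network_scope: the scope of the network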

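tasks

The tasks role discovers all Swarm tasks and exposes their ports as targets. For each published port of a task, a single target is generated. If a task has no published ports, a target per task is created using the port parameter defined in the SD configuration.

Available meta labels:

  • __meta_dockerswarm_container_label_<labelname>: each label of the container, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_task_id: the ID of the task
  • __meta_dockerswarm_task_container_id: the container ID of the task
  • __meta_dockerswarm_task_desired_state: the desired state of the task
  • __meta_dockerswarm_task_slot: the slot of the task
  • __meta_dockerswarm_task_state: the state of the task
  • __meta_dockerswarm_task_port_publish_mode: the publish mode of the task port
  • __meta_dockerswarm_service_id: the ID of the service
  • __meta_dockerswarm_service_name: the name of the service
  • __meta_dockerswarm_service_mode: the mode of the service
  • __meta_dockerswarm_service_label_<labelname>: each label of the service, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_network_id: the ID of the network
  • __meta_dockerswarm_network_name: the name of the network
  • __meta_dockerswarm_network_ingress: whether the network is ingress
  • __meta_dockerswarm_network_internal: whether the network is internal
  • __meta_dockerswarm_network_label_<labelname>: each label of the network, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_network_label: each label of the network, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_network_scope: the scope of the network
  • __meta_dockerswarm_node_id: the ID of the node
  • __meta_dockerswarm_node_hostname: the hostname of the node
  • __meta_dockerswarm_node_address: the address of the node
  • __meta_dockerswarm_node_availability: the availability of the node
  • __meta_dockerswarm_node_label_<labelname>: each label of the node, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_node_platform_architecture: the architecture of the node
  • __meta_dockerswarm_node_platform_os: the operating system of the node
  • __meta_dockerswarm_node_role: the role of the node
  • __meta_dockerswarm_node_status: the status of the node

The __meta_dockerswarm_network_* meta labels are not populated for ports that are published with mode=host.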

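nodes

The nodes role is used to discover Swarm nodes.

Available meta labels:

  • __meta_dockerswarm_node_address: the address of the node
  • __meta_dockerswarm_node_availability: the availability of the node
  • __meta_dockerswarm_node_engine_version: the version of the node engine
  • __meta_dockerswarm_node_hostname: the hostname of the node
  • __meta_dockerswarm_node_id: the ID of the node
  • __meta_dockerswarm_node_label_<labelname>: each label of the node, with any unsupported characters converted to an underscore
  • __meta_dockerswarm_node_manager_address: the address of the manager component of the node
  • __meta_dockerswarm_node_manager_leader: the leadership status of the manager component of the node (true or false)
  • __meta_dockerswarm_node_manager_reachability: the reachability of the manager component of the node
  • __meta_dockerswarm_node_platform_architecture: the architecture of the node
  • __meta_dockerswarm_node_platform_os: the operating system of the node
  • __meta_dockerswarm_node_role: the role of the node
  • __meta_dockerswarm_node_status: the status of the node

See below for the configuration options for Docker Swarm discovery: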

# Address of the Docker daemon.
host: <string>

# Role of the targets to retrieve. Must be `services`, `tasks`, or `nodes`.
role: <string>

# The port to scrape metrics from, when `role` is nodes, and for discovered
# tasks and services that don't have published ports.
[ port: <int> | default = 80 ]

# Optional filters to limit the discovery process to a subset of available
# resources.
# The available filters are listed in the upstream documentation:
# Services: https://docs.docker.net.cn/engine/api/v1.40/#operation/ServiceList
# Tasks: https://docs.docker.net.cn/engine/api/v1.40/#operation/TaskList
# Nodes: https://docs.docker.net.cn/engine/api/v1.40/#operation/NodeList
[ filters:
  [ - name: <string>
      values: <string>, [...] ]

# The time after which the service discovery data is refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

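The relabeling phase is the preferred and more powerful way to filter tasks, services or nodes. For users with thousands of tasks it can be more efficient to use the Swarm API directly, which has basic support for filtering nodes (using filters).

For a detailed example of configuring Prometheus for Docker Swarm, see this example Prometheus configuration file.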
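
The sketch below shows one way the tasks role might be combined with relabeling so that only tasks meant to be running are scraped. The socket path, the job name, the fallback port and the desired-state value `running` are assumptions made for illustration.

```yaml
scrape_configs:
  - job_name: swarm-tasks                  # hypothetical job name
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock  # assumed local daemon socket
        role: tasks
        port: 9100                         # used for tasks without published ports
    relabel_configs:
      # Keep only tasks whose desired state is "running" (assumed value).
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
      # Expose the Swarm service name as a regular target label.
      - source_labels: [__meta_dockerswarm_service_name]
        target_label: service
```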

<dns_sd_config>

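A DNS-based service discovery configuration allows specifying a set of DNS domain names which are periodically queried to discover a list of targets. The DNS servers to be contacted are read from /etc/resolv.conf.

This service discovery method only supports basic DNS A, AAAA, MX, NS and SRV record queries, but not the advanced DNS-SD approach specified in RFC6763.

The following meta labels are available on targets during relabeling:

  • __meta_dns_name: the record name that produced the discovered target
  • __meta_dns_srv_record_target: the target field of the SRV record
  • __meta_dns_srv_record_port: the port field of the SRV record
  • __meta_dns_mx_record_target: the target field of the MX record
  • __meta_dns_ns_record_target: the target field of the NS record

# A list of DNS domain names to be queried.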
names:
  [ - <string> ]

# The type of DNS query to perform. One of SRV, A, AAAA, MX or NS.
[ type: <string> | default = 'SRV' ]

# The port number used if the query type is not SRV.
[ port: <int>]

# The time after which the provided names are refreshed.
[ refresh_interval: <duration> | default = 30s ]
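
For example, an A-record lookup needs an explicit port because only SRV records carry one. The domain name, job name and port in this sketch are hypothetical.

```yaml
scrape_configs:
  - job_name: dns-a-records                # hypothetical job name
    dns_sd_configs:
      - names:
          - app.example.internal           # hypothetical domain name
        type: A
        port: 9100                         # required here: A records carry no port
        refresh_interval: 30s
```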

<ec2_sd_config>

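EC2 SD configurations allow retrieving scrape targets from AWS EC2 instances. The private IP address is used by default, but may be changed to the public IP address with relabeling.

The IAM credentials used must have the ec2:DescribeInstances permission to discover scrape targets, and may optionally have the ec2:DescribeAvailabilityZones permission if you want the availability zone ID to be available as a label (see below).

The following meta labels are available on targets during relabeling:

  • __meta_ec2_ami: the EC2 Amazon Machine Image
  • __meta_ec2_architecture: the architecture of the instance
  • __meta_ec2_availability_zone: the availability zone in which the instance is running
  • __meta_ec2_availability_zone_id: the availability zone ID in which the instance is running (requires ec2:DescribeAvailabilityZones)
  • __meta_ec2_instance_id: the EC2 instance ID
  • __meta_ec2_instance_lifecycle: the lifecycle of the EC2 instance, set only for 'spot' or 'scheduled' instances, absent otherwise
  • __meta_ec2_instance_state: the state of the EC2 instance
  • __meta_ec2_instance_type: the type of the EC2 instance
  • __meta_ec2_ipv6_addresses: comma-separated list of IPv6 addresses assigned to the instance's network interfaces, if present
  • __meta_ec2_owner_id: the ID of the AWS account that owns the EC2 instance
  • __meta_ec2_platform: the operating system platform, set to 'windows' on Windows servers, absent otherwise
  • __meta_ec2_primary_ipv6_addresses: comma-separated list of the primary IPv6 addresses of the instance, if present. The list is ordered by the position of each corresponding network interface in the attachment order.
  • __meta_ec2_primary_subnet_id: the subnet ID of the primary network interface, if available
  • __meta_ec2_private_dns_name: the private DNS name of the instance, if available
  • __meta_ec2_private_ip: the private IP address of the instance, if present
  • __meta_ec2_public_dns_name: the public DNS name of the instance, if available
  • __meta_ec2_public_ip: the public IP address of the instance, if available
  • __meta_ec2_region: the region of the instance
  • __meta_ec2_subnet_id: comma-separated list of subnet IDs in which the instance is running, if available
  • __meta_ec2_tag_<tagkey>: each tag value of the instance
  • __meta_ec2_vpc_id: the ID of the VPC in which the instance is running, if available

See below for the configuration options for EC2 discovery: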

# The information to access the EC2 API.

# The AWS region. If blank, the region from the instance metadata is used.
[ region: <string> ]

# Custom endpoint to be used.
[ endpoint: <string> ]

# The AWS API keys. If blank, the environment variables `AWS_ACCESS_KEY_ID`
# and `AWS_SECRET_ACCESS_KEY` are used.
[ access_key: <string> ]
[ secret_key: <secret> ]
# Named AWS profile used to connect to the API.
[ profile: <string> ]

# AWS Role ARN, an alternative to using AWS API keys.
[ role_arn: <string> ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# Filters can be used optionally to filter the instance list by other criteria.
# Available filter criteria can be found here:
# https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstances.html
# Filter API documentation: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_Filter.html
filters:
  [ - name: <string>
      values: <string>, [...] ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

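The relabeling phase is the preferred and more powerful way to filter targets based on arbitrary labels. For users with thousands of instances it can be more efficient to use the EC2 API directly, which has support for filtering instances.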
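
The two filtering approaches can be combined. The sketch below narrows the instance list with an EC2 API filter and then switches the scrape address to the public IP through relabeling; note that the port is then written into the relabeling rule, as the `port` comment above requires. The region, the tag key `Environment`, its value and the port are assumptions for illustration.

```yaml
scrape_configs:
  - job_name: ec2-nodes                    # hypothetical job name
    ec2_sd_configs:
      - region: eu-west-1                  # assumed region
        filters:
          # Server-side filter on a hypothetical instance tag.
          - name: tag:Environment
            values: [production]
    relabel_configs:
      # Scrape the public IP instead of the private one; the port goes here.
      # Instances without a public IP do not match and keep their default address.
      - source_labels: [__meta_ec2_public_ip]
        regex: (.+)
        target_label: __address__
        replacement: "${1}:9100"
      # Expose the (hypothetical) Name tag as a regular label.
      - source_labels: [__meta_ec2_tag_Name]
        target_label: instance_name
```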

<openstack_sd_config>

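OpenStack SD configurations allow retrieving scrape targets from OpenStack Nova instances.

One of the following <openstack_role> types can be configured to discover targets:

hypervisor

The hypervisor role discovers one target per Nova hypervisor node. The target address defaults to the host_ip attribute of the hypervisor.

The following meta labels are available on targets during relabeling:

  • __meta_openstack_hypervisor_host_ip: the hypervisor node's IP address.
  • __meta_openstack_hypervisor_hostname: the hypervisor node's name.
  • __meta_openstack_hypervisor_id: the hypervisor node's ID.
  • __meta_openstack_hypervisor_state: the hypervisor node's state.
  • __meta_openstack_hypervisor_status: the hypervisor node's status.
  • __meta_openstack_hypervisor_type: the hypervisor node's type.

instance

The instance role discovers one target per network interface of a Nova instance. The target address defaults to the private IP address of the network interface.

The following meta labels are available on targets during relabeling:

  • __meta_openstack_address_pool: the pool of the private IP.
  • __meta_openstack_instance_flavor: the flavor name of the OpenStack instance, or the flavor ID if the flavor name is not available.
  • __meta_openstack_instance_id: the OpenStack instance ID.
  • __meta_openstack_instance_image: the ID of the image the OpenStack instance is using.
  • __meta_openstack_instance_name: the OpenStack instance name.
  • __meta_openstack_instance_status: the status of the OpenStack instance.
  • __meta_openstack_private_ip: the private IP of the OpenStack instance.
  • __meta_openstack_project_id: the project (tenant) owning this instance.
  • __meta_openstack_public_ip: the public IP of the OpenStack instance.
  • __meta_openstack_tag_<key>: each metadata item of the instance, with any unsupported characters converted to an underscore.
  • __meta_openstack_user_id: the user account owning the tenant.

See below for the configuration options for OpenStack discovery: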

# The information to access the OpenStack API.

# The OpenStack role of entities that should be discovered.
role: <openstack_role>

# The OpenStack Region.
region: <string>

# identity_endpoint specifies the HTTP endpoint that is required to work with
# the Identity API of the appropriate version. While it's ultimately needed by
# all of the identity services, it will often be populated by a provider-level
# function.
[ identity_endpoint: <string> ]

# username is required if using Identity V2 API. Consult with your provider's
# control panel to discover your account's username. In Identity V3, either
# userid or a combination of username and domain_id or domain_name are needed.
[ username: <string> ]
[ userid: <string> ]

# password for the Identity V2 and V3 APIs. Consult with your provider's
# control panel to discover your account's preferred method of authentication.
[ password: <secret> ]

# At most one of domain_id and domain_name must be provided if using username
# with Identity V3. Otherwise, either are optional.
[ domain_name: <string> ]
[ domain_id: <string> ]

# The project_id and project_name fields are optional for the Identity V2 API.
# Some providers allow you to specify a project_name instead of the project_id.
# Some require both. Your provider's authentication policies will determine
# how these fields influence authentication.
[ project_name: <string> ]
[ project_id: <string> ]

# The application_credential_id or application_credential_name fields are
# required if using an application credential to authenticate. Some providers
# allow you to create an application credential to authenticate rather than a
# password.
[ application_credential_name: <string> ]
[ application_credential_id: <string> ]

# The application_credential_secret field is required if using an application
# credential to authenticate.
[ application_credential_secret: <secret> ]

# Whether the service discovery should list all instances for all projects.
# It is only relevant for the 'instance' role and usually requires admin permissions.
[ all_tenants: <boolean> | default = false ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# The availability of the endpoint to connect to. Must be one of public, admin or internal.
[ availability: <string> | default = "public" ]

# TLS configuration.
tls_config:
  [ <tls_config> ]
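
As a sketch of password authentication against Identity V3, the fragment below discovers Nova instances in one region. The endpoint URL, region, user, domain, project and port are all hypothetical values; the password is a placeholder.

```yaml
scrape_configs:
  - job_name: openstack-instances          # hypothetical job name
    openstack_sd_configs:
      - role: instance
        region: RegionOne                  # assumed region name
        identity_endpoint: https://keystone.example.com:5000/v3   # hypothetical URL
        username: prometheus               # hypothetical user
        password: "<password>"             # placeholder secret
        domain_name: Default               # needed with username on Identity V3
        project_name: monitoring           # hypothetical project
        port: 9100
    relabel_configs:
      # Expose the instance name as a regular target label.
      - source_labels: [__meta_openstack_instance_name]
        target_label: instance_name
```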

<ovhcloud_sd_config>

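OVHcloud SD configurations allow retrieving scrape targets from OVHcloud's dedicated servers and VPS using their API. Prometheus periodically checks the REST endpoint and creates a target for every discovered server. The role tries to use the public IPv4 address as the default address; if there is none, it tries the IPv6 address. This may be changed with relabeling. For OVHcloud's public cloud instances you can use <openstack_sd_config>.

VPS

  • __meta_ovhcloud_vps_cluster: the cluster of the server
  • __meta_ovhcloud_vps_datacenter: the datacenter of the server
  • __meta_ovhcloud_vps_disk: the disk of the server
  • __meta_ovhcloud_vps_display_name: the display name of the server
  • __meta_ovhcloud_vps_ipv4: the IPv4 of the server
  • __meta_ovhcloud_vps_ipv6: the IPv6 of the server
  • __meta_ovhcloud_vps_keymap: the KVM keyboard layout of the server
  • __meta_ovhcloud_vps_maximum_additional_ip: the maximum number of additional IPs of the server
  • __meta_ovhcloud_vps_memory_limit: the memory limit of the server
  • __meta_ovhcloud_vps_memory: the memory of the server
  • __meta_ovhcloud_vps_monitoring_ip_blocks: the monitoring IP blocks of the server
  • __meta_ovhcloud_vps_name: the name of the server
  • __meta_ovhcloud_vps_netboot_mode: the netboot mode of the server
  • __meta_ovhcloud_vps_offer_type: the offer type of the server
  • __meta_ovhcloud_vps_offer: the offer of the server
  • __meta_ovhcloud_vps_state: the state of the server
  • __meta_ovhcloud_vps_vcore: the number of virtual cores of the server
  • __meta_ovhcloud_vps_version: the version of the server
  • __meta_ovhcloud_vps_zone: the zone of the server

Dedicated servers

  • __meta_ovhcloud_dedicated_server_commercial_range: the commercial range of the server
  • __meta_ovhcloud_dedicated_server_datacenter: the datacenter of the server
  • __meta_ovhcloud_dedicated_server_ipv4: the IPv4 of the server
  • __meta_ovhcloud_dedicated_server_ipv6: the IPv6 of the server
  • __meta_ovhcloud_dedicated_server_link_speed: the link speed of the server
  • __meta_ovhcloud_dedicated_server_name: the name of the server
  • __meta_ovhcloud_dedicated_server_no_intervention: whether datacenter intervention is disabled for the server
  • __meta_ovhcloud_dedicated_server_os: the operating system of the server
  • __meta_ovhcloud_dedicated_server_rack: the rack of the server
  • __meta_ovhcloud_dedicated_server_reverse: the reverse DNS name of the server
  • __meta_ovhcloud_dedicated_server_server_id: the ID of the server
  • __meta_ovhcloud_dedicated_server_state: the state of the server
  • __meta_ovhcloud_dedicated_server_support_level: the support level of the server

See below for the configuration options for OVHcloud discovery: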

# Access key to use. https://api.ovh.com
application_key: <string>
application_secret: <secret>
consumer_key: <secret>
# Service of the targets to retrieve. Must be `vps` or `dedicated_server`.
service: <string>
# API endpoint. https://github.com/ovh/go-ovh#supported-apis
[ endpoint: <string> | default = "ovh-eu" ]
# Refresh interval to re-read the resources list.
[ refresh_interval: <duration> | default = 60s ]
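
A minimal sketch for discovering VPS instances might look as follows. The three credential values are placeholders, and the job name and the `datacenter` target label are assumptions for the example.

```yaml
scrape_configs:
  - job_name: ovhcloud-vps                 # hypothetical job name
    ovhcloud_sd_configs:
      - application_key: "<application-key>"        # placeholder
        application_secret: "<application-secret>"  # placeholder
        consumer_key: "<consumer-key>"              # placeholder
        service: vps
        endpoint: ovh-eu
        refresh_interval: 60s
    relabel_configs:
      # Expose the datacenter of each VPS as a regular target label.
      - source_labels: [__meta_ovhcloud_vps_datacenter]
        target_label: datacenter
```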

<puppetdb_sd_config>

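PuppetDB SD configurations allow retrieving scrape targets from PuppetDB resources.

This SD discovers resources and creates a target for each resource returned by the API.

The resource address is the certname of the resource and can be changed during relabeling.

The following meta labels are available on targets during relabeling:

  • __meta_puppetdb_query: the Puppet Query Language (PQL) query
  • __meta_puppetdb_certname: the name of the node associated with the resource
  • __meta_puppetdb_resource: a SHA-1 hash of the resource's type, title, and parameters, for identification
  • __meta_puppetdb_type: the resource type
  • __meta_puppetdb_title: the resource title
  • __meta_puppetdb_exported: whether the resource is exported ("true" or "false")
  • __meta_puppetdb_tags: comma-separated list of resource tags
  • __meta_puppetdb_file: the manifest file in which the resource was declared
  • __meta_puppetdb_environment: the environment of the node associated with the resource
  • __meta_puppetdb_parameter_<parametername>: the parameters of the resource

See below for the configuration options for PuppetDB discovery: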

# The URL of the PuppetDB root query endpoint.
url: <string>

# Puppet Query Language (PQL) query. Only resources are supported.
# https://puppet.com/docs/puppetdb/latest/api/query/v4/pql.html
query: <string>

# Whether to include the parameters as meta labels.
# Due to the differences between parameter types and Prometheus labels,
# some parameters might not be rendered. The format of the parameters might
# also change in future releases.
#
# Note: Enabling this exposes parameters in the Prometheus UI and API. Make sure
# that you don't have secrets exposed as parameters if you enable this.
[ include_parameters: <boolean> | default = false ]

# Refresh interval to re-read the resources list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from.
[ port: <int> | default = 80 ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

For a detailed example of configuring Prometheus with PuppetDB, see this example Prometheus configuration file.
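
As an illustration, the fragment below creates one target per node that declares a given package resource. The PuppetDB URL, the package title `node_exporter` and the port are hypothetical.

```yaml
scrape_configs:
  - job_name: puppetdb-nodes               # hypothetical job name
    puppetdb_sd_configs:
      - url: https://puppetdb.example.com  # hypothetical PuppetDB endpoint
        # PQL query selecting resources; the package title is an assumption.
        query: 'resources { type = "Package" and title = "node_exporter" }'
        port: 9100
        include_parameters: false          # avoid exposing parameters as labels
    relabel_configs:
      # Expose the Puppet environment of the node as a regular target label.
      - source_labels: [__meta_puppetdb_environment]
        target_label: environment
```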

<file_sd_config>

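File-based service discovery provides a more generic way to configure static targets and serves as an interface to plug in custom service discovery mechanisms.

It reads a set of files containing a list of zero or more <static_config>s. Changes to all defined files are detected via disk watches and applied immediately.

While those individual files are watched for changes, the parent directory is also watched implicitly. This is to handle atomic renaming efficiently and to detect new files that match the configured globs. This may cause issues if the parent directory contains a large number of other files, as each of these files is watched too, even though the events related to them are not relevant.

Files may be provided in YAML or JSON format. Only changes resulting in well-formed target groups are applied.

Files must contain a list of static configs, using these formats: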

JSON

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]

YAML

- targets:
  [ - '<host>' ]
  labels:
    [ <labelname>: <labelvalue> ... ]

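As a fallback, the file contents are also re-read periodically at the specified refresh interval.

Each target has a meta label __meta_filepath during the relabeling phase. Its value is set to the file path from which the target was extracted.

There is a list of integrations with this discovery mechanism.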

# Patterns for files from which target groups are extracted.
files:
  [ - <filename_pattern> ... ]

# Refresh interval to re-read the files.
[ refresh_interval: <duration> | default = 5m ]

Where <filename_pattern> may be a path ending in .json, .yml or .yaml. The last path segment may contain a single * that matches any character sequence, e.g. my/path/tg_*.json.
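
To make the relationship between the scrape configuration and the target files concrete, the sketch below shows a configuration fragment followed by one matching target file. The two YAML documents stand for two separate files; the directory, file names, hosts and the `env` label are hypothetical.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: file-sd                      # hypothetical job name
    file_sd_configs:
      - files:
          - targets/tg_*.yml               # single * in the last path segment
        refresh_interval: 5m
---
# targets/tg_web.yml (hypothetical target file matched by the glob above)
- targets:
    - web-1.example.com:9100
    - web-2.example.com:9100
  labels:
    env: production
```

Targets read from the second file would carry `__meta_filepath` set to that file's path during relabeling.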

<gce_sd_config>

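GCE SD configurations allow retrieving scrape targets from GCP GCE instances. The private IP address is used by default, but may be changed to the public IP address with relabeling.

The following meta labels are available on targets during relabeling:

  • __meta_gce_instance_id: the numeric ID of the instance
  • __meta_gce_instance_name: the name of the instance
  • __meta_gce_label_<labelname>: each GCE label of the instance, with any unsupported characters converted to an underscore
  • __meta_gce_machine_type: full or partial URL of the machine type of the instance
  • __meta_gce_metadata_<name>: each metadata item of the instance
  • __meta_gce_network: the network URL of the instance
  • __meta_gce_private_ip: the private IP address of the instance
  • __meta_gce_interface_ipv4_<name>: the IPv4 address of each named interface
  • __meta_gce_project: the GCP project in which the instance is running
  • __meta_gce_public_ip: the public IP address of the instance, if present
  • __meta_gce_subnetwork: the subnetwork URL of the instance
  • __meta_gce_tags: comma-separated list of instance tags
  • __meta_gce_zone: the GCE zone URL in which the instance is running

See below for the configuration options for GCE discovery: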

# The information to access the GCE API.

# The GCP Project
project: <string>

# The zone of the scrape targets. If you need multiple zones use multiple
# gce_sd_configs.
zone: <string>

# Filter can be used optionally to filter the instance list by other criteria
# Syntax of this filter string is described here in the filter query parameter section:
# https://cloud.google.com/compute/docs/reference/latest/instances/list
[ filter: <string> ]

# Refresh interval to re-read the instance list
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# The tag separator is used to separate the tags on concatenation
[ tag_separator: <string> | default = , ]

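Credentials are discovered by the Google Cloud SDK default client by looking in the following places, preferring the first location found:

  1. a JSON file specified by the GOOGLE_APPLICATION_CREDENTIALS environment variable
  2. a JSON file in the well-known path $HOME/.config/gcloud/application_default_credentials.json
  3. fetched from the GCE metadata server

If Prometheus is running within GCE, the service account associated with the instance it is running on should have at least read-only permissions to the compute resources. If running outside of GCE, make sure to create an appropriate service account and place the credential file in one of the expected locations.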
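
Because a single entry covers one zone, discovery across zones uses several entries. The sketch below assumes a project named `my-project`, two zones, a GCE label `env` on the instances, and credentials already available through one of the locations listed above.

```yaml
scrape_configs:
  - job_name: gce-instances                # hypothetical job name
    gce_sd_configs:
      # One entry per zone; project and zone names are assumptions.
      - project: my-project
        zone: europe-west1-b
        port: 9100
      - project: my-project
        zone: europe-west1-c
        port: 9100
    relabel_configs:
      # Keep only instances whose (hypothetical) GCE label "env" is "prod".
      - source_labels: [__meta_gce_label_env]
        regex: prod
        action: keep
      # Expose the instance name as a regular target label.
      - source_labels: [__meta_gce_instance_name]
        target_label: instance_name
```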

<hetzner_sd_config>

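Hetzner SD configurations allow retrieving scrape targets from the Hetzner Cloud API and Robot API. This service discovery uses the public IPv4 address by default, but that can be changed with relabeling, as demonstrated in the Prometheus hetzner-sd configuration file.

The following meta labels are available on all targets during relabeling:

  • __meta_hetzner_server_id: the ID of the server
  • __meta_hetzner_server_name: the name of the server
  • __meta_hetzner_server_status: the status of the server
  • __meta_hetzner_public_ipv4: the public IPv4 address of the server
  • __meta_hetzner_public_ipv6_network: the public IPv6 network (/64) of the server
  • __meta_hetzner_datacenter: the datacenter of the server

The labels below are only available for targets with role set to hcloud:

  • __meta_hetzner_hcloud_image_name: the image name of the server
  • __meta_hetzner_hcloud_image_description: the description of the server image
  • __meta_hetzner_hcloud_image_os_flavor: the OS flavor of the server image
  • __meta_hetzner_hcloud_image_os_version: the OS version of the server image
  • __meta_hetzner_hcloud_datacenter_location: the location of the server
  • __meta_hetzner_hcloud_datacenter_location_network_zone: the network zone of the server
  • __meta_hetzner_hcloud_server_type: the type of the server
  • __meta_hetzner_hcloud_cpu_cores: the number of CPU cores of the server
  • __meta_hetzner_hcloud_cpu_type: the CPU type of the server (shared or dedicated)
  • __meta_hetzner_hcloud_memory_size_gb: the amount of memory of the server (in GB)
  • __meta_hetzner_hcloud_disk_size_gb: the disk size of the server (in GB)
  • __meta_hetzner_hcloud_private_ipv4_<networkname>: the private IPv4 address of the server within a given network
  • __meta_hetzner_hcloud_label_<labelname>: each label of the server, with any unsupported characters converted to an underscore
  • __meta_hetzner_hcloud_labelpresent_<labelname>: true for each label of the server, with any unsupported characters converted to an underscore

The labels below are only available for targets with role set to robot:

  • __meta_hetzner_robot_product: the product of the server
  • __meta_hetzner_robot_cancelled: the cancellation status of the server

# The Hetzner role of entities that should be discovered.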
# One of robot or hcloud.
role: <string>

# The port to scrape metrics from.
[ port: <int> | default = 80 ]

# The time after which the servers are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
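
A sketch for the hcloud role follows, including a relabeling rule that switches the scrape address to a private network IP. The token placeholder, the network name `mynet` and the port are assumptions for illustration.

```yaml
scrape_configs:
  - job_name: hetzner-cloud                # hypothetical job name
    hetzner_sd_configs:
      - role: hcloud
        authorization:
          # Placeholder: supply a real Hetzner Cloud API token here.
          credentials: "<hetzner-cloud-api-token>"
        port: 9100
    relabel_configs:
      # Scrape via the private IPv4 in the (hypothetical) network "mynet".
      # Servers outside that network do not match and keep the public address.
      - source_labels: [__meta_hetzner_hcloud_private_ipv4_mynet]
        regex: (.+)
        target_label: __address__
        replacement: "${1}:9100"
```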

<http_sd_config>

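HTTP-based service discovery provides a more generic way to configure static targets and serves as an interface to plug in custom service discovery mechanisms.

It fetches targets from an HTTP endpoint containing a list of zero or more <static_config>s. The target must reply with an HTTP 200 response. The HTTP header Content-Type must be application/json, and the body must be valid JSON.

Example response body: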

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]

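The endpoint is queried periodically at the specified refresh interval. The prometheus_sd_http_failures_total counter metric tracks the number of refresh failures.

Each target has a meta label __meta_url during the relabeling phase. Its value is set to the URL from which the target was extracted.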

# URL from which the targets are fetched.
url: <string>

# Refresh interval to re-query the endpoint.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
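
A minimal configuration might look like the sketch below. The endpoint URL and job name are hypothetical; the endpoint is assumed to return a JSON body in the format shown above.

```yaml
scrape_configs:
  - job_name: http-sd                      # hypothetical job name
    http_sd_configs:
      - url: http://sd.example.com/targets # hypothetical discovery endpoint
        refresh_interval: 60s
    relabel_configs:
      # Record which discovery URL produced each target.
      - source_labels: [__meta_url]
        target_label: sd_source
```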

<ionos_sd_config>

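IONOS SD configurations allow retrieving scrape targets from the IONOS Cloud API. This service discovery uses the first NIC's IP address by default, but that can be changed with relabeling.

The following meta labels are available on all targets during relabeling:

  • __meta_ionos_server_availability_zone: the availability zone of the server
  • __meta_ionos_server_boot_cdrom_id: the ID of the CD-ROM the server is booted from
  • __meta_ionos_server_boot_image_id: the ID of the boot image or snapshot the server is booted from
  • __meta_ionos_server_boot_volume_id: the ID of the boot volume
  • __meta_ionos_server_cpu_family: the CPU family of the server
  • __meta_ionos_server_id: the ID of the server
  • __meta_ionos_server_ip: comma-separated list of all IPs assigned to the server
  • __meta_ionos_server_lifecycle: the lifecycle state of the server resource
  • __meta_ionos_server_name: the name of the server
  • __meta_ionos_server_nic_ip_<nic_name>: comma-separated list of IPs, grouped by the name of each NIC attached to the server
  • __meta_ionos_server_servers_id: the ID of the servers the server belongs to
  • __meta_ionos_server_state: the execution state of the server
  • __meta_ionos_server_type: the type of the server

# The unique ID of the data center.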
datacenter_id: <string>

# The port to scrape metrics from.
[ port: <int> | default = 80 ]

# The time after which the servers are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
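
As a sketch, the fragment below discovers the servers of one data center. The data center ID is a made-up UUID and the token is a placeholder; the port is an assumption.

```yaml
scrape_configs:
  - job_name: ionos-servers                # hypothetical job name
    ionos_sd_configs:
      - datacenter_id: 11111111-1111-1111-1111-111111111111   # made-up ID
        authorization:
          # Placeholder: supply a real IONOS Cloud API token here.
          credentials: "<ionos-cloud-api-token>"
        port: 9100
    relabel_configs:
      # Expose the server name as a regular target label.
      - source_labels: [__meta_ionos_server_name]
        target_label: server_name
```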

<kubernetes_sd_config>

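Kubernetes SD configurations allow retrieving scrape targets from the Kubernetes REST API and always staying synchronized with the cluster state.

One of the following role types can be configured to discover targets:

node

The node role discovers one target per cluster node, with the address defaulting to the Kubelet's HTTP port. The target address defaults to the first existing address of the Kubernetes node object in the address type order of NodeInternalIP, NodeExternalIP, NodeLegacyHostIP, and NodeHostName.

Available meta labels:

  • __meta_kubernetes_node_name: The name of the node object.
  • __meta_kubernetes_node_provider_id: The cloud provider's name for the node object.
  • __meta_kubernetes_node_label_<labelname>: Each label from the node object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_node_labelpresent_<labelname>: true for each label from the node object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_node_annotation_<annotationname>: Each annotation from the node object.
  • __meta_kubernetes_node_annotationpresent_<annotationname>: true for each annotation from the node object.
  • __meta_kubernetes_node_address_<address_type>: The first address for each node address type, if it exists.

In addition, the instance label for the node is set to the node name as retrieved from the API server.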

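service

The service role discovers a target for each service port of each service. This is generally useful for blackbox monitoring of a service. The address is set to the Kubernetes DNS name of the service and the respective service port.

Available meta labels:

  • __meta_kubernetes_namespace: The namespace of the service object.
  • __meta_kubernetes_service_annotation_<annotationname>: Each annotation from the service object.
  • __meta_kubernetes_service_annotationpresent_<annotationname>: "true" for each annotation of the service object.
  • __meta_kubernetes_service_cluster_ip: The cluster IP address of the service. (Does not apply to services of type ExternalName)
  • __meta_kubernetes_service_loadbalancer_ip: The IP address of the load balancer. (Applies to services of type LoadBalancer)
  • __meta_kubernetes_service_external_name: The DNS name of the service. (Applies to services of type ExternalName)
  • __meta_kubernetes_service_label_<labelname>: Each label from the service object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_service_labelpresent_<labelname>: true for each label of the service object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_service_name: The name of the service object.
  • __meta_kubernetes_service_port_name: Name of the service port for the target.
  • __meta_kubernetes_service_port_number: Number of the service port for the target.
  • __meta_kubernetes_service_port_protocol: Protocol of the service port for the target.
  • __meta_kubernetes_service_type: The type of the service.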

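pod

The pod role discovers all pods and exposes their containers as targets. For each declared port of a container, a single target is generated. If a container has no specified ports, a port-free target per container is created so that a port can be added manually via relabeling.

Available meta labels:

  • __meta_kubernetes_namespace: The namespace of the pod object.
  • __meta_kubernetes_pod_name: The name of the pod object.
  • __meta_kubernetes_pod_ip: The pod IP of the pod object.
  • __meta_kubernetes_pod_label_<labelname>: Each label from the pod object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_pod_labelpresent_<labelname>: true for each label from the pod object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_pod_annotation_<annotationname>: Each annotation from the pod object.
  • __meta_kubernetes_pod_annotationpresent_<annotationname>: true for each annotation from the pod object.
  • __meta_kubernetes_pod_container_init: true if the container is an InitContainer.
  • __meta_kubernetes_pod_container_name: Name of the container the target address points to.
  • __meta_kubernetes_pod_container_id: ID of the container the target address points to. The ID is in the form <type>://<container_id>.
  • __meta_kubernetes_pod_container_image: The image the container is using.
  • __meta_kubernetes_pod_container_port_name: Name of the container port.
  • __meta_kubernetes_pod_container_port_number: Number of the container port.
  • __meta_kubernetes_pod_container_port_protocol: Protocol of the container port.
  • __meta_kubernetes_pod_ready: Set to true or false for the pod's ready state.
  • __meta_kubernetes_pod_phase: Set to Pending, Running, Succeeded, Failed or Unknown in the lifecycle.
  • __meta_kubernetes_pod_node_name: The name of the node the pod is scheduled onto.
  • __meta_kubernetes_pod_host_ip: The current host IP of the pod object.
  • __meta_kubernetes_pod_uid: The UID of the pod object.
  • __meta_kubernetes_pod_controller_kind: Object kind of the pod controller.
  • __meta_kubernetes_pod_controller_name: Name of the pod controller.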

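endpoints

The endpoints role discovers targets from the listed endpoints of a service. For each endpoint address, one target is discovered per port. If the endpoint is backed by a pod, all additional container ports of the pod that are not bound to an endpoint port are discovered as targets as well.

Available meta labels:

  • __meta_kubernetes_namespace: The namespace of the endpoints object.
  • __meta_kubernetes_endpoints_name: The name of the endpoints object.
  • __meta_kubernetes_endpoints_label_<labelname>: Each label from the endpoints object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_endpoints_labelpresent_<labelname>: true for each label from the endpoints object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_endpoints_annotation_<annotationname>: Each annotation from the endpoints object.
  • __meta_kubernetes_endpoints_annotationpresent_<annotationname>: true for each annotation from the endpoints object.
  • For all targets discovered directly from the endpoints list (those not additionally inferred from underlying pods), the following labels are attached:
    • __meta_kubernetes_endpoint_hostname: Hostname of the endpoint.
    • __meta_kubernetes_endpoint_node_name: Name of the node hosting the endpoint.
    • __meta_kubernetes_endpoint_ready: Set to true or false for the endpoint's ready state.
    • __meta_kubernetes_endpoint_port_name: Name of the endpoint port.
    • __meta_kubernetes_endpoint_port_protocol: Protocol of the endpoint port.
    • __meta_kubernetes_endpoint_address_target_kind: Kind of the endpoint address target.
    • __meta_kubernetes_endpoint_address_target_name: Name of the endpoint address target.
  • If the endpoints belong to a service, all labels of the role: service discovery are attached.
  • For all targets backed by a pod, all labels of the role: pod discovery are attached.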

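endpointslice

The endpointslice role discovers targets from existing endpointslices. For each endpoint address referenced in the endpointslice object, one target is discovered. If the endpoint is backed by a pod, all additional container ports of the pod that are not bound to an endpoint port are discovered as targets as well.

This role requires the discovery.k8s.io/v1 API version (available since Kubernetes v1.21).

Available meta labels:

  • __meta_kubernetes_namespace: The namespace of the endpoints object.
  • __meta_kubernetes_endpointslice_name: The name of the endpointslice object.
  • __meta_kubernetes_endpointslice_label_<labelname>: Each label from the endpointslice object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_endpointslice_labelpresent_<labelname>: true for each label from the endpointslice object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_endpointslice_annotation_<annotationname>: Each annotation from the endpointslice object.
  • __meta_kubernetes_endpointslice_annotationpresent_<annotationname>: true for each annotation from the endpointslice object.
  • For all targets discovered directly from the endpointslice list (those not additionally inferred from underlying pods), the following labels are attached:
    • __meta_kubernetes_endpointslice_address_target_kind: Kind of the referenced object.
    • __meta_kubernetes_endpointslice_address_target_name: Name of the referenced object.
    • __meta_kubernetes_endpointslice_address_type: The IP protocol family of the address of the target.
    • __meta_kubernetes_endpointslice_endpoint_conditions_ready: Set to true or false for the referenced endpoint's ready state.
    • __meta_kubernetes_endpointslice_endpoint_conditions_serving: Set to true or false for the referenced endpoint's serving state.
    • __meta_kubernetes_endpointslice_endpoint_conditions_terminating: Set to true or false for the referenced endpoint's terminating state.
    • __meta_kubernetes_endpointslice_endpoint_topology_kubernetes_io_hostname: Name of the node hosting the referenced endpoint.
    • __meta_kubernetes_endpointslice_endpoint_topology_present_kubernetes_io_hostname: Flag that shows whether the referenced object has a kubernetes.io/hostname annotation.
    • __meta_kubernetes_endpointslice_endpoint_hostname: Hostname of the referenced endpoint.
    • __meta_kubernetes_endpointslice_endpoint_node_name: Name of the node hosting the referenced endpoint.
    • __meta_kubernetes_endpointslice_endpoint_zone: Zone the referenced endpoint exists in.
    • __meta_kubernetes_endpointslice_port: Port of the referenced endpoint.
    • __meta_kubernetes_endpointslice_port_name: Named port of the referenced endpoint.
    • __meta_kubernetes_endpointslice_port_protocol: Protocol of the referenced endpoint.
  • If the endpoints belong to a service, all labels of the role: service discovery are attached.
  • For all targets backed by a pod, all labels of the role: pod discovery are attached.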

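ingress

The ingress role discovers a target for each path of each ingress. This is generally useful for blackbox monitoring of an ingress. The address is set to the host specified in the ingress spec.

This role requires the networking.k8s.io/v1 API version (available since Kubernetes v1.19).

Available meta labels:

  • __meta_kubernetes_namespace: The namespace of the ingress object.
  • __meta_kubernetes_ingress_name: The name of the ingress object.
  • __meta_kubernetes_ingress_label_<labelname>: Each label from the ingress object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_ingress_labelpresent_<labelname>: true for each label from the ingress object, with any unsupported characters converted to an underscore.
  • __meta_kubernetes_ingress_annotation_<annotationname>: Each annotation from the ingress object.
  • __meta_kubernetes_ingress_annotationpresent_<annotationname>: true for each annotation from the ingress object.
  • __meta_kubernetes_ingress_class_name: Class name from the ingress spec, if present.
  • __meta_kubernetes_ingress_scheme: Protocol scheme of the ingress, https if TLS config is set. Defaults to http.
  • __meta_kubernetes_ingress_path: Path from the ingress spec. Defaults to /.

See below for the configuration options for Kubernetes discovery: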

# The information to access the Kubernetes API.

# The API server addresses. If left empty, Prometheus is assumed to run inside
# of the cluster and will discover API servers automatically and use the pod's
# CA certificate and bearer token file at /var/run/secrets/kubernetes.io/serviceaccount/.
[ api_server: <host> ]

# The Kubernetes role of entities that should be discovered.
# One of endpoints, endpointslice, service, pod, node, or ingress.
role: <string>

# Optional path to a kubeconfig file.
# Note that api_server and kube_config are mutually exclusive.
[ kubeconfig_file: <filename> ]

# Optional namespace discovery. If omitted, all namespaces are used.
namespaces:
  own_namespace: <boolean>
  names:
    [ - <string> ]

# Optional label and field selectors to limit the discovery process to a subset of available resources.
# See https://kubernetes.ac.cn/docs/concepts/overview/working-with-objects/field-selectors/
# and https://kubernetes.ac.cn/docs/concepts/overview/working-with-objects/labels/ to learn more about the possible
# filters that can be used. The endpoints role supports pod, service and endpoints selectors.
# The pod role supports node selectors when configured with `attach_metadata: {node: true}`.
# Other roles only support selectors matching the role itself (e.g. node role can only contain node selectors).

# Note: When making decision about using field/label selector make sure that this
# is the best approach - it will prevent Prometheus from reusing single list/watch
# for all scrape configs. This might result in a bigger load on the Kubernetes API,
# because per each selector combination there will be additional LIST/WATCH. On the other hand,
# if you just want to monitor small subset of pods in large cluster it's recommended to use selectors.
# Decision, if selectors should be used or not depends on the particular situation.
[ selectors:
  [ - role: <string>
    [ label: <string> ]
    [ field: <string> ] ]]

# Optional metadata to attach to discovered targets. If omitted, no additional metadata is attached.
attach_metadata:
# Attaches node metadata to discovered targets. Valid for roles: pod, endpoints, endpointslice.
# When set to true, Prometheus must have permissions to get Nodes.
  [ node: <boolean> | default = false ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

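See this example Prometheus configuration file for a detailed example of configuring Prometheus for Kubernetes.

You may wish to check out the third-party Prometheus Operator, which automates the Prometheus setup on top of Kubernetes.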
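
The options above can be combined as in the following minimal sketch of a scrape configuration using the pod role. The job name, the namespace `monitoring`, and the label selector value are hypothetical placeholders; the meta labels used in relabeling are standard pod-role labels.

```yaml
scrape_configs:
  - job_name: kubernetes-pods            # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
        # api_server is omitted: Prometheus is assumed to run inside the cluster.
        namespaces:
          names:
            - monitoring                 # hypothetical namespace
        selectors:
          - role: pod
            label: "app.kubernetes.io/name=demo"   # hypothetical label selector
    relabel_configs:
      # Copy discovery metadata into permanent labels.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

Note that each distinct selector combination causes an additional LIST/WATCH against the Kubernetes API, so selectors are best reserved for cases where only a small subset of a large cluster is monitored.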

<kuma_sd_config>

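Kuma SD configurations allow retrieving scrape targets from the Kuma control plane.

This SD discovers "monitoring assignments" based on Kuma Dataplane Proxies, via the MADS v1 (Monitoring Assignment Discovery Service) xDS API, and creates a target for each proxy inside a Prometheus-enabled mesh.

The following meta labels are available for each target:

  • __meta_kuma_mesh: the name of the proxy's Mesh
  • __meta_kuma_dataplane: the name of the proxy
  • __meta_kuma_service: the name of the proxy's associated Service
  • __meta_kuma_label_<tagname>: each tag of the proxy

See below for the configuration options for Kuma MonitoringAssignment discovery: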

# Address of the Kuma Control Plane's MADS xDS server.
server: <string>

# Client id is used by Kuma Control Plane to compute Monitoring Assignment for specific Prometheus backend. 
# This is useful when migrating between multiple Prometheus backends, or having separate backend for each Mesh.
# When not specified, system hostname/fqdn will be used if available, if not `prometheus` will be used.
[ client_id: <string> ]

# The time to wait between polling update requests.
[ refresh_interval: <duration> | default = 30s ]

# The time after which the monitoring assignments are refreshed.
[ fetch_timeout: <duration> | default = 2m ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

The relabeling phase is the preferred and more powerful way to filter proxies and user-defined tags.
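
As an illustration of filtering during relabeling, here is a minimal sketch. The server address, the job name, and the mesh name `default` are assumptions chosen for the example and must be adapted to the actual Kuma deployment.

```yaml
scrape_configs:
  - job_name: kuma-dataplanes            # hypothetical job name
    kuma_sd_configs:
      # Assumed address of the control plane's MADS xDS server.
      - server: http://kuma-control-plane.kuma-system.svc:5676
        refresh_interval: 30s
        fetch_timeout: 2m
    relabel_configs:
      # Keep only proxies that belong to the (hypothetical) mesh "default".
      - source_labels: [__meta_kuma_mesh]
        regex: default
        action: keep
      # Preserve the service name as a permanent label.
      - source_labels: [__meta_kuma_service]
        target_label: service
```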

<lightsail_sd_config>

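Lightsail SD configurations allow retrieving scrape targets from AWS Lightsail instances. The private IP address is used by default, but may be changed to the public IP address with relabeling.

The following meta labels are available on targets during relabeling:

  • __meta_lightsail_availability_zone: the availability zone in which the instance is running
  • __meta_lightsail_blueprint_id: the Lightsail blueprint ID
  • __meta_lightsail_bundle_id: the Lightsail bundle ID
  • __meta_lightsail_instance_name: the name of the Lightsail instance
  • __meta_lightsail_instance_state: the state of the Lightsail instance
  • __meta_lightsail_instance_support_code: the support code of the Lightsail instance
  • __meta_lightsail_ipv6_addresses: comma-separated list of IPv6 addresses assigned to the instance's network interfaces, if present
  • __meta_lightsail_private_ip: the private IP address of the instance
  • __meta_lightsail_public_ip: the public IP address of the instance, if available
  • __meta_lightsail_region: the region of the instance
  • __meta_lightsail_tag_<tagkey>: each tag value of the instance

See below for the configuration options for Lightsail discovery: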

# The information to access the Lightsail API.

# The AWS region. If blank, the region from the instance metadata is used.
[ region: <string> ]

# Custom endpoint to be used.
[ endpoint: <string> ]

# The AWS API keys. If blank, the environment variables `AWS_ACCESS_KEY_ID`
# and `AWS_SECRET_ACCESS_KEY` are used.
[ access_key: <string> ]
[ secret_key: <secret> ]
# Named AWS profile used to connect to the API.
[ profile: <string> ]

# AWS Role ARN, an alternative to using AWS API keys.
[ role_arn: <string> ]

# Refresh interval to re-read the instance list.
[ refresh_interval: <duration> | default = 60s ]

# The port to scrape metrics from. If using the public IP address, this must
# instead be specified in the relabeling rule.
[ port: <int> | default = 80 ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
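
Switching from the private to the public IP address could look like the following minimal sketch. The region, the job name, and port 9100 are assumptions for the example. Because the public address is set through relabeling, the port is written into the replacement rather than taken from the port option.

```yaml
scrape_configs:
  - job_name: lightsail                  # hypothetical job name
    lightsail_sd_configs:
      - region: us-east-1                # assumed region
        port: 9100                       # used with the default private IP
    relabel_configs:
      # If a public IP exists, scrape it instead of the private IP.
      # Instances without a public IP keep their original __address__.
      - source_labels: [__meta_lightsail_public_ip]
        regex: (.+)
        target_label: __address__
        replacement: "${1}:9100"
```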

<linode_sd_config>

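Linode SD configurations allow retrieving scrape targets from Linode's Linode APIv4. This service discovery uses the public IPv4 address by default, but that can be changed with relabeling, as demonstrated in the Prometheus linode-sd configuration file.

The Linode APIv4 token must be created with the scopes linodes:read_only, ips:read_only, and events:read_only.

The following meta labels are available on targets during relabeling:

  • __meta_linode_instance_id: the ID of the Linode instance
  • __meta_linode_instance_label: the label of the Linode instance
  • __meta_linode_image: the slug of the Linode instance's image
  • __meta_linode_private_ipv4: the private IPv4 of the Linode instance
  • __meta_linode_public_ipv4: the public IPv4 of the Linode instance
  • __meta_linode_public_ipv6: the public IPv6 of the Linode instance
  • __meta_linode_private_ipv4_rdns: the reverse DNS for the first private IPv4 of the Linode instance
  • __meta_linode_public_ipv4_rdns: the reverse DNS for the first public IPv4 of the Linode instance
  • __meta_linode_public_ipv6_rdns: the reverse DNS for the first public IPv6 of the Linode instance
  • __meta_linode_region: the region of the Linode instance
  • __meta_linode_type: the type of the Linode instance
  • __meta_linode_status: the status of the Linode instance
  • __meta_linode_tags: a list of tags of the Linode instance joined by the tag separator
  • __meta_linode_group: the display group the Linode instance is a member of
  • __meta_linode_gpus: the number of GPUs of the Linode instance
  • __meta_linode_hypervisor: the virtualization software powering the Linode instance
  • __meta_linode_backups: the backup service status of the Linode instance
  • __meta_linode_specs_disk_bytes: the amount of storage space the Linode instance has access to
  • __meta_linode_specs_memory_bytes: the amount of RAM the Linode instance has access to
  • __meta_linode_specs_vcpus: the number of VCPUs this Linode has access to
  • __meta_linode_specs_transfer_bytes: the amount of network transfer the Linode instance is allotted each month
  • __meta_linode_extra_ips: a list of all extra IPv4 addresses assigned to the Linode instance joined by the tag separator
  • __meta_linode_ipv6_ranges: a list of IPv6 ranges with mask assigned to the Linode instance joined by the tag separator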

# Optional region to filter on.
[ region: <string> ]

# The port to scrape metrics from.
[ port: <int> | default = 80 ]

# The string by which Linode Instance tags are joined into the tag label.
[ tag_separator: <string> | default = , ]

# The time after which the linode instances are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
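
A minimal sketch of a Linode scrape configuration follows. The token file path, the region, and port 9100 are hypothetical; the API token is supplied through the authorization settings of the HTTP client configuration.

```yaml
scrape_configs:
  - job_name: linode                     # hypothetical job name
    linode_sd_configs:
      - authorization:
          credentials_file: /etc/prometheus/linode-token   # hypothetical path
        region: us-east                  # assumed region filter
        port: 9100
    relabel_configs:
      # Scrape the private IPv4 address instead of the default public one,
      # for instances that have a private address.
      - source_labels: [__meta_linode_private_ipv4]
        regex: (.+)
        target_label: __address__
        replacement: "${1}:9100"
```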

<marathon_sd_config>

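Marathon SD configurations allow retrieving scrape targets using the Marathon REST API. Prometheus periodically checks the REST endpoint for currently running tasks and creates a target group for every app that has at least one healthy task.

The following meta labels are available on targets during relabeling:

  • __meta_marathon_app: the name of the app (with slashes replaced by dashes)
  • __meta_marathon_image: the name of the Docker image used (if available)
  • __meta_marathon_task: the ID of the Mesos task
  • __meta_marathon_app_label_<labelname>: any Marathon labels attached to the app, with any unsupported characters converted to an underscore
  • __meta_marathon_port_definition_label_<labelname>: the port definition labels, with any unsupported characters converted to an underscore
  • __meta_marathon_port_mapping_label_<labelname>: the port mapping labels, with any unsupported characters converted to an underscore
  • __meta_marathon_port_index: the port index number (e.g. 1 for PORT1)

See below for the configuration options for Marathon discovery: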

# List of URLs to be used to contact Marathon servers.
# You need to provide at least one server URL.
servers:
  - <string>

# Polling interval
[ refresh_interval: <duration> | default = 30s ]

# Optional authentication information for token-based authentication
# https://docs.mesosphere.com/1.11/security/ent/iam-api/#passing-an-authentication-token
# It is mutually exclusive with `auth_token_file` and other authentication mechanisms.
[ auth_token: <secret> ]

# Optional authentication information for token-based authentication
# https://docs.mesosphere.com/1.11/security/ent/iam-api/#passing-an-authentication-token
# It is mutually exclusive with `auth_token` and other authentication mechanisms.
[ auth_token_file: <filename> ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

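By default, every app listed in Marathon is scraped by Prometheus. If not all of your services provide Prometheus metrics, you can use a Marathon label and Prometheus relabeling to control which instances are actually scraped. See the Prometheus marathon-sd configuration file for a practical example of how to set up your Marathon app and your Prometheus configuration.

By default, all apps show up as a single job in Prometheus (the one specified in the configuration file). This can also be changed using relabeling.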
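
Both points can be sketched as follows. The server URL and the Marathon app label named `prometheus` are assumptions for illustration; any label name agreed on by the app owners would work the same way.

```yaml
scrape_configs:
  - job_name: marathon                   # default job for all discovered apps
    marathon_sd_configs:
      - servers:
          - http://marathon.example.com:8080   # hypothetical Marathon server
        refresh_interval: 30s
    relabel_configs:
      # Scrape only apps that carry the (hypothetical) Marathon label
      # prometheus=true.
      - source_labels: [__meta_marathon_app_label_prometheus]
        regex: "true"
        action: keep
      # Use the app name as the job label instead of the single default job.
      - source_labels: [__meta_marathon_app]
        target_label: job
```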

<nerve_sd_config>

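Nerve SD configurations allow retrieving scrape targets from AirBnB's Nerve, which are stored in Zookeeper.

The following meta labels are available on targets during relabeling:

  • __meta_nerve_path: the full path to the endpoint node in Zookeeper
  • __meta_nerve_endpoint_host: the host of the endpoint
  • __meta_nerve_endpoint_port: the port of the endpoint
  • __meta_nerve_endpoint_name: the name of the endpoint

# The Zookeeper servers.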
servers:
  - <host>
# Paths can point to a single service, or the root of a tree of services.
paths:
  - <string>
[ timeout: <duration> | default = 10s ]

<nomad_sd_config>

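Nomad SD configurations allow retrieving scrape targets from Nomad's Service API.

The following meta labels are available on targets during relabeling:

  • __meta_nomad_address: the service address of the target
  • __meta_nomad_dc: the datacenter name for the target
  • __meta_nomad_namespace: the namespace of the target
  • __meta_nomad_node_id: the node name defined for the target
  • __meta_nomad_service: the name of the service the target belongs to
  • __meta_nomad_service_address: the service address of the target
  • __meta_nomad_service_id: the service ID of the target
  • __meta_nomad_service_port: the service port of the target
  • __meta_nomad_tags: the list of tags of the target joined by the tag separator

# The information to access the Nomad API. It is to be defined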
# as the Nomad documentation requires.
[ allow_stale: <boolean> | default = true ]
[ namespace: <string> | default = default ]
[ refresh_interval: <duration> | default = 60s ]
[ region: <string> | default = global ]
# The URL to connect to the API.
[ server: <string> ]
[ tag_separator: <string> | default = ,]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
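
A minimal sketch of a Nomad scrape configuration, assuming a Nomad API reachable at a hypothetical local address. The namespace and region simply repeat the documented defaults to show where they go.

```yaml
scrape_configs:
  - job_name: nomad-services             # hypothetical job name
    nomad_sd_configs:
      - server: http://localhost:4646    # assumed Nomad API address
        namespace: default
        region: global
        refresh_interval: 60s
    relabel_configs:
      # Expose the Nomad service name and datacenter as permanent labels.
      - source_labels: [__meta_nomad_service]
        target_label: service
      - source_labels: [__meta_nomad_dc]
        target_label: datacenter
```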

<serverset_sd_config>

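Serverset SD configurations allow retrieving scrape targets from Serversets, which are stored in Zookeeper. Serversets are commonly used by Finagle and Aurora.

The following meta labels are available on targets during relabeling:

  • __meta_serverset_path: the full path to the serverset member node in Zookeeper
  • __meta_serverset_endpoint_host: the host of the default endpoint
  • __meta_serverset_endpoint_port: the port of the default endpoint
  • __meta_serverset_endpoint_host_<endpoint>: the host of the given endpoint
  • __meta_serverset_endpoint_port_<endpoint>: the port of the given endpoint
  • __meta_serverset_shard: the shard number of the member
  • __meta_serverset_status: the status of the member

# The Zookeeper servers.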
servers:
  - <host>
# Paths can point to a single serverset, or the root of a tree of serversets.
paths:
  - <string>
[ timeout: <duration> | default = 10s ]

Serverset data must be in the JSON format; the Thrift format is not currently supported.

<triton_sd_config>

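Triton SD configurations allow retrieving scrape targets from Container Monitor discovery endpoints.

One of the following <triton_role> types can be configured to discover targets:

container

The container role discovers one target per "virtual machine" owned by the account. These are SmartOS zones or lx/KVM/bhyve branded zones.

The following meta labels are available on targets during relabeling:

  • __meta_triton_groups: the list of groups belonging to the target joined by a comma separator
  • __meta_triton_machine_alias: the alias of the target container
  • __meta_triton_machine_brand: the brand of the target container
  • __meta_triton_machine_id: the UUID of the target container
  • __meta_triton_machine_image: the target container's image type
  • __meta_triton_server_id: the UUID of the server the target container is running on

cn

The cn role discovers one target per compute node (also known as "server" or "global zone") making up the Triton infrastructure. The account must be a Triton operator and is currently required to own at least one container.

The following meta labels are available on targets during relabeling:

  • __meta_triton_machine_alias: the hostname of the target (requires triton-cmon 1.7.0 or newer)
  • __meta_triton_machine_id: the UUID of the target

See below for the configuration options for Triton discovery: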

# The information to access the Triton discovery API.

# The account to use for discovering new targets.
account: <string>

# The type of targets to discover, can be set to:
# * "container" to discover virtual machines (SmartOS zones, lx/KVM/bhyve branded zones) running on Triton
# * "cn" to discover compute nodes (servers/global zones) making up the Triton infrastructure
[ role: <string> | default = "container" ]

# The DNS suffix which should be applied to target.
dns_suffix: <string>

# The Triton discovery endpoint (e.g. 'cmon.us-east-3b.triton.zone'). This is
# often the same value as dns_suffix.
endpoint: <string>

# A list of groups for which targets are retrieved, only supported when `role` == `container`.
# If omitted all containers owned by the requesting account are scraped.
groups:
  [ - <string> ... ]

# The port to use for discovery and metric scraping.
[ port: <int> | default = 9163 ]

# The interval which should be used for refreshing targets.
[ refresh_interval: <duration> | default = 60s ]

# The Triton discovery API version.
[ version: <int> | default = 1 ]

# TLS configuration.
tls_config:
  [ <tls_config> ]

<eureka_sd_config>

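Eureka SD configurations allow retrieving scrape targets using the Eureka REST API. Prometheus periodically checks the REST endpoint and creates a target for every app instance.

The following meta labels are available on targets during relabeling:

  • __meta_eureka_app_name: the name of the app
  • __meta_eureka_app_instance_id: the ID of the app instance
  • __meta_eureka_app_instance_hostname: the hostname of the instance
  • __meta_eureka_app_instance_homepage_url: the homepage URL of the app instance
  • __meta_eureka_app_instance_statuspage_url: the status page URL of the app instance
  • __meta_eureka_app_instance_healthcheck_url: the health check URL of the app instance
  • __meta_eureka_app_instance_ip_addr: the IP address of the app instance
  • __meta_eureka_app_instance_vip_address: the VIP address of the app instance
  • __meta_eureka_app_instance_secure_vip_address: the secure VIP address of the app instance
  • __meta_eureka_app_instance_status: the status of the app instance
  • __meta_eureka_app_instance_port: the port of the app instance
  • __meta_eureka_app_instance_port_enabled: the port enabled setting of the app instance
  • __meta_eureka_app_instance_secure_port: the secure port address of the app instance
  • __meta_eureka_app_instance_secure_port_enabled: the secure port enabled setting of the app instance
  • __meta_eureka_app_instance_country_id: the country ID of the app instance
  • __meta_eureka_app_instance_metadata_<metadataname>: app instance metadata
  • __meta_eureka_app_instance_datacenterinfo_name: the datacenter name of the app instance
  • __meta_eureka_app_instance_datacenterinfo_<metadataname>: the datacenter metadata

See below for the configuration options for Eureka discovery: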

# The URL to connect to the Eureka server.
server: <string>

# Refresh interval to re-read the app instance list.
[ refresh_interval: <duration> | default = 30s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

See the Prometheus eureka-sd configuration file for a practical example of how to set up your Eureka app and your Prometheus configuration.
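
As a rough illustration, the sketch below discovers instances from a hypothetical Eureka server and copies the app name into a permanent label. The server URL, the job name, and the target label name are assumptions.

```yaml
scrape_configs:
  - job_name: eureka                     # hypothetical job name
    eureka_sd_configs:
      - server: http://eureka.example.com:8761/eureka   # hypothetical server URL
        refresh_interval: 30s
    relabel_configs:
      # Record which Eureka app each target belongs to.
      - source_labels: [__meta_eureka_app_name]
        target_label: application
      # Keep the instance ID for easier correlation with the Eureka dashboard.
      - source_labels: [__meta_eureka_app_instance_id]
        target_label: eureka_instance_id
```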

<scaleway_sd_config>

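Scaleway SD configurations allow retrieving scrape targets from Scaleway instances and baremetal services.

The following meta labels are available on targets during relabeling:

Instance role

  • __meta_scaleway_instance_boot_type: the boot type of the server
  • __meta_scaleway_instance_hostname: the hostname of the server
  • __meta_scaleway_instance_id: the ID of the server
  • __meta_scaleway_instance_image_arch: the architecture of the server image
  • __meta_scaleway_instance_image_id: the ID of the server image
  • __meta_scaleway_instance_image_name: the name of the server image
  • __meta_scaleway_instance_location_cluster_id: the cluster ID of the server location
  • __meta_scaleway_instance_location_hypervisor_id: the hypervisor ID of the server location
  • __meta_scaleway_instance_location_node_id: the node ID of the server location
  • __meta_scaleway_instance_name: the name of the server
  • __meta_scaleway_instance_organization_id: the organization of the server
  • __meta_scaleway_instance_private_ipv4: the private IPv4 address of the server
  • __meta_scaleway_instance_project_id: the project ID of the server
  • __meta_scaleway_instance_public_ipv4: the public IPv4 address of the server
  • __meta_scaleway_instance_public_ipv6: the public IPv6 address of the server
  • __meta_scaleway_instance_region: the region of the server
  • __meta_scaleway_instance_security_group_id: the ID of the server's security group
  • __meta_scaleway_instance_security_group_name: the name of the server's security group
  • __meta_scaleway_instance_status: the status of the server
  • __meta_scaleway_instance_tags: the list of tags of the server joined by the tag separator
  • __meta_scaleway_instance_type: the commercial type of the server
  • __meta_scaleway_instance_zone: the zone of the server (e.g. fr-par-1, complete list here)

This role uses the first address it finds in the following order: private IPv4, public IPv4, public IPv6. This can be changed with relabeling, as demonstrated in the Prometheus scaleway-sd configuration file. If an instance has no address before relabeling, it is not added to the target list and cannot be relabeled.

Baremetal role

  • __meta_scaleway_baremetal_id: the ID of the server
  • __meta_scaleway_baremetal_public_ipv4: the public IPv4 address of the server
  • __meta_scaleway_baremetal_public_ipv6: the public IPv6 address of the server
  • __meta_scaleway_baremetal_name: the name of the server
  • __meta_scaleway_baremetal_os_name: the name of the server's operating system
  • __meta_scaleway_baremetal_os_version: the version of the server's operating system
  • __meta_scaleway_baremetal_project_id: the project ID of the server
  • __meta_scaleway_baremetal_status: the status of the server
  • __meta_scaleway_baremetal_tags: the list of tags of the server joined by the tag separator
  • __meta_scaleway_baremetal_type: the commercial type of the server
  • __meta_scaleway_baremetal_zone: the zone of the server (e.g. fr-par-1, complete list here)

This role uses the public IPv4 address by default. This can be changed with relabeling, as demonstrated in the Prometheus scaleway-sd configuration file.

See below for the configuration options for Scaleway discovery: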

# Access key to use. https://console.scaleway.com/project/credentials
access_key: <string>

# Secret key to use when listing targets. https://console.scaleway.com/project/credentials
# It is mutually exclusive with `secret_key_file`.
[ secret_key: <secret> ]

# Sets the secret key with the credentials read from the configured file.
# It is mutually exclusive with `secret_key`.
[ secret_key_file: <filename> ]

# Project ID of the targets.
project_id: <string>

# Role of the targets to retrieve. Must be `instance` or `baremetal`.
role: <string>

# The port to scrape metrics from.
[ port: <int> | default = 80 ]

# API URL to use when doing the server listing requests.
[ api_url: <string> | default = "https://api.scaleway.com" ]

# Zone is the availability zone of your targets (e.g. fr-par-1).
[ zone: <string> | default = fr-par-1 ]

# NameFilter specify a name filter (works as a LIKE) to apply on the server listing request.
[ name_filter: <string> ]

# TagsFilter specify a tag filter (a server needs to have all defined tags to be listed) to apply on the server listing request.
tags_filter:
[ - <string> ]

# Refresh interval to re-read the targets list.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
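
For the instance role, overriding the default address order could be sketched as follows. The project ID, access key, secret file path, and port 9100 are placeholders, not real credentials.

```yaml
scrape_configs:
  - job_name: scaleway-instances         # hypothetical job name
    scaleway_sd_configs:
      - role: instance
        project_id: 11111111-2222-3333-4444-555555555555   # placeholder
        access_key: SCWXXXXXXXXXXXXXXXXX                   # placeholder
        secret_key_file: /etc/prometheus/scaleway-secret   # hypothetical path
        zone: fr-par-1
        port: 9100
    relabel_configs:
      # Prefer the public IPv4 address over the default private IPv4
      # for instances that have one.
      - source_labels: [__meta_scaleway_instance_public_ipv4]
        regex: (.+)
        target_label: __address__
        replacement: "${1}:9100"
```

Using secret_key_file rather than secret_key keeps the secret out of the main configuration file; the two options are mutually exclusive.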

<uyuni_sd_config>

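Uyuni SD configurations allow retrieving scrape targets from managed systems via the Uyuni API.

The following meta labels are available on targets during relabeling:

  • __meta_uyuni_endpoint_name: the name of the application endpoint
  • __meta_uyuni_exporter: the exporter exposing metrics for the target
  • __meta_uyuni_groups: the system groups of the target
  • __meta_uyuni_metrics_path: the metrics path for the target
  • __meta_uyuni_minion_hostname: the hostname of the Uyuni client
  • __meta_uyuni_primary_fqdn: the primary FQDN of the Uyuni client
  • __meta_uyuni_proxy_module: the module name, if the Exporter Exporter proxy is configured for the target
  • __meta_uyuni_scheme: the protocol scheme used for requests
  • __meta_uyuni_system_id: the system ID of the client

See below for the configuration options for Uyuni discovery: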

# The URL to connect to the Uyuni server.
server: <string>

# Credentials are used to authenticate the requests to Uyuni API.
username: <string>
password: <secret>

# The entitlement string to filter eligible systems.
[ entitlement: <string> | default = monitoring_entitled ]

# The string by which Uyuni group names are joined into the groups label.
[ separator: <string> | default = , ]

# Refresh interval to re-read the managed targets list.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

See the Prometheus uyuni-sd configuration file for a practical example of how to set up the Uyuni Prometheus configuration.

<vultr_sd_config>

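Vultr SD configurations allow retrieving scrape targets from Vultr.

This service discovery uses the main IPv4 address by default. This can be changed with relabeling, as demonstrated in the Prometheus vultr-sd configuration file.

The following meta labels are available on targets during relabeling:

  • __meta_vultr_instance_id: a unique ID for the Vultr instance
  • __meta_vultr_instance_label: the user-supplied label for this instance
  • __meta_vultr_instance_os: the operating system name
  • __meta_vultr_instance_os_id: the ID of the operating system used by this instance
  • __meta_vultr_instance_region: the ID of the region where the instance is located
  • __meta_vultr_instance_plan: a unique ID for the plan
  • __meta_vultr_instance_main_ip: the main IPv4 address
  • __meta_vultr_instance_internal_ip: the private IP address
  • __meta_vultr_instance_main_ipv6: the main IPv6 address
  • __meta_vultr_instance_features: the list of features available to the instance
  • __meta_vultr_instance_tags: the list of tags associated with the instance
  • __meta_vultr_instance_hostname: the hostname of this instance
  • __meta_vultr_instance_server_status: the server health status
  • __meta_vultr_instance_vcpu_count: the number of vCPUs
  • __meta_vultr_instance_ram_mb: the amount of RAM in MB
  • __meta_vultr_instance_disk_gb: the size of the disk in GB
  • __meta_vultr_instance_allowed_bandwidth_gb: the monthly bandwidth quota in GB

# The port to scrape metrics from.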
[ port: <int> | default = 80 ]

# The time after which the instances are refreshed.
[ refresh_interval: <duration> | default = 60s ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]
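
A minimal sketch of a Vultr scrape configuration that switches from the main IPv4 address to the private IP address. The token file path and port 9100 are hypothetical; the API key is supplied through the authorization settings of the HTTP client configuration.

```yaml
scrape_configs:
  - job_name: vultr                      # hypothetical job name
    vultr_sd_configs:
      - authorization:
          credentials_file: /etc/prometheus/vultr-token   # hypothetical path
        port: 9100
        refresh_interval: 60s
    relabel_configs:
      # Use the private IP address where one is assigned.
      - source_labels: [__meta_vultr_instance_internal_ip]
        regex: (.+)
        target_label: __address__
        replacement: "${1}:9100"
```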

<static_config>

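A static_config allows specifying a list of targets and a common label set for them. It is the canonical way to specify static targets in a scrape configuration.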

# The targets specified by the static config.
targets:
  [ - '<host>' ]

# Labels assigned to all metrics scraped from the targets.
labels:
  [ <labelname>: <labelvalue> ... ]
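
For example, a scrape configuration with two statically configured targets sharing one label set might look like this. The host names, the job name, and the label are made up for illustration.

```yaml
scrape_configs:
  - job_name: node                       # hypothetical job name
    static_configs:
      - targets:
          - "node1.example.com:9100"     # hypothetical hosts
          - "node2.example.com:9100"
        labels:
          env: production                # attached to every series from both targets
```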

<relabel_config>

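Relabeling is a powerful tool to dynamically rewrite the label set of a target before it gets scraped. Multiple relabeling steps can be configured per scrape configuration. They are applied to the label set of each target in order of their appearance in the configuration file.

Initially, aside from the configured per-target labels, a target's job label is set to the job_name value of the respective scrape configuration. The __address__ label is set to the <host>:<port> address of the target. After relabeling, the instance label is set to the value of __address__ by default if it was not set during relabeling.

The __scheme__ and __metrics_path__ labels are set to the scheme and metrics path of the target respectively, as specified in scrape_config.

The __param_<name> label is set to the value of the first passed URL parameter called <name>, as defined in scrape_config.

The __scrape_interval__ and __scrape_timeout__ labels are set to the target's interval and timeout, as specified in scrape_config.

Additional labels prefixed with __meta_ may be available during the relabeling phase. They are set by the service discovery mechanism that provided the target and vary between mechanisms.

Labels starting with __ are removed from the label set after target relabeling is completed.

If a relabeling step needs to store a label value only temporarily (as the input to a subsequent relabeling step), use the __tmp label name prefix. This prefix is guaranteed to never be used by Prometheus itself.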

# The source labels select values from existing labels. Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]

# Separator placed between concatenated source label values.
[ separator: <string> | default = ; ]

# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]

# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]

# Modulus to take of the hash of the source label values.
[ modulus: <int> ]

# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]

# Action to perform based on regex matching.
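[ action: <relabel_action> | default = replace ]

<regex> is any valid RE2 regular expression. It is required for the replace, keep, drop, labelmap, labeldrop and labelkeep actions. The regex is anchored on both ends. To un-anchor the regex, use .*<regex>.*.

<relabel_action> determines the relabeling action to take:

  • replace: Match regex against the concatenated source_labels. Then, set target_label to replacement, with match group references (${1}, ${2}, ...) in replacement substituted by their value. If regex does not match, no replacement takes place.
  • lowercase: Maps the concatenated source_labels to their lower case.
  • uppercase: Maps the concatenated source_labels to their upper case.
  • keep: Drop targets for which regex does not match the concatenated source_labels.
  • drop: Drop targets for which regex matches the concatenated source_labels.
  • keepequal: Drop targets for which the concatenated source_labels do not match target_label.
  • dropequal: Drop targets for which the concatenated source_labels do match target_label.
  • hashmod: Set target_label to the modulus of a hash of the concatenated source_labels.
  • labelmap: Match regex against all source label names, not just those specified in source_labels. Then copy the values of the matching labels to label names given by replacement, with match group references (${1}, ${2}, ...) in replacement substituted by their value.
  • labeldrop: Match regex against all label names. Any label that matches is removed from the set of labels.
  • labelkeep: Match regex against all label names. Any label that does not match is removed from the set of labels.

Care must be taken with labeldrop and labelkeep to ensure that metrics are still uniquely labeled once the labels are removed.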
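
The fragment below sketches several of these actions as they might appear under a scrape configuration that uses Lightsail discovery. The target label names (location, host), the shard count of 4, and the choice of shard 1 are assumptions for illustration.

```yaml
relabel_configs:
  # replace (the default action): join two source labels with a custom
  # separator. With the default regex (.*) and replacement $1, the whole
  # concatenated value is written to the target label.
  - source_labels: [__meta_lightsail_region, __meta_lightsail_availability_zone]
    separator: "/"
    target_label: location

  # replace with a capture group: the regex is anchored on both ends, so it
  # must describe the entire value. "node1.example.com:9100" becomes
  # host="node1.example.com".
  - source_labels: [__address__]
    regex: '(.+):\d+'
    target_label: host
    replacement: "${1}"

  # hashmod followed by keep: assign each target to one of 4 buckets and
  # keep only bucket 1. The intermediate value uses the __tmp prefix and is
  # removed together with all other __ labels after relabeling.
  - source_labels: [__address__]
    modulus: 4
    target_label: __tmp_hash
    action: hashmod
  - source_labels: [__tmp_hash]
    regex: "1"
    action: keep

  # labelmap: copy every instance tag into a label named after the tag key.
  - action: labelmap
    regex: __meta_lightsail_tag_(.+)
```

The hashmod and keep pair is a common way to split one target set across several Prometheus servers, each keeping a different bucket.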

<metric_relabel_configs>

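Metric relabeling is applied to samples as the last step before ingestion. It has the same configuration format and actions as target relabeling. Metric relabeling does not apply to automatically generated time series such as up.

One use for this is to exclude time series that are too expensive to ingest.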
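
A minimal sketch of that use case follows. The target address and the metric name being dropped are hypothetical; in metric relabeling the metric name is available through the __name__ label.

```yaml
scrape_configs:
  - job_name: app                        # hypothetical job name
    static_configs:
      - targets: ["app.example.com:8080"]   # hypothetical target
    metric_relabel_configs:
      # Drop every sample of a (hypothetical) high-cardinality histogram
      # before it is ingested.
      - source_labels: [__name__]
        regex: "http_request_duration_seconds_bucket"
        action: drop
```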

<alert_relabel_configs>

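Alert relabeling is applied to alerts before they are sent to the Alertmanager. It has the same configuration format and actions as target relabeling. Alert relabeling is applied after external labels.

One use for this is ensuring that an HA pair of Prometheus servers with different external labels sends identical alerts.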
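
A sketch of the HA case, assuming the two servers differ only in an external label named replica (the label names and values are made up). Dropping that label before sending makes the alerts from both servers identical.

```yaml
global:
  external_labels:
    cluster: eu-1                        # hypothetical, same on both servers
    replica: a                           # hypothetical, "b" on the second server

alerting:
  alert_relabel_configs:
    # Remove the label that distinguishes the two HA servers.
    - action: labeldrop
      regex: replica
```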

<alertmanager_config>

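An alertmanager_config section specifies the Alertmanager instances the Prometheus server sends alerts to. It also provides parameters to configure how to communicate with these Alertmanagers.

Alertmanagers may be statically configured via the static_configs parameter or dynamically discovered using one of the supported service discovery mechanisms.

Additionally, relabel_configs allow selecting Alertmanagers from discovered entities and provide advanced modifications to the used API path, which is exposed through the __alerts_path__ label.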

# Per-target Alertmanager timeout when pushing alerts.
[ timeout: <duration> | default = 10s ]

# The api version of Alertmanager.
[ api_version: <string> | default = v2 ]

# Prefix for the HTTP path alerts are pushed to.
[ path_prefix: <path> | default = / ]

# Configures the protocol scheme used for requests.
[ scheme: <scheme> | default = http ]

# Optionally configures AWS's Signature Verification 4 signing process to sign requests.
# Cannot be set at the same time as basic_auth, authorization, oauth2, azuread or google_iam.
# To use the default credentials from the AWS SDK, use `sigv4: {}`.
sigv4:
  # The AWS region. If blank, the region from the default credentials chain
  # is used.
  [ region: <string> ]

  # The AWS API keys. If blank, the environment variables `AWS_ACCESS_KEY_ID`
  # and `AWS_SECRET_ACCESS_KEY` are used.
  [ access_key: <string> ]
  [ secret_key: <secret> ]

  # Named AWS profile used to authenticate.
  [ profile: <string> ]

  # AWS Role ARN, an alternative to using AWS API keys.
  [ role_arn: <string> ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]

# List of Eureka service discovery configurations.
eureka_sd_configs:
  [ - <eureka_sd_config> ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

# List of DigitalOcean service discovery configurations.
digitalocean_sd_configs:
  [ - <digitalocean_sd_config> ... ]

# List of Docker service discovery configurations.
docker_sd_configs:
  [ - <docker_sd_config> ... ]

# List of Docker Swarm service discovery configurations.
dockerswarm_sd_configs:
  [ - <dockerswarm_sd_config> ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]

# List of Hetzner service discovery configurations.
hetzner_sd_configs:
  [ - <hetzner_sd_config> ... ]

# List of HTTP service discovery configurations.
http_sd_configs:
  [ - <http_sd_config> ... ]

# List of IONOS service discovery configurations.
ionos_sd_configs:
  [ - <ionos_sd_config> ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

# List of Lightsail service discovery configurations.
lightsail_sd_configs:
  [ - <lightsail_sd_config> ... ]

# List of Linode service discovery configurations.
linode_sd_configs:
  [ - <linode_sd_config> ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]

# List of Nomad service discovery configurations.
nomad_sd_configs:
  [ - <nomad_sd_config> ... ]

# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]

# List of OVHcloud service discovery configurations.
ovhcloud_sd_configs:
  [ - <ovhcloud_sd_config> ... ]

# List of PuppetDB service discovery configurations.
puppetdb_sd_configs:
  [ - <puppetdb_sd_config> ... ]

# List of Scaleway service discovery configurations.
scaleway_sd_configs:
  [ - <scaleway_sd_config> ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]

# List of Uyuni service discovery configurations.
uyuni_sd_configs:
  [ - <uyuni_sd_config> ... ]

# List of Vultr service discovery configurations.
vultr_sd_configs:
  [ - <vultr_sd_config> ... ]

# List of labeled statically configured Alertmanagers.
static_configs:
  [ - <static_config> ... ]

# List of Alertmanager relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]

# List of alert relabel configurations.
alert_relabel_configs:
  [ - <relabel_config> ... ]
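
A minimal sketch of a statically configured Alertmanager pair, placed under the alerting section of the configuration file. The host names and the path prefix are hypothetical.

```yaml
alerting:
  alertmanagers:
    - scheme: https
      path_prefix: /alertmanager         # hypothetical prefix
      timeout: 10s
      api_version: v2
      static_configs:
        - targets:
            - "alertmanager-1.example.com:9093"   # hypothetical hosts
            - "alertmanager-2.example.com:9093"
```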

<remote_write>

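write_relabel_configs is relabeling applied to samples before sending them to the remote endpoint. Write relabeling is applied after external labels. This can be used to limit which samples are sent.

There is a small demo of how to use this functionality.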

# The URL of the endpoint to send samples to.
url: <string>

# protobuf message to use when writing to the remote write endpoint.
#
# * The `prometheus.WriteRequest` represents the message introduced in Remote Write 1.0, which
# will be deprecated eventually.
# * The `io.prometheus.write.v2.Request` was introduced in Remote Write 2.0 and replaces the former,
# by improving efficiency and sending metadata, created timestamp and native histograms by default.
#
# Before changing this value, consult with your remote storage provider (or test) what message it supports.
# Read more on https://prometheus.ac.cn/docs/specs/remote_write_spec_2_0/#io-prometheus-write-v2-request
[ protobuf_message: <prometheus.WriteRequest | io.prometheus.write.v2.Request> | default = prometheus.WriteRequest ]

# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]

# Custom HTTP headers to be sent along with each remote write request.
# Be aware that headers that are set by Prometheus itself can't be overwritten.
headers:
  [ <string>: <string> ... ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]

# Name of the remote write config, which if specified must be unique among remote write configs.
# The name will be used in metrics and logging in place of a generated value to help users distinguish between
# remote write configs.
[ name: <string> ]

# Enables sending of exemplars over remote write. Note that exemplar storage itself must be enabled for exemplars to be scraped in the first place.
[ send_exemplars: <boolean> | default = false ]

# Enables sending of native histograms, also known as sparse histograms, over remote write.
# For the `io.prometheus.write.v2.Request` message, this option is noop (always true).
[ send_native_histograms: <boolean> | default = false ]

# When enabled, remote-write will resolve the URL host name via DNS, choose one of the IP addresses at random, and connect to it. 
# When disabled, remote-write relies on Go's standard behavior, which is to try to connect to each address in turn.
# The connection timeout applies to the whole operation, i.e. in the latter case it is spread over all attempt.
# This is an experimental feature, and its behavior might still change, or even get removed.
[ round_robin_dns: <boolean> | default = false ]

# Optionally configures AWS's Signature Verification 4 signing process to
# sign requests. Cannot be set at the same time as basic_auth, authorization, oauth2, or azuread.
# To use the default credentials from the AWS SDK, use `sigv4: {}`.
sigv4:
  # The AWS region. If blank, the region from the default credentials chain
  # is used.
  [ region: <string> ]

  # The AWS API keys. If blank, the environment variables `AWS_ACCESS_KEY_ID`
  # and `AWS_SECRET_ACCESS_KEY` are used.
  [ access_key: <string> ]
  [ secret_key: <secret> ]

  # Named AWS profile used to authenticate.
  [ profile: <string> ]

  # AWS Role ARN, an alternative to using AWS API keys.
  [ role_arn: <string> ]

# Optional AzureAD configuration.
# Cannot be used at the same time as basic_auth, authorization, oauth2, sigv4 or google_iam.
azuread:
  # The Azure Cloud. Options are 'AzurePublic', 'AzureChina', or 'AzureGovernment'.
  [ cloud: <string> | default = AzurePublic ]

  # Azure User-assigned Managed identity.
  [ managed_identity:
      [ client_id: <string> ] ]  

  # Azure OAuth.
  [ oauth:
      [ client_id: <string> ]
      [ client_secret: <string> ]
      [ tenant_id: <string> ] ]

  # Azure SDK auth.
  # See https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication
  [ sdk:
      [ tenant_id: <string> ] ]

# WARNING: Remote write is NOT SUPPORTED by Google Cloud. This configuration is reserved for future use.
# Optional Google Cloud Monitoring configuration.
# Cannot be used at the same time as basic_auth, authorization, oauth2, sigv4 or azuread.
# To use the default credentials from the Google Cloud SDK, use `google_iam: {}`.
google_iam:
  # Service account key with monitoring write permissions.
  credentials_file: <file_name>

# Configures the queue used to write to remote storage.
queue_config:
  # Number of samples to buffer per shard before we block reading of more
  # samples from the WAL. It is recommended to have enough capacity in each
  # shard to buffer several requests to keep throughput up while processing
  # occasional slow remote requests.
  [ capacity: <int> | default = 10000 ]
  # Maximum number of shards, i.e. amount of concurrency.
  [ max_shards: <int> | default = 50 ]
  # Minimum number of shards, i.e. amount of concurrency.
  [ min_shards: <int> | default = 1 ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 2000]
  # Maximum time a sample will wait for a send. The sample might wait less
  # if the buffer is full. Further time might pass due to potential retries.
  [ batch_send_deadline: <duration> | default = 5s ]
  # Initial retry delay. Gets doubled for every retry.
  [ min_backoff: <duration> | default = 30ms ]
  # Maximum retry delay.
  [ max_backoff: <duration> | default = 5s ]
  # Retry upon receiving a 429 status code from the remote-write storage.
  # This is experimental and might change in the future.
  [ retry_on_http_429: <boolean> | default = false ]
  # If set, any sample that is older than sample_age_limit
  # will not be sent to the remote storage. The default value is 0s,
  # which means that all samples are sent.
  [ sample_age_limit: <duration> | default = 0s ]

# Configures the sending of series metadata to remote storage
# if the `prometheus.WriteRequest` message was chosen. When
# `io.prometheus.write.v2.Request` is used, metadata is always sent.
#
# Metadata configuration is subject to change at any point
# or be removed in future releases.
metadata_config:
  # Whether metric metadata is sent to remote storage or not.
  [ send: <boolean> | default = true ]
  # How frequently metric metadata is sent to remote storage.
  [ send_interval: <duration> | default = 1m ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 500 ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
# enable_http2 defaults to false for remote-write.
[ <http_config> ]

There is a list of integrations with this feature.
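As an illustration of the queue and metadata settings described above, the following is a minimal sketch of a remote-write entry. The endpoint URL is hypothetical, and the values are examples rather than recommendations. The top-level `remote_write` key and the `url` field come from the wider Prometheus configuration format and are not shown in this part of the reference.

```yaml
remote_write:
  # Hypothetical endpoint; replace with the write URL of your remote storage.
  - url: "https://remote-storage.example.com/api/v1/write"
    queue_config:
      # 10000 buffered samples per shard with 2000 samples per send
      # leaves room for about five pending requests per shard.
      capacity: 10000
      max_samples_per_send: 2000
      # Bounds on concurrency.
      min_shards: 1
      max_shards: 20
      # Send a partial batch once a sample has waited this long.
      batch_send_deadline: 5s
      # The retry delay starts at min_backoff and doubles up to max_backoff.
      min_backoff: 30ms
      max_backoff: 5s
      # Experimental: also retry when the remote end answers with HTTP 429.
      retry_on_http_429: true
      # Drop samples older than one hour instead of sending them.
      sample_age_limit: 1h
    metadata_config:
      send: true
      send_interval: 1m
```

Keeping `capacity` at a multiple of `max_samples_per_send` follows the guidance in the `capacity` comment: each shard can then hold several requests while a slow remote request is in flight.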

<remote_read>

# The URL of the endpoint to query from.
url: <string>

# Name of the remote read config, which if specified must be unique among remote read configs.
# The name will be used in metrics and logging in place of a generated value to help users distinguish between
# remote read configs.
[ name: <string> ]

# An optional list of equality matchers which have to be
# present in a selector to query the remote read endpoint.
required_matchers:
  [ <labelname>: <labelvalue> ... ]

# Timeout for requests to the remote read endpoint.
[ remote_timeout: <duration> | default = 1m ]

# Custom HTTP headers to be sent along with each remote read request.
# Be aware that headers that are set by Prometheus itself can't be overwritten.
headers:
  [ <string>: <string> ... ]

# Whether reads should be made for queries for time ranges that
# the local storage should have complete data for.
[ read_recent: <boolean> | default = false ]

# Whether to use the external labels as selectors for the remote read endpoint.
[ filter_external_labels: <boolean> | default = true ]

# HTTP client settings, including authentication methods (such as basic auth and
# authorization), proxy configurations, TLS options, custom HTTP headers, etc.
[ <http_config> ]

There is a list of integrations with this feature.
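The remote-read fields above could be combined as in the following sketch. The URL, the config name, the matcher, and the custom header are all hypothetical values chosen for illustration. The top-level `remote_read` key comes from the wider Prometheus configuration format.

```yaml
remote_read:
  # Hypothetical endpoint; replace with the read URL of your remote storage.
  - url: "https://remote-storage.example.com/api/v1/read"
    # Must be unique among remote read configs; appears in metrics and logs.
    name: long_term_store
    # Only selectors containing env="production" are sent to this endpoint.
    required_matchers:
      env: production
    remote_timeout: 30s
    # Hypothetical custom header; headers set by Prometheus itself
    # cannot be overwritten.
    headers:
      X-Example-Tenant: team-a
    # Skip remote reads for time ranges that local storage should fully cover.
    read_recent: false
    # Add the external labels as selectors on each remote read request.
    filter_external_labels: true
```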

<tsdb>

tsdb lets you configure the runtime-reloadable configuration settings of the TSDB.

# Configures how old an out-of-order/out-of-bounds sample can be w.r.t. the TSDB max time.
# An out-of-order/out-of-bounds sample is ingested into the TSDB as long as the timestamp
# of the sample is >= TSDB.MaxTime-out_of_order_time_window.
#
# When out_of_order_time_window is >0, the errors out-of-order and out-of-bounds are
# combined into a single error called 'too-old'; a sample is either (a) ingestible
# into the TSDB, i.e. it is an in-order sample or an out-of-order/out-of-bounds sample
# that is within the out-of-order window, or (b) too-old, i.e. not in-order
# and before the out-of-order window.
#
# When out_of_order_time_window is greater than 0, it also affects the experimental agent. It allows
# the agent's WAL to accept out-of-order samples that fall within the specified time window relative
# to the timestamp of the last appended sample for the same series.
[ out_of_order_time_window: <duration> | default = 0s ]
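To make the window rule concrete, here is a small sketch that enables a 30-minute out-of-order window. It assumes the `tsdb` block is nested under the top-level `storage` key, as in the wider Prometheus configuration format; the 30m value is only an example.

```yaml
storage:
  tsdb:
    # Accept samples whose timestamp is >= TSDB.MaxTime - 30m.
    # Example: if the TSDB max time is 12:00:00, a sample stamped 11:45:00
    # is ingested, while one stamped 11:20:00 is rejected as too-old.
    out_of_order_time_window: 30m
```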

<exemplars>

Note that exemplar storage is still considered experimental and must be enabled via --enable-feature=exemplar-storage.

# Configures the maximum size of the circular buffer used to store exemplars for all series. Resizable during runtime.
[ max_exemplars: <int> | default = 100000 ]
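A minimal sketch of an exemplar storage setting follows. It assumes the `exemplars` block sits under the top-level `storage` key, as in the wider Prometheus configuration format, and that Prometheus was started with `--enable-feature=exemplar-storage`. The buffer size shown is an arbitrary example.

```yaml
storage:
  exemplars:
    # Size of the circular buffer shared by all series.
    # Once it is full, new exemplars overwrite the oldest ones.
    max_exemplars: 50000
```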

<tracing_config>

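tracing_config configures exporting traces from Prometheus to a tracing backend via the OTLP protocol. Tracing is currently an experimental feature and could change in the future.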

# Client used to export the traces. Options are 'http' or 'grpc'.
[ client_type: <string> | default = grpc ]

# Endpoint to send the traces to. Should be provided in format <host>:<port>.
[ endpoint: <string> ]

# Sets the probability a given trace will be sampled. Must be a float from 0 through 1.
[ sampling_fraction: <float> | default = 0 ]

# If disabled, the client will use a secure connection.
[ insecure: <boolean> | default = false ]

# Key-value pairs to be used as headers associated with gRPC or HTTP requests.
headers:
  [ <string>: <string> ... ]

# Compression key for supported compression types. Supported compression: gzip.
[ compression: <string> ]

# Maximum time the exporter will wait for each batch export.
[ timeout: <duration> | default = 10s ]

# TLS configuration.
tls_config:
  [ <tls_config> ]
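The tracing options above might be combined as in the following sketch, which exports one in ten traces over gRPC. The collector address, the header, and the certificate path are hypothetical. The top-level `tracing` key and the `ca_file` field of `<tls_config>` come from the wider Prometheus configuration format and are not shown in this part of the reference.

```yaml
tracing:
  client_type: grpc
  # Hypothetical collector address in <host>:<port> form.
  endpoint: "otel-collector.example.internal:4317"
  # Sample roughly 10% of traces; the default of 0 samples none.
  sampling_fraction: 0.1
  # Keep the connection secure (this is the default).
  insecure: false
  compression: gzip
  # Give up on a batch export after five seconds.
  timeout: 5s
  # Hypothetical header sent with each export request.
  headers:
    x-example-team: observability
  tls_config:
    # Hypothetical CA bundle used to verify the collector's certificate.
    ca_file: /etc/prometheus/certs/ca.crt
```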
