Prometheus + Grafana Monitoring Platform Deployment - 小程博客 (Xiaocheng Blog)

Environment: CentOS 7 / Rocky Linux 9 / Ubuntu 20.04+

Software versions: Prometheus v2.45+ | Grafana v10.x | Node Exporter v1.6+

Audience: Ops engineers / DevOps / cloud-native developers

1. Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                           Grafana Web UI                            │
│               (visualization / dashboards / alerting)               │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                          Prometheus Server                          │
│    (metric scraping / time-series storage / PromQL query engine)    │
└─────────────────────────────────────────────────────────────────────┘
           │                    │                     │
           ▼                    ▼                     ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│   Node Exporter  │  │   cAdvisor       │  │   MySQL Exporter │
│   (server node)  │  │   (containers)   │  │   (database)     │
└──────────────────┘  └──────────────────┘  └──────────────────┘
           │                    │                     │
           ▼                    ▼                     ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│  Linux Server 1  │  │  Docker Host     │  │  MySQL Server    │
└──────────────────┘  └──────────────────┘  └──────────────────┘

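Core components

Component	Role	Port
Prometheus	Metric scraping, storage, querying	9090
Grafana	Visualization, dashboards, alerting	3000
Node Exporter	Exposes server hardware/OS metrics	9100
cAdvisor	Exposes Docker container metrics	8080
mysqld_exporter	Exposes MySQL database metrics	9104

2. Environment Preparation

2.1 System Requirements

Item	Minimum	Recommended
CPU	2 cores	4+ cores
Memory	4 GB	8+ GB
Disk	50 GB SSD	100+ GB SSD
OS	CentOS 7+ / Ubuntu 18+	Rocky Linux 9 / Ubuntu 22.04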

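2.2 Disable the Firewall (if needed)

Disabling the firewall is the quickest option for a lab setup. In production, keep it running and open only the ports listed in the component table above; section 9.2 requires firewalld to be active.

# CentOS / Rocky Linux
sudo systemctl stop firewalld
sudo systemctl disable firewalld

# Ubuntu
sudo ufw disable

# Verify
sudo systemctl status firewalld   # CentOS / Rocky: expect "inactive (dead)"
sudo ufw status                   # Ubuntu: expect "Status: inactive"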
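Instead of disabling the firewall, you can open only the ports from the component table in section 1. The sketch below prints the commands first so you can review them before running anything. `firewall_cmds` and `MONITORING_PORTS` are hypothetical names. The port list is an assumption copied from that table, so trim it to what each host actually runs (a plain monitored server only needs 9100).

```shell
#!/usr/bin/env bash
# Ports from the component table (assumption: adjust per host)
MONITORING_PORTS="9090 9100 3000 8080 9104"

# Print, but do not execute, the commands that open the ports.
# $1: firewalld | ufw
firewall_cmds() {
  local p
  case "$1" in
    firewalld)
      for p in $MONITORING_PORTS; do
        echo "firewall-cmd --permanent --add-port=${p}/tcp"
      done
      # permanent rules only apply after a reload
      echo "firewall-cmd --reload"
      ;;
    ufw)
      for p in $MONITORING_PORTS; do
        echo "ufw allow ${p}/tcp"
      done
      ;;
    *)
      echo "unsupported firewall: $1" >&2
      return 1
      ;;
  esac
}
```

After checking the output, apply it with `firewall_cmds firewalld | sudo sh` (or `firewall_cmds ufw | sudo sh` on Ubuntu).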

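2.3 Disable SELinux (CentOS/Rocky)

# Temporarily (switches to permissive mode until the next reboot)
sudo setenforce 0

# Permanently (takes effect after a reboot; the pattern also covers SELINUX=permissive)
sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

# Reboot, then verify: getenforce should print Disabled
sudo reboot
getenforce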

3. Install Prometheus

3.1 Create System Users

# Create dedicated service users (running as non-root is safer; node_exporter is used in section 4)
sudo useradd --no-create-home --shell /bin/false prometheus
sudo useradd --no-create-home --shell /bin/false node_exporter

3.2 Download and Install Prometheus

# Create directories
sudo mkdir -p /etc/prometheus
sudo mkdir -p /var/lib/prometheus

# Download Prometheus (pick the tarball for your CPU architecture, e.g. linux-arm64 on ARM servers)
cd /tmp
curl -LO https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz

# Extract
tar xvf prometheus-2.45.0.linux-amd64.tar.gz

# Copy the binaries and console templates
sudo cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo cp -r prometheus-2.45.0.linux-amd64/console_libraries /etc/prometheus/
sudo cp -r prometheus-2.45.0.linux-amd64/consoles /etc/prometheus/

# Set ownership
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
sudo chown -R prometheus:prometheus /var/lib/prometheus

# Clean up
rm -rf /tmp/prometheus-2.45.0.linux-amd64*
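The download step can be made architecture-aware and checked against the `sha256sums.txt` file that each Prometheus release publishes next to its tarballs. This is a minimal sketch that assumes bash, curl and GNU coreutils. The helper names `prom_arch`, `release_url` and `fetch_verified` are hypothetical, and only the amd64, arm64 and armv7 tarballs are mapped.

```shell
#!/usr/bin/env bash
# Map `uname -m` output to the architecture suffix used in release tarball names.
prom_arch() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    armv7l)        echo armv7 ;;
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the GitHub release URL of a prometheus/* project tarball.
# $1: project (prometheus, node_exporter, ...)  $2: version without "v"  $3: arch suffix
release_url() {
  echo "https://github.com/prometheus/$1/releases/download/v$2/$1-$2.linux-$3.tar.gz"
}

# Download the tarball for this machine into the current directory and verify it
# against the sha256sums.txt published with the same release.
fetch_verified() {
  local project="$1" version="$2" arch tarball
  arch="$(prom_arch "$(uname -m)")" || return 1
  tarball="${project}-${version}.linux-${arch}.tar.gz"
  curl -fLO "$(release_url "$project" "$version" "$arch")" || return 1
  curl -fLO "https://github.com/prometheus/${project}/releases/download/v${version}/sha256sums.txt" || return 1
  # Check only this tarball's line; avoids --ignore-missing, which older coreutils (CentOS 7) lack
  grep -F " ${tarball}" sha256sums.txt | sha256sum -c -
}
```

For example, `cd /tmp && fetch_verified prometheus 2.45.0` downloads and verifies the server tarball. On a monitored server, `fetch_verified node_exporter 1.6.1` does the same for the exporter. Extraction and copying then proceed as shown above.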

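3.3 Configure Prometheus

# Create the main configuration file
sudo vi /etc/prometheus/prometheus.yml

/etc/prometheus/prometheus.yml

global:
  # Global scrape interval (Prometheus's built-in default is 1m; 15s is a common choice)
  scrape_interval: 15s
  # How often recording and alerting rules are evaluated (built-in default is 1m)
  evaluation_interval: 15s
  # External labels: attached to series and alerts sent to external systems
  # (federation, remote_write, Alertmanager); they are not added to locally stored series
  external_labels:
    cluster: 'prod'
    env: 'production'

# Alertmanager (not installed in this guide; until one listens on :9093,
# attempts to send firing alerts fail and are logged as errors)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'

# Rule files (created in section 6)
rule_files:
  - "/etc/prometheus/rules/*.yml"

# Scrape targets
scrape_configs:
  # Prometheus scraping itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          service: 'prometheus'

  # Node Exporter (server metrics)
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']
        labels:
          env: 'prod'

  # cAdvisor (Docker metrics)
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['192.168.1.20:8080']
        labels:
          service: 'docker'

  # mysqld_exporter (database metrics; installing it is not covered in this guide)
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.1.30:9104']
        labels:
          service: 'database'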

3.4 Create the Prometheus systemd Service

# Create the systemd unit file
sudo vi /etc/systemd/system/prometheus.service

/etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --storage.tsdb.retention.time=15d \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.console.templates=/etc/prometheus/consoles \
    --web.enable-lifecycle \
    --storage.tsdb.retention.size=50GB

Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target

3.5 Start Prometheus

# Reload systemd so it picks up the new unit
sudo systemctl daemon-reload

# Start the service
sudo systemctl start prometheus

# Enable at boot
sudo systemctl enable prometheus

# Check status
sudo systemctl status prometheus

# Verify that port 9090 is listening
sudo ss -tlnp | grep 9090
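Immediately after `systemctl start`, the port may not be open yet. A short polling loop against Prometheus's `/-/ready` endpoint avoids racing the startup. This is a sketch that assumes curl is installed. `wait_for_http` is a hypothetical helper name. The same function can also poll Node Exporter (`http://localhost:9100/metrics`) or Grafana (`http://localhost:3000/api/health`).

```shell
#!/usr/bin/env bash
# Poll an HTTP endpoint until it answers with a 2xx status.
# $1: URL  $2: number of attempts, roughly one per second (default 30)
wait_for_http() {
  local url="$1" attempts="${2:-30}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f turns HTTP errors (4xx/5xx) into a non-zero exit status
    if curl -fs --max-time 2 -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after ${attempts} attempts: $url" >&2
  return 1
}
```

Example: `wait_for_http http://localhost:9090/-/ready 30`.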

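3.6 Access the Prometheus Web UI

Open http://<server-IP>:9090 in a browser. Under Status → Targets, each configured job should show State UP. Targets whose exporters are not installed yet will show DOWN.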

┌─────────────────────────────────────────────────────────────────┐
│  Prometheus                              [Graph] [Table] [Status]│
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─ Status ──────────────────────────────────────────────────┐  │
│  │ ▶ Graph   ▶ Table   ▶ Status   ▶ Targets   ▶ Rules        │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                  │
│  Expression (e.g. up{job="prometheus"}):                       │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ up                                                       │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                    [Execute] [Console] [Graph] │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

4. Install Node Exporter (Server Monitoring)

4.1 Install Node Exporter

# Download and extract (run on every server to be monitored; create the node_exporter user there first, as in section 3.1)
cd /tmp
curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar xvf node_exporter-1.6.1.linux-amd64.tar.gz

# Copy the binary
sudo cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/

# Set ownership
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

# Clean up
rm -rf /tmp/node_exporter-*

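4.2 Configure the systemd Service

sudo vi /etc/systemd/system/node_exporter.service

Most of the --collector.* flags below only restate defaults: cpu, meminfo, diskstats, filesystem, netdev and loadavg are enabled out of the box. --collector.processes is disabled by default, so it is the one flag here that changes behavior.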

/etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
After=network.target

[Service]
Type=simple
User=node_exporter
ExecStart=/usr/local/bin/node_exporter \
    --collector.cpu \
    --collector.meminfo \
    --collector.diskstats \
    --collector.filesystem \
    --collector.netdev \
    --collector.loadavg \
    --collector.processes \
    --web.listen-address=:9100

Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target

4.3 Start the Service

sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter

# Verify: metric output means the exporter is working
curl -s http://localhost:9100/metrics | head -20

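4.4 Common Metrics Quick Reference

Metric	Description
`node_cpu_seconds_total`	Cumulative CPU time per CPU and mode (counter)
`node_memory_MemTotal_bytes`	Total memory
`node_memory_MemAvailable_bytes`	Available memory
`node_disk_read_bytes_total`	Total bytes read from disk (counter)
`node_disk_written_bytes_total`	Total bytes written to disk (counter)
`node_network_receive_bytes_total`	Total bytes received per network interface (counter)
`node_network_transmit_bytes_total`	Total bytes transmitted per network interface (counter)
`node_load1` / `node_load5` / `node_load15`	1/5/15-minute load average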

5. Install Grafana

5.1 Install Grafana (Ubuntu/Debian)

# Add the GPG key (Grafana's package repositories now live at apt.grafana.com / rpm.grafana.com)
sudo apt-get install -y gnupg2 curl
curl -fsSL https://apt.grafana.com/gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/grafana-keyring.gpg

# Add the APT repository
echo "deb [signed-by=/usr/share/keyrings/grafana-keyring.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

# Install
sudo apt-get update
sudo apt-get install -y grafana

# Enable at boot and start
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

5.2 Install Grafana (CentOS/Rocky)

# Create the yum repository file
sudo vi /etc/yum.repos.d/grafana.repo

/etc/yum.repos.d/grafana.repo

[grafana]
name=Grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

# Install (on Rocky Linux 9, dnf install -y grafana is equivalent)
sudo yum install -y grafana

# Start
sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

5.3 Access Grafana

Default address: http://<server-IP>:3000

┌─────────────────────────────────────────────────────────────────┐
│                    Grafana                                        │
│                                                                  │
│    ┌─────────────────────────────────────────────────────────┐  │
│    │                                                         │  │
│    │              🔑 Sign In                                 │  │
│    │                                                         │  │
│    │    Email or username:  admin                           │  │
│    │                                                         │  │
│    │    Password:         ●●●●●●●●                           │  │
│    │                                                         │  │
│    │         [ Sign In ]                                     │  │
│    │                                                         │  │
│    └─────────────────────────────────────────────────────────┘  │
│                                                                  │
│                    Default: admin / admin                       │
└─────────────────────────────────────────────────────────────────┘

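⚠️ First login: the default username and password are admin / admin. Grafana asks you to set a new password right after the first login; change it immediately.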

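5.4 Configure the Data Source

Step 1: After logging in, open Connections → Data sources from the left-hand menu (Configuration → Data Sources in older Grafana versions).

Step 2: Click Add data source and choose Prometheus.

Step 3: Configure the Prometheus connection. Use http://localhost:9090 only when Grafana and Prometheus run on the same host; otherwise enter the Prometheus server's IP.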

┌────────────────────────────────────────────┐
│  Data Sources / Prometheus                 │
├────────────────────────────────────────────┤
│                                            │
│  Name:           Prometheus                 │
│                                            │
│  Default:       ☑                          │
│                                            │
│  URL:           http://localhost:9090     │
│                                            │
│  Access:        Server (default)           │
│                                            │
│  Scrape interval:  15s                     │
│                                            │
│  [ Save & Test ]                           │
│                                            │
└────────────────────────────────────────────┘

Click Save & Test. A green success message ("Data source is working" in older versions) means the data source is configured correctly.
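As an alternative to the UI steps, Grafana can create the data source at startup from a provisioning file. Such files go in /etc/grafana/provisioning/datasources/, a directory the deb/rpm packages normally create. The sketch below prints such a file. `grafana_ds_yaml` is a hypothetical helper, and its URL argument must be the Prometheus address as seen from the Grafana server.

```shell
#!/usr/bin/env bash
# Print a Grafana data source provisioning file for Prometheus.
# $1: Prometheus URL as seen from the Grafana server (default http://localhost:9090)
grafana_ds_yaml() {
  cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # proxy: queries go through the Grafana server ("Server" access in the UI)
    access: proxy
    url: ${1:-http://localhost:9090}
    isDefault: true
EOF
}
```

Example: `grafana_ds_yaml http://localhost:9090 | sudo tee /etc/grafana/provisioning/datasources/prometheus.yaml`, then `sudo systemctl restart grafana-server`.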

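5.5 Import Dashboards

The Grafana community site (grafana.com/grafana/dashboards) offers many ready-made dashboards. The following IDs are recommended:

Dashboard	Grafana.com ID	Purpose
Node Exporter Full	1860	Full server monitoring (CPU/memory/disk/network)
Docker Monitoring	193	Docker container monitoring
Prometheus Stats	2	Prometheus's own status
MySQL Overview	7362	MySQL database monitoring (requires mysqld_exporter)

Import steps:

Open Dashboards from the left-hand menu, then click New → Import (older versions: Dashboards → Browse → Import)
Enter the Grafana.com dashboard ID (e.g. 1860) and click Load
Select the Prometheus data source
Click Import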

5.6 Custom Dashboard: Server Monitoring Example

Step 1: Create a new dashboard → Add visualization

Step 2: Select the Prometheus data source

Step 3: Add a panel and write a PromQL query

📊 CPU usage (%)

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

📊 Memory usage (%)

100 * (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))

📊 Disk usage (%) of the root filesystem

100 * (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})

📊 Network traffic (bits/s, loopback excluded)

rate(node_network_receive_bytes_total{device!="lo"}[5m]) * 8  # * 8 converts bytes/s to bits/s
rate(node_network_transmit_bytes_total{device!="lo"}[5m]) * 8

📊 Disk I/O (bytes/s)

rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])
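Before building panels, you can try the expressions above from the command line through the Prometheus HTTP API (`GET /api/v1/query`). This is a sketch that assumes curl. `promql` is a hypothetical helper, and `PROM_URL` defaults to http://localhost:9090.

```shell
#!/usr/bin/env bash
# Run an instant PromQL query through the Prometheus HTTP API and print the raw JSON.
PROM_URL="${PROM_URL:-http://localhost:9090}"

promql() {
  # -G sends the --data-urlencode pair as a URL query string on a GET request
  curl -fsS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

For example, `promql '100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)'` prints the JSON result. Pipe it through `jq .` if jq is installed.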

6. Configure Alerting Rules

6.1 Create the Rules Directory

sudo mkdir -p /etc/prometheus/rules

6.2 Example Server Alerting Rules

sudo vi /etc/prometheus/rules/server_alerts.yml

/etc/prometheus/rules/server_alerts.yml

groups:
  - name: server_alerts
    interval: 30s
    rules:
      # Server down
      - alert: ServerDown
        expr: up{job="node"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Server {{ $labels.instance }} is down"
          description: "Server {{ $labels.instance }} has been unreachable for more than 1 minute"

      # High CPU usage
      - alert: HighCPU
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage"
          description: "CPU usage on {{ $labels.instance }} is above 80%, current value: {{ $value | humanize }}%"

      # Critically high CPU usage
      - alert: CriticalCPU
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 95
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Critical CPU usage"
          description: "CPU usage on {{ $labels.instance }} is above 95%, current value: {{ $value | humanize }}%"

      # High memory usage
      - alert: HighMemory
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage on {{ $labels.instance }} is above 85%"

      # Low disk space
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space"
          description: "Less than 10% free space on the root filesystem of {{ $labels.instance }}"

      # Prometheus self-scrape failing
      # Note: if the Prometheus process itself is down, no rule is evaluated at all;
      # catching that case needs a second Prometheus or an external check
      - alert: PrometheusDown
        expr: up{job="prometheus"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus is unavailable"
          description: "The Prometheus instance is unreachable"

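6.3 Reload the Prometheus Configuration

# Hot reload without a restart (works because --web.enable-lifecycle is set in section 3.4)
curl -X POST http://localhost:9090/-/reload

# Or restart via systemd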
sudo systemctl restart prometheus
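A reload with a broken file does not take effect: Prometheus keeps the old configuration and reports the error. It is therefore convenient to validate first and reload only on success. This is a sketch. `reload_prometheus` is a hypothetical helper that relies on `promtool check config` also checking the rule files listed under rule_files, and it assumes the endpoint and paths used in this guide.

```shell
#!/usr/bin/env bash
# Validate the configuration (and the rule files it references), then hot-reload.
# Assumes --web.enable-lifecycle is set (section 3.4).
PROM_URL="${PROM_URL:-http://localhost:9090}"
PROM_CONFIG="${PROM_CONFIG:-/etc/prometheus/prometheus.yml}"

reload_prometheus() {
  if ! promtool check config "$PROM_CONFIG"; then
    echo "config check failed, not reloading" >&2
    return 1
  fi
  # -f makes an HTTP error from the reload endpoint a non-zero exit status
  curl -fsS -X POST "${PROM_URL}/-/reload"
}
```

Run `reload_prometheus` after editing prometheus.yml or any file under /etc/prometheus/rules.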

6.4 Check Alert Status

In the Prometheus web UI, the Alerts page lists every alerting rule with its state: inactive, pending (the condition is true but the for duration has not elapsed yet), or firing. Firing alerts only become notifications once an Alertmanager is running (see Next steps in section 10).

┌────────────────────────────────────────────────────────────────┐
│  Alerts                                                        │
├────────────────────────────────────────────────────────────────┤
│  Firing (1)                                                    │
│    🔴 ServerDown      instance=server-01   active for 15m       │
│                                                                │
│  Pending (1)                                                   │
│    🟡 HighCPU         instance=server-02   active for 3m        │
│                                                                │
│  Inactive (4)                                                  │
│    🟢 CriticalCPU  🟢 HighMemory  🟢 DiskSpaceLow                 │
│    🟢 PrometheusDown                                            │
│                                                                │
└────────────────────────────────────────────────────────────────┘

7. Docker Monitoring (cAdvisor)

7.1 Deploy with Docker Compose

# Create a working directory
mkdir -p ~/cadvisor && cd ~/cadvisor

# Create docker-compose.yml (port 8080 on the host must be free)
vi docker-compose.yml

docker-compose.yml

version: '3.8'

services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.2
    container_name: cadvisor
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
    restart: unless-stopped
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

# Start (with Docker Compose v2 the command is docker compose up -d)
docker-compose up -d

# Verify
docker ps | grep cadvisor
curl http://localhost:8080/metrics | head -10
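`head -10` only shows that the endpoint answers. To confirm that container metrics are actually exported, check for a specific metric name such as container_cpu_usage_seconds_total. `has_metric` is a hypothetical bash helper, and it works the same way for Node Exporter on port 9100.

```shell
#!/usr/bin/env bash
# Check whether a /metrics endpoint exposes a metric name.
# $1: metrics URL  $2: metric name
# Returns 0 if present, 1 if absent, 2 if the endpoint could not be fetched.
has_metric() {
  local body
  body="$(curl -fsS --max-time 5 "$1")" || return 2
  # Sample lines start with the name followed by "{" (labels) or a space (no labels);
  # "# HELP" / "# TYPE" comment lines do not match because of the ^ anchor
  grep -Eq "^$2(\{| )" <<<"$body"
}
```

Example: `has_metric http://localhost:8080/metrics container_cpu_usage_seconds_total && echo "container metrics OK"`.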

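7.2 Add cAdvisor to the Prometheus Configuration

sudo vi /etc/prometheus/prometheus.yml

The sample configuration in section 3.3 already contains a cadvisor job. Add the following under scrape_configs only if yours does not: job names must be unique, and a duplicate job_name makes the configuration fail to load.

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['192.168.1.20:8080']

# Validate, then reload the configuration
promtool check config /etc/prometheus/prometheus.yml
curl -X POST http://localhost:9090/-/reload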

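8. Troubleshooting

❌ Problem 1: Prometheus fails to start

Symptom:

sudo systemctl status prometheus
# shows "Active: failed"

Diagnosis:

# 1. Read the detailed error log
journalctl -u prometheus -n 50 --no-pager

# 2. Run it in the foreground as the service user to see the error directly
sudo -u prometheus /usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/

# 3. Common errors and fixes

Error message	Cause	Fix
`permission denied`	The prometheus user cannot read or write a file or directory	`sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus`
`Error loading config`	Invalid configuration file (e.g. wrong YAML indentation)	Validate with `promtool check config`
`address already in use`	Port 9090 is already taken by another process	Find it with `sudo ss -tlnp | grep 9090` (or `lsof -i:9090`), then stop it or move one of the services to another port

Validate the configuration file: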

promtool check config /etc/prometheus/prometheus.yml

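❌ Problem 2: Grafana cannot connect to Prometheus

Diagnosis:

# 1. Confirm Prometheus is running (on the Prometheus host)
curl http://localhost:9090/-/healthy

# 2. Check connectivity from the Grafana host
curl http://<Prometheus-IP>:9090/-/ready

# 3. Check the Grafana data source configuration
#    If Grafana runs in a container or on another host, the URL must be http://<Prometheus-IP>:9090, not localhost

# 4. Check the firewall on the Prometheus host (if firewalld is enabled)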
sudo firewall-cmd --list-ports
sudo firewall-cmd --add-port=9090/tcp --permanent
sudo firewall-cmd --reload

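❌ Problem 3: Node Exporter metrics are missing

Diagnosis:

# 1. Check the service status
sudo systemctl status node_exporter

# 2. Query the metrics endpoint directly
curl http://localhost:9100/metrics

# 3. Check the target in Prometheus: Status → Targets (http://<Prometheus-IP>:9090/targets)
#    A DOWN target with a connection error usually means port 9100 is blocked or the address in prometheus.yml is wrong

# 4. Other common causes:
#    - Most collectors work as an unprivileged user, but a few optional collectors need extra privileges
#    - Some container environments restrict access to /proc and /sys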

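❌ Problem 4: Alerting rules do not take effect

Diagnosis:

# 1. Validate the rule file syntax
promtool check rules /etc/prometheus/rules/server_alerts.yml

# 2. Check that the rules are loaded in the Prometheus UI
#    Open http://localhost:9090/rules

# 3. Check whether the alert is still pending
#    An alert fires only after its condition has held for the full "for" duration

# 4. Check the Prometheus log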
journalctl -u prometheus -f

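❌ Problem 5: Grafana dashboard is blank (no data)

Diagnosis:

# 1. Confirm that Prometheus has data
#    Prometheus UI → Graph → run the query up

# 2. Check the time range
#    Make sure the time picker (top right) covers the period since data collection started,
#    e.g. Last 1 hour, and that the server and browser clocks are in sync

# 3. Check the data source used by the panels
#    Edit a panel and confirm its data source; imported dashboards usually have a
#    data source selector at the top of the dashboard

# 4. Check the Grafana log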
tail -f /var/log/grafana/grafana.log

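❌ Problem 6: Docker container metrics are missing

Diagnosis:

# 1. Confirm cAdvisor is running
docker ps | grep cadvisor

# 2. Check the cAdvisor logs
docker logs cadvisor

# 3. Check the volume mounts
#    cAdvisor needs read access to /var/lib/docker and /sys

# 4. On Kubernetes
#    cAdvisor is built into the kubelet, so no separate deployment is needed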

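❌ Problem 7: CPU/memory graphs fluctuate wildly

Cause: a short query window (or irate) follows every change between two scrapes, so spiky graphs are expected. Also note that node_cpu_seconds_total is a monotonically increasing counter; it only becomes meaningful inside rate() or irate().

Fix: widen the time window in the PromQL query.

# Noisy: per-scrape rate over a 1-minute window
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

# Smoother (recommended): average rate over a 5-minute window
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory is a gauge, so smooth it with avg_over_time instead
avg_over_time(node_memory_MemAvailable_bytes[5m])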

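9. Security Hardening

9.1 Add Authentication to Prometheus

# Install nginx as an authenticating reverse proxy (Ubuntu shown; on CentOS/Rocky install nginx and httpd-tools,
# and put the server block in /etc/nginx/conf.d/prometheus.conf instead)
sudo apt install -y nginx apache2-utils

# Create the password file
sudo htpasswd -c /etc/nginx/.htpasswd admin

# Configure nginx
sudo vi /etc/nginx/sites-available/prometheus

server {
    # Prometheus already listens on 9090, so nginx needs a different port.
    # Start Prometheus with --web.listen-address=127.0.0.1:9090 so it is reachable only through nginx
    # (Grafana on the same host can keep connecting directly to port 9090).
    listen 9091;
    server_name _;

    auth_basic "Prometheus Login";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:9090;
    }
}

# Enable the site and reload nginx
sudo ln -s /etc/nginx/sites-available/prometheus /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx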
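Since v2.24, Prometheus can also enforce basic auth itself, without nginx. You pass it a web configuration file with --web.config.file, and that file maps usernames to bcrypt password hashes. The sketch below prints such a file. `prom_web_config` is a hypothetical helper, and the hash is assumed to come from htpasswd (apache2-utils / httpd-tools).

```shell
#!/usr/bin/env bash
# Print a Prometheus web configuration file that enables native basic auth.
# $1: username  $2: bcrypt hash of the password
prom_web_config() {
  # Unquoted EOF expands $1/$2 once; the "$" characters inside the hash are not re-expanded
  cat <<EOF
basic_auth_users:
  $1: '$2'
EOF
}
```

Usage sketch:
1. Generate a hash with `HASH="$(htpasswd -nBC 10 "" | tr -d ':\n')"`, which prompts for the password.
2. Write the file with `prom_web_config admin "$HASH" | sudo tee /etc/prometheus/web.yml`.
3. Check it with `promtool check web-config /etc/prometheus/web.yml`.
4. Add `--web.config.file=/etc/prometheus/web.yml` to ExecStart in prometheus.service and restart.

Afterwards, enable Basic auth in the Grafana data source, and add `-u admin` to the curl reload commands.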

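9.2 Restrict Access to the Prometheus Port

# Allow port 9090 only from the internal network 192.168.1.0/24 (firewalld must be running; see section 2.2)
sudo firewall-cmd --permanent --remove-port=9090/tcp
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="9090" protocol="tcp" accept'
sudo firewall-cmd --reload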

9.3 HTTPS (via nginx)

server {
    listen 443 ssl;
    server_name prometheus.example.com;

    ssl_certificate /etc/ssl/certs/prometheus.crt;
    ssl_certificate_key /etc/ssl/private/prometheus.key;

    location / {
        proxy_pass http://localhost:9090;
    }
}

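10. Summary

Post-deployment checklist

✅ Prometheus service running (:9090)
✅ Node Exporter running (:9100)
✅ Grafana service running (:3000)
✅ Prometheus scraping Node Exporter (target UP)
✅ Grafana connected to the Prometheus data source
✅ Node Exporter Full dashboard imported (ID: 1860)
✅ Server alerting rules created
✅ Alert firing tested (simulated high CPU load)
✅ Monitoring data displays correctly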
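To test alert firing, saturate every core for long enough to cover both the 5-minute rate() window and the for: 5m of HighCPU. That takes roughly 10 minutes, so the sketch below defaults to 15. `burn_cpu` is a hypothetical helper that uses only coreutils (nproc, timeout). Run it on a test server, not in production. If stress-ng is installed, it is a more complete alternative.

```shell
#!/usr/bin/env bash
# Keep every CPU core busy for N seconds (default 900) so HighCPU / CriticalCPU
# from section 6.2 can be watched going pending -> firing.
burn_cpu() {
  local seconds="${1:-900}" cores i
  cores="$(nproc)"
  for ((i = 0; i < cores; i++)); do
    # One busy-loop worker per core; timeout kills it after $seconds
    timeout "$seconds" sh -c 'while :; do :; done' &
  done
  # wait without arguments blocks until all background jobs of this shell exit
  # and returns 0, even though timeout itself exits 124 after killing a worker
  wait
}
```

Example: `burn_cpu 900`, then watch the Alerts page in the Prometheus UI.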

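Next steps

Area	Suggestion
More metrics	Add exporters for your own workloads (e.g. Redis, Nginx)
Alert notifications	Deploy Alertmanager to deliver alerts by email / DingTalk / Slack
High availability	Run two Prometheus servers, plus Thanos for long-term storage
Log integration	Add Loki to build a log aggregation platform
Kubernetes monitoring	Deploy kube-prometheus-stack (it bundles kube-state-metrics)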

Appendix: Command Quick Reference

# Prometheus
sudo systemctl restart prometheus          # restart
curl -X POST http://localhost:9090/-/reload  # hot reload
promtool check config /etc/prometheus/prometheus.yml  # validate the configuration

# Grafana
sudo systemctl restart grafana-server      # restart
sudo systemctl status grafana-server       # check status

# Node Exporter
curl http://localhost:9100/metrics         # view metrics
sudo systemctl restart node_exporter       # restart

# Logs
journalctl -u prometheus -f                # follow Prometheus logs
journalctl -u grafana-server -f            # follow Grafana logs

Copyright belongs to the author. Do not republish without permission.
