
Tutorial 17 -- Deploying Monitoring Agents on VoIP Servers

Deploy node_exporter, promtail, heplify, and a custom Asterisk exporter to distributed VoIP servers from a central monitoring host via SSH.


Table of Contents

  1. Why Centralized Monitoring Matters for VoIP
  2. Architecture Overview
  3. Prerequisites
  4. Agent Reference
  5. The Install Script (install-agents.sh)
  6. Agent 1 -- node_exporter (System Metrics)
  7. Agent 2 -- promtail (Log Shipping)
  8. Agent 3 -- heplify (SIP Packet Capture)
  9. Agent 4 -- asterisk_exporter (Custom VoIP Metrics)
  10. Prometheus Scrape Configuration
  11. Verification Procedures
  12. Firewall Rules
  13. Handling OS Differences
  14. Updating Agents
  15. Troubleshooting
  16. Summary

1. Why Centralized Monitoring Matters for VoIP

Distributed VoIP infrastructure -- multiple Asterisk/ViciDial servers spread across data centers -- creates a visibility problem. Each server generates its own metrics, its own logs, and its own SIP traffic, but problems (call quality degradation, trunk failures, agent disconnections, disk filling up with recordings) almost never announce themselves on the server where you happen to be logged in.

Without centralized monitoring, troubleshooting looks like this:

  1. A client reports dropped calls.
  2. You SSH into server A, grep through Asterisk logs, find nothing.
  3. You SSH into server B, grep through different logs, find a clue.
  4. You SSH into server C, check SIP peer status, find the actual problem.
  5. Thirty minutes have passed. Calls were still dropping the entire time.

With centralized monitoring, troubleshooting looks like this:

  1. A client reports dropped calls.
  2. You open Grafana, filter by time range, see the SIP peer went UNREACHABLE at 14:32 on server C.
  3. You open Homer, search SIP traffic for that trunk, see the last successful registration and the failure response.
  4. You open Loki, query {server="charlie"} |= "chan_sip.c", see the exact Asterisk error.
  5. Five minutes total, including fixing the issue.

The four monitoring agents in this tutorial form a complete observability stack:

Layer        Agent              What It Captures
System       node_exporter      CPU, RAM, disk, network, load averages
Logs         promtail           Asterisk logs, ViciDial logs, syslog
SIP          heplify            Every SIP transaction (INVITE, BYE, REGISTER, etc.)
Application  asterisk_exporter  Active calls, agent states, SIP peer health, RTP quality, queue depth

Together, they answer any question you could ask about what happened on any server at any point in time.


2. Architecture Overview

+------------------+     +------------------+     +------------------+
|  VoIP Server A   |     |  VoIP Server B   |     |  VoIP Server C   |
|  (openSUSE)      |     |  (CentOS 7)      |     |  (Ubuntu/Debian) |
|                  |     |                  |     |                  |
| node_exporter    |     | node_exporter    |     | node_exporter    |
|   :9100 ----+    |     |   :9100 ----+    |     |   :9100 ----+    |
|             |    |     |             |    |     |             |    |
| ast_exporter|    |     | ast_exporter|    |     | ast_exporter|    |
|   :9101 -+  |    |     |   :9101 -+  |    |     |   :9101 -+  |    |
|          |  |    |     |          |  |    |     |          |  |    |
| promtail |  |    |     | promtail |  |    |     | promtail |  |    |
|   :9080  |  |    |     |   :9080  |  |    |     |   :9080  |  |    |
|     |    |  |    |     |     |    |  |    |     |     |    |  |    |
| heplify  |  |    |     | heplify  |  |    |     | heplify  |  |    |
|     |    |  |    |     |     |    |  |    |     |     |    |  |    |
+-----|----+--+----+     +-----|----+--+----+     +-----|----+--+----+
      |    |  |                |    |  |                |    |  |
      |    |  | Prometheus     |    |  | Prometheus     |    |  |
      |    |  | scrape :9100   |    |  | scrape :9100   |    |  |
      |    |  | scrape :9101   |    |  | scrape :9101   |    |  |
      |    |  |                |    |  |                |    |  |
      |    +--|--------+-------|----+--|--------+-------|----+--+
      |       |        |       |       |        |       |
      |       v        v       |       v        v       |
      |  +--------------------+|  Prometheus    |       |
      |  | Prometheus (:9090) ||  scrapes       |       |
      |  +--------------------+|                |       |
      |                        |                |       |
      +-----------+------------+----------------+       |
                  |   push logs to :3100                |
                  v                                     |
         +--------------------+                         |
         |   Loki (:3100)     |                         |
         +--------------------+                         |
                                                        |
              +-----------------------------------------+
              |   push HEP/UDP to :9060
              v
         +-----------------------------+
         | heplify-server (:9060)      |
         |    |                        |
         |    v                        |
         | PostgreSQL (homer_data)     |
         |    |                        |
         |    v                        |
         | Homer WebApp (:9080)        |
         +-----------------------------+
                  |
                  v
         +--------------------+
         |  Grafana (:3000)   |  <-- unified dashboards
         +--------------------+

Data flow summary:

  1. Prometheus (pull): the monitoring server scrapes :9100 (node_exporter) and :9101 (asterisk_exporter) on every VoIP server.
  2. promtail (push): each VoIP server ships logs to Loki on the monitoring server at :3100.
  3. heplify (push): each VoIP server sends SIP traffic as HEP to heplify-server at :9060, which stores it in PostgreSQL for Homer.
  4. Grafana reads Prometheus, Loki, and the Homer database to present unified dashboards.


3. Prerequisites

On the monitoring server (central)

  1. Prometheus, Loki, Grafana, and heplify-server already installed and running (covered in earlier tutorials in this series)
  2. SSH key-based root access to every target VoIP server
  3. install-agents.sh and asterisk_exporter.py together in the same scripts directory

On each VoIP server (target)

  1. Asterisk/ViciDial installed and running
  2. systemd and curl available, plus outbound HTTPS access to github.com for binary downloads
  3. Root SSH login permitted on the chosen port

Network requirements

Source             Destination        Port  Protocol  Purpose
Monitoring server  VoIP servers       9100  TCP       Prometheus scrapes node_exporter
Monitoring server  VoIP servers       9101  TCP       Prometheus scrapes asterisk_exporter
VoIP servers       Monitoring server  3100  TCP       promtail pushes logs to Loki
VoIP servers       Monitoring server  9060  UDP/TCP   heplify sends SIP to heplify-server

4. Agent Reference

Agent              Version  Port  Direction                  Binary Path                                  Config Path
node_exporter      1.7.0    9100  Pull (Prometheus scrapes)  /usr/local/bin/node_exporter                 N/A (CLI flags)
promtail           2.9.6    9080  Push (to Loki)             /usr/local/bin/promtail                      /etc/promtail/config.yml
heplify            1.67.1   N/A   Push (to heplify-server)   /usr/local/bin/heplify                       N/A (CLI flags)
asterisk_exporter  custom   9101  Pull (Prometheus scrapes)  /opt/asterisk_exporter/asterisk_exporter.py  N/A (env vars)

5. The Install Script

This script is designed to run from the monitoring server, deploying all four agents to a remote VoIP server over SSH in a single pass. It handles three OS families (openSUSE/SUSE, CentOS/RHEL, Ubuntu/Debian) and is idempotent (safe to re-run): it checks whether each binary is already present before downloading it.

Usage

./install-agents.sh <server_ip> <ssh_port> <server_label> <monitor_vps_ip>

Parameters:

Parameter       Description                                                              Example
server_ip       IP address of the target VoIP server                                     203.0.113.10
ssh_port        SSH port on the target server                                            22 or 9322
server_label    Human-readable name used in metric labels                                alpha, bravo, charlie
monitor_vps_ip  IP of the central monitoring server (where Loki and heplify-server run)  YOUR_MONITORING_SERVER

Example invocation:

bash install-agents.sh 203.0.113.10 9322 alpha YOUR_MONITORING_SERVER
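For a fleet, the same invocation can be wrapped in a loop. A minimal sketch, assuming a space-separated inventory of "ip ssh_port label" triples (the IPs and labels below are illustrative placeholders); the echo makes it a dry run -- replace echo with bash to actually deploy:

```shell
#!/bin/bash
# Dry-run fleet deployment: print one install command per server.
MONITOR_IP="203.0.113.1"

# Hypothetical inventory: "ip ssh_port label", one server per line.
SERVERS="203.0.113.10 9322 alpha
203.0.113.11 22 bravo
203.0.113.12 22 charlie"

while read -r ip port label; do
    echo "install-agents.sh $ip $port $label $MONITOR_IP"
done <<< "$SERVERS"
```

Because the install script is idempotent, re-running the loop against the whole fleet is safe.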

Complete Script

#!/bin/bash
# install-agents.sh -- Install monitoring agents on a ViciDial/Asterisk server
# Usage: ./install-agents.sh <server_ip> <ssh_port> <server_label> <monitor_vps_ip>
# Supports: openSUSE, CentOS 7, Ubuntu/Debian

set -e

SERVER_IP="${1:?Usage: $0 <server_ip> <ssh_port> <server_label> <monitor_vps_ip>}"
SSH_PORT="${2:-22}"
SERVER_LABEL="${3:?Provide server label (alpha/bravo/charlie/delta)}"
MONITOR_IP="${4:?Provide monitoring VPS IP}"

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

echo "=== Installing monitoring agents on ${SERVER_LABEL} (${SERVER_IP}:${SSH_PORT}) ==="
echo "Monitor VPS: ${MONITOR_IP}"
echo ""

SSH_CMD="ssh -o StrictHostKeyChecking=no -p ${SSH_PORT} root@${SERVER_IP}"

# --------------------------------------------------------------------------
# 1. heplify (SIP capture)
# --------------------------------------------------------------------------
echo "[1/4] Installing heplify..."
${SSH_CMD} bash << REMOTEOF
set -e

if [ ! -f /usr/local/bin/heplify ] || ! /usr/local/bin/heplify -version 2>/dev/null | grep -q heplify; then
    rm -f /usr/local/bin/heplify
    curl -sL https://github.com/sipcapture/heplify/releases/download/v1.67.1/heplify \
        -o /usr/local/bin/heplify
    chmod +x /usr/local/bin/heplify
    echo "  heplify binary installed"
else
    echo "  heplify already installed"
fi

cat > /etc/systemd/system/heplify.service << SVCFILE
[Unit]
Description=heplify SIP Capture Agent
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/heplify -hs ${MONITOR_IP}:9060 -i any -dim "OPTIONS,NOTIFY" -e
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
SVCFILE

systemctl daemon-reload
systemctl enable heplify
systemctl restart heplify
echo "  heplify service started"
REMOTEOF

# --------------------------------------------------------------------------
# 2. node_exporter (system metrics)
# --------------------------------------------------------------------------
echo "[2/4] Installing node_exporter..."
${SSH_CMD} bash << 'REMOTEOF'
set -e

if [ ! -f /usr/local/bin/node_exporter ]; then
    cd /tmp
    curl -sL https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz \
        | tar xz
    cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
    rm -rf node_exporter-1.7.0.linux-amd64*
    echo "  node_exporter binary installed"
else
    echo "  node_exporter already installed"
fi

cat > /etc/systemd/system/node_exporter.service << 'SVCFILE'
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
SVCFILE

systemctl daemon-reload
systemctl enable node_exporter
systemctl restart node_exporter
echo "  node_exporter service started"
REMOTEOF

# --------------------------------------------------------------------------
# 3. promtail (log shipping)
# --------------------------------------------------------------------------
echo "[3/4] Installing promtail..."
${SSH_CMD} bash << REMOTEOF
set -e

if [ ! -f /usr/local/bin/promtail ]; then
    cd /tmp
    curl -sL https://github.com/grafana/loki/releases/download/v2.9.6/promtail-linux-amd64.zip \
        -o promtail.zip

    # Install unzip -- works across all OS families
    if command -v apt-get &>/dev/null; then
        apt-get install -y unzip 2>/dev/null || true
    elif command -v zypper &>/dev/null; then
        zypper install -y unzip 2>/dev/null || true
    elif command -v yum &>/dev/null; then
        yum install -y unzip 2>/dev/null || true
    fi

    unzip -o promtail.zip
    mv promtail-linux-amd64 /usr/local/bin/promtail
    chmod +x /usr/local/bin/promtail
    rm -f promtail.zip
    echo "  promtail binary installed"
else
    echo "  promtail already installed"
fi

mkdir -p /etc/promtail

cat > /etc/promtail/config.yml << CFGFILE
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://${MONITOR_IP}:3100/loki/api/v1/push

scrape_configs:
  - job_name: asterisk_messages
    static_configs:
      - targets: [localhost]
        labels:
          job: asterisk
          server: ${SERVER_LABEL}
          logtype: messages
          __path__: /var/log/asterisk/messages

  - job_name: asterisk_full
    static_configs:
      - targets: [localhost]
        labels:
          job: asterisk
          server: ${SERVER_LABEL}
          logtype: full
          __path__: /var/log/asterisk/full

  - job_name: vicidial
    static_configs:
      - targets: [localhost]
        labels:
          job: vicidial
          server: ${SERVER_LABEL}
          logtype: vicidial
          __path__: /var/log/astguiclient/*.log

  - job_name: syslog
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          server: ${SERVER_LABEL}
          logtype: syslog
          __path__: /var/log/messages
CFGFILE

mkdir -p /var/lib/promtail

cat > /etc/systemd/system/promtail.service << 'SVCFILE'
[Unit]
Description=Promtail Log Agent
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/config.yml
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
SVCFILE

systemctl daemon-reload
systemctl enable promtail
systemctl restart promtail
echo "  promtail service started"
REMOTEOF

# --------------------------------------------------------------------------
# 4. asterisk_exporter (VoIP metrics)
# --------------------------------------------------------------------------
echo "[4/4] Installing asterisk_exporter..."
${SSH_CMD} bash << REMOTEOF
set -e
mkdir -p /opt/asterisk_exporter
REMOTEOF

# Copy the exporter script from the local scripts directory
scp -o StrictHostKeyChecking=no -P ${SSH_PORT} \
    ${SCRIPT_DIR}/asterisk_exporter.py \
    root@${SERVER_IP}:/opt/asterisk_exporter/asterisk_exporter.py

# Install Python dependencies and create systemd service
${SSH_CMD} bash << REMOTEOF
set -e

# Find the right python3 binary -- try versioned names first
PYTHON_BIN=""
for p in python3.11 python3.6 python3; do
    if command -v \$p &>/dev/null; then
        PYTHON_BIN=\$(command -v \$p)
        break
    fi
done

if [ -z "\$PYTHON_BIN" ]; then
    # CentOS 7: install python3 via yum
    if command -v yum &>/dev/null; then
        yum install -y python3 python3-pip 2>/dev/null || true
        PYTHON_BIN=\$(command -v python3)
    fi
fi

echo "  Using Python: \$PYTHON_BIN"

# Install mysql-connector (try latest first, fall back to <8.1 for old Python)
\$PYTHON_BIN -m pip install mysql-connector-python 2>/dev/null \
    || \$PYTHON_BIN -m pip install "mysql-connector-python<8.1" 2>/dev/null \
    || true

# Verify the import works
\$PYTHON_BIN -c "import mysql.connector; print('  mysql-connector OK')" \
    || echo "  WARNING: mysql-connector import failed"

chmod +x /opt/asterisk_exporter/asterisk_exporter.py

cat > /etc/systemd/system/asterisk_exporter.service << SVCFILE
[Unit]
Description=Asterisk/ViciDial Prometheus Exporter
After=network.target mariadb.service asterisk.service
Wants=mariadb.service

[Service]
Type=simple
ExecStart=\$PYTHON_BIN /opt/asterisk_exporter/asterisk_exporter.py
Restart=always
RestartSec=10
Environment=EXPORTER_PORT=9101
Environment=MYSQL_HOST=localhost
Environment=MYSQL_USER=cron
Environment=MYSQL_PASS=1234
Environment=MYSQL_DB=asterisk
Environment=SERVER_LABEL=${SERVER_LABEL}

[Install]
WantedBy=multi-user.target
SVCFILE

systemctl daemon-reload
systemctl enable asterisk_exporter
systemctl restart asterisk_exporter
echo "  asterisk_exporter service started"
REMOTEOF

echo ""
echo "=== All 4 agents installed on ${SERVER_LABEL} (${SERVER_IP}) ==="
echo "  heplify       -> sending HEP to ${MONITOR_IP}:9060"
echo "  node_exporter -> :9100"
echo "  promtail      -> shipping logs to ${MONITOR_IP}:3100"
echo "  ast_exporter  -> :9101"
echo ""

How the script works

  1. SSH-based deployment: The script runs entirely from the monitoring server. Each section opens an SSH session, streams a heredoc of shell commands to execute on the remote host, then closes the connection.

  2. Idempotent: Before downloading any binary, it checks whether the file already exists at the expected path. Re-running the script on a server that already has agents installed will simply restart the services with the latest configuration.

  3. OS-agnostic package installation: When unzip is needed for promtail, the script detects the package manager (apt-get, zypper, or yum) and installs accordingly.

  4. Python version detection: For the asterisk_exporter, the script tries python3.11, python3.6, and python3 in order, covering openSUSE (which ships 3.11), CentOS 7 (which uses 3.6), and Ubuntu/Debian (which use python3).

  5. scp for the exporter: The asterisk_exporter is a custom Python script, so it gets copied from the monitoring server's scripts/ directory via scp rather than downloaded from a release URL.
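The package-manager detection from step 3 can be factored into a reusable helper. A sketch (the function name is ours, not part of the script):

```shell
#!/bin/bash
# Echo the install-command prefix for whichever package manager the host has,
# in the same priority order the install script uses.
pkg_install_cmd() {
    if command -v apt-get &>/dev/null; then
        echo "apt-get install -y"
    elif command -v zypper &>/dev/null; then
        echo "zypper install -y"
    elif command -v yum &>/dev/null; then
        echo "yum install -y"
    else
        echo "unknown"
    fi
}

# Usage: $(pkg_install_cmd) unzip
pkg_install_cmd
```

Probing for the package manager rather than parsing /etc/os-release keeps the script working on derivatives (e.g. Rocky, Mint) without an explicit allow-list.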


6. Agent 1 -- node_exporter

What it does

node_exporter is the standard Prometheus exporter for hardware and OS-level metrics. It exposes approximately 277 metrics covering CPU, memory, disk, network, filesystem, and load average data.

Binary installation

cd /tmp
curl -sL https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz \
    | tar xz
cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
rm -rf node_exporter-1.7.0.linux-amd64*

The tarball contains a single static binary (~20 MB). No dependencies, no runtime, no configuration file. It runs on any Linux x86_64 system regardless of distribution.

systemd service file

Path: /etc/systemd/system/node_exporter.service

[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Key settings:

  1. --web.listen-address=:9100 binds the metrics endpoint on port 9100, on all interfaces.
  2. Restart=always with RestartSec=10 brings the exporter back ten seconds after any crash.
  3. After=network.target delays startup until basic networking is up.

Optional: Enabling/disabling specific collectors

By default, node_exporter enables a broad set of collectors. For VoIP servers with heavy I/O, you may want to add flags to include or exclude specific collectors:

# Disable the default set, then enable only the collectors you care about:
ExecStart=/usr/local/bin/node_exporter \
    --web.listen-address=:9100 \
    --collector.disable-defaults \
    --collector.cpu \
    --collector.meminfo \
    --collector.diskstats \
    --collector.filesystem \
    --collector.loadavg \
    --collector.netdev \
    --collector.stat \
    --collector.time \
    --collector.uname

# Or keep the defaults and switch off the ones a VoIP server never needs:
ExecStart=/usr/local/bin/node_exporter \
    --web.listen-address=:9100 \
    --no-collector.wifi \
    --no-collector.infiniband \
    --no-collector.nfs \
    --no-collector.nfsd

For most VoIP deployments, the default set is fine.

Key metrics for VoIP servers

Metric                            What to watch
node_cpu_seconds_total            High system or iowait indicates Asterisk is under load
node_memory_MemAvailable_bytes    Asterisk leaks memory slowly; watch for steady decline
node_filesystem_avail_bytes       Recordings fill disks; alert at 80%
node_load1 / node_load5           Should stay below CPU count during peak hours
node_network_receive_bytes_total  Baseline for detecting DDoS or SIP floods
node_disk_io_time_seconds_total   High I/O wait degrades call recording quality
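As a local stopgap before Prometheus alert rules are wired up, the 80% disk threshold above can be checked with df. A small sketch (the mount point and threshold are examples); this is the same condition you would later express as an alert on node_filesystem_avail_bytes:

```shell
#!/bin/bash
# Warn when a filesystem's used percentage reaches the given limit.
check_disk() {
    df -P "$1" | awk -v limit="$2" 'NR==2 {
        gsub("%", "", $5)                       # strip the % from the Use% column
        if ($5 + 0 >= limit) print "WARN " $6 " at " $5 "%"
        else                 print "OK " $6 " at " $5 "%"
    }'
}

check_disk / 80
```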

Enabling and starting

systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter

Quick test

curl -s http://localhost:9100/metrics | head -20

You should see lines like:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1.234567e+06
node_cpu_seconds_total{cpu="0",mode="system"} 12345.67
...

7. Agent 2 -- promtail

What it does

promtail is the log-shipping agent for Grafana Loki. It tails log files on the VoIP server, attaches labels (server name, log type), and pushes the entries to a central Loki instance. This enables centralized log search across all servers from Grafana.

Binary installation

cd /tmp
curl -sL https://github.com/grafana/loki/releases/download/v2.9.6/promtail-linux-amd64.zip \
    -o promtail.zip
unzip -o promtail.zip
mv promtail-linux-amd64 /usr/local/bin/promtail
chmod +x /usr/local/bin/promtail
rm -f promtail.zip

Unlike node_exporter, promtail is distributed as a zip file containing a single binary.

Configuration file

Path: /etc/promtail/config.yml

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://YOUR_MONITORING_SERVER:3100/loki/api/v1/push

scrape_configs:
  - job_name: asterisk_messages
    static_configs:
      - targets: [localhost]
        labels:
          job: asterisk
          server: YOUR_SERVER_LABEL
          logtype: messages
          __path__: /var/log/asterisk/messages

  - job_name: asterisk_full
    static_configs:
      - targets: [localhost]
        labels:
          job: asterisk
          server: YOUR_SERVER_LABEL
          logtype: full
          __path__: /var/log/asterisk/full

  - job_name: vicidial
    static_configs:
      - targets: [localhost]
        labels:
          job: vicidial
          server: YOUR_SERVER_LABEL
          logtype: vicidial
          __path__: /var/log/astguiclient/*.log

  - job_name: syslog
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          server: YOUR_SERVER_LABEL
          logtype: syslog
          __path__: /var/log/messages

Configuration breakdown

server block: promtail's own HTTP listener on port 9080, which serves the /targets and /ready endpoints used later for verification; grpc_listen_port: 0 disables the unused gRPC listener.

positions block: the file where promtail records how far into each log it has read, so a restart resumes where it left off instead of re-reading everything.

clients block: the Loki push endpoint on the central monitoring server; every tailed log line is sent there.

scrape_configs block -- the four log sources:

Job Name           Path                         What It Captures
asterisk_messages  /var/log/asterisk/messages   Asterisk NOTICE/WARNING/ERROR messages (SIP registration failures, channel errors, peer unreachable)
asterisk_full      /var/log/asterisk/full       Full verbose Asterisk log (every dialplan step, every SIP message -- high volume)
vicidial           /var/log/astguiclient/*.log  ViciDial application logs (agent login/logout, call routing, list loading, campaign actions)
syslog             /var/log/messages            System syslog (kernel, cron, auth, services)

Labels explained:

  1. job groups logs by application (asterisk, vicidial, syslog) across all servers.
  2. server identifies the host a line came from, enabling per-server queries like {server="alpha"}.
  3. logtype distinguishes the individual files within a job (messages vs full).
  4. __path__ is special: it is not shipped to Loki as a label but tells promtail which file (or glob) to tail.

Important: The positions file

The positions file (/var/lib/promtail/positions.yaml) is critical. It looks like this after promtail has been running:

positions:
  /var/log/asterisk/messages: "4521789"
  /var/log/asterisk/full: "98234567"
  /var/log/astguiclient/VDadapt.log: "12345"
  /var/log/messages: "67890123"

Each number is the byte offset where promtail last read. If this file is deleted, promtail will re-read and re-send all log data from the beginning of each file. On a busy server, this can mean pushing gigabytes of old logs to Loki. If you need to reset positions, do so deliberately and during a low-traffic period.
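If a reset is unavoidable, one option is to seed a fresh positions file pointing at the current end of each log, so promtail resumes from "now" rather than re-shipping history. A sketch (run it while promtail is stopped, redirecting the output to /var/lib/promtail/positions.yaml; the helper name is ours, and the demo at the bottom uses a throwaway file instead of the real paths):

```shell
#!/bin/bash
# Emit positions.yaml entries at the current byte size (EOF) of each file.
emit_positions() {
    echo "positions:"
    for f in "$@"; do
        [ -f "$f" ] && echo "  $f: \"$(stat -c %s "$f")\""
    done
}

# Demo against a throwaway 6-byte file rather than the real Asterisk logs:
printf 'hello\n' > /tmp/demo.log
emit_positions /tmp/demo.log
```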

systemd service file

Path: /etc/systemd/system/promtail.service

[Unit]
Description=Promtail Log Agent
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/config.yml
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Querying logs in Grafana (Loki)

Once promtail is shipping logs, query them in Grafana using LogQL:

# All Asterisk errors on server alpha
{server="alpha", job="asterisk"} |= "ERROR"

# SIP registration failures across all servers
{job="asterisk", logtype="messages"} |= "Registration" |= "failed"

# ViciDial agent login events on a specific server
{server="charlie", job="vicidial"} |= "LOGIN"

# High-volume: all full Asterisk logs (use with time range filter)
{server="delta", logtype="full"}

Note on syslog path

On Debian/Ubuntu, the system log is /var/log/syslog rather than /var/log/messages. If deploying to Debian/Ubuntu, update the syslog scrape config:

  - job_name: syslog
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          server: YOUR_SERVER_LABEL
          logtype: syslog
          __path__: /var/log/syslog

The install script as written uses /var/log/messages, which works on openSUSE, CentOS, and RHEL. On Debian/Ubuntu, if /var/log/messages does not exist, promtail will log a warning but continue running -- it simply will not ship syslog data. Adjust the path if needed.
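The right path can also be chosen at install time instead of hardcoded. A small helper sketch (the function name is ours), which falls back to the last candidate if none exist:

```shell
#!/bin/bash
# Return the first existing path from the arguments, else the last one.
pick_syslog_path() {
    for p in "$@"; do
        [ -f "$p" ] && { echo "$p"; return; }
    done
    echo "${@: -1}"
}

SYSLOG_PATH=$(pick_syslog_path /var/log/syslog /var/log/messages)
echo "Using: $SYSLOG_PATH"
```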


8. Agent 3 -- heplify

What it does

heplify is a SIP packet capture agent. It sniffs network traffic, extracts SIP messages (INVITE, ACK, BYE, REGISTER, etc.), and sends them to a Homer server using the HEP3 (Homer Encapsulation Protocol) format over UDP.

This gives you full SIP call flow visualization: you can search for any call by phone number, SIP Call-ID, or time range and see every SIP message in the transaction, including response codes, SDP negotiation, and timing.

Binary installation

curl -sL https://github.com/sipcapture/heplify/releases/download/v1.67.1/heplify \
    -o /usr/local/bin/heplify
chmod +x /usr/local/bin/heplify

Single static binary, no dependencies.

systemd service file

Path: /etc/systemd/system/heplify.service

[Unit]
Description=heplify SIP Capture Agent
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/heplify -hs YOUR_MONITORING_SERVER:9060 -i any -dim "OPTIONS,NOTIFY" -e
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Command-line flags explained

Flag  Value                        Purpose
-hs   YOUR_MONITORING_SERVER:9060  Homer server address and port (HEP receiver)
-i    any                          Capture on all network interfaces
-dim  "OPTIONS,NOTIFY"             Discard In Method -- drop these SIP methods before sending. OPTIONS keepalives and NOTIFY events are extremely high-volume and rarely useful for troubleshooting
-e    (flag)                       Log to stderr instead of syslog, so capture activity is visible via journalctl

Why filter OPTIONS and NOTIFY

On a typical VoIP server with 30 SIP peers, each sending OPTIONS keepalives every 60 seconds, that is 30 messages/minute -- 43,200 per day -- of pure noise. NOTIFY messages (for MWI, dialog-info, etc.) add another significant volume. Filtering these at the capture level reduces:

  1. HEP traffic from every VoIP server to the monitoring host
  2. Storage growth in Homer's PostgreSQL database
  3. Noise when searching call flows in the Homer UI

The important SIP methods (INVITE, BYE, REGISTER, ACK, CANCEL, REFER, UPDATE) are all captured and forwarded.

Additional useful flags

# Capture only on a specific interface (useful if the server has multiple NICs)
-i eth0

# Narrow the SIP port range heplify captures (the default is 5060-5090)
-pr "5060-5062"

# Also capture RTCP (RTP Control Protocol) for quality metrics
-m SIPRTCP

# Set a custom numeric HEP node ID (useful when multiple servers send to the same Homer)
-hi 100

# Use TLS for the HEP transport (if your Homer server requires it)
-hs YOUR_MONITORING_SERVER:9060 -nt tls

Verifying heplify is capturing

# Check the service is running
systemctl status heplify

# Check recent logs for capture stats
journalctl -u heplify --no-pager -n 20

You should see output like:

heplify: sending to YOUR_MONITORING_SERVER:9060
heplify: captured 47 packets, sent 47

On the Homer server side, verify data is arriving:

# Check heplify-server is receiving packets
docker logs --tail 20 heplify-server

9. Agent 4 -- asterisk_exporter

What it does

The asterisk_exporter is a custom Python script that queries Asterisk CLI commands and ViciDial's MySQL database to expose VoIP-specific metrics as a Prometheus endpoint on port 9101. This fills the gap between generic system metrics (node_exporter) and what you actually need to monitor in a call center: active calls, agent states, SIP peer health, RTP quality, queue depth, and more.

A full walkthrough of the exporter's internals is available in Tutorial 08 -- Building a Custom Asterisk Prometheus Exporter. This section covers only what you need for deployment.

File location

/opt/asterisk_exporter/
    asterisk_exporter.py     # The exporter script (copied from monitoring server)

Dependencies

  1. Python 3 -- the install script probes for python3.11, python3.6, then python3
  2. mysql-connector-python, installed via pip by the script
  3. The asterisk CLI (asterisk -rx must work as root) and a MySQL user with SELECT on the ViciDial tables

systemd service file

Path: /etc/systemd/system/asterisk_exporter.service

[Unit]
Description=Asterisk/ViciDial Prometheus Exporter
After=network.target mariadb.service asterisk.service
Wants=mariadb.service

[Service]
Type=simple
ExecStart=/usr/bin/python3 /opt/asterisk_exporter/asterisk_exporter.py
Restart=always
RestartSec=10
Environment=EXPORTER_PORT=9101
Environment=MYSQL_HOST=localhost
Environment=MYSQL_USER=cron
Environment=MYSQL_PASS=1234
Environment=MYSQL_DB=asterisk
Environment=SERVER_LABEL=YOUR_SERVER_LABEL

[Install]
WantedBy=multi-user.target

Environment variables

Variable       Default    Purpose
EXPORTER_PORT  9101       Port to listen on
MYSQL_HOST     localhost  MySQL server address
MYSQL_USER     cron       MySQL user (needs SELECT on vicidial tables)
MYSQL_PASS     1234       MySQL password
MYSQL_DB       asterisk   Database name
SERVER_LABEL   alpha      Label attached to all metrics

Important: The MySQL user only needs SELECT privileges. Use an existing read-only user or create one:

CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'YOUR_PASSWORD';
GRANT SELECT ON asterisk.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;

Exposed metrics

The exporter exposes approximately 25 metric families:

Metric                                  Type     Description
asterisk_active_calls                   gauge    Current active calls
asterisk_active_channels                gauge    Current active channels
asterisk_sip_peer_up                    gauge    SIP peer reachability (1=up, 0=down)
asterisk_sip_peer_latency_ms            gauge    SIP peer qualify latency in milliseconds
asterisk_sip_peer_status                gauge    SIP peer status (OK, Lagged, UNREACHABLE)
asterisk_agents_logged_in               gauge    Number of agents logged in
asterisk_agents_incall                  gauge    Number of agents currently in a call
asterisk_agents_paused                  gauge    Number of agents in pause state
asterisk_agents_waiting                 gauge    Number of agents ready/waiting
asterisk_agent_status                   gauge    Per-agent status (INCALL, PAUSED, READY, CLOSER)
asterisk_agent_incall_duration_seconds  gauge    How long the current agent has been in-call
asterisk_agent_pause_duration_seconds   gauge    How long the current agent has been paused
asterisk_queue_depth                    gauge    Calls waiting in queue per inbound group
asterisk_rtp_packet_loss_percent        gauge    RTP packet loss percentage per channel
asterisk_rtp_jitter_ms                  gauge    RTP jitter in milliseconds per channel
asterisk_rtp_rtt_ms                     gauge    RTP round-trip time per channel
asterisk_uptime_seconds                 gauge    Asterisk system uptime
asterisk_confbridge_count               gauge    Active ConfBridge/MeetMe conferences
asterisk_channels_by_codec              gauge    Channel count per codec (alaw, ulaw, g722, etc.)
asterisk_transcoding_channels           gauge    Channels actively transcoding between codecs
asterisk_fail2ban_active_bans           gauge    Current fail2ban active bans per jail
asterisk_fail2ban_bans_total            counter  Total fail2ban bans per jail
asterisk_recordings_missing             gauge    CDR entries from the last hour without matching recordings

How it collects data

The exporter uses two data sources:

1. Asterisk CLI (via asterisk -rx):

asterisk -rx "sip show peers"          # SIP peer status and latency
asterisk -rx "core show channels"      # Active call/channel counts
asterisk -rx "sip show channelstats"   # RTP quality (loss, jitter, RTT)
asterisk -rx "core show uptime seconds"  # Asterisk uptime
asterisk -rx "confbridge list"         # Active conferences
asterisk -rx "core show channel SIP/..." # Per-channel codec/transcoding info

2. ViciDial MySQL (via mysql-connector-python):

-- Agent states
SELECT status, COUNT(*) FROM vicidial_live_agents GROUP BY status;

-- Per-agent detail
SELECT user, status, pause_code,
       TIMESTAMPDIFF(SECOND, last_state_change, NOW()) as state_duration
FROM vicidial_live_agents;

-- Queue depth
SELECT campaign_id, COUNT(*) FROM vicidial_auto_calls
WHERE status = 'LIVE' GROUP BY campaign_id;

-- Missing recordings
SELECT COUNT(*) FROM vicidial_closer_log cl
LEFT JOIN recording_log rl ON ...
WHERE cl.call_date >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
AND cl.length_in_sec > 10 AND rl.recording_id IS NULL;

Quick test

curl -s http://localhost:9101/metrics | grep asterisk_active

Expected output:

# HELP asterisk_active_calls Number of active calls
# TYPE asterisk_active_calls gauge
asterisk_active_calls{server="alpha"} 12
# HELP asterisk_active_channels Number of active channels
# TYPE asterisk_active_channels gauge
asterisk_active_channels{server="alpha"} 24

10. Prometheus Scrape Configuration

On the monitoring server, add scrape targets for each VoIP server in your Prometheus configuration.

Path: prometheus/prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # --- Prometheus self-monitoring ---
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # --- Node Exporter (system metrics per server) ---
  - job_name: "node"
    static_configs:
      - targets: ["SERVER_A_IP:9100"]
        labels:
          server: "alpha"
      - targets: ["SERVER_B_IP:9100"]
        labels:
          server: "bravo"
      - targets: ["SERVER_C_IP:9100"]
        labels:
          server: "charlie"
      - targets: ["SERVER_D_IP:9100"]
        labels:
          server: "delta"

  # --- Asterisk Exporter (VoIP metrics per server) ---
  - job_name: "asterisk"
    scrape_interval: 15s
    static_configs:
      - targets: ["SERVER_A_IP:9101"]
        labels:
          server: "alpha"
      - targets: ["SERVER_B_IP:9101"]
        labels:
          server: "bravo"
      - targets: ["SERVER_C_IP:9101"]
        labels:
          server: "charlie"
      - targets: ["SERVER_D_IP:9101"]
        labels:
          server: "delta"

  # --- heplify-server metrics (on the monitoring server itself) ---
  - job_name: "heplify-server"
    static_configs:
      - targets: ["heplify-server:9096"]
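
The per-server `static_configs` blocks are repetitive and easy to get wrong by hand. A sketch of generating them from a single server list (the IPs and labels are the tutorial's placeholders; `emit_targets` is a made-up helper name):

```shell
#!/usr/bin/env bash
# Sketch: generate the repetitive static_configs entries for both
# exporter jobs from one server list.
SERVERS=("SERVER_A_IP:alpha" "SERVER_B_IP:bravo" "SERVER_C_IP:charlie" "SERVER_D_IP:delta")

emit_targets() {   # emit_targets <port>
    local port="$1" entry ip label
    for entry in "${SERVERS[@]}"; do
        ip="${entry%%:*}"       # text before the first colon
        label="${entry##*:}"    # text after the last colon
        printf '      - targets: ["%s:%s"]\n        labels:\n          server: "%s"\n' \
            "$ip" "$port" "$label"
    done
}

printf '  - job_name: "node"\n    static_configs:\n'
emit_targets 9100
printf '  - job_name: "asterisk"\n    static_configs:\n'
emit_targets 9101
```

Redirect the output into the `scrape_configs:` section of prometheus.yml (or paste it in) whenever the server list changes.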

Key points

  - Every target carries a server label ("alpha", "bravo", ...). promtail attaches the same label to logs, so a single label value selects both the metrics and the logs for one server.
  - The asterisk job restates scrape_interval: 15s explicitly (matching the global default) so it can be tuned independently later.
  - heplify-server runs on the monitoring host itself, so its target uses the Docker service name rather than a remote IP.

Adding a new server

When you deploy agents to a new server:

  1. Run install-agents.sh with the new server's IP, SSH port, and label.
  2. Add two new entries to prometheus.yml (one under node, one under asterisk).
  3. Reload Prometheus:
    # If running in Docker:
    docker exec prometheus kill -HUP 1
    # Or via API (if --web.enable-lifecycle is set):
    curl -X POST http://localhost:9090/-/reload
    

Verifying targets in Prometheus

Open http://YOUR_MONITORING_SERVER:9090/targets in a browser. You should see all configured targets with their status (UP or DOWN), last scrape time, and scrape duration.


11. Verification Procedures

After running install-agents.sh, verify each agent is working correctly.

From the VoIP server (SSH in and check locally)

# --- 1. node_exporter ---
systemctl status node_exporter
# Should show: active (running)

curl -s http://localhost:9100/metrics | head -5
# Should show Prometheus metric lines

# --- 2. promtail ---
systemctl status promtail
# Should show: active (running)

curl -s http://localhost:9080/targets
# Should show each scrape target and its status

curl -s http://localhost:9080/ready
# Should return: "ready"

# --- 3. heplify ---
systemctl status heplify
# Should show: active (running)

journalctl -u heplify --no-pager -n 10
# Should show capture statistics (packets captured/sent)

# --- 4. asterisk_exporter ---
systemctl status asterisk_exporter
# Should show: active (running)

curl -s http://localhost:9101/metrics | grep "asterisk_active_calls"
# Should show the current call count

From the monitoring server (check data is arriving)

# --- Prometheus targets (check all UP) ---
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep -A2 '"health"'

# --- Query node_exporter data ---
curl -s --data-urlencode 'query=up{job="node"}' http://localhost:9090/api/v1/query \
    | python3 -m json.tool

# --- Query asterisk_exporter data ---
curl -s --data-urlencode 'query=asterisk_active_calls' http://localhost:9090/api/v1/query \
    | python3 -m json.tool

# --- Check Loki is receiving logs ---
curl -s -G --data-urlencode 'query={server="alpha"}' \
    --data-urlencode 'limit=5' \
    http://localhost:3100/loki/api/v1/query_range

# --- Check Homer is receiving SIP ---
# Look at heplify-server logs for packet counts
docker logs --tail 10 heplify-server

Verification checklist

Check                         Command                                   Expected
node_exporter running         systemctl is-active node_exporter         active
node_exporter responding      curl -s localhost:9100/metrics | wc -l    > 200
promtail running              systemctl is-active promtail              active
promtail tailing files        curl -s localhost:9080/targets            All targets "RUNNING"
heplify running               systemctl is-active heplify               active
heplify capturing             journalctl -u heplify -n 5                Shows packet counts
asterisk_exporter running     systemctl is-active asterisk_exporter     active
asterisk_exporter responding  curl -s localhost:9101/metrics | wc -l    > 30
Prometheus scraping node      Prometheus UI /targets                    node job shows UP
Prometheus scraping asterisk  Prometheus UI /targets                    asterisk job shows UP
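
The checklist lends itself to a small loop instead of running each command by hand. A sketch -- `run_check` is a made-up helper, and the demo rows use portable commands (`true`/`false`) so the sketch runs anywhere; on a real VoIP server, substitute the systemctl/curl commands from the table:

```shell
#!/usr/bin/env bash
# Sketch: run each checklist command and print PASS/FAIL per check.
run_check() {   # run_check <name> <command...>
    local name="$1"; shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS  %s\n' "$name"
    else
        printf 'FAIL  %s\n' "$name"
    fi
}

run_check "node_exporter running" true    # e.g. systemctl is-active node_exporter
run_check "promtail running"      false   # stands in for a failing check
```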

12. Firewall Rules

On each VoIP server (allow monitoring server to scrape)

The monitoring server needs TCP access to ports 9100 and 9101 on each VoIP server for Prometheus scraping:

# Allow the monitoring server to scrape exporters
iptables -I INPUT -s YOUR_MONITORING_SERVER -p tcp --dport 9100 -j ACCEPT
iptables -I INPUT -s YOUR_MONITORING_SERVER -p tcp --dport 9101 -j ACCEPT

# Persist rules (distribution-dependent)
# CentOS/RHEL and openSUSE:
iptables-save > /etc/sysconfig/iptables

# Ubuntu/Debian:
iptables-save > /etc/iptables/rules.v4

On the monitoring server (allow agents to push data)

Each VoIP server needs to reach the monitoring server on ports 3100 (Loki) and 9060 (Homer/heplify-server):

# Allow each VoIP server to push logs and SIP data
for SERVER_IP in SERVER_A_IP SERVER_B_IP SERVER_C_IP SERVER_D_IP; do
    iptables -I INPUT -s ${SERVER_IP} -p tcp --dport 3100 -j ACCEPT
    iptables -I INPUT -s ${SERVER_IP} -p udp --dport 9060 -j ACCEPT
    iptables -I INPUT -s ${SERVER_IP} -p tcp --dport 9060 -j ACCEPT
done

# Persist
iptables-save > /etc/iptables/rules.v4

If you are running the monitoring stack in Docker, you may need to add rules to the DOCKER-USER chain instead of INPUT:

iptables -I DOCKER-USER -s SERVER_A_IP -p tcp --dport 3100 -j ACCEPT
iptables -I DOCKER-USER -s SERVER_A_IP -p udp --dport 9060 -j ACCEPT

Port summary

Port  Protocol  Direction           Service
9100  TCP       Monitoring -> VoIP  node_exporter
9101  TCP       Monitoring -> VoIP  asterisk_exporter
3100  TCP       VoIP -> Monitoring  Loki (promtail push)
9060  UDP+TCP   VoIP -> Monitoring  heplify-server (HEP)
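
The VoIP-server-side rules can be derived from the port table rather than typed per port. A sketch that prints the commands instead of executing them, so they can be reviewed first (pipe the output to `sh` to apply; MONITORING_IP is a placeholder):

```shell
#!/usr/bin/env bash
# Sketch: emit the iptables commands for the ports the monitoring
# server must reach on each VoIP server (see the port table above).
MONITORING_IP="YOUR_MONITORING_SERVER"

for port in 9100 9101; do
    echo "iptables -I INPUT -s ${MONITORING_IP} -p tcp --dport ${port} -j ACCEPT"
done
```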

13. Handling OS Differences

The install script supports three Linux families. Here are the differences that matter and how the script handles them.

Package managers

OS Family      Package Manager  Used For
openSUSE/SLES  zypper           Installing unzip
CentOS/RHEL    yum              Installing unzip, python3, python3-pip
Ubuntu/Debian  apt-get          Installing unzip

The script detects the package manager by checking which command exists:

if command -v apt-get &>/dev/null; then
    apt-get install -y unzip
elif command -v zypper &>/dev/null; then
    zypper install -y unzip
elif command -v yum &>/dev/null; then
    yum install -y unzip
fi

Python versions

OS             Default Python 3          Binary Name
openSUSE 15.x  3.6 (+ 3.11 available)    python3.11 preferred
CentOS 7       Not installed by default  python3 (3.6) after yum install python3
Ubuntu 22.04+  3.10+                     python3
Debian 12      3.11                      python3

The script tries versioned names first (python3.11, python3.6) before falling back to python3. On CentOS 7 where Python 3 is not installed, it runs yum install -y python3 python3-pip.
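
The probing order can be expressed as a small reusable function: return the first binary from a preference list that exists on PATH (`pick_first_cmd` is a made-up helper name):

```shell
# Sketch of the version-probing logic: pick the first available binary
# from a preference list.
pick_first_cmd() {   # pick_first_cmd <name>...
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Mirror the install script's preference order:
PYTHON_BIN=$(pick_first_cmd python3.11 python3.6 python3 || true)
echo "Using Python binary: ${PYTHON_BIN:-none found}"
```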

mysql-connector-python compatibility

The mysql-connector-python package version 8.1+ requires Python 3.8+. On CentOS 7 (Python 3.6), the install script falls back:

pip install mysql-connector-python 2>/dev/null \
    || pip install "mysql-connector-python<8.1" 2>/dev/null \
    || true

Syslog path

OS                      Syslog Path
openSUSE, CentOS, RHEL  /var/log/messages
Ubuntu, Debian          /var/log/syslog

The default promtail config uses /var/log/messages. For Ubuntu/Debian deployments, update the config or add both paths.

Firewall persistence

OS             Save Command                             Restore Mechanism
openSUSE       iptables-save > /etc/sysconfig/iptables  SuSEfirewall2 or manual
CentOS 7       iptables-save > /etc/sysconfig/iptables  iptables-restore in init
Ubuntu/Debian  iptables-save > /etc/iptables/rules.v4   iptables-persistent package

14. Updating Agents

Updating node_exporter

# On the VoIP server:
NEW_VERSION="1.8.0"  # Check https://github.com/prometheus/node_exporter/releases

systemctl stop node_exporter
cd /tmp
curl -sL "https://github.com/prometheus/node_exporter/releases/download/v${NEW_VERSION}/node_exporter-${NEW_VERSION}.linux-amd64.tar.gz" \
    | tar xz
cp "node_exporter-${NEW_VERSION}.linux-amd64/node_exporter" /usr/local/bin/
rm -rf "node_exporter-${NEW_VERSION}.linux-amd64"*
systemctl start node_exporter

# Verify
node_exporter --version

Updating promtail

NEW_VERSION="3.0.0"  # Check https://github.com/grafana/loki/releases

systemctl stop promtail
cd /tmp
curl -sL "https://github.com/grafana/loki/releases/download/v${NEW_VERSION}/promtail-linux-amd64.zip" \
    -o promtail.zip
unzip -o promtail.zip
mv promtail-linux-amd64 /usr/local/bin/promtail
chmod +x /usr/local/bin/promtail
rm -f promtail.zip
systemctl start promtail

# Verify -- promtail will pick up from the last position
promtail --version

Important: When updating promtail, do not delete /var/lib/promtail/positions.yaml. The new version will resume from the last recorded position.

Updating heplify

NEW_VERSION="1.68.0"  # Check https://github.com/sipcapture/heplify/releases

systemctl stop heplify
curl -sL "https://github.com/sipcapture/heplify/releases/download/v${NEW_VERSION}/heplify" \
    -o /usr/local/bin/heplify
chmod +x /usr/local/bin/heplify
systemctl start heplify

Updating asterisk_exporter

The asterisk_exporter is a custom script, so updates are done by copying the new version from the monitoring server:

# From the monitoring server:
scp -P SSH_PORT /opt/monitoring/scripts/asterisk_exporter.py \
    root@SERVER_IP:/opt/asterisk_exporter/asterisk_exporter.py

ssh -p SSH_PORT root@SERVER_IP "systemctl restart asterisk_exporter"

Bulk updates across all servers

For updating agents across multiple servers, wrap the update commands in a loop:

#!/bin/bash
# update-node-exporter.sh -- Update node_exporter on all servers
VERSION="1.8.0"
SERVERS=("SERVER_A_IP:9322:alpha" "SERVER_B_IP:9322:bravo" "SERVER_C_IP:9322:charlie")

for entry in "${SERVERS[@]}"; do
    IFS=: read -r ip port label <<< "$entry"
    echo "=== Updating ${label} (${ip}) ==="
    ssh -p ${port} root@${ip} bash << EOF
        systemctl stop node_exporter
        cd /tmp
        curl -sL "https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz" | tar xz
        cp "node_exporter-${VERSION}.linux-amd64/node_exporter" /usr/local/bin/
        rm -rf "node_exporter-${VERSION}.linux-amd64"*
        systemctl start node_exporter
        echo "  Updated to ${VERSION}"
EOF
done

15. Troubleshooting

Agent will not start

Symptom: systemctl status <agent> shows failed or inactive (dead).

# Check the full error
journalctl -u node_exporter --no-pager -n 30
journalctl -u promtail --no-pager -n 30
journalctl -u heplify --no-pager -n 30
journalctl -u asterisk_exporter --no-pager -n 30

Common causes:

  - node_exporter -- "address already in use :9100": another process holds port 9100. Check with ss -tlnp | grep 9100.
  - promtail -- "permission denied" on a log file: promtail runs as root by default, but check whether the file has restrictive ACLs.
  - promtail -- "error creating positions file": create the directory first: mkdir -p /var/lib/promtail.
  - heplify -- "permission denied" on /dev/net/tun or pcap errors: heplify needs root or CAP_NET_RAW; ensure the service runs as root.
  - asterisk_exporter -- "ModuleNotFoundError: No module named 'mysql'": reinstall with python3 -m pip install mysql-connector-python.
  - asterisk_exporter -- mysql.connector.errors.InterfaceError: MySQL is not running or the credentials are wrong; check MYSQL_USER/MYSQL_PASS in the service file.

No metrics appearing in Prometheus

Symptom: The Prometheus target shows DOWN, or the target is UP but queries return no data.

Step 1: Check the exporter is responding locally on the VoIP server:

curl -s http://localhost:9100/metrics | head -5   # node_exporter
curl -s http://localhost:9101/metrics | head -5   # asterisk_exporter

If this fails, the agent is not running or is listening on a different port.

Step 2: Check network connectivity from the monitoring server:

# From the monitoring server:
curl -s --connect-timeout 5 http://SERVER_IP:9100/metrics | head -5
curl -s --connect-timeout 5 http://SERVER_IP:9101/metrics | head -5

If this fails but Step 1 succeeded, it is a firewall issue. See Section 12.

Step 3: Check Prometheus configuration:

# Verify the target is configured
grep -A3 "SERVER_IP" prometheus/prometheus.yml

# Reload Prometheus after config changes
curl -X POST http://localhost:9090/-/reload

Step 4: Check Prometheus targets page:

Open http://YOUR_MONITORING_SERVER:9090/targets and look for error messages next to the target.

Logs not appearing in Loki/Grafana

Symptom: Querying {server="alpha"} in Grafana/Loki returns no results.

Step 1: Verify promtail is tailing the right files:

curl -s http://localhost:9080/targets

Look for entries showing RUNNING status and non-zero last_target_len.

Step 2: Check if the log files exist and are being written to:

ls -la /var/log/asterisk/messages
ls -la /var/log/asterisk/full
ls -la /var/log/astguiclient/
tail -1 /var/log/asterisk/messages

If a file does not exist (common: some servers do not have /var/log/asterisk/full), promtail will log a warning and skip it -- this is normal.

Step 3: Check promtail can reach Loki:

# From the VoIP server, test connectivity to Loki
curl -s -o /dev/null -w "%{http_code}" http://YOUR_MONITORING_SERVER:3100/ready
# Should return: 200

If this fails, check firewall rules on the monitoring server for port 3100.

Step 4: Check promtail logs for push errors:

journalctl -u promtail --no-pager -n 50 | grep -iE "error|429"

Common errors:

  - "429 Too Many Requests" -- Loki's ingestion rate limit is being hit (see "Log shipping delays" below)
  - "connection refused" or timeouts -- Loki is unreachable; check the monitoring server's firewall for port 3100
  - "entry too far behind" -- entries are older than Loki's accept window; usually a first-run backlog or clock skew

Step 5: Check the positions file:

cat /var/lib/promtail/positions.yaml

If positions are advancing (numbers increasing on subsequent checks), promtail is reading the files. If positions are static, the log files are not being written to.
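
The two-snapshot comparison can be scripted. A sketch -- a temp file with simulated contents stands in for /var/lib/promtail/positions.yaml so the sketch runs anywhere; on a real server, point POSITIONS at the real file and wait 30s or more between snapshots:

```shell
#!/usr/bin/env bash
# Sketch: detect whether the positions file is advancing by comparing
# checksums of two snapshots taken some time apart.
POSITIONS=$(mktemp)
printf 'positions:\n  /var/log/asterisk/messages: "1024"\n' > "$POSITIONS"

snap1=$(cksum "$POSITIONS")
# sleep 30   # on a real server, wait between snapshots
printf 'positions:\n  /var/log/asterisk/messages: "2048"\n' > "$POSITIONS"  # simulated advance
snap2=$(cksum "$POSITIONS")

if [ "$snap1" != "$snap2" ]; then
    echo "positions advancing -- promtail is reading the files"
else
    echo "positions static -- the log files are not being written (or not tailed)"
fi
rm -f "$POSITIONS"
```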

Log shipping delays

Symptom: Logs appear in Grafana/Loki but with a delay of minutes or hours.

Possible causes:

  1. Loki ingestion rate limits: If promtail is sending faster than Loki can accept, entries queue up. Check promtail logs for 429 errors and increase Loki limits.

  2. Large log files on first run: When promtail first starts on a server with large existing log files, it reads from the beginning. This can cause a large backlog. Consider setting positions.yaml to the end of each file before first start:

    # Skip to the end of existing logs (only before promtail's FIRST start)
    wc -c < /var/log/asterisk/messages
    # Put that byte count in /var/lib/promtail/positions.yaml, e.g.:
    # positions:
    #   /var/log/asterisk/messages: "123456789"
    
  3. Clock skew: If the VoIP server's clock is off by more than a few minutes, Loki may reject entries as "too old" or "too far in the future." Verify NTP is running:

    timedatectl status
    # Or:
    ntpq -p
    

SIP data not appearing in Homer

Symptom: Homer search returns no results for a server that should be sending SIP data.

Step 1: Verify heplify is capturing packets:

journalctl -u heplify --no-pager -n 20

If you see captured 0 packets, heplify may be listening on the wrong interface. Try specifying the interface explicitly:

# Find the correct interface
ip addr show | grep "inet "
# Update the service to use it
# -i eth0  instead of  -i any

Step 2: Verify SIP traffic exists on the server:

# Quick packet capture to confirm SIP is flowing
tcpdump -i any -c 10 port 5060 -n

If no SIP traffic is seen, the server may not have active SIP trunks or may use a non-standard SIP port.

Step 3: Test connectivity to heplify-server:

# UDP connectivity test (heplify sends HEP over UDP by default).
# Note: UDP is connectionless, so a "successful" send only means no
# ICMP error came back -- confirm receipt in the heplify-server logs.
echo "test" | nc -u -w1 YOUR_MONITORING_SERVER 9060

Step 4: Check heplify-server logs on the monitoring server:

docker logs --tail 30 heplify-server

Look for incoming HEP packet counts or error messages.

asterisk_exporter shows partial metrics

Symptom: Some metrics (like asterisk_active_calls) work but others (like asterisk_agents_logged_in) are missing.

This usually means the MySQL connection is failing while Asterisk CLI commands succeed:

# Test MySQL connectivity with the same credentials
mysql -u cron -p1234 -e "SELECT COUNT(*) FROM asterisk.vicidial_live_agents;"

If this fails, check:

  - MySQL/MariaDB is actually running: systemctl status mariadb (or mysqld)
  - The credentials match MYSQL_USER/MYSQL_PASS in the asterisk_exporter service file
  - The MySQL user has SELECT privileges on the asterisk database

16. Summary

Deploying monitoring agents across distributed VoIP servers transforms your operational capabilities. Instead of reactive SSH-and-grep troubleshooting, you get a unified view of every server's health, every call's SIP flow, every agent's state, and every log entry -- all searchable from a single Grafana interface.

What you deployed

Agent              Port  Data Destination       What It Provides
node_exporter      9100  Prometheus (pull)      CPU, RAM, disk, network, load
promtail           9080  Loki (push)            Asterisk logs, ViciDial logs, syslog
heplify            --    heplify-server (push)  Complete SIP call flows
asterisk_exporter  9101  Prometheus (pull)      Active calls, agents, SIP peers, RTP, queues

The install process in brief

  1. Run install-agents.sh from the monitoring server for each VoIP server.
  2. Add scrape targets to prometheus.yml for node_exporter and asterisk_exporter.
  3. Open firewall ports: 9100/9101 inbound on VoIP servers, 3100/9060 inbound on the monitoring server.
  4. Verify: check Prometheus targets page, query Loki, search Homer.

Maintenance cadence

Task                       Frequency                       How
Check agent health         Daily (or alert on it)          Prometheus up{} metric
Update node_exporter       Quarterly                       Download new release, restart service
Update promtail            Quarterly (match Loki version)  Download new release, restart service
Update heplify             As needed                       Download new release, restart service
Update asterisk_exporter   When you add metrics            scp new script, restart service
Review promtail positions  After server reboot             Verify positions file is intact

This tutorial is part of a series on building production VoIP monitoring infrastructure. For the custom exporter internals, see Tutorial 08. For the central monitoring stack (Prometheus, Loki, Homer, Grafana), see Tutorials 01-07.

Need expert help with your setup?

VoIP infrastructure consulting, AI voice agent integration, monitoring stacks, scaling — I've done it all in production.

Get a Free Consultation