Tutorial 17 -- Deploying Monitoring Agents on VoIP Servers
Deploy node_exporter, promtail, heplify, and a custom Asterisk exporter to distributed VoIP servers from a central monitoring host via SSH.
Table of Contents
- Why Centralized Monitoring Matters for VoIP
- Architecture Overview
- Prerequisites
- Agent Reference
- The Install Script (install-agents.sh)
- Agent 1 -- node_exporter (System Metrics)
- Agent 2 -- promtail (Log Shipping)
- Agent 3 -- heplify (SIP Packet Capture)
- Agent 4 -- asterisk_exporter (Custom VoIP Metrics)
- Prometheus Scrape Configuration
- Verification Procedures
- Firewall Rules
- Handling OS Differences
- Updating Agents
- Troubleshooting
- Summary
1. Why Centralized Monitoring Matters for VoIP
Distributed VoIP infrastructure -- multiple Asterisk/ViciDial servers spread across data centers -- creates a visibility problem. Each server generates its own metrics, its own logs, and its own SIP traffic, but problems (call quality degradation, trunk failures, agent disconnections, disk filling up with recordings) almost never announce themselves on the server where you happen to be logged in.
Without centralized monitoring, troubleshooting looks like this:
- A client reports dropped calls.
- You SSH into server A, grep through Asterisk logs, find nothing.
- You SSH into server B, grep through different logs, find a clue.
- You SSH into server C, check SIP peer status, find the actual problem.
- Thirty minutes have passed. Calls were still dropping the entire time.
With centralized monitoring, troubleshooting looks like this:
- A client reports dropped calls.
- You open Grafana, filter by time range, see the SIP peer went UNREACHABLE at 14:32 on server C.
- You open Homer, search SIP traffic for that trunk, see the last successful registration and the failure response.
- You open Loki, query {server="charlie"} |= "chan_sip.c", and see the exact Asterisk error.
- Five minutes total, including fixing the issue.
The four monitoring agents in this tutorial form a complete observability stack:
| Layer | Agent | What It Captures |
|---|---|---|
| System | node_exporter | CPU, RAM, disk, network, load averages |
| Logs | promtail | Asterisk logs, ViciDial logs, syslog |
| SIP | heplify | Every SIP transaction (INVITE, BYE, REGISTER, etc.) |
| Application | asterisk_exporter | Active calls, agent states, SIP peer health, RTP quality, queue depth |
Together, they answer any question you could ask about what happened on any server at any point in time.
2. Architecture Overview
+------------------+ +------------------+ +------------------+
| VoIP Server A | | VoIP Server B | | VoIP Server C |
| (openSUSE) | | (CentOS 7) | | (Ubuntu/Debian) |
| | | | | |
| node_exporter | | node_exporter | | node_exporter |
| :9100 ----+ | | :9100 ----+ | | :9100 ----+ |
| | | | | | | | |
| ast_exporter| | | ast_exporter| | | ast_exporter| |
| :9101 -+ | | | :9101 -+ | | | :9101 -+ | |
| | | | | | | | | | | |
| promtail | | | | promtail | | | | promtail | | |
| :9080 | | | | :9080 | | | | :9080 | | |
| | | | | | | | | | | | | | |
| heplify | | | | heplify | | | | heplify | | |
| | | | | | | | | | | | | | |
+-----|----+--+----+ +-----|----+--+----+ +-----|----+--+----+
| | | | | | | | |
| | | Prometheus | | | Prometheus | | |
| | | scrape :9100 | | | scrape :9100 | | |
| | | scrape :9101 | | | scrape :9101 | | |
| | | | | | | | |
| +--|--------+-------|----+--|--------+-------|----+--+
| | | | | | |
| v v | v v |
| +--------------------+| Prometheus | |
| | Prometheus (:9090) || scrapes | |
| +--------------------+| | |
| | | |
+-----------+------------+----------------+ |
| push logs to :3100 |
v |
+--------------------+ |
| Loki (:3100) | |
+--------------------+ |
|
+-----------------------------------------+
| push HEP/UDP to :9060
v
+-----------------------------+
| heplify-server (:9060) |
| | |
| v |
| PostgreSQL (homer_data) |
| | |
| v |
| Homer WebApp (:9080) |
+-----------------------------+
|
v
+--------------------+
| Grafana (:3000) | <-- unified dashboards
+--------------------+
Data flow summary:
- Metrics (node_exporter + asterisk_exporter): Prometheus on the monitoring server pulls metrics every 15 seconds from each VoIP server's :9100 and :9101 endpoints.
- Logs (promtail): Each server's promtail agent pushes log entries to Loki on the monitoring server (:3100).
- SIP traces (heplify): Each server's heplify agent captures SIP packets and sends them via HEP3/UDP to heplify-server on the monitoring server (:9060).
3. Prerequisites
On the monitoring server (central)
- Prometheus running and accessible (typically Docker, port 9090)
- Loki running and accepting pushes on port 3100
- heplify-server running and accepting HEP on port 9060 (UDP+TCP)
- SSH key-based access to all target VoIP servers (the install script runs commands via SSH as root)
On each VoIP server (target)
- Root SSH access from the monitoring server (key-based recommended)
- systemd as init system (all modern Linux distributions)
- curl installed (for downloading binaries)
- Python 3 with pip (for the asterisk_exporter)
- Asterisk running (for VoIP-specific metrics)
- MySQL/MariaDB with a read-only user (for ViciDial metrics)
Network requirements
| Source | Destination | Port | Protocol | Purpose |
|---|---|---|---|---|
| Monitoring server | VoIP servers | 9100 | TCP | Prometheus scrapes node_exporter |
| Monitoring server | VoIP servers | 9101 | TCP | Prometheus scrapes asterisk_exporter |
| VoIP servers | Monitoring server | 3100 | TCP | promtail pushes logs to Loki |
| VoIP servers | Monitoring server | 9060 | UDP/TCP | heplify sends SIP to heplify-server |
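Before running the installer, it can save time to confirm the pull-direction ports in the table are actually reachable from the monitoring server. A minimal sketch (the `agent_port` helper and the `TARGET` variable are illustrative, not part of the install script):

```shell
# Map each pull-scraped agent to its TCP port (illustrative helper).
agent_port() {
  case "$1" in
    node_exporter) echo 9100 ;;
    asterisk_exporter) echo 9101 ;;
    *) return 1 ;;
  esac
}

# Only probe when a target is given, e.g.: TARGET=203.0.113.10 bash check-ports.sh
if [ -n "${TARGET:-}" ]; then
  for agent in node_exporter asterisk_exporter; do
    port="$(agent_port "$agent")"
    # bash's /dev/tcp avoids needing nc installed on the monitoring server
    if timeout 2 bash -c "echo > /dev/tcp/${TARGET}/${port}" 2>/dev/null; then
      echo "${agent} (:${port}) reachable"
    else
      echo "${agent} (:${port}) NOT reachable"
    fi
  done
fi
```

The push-direction ports (3100, 9060) have to be tested from the VoIP servers toward the monitoring server instead.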
4. Agent Reference
| Agent | Version | Port | Direction | Binary Path | Config Path |
|---|---|---|---|---|---|
| node_exporter | 1.7.0 | 9100 | Pull (Prometheus scrapes) | /usr/local/bin/node_exporter | N/A (CLI flags) |
| promtail | 2.9.6 | 9080 | Push (to Loki) | /usr/local/bin/promtail | /etc/promtail/config.yml |
| heplify | 1.67.1 | N/A | Push (to heplify-server) | /usr/local/bin/heplify | N/A (CLI flags) |
| asterisk_exporter | custom | 9101 | Pull (Prometheus scrapes) | /opt/asterisk_exporter/asterisk_exporter.py | N/A (env vars) |
5. The Install Script
This script is designed to run from the monitoring server, deploying all four agents to a remote VoIP server over SSH in a single pass. It handles three OS families (openSUSE/SUSE, CentOS/RHEL, Ubuntu/Debian), is idempotent (safe to re-run), and checks whether each binary is already present before downloading it.
Usage
./install-agents.sh <server_ip> <ssh_port> <server_label> <monitor_vps_ip>
Parameters:
| Parameter | Description | Example |
|---|---|---|
| server_ip | IP address of the target VoIP server | 203.0.113.10 |
| ssh_port | SSH port on the target server | 22 or 9322 |
| server_label | Human-readable name used in metric labels | alpha, bravo, charlie |
| monitor_vps_ip | IP of the central monitoring server (where Loki and heplify-server run) | YOUR_MONITORING_SERVER |
Example invocation:
bash install-agents.sh 203.0.113.10 9322 alpha YOUR_MONITORING_SERVER
Complete Script
#!/bin/bash
# install-agents.sh -- Install monitoring agents on a ViciDial/Asterisk server
# Usage: ./install-agents.sh <server_ip> <ssh_port> <server_label> <monitor_vps_ip>
# Supports: openSUSE, CentOS 7, Ubuntu/Debian
set -e
SERVER_IP="${1:?Usage: $0 <server_ip> <ssh_port> <server_label> <monitor_vps_ip>}"
SSH_PORT="${2:-22}"
SERVER_LABEL="${3:?Provide server label (alpha/bravo/charlie/delta)}"
MONITOR_IP="${4:?Provide monitoring VPS IP}"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
echo "=== Installing monitoring agents on ${SERVER_LABEL} (${SERVER_IP}:${SSH_PORT}) ==="
echo "Monitor VPS: ${MONITOR_IP}"
echo ""
SSH_CMD="ssh -o StrictHostKeyChecking=no -p ${SSH_PORT} root@${SERVER_IP}"
# --------------------------------------------------------------------------
# 1. heplify (SIP capture)
# --------------------------------------------------------------------------
echo "[1/4] Installing heplify..."
${SSH_CMD} bash << REMOTEOF
set -e
if [ ! -f /usr/local/bin/heplify ] || ! /usr/local/bin/heplify -version 2>/dev/null | grep -q heplify; then
rm -f /usr/local/bin/heplify
curl -sL https://github.com/sipcapture/heplify/releases/download/v1.67.1/heplify \
-o /usr/local/bin/heplify
chmod +x /usr/local/bin/heplify
echo " heplify binary installed"
else
echo " heplify already installed"
fi
cat > /etc/systemd/system/heplify.service << SVCFILE
[Unit]
Description=heplify SIP Capture Agent
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/heplify -hs ${MONITOR_IP}:9060 -i any -dim "OPTIONS,NOTIFY" -e
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
SVCFILE
systemctl daemon-reload
systemctl enable heplify
systemctl restart heplify
echo " heplify service started"
REMOTEOF
# --------------------------------------------------------------------------
# 2. node_exporter (system metrics)
# --------------------------------------------------------------------------
echo "[2/4] Installing node_exporter..."
${SSH_CMD} bash << 'REMOTEOF'
set -e
if [ ! -f /usr/local/bin/node_exporter ]; then
cd /tmp
curl -sL https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz \
| tar xz
cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
rm -rf node_exporter-1.7.0.linux-amd64*
echo " node_exporter binary installed"
else
echo " node_exporter already installed"
fi
cat > /etc/systemd/system/node_exporter.service << 'SVCFILE'
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
SVCFILE
systemctl daemon-reload
systemctl enable node_exporter
systemctl restart node_exporter
echo " node_exporter service started"
REMOTEOF
# --------------------------------------------------------------------------
# 3. promtail (log shipping)
# --------------------------------------------------------------------------
echo "[3/4] Installing promtail..."
${SSH_CMD} bash << REMOTEOF
set -e
if [ ! -f /usr/local/bin/promtail ]; then
cd /tmp
curl -sL https://github.com/grafana/loki/releases/download/v2.9.6/promtail-linux-amd64.zip \
-o promtail.zip
# Install unzip -- works across all OS families
if command -v apt-get &>/dev/null; then
apt-get install -y unzip 2>/dev/null || true
elif command -v zypper &>/dev/null; then
zypper install -y unzip 2>/dev/null || true
elif command -v yum &>/dev/null; then
yum install -y unzip 2>/dev/null || true
fi
unzip -o promtail.zip
mv promtail-linux-amd64 /usr/local/bin/promtail
chmod +x /usr/local/bin/promtail
rm -f promtail.zip
echo " promtail binary installed"
else
echo " promtail already installed"
fi
mkdir -p /etc/promtail
cat > /etc/promtail/config.yml << CFGFILE
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /var/lib/promtail/positions.yaml
clients:
- url: http://${MONITOR_IP}:3100/loki/api/v1/push
scrape_configs:
- job_name: asterisk_messages
static_configs:
- targets: [localhost]
labels:
job: asterisk
server: ${SERVER_LABEL}
logtype: messages
__path__: /var/log/asterisk/messages
- job_name: asterisk_full
static_configs:
- targets: [localhost]
labels:
job: asterisk
server: ${SERVER_LABEL}
logtype: full
__path__: /var/log/asterisk/full
- job_name: vicidial
static_configs:
- targets: [localhost]
labels:
job: vicidial
server: ${SERVER_LABEL}
logtype: vicidial
__path__: /var/log/astguiclient/*.log
- job_name: syslog
static_configs:
- targets: [localhost]
labels:
job: syslog
server: ${SERVER_LABEL}
logtype: syslog
__path__: /var/log/messages
CFGFILE
mkdir -p /var/lib/promtail
cat > /etc/systemd/system/promtail.service << 'SVCFILE'
[Unit]
Description=Promtail Log Agent
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/config.yml
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
SVCFILE
systemctl daemon-reload
systemctl enable promtail
systemctl restart promtail
echo " promtail service started"
REMOTEOF
# --------------------------------------------------------------------------
# 4. asterisk_exporter (VoIP metrics)
# --------------------------------------------------------------------------
echo "[4/4] Installing asterisk_exporter..."
${SSH_CMD} bash << REMOTEOF
set -e
mkdir -p /opt/asterisk_exporter
REMOTEOF
# Copy the exporter script from the local scripts directory
scp -o StrictHostKeyChecking=no -P ${SSH_PORT} \
${SCRIPT_DIR}/asterisk_exporter.py \
root@${SERVER_IP}:/opt/asterisk_exporter/asterisk_exporter.py
# Install Python dependencies and create systemd service
${SSH_CMD} bash << REMOTEOF
set -e
# Find the right python3 binary -- try versioned names first
PYTHON_BIN=""
for p in python3.11 python3.6 python3; do
if command -v \$p &>/dev/null; then
PYTHON_BIN=\$(command -v \$p)
break
fi
done
if [ -z "\$PYTHON_BIN" ]; then
# CentOS 7: install python3 via yum
if command -v yum &>/dev/null; then
yum install -y python3 python3-pip 2>/dev/null || true
PYTHON_BIN=\$(command -v python3)
fi
fi
echo " Using Python: \$PYTHON_BIN"
# Install mysql-connector (try latest first, fall back to <8.1 for old Python)
\$PYTHON_BIN -m pip install mysql-connector-python 2>/dev/null \
|| \$PYTHON_BIN -m pip install "mysql-connector-python<8.1" 2>/dev/null \
|| true
# Verify the import works
\$PYTHON_BIN -c "import mysql.connector; print(' mysql-connector OK')" \
|| echo " WARNING: mysql-connector import failed"
chmod +x /opt/asterisk_exporter/asterisk_exporter.py
cat > /etc/systemd/system/asterisk_exporter.service << SVCFILE
[Unit]
Description=Asterisk/ViciDial Prometheus Exporter
After=network.target mariadb.service asterisk.service
Wants=mariadb.service
[Service]
Type=simple
ExecStart=\$PYTHON_BIN /opt/asterisk_exporter/asterisk_exporter.py
Restart=always
RestartSec=10
Environment=EXPORTER_PORT=9101
Environment=MYSQL_HOST=localhost
Environment=MYSQL_USER=cron
Environment=MYSQL_PASS=1234
Environment=MYSQL_DB=asterisk
Environment=SERVER_LABEL=${SERVER_LABEL}
[Install]
WantedBy=multi-user.target
SVCFILE
systemctl daemon-reload
systemctl enable asterisk_exporter
systemctl restart asterisk_exporter
echo " asterisk_exporter service started"
REMOTEOF
echo ""
echo "=== All 4 agents installed on ${SERVER_LABEL} (${SERVER_IP}) ==="
echo " heplify -> sending HEP to ${MONITOR_IP}:9060"
echo " node_exporter -> :9100"
echo " promtail -> shipping logs to ${MONITOR_IP}:3100"
echo " ast_exporter -> :9101"
echo ""
How the script works
SSH-based deployment: The script runs entirely from the monitoring server. Each section opens an SSH session, streams a heredoc of shell commands to execute on the remote host, then closes the connection.
Idempotent: Before downloading any binary, it checks whether the file already exists at the expected path. Re-running the script on a server that already has agents installed will simply restart the services with the latest configuration.
OS-agnostic package installation: When unzip is needed for promtail, the script detects the package manager (apt-get, zypper, or yum) and installs accordingly.
Python version detection: For the asterisk_exporter, the script tries python3.11, python3.6, and python3 in order, covering openSUSE (which ships 3.11), CentOS 7 (which uses 3.6), and Ubuntu/Debian (which use python3).
scp for the exporter: The asterisk_exporter is a custom Python script, so it is copied from the monitoring server's scripts/ directory via scp rather than downloaded from a release URL.
6. Agent 1 -- node_exporter
What it does
node_exporter is the standard Prometheus exporter for hardware and OS-level metrics. It exposes approximately 277 metrics covering CPU, memory, disk, network, filesystem, and load average data.
Binary installation
cd /tmp
curl -sL https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz \
| tar xz
cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
rm -rf node_exporter-1.7.0.linux-amd64*
The tarball contains a single static binary (~20 MB). No dependencies, no runtime, no configuration file. It runs on any Linux x86_64 system regardless of distribution.
systemd service file
Path: /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Key settings:
--web.listen-address=:9100 -- Listens on all interfaces, port 9100. Change to 127.0.0.1:9100 if you want localhost-only access (not useful for remote Prometheus scraping).
Restart=always + RestartSec=10 -- Automatically restarts the service on crash after a 10-second delay.
Optional: Enabling/disabling specific collectors
By default, node_exporter enables a broad set of collectors. For VoIP servers with heavy I/O, you may want to add flags to include or exclude specific collectors:
# Include only the collectors you care about:
ExecStart=/usr/local/bin/node_exporter \
--web.listen-address=:9100 \
--collector.cpu \
--collector.meminfo \
--collector.diskstats \
--collector.filesystem \
--collector.loadavg \
--collector.netdev \
--collector.stat \
--collector.time \
--collector.uname \
--no-collector.wifi \
--no-collector.infiniband \
--no-collector.nfs \
--no-collector.nfsd
For most VoIP deployments, the default set is fine.
Key metrics for VoIP servers
| Metric | What to watch |
|---|---|
| node_cpu_seconds_total | High system or iowait time indicates Asterisk is under load |
| node_memory_MemAvailable_bytes | Asterisk can leak memory slowly; watch for a steady decline |
| node_filesystem_avail_bytes | Recordings fill disks; alert at 80% usage |
| node_load1 / node_load5 | Should stay below the CPU count during peak hours |
| node_network_receive_bytes_total | Baseline for detecting DDoS or SIP floods |
| node_disk_io_time_seconds_total | High I/O wait degrades call recording quality |
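The thresholds above translate naturally into Prometheus alerting rules. A sketch, not part of the tutorial's installer -- rule names and thresholds are suggestions to adapt:

```yaml
# alerts/voip-node.yml -- illustrative alerting rules for the metrics above
groups:
  - name: voip-node
    rules:
      - alert: DiskFillingWithRecordings
        expr: |
          (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes) < 0.20
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Filesystem on {{ $labels.server }} is over 80% full"
      - alert: LoadAboveCpuCount
        expr: node_load5 > on(instance) count by (instance) (node_cpu_seconds_total{mode="idle"})
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "5m load on {{ $labels.server }} exceeds CPU count"
```

The `count by (instance)` of idle-mode CPU series is a standard idiom for "number of CPUs on this host", which makes the load alert self-adjusting per server.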
Enabling and starting
systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
Quick test
curl -s http://localhost:9100/metrics | head -20
You should see lines like:
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1.234567e+06
node_cpu_seconds_total{cpu="0",mode="system"} 12345.67
...
7. Agent 2 -- promtail
What it does
promtail is the log-shipping agent for Grafana Loki. It tails log files on the VoIP server, attaches labels (server name, log type), and pushes the entries to a central Loki instance. This enables centralized log search across all servers from Grafana.
Binary installation
cd /tmp
curl -sL https://github.com/grafana/loki/releases/download/v2.9.6/promtail-linux-amd64.zip \
-o promtail.zip
unzip -o promtail.zip
mv promtail-linux-amd64 /usr/local/bin/promtail
chmod +x /usr/local/bin/promtail
rm -f promtail.zip
Unlike node_exporter, promtail is distributed as a zip file containing a single binary.
Configuration file
Path: /etc/promtail/config.yml
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /var/lib/promtail/positions.yaml
clients:
- url: http://YOUR_MONITORING_SERVER:3100/loki/api/v1/push
scrape_configs:
- job_name: asterisk_messages
static_configs:
- targets: [localhost]
labels:
job: asterisk
server: YOUR_SERVER_LABEL
logtype: messages
__path__: /var/log/asterisk/messages
- job_name: asterisk_full
static_configs:
- targets: [localhost]
labels:
job: asterisk
server: YOUR_SERVER_LABEL
logtype: full
__path__: /var/log/asterisk/full
- job_name: vicidial
static_configs:
- targets: [localhost]
labels:
job: vicidial
server: YOUR_SERVER_LABEL
logtype: vicidial
__path__: /var/log/astguiclient/*.log
- job_name: syslog
static_configs:
- targets: [localhost]
labels:
job: syslog
server: YOUR_SERVER_LABEL
logtype: syslog
__path__: /var/log/messages
Configuration breakdown
server block:
http_listen_port: 9080 -- promtail exposes a status API here. You can visit http://localhost:9080/targets to see which log files are being tailed and their current read position.
grpc_listen_port: 0 -- Disables the gRPC server (not needed for push mode).
positions block:
filename: /var/lib/promtail/positions.yaml -- Tracks the byte offset of each tailed file. This ensures promtail resumes from where it left off after a restart, without re-sending old log lines. The directory must exist (the install script creates it).
clients block:
url -- The Loki push endpoint. Replace YOUR_MONITORING_SERVER with the IP of your central monitoring server.
scrape_configs block -- the four log sources:
| Job Name | Path | What It Captures |
|---|---|---|
| asterisk_messages | /var/log/asterisk/messages | Asterisk NOTICE/WARNING/ERROR messages (SIP registration failures, channel errors, peer unreachable) |
| asterisk_full | /var/log/asterisk/full | Full verbose Asterisk log (every dialplan step, every SIP message -- high volume) |
| vicidial | /var/log/astguiclient/*.log | ViciDial application logs (agent login/logout, call routing, list loading, campaign actions) |
| syslog | /var/log/messages | System syslog (kernel, cron, auth, services) |
Labels explained:
job -- Groups related logs. Use asterisk for Asterisk logs, vicidial for application logs, syslog for system logs.
server -- The server label (e.g., alpha, bravo). This is the most important label for filtering in Grafana.
logtype -- Distinguishes between different log files from the same server.
__path__ -- Special label telling promtail which file to tail. Supports globs (*.log).
Important: The positions file
The positions file (/var/lib/promtail/positions.yaml) is critical. It looks like this after promtail has been running:
positions:
/var/log/asterisk/messages: "4521789"
/var/log/asterisk/full: "98234567"
/var/log/astguiclient/VDadapt.log: "12345"
/var/log/messages: "67890123"
Each number is the byte offset where promtail last read. If this file is deleted, promtail will re-read and re-send all log data from the beginning of each file. On a busy server, this can mean pushing gigabytes of old logs to Loki. If you need to reset positions, do so deliberately and during a low-traffic period.
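If a reset is unavoidable, pausing promtail and keeping a timestamped backup of the positions file avoids losing the old offsets entirely. A sketch (the `positions_backup_path` helper and the `DO_RESET` guard are illustrative, not part of the install script):

```shell
# Deliberate positions reset: stop promtail, back up offsets, restart.
POS_FILE=/var/lib/promtail/positions.yaml

# Illustrative helper: derive a timestamped backup path for the positions file.
positions_backup_path() {
  echo "$1.bak-$(date +%Y%m%d%H%M%S)"
}

# Guarded so pasting this snippet does nothing until you opt in: DO_RESET=1
if [ "${DO_RESET:-0}" = "1" ]; then
  systemctl stop promtail
  mv "$POS_FILE" "$(positions_backup_path "$POS_FILE")"
  systemctl start promtail   # promtail recreates the file and re-reads all logs
fi
```

Keeping the backup means you can restore the old offsets if the re-ingest overwhelms Loki.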
systemd service file
Path: /etc/systemd/system/promtail.service
[Unit]
Description=Promtail Log Agent
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/config.yml
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Querying logs in Grafana (Loki)
Once promtail is shipping logs, query them in Grafana using LogQL:
# All Asterisk errors on server alpha
{server="alpha", job="asterisk"} |= "ERROR"
# SIP registration failures across all servers
{job="asterisk", logtype="messages"} |= "Registration" |= "failed"
# ViciDial agent login events on a specific server
{server="charlie", job="vicidial"} |= "LOGIN"
# High-volume: all full Asterisk logs (use with time range filter)
{server="delta", logtype="full"}
Note on syslog path
On Debian/Ubuntu, the system log is /var/log/syslog rather than /var/log/messages. If deploying to Debian/Ubuntu, update the syslog scrape config:
- job_name: syslog
static_configs:
- targets: [localhost]
labels:
job: syslog
server: YOUR_SERVER_LABEL
logtype: syslog
__path__: /var/log/syslog
The install script as written uses /var/log/messages, which works on openSUSE, CentOS, and RHEL. On Debian/Ubuntu, if /var/log/messages does not exist, promtail will log a warning but continue running -- it simply will not ship syslog data. Adjust the path if needed.
8. Agent 3 -- heplify
What it does
heplify is a SIP packet capture agent. It sniffs network traffic, extracts SIP messages (INVITE, ACK, BYE, REGISTER, etc.), and sends them to a Homer server using the HEP3 (Homer Encapsulation Protocol) format over UDP.
This gives you full SIP call flow visualization: you can search for any call by phone number, SIP Call-ID, or time range and see every SIP message in the transaction, including response codes, SDP negotiation, and timing.
Binary installation
curl -sL https://github.com/sipcapture/heplify/releases/download/v1.67.1/heplify \
-o /usr/local/bin/heplify
chmod +x /usr/local/bin/heplify
Single static binary, no dependencies.
systemd service file
Path: /etc/systemd/system/heplify.service
[Unit]
Description=heplify SIP Capture Agent
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/heplify -hs YOUR_MONITORING_SERVER:9060 -i any -dim "OPTIONS,NOTIFY" -e
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Command-line flags explained
| Flag | Value | Purpose |
|---|---|---|
| -hs | YOUR_MONITORING_SERVER:9060 | Homer server address and port (HEP receiver) |
| -i | any | Capture on all network interfaces |
| -dim | "OPTIONS,NOTIFY" | Discard In Method -- drop these SIP methods before sending. OPTIONS keepalives and NOTIFY events are extremely high-volume and rarely useful for troubleshooting |
| -e | (flag) | Log to stderr instead of a log file; under systemd this puts heplify's output in the journal |
Why filter OPTIONS and NOTIFY
On a typical VoIP server with 30 SIP peers, each sending OPTIONS keepalives every 60 seconds, that is 30 messages/minute -- 43,200 per day -- of pure noise. NOTIFY messages (for MWI, dialog-info, etc.) add another significant volume. Filtering these at the capture level reduces:
- Network traffic between the VoIP server and the Homer server
- Storage consumption in the Homer PostgreSQL database
- Query time when searching for actual call flows
The important SIP methods (INVITE, BYE, REGISTER, ACK, CANCEL, REFER, UPDATE) are all captured and forwarded.
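The arithmetic behind that volume estimate, as a quick sketch you can adapt to your own peer count and keepalive interval:

```shell
# Daily OPTIONS keepalive volume: each peer sends one OPTIONS per interval.
peers=30
interval_seconds=60
per_minute=$(( peers * 60 / interval_seconds ))   # 30 messages/minute
per_day=$(( per_minute * 60 * 24 ))               # 43200 messages/day
echo "${per_minute}/min, ${per_day}/day"
```

With 500 peers on a 30-second interval, the same math gives 1.44 million messages per day -- which is why filtering at the capture level, not in Homer, is the right place.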
Additional useful flags
# Capture only on a specific interface (useful if the server has multiple NICs)
-i eth0
# Filter to a specific port (only capture SIP on port 5060)
-p "port 5060"
# Also capture RTCP (RTP Control Protocol) for quality metrics
-pr 5060-5062
# Set a custom HEP node ID (useful when multiple servers send to the same Homer)
-hn 100
# Enable TLS for HEP transport (if your Homer server requires it)
-hs YOUR_MONITORING_SERVER:9060 -ht tls
Verifying heplify is capturing
# Check the service is running
systemctl status heplify
# Check recent logs for capture stats
journalctl -u heplify --no-pager -n 20
You should see output like:
heplify: sending to YOUR_MONITORING_SERVER:9060
heplify: captured 47 packets, sent 47
On the Homer server side, verify data is arriving:
# Check heplify-server is receiving packets
docker logs --tail 20 heplify-server
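If the Homer side shows nothing, confirming that HEP packets actually leave the VoIP server (or arrive at the monitoring server) with tcpdump is a useful middle step. A sketch -- it needs root, and the `RUN_CAPTURE` guard is illustrative so the snippet is safe to paste:

```shell
# Watch for HEP traffic on the wire (run on either end of the link).
HEP_PORT=9060
if [ "${RUN_CAPTURE:-0}" = "1" ]; then
  # -c 10: stop after 10 packets; -n: skip DNS lookups
  tcpdump -n -c 10 -i any "udp port ${HEP_PORT}"
fi
```

Packets visible on the VoIP server but not on the monitoring server point to a firewall or routing problem rather than a heplify misconfiguration.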
9. Agent 4 -- asterisk_exporter
What it does
The asterisk_exporter is a custom Python script that queries Asterisk CLI commands and ViciDial's MySQL database to expose VoIP-specific metrics as a Prometheus endpoint on port 9101. This fills the gap between generic system metrics (node_exporter) and what you actually need to monitor in a call center: active calls, agent states, SIP peer health, RTP quality, queue depth, and more.
A full walkthrough of the exporter's internals is available in Tutorial 08 -- Building a Custom Asterisk Prometheus Exporter. This section covers only what you need for deployment.
File location
/opt/asterisk_exporter/
asterisk_exporter.py # The exporter script (copied from monitoring server)
Dependencies
- Python 3 (3.6+ on CentOS 7, 3.11+ on openSUSE, 3.8+ on Ubuntu/Debian)
- mysql-connector-python (pip package)
- Asterisk (the exporter calls asterisk -rx to run CLI commands)
- MySQL/MariaDB (for ViciDial metrics)
systemd service file
Path: /etc/systemd/system/asterisk_exporter.service
[Unit]
Description=Asterisk/ViciDial Prometheus Exporter
After=network.target mariadb.service asterisk.service
Wants=mariadb.service
[Service]
Type=simple
ExecStart=/usr/bin/python3 /opt/asterisk_exporter/asterisk_exporter.py
Restart=always
RestartSec=10
Environment=EXPORTER_PORT=9101
Environment=MYSQL_HOST=localhost
Environment=MYSQL_USER=cron
Environment=MYSQL_PASS=1234
Environment=MYSQL_DB=asterisk
Environment=SERVER_LABEL=YOUR_SERVER_LABEL
[Install]
WantedBy=multi-user.target
Environment variables
| Variable | Default | Purpose |
|---|---|---|
| EXPORTER_PORT | 9101 | Port to listen on |
| MYSQL_HOST | localhost | MySQL server address |
| MYSQL_USER | cron | MySQL user (needs SELECT on vicidial tables) |
| MYSQL_PASS | 1234 | MySQL password |
| MYSQL_DB | asterisk | Database name |
| SERVER_LABEL | alpha | Label attached to all metrics |
Important: The MySQL user only needs SELECT privileges. Use an existing read-only user or create one:
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'YOUR_PASSWORD';
GRANT SELECT ON asterisk.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
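A quick way to confirm the grant works as intended -- a sketch that assumes the `exporter` user created above and the `mysql` client on the server; set `DB_PASS` before running:

```shell
# Verify the read-only user can SELECT from ViciDial tables.
DB_USER=exporter
DB_NAME=asterisk
if [ -n "${DB_PASS:-}" ]; then
  # Should succeed and print an agent count
  mysql -u "$DB_USER" -p"$DB_PASS" -D "$DB_NAME" -N \
    -e "SELECT COUNT(*) FROM vicidial_live_agents;"
  # Should list only the SELECT grant (no INSERT/UPDATE/DELETE)
  mysql -u "$DB_USER" -p"$DB_PASS" -N -e "SHOW GRANTS;"
fi
```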
Exposed metrics
The exporter exposes approximately 25 metric families:
| Metric | Type | Description |
|---|---|---|
| asterisk_active_calls | gauge | Current active calls |
| asterisk_active_channels | gauge | Current active channels |
| asterisk_sip_peer_up | gauge | SIP peer reachability (1=up, 0=down) |
| asterisk_sip_peer_latency_ms | gauge | SIP peer qualify latency in milliseconds |
| asterisk_sip_peer_status | gauge | SIP peer status string (OK, Lagged, UNREACHABLE) |
| asterisk_agents_logged_in | gauge | Number of agents logged in |
| asterisk_agents_incall | gauge | Number of agents currently in a call |
| asterisk_agents_paused | gauge | Number of agents in pause state |
| asterisk_agents_waiting | gauge | Number of agents ready/waiting |
| asterisk_agent_status | gauge | Per-agent status (INCALL, PAUSED, READY, CLOSER) |
| asterisk_agent_incall_duration_seconds | gauge | How long the current agent has been in-call |
| asterisk_agent_pause_duration_seconds | gauge | How long the current agent has been paused |
| asterisk_queue_depth | gauge | Calls waiting in queue per inbound group |
| asterisk_rtp_packet_loss_percent | gauge | RTP packet loss percentage per channel |
| asterisk_rtp_jitter_ms | gauge | RTP jitter in milliseconds per channel |
| asterisk_rtp_rtt_ms | gauge | RTP round-trip time per channel |
| asterisk_uptime_seconds | gauge | Asterisk system uptime |
| asterisk_confbridge_count | gauge | Active ConfBridge/MeetMe conferences |
| asterisk_channels_by_codec | gauge | Channel count per codec (alaw, ulaw, g722, etc.) |
| asterisk_transcoding_channels | gauge | Channels actively transcoding between codecs |
| asterisk_fail2ban_active_bans | gauge | Current fail2ban active bans per jail |
| asterisk_fail2ban_bans_total | counter | Total fail2ban bans per jail |
| asterisk_recordings_missing | gauge | CDR entries from the last hour without matching recordings |
How it collects data
The exporter uses two data sources:
1. Asterisk CLI (via asterisk -rx):
asterisk -rx "sip show peers" # SIP peer status and latency
asterisk -rx "core show channels" # Active call/channel counts
asterisk -rx "sip show channelstats" # RTP quality (loss, jitter, RTT)
asterisk -rx "core show uptime seconds" # Asterisk uptime
asterisk -rx "confbridge list" # Active conferences
asterisk -rx "core show channel SIP/..." # Per-channel codec/transcoding info
2. ViciDial MySQL (via mysql-connector-python):
-- Agent states
SELECT status, COUNT(*) FROM vicidial_live_agents GROUP BY status;
-- Per-agent detail
SELECT user, status, pause_code,
TIMESTAMPDIFF(SECOND, last_state_change, NOW()) as state_duration
FROM vicidial_live_agents;
-- Queue depth
SELECT campaign_id, COUNT(*) FROM vicidial_auto_calls
WHERE status = 'LIVE' GROUP BY campaign_id;
-- Missing recordings
SELECT COUNT(*) FROM vicidial_closer_log cl
LEFT JOIN recording_log rl ON ...
WHERE cl.call_date >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
AND cl.length_in_sec > 10 AND rl.recording_id IS NULL;
Quick test
curl -s http://localhost:9101/metrics | grep asterisk_active
Expected output:
# HELP asterisk_active_calls Number of active calls
# TYPE asterisk_active_calls gauge
asterisk_active_calls{server="alpha"} 12
# HELP asterisk_active_channels Number of active channels
# TYPE asterisk_active_channels gauge
asterisk_active_channels{server="alpha"} 24
10. Prometheus Scrape Configuration
On the monitoring server, add scrape targets for each VoIP server in your Prometheus configuration.
Path: prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # --- Prometheus self-monitoring ---
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # --- Node Exporter (system metrics per server) ---
  - job_name: "node"
    static_configs:
      - targets: ["SERVER_A_IP:9100"]
        labels:
          server: "alpha"
      - targets: ["SERVER_B_IP:9100"]
        labels:
          server: "bravo"
      - targets: ["SERVER_C_IP:9100"]
        labels:
          server: "charlie"
      - targets: ["SERVER_D_IP:9100"]
        labels:
          server: "delta"

  # --- Asterisk Exporter (VoIP metrics per server) ---
  - job_name: "asterisk"
    scrape_interval: 15s
    static_configs:
      - targets: ["SERVER_A_IP:9101"]
        labels:
          server: "alpha"
      - targets: ["SERVER_B_IP:9101"]
        labels:
          server: "bravo"
      - targets: ["SERVER_C_IP:9101"]
        labels:
          server: "charlie"
      - targets: ["SERVER_D_IP:9101"]
        labels:
          server: "delta"

  # --- heplify-server metrics (on the monitoring server itself) ---
  - job_name: "heplify-server"
    static_configs:
      - targets: ["heplify-server:9096"]
Key points
- `job_name: "node"` groups all node_exporter targets. The `server` label differentiates them.
- `job_name: "asterisk"` groups all asterisk_exporter targets. The `server` label matches the `SERVER_LABEL` used during agent installation.
- Consistent labels: the `server` label used in Prometheus must match the `server` label in the promtail config and the `SERVER_LABEL` environment variable in the asterisk_exporter. This lets you correlate metrics, logs, and SIP traces for the same server.
- Scrape interval: 15 seconds is a good default. The asterisk_exporter collects data on each scrape, so shorter intervals increase load on Asterisk and MySQL.
Adding a new server
When you deploy agents to a new server:
- Run `install-agents.sh` with the new server's IP, SSH port, and label.
- Add two new entries to `prometheus.yml` (one under `node`, one under `asterisk`).
- Reload Prometheus:

# If running in Docker:
docker exec prometheus kill -HUP 1
# Or via API (if --web.enable-lifecycle is set):
curl -X POST http://localhost:9090/-/reload
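Hand-editing one stanza per server gets tedious past a handful of hosts. A throwaway generator like the following (a hypothetical helper, not part of the install script) can emit the `static_configs` entries for both jobs:

```python
def static_configs_yaml(servers: dict, port: int, indent: str = "      ") -> str:
    """Emit prometheus.yml static_configs entries for a {label: ip} mapping,
    indented to sit under a scrape_configs job."""
    out = []
    for label, ip in sorted(servers.items()):
        out.append(f'{indent}- targets: ["{ip}:{port}"]')
        out.append(f'{indent}  labels:')
        out.append(f'{indent}    server: "{label}"')
    return "\n".join(out)

servers = {"alpha": "SERVER_A_IP", "bravo": "SERVER_B_IP"}
print(static_configs_yaml(servers, 9100))   # node job
print(static_configs_yaml(servers, 9101))   # asterisk job
```

Paste the output under the matching `job_name` block, then reload Prometheus.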
Verifying targets in Prometheus
Open http://YOUR_MONITORING_SERVER:9090/targets in a browser. You should see all configured targets with their status (UP or DOWN), last scrape time, and scrape duration.
11. Verification Procedures
After running install-agents.sh, verify each agent is working correctly.
From the VoIP server (SSH in and check locally)
# --- 1. node_exporter ---
systemctl status node_exporter
# Should show: active (running)
curl -s http://localhost:9100/metrics | head -5
# Should show Prometheus metric lines
# --- 2. promtail ---
systemctl status promtail
# Should show: active (running)
curl -s http://localhost:9080/targets
# Should show each scrape target and its status
curl -s http://localhost:9080/ready
# Should return: "ready"
# --- 3. heplify ---
systemctl status heplify
# Should show: active (running)
journalctl -u heplify --no-pager -n 10
# Should show capture statistics (packets captured/sent)
# --- 4. asterisk_exporter ---
systemctl status asterisk_exporter
# Should show: active (running)
curl -s http://localhost:9101/metrics | grep "asterisk_active_calls"
# Should show the current call count
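The local checks above can be collapsed into one script. This is a sketch: the ports and paths (`9100/metrics`, `9080/ready`, `9101/metrics`) come from this tutorial's agent configuration, and the probe is plain stdlib HTTP:

```python
import urllib.request

# port/path per agent, as configured earlier in this tutorial
AGENTS = {
    "node_exporter": (9100, "/metrics"),
    "promtail": (9080, "/ready"),
    "asterisk_exporter": (9101, "/metrics"),
}

def probe(port: int, path: str, timeout: float = 3.0) -> bool:
    """Return True if the local agent answers HTTP 200 on port/path."""
    try:
        with urllib.request.urlopen(f"http://localhost:{port}{path}",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    for name, (port, path) in AGENTS.items():
        print(f"{name:20s} {'OK' if probe(port, path) else 'FAIL'}")
```

Note this only confirms the HTTP endpoints answer; heplify has no HTTP port, so it still needs the `systemctl`/`journalctl` checks above.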
From the monitoring server (check data is arriving)
# --- Prometheus targets (check all UP) ---
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep -A2 '"health"'
# --- Query node_exporter data ---
curl -s --data-urlencode 'query=up{job="node"}' http://localhost:9090/api/v1/query \
| python3 -m json.tool
# --- Query asterisk_exporter data ---
curl -s --data-urlencode 'query=asterisk_active_calls' http://localhost:9090/api/v1/query \
| python3 -m json.tool
# --- Check Loki is receiving logs ---
curl -s -G --data-urlencode 'query={server="alpha"}' \
--data-urlencode 'limit=5' \
http://localhost:3100/loki/api/v1/query_range
# --- Check Homer is receiving SIP ---
# Look at heplify-server logs for packet counts
docker logs --tail 10 heplify-server
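The `/api/v1/targets` check is easy to script for alerting or a cron job. A sketch (the response shape is Prometheus's documented targets API; the URL is a placeholder):

```python
import json
import urllib.request

def down_targets(api_response: dict) -> list:
    """From a Prometheus /api/v1/targets response body, return
    (job, instance, lastError) for every target whose health is not "up"."""
    return [
        (t["labels"].get("job"), t["labels"].get("instance"), t.get("lastError", ""))
        for t in api_response["data"]["activeTargets"]
        if t.get("health") != "up"
    ]

def check(prom_url: str = "http://localhost:9090") -> list:
    with urllib.request.urlopen(f"{prom_url}/api/v1/targets") as resp:
        return down_targets(json.load(resp))
```

An empty list from `check()` means every configured target is UP; anything else names the failing job/instance with Prometheus's last scrape error.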
Verification checklist
| Check | Command | Expected |
|---|---|---|
| node_exporter running | `systemctl is-active node_exporter` | `active` |
| node_exporter responding | `curl -s localhost:9100/metrics \| wc -l` | > 200 |
| promtail running | `systemctl is-active promtail` | `active` |
| promtail tailing files | `curl -s localhost:9080/targets` | All targets "RUNNING" |
| heplify running | `systemctl is-active heplify` | `active` |
| heplify capturing | `journalctl -u heplify -n 5` | Shows packet counts |
| asterisk_exporter running | `systemctl is-active asterisk_exporter` | `active` |
| asterisk_exporter responding | `curl -s localhost:9101/metrics \| wc -l` | > 30 |
| Prometheus scraping node | Prometheus UI /targets | node job shows UP |
| Prometheus scraping asterisk | Prometheus UI /targets | asterisk job shows UP |
12. Firewall Rules
On each VoIP server (allow monitoring server to scrape)
The monitoring server needs TCP access to ports 9100 and 9101 on each VoIP server for Prometheus scraping:
# Allow the monitoring server to scrape exporters
iptables -I INPUT -s YOUR_MONITORING_SERVER -p tcp --dport 9100 -j ACCEPT
iptables -I INPUT -s YOUR_MONITORING_SERVER -p tcp --dport 9101 -j ACCEPT
# Persist rules (distribution-dependent)
# CentOS/RHEL and openSUSE:
iptables-save > /etc/sysconfig/iptables
# Ubuntu/Debian:
iptables-save > /etc/iptables/rules.v4
On the monitoring server (allow agents to push data)
Each VoIP server needs to reach the monitoring server on ports 3100 (Loki) and 9060 (Homer/heplify-server):
# Allow each VoIP server to push logs and SIP data
for SERVER_IP in SERVER_A_IP SERVER_B_IP SERVER_C_IP SERVER_D_IP; do
iptables -I INPUT -s ${SERVER_IP} -p tcp --dport 3100 -j ACCEPT
iptables -I INPUT -s ${SERVER_IP} -p udp --dport 9060 -j ACCEPT
iptables -I INPUT -s ${SERVER_IP} -p tcp --dport 9060 -j ACCEPT
done
# Persist
iptables-save > /etc/iptables/rules.v4
If you are running the monitoring stack in Docker, you may need to add rules to the DOCKER-USER chain instead of INPUT:
iptables -I DOCKER-USER -s SERVER_A_IP -p tcp --dport 3100 -j ACCEPT
iptables -I DOCKER-USER -s SERVER_A_IP -p udp --dport 9060 -j ACCEPT
Port summary
| Port | Protocol | Direction | Service |
|---|---|---|---|
| 9100 | TCP | Monitoring -> VoIP | node_exporter |
| 9101 | TCP | Monitoring -> VoIP | asterisk_exporter |
| 3100 | TCP | VoIP -> Monitoring | Loki (promtail push) |
| 9060 | UDP+TCP | VoIP -> Monitoring | heplify-server (HEP) |
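The port table maps directly to rule generation. A sketch that prints the Section 12 iptables commands for a given topology (the IPs are placeholders; run the output by hand rather than piping it straight to a shell):

```python
# monitoring -> VoIP: Prometheus scrapes the exporters
SCRAPE_PORTS = [(9100, "tcp"), (9101, "tcp")]
# VoIP -> monitoring: promtail pushes to Loki, heplify sends HEP
PUSH_PORTS = [(3100, "tcp"), (9060, "udp"), (9060, "tcp")]

def voip_server_rules(monitoring_ip: str) -> list:
    """Rules to apply on each VoIP server."""
    return [f"iptables -I INPUT -s {monitoring_ip} -p {proto} --dport {port} -j ACCEPT"
            for port, proto in SCRAPE_PORTS]

def monitoring_server_rules(voip_ips) -> list:
    """Rules to apply on the monitoring server."""
    return [f"iptables -I INPUT -s {ip} -p {proto} --dport {port} -j ACCEPT"
            for ip in voip_ips for port, proto in PUSH_PORTS]
```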
13. Handling OS Differences
The install script supports three Linux families. Here are the differences that matter and how the script handles them.
Package managers
| OS Family | Package Manager | Used For |
|---|---|---|
| openSUSE/SLES | `zypper` | Installing `unzip` |
| CentOS/RHEL | `yum` | Installing `unzip`, `python3`, `python3-pip` |
| Ubuntu/Debian | `apt-get` | Installing `unzip` |
The script detects the package manager by checking which command exists:
if command -v apt-get &>/dev/null; then
apt-get install -y unzip
elif command -v zypper &>/dev/null; then
zypper install -y unzip
elif command -v yum &>/dev/null; then
yum install -y unzip
fi
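The same first-match probe, expressed in Python (a sketch, not part of the install script; the injectable `which` argument exists only to make it testable):

```python
import shutil

def pick_pkg_manager(which=shutil.which):
    """Return the first package manager found on PATH, mirroring the
    script's apt-get -> zypper -> yum probe order; None if none exist."""
    for pm in ("apt-get", "zypper", "yum"):
        if which(pm):
            return pm
    return None
```

Note the order matters on hybrid systems: a Debian box with a stray `yum` binary still resolves to `apt-get` because it is probed first.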
Python versions
| OS | Default Python 3 | Binary Name |
|---|---|---|
| openSUSE 15.x | 3.6, with 3.11 available | `python3.11` preferred |
| CentOS 7 | Not installed by default | `python3` (3.6) after `yum install python3` |
| Ubuntu 22.04+ | 3.10+ | `python3` |
| Debian 12 | 3.11 | `python3` |
The script tries versioned names first (python3.11, python3.6) before falling back to python3. On CentOS 7 where Python 3 is not installed, it runs yum install -y python3 python3-pip.
mysql-connector-python compatibility
The mysql-connector-python package version 8.1+ requires Python 3.8+. On CentOS 7 (Python 3.6), the install script falls back:
pip install mysql-connector-python 2>/dev/null \
|| pip install "mysql-connector-python<8.1" 2>/dev/null \
|| true
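The same decision can be made explicitly by inspecting the interpreter version instead of relying on pip failing. A hypothetical helper (the `version_info` parameter is there for testability):

```python
import sys

def connector_spec(version_info=sys.version_info) -> str:
    """Pip requirement string for mysql-connector-python: releases 8.1+
    need Python 3.8+, so older interpreters get the capped version
    (same outcome as the install script's pip fallback)."""
    if version_info >= (3, 8):
        return "mysql-connector-python"
    return "mysql-connector-python<8.1"
```

On CentOS 7 this returns the capped spec; everywhere else pip is free to install the latest release.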
Syslog path
| OS | Syslog Path |
|---|---|
| openSUSE, CentOS, RHEL | /var/log/messages |
| Ubuntu, Debian | /var/log/syslog |
The default promtail config uses /var/log/messages. For Ubuntu/Debian deployments, update the config or add both paths.
Firewall persistence
| OS | Save Command | Restore Mechanism |
|---|---|---|
| openSUSE | `iptables-save > /etc/sysconfig/iptables` | SuSEfirewall2 or manual |
| CentOS 7 | `iptables-save > /etc/sysconfig/iptables` | iptables-restore in init |
| Ubuntu/Debian | `iptables-save > /etc/iptables/rules.v4` | iptables-persistent package |
14. Updating Agents
Updating node_exporter
# On the VoIP server:
NEW_VERSION="1.8.0" # Check https://github.com/prometheus/node_exporter/releases
systemctl stop node_exporter
cd /tmp
curl -sL "https://github.com/prometheus/node_exporter/releases/download/v${NEW_VERSION}/node_exporter-${NEW_VERSION}.linux-amd64.tar.gz" \
| tar xz
cp "node_exporter-${NEW_VERSION}.linux-amd64/node_exporter" /usr/local/bin/
rm -rf "node_exporter-${NEW_VERSION}.linux-amd64"*
systemctl start node_exporter
# Verify
node_exporter --version
Updating promtail
NEW_VERSION="3.0.0" # Check https://github.com/grafana/loki/releases
systemctl stop promtail
cd /tmp
curl -sL "https://github.com/grafana/loki/releases/download/v${NEW_VERSION}/promtail-linux-amd64.zip" \
-o promtail.zip
unzip -o promtail.zip
mv promtail-linux-amd64 /usr/local/bin/promtail
chmod +x /usr/local/bin/promtail
rm -f promtail.zip
systemctl start promtail
# Verify -- promtail will pick up from the last position
promtail --version
Important: When updating promtail, do not delete /var/lib/promtail/positions.yaml. The new version will resume from the last recorded position.
Updating heplify
NEW_VERSION="1.68.0" # Check https://github.com/sipcapture/heplify/releases
systemctl stop heplify
curl -sL "https://github.com/sipcapture/heplify/releases/download/v${NEW_VERSION}/heplify" \
-o /usr/local/bin/heplify
chmod +x /usr/local/bin/heplify
systemctl start heplify
Updating asterisk_exporter
The asterisk_exporter is a custom script, so updates are done by copying the new version from the monitoring server:
# From the monitoring server:
scp -P SSH_PORT /opt/monitoring/scripts/asterisk_exporter.py \
root@SERVER_IP:/opt/asterisk_exporter/asterisk_exporter.py
ssh -p SSH_PORT root@SERVER_IP "systemctl restart asterisk_exporter"
Bulk updates across all servers
For updating agents across multiple servers, wrap the update commands in a loop:
#!/bin/bash
# update-node-exporter.sh -- Update node_exporter on all servers
VERSION="1.8.0"
SERVERS=("SERVER_A_IP:9322:alpha" "SERVER_B_IP:9322:bravo" "SERVER_C_IP:9322:charlie")
for entry in "${SERVERS[@]}"; do
IFS=: read -r ip port label <<< "$entry"
echo "=== Updating ${label} (${ip}) ==="
ssh -p ${port} root@${ip} bash << EOF
systemctl stop node_exporter
cd /tmp
curl -sL "https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz" | tar xz
cp "node_exporter-${VERSION}.linux-amd64/node_exporter" /usr/local/bin/
rm -rf "node_exporter-${VERSION}.linux-amd64"*
systemctl start node_exporter
echo " Updated to ${VERSION}"
EOF
done
15. Troubleshooting
Agent will not start
Symptom: systemctl status <agent> shows failed or inactive (dead).
# Check the full error
journalctl -u node_exporter --no-pager -n 30
journalctl -u promtail --no-pager -n 30
journalctl -u heplify --no-pager -n 30
journalctl -u asterisk_exporter --no-pager -n 30
Common causes:
| Agent | Error | Fix |
|---|---|---|
| node_exporter | `address already in use :9100` | Another process is using port 9100. Check with `ss -tlnp \| grep 9100` |
| promtail | `permission denied` on log file | promtail runs as root by default, but check whether log files have restrictive ACLs |
| promtail | `error creating positions file` | Create the directory: `mkdir -p /var/lib/promtail` |
| heplify | `permission denied: /dev/net/tun` or pcap errors | heplify needs root or CAP_NET_RAW. Ensure the service runs as root |
| asterisk_exporter | `ModuleNotFoundError: No module named 'mysql'` | Reinstall: `python3 -m pip install mysql-connector-python` |
| asterisk_exporter | `mysql.connector.errors.InterfaceError` | MySQL is not running, or credentials are wrong. Check `MYSQL_USER`/`MYSQL_PASS` in the service file |
No metrics appearing in Prometheus
Symptom: Prometheus target shows DOWN, or metrics exist but return no data.
Step 1: Check the exporter is responding locally on the VoIP server:
curl -s http://localhost:9100/metrics | head -5 # node_exporter
curl -s http://localhost:9101/metrics | head -5 # asterisk_exporter
If this fails, the agent is not running or is listening on a different port.
Step 2: Check network connectivity from the monitoring server:
# From the monitoring server:
curl -s --connect-timeout 5 http://SERVER_IP:9100/metrics | head -5
curl -s --connect-timeout 5 http://SERVER_IP:9101/metrics | head -5
If this fails but Step 1 succeeded, it is a firewall issue. See Section 12.
Step 3: Check Prometheus configuration:
# Verify the target is configured
grep -A3 "SERVER_IP" prometheus/prometheus.yml
# Reload Prometheus after config changes
curl -X POST http://localhost:9090/-/reload
Step 4: Check Prometheus targets page:
Open http://YOUR_MONITORING_SERVER:9090/targets and look for error messages next to the target.
Logs not appearing in Loki/Grafana
Symptom: Querying {server="alpha"} in Grafana/Loki returns no results.
Step 1: Verify promtail is tailing the right files:
curl -s http://localhost:9080/targets
Look for entries showing RUNNING status and non-zero last_target_len.
Step 2: Check if the log files exist and are being written to:
ls -la /var/log/asterisk/messages
ls -la /var/log/asterisk/full
ls -la /var/log/astguiclient/
tail -1 /var/log/asterisk/messages
If a file does not exist (common: some servers do not have /var/log/asterisk/full), promtail will log a warning and skip it -- this is normal.
Step 3: Check promtail can reach Loki:
# From the VoIP server, test connectivity to Loki
curl -s -o /dev/null -w "%{http_code}" http://YOUR_MONITORING_SERVER:3100/ready
# Should return: 200
If this fails, check firewall rules on the monitoring server for port 3100.
Step 4: Check promtail logs for push errors:
journalctl -u promtail --no-pager -n 50 | grep -i "error\|level=error\|429"
Common errors:
- `429 Too Many Requests` -- Loki rate limits exceeded. Increase `ingestion_rate_mb` and `ingestion_burst_size_mb` in the Loki config.
- `connection refused` -- Loki is not running, or a firewall is blocking.
Step 5: Check the positions file:
cat /var/lib/promtail/positions.yaml
If positions are advancing (numbers increasing on subsequent checks), promtail is reading the files. If positions are static, the log files are not being written to.
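That "take two readings and compare" check is easy to automate. A sketch, assuming you have already parsed the `positions:` mapping out of `positions.yaml` (paths map to byte offsets stored as strings):

```python
def stalled_paths(before: dict, after: dict) -> list:
    """Given two snapshots of the positions mapping (path -> byte offset
    as a string), return paths whose offset did not advance between reads."""
    return [path for path in after
            if path in before and int(after[path]) <= int(before[path])]

# Example: messages advanced, full did not
before = {"/var/log/asterisk/messages": "1000", "/var/log/asterisk/full": "500"}
after = {"/var/log/asterisk/messages": "1800", "/var/log/asterisk/full": "500"}
stalled_paths(before, after)  # -> ['/var/log/asterisk/full']
```

A stalled path is not necessarily a promtail problem; it usually means nothing is writing to that file.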
Log shipping delays
Symptom: Logs appear in Grafana/Loki but with a delay of minutes or hours.
Possible causes:
Loki ingestion rate limits: If promtail is sending faster than Loki can accept, entries queue up. Check promtail logs for
429errors and increase Loki limits.Large log files on first run: When promtail first starts on a server with large existing log files, it reads from the beginning. This can cause a large backlog. Consider setting
positions.yamlto the end of each file before first start:# Skip to end of existing logs (only for first deployment) wc -c /var/log/asterisk/messages # Use that byte count in positions.yamlClock skew: If the VoIP server's clock is off by more than a few minutes, Loki may reject entries as "too old" or "too far in the future." Verify NTP is running:
timedatectl status # Or: ntpq -p
SIP data not appearing in Homer
Symptom: Homer search returns no results for a server that should be sending SIP data.
Step 1: Verify heplify is capturing packets:
journalctl -u heplify --no-pager -n 20
If you see captured 0 packets, heplify may be listening on the wrong interface. Try specifying the interface explicitly:
# Find the correct interface
ip addr show | grep "inet "
# Update the service to use it
# -i eth0 instead of -i any
Step 2: Verify SIP traffic exists on the server:
# Quick packet capture to confirm SIP is flowing
tcpdump -i any -c 10 port 5060 -n
If no SIP traffic is seen, the server may not have active SIP trunks or may use a non-standard SIP port.
Step 3: Test connectivity to heplify-server:
# UDP connectivity test (heplify sends HEP over UDP by default)
echo "test" | nc -u -w1 YOUR_MONITORING_SERVER 9060
Step 4: Check heplify-server logs on the monitoring server:
docker logs --tail 30 heplify-server
Look for incoming HEP packet counts or error messages.
asterisk_exporter shows partial metrics
Symptom: Some metrics (like asterisk_active_calls) work but others (like asterisk_agents_logged_in) are missing.
This usually means the MySQL connection is failing while Asterisk CLI commands succeed:
# Test MySQL connectivity with the same credentials
mysql -u cron -p1234 -e "SELECT COUNT(*) FROM asterisk.vicidial_live_agents;"
If this fails, check:
- The MySQL user exists and has the correct password.
- The user has `SELECT` privileges on the `asterisk` database.
- MariaDB/MySQL is running: `systemctl status mariadb`
16. Summary
Deploying monitoring agents across distributed VoIP servers transforms your operational capabilities. Instead of reactive SSH-and-grep troubleshooting, you get a unified view of every server's health, every call's SIP flow, every agent's state, and every log entry -- all searchable from a single Grafana interface.
What you deployed
| Agent | Port | Data Destination | What It Provides |
|---|---|---|---|
| node_exporter | 9100 | Prometheus (pull) | CPU, RAM, disk, network, load |
| promtail | 9080 | Loki (push) | Asterisk logs, ViciDial logs, syslog |
| heplify | -- | heplify-server (push) | Complete SIP call flows |
| asterisk_exporter | 9101 | Prometheus (pull) | Active calls, agents, SIP peers, RTP, queues |
The install process in brief
- Run `install-agents.sh` from the monitoring server for each VoIP server.
- Add scrape targets to `prometheus.yml` for node_exporter and asterisk_exporter.
- Open firewall ports: 9100/9101 inbound on the VoIP servers, 3100/9060 inbound on the monitoring server.
- Verify: check the Prometheus targets page, query Loki, search Homer.
Maintenance cadence
| Task | Frequency | How |
|---|---|---|
| Check agent health | Daily (or alert on it) | Prometheus up{} metric |
| Update node_exporter | Quarterly | Download new release, restart service |
| Update promtail | Quarterly (match Loki version) | Download new release, restart service |
| Update heplify | As needed | Download new release, restart service |
| Update asterisk_exporter | When you add metrics | scp new script, restart service |
| Review promtail positions | After server reboot | Verify positions file is intact |
This tutorial is part of a series on building production VoIP monitoring infrastructure. For the custom exporter internals, see Tutorial 08. For the central monitoring stack (Prometheus, Loki, Homer, Grafana), see Tutorials 01-07.