Tutorial 17 -- Deploying Monitoring Agents on VoIP Servers
Deploy node_exporter, promtail, heplify, and a custom Asterisk exporter to distributed VoIP servers from a central monitoring host via SSH.
Table of Contents
- Why Centralized Monitoring Matters for VoIP
- Architecture Overview
- Prerequisites
- Agent Reference
- The Install Script (install-agents.sh)
- Agent 1 -- node_exporter (System Metrics)
- Agent 2 -- promtail (Log Shipping)
- Agent 3 -- heplify (SIP Packet Capture)
- Agent 4 -- asterisk_exporter (Custom VoIP Metrics)
- Prometheus Scrape Configuration
- Verification Procedures
- Firewall Rules
- Handling OS Differences
- Updating Agents
- Troubleshooting
- Summary
1. Why Centralized Monitoring Matters for VoIP
Distributed VoIP infrastructure -- multiple Asterisk/ViciDial servers spread across data centers -- creates a visibility problem. Each server generates its own metrics, its own logs, and its own SIP traffic, but problems (call quality degradation, trunk failures, agent disconnections, disk filling up with recordings) almost never announce themselves on the server where you happen to be logged in.
Without centralized monitoring, troubleshooting looks like this:
- A client reports dropped calls.
- You SSH into server A, grep through Asterisk logs, find nothing.
- You SSH into server B, grep through different logs, find a clue.
- You SSH into server C, check SIP peer status, find the actual problem.
- Thirty minutes have passed. Calls were still dropping the entire time.
With centralized monitoring, troubleshooting looks like this:
- A client reports dropped calls.
- You open Grafana, filter by time range, see the SIP peer went UNREACHABLE at 14:32 on server C.
- You open Homer, search SIP traffic for that trunk, see the last successful registration and the failure response.
- You open Loki, query {server="charlie"} |= "chan_sip.c", and see the exact Asterisk error.
- Five minutes total, including fixing the issue.
The four monitoring agents in this tutorial form a complete observability stack:
| Layer | Agent | What It Captures |
|---|---|---|
| System | node_exporter | CPU, RAM, disk, network, load averages |
| Logs | promtail | Asterisk logs, ViciDial logs, syslog |
| SIP | heplify | Every SIP transaction (INVITE, BYE, REGISTER, etc.) |
| Application | asterisk_exporter | Active calls, agent states, SIP peer health, RTP quality, queue depth |
Together, they answer any question you could ask about what happened on any server at any point in time.
2. Architecture Overview
+------------------+ +------------------+ +------------------+
| VoIP Server A | | VoIP Server B | | VoIP Server C |
| (openSUSE) | | (CentOS 7) | | (Ubuntu/Debian) |
| | | | | |
| node_exporter | | node_exporter | | node_exporter |
| :9100 ----+ | | :9100 ----+ | | :9100 ----+ |
| | | | | | | | |
| ast_exporter| | | ast_exporter| | | ast_exporter| |
| :9101 -+ | | | :9101 -+ | | | :9101 -+ | |
| | | | | | | | | | | |
| promtail | | | | promtail | | | | promtail | | |
| :9080 | | | | :9080 | | | | :9080 | | |
| | | | | | | | | | | | | | |
| heplify | | | | heplify | | | | heplify | | |
| | | | | | | | | | | | | | |
+-----|----+--+----+ +-----|----+--+----+ +-----|----+--+----+
| | | | | | | | |
| | | Prometheus | | | Prometheus | | |
| | | scrape :9100 | | | scrape :9100 | | |
| | | scrape :9101 | | | scrape :9101 | | |
| | | | | | | | |
| +--|--------+-------|----+--|--------+-------|----+--+
| | | | | | |
| v v | v v |
| +--------------------+| Prometheus | |
| | Prometheus (:9090) || scrapes | |
| +--------------------+| | |
| | | |
+-----------+------------+----------------+ |
| push logs to :3100 |
v |
+--------------------+ |
| Loki (:3100) | |
+--------------------+ |
|
+-----------------------------------------+
| push HEP/UDP to :9060
v
+-----------------------------+
| heplify-server (:9060) |
| | |
| v |
| PostgreSQL (homer_data) |
| | |
| v |
| Homer WebApp (:9080) |
+-----------------------------+
|
v
+--------------------+
| Grafana (:3000) | <-- unified dashboards
+--------------------+
Data flow summary:
- Metrics (node_exporter + asterisk_exporter): Prometheus on the monitoring server pulls metrics every 15 seconds from each VoIP server's :9100 and :9101 endpoints.
- Logs (promtail): Each server's promtail agent pushes log entries to Loki on the monitoring server (:3100).
- SIP traces (heplify): Each server's heplify agent captures SIP packets and sends them via HEP3/UDP to heplify-server on the monitoring server (:9060).
3. Prerequisites
On the monitoring server (central)
- Prometheus running and accessible (typically Docker, port 9090)
- Loki running and accepting pushes on port 3100
- heplify-server running and accepting HEP on port 9060 (UDP+TCP)
- SSH key-based access to all target VoIP servers (the install script runs commands via SSH as root)
On each VoIP server (target)
- Root SSH access from the monitoring server (key-based recommended)
- systemd as init system (all modern Linux distributions)
- curl installed (for downloading binaries)
- Python 3 with pip (for the asterisk_exporter)
- Asterisk running (for VoIP-specific metrics)
- MySQL/MariaDB with a read-only user (for ViciDial metrics)
Network requirements
| Source | Destination | Port | Protocol | Purpose |
|---|---|---|---|---|
| Monitoring server | VoIP servers | 9100 | TCP | Prometheus scrapes node_exporter |
| Monitoring server | VoIP servers | 9101 | TCP | Prometheus scrapes asterisk_exporter |
| VoIP servers | Monitoring server | 3100 | TCP | promtail pushes logs to Loki |
| VoIP servers | Monitoring server | 9060 | UDP/TCP | heplify sends SIP to heplify-server |
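Before running the installer, it can save time to confirm the pull-direction ports in the table are actually reachable from the monitoring server. A minimal sketch (the `agent_port` helper and the `TARGET` variable are illustrative, not part of the install script):

```shell
# Map each pull-scraped agent to its TCP port (illustrative helper).
agent_port() {
  case "$1" in
    node_exporter) echo 9100 ;;
    asterisk_exporter) echo 9101 ;;
    *) return 1 ;;
  esac
}

# Only probe when a target is given, e.g.: TARGET=203.0.113.10 bash check-ports.sh
if [ -n "${TARGET:-}" ]; then
  for agent in node_exporter asterisk_exporter; do
    port="$(agent_port "$agent")"
    # bash's /dev/tcp avoids needing nc installed on the monitoring server
    if timeout 2 bash -c "echo > /dev/tcp/${TARGET}/${port}" 2>/dev/null; then
      echo "${agent} (:${port}) reachable"
    else
      echo "${agent} (:${port}) NOT reachable"
    fi
  done
fi
```

The push-direction ports (3100, 9060) have to be tested from the VoIP servers toward the monitoring server instead.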
4. Agent Reference
| Agent | Version | Port | Direction | Binary Path | Config Path |
|---|---|---|---|---|---|
| node_exporter | 1.7.0 | 9100 | Pull (Prometheus scrapes) | /usr/local/bin/node_exporter | N/A (CLI flags) |
| promtail | 2.9.6 | 9080 | Push (to Loki) | /usr/local/bin/promtail | /etc/promtail/config.yml |
| heplify | 1.67.1 | N/A | Push (to heplify-server) | /usr/local/bin/heplify | N/A (CLI flags) |
| asterisk_exporter | custom | 9101 | Pull (Prometheus scrapes) | /opt/asterisk_exporter/asterisk_exporter.py | N/A (env vars) |
5. The Install Script
This script is designed to run from the monitoring server, deploying all four agents to a remote VoIP server over SSH in a single pass. It handles three OS families (openSUSE/SUSE, CentOS/RHEL, Ubuntu/Debian), is idempotent (safe to re-run), and checks whether each binary is already present before downloading it.
Usage
./install-agents.sh <server_ip> <ssh_port> <server_label> <monitor_vps_ip>
Parameters:
| Parameter | Description | Example |
|---|---|---|
| server_ip | IP address of the target VoIP server | 203.0.113.10 |
| ssh_port | SSH port on the target server | 22 or 9322 |
| server_label | Human-readable name used in metric labels | alpha, bravo, charlie |
| monitor_vps_ip | IP of the central monitoring server (where Loki and heplify-server run) | YOUR_MONITORING_SERVER |
Example invocation:
bash install-agents.sh 203.0.113.10 9322 alpha YOUR_MONITORING_SERVER
Complete Script
#!/bin/bash
# install-agents.sh -- Install monitoring agents on a ViciDial/Asterisk server
# Usage: ./install-agents.sh <server_ip> <ssh_port> <server_label> <monitor_vps_ip>
# Supports: openSUSE, CentOS 7, Ubuntu/Debian
set -e
SERVER_IP="${1:?Usage: $0 <server_ip> <ssh_port> <server_label> <monitor_vps_ip>}"
SSH_PORT="${2:-22}"
SERVER_LABEL="${3:?Provide server label (alpha/bravo/charlie/delta)}"
MONITOR_IP="${4:?Provide monitoring VPS IP}"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
echo "=== Installing monitoring agents on ${SERVER_LABEL} (${SERVER_IP}:${SSH_PORT}) ==="
echo "Monitor VPS: ${MONITOR_IP}"
echo ""
SSH_CMD="ssh -o StrictHostKeyChecking=no -p ${SSH_PORT} root@${SERVER_IP}"
# --------------------------------------------------------------------------
# 1. heplify (SIP capture)
# --------------------------------------------------------------------------
echo "[1/4] Installing heplify..."
${SSH_CMD} bash << REMOTEOF
set -e
if [ ! -f /usr/local/bin/heplify ] || ! /usr/local/bin/heplify -version 2>/dev/null | grep -q heplify; then
rm -f /usr/local/bin/heplify
curl -sL https://github.com/sipcapture/heplify/releases/download/v1.67.1/heplify \
-o /usr/local/bin/heplify
chmod +x /usr/local/bin/heplify
echo " heplify binary installed"
else
echo " heplify already installed"
fi
cat > /etc/systemd/system/heplify.service << SVCFILE
[Unit]
Description=heplify SIP Capture Agent
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/heplify -hs ${MONITOR_IP}:9060 -i any -dim "OPTIONS,NOTIFY" -e
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
SVCFILE
systemctl daemon-reload
systemctl enable heplify
systemctl restart heplify
echo " heplify service started"
REMOTEOF
# --------------------------------------------------------------------------
# 2. node_exporter (system metrics)
# --------------------------------------------------------------------------
echo "[2/4] Installing node_exporter..."
${SSH_CMD} bash << 'REMOTEOF'
set -e
if [ ! -f /usr/local/bin/node_exporter ]; then
cd /tmp
curl -sL https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz \
| tar xz
cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
rm -rf node_exporter-1.7.0.linux-amd64*
echo " node_exporter binary installed"
else
echo " node_exporter already installed"
fi
cat > /etc/systemd/system/node_exporter.service << 'SVCFILE'
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
SVCFILE
systemctl daemon-reload
systemctl enable node_exporter
systemctl restart node_exporter
echo " node_exporter service started"
REMOTEOF
# --------------------------------------------------------------------------
# 3. promtail (log shipping)
# --------------------------------------------------------------------------
echo "[3/4] Installing promtail..."
${SSH_CMD} bash << REMOTEOF
set -e
if [ ! -f /usr/local/bin/promtail ]; then
cd /tmp
curl -sL https://github.com/grafana/loki/releases/download/v2.9.6/promtail-linux-amd64.zip \
-o promtail.zip
# Install unzip -- works across all OS families
if command -v apt-get &>/dev/null; then
apt-get install -y unzip 2>/dev/null || true
elif command -v zypper &>/dev/null; then
zypper install -y unzip 2>/dev/null || true
elif command -v yum &>/dev/null; then
yum install -y unzip 2>/dev/null || true
fi
unzip -o promtail.zip
mv promtail-linux-amd64 /usr/local/bin/promtail
chmod +x /usr/local/bin/promtail
rm -f promtail.zip
echo " promtail binary installed"
else
echo " promtail already installed"
fi
mkdir -p /etc/promtail
cat > /etc/promtail/config.yml << CFGFILE
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /var/lib/promtail/positions.yaml
clients:
- url: http://${MONITOR_IP}:3100/loki/api/v1/push
scrape_configs:
- job_name: asterisk_messages
static_configs:
- targets: [localhost]
labels:
job: asterisk
server: ${SERVER_LABEL}
logtype: messages
__path__: /var/log/asterisk/messages
- job_name: asterisk_full
static_configs:
- targets: [localhost]
labels:
job: asterisk
server: ${SERVER_LABEL}
logtype: full
__path__: /var/log/asterisk/full
- job_name: vicidial
static_configs:
- targets: [localhost]
labels:
job: vicidial
server: ${SERVER_LABEL}
logtype: vicidial
__path__: /var/log/astguiclient/*.log
- job_name: syslog
static_configs:
- targets: [localhost]
labels:
job: syslog
server: ${SERVER_LABEL}
logtype: syslog
__path__: /var/log/messages
CFGFILE
mkdir -p /var/lib/promtail
cat > /etc/systemd/system/promtail.service << 'SVCFILE'
[Unit]
Description=Promtail Log Agent
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/config.yml
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
SVCFILE
systemctl daemon-reload
systemctl enable promtail
systemctl restart promtail
echo " promtail service started"
REMOTEOF
# --------------------------------------------------------------------------
# 4. asterisk_exporter (VoIP metrics)
# --------------------------------------------------------------------------
echo "[4/4] Installing asterisk_exporter..."
${SSH_CMD} bash << REMOTEOF
set -e
mkdir -p /opt/asterisk_exporter
REMOTEOF
# Copy the exporter script from the local scripts directory
scp -o StrictHostKeyChecking=no -P ${SSH_PORT} \
${SCRIPT_DIR}/asterisk_exporter.py \
root@${SERVER_IP}:/opt/asterisk_exporter/asterisk_exporter.py
# Install Python dependencies and create systemd service
${SSH_CMD} bash << REMOTEOF
set -e
# Find the right python3 binary -- try versioned names first
PYTHON_BIN=""
for p in python3.11 python3.6 python3; do
if command -v \$p &>/dev/null; then
PYTHON_BIN=\$(command -v \$p)
break
fi
done
if [ -z "\$PYTHON_BIN" ]; then
# CentOS 7: install python3 via yum
if command -v yum &>/dev/null; then
yum install -y python3 python3-pip 2>/dev/null || true
PYTHON_BIN=\$(command -v python3)
fi
fi
echo " Using Python: \$PYTHON_BIN"
# Install mysql-connector (try latest first, fall back to <8.1 for old Python)
\$PYTHON_BIN -m pip install mysql-connector-python 2>/dev/null \
|| \$PYTHON_BIN -m pip install "mysql-connector-python<8.1" 2>/dev/null \
|| true
# Verify the import works
\$PYTHON_BIN -c "import mysql.connector; print(' mysql-connector OK')" \
|| echo " WARNING: mysql-connector import failed"
chmod +x /opt/asterisk_exporter/asterisk_exporter.py
cat > /etc/systemd/system/asterisk_exporter.service << SVCFILE
[Unit]
Description=Asterisk/ViciDial Prometheus Exporter
After=network.target mariadb.service asterisk.service
Wants=mariadb.service
[Service]
Type=simple
ExecStart=\$PYTHON_BIN /opt/asterisk_exporter/asterisk_exporter.py
Restart=always
RestartSec=10
Environment=EXPORTER_PORT=9101
Environment=MYSQL_HOST=localhost
Environment=MYSQL_USER=cron
Environment=MYSQL_PASS=1234
Environment=MYSQL_DB=asterisk
Environment=SERVER_LABEL=${SERVER_LABEL}
[Install]
WantedBy=multi-user.target
SVCFILE
systemctl daemon-reload
systemctl enable asterisk_exporter
systemctl restart asterisk_exporter
echo " asterisk_exporter service started"
REMOTEOF
echo ""
echo "=== All 4 agents installed on ${SERVER_LABEL} (${SERVER_IP}) ==="
echo " heplify -> sending HEP to ${MONITOR_IP}:9060"
echo " node_exporter -> :9100"
echo " promtail -> shipping logs to ${MONITOR_IP}:3100"
echo " ast_exporter -> :9101"
echo ""
How the script works
SSH-based deployment: The script runs entirely from the monitoring server. Each section opens an SSH session, streams a heredoc of shell commands to execute on the remote host, then closes the connection.
Idempotent: Before downloading any binary, it checks whether the file already exists at the expected path. Re-running the script on a server that already has agents installed will simply restart the services with the latest configuration.
OS-agnostic package installation: When unzip is needed for promtail, the script detects the package manager (apt-get, zypper, or yum) and installs accordingly.
Python version detection: For the asterisk_exporter, the script tries python3.11, python3.6, and python3 in order, covering openSUSE (which ships 3.11), CentOS 7 (which uses 3.6), and Ubuntu/Debian (which use python3).
scp for the exporter: The asterisk_exporter is a custom Python script, so it is copied from the monitoring server's scripts/ directory via scp rather than downloaded from a release URL.
6. Agent 1 -- node_exporter
What it does
node_exporter is the standard Prometheus exporter for hardware and OS-level metrics. It exposes approximately 277 metrics covering CPU, memory, disk, network, filesystem, and load average data.
Binary installation
cd /tmp
curl -sL https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz \
| tar xz
cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
rm -rf node_exporter-1.7.0.linux-amd64*
The tarball contains a single static binary (~20 MB). No dependencies, no runtime, no configuration file. It runs on any Linux x86_64 system regardless of distribution.
systemd service file
Path: /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter --web.listen-address=:9100
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Key settings:
--web.listen-address=:9100 -- Listens on all interfaces, port 9100. Change to 127.0.0.1:9100 if you want localhost-only access (not useful for remote Prometheus scraping).
Restart=always + RestartSec=10 -- Automatically restarts the service on crash after a 10-second delay.
Optional: Enabling/disabling specific collectors
By default, node_exporter enables a broad set of collectors. For VoIP servers with heavy I/O, you may want to add flags to include or exclude specific collectors:
# Include only the collectors you care about:
ExecStart=/usr/local/bin/node_exporter \
--web.listen-address=:9100 \
--collector.cpu \
--collector.meminfo \
--collector.diskstats \
--collector.filesystem \
--collector.loadavg \
--collector.netdev \
--collector.stat \
--collector.time \
--collector.uname \
--no-collector.wifi \
--no-collector.infiniband \
--no-collector.nfs \
--no-collector.nfsd
For most VoIP deployments, the default set is fine.
Key metrics for VoIP servers
| Metric | What to watch |
|---|---|
| node_cpu_seconds_total | High system or iowait time indicates Asterisk is under load |
| node_memory_MemAvailable_bytes | Asterisk can leak memory slowly; watch for a steady decline |
| node_filesystem_avail_bytes | Recordings fill disks; alert at 80% usage |
| node_load1 / node_load5 | Should stay below the CPU count during peak hours |
| node_network_receive_bytes_total | Baseline for detecting DDoS or SIP floods |
| node_disk_io_time_seconds_total | High I/O wait degrades call recording quality |
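The thresholds above translate naturally into Prometheus alerting rules. A sketch, not part of the tutorial's installer -- rule names and thresholds are suggestions to adapt:

```yaml
# alerts/voip-node.yml -- illustrative alerting rules for the metrics above
groups:
  - name: voip-node
    rules:
      - alert: DiskFillingWithRecordings
        expr: |
          (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes) < 0.20
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Filesystem on {{ $labels.server }} is over 80% full"
      - alert: LoadAboveCpuCount
        expr: node_load5 > on(instance) count by (instance) (node_cpu_seconds_total{mode="idle"})
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "5m load on {{ $labels.server }} exceeds CPU count"
```

The `count by (instance)` of idle-mode CPU series is a standard idiom for "number of CPUs on this host", which makes the load alert self-adjusting per server.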
Enabling and starting
systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
Quick test
curl -s http://localhost:9100/metrics | head -20
You should see lines like:
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1.234567e+06
node_cpu_seconds_total{cpu="0",mode="system"} 12345.67
...
7. Agent 2 -- promtail
What it does
promtail is the log-shipping agent for Grafana Loki. It tails log files on the VoIP server, attaches labels (server name, log type), and pushes the entries to a central Loki instance. This enables centralized log search across all servers from Grafana.
Binary installation
cd /tmp
curl -sL https://github.com/grafana/loki/releases/download/v2.9.6/promtail-linux-amd64.zip \
-o promtail.zip
unzip -o promtail.zip
mv promtail-linux-amd64 /usr/local/bin/promtail
chmod +x /usr/local/bin/promtail
rm -f promtail.zip
Unlike node_exporter, promtail is distributed as a zip file containing a single binary.
Configuration file
Path: /etc/promtail/config.yml
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /var/lib/promtail/positions.yaml
clients:
- url: http://YOUR_MONITORING_SERVER:3100/loki/api/v1/push
scrape_configs:
- job_name: asterisk_messages
static_configs:
- targets: [localhost]
labels:
job: asterisk
server: YOUR_SERVER_LABEL
logtype: messages
__path__: /var/log/asterisk/messages
- job_name: asterisk_full
static_configs:
- targets: [localhost]
labels:
job: asterisk
server: YOUR_SERVER_LABEL
logtype: full
__path__: /var/log/asterisk/full
- job_name: vicidial
static_configs:
- targets: [localhost]
labels:
job: vicidial
server: YOUR_SERVER_LABEL
logtype: vicidial
__path__: /var/log/astguiclient/*.log
- job_name: syslog
static_configs:
- targets: [localhost]
labels:
job: syslog
server: YOUR_SERVER_LABEL
logtype: syslog
__path__: /var/log/messages
Configuration breakdown
server block:
http_listen_port: 9080 -- promtail exposes a status API here. You can visit http://localhost:9080/targets to see which log files are being tailed and their current read position.
grpc_listen_port: 0 -- Disables the gRPC server (not needed for push mode).
positions block:
filename: /var/lib/promtail/positions.yaml -- Tracks the byte offset of each tailed file. This ensures promtail resumes from where it left off after a restart, without re-sending old log lines. The directory must exist (the install script creates it).
clients block:
url -- The Loki push endpoint. Replace YOUR_MONITORING_SERVER with the IP of your central monitoring server.
scrape_configs block -- the four log sources:
| Job Name | Path | What It Captures |
|---|---|---|
| asterisk_messages | /var/log/asterisk/messages | Asterisk NOTICE/WARNING/ERROR messages (SIP registration failures, channel errors, peer unreachable) |
| asterisk_full | /var/log/asterisk/full | Full verbose Asterisk log (every dialplan step, every SIP message -- high volume) |
| vicidial | /var/log/astguiclient/*.log | ViciDial application logs (agent login/logout, call routing, list loading, campaign actions) |
| syslog | /var/log/messages | System syslog (kernel, cron, auth, services) |
Labels explained:
job -- Groups related logs. Use asterisk for Asterisk logs, vicidial for application logs, syslog for system logs.
server -- The server label (e.g., alpha, bravo). This is the most important label for filtering in Grafana.
logtype -- Distinguishes between different log files from the same server.
__path__ -- Special label telling promtail which file to tail. Supports globs (*.log).
Important: The positions file
The positions file (/var/lib/promtail/positions.yaml) is critical. It looks like this after promtail has been running:
positions:
/var/log/asterisk/messages: "4521789"
/var/log/asterisk/full: "98234567"
/var/log/astguiclient/VDadapt.log: "12345"
/var/log/messages: "67890123"
Each number is the byte offset where promtail last read. If this file is deleted, promtail will re-read and re-send all log data from the beginning of each file. On a busy server, this can mean pushing gigabytes of old logs to Loki. If you need to reset positions, do so deliberately and during a low-traffic period.
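If a reset is unavoidable, pausing promtail and keeping a timestamped backup of the positions file avoids losing the old offsets entirely. A sketch (the `positions_backup_path` helper and the `DO_RESET` guard are illustrative, not part of the install script):

```shell
# Deliberate positions reset: stop promtail, back up offsets, restart.
POS_FILE=/var/lib/promtail/positions.yaml

# Illustrative helper: derive a timestamped backup path for the positions file.
positions_backup_path() {
  echo "$1.bak-$(date +%Y%m%d%H%M%S)"
}

# Guarded so pasting this snippet does nothing until you opt in: DO_RESET=1
if [ "${DO_RESET:-0}" = "1" ]; then
  systemctl stop promtail
  mv "$POS_FILE" "$(positions_backup_path "$POS_FILE")"
  systemctl start promtail   # promtail recreates the file and re-reads all logs
fi
```

Keeping the backup means you can restore the old offsets if the re-ingest overwhelms Loki.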
systemd service file
Path: /etc/systemd/system/promtail.service
[Unit]
Description=Promtail Log Agent
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/config.yml
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Querying logs in Grafana (Loki)
Once promtail is shipping logs, query them in Grafana using LogQL:
# All Asterisk errors on server alpha
{server="alpha", job="asterisk"} |= "ERROR"
# SIP registration failures across all servers
{job="asterisk", logtype="messages"} |= "Registration" |= "failed"
# ViciDial agent login events on a specific server
{server="charlie", job="vicidial"} |= "LOGIN"
# High-volume: all full Asterisk logs (use with time range filter)
{server="delta", logtype="full"}
Note on syslog path
On Debian/Ubuntu, the system log is /var/log/syslog rather than /var/log/messages. If deploying to Debian/Ubuntu, update the syslog scrape config:
- job_name: syslog
static_configs:
- targets: [localhost]
labels:
job: syslog
server: YOUR_SERVER_LABEL
logtype: syslog
__path__: /var/log/syslog
The install script as written uses /var/log/messages, which works on openSUSE, CentOS, and RHEL. On Debian/Ubuntu, if /var/log/messages does not exist, promtail will log a warning but continue running -- it simply will not ship syslog data. Adjust the path if needed.
8. Agent 3 -- heplify
What it does
heplify is a SIP packet capture agent. It sniffs network traffic, extracts SIP messages (INVITE, ACK, BYE, REGISTER, etc.), and sends them to a Homer server using the HEP3 (Homer Encapsulation Protocol) format over UDP.
This gives you full SIP call flow visualization: you can search for any call by phone number, SIP Call-ID, or time range and see every SIP message in the transaction, including response codes, SDP negotiation, and timing.
Binary installation
curl -sL https://github.com/sipcapture/heplify/releases/download/v1.67.1/heplify \
-o /usr/local/bin/heplify
chmod +x /usr/local/bin/heplify
Single static binary, no dependencies.
systemd service file
Path: /etc/systemd/system/heplify.service
[Unit]
Description=heplify SIP Capture Agent
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/heplify -hs YOUR_MONITORING_SERVER:9060 -i any -dim "OPTIONS,NOTIFY" -e
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Command-line flags explained
| Flag | Value | Purpose |
|---|---|---|
| -hs | YOUR_MONITORING_SERVER:9060 | Homer server address and port (HEP receiver) |
| -i | any | Capture on all network interfaces |
| -dim | "OPTIONS,NOTIFY" | Discard In Method -- drop these SIP methods before sending. OPTIONS keepalives and NOTIFY events are extremely high-volume and rarely useful for troubleshooting |
| -e | (flag) | Log to stderr instead of a log file; under systemd this puts heplify's output in the journal |
Why filter OPTIONS and NOTIFY
On a typical VoIP server with 30 SIP peers, each sending OPTIONS keepalives every 60 seconds, that is 30 messages/minute -- 43,200 per day -- of pure noise. NOTIFY messages (for MWI, dialog-info, etc.) add another significant volume. Filtering these at the capture level reduces:
- Network traffic between the VoIP server and the Homer server
- Storage consumption in the Homer PostgreSQL database
- Query time when searching for actual call flows
The important SIP methods (INVITE, BYE, REGISTER, ACK, CANCEL, REFER, UPDATE) are all captured and forwarded.
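The arithmetic behind that volume estimate, as a quick sketch you can adapt to your own peer count and keepalive interval:

```shell
# Daily OPTIONS keepalive volume: each peer sends one OPTIONS per interval.
peers=30
interval_seconds=60
per_minute=$(( peers * 60 / interval_seconds ))   # 30 messages/minute
per_day=$(( per_minute * 60 * 24 ))               # 43200 messages/day
echo "${per_minute}/min, ${per_day}/day"
```

With 500 peers on a 30-second interval, the same math gives 1.44 million messages per day -- which is why filtering at the capture level, not in Homer, is the right place.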
Additional useful flags
# Capture only on a specific interface (useful if the server has multiple NICs)
-i eth0
# Filter to a specific port (only capture SIP on port 5060)
-p "port 5060"
# Also capture RTCP (RTP Control Protocol) for quality metrics
-pr 5060-5062
# Set a custom HEP node ID (useful when multiple servers send to the same Homer)
-hn 100
# Enable TLS for HEP transport (if your Homer server requires it)
-hs YOUR_MONITORING_SERVER:9060 -ht tls
Verifying heplify is capturing
# Check the service is running
systemctl status heplify
# Check recent logs for capture stats
journalctl -u heplify --no-pager -n 20
You should see output like:
heplify: sending to YOUR_MONITORING_SERVER:9060
heplify: captured 47 packets, sent 47
On the Homer server side, verify data is arriving:
# Check heplify-server is receiving packets
docker logs --tail 20 heplify-server
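If the Homer side shows nothing, confirming that HEP packets actually leave the VoIP server (or arrive at the monitoring server) with tcpdump is a useful middle step. A sketch -- it needs root, and the `RUN_CAPTURE` guard is illustrative so the snippet is safe to paste:

```shell
# Watch for HEP traffic on the wire (run on either end of the link).
HEP_PORT=9060
if [ "${RUN_CAPTURE:-0}" = "1" ]; then
  # -c 10: stop after 10 packets; -n: skip DNS lookups
  tcpdump -n -c 10 -i any "udp port ${HEP_PORT}"
fi
```

Packets visible on the VoIP server but not on the monitoring server point to a firewall or routing problem rather than a heplify misconfiguration.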
9. Agent 4 -- asterisk_exporter
What it does
The asterisk_exporter is a custom Python script that queries Asterisk CLI commands and ViciDial's MySQL database to expose VoIP-specific metrics as a Prometheus endpoint on port 9101. This fills the gap between generic system metrics (node_exporter) and what you actually need to monitor in a call center: active calls, agent states, SIP peer health, RTP quality, queue depth, and more.
A full walkthrough of the exporter's internals is available in Tutorial 08 -- Building a Custom Asterisk Prometheus Exporter. This section covers only what you need for deployment.
File location
/opt/asterisk_exporter/
asterisk_exporter.py # The exporter script (copied from monitoring server)
Dependencies
- Python 3 (3.6+ on CentOS 7, 3.11+ on openSUSE, 3.8+ on Ubuntu/Debian)
- mysql-connector-python (pip package)
- Asterisk (the exporter calls asterisk -rx to run CLI commands)
- MySQL/MariaDB (for ViciDial metrics)
systemd service file
Path: /etc/systemd/system/asterisk_exporter.service
[Unit]
Description=Asterisk/ViciDial Prometheus Exporter
After=network.target mariadb.service asterisk.service
Wants=mariadb.service
[Service]
Type=simple
ExecStart=/usr/bin/python3 /opt/asterisk_exporter/asterisk_exporter.py
Restart=always
RestartSec=10
Environment=EXPORTER_PORT=9101
Environment=MYSQL_HOST=localhost
Environment=MYSQL_USER=cron
Environment=MYSQL_PASS=1234
Environment=MYSQL_DB=asterisk
Environment=SERVER_LABEL=YOUR_SERVER_LABEL
[Install]
WantedBy=multi-user.target
Environment variables
| Variable | Default | Purpose |
|---|---|---|
| EXPORTER_PORT | 9101 | Port to listen on |
| MYSQL_HOST | localhost | MySQL server address |
| MYSQL_USER | cron | MySQL user (needs SELECT on vicidial tables) |
| MYSQL_PASS | 1234 | MySQL password |
| MYSQL_DB | asterisk | Database name |
| SERVER_LABEL | alpha | Label attached to all metrics |
Important: The MySQL user only needs SELECT privileges. Use an existing read-only user or create one:
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'YOUR_PASSWORD';
GRANT SELECT ON asterisk.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
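A quick way to confirm the grant works as intended -- a sketch that assumes the `exporter` user created above and the `mysql` client on the server; set `DB_PASS` before running:

```shell
# Verify the read-only user can SELECT from ViciDial tables.
DB_USER=exporter
DB_NAME=asterisk
if [ -n "${DB_PASS:-}" ]; then
  # Should succeed and print an agent count
  mysql -u "$DB_USER" -p"$DB_PASS" -D "$DB_NAME" -N \
    -e "SELECT COUNT(*) FROM vicidial_live_agents;"
  # Should list only the SELECT grant (no INSERT/UPDATE/DELETE)
  mysql -u "$DB_USER" -p"$DB_PASS" -N -e "SHOW GRANTS;"
fi
```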
Exposed metrics
The exporter exposes approximately 25 metric families:
| Metric | Type | Description |
|---|---|---|
| asterisk_active_calls | gauge | Current active calls |
| asterisk_active_channels | gauge | Current active channels |
| asterisk_sip_peer_up | gauge | SIP peer reachability (1=up, 0=down) |
| asterisk_sip_peer_latency_ms | gauge | SIP peer qualify latency in milliseconds |
| asterisk_sip_peer_status | gauge | SIP peer status string (OK, Lagged, UNREACHABLE) |
| asterisk_agents_logged_in | gauge | Number of agents logged in |
| asterisk_agents_incall | gauge | Number of agents currently in a call |
| asterisk_agents_paused | gauge | Number of agents in pause state |
| asterisk_agents_waiting | gauge | Number of agents ready/waiting |
| asterisk_agent_status | gauge | Per-agent status (INCALL, PAUSED, READY, CLOSER) |
| asterisk_agent_incall_duration_seconds | gauge | How long the current agent has been in-call |
| asterisk_agent_pause_duration_seconds | gauge | How long the current agent has been paused |
| asterisk_queue_depth | gauge | Calls waiting in queue per inbound group |
| asterisk_rtp_packet_loss_percent | gauge | RTP packet loss percentage per channel |
| asterisk_rtp_jitter_ms | gauge | RTP jitter in milliseconds per channel |
| asterisk_rtp_rtt_ms | gauge | RTP round-trip time per channel |
| asterisk_uptime_seconds | gauge | Asterisk system uptime |
| asterisk_confbridge_count | gauge | Active ConfBridge/MeetMe conferences |
| asterisk_channels_by_codec | gauge | Channel count per codec (alaw, ulaw, g722, etc.) |
| asterisk_transcoding_channels | gauge | Channels actively transcoding between codecs |
| asterisk_fail2ban_active_bans | gauge | Current fail2ban active bans per jail |
| asterisk_fail2ban_bans_total | counter | Total fail2ban bans per jail |
| asterisk_recordings_missing | gauge | CDR entries from the last hour without matching recordings |
How it collects data
The exporter uses two data sources:
1. Asterisk CLI (via asterisk -rx):
asterisk -rx "sip show peers" # SIP peer status and latency
asterisk -rx "core show channels" # Active call/channel counts
asterisk -rx "sip show channelstats" # RTP quality (loss, jitter, RTT)
asterisk -rx "core show uptime seconds" # Asterisk uptime
asterisk -rx "confbridge list" # Active conferences
asterisk -rx "core show channel SIP/..." # Per-channel codec/transcoding info
2. ViciDial MySQL (via mysql-connector-python):
-- Agent states
SELECT status, COUNT(*) FROM vicidial_live_agents GROUP BY status;
-- Per-agent detail
SELECT user, status, pause_code,
TIMESTAMPDIFF(SECOND, last_state_change, NOW()) as state_duration
FROM vicidial_live_agents;
-- Queue depth
SELECT campaign_id, COUNT(*) FROM vicidial_auto_calls
WHERE status = 'LIVE' GROUP BY campaign_id;
-- Missing recordings
SELECT COUNT(*) FROM vicidial_closer_log cl
LEFT JOIN recording_log rl ON ...
WHERE cl.call_date >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
AND cl.length_in_sec > 10 AND rl.recording_id IS NULL;
Quick test
curl -s http://localhost:9101/metrics | grep asterisk_active
Expected output:
# HELP asterisk_active_calls Number of active calls
# TYPE asterisk_active_calls gauge
asterisk_active_calls{server="alpha"} 12
# HELP asterisk_active_channels Number of active channels
# TYPE asterisk_active_channels gauge
asterisk_active_channels{server="alpha"} 24
10. Prometheus Scrape Configuration
On the monitoring server, add scrape targets for each VoIP server in your Prometheus configuration.
Path: prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # --- Prometheus self-monitoring ---
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # --- Node Exporter (system metrics per server) ---
  - job_name: "node"
    static_configs:
      - targets: ["SERVER_A_IP:9100"]
        labels:
          server: "alpha"
      - targets: ["SERVER_B_IP:9100"]
        labels:
          server: "bravo"
      - targets: ["SERVER_C_IP:9100"]
        labels:
          server: "charlie"
      - targets: ["SERVER_D_IP:9100"]
        labels:
          server: "delta"

  # --- Asterisk Exporter (VoIP metrics per server) ---
  - job_name: "asterisk"
    scrape_interval: 15s
    static_configs:
      - targets: ["SERVER_A_IP:9101"]
        labels:
          server: "alpha"
      - targets: ["SERVER_B_IP:9101"]
        labels:
          server: "bravo"
      - targets: ["SERVER_C_IP:9101"]
        labels:
          server: "charlie"
      - targets: ["SERVER_D_IP:9101"]
        labels:
          server: "delta"

  # --- heplify-server metrics (on the monitoring server itself) ---
  - job_name: "heplify-server"
    static_configs:
      - targets: ["heplify-server:9096"]
Key points
- `job_name: "node"` groups all node_exporter targets. The `server` label differentiates them.
- `job_name: "asterisk"` groups all asterisk_exporter targets. The `server` label matches the `SERVER_LABEL` used during agent installation.
- Consistent labels: the `server` label used in Prometheus must match the `server` label in the promtail config and the `SERVER_LABEL` environment variable in the asterisk_exporter. This lets you correlate metrics, logs, and SIP traces for the same server.
- Scrape interval: 15 seconds is a good default. The asterisk_exporter collects data on each scrape, so shorter intervals increase load on Asterisk and MySQL.
Adding a new server
When you deploy agents to a new server:
- Run `install-agents.sh` with the new server's IP, SSH port, and label.
- Add two new entries to `prometheus.yml` (one under `node`, one under `asterisk`).
- Reload Prometheus:

# If running in Docker:
docker exec prometheus kill -HUP 1
# Or via API (if --web.enable-lifecycle is set):
curl -X POST http://localhost:9090/-/reload
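Hand-editing one stanza per server gets tedious past a handful of hosts. A throwaway generator like the following (a hypothetical helper, not part of the install script) can emit the `static_configs` entries for both jobs:

```python
def static_configs_yaml(servers: dict, port: int, indent: str = "      ") -> str:
    """Emit prometheus.yml static_configs entries for a {label: ip} mapping,
    indented to sit under a scrape_configs job."""
    out = []
    for label, ip in sorted(servers.items()):
        out.append(f'{indent}- targets: ["{ip}:{port}"]')
        out.append(f'{indent}  labels:')
        out.append(f'{indent}    server: "{label}"')
    return "\n".join(out)

servers = {"alpha": "SERVER_A_IP", "bravo": "SERVER_B_IP"}
print(static_configs_yaml(servers, 9100))   # node job
print(static_configs_yaml(servers, 9101))   # asterisk job
```

Paste the output under the matching `job_name` block, then reload Prometheus.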
Verifying targets in Prometheus
Open http://YOUR_MONITORING_SERVER:9090/targets in a browser. You should see all configured targets with their status (UP or DOWN), last scrape time, and scrape duration.
11. Verification Procedures
After running install-agents.sh, verify each agent is working correctly.
From the VoIP server (SSH in and check locally)
# --- 1. node_exporter ---
systemctl status node_exporter
# Should show: active (running)
curl -s http://localhost:9100/metrics | head -5
# Should show Prometheus metric lines
# --- 2. promtail ---
systemctl status promtail
# Should show: active (running)
curl -s http://localhost:9080/targets
# Should show each scrape target and its status
curl -s http://localhost:9080/ready
# Should return: "ready"
# --- 3. heplify ---
systemctl status heplify
# Should show: active (running)
journalctl -u heplify --no-pager -n 10
# Should show capture statistics (packets captured/sent)
# --- 4. asterisk_exporter ---
systemctl status asterisk_exporter
# Should show: active (running)
curl -s http://localhost:9101/metrics | grep "asterisk_active_calls"
# Should show the current call count
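The local checks above can be collapsed into one script. This is a sketch: the ports and paths (`9100/metrics`, `9080/ready`, `9101/metrics`) come from this tutorial's agent configuration, and the probe is plain stdlib HTTP:

```python
import urllib.request

# port/path per agent, as configured earlier in this tutorial
AGENTS = {
    "node_exporter": (9100, "/metrics"),
    "promtail": (9080, "/ready"),
    "asterisk_exporter": (9101, "/metrics"),
}

def probe(port: int, path: str, timeout: float = 3.0) -> bool:
    """Return True if the local agent answers HTTP 200 on port/path."""
    try:
        with urllib.request.urlopen(f"http://localhost:{port}{path}",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    for name, (port, path) in AGENTS.items():
        print(f"{name:20s} {'OK' if probe(port, path) else 'FAIL'}")
```

Note this only confirms the HTTP endpoints answer; heplify has no HTTP port, so it still needs the `systemctl`/`journalctl` checks above.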
From the monitoring server (check data is arriving)
# --- Prometheus targets (check all UP) ---
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep -A2 '"health"'
# --- Query node_exporter data ---
curl -s --data-urlencode 'query=up{job="node"}' http://localhost:9090/api/v1/query \
| python3 -m json.tool
# --- Query asterisk_exporter data ---
curl -s --data-urlencode 'query=asterisk_active_calls' http://localhost:9090/api/v1/query \
| python3 -m json.tool
# --- Check Loki is receiving logs ---
curl -s -G --data-urlencode 'query={server="alpha"}' \
--data-urlencode 'limit=5' \
http://localhost:3100/loki/api/v1/query_range
# --- Check Homer is receiving SIP ---
# Look at heplify-server logs for packet counts
docker logs --tail 10 heplify-server
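The `/api/v1/targets` check is easy to script for alerting or a cron job. A sketch (the response shape is Prometheus's documented targets API; the URL is a placeholder):

```python
import json
import urllib.request

def down_targets(api_response: dict) -> list:
    """From a Prometheus /api/v1/targets response body, return
    (job, instance, lastError) for every target whose health is not "up"."""
    return [
        (t["labels"].get("job"), t["labels"].get("instance"), t.get("lastError", ""))
        for t in api_response["data"]["activeTargets"]
        if t.get("health") != "up"
    ]

def check(prom_url: str = "http://localhost:9090") -> list:
    with urllib.request.urlopen(f"{prom_url}/api/v1/targets") as resp:
        return down_targets(json.load(resp))
```

An empty list from `check()` means every configured target is UP; anything else names the failing job/instance with Prometheus's last scrape error.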
Verification checklist
| Check | Command | Expected |
|---|---|---|
| node_exporter running | `systemctl is-active node_exporter` | `active` |
| node_exporter responding | `curl -s localhost:9100/metrics \| wc -l` | > 200 |
| promtail running | `systemctl is-active promtail` | `active` |
| promtail tailing files | `curl -s localhost:9080/targets` | All targets "RUNNING" |
| heplify running | `systemctl is-active heplify` | `active` |
| heplify capturing | `journalctl -u heplify -n 5` | Shows packet counts |
| asterisk_exporter running | `systemctl is-active asterisk_exporter` | `active` |
| asterisk_exporter responding | `curl -s localhost:9101/metrics \| wc -l` | > 30 |
| Prometheus scraping node | Prometheus UI /targets | node job shows UP |
| Prometheus scraping asterisk | Prometheus UI /targets | asterisk job shows UP |
12. Firewall Rules
On each VoIP server (allow monitoring server to scrape)
The monitoring server needs TCP access to ports 9100 and 9101 on each VoIP server for Prometheus scraping:
# Allow the monitoring server to scrape exporters
iptables -I INPUT -s YOUR_MONITORING_SERVER -p tcp --dport 9100 -j ACCEPT
iptables -I INPUT -s YOUR_MONITORING_SERVER -p tcp --dport 9101 -j ACCEPT
# Persist rules (distribution-dependent)
# CentOS/RHEL and openSUSE:
iptables-save > /etc/sysconfig/iptables
# Ubuntu/Debian:
iptables-save > /etc/iptables/rules.v4
On the monitoring server (allow agents to push data)
Each VoIP server needs to reach the monitoring server on ports 3100 (Loki) and 9060 (Homer/heplify-server):
# Allow each VoIP server to push logs and SIP data
for SERVER_IP in SERVER_A_IP SERVER_B_IP SERVER_C_IP SERVER_D_IP; do
iptables -I INPUT -s ${SERVER_IP} -p tcp --dport 3100 -j ACCEPT
iptables -I INPUT -s ${SERVER_IP} -p udp --dport 9060 -j ACCEPT
iptables -I INPUT -s ${SERVER_IP} -p tcp --dport 9060 -j ACCEPT
done
# Persist
iptables-save > /etc/iptables/rules.v4
If you are running the monitoring stack in Docker, you may need to add rules to the DOCKER-USER chain instead of INPUT:
iptables -I DOCKER-USER -s SERVER_A_IP -p tcp --dport 3100 -j ACCEPT
iptables -I DOCKER-USER -s SERVER_A_IP -p udp --dport 9060 -j ACCEPT
Port summary
| Port | Protocol | Direction | Service |
|---|---|---|---|
| 9100 | TCP | Monitoring -> VoIP | node_exporter |
| 9101 | TCP | Monitoring -> VoIP | asterisk_exporter |
| 3100 | TCP | VoIP -> Monitoring | Loki (promtail push) |
| 9060 | UDP+TCP | VoIP -> Monitoring | heplify-server (HEP) |
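The port table maps directly to rule generation. A sketch that prints the Section 12 iptables commands for a given topology (the IPs are placeholders; run the output by hand rather than piping it straight to a shell):

```python
# monitoring -> VoIP: Prometheus scrapes the exporters
SCRAPE_PORTS = [(9100, "tcp"), (9101, "tcp")]
# VoIP -> monitoring: promtail pushes to Loki, heplify sends HEP
PUSH_PORTS = [(3100, "tcp"), (9060, "udp"), (9060, "tcp")]

def voip_server_rules(monitoring_ip: str) -> list:
    """Rules to apply on each VoIP server."""
    return [f"iptables -I INPUT -s {monitoring_ip} -p {proto} --dport {port} -j ACCEPT"
            for port, proto in SCRAPE_PORTS]

def monitoring_server_rules(voip_ips) -> list:
    """Rules to apply on the monitoring server."""
    return [f"iptables -I INPUT -s {ip} -p {proto} --dport {port} -j ACCEPT"
            for ip in voip_ips for port, proto in PUSH_PORTS]
```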
13. Handling OS Differences
The install script supports three Linux families. Here are the differences that matter and how the script handles them.
Package managers
| OS Family | Package Manager | Used For |
|---|---|---|
| openSUSE/SLES | `zypper` | Installing `unzip` |
| CentOS/RHEL | `yum` | Installing `unzip`, `python3`, `python3-pip` |
| Ubuntu/Debian | `apt-get` | Installing `unzip` |
The script detects the package manager by checking which command exists:
if command -v apt-get &>/dev/null; then
apt-get install -y unzip
elif command -v zypper &>/dev/null; then
zypper install -y unzip
elif command -v yum &>/dev/null; then
yum install -y unzip
fi
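The same first-match probe, expressed in Python (a sketch, not part of the install script; the injectable `which` argument exists only to make it testable):

```python
import shutil

def pick_pkg_manager(which=shutil.which):
    """Return the first package manager found on PATH, mirroring the
    script's apt-get -> zypper -> yum probe order; None if none exist."""
    for pm in ("apt-get", "zypper", "yum"):
        if which(pm):
            return pm
    return None
```

Note the order matters on hybrid systems: a Debian box with a stray `yum` binary still resolves to `apt-get` because it is probed first.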
Python versions
| OS | Default Python 3 | Binary Name |
|---|---|---|
| openSUSE 15.x | 3.6, with 3.11 available | `python3.11` preferred |
| CentOS 7 | Not installed by default | `python3` (3.6) after `yum install python3` |
| Ubuntu 22.04+ | 3.10+ | `python3` |
| Debian 12 | 3.11 | `python3` |
The script tries versioned names first (python3.11, python3.6) before falling back to python3. On CentOS 7 where Python 3 is not installed, it runs yum install -y python3 python3-pip.
mysql-connector-python compatibility
The mysql-connector-python package version 8.1+ requires Python 3.8+. On CentOS 7 (Python 3.6), the install script falls back:
pip install mysql-connector-python 2>/dev/null \
|| pip install "mysql-connector-python<8.1" 2>/dev/null \
|| true
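The same decision can be made explicitly by inspecting the interpreter version instead of relying on pip failing. A hypothetical helper (the `version_info` parameter is there for testability):

```python
import sys

def connector_spec(version_info=sys.version_info) -> str:
    """Pip requirement string for mysql-connector-python: releases 8.1+
    need Python 3.8+, so older interpreters get the capped version
    (same outcome as the install script's pip fallback)."""
    if version_info >= (3, 8):
        return "mysql-connector-python"
    return "mysql-connector-python<8.1"
```

On CentOS 7 this returns the capped spec; everywhere else pip is free to install the latest release.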
Syslog path
| OS | Syslog Path |
|---|---|
| openSUSE, CentOS, RHEL | /var/log/messages |
| Ubuntu, Debian | /var/log/syslog |
The default promtail config uses /var/log/messages. For Ubuntu/Debian deployments, update the config or add both paths.
Firewall persistence
| OS | Save Command | Restore Mechanism |
|---|---|---|
| openSUSE | `iptables-save > /etc/sysconfig/iptables` | SuSEfirewall2 or manual |
| CentOS 7 | `iptables-save > /etc/sysconfig/iptables` | iptables-restore in init |
| Ubuntu/Debian | `iptables-save > /etc/iptables/rules.v4` | iptables-persistent package |
14. Updating Agents
Updating node_exporter
# On the VoIP server:
NEW_VERSION="1.8.0" # Check https://github.com/prometheus/node_exporter/releases
systemctl stop node_exporter
cd /tmp
curl -sL "https://github.com/prometheus/node_exporter/releases/download/v${NEW_VERSION}/node_exporter-${NEW_VERSION}.linux-amd64.tar.gz" \
| tar xz
cp "node_exporter-${NEW_VERSION}.linux-amd64/node_exporter" /usr/local/bin/
rm -rf "node_exporter-${NEW_VERSION}.linux-amd64"*
systemctl start node_exporter
# Verify
node_exporter --version
Updating promtail
NEW_VERSION="3.0.0" # Check https://github.com/grafana/loki/releases
systemctl stop promtail
cd /tmp
curl -sL "https://github.com/grafana/loki/releases/download/v${NEW_VERSION}/promtail-linux-amd64.zip" \
-o promtail.zip
unzip -o promtail.zip
mv promtail-linux-amd64 /usr/local/bin/promtail
chmod +x /usr/local/bin/promtail
rm -f promtail.zip
systemctl start promtail
# Verify -- promtail will pick up from the last position
promtail --version
Important: When updating promtail, do not delete /var/lib/promtail/positions.yaml. The new version will resume from the last recorded position.
Updating heplify
NEW_VERSION="1.68.0" # Check https://github.com/sipcapture/heplify/releases
systemctl stop heplify
curl -sL "https://github.com/sipcapture/heplify/releases/download/v${NEW_VERSION}/heplify" \
-o /usr/local/bin/heplify
chmod +x /usr/local/bin/heplify
systemctl start heplify
Updating asterisk_exporter
The asterisk_exporter is a custom script, so updates are done by copying the new version from the monitoring server:
# From the monitoring server:
scp -P SSH_PORT /opt/monitoring/scripts/asterisk_exporter.py \
root@SERVER_IP:/opt/asterisk_exporter/asterisk_exporter.py
ssh -p SSH_PORT root@SERVER_IP "systemctl restart asterisk_exporter"
Bulk updates across all servers
For updating agents across multiple servers, wrap the update commands in a loop:
#!/bin/bash
# update-node-exporter.sh -- Update node_exporter on all servers
VERSION="1.8.0"
SERVERS=("SERVER_A_IP:9322:alpha" "SERVER_B_IP:9322:bravo" "SERVER_C_IP:9322:charlie")
for entry in "${SERVERS[@]}"; do
IFS=: read -r ip port label <<< "$entry"
echo "=== Updating ${label} (${ip}) ==="
ssh -p ${port} root@${ip} bash << EOF
systemctl stop node_exporter
cd /tmp
curl -sL "https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz" | tar xz
cp "node_exporter-${VERSION}.linux-amd64/node_exporter" /usr/local/bin/
rm -rf "node_exporter-${VERSION}.linux-amd64"*
systemctl start node_exporter
echo " Updated to ${VERSION}"
EOF
done
15. Troubleshooting
Agent will not start
Symptom: systemctl status <agent> shows failed or inactive (dead).
# Check the full error
journalctl -u node_exporter --no-pager -n 30
journalctl -u promtail --no-pager -n 30
journalctl -u heplify --no-pager -n 30
journalctl -u asterisk_exporter --no-pager -n 30
Common causes:
| Agent | Error | Fix |
|---|---|---|
| node_exporter | `address already in use :9100` | Another process is using port 9100. Check with `ss -tlnp \| grep 9100` |
| promtail | `permission denied` on log file | promtail runs as root by default, but check whether log files have restrictive ACLs |
| promtail | `error creating positions file` | Create the directory: `mkdir -p /var/lib/promtail` |
| heplify | `permission denied: /dev/net/tun` or pcap errors | heplify needs root or CAP_NET_RAW. Ensure the service runs as root |
| asterisk_exporter | `ModuleNotFoundError: No module named 'mysql'` | Reinstall: `python3 -m pip install mysql-connector-python` |
| asterisk_exporter | `mysql.connector.errors.InterfaceError` | MySQL is not running, or credentials are wrong. Check `MYSQL_USER`/`MYSQL_PASS` in the service file |
No metrics appearing in Prometheus
Symptom: Prometheus target shows DOWN, or metrics exist but return no data.
Step 1: Check the exporter is responding locally on the VoIP server:
curl -s http://localhost:9100/metrics | head -5 # node_exporter
curl -s http://localhost:9101/metrics | head -5 # asterisk_exporter
If this fails, the agent is not running or is listening on a different port.
Step 2: Check network connectivity from the monitoring server:
# From the monitoring server:
curl -s --connect-timeout 5 http://SERVER_IP:9100/metrics | head -5
curl -s --connect-timeout 5 http://SERVER_IP:9101/metrics | head -5
If this fails but Step 1 succeeded, it is a firewall issue. See Section 12.
Step 3: Check Prometheus configuration:
# Verify the target is configured
grep -A3 "SERVER_IP" prometheus/prometheus.yml
# Reload Prometheus after config changes
curl -X POST http://localhost:9090/-/reload
Step 4: Check Prometheus targets page:
Open http://YOUR_MONITORING_SERVER:9090/targets and look for error messages next to the target.
Logs not appearing in Loki/Grafana
Symptom: Querying {server="alpha"} in Grafana/Loki returns no results.
Step 1: Verify promtail is tailing the right files:
curl -s http://localhost:9080/targets
Look for entries showing RUNNING status and non-zero last_target_len.
Step 2: Check if the log files exist and are being written to:
ls -la /var/log/asterisk/messages
ls -la /var/log/asterisk/full
ls -la /var/log/astguiclient/
tail -1 /var/log/asterisk/messages
If a file does not exist (common: some servers do not have /var/log/asterisk/full), promtail will log a warning and skip it -- this is normal.
Step 3: Check promtail can reach Loki:
# From the VoIP server, test connectivity to Loki
curl -s -o /dev/null -w "%{http_code}" http://YOUR_MONITORING_SERVER:3100/ready
# Should return: 200
If this fails, check firewall rules on the monitoring server for port 3100.
Step 4: Check promtail logs for push errors:
journalctl -u promtail --no-pager -n 50 | grep -i "error\|level=error\|429"
Common errors:
- `429 Too Many Requests` -- Loki rate limits exceeded. Increase `ingestion_rate_mb` and `ingestion_burst_size_mb` in the Loki config.
- `connection refused` -- Loki is not running, or a firewall is blocking.
Step 5: Check the positions file:
cat /var/lib/promtail/positions.yaml
If positions are advancing (numbers increasing on subsequent checks), promtail is reading the files. If positions are static, the log files are not being written to.
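That "take two readings and compare" check is easy to automate. A sketch, assuming you have already parsed the `positions:` mapping out of `positions.yaml` (paths map to byte offsets stored as strings):

```python
def stalled_paths(before: dict, after: dict) -> list:
    """Given two snapshots of the positions mapping (path -> byte offset
    as a string), return paths whose offset did not advance between reads."""
    return [path for path in after
            if path in before and int(after[path]) <= int(before[path])]

# Example: messages advanced, full did not
before = {"/var/log/asterisk/messages": "1000", "/var/log/asterisk/full": "500"}
after = {"/var/log/asterisk/messages": "1800", "/var/log/asterisk/full": "500"}
stalled_paths(before, after)  # -> ['/var/log/asterisk/full']
```

A stalled path is not necessarily a promtail problem; it usually means nothing is writing to that file.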
Log shipping delays
Symptom: Logs appear in Grafana/Loki but with a delay of minutes or hours.
Possible causes:
Loki ingestion rate limits: If promtail is sending faster than Loki can accept, entries queue up. Check promtail logs for
429errors and increase Loki limits.Large log files on first run: When promtail first starts on a server with large existing log files, it reads from the beginning. This can cause a large backlog. Consider setting
positions.yamlto the end of each file before first start:# Skip to end of existing logs (only for first deployment) wc -c /var/log/asterisk/messages # Use that byte count in positions.yamlClock skew: If the VoIP server's clock is off by more than a few minutes, Loki may reject entries as "too old" or "too far in the future." Verify NTP is running:
timedatectl status # Or: ntpq -p
SIP data not appearing in Homer
Symptom: Homer search returns no results for a server that should be sending SIP data.
Step 1: Verify heplify is capturing packets:
journalctl -u heplify --no-pager -n 20
If you see captured 0 packets, heplify may be listening on the wrong interface. Try specifying the interface explicitly:
# Find the correct interface
ip addr show | grep "inet "
# Update the service to use it
# -i eth0 instead of -i any
Step 2: Verify SIP traffic exists on the server:
# Quick packet capture to confirm SIP is flowing
tcpdump -i any -c 10 port 5060 -n
If no SIP traffic is seen, the server may not have active SIP trunks or may use a non-standard SIP port.
Step 3: Test connectivity to heplify-server:
# UDP connectivity test (heplify sends HEP over UDP by default)
echo "test" | nc -u -w1 YOUR_MONITORING_SERVER 9060
Step 4: Check heplify-server logs on the monitoring server:
docker logs --tail 30 heplify-server
Look for incoming HEP packet counts or error messages.
asterisk_exporter shows partial metrics
Symptom: Some metrics (like asterisk_active_calls) work but others (like asterisk_agents_logged_in) are missing.
This usually means the MySQL connection is failing while Asterisk CLI commands succeed:
# Test MySQL connectivity with the same credentials
mysql -u cron -p1234 -e "SELECT COUNT(*) FROM asterisk.vicidial_live_agents;"
If this fails, check:
- The MySQL user exists and has the correct password.
- The user has `SELECT` privileges on the `asterisk` database.
- MariaDB/MySQL is running: `systemctl status mariadb`
16. Summary
Deploying monitoring agents across distributed VoIP servers transforms your operational capabilities. Instead of reactive SSH-and-grep troubleshooting, you get a unified view of every server's health, every call's SIP flow, every agent's state, and every log entry -- all searchable from a single Grafana interface.
What you deployed
| Agent | Port | Data Destination | What It Provides |
|---|---|---|---|
| node_exporter | 9100 | Prometheus (pull) | CPU, RAM, disk, network, load |
| promtail | 9080 | Loki (push) | Asterisk logs, ViciDial logs, syslog |
| heplify | -- | heplify-server (push) | Complete SIP call flows |
| asterisk_exporter | 9101 | Prometheus (pull) | Active calls, agents, SIP peers, RTP, queues |
The install process in brief
- Run `install-agents.sh` from the monitoring server for each VoIP server.
- Add scrape targets to `prometheus.yml` for node_exporter and asterisk_exporter.
- Open firewall ports: 9100/9101 inbound on the VoIP servers, 3100/9060 inbound on the monitoring server.
- Verify: check the Prometheus targets page, query Loki, search Homer.
Maintenance cadence
| Task | Frequency | How |
|---|---|---|
| Check agent health | Daily (or alert on it) | Prometheus up{} metric |
| Update node_exporter | Quarterly | Download new release, restart service |
| Update promtail | Quarterly (match Loki version) | Download new release, restart service |
| Update heplify | As needed | Download new release, restart service |
| Update asterisk_exporter | When you add metrics | scp new script, restart service |
| Review promtail positions | After server reboot | Verify positions file is intact |
This tutorial is part of a series on building production VoIP monitoring infrastructure. For the custom exporter internals, see Tutorial 08. For the central monitoring stack (Prometheus, Loki, Homer, Grafana), see Tutorials 01-07.