VoIP Troubleshooting Runbook

A Systematic 7-Step Diagnostic Procedure for Asterisk & ViciDial Issues

Difficulty: Intermediate to Advanced | Use case: Day-to-day operations | Asterisk version: 16+ (tested on 18.x, compatible with 11.x/13.x)


Difficulty: Intermediate to Advanced
Time to Use: 5-30 minutes per incident
Prerequisites: Root SSH access to Asterisk/ViciDial server, basic SQL knowledge, familiarity with SIP concepts
Tested On: Asterisk 18.26, ViciDial 2.14+, MariaDB 10.5+, Homer 7.x, Smokeping 2.8

Table of Contents

  1. Introduction: Why You Need a Runbook
  2. Diagnostic Philosophy
  3. The 7-Step Diagnostic Procedure
  4. Common Issues and Solutions
  5. Decision Trees
  6. SIP Response Code Reference
  7. Asterisk Hangup Cause Reference
  8. ViciDial Status and Term Reason Reference
  9. Tool Commands Quick Reference
  10. Building Your Own Diagnostic CLI Tool
  11. Appendix: Configuration File Locations

1. Introduction: Why You Need a Runbook

At 2 AM, a trunk goes down. At 9 AM on Monday, three agents report choppy audio. At noon, a client calls to say customers are hearing silence after they answer. In each of these situations, you need to diagnose the problem quickly and accurately -- not fumble through random CLI commands hoping to stumble on the cause.

This runbook is the exact diagnostic procedure used to troubleshoot a production ViciDial call center fleet handling thousands of calls per day across multiple servers and SIP providers. Every command, every SQL query, every decision tree comes from real incidents that cost real money when they were not caught quickly.

What this runbook gives you:

Who this is for:


2. Diagnostic Philosophy

Before diving into the steps, internalize these principles:

Always work from the database outward

The database is the source of truth. Every call that passes through ViciDial gets logged with timestamps, statuses, hangup reasons, and unique IDs. Start there, get the facts, then go hunting in logs and live systems.

Correlate across layers

A single symptom can have causes at different layers:

[Application Layer]  ViciDial disposition codes, agent states, call routing
        |
[Signaling Layer]    SIP INVITE/BYE/CANCEL, 4xx/5xx responses, SDP negotiation
        |
[Media Layer]        RTP streams, codecs, jitter buffers, packet loss
        |
[Network Layer]      Latency, packet loss, MTU, firewall rules, NAT traversal
        |
[Infrastructure]     Server load, disk space, DNS resolution, time sync

A call that "drops after 30 seconds" could be a missing SIP ACK (signaling), asymmetric NAT (network), a full disk stopping recording (infrastructure), or a ViciDial timeout (application). The 7-step procedure checks all layers systematically.

Document as you go

When you find the root cause at 3 AM, you will not remember the details at 9 AM. Capture:

The 80/20 of VoIP problems

In practice, roughly 80% of VoIP issues fall into five categories. The shares below are approximate and describe the split within that group:

Category | Typical share | Root cause
NAT / firewall | 30% | Agent behind restrictive NAT, RTP ports blocked, SIP ALG interference
Trunk / carrier | 25% | Provider outage, IP changed, SIP credentials expired, route congestion
Network quality | 15% | High latency (>150ms), packet loss (>1%), jitter (>30ms)
Agent software | 15% | Old softphone, wrong codec, microphone issues, VPN interference
Server-side | 15% | Disk full, Asterisk overloaded, misconfigured dialplan, time drift

3. The 7-Step Diagnostic Procedure

Step 1: Identify the Call in the Database

Goal: Get the uniqueid, timestamps, status codes, and call metadata. This anchors every subsequent step.

For inbound calls (customer called in):

-- Search by phone number in the inbound call log
SELECT call_date, phone_number, length_in_sec, status, term_reason,
       uniqueid, user, campaign_id, queue_seconds, closecallid
FROM vicidial_closer_log
WHERE phone_number LIKE '%PHONE_NUMBER%'
ORDER BY call_date DESC
LIMIT 20;

For outbound calls (dialer called out):

-- Search by phone number in the outbound call log
SELECT call_date, phone_number, length_in_sec, status, term_reason,
       uniqueid, user, campaign_id, list_id
FROM vicidial_log
WHERE phone_number LIKE '%PHONE_NUMBER%'
ORDER BY call_date DESC
LIMIT 20;

What to look for:

Field | What it tells you
status | Call disposition -- A (answering machine), DISMX (abnormal disconnect inbound), DCMX (abnormal disconnect outbound), DROP (abandoned in queue), NANQUE (no agent in queue), AFTHRS (after hours)
term_reason | Who ended the call -- CALLER, AGENT, NONE (system), ABANDON (hung up in queue), NOAGENT, AFTERHOURS
length_in_sec | Duration -- calls under 15 seconds with CALLER termination often indicate one-way audio (customer hears nothing, hangs up)
queue_seconds | Time spent waiting -- high values indicate no available agents
uniqueid | The Asterisk channel unique ID -- you need this for every subsequent step
user | The agent who handled the call (if any)

Quick pattern recognition:

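-- Find all abnormal disconnects in the last 24 hours
-- (this reads the inbound log; DCMX is an outbound status, so run the same query against vicidial_log to cover outbound calls)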
SELECT call_date, phone_number, status, term_reason, uniqueid, user
FROM vicidial_closer_log
WHERE status IN ('DISMX', 'DCMX')
  AND call_date > NOW() - INTERVAL 24 HOUR
ORDER BY call_date DESC;

-- Find calls that dropped within 15 seconds (possible one-way audio)
SELECT call_date, phone_number, length_in_sec, status, term_reason, user
FROM vicidial_closer_log
WHERE length_in_sec < 15
  AND length_in_sec > 0
  AND term_reason = 'CALLER'
  AND call_date > NOW() - INTERVAL 24 HOUR
ORDER BY call_date DESC;

Tip: If the user reports "calls dropping" but you see term_reason = CALLER, the customer is hanging up. This often points to audio quality issues rather than system failures -- the customer cannot hear the agent, so they hang up.
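
The short-call pattern from the tip can also be tallied per agent on an exported result set, which helps show whether one agent's connection is behind the hang-ups. This is a minimal sketch, assuming the rows were exported as tab-separated text (for example with `mysql -B -N`) in the column order `call_date, phone_number, length_in_sec, status, term_reason, user`; the function name `count_short_hangups_by_agent` is hypothetical.

```shell
# Hypothetical helper: reads tab-separated rows on stdin in the order
#   call_date, phone_number, length_in_sec, status, term_reason, user
# and prints "user<TAB>count" for calls matching the possible one-way-audio
# pattern (0 < length_in_sec < 15 and term_reason = CALLER), busiest agent first.
count_short_hangups_by_agent() {
    awk -F '\t' '
        $3 > 0 && $3 < 15 && $5 == "CALLER" { n[$6]++ }
        END { for (u in n) print u "\t" n[u] }
    ' | sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

An agent who stands well above the rest in this tally is a reasonable first candidate for the checks in Step 7.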


Step 2: Trace the Call in Homer SIP Capture

Goal: See the complete SIP signaling flow -- INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, BYE -- and identify where the conversation broke down.

If you have Homer deployed (see Tutorial 01), search by the phone number or Call-ID:

  1. Open Homer web UI at http://YOUR_MONITORING_SERVER:9080
  2. Set the time range to cover the call
  3. Search by:
    • Calling number or Called number (the phone number from Step 1)
    • Call-ID (if you have it from Asterisk logs)
  4. Click on the call to see the SIP ladder diagram

What to look for in the ladder diagram:

Pattern | Meaning
INVITE -> 100 -> 180 -> 200 -> ACK -> (talk) -> BYE | Normal call flow
INVITE -> 100 -> 480 | Temporarily unavailable -- agent not registered
INVITE -> 100 -> 486 | Busy -- agent on another call
INVITE -> 100 -> 408 | Request timeout -- network/registration issue
INVITE -> 100 -> 503 | Service unavailable -- trunk overloaded or down
INVITE -> 200 -> ACK -> (short pause) -> BYE | Call connected but dropped quickly -- check media
INVITE -> 200 -> ACK -> (no BYE for a long time) | Zombie channel -- call ended but no BYE sent
No INVITE at all | Call never reached the trunk -- check dialplan
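
When triaging many failed calls, it can help to turn the final response code into the hint from the table above. The following is a small sketch that covers only the four failure codes listed in that table; the function name `sip_failure_hint` is hypothetical.

```shell
# Hypothetical helper: map the final SIP response of a failed INVITE to the
# hint given in the pattern table. Codes outside the table get a generic reply.
sip_failure_hint() {
    case "$1" in
        480) echo "Temporarily unavailable: agent not registered" ;;
        486) echo "Busy: agent on another call" ;;
        408) echo "Request timeout: network/registration issue" ;;
        503) echo "Service unavailable: trunk overloaded or down" ;;
        *)   echo "Not in the pattern table: check the SIP response code reference" ;;
    esac
}
```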

Check SDP (Session Description Protocol) for media issues:

In the 200 OK response, examine the SDP body:

v=0
o=- 12345 12345 IN IP4 203.0.113.50
c=IN IP4 203.0.113.50          <-- Media IP (should be public)
m=audio 18000 RTP/AVP 0 8 18   <-- RTP port and codec offers
a=rtpmap:0 PCMU/8000            <-- G.711 ulaw
a=rtpmap:8 PCMA/8000            <-- G.711 alaw
a=rtpmap:18 G729/8000           <-- G.729
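
The "should be public" check on the `c=` line can be scripted. This is a minimal sketch, assuming the SDP body has been saved or piped as plain text and uses an IPv4 `c=IN IP4` line; the function name `sdp_media_ip_scope` is hypothetical, and only the RFC 1918 ranges are treated as private.

```shell
# Hypothetical helper: reads an SDP body on stdin and reports whether the
# first IPv4 connection address is private (RFC 1918) or public.
sdp_media_ip_scope() {
    # Take the address from the first "c=IN IP4 <addr>" line; strip a trailing CR,
    # since SIP messages use CRLF line endings.
    ip=$(awk '/^c=IN IP4 / { a = $3; sub(/\r$/, "", a); print a; exit }')
    case "$ip" in
        "") echo "no c= line found" ;;
        10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo "private $ip" ;;
        *) echo "public $ip" ;;
    esac
}
```

A "private" result on a carrier-facing leg is the situation discussed under one-way audio in Section 4.1.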

Red flags in SDP:

If you do not have Homer:

You can still capture SIP packets on the server:

# Live capture of SIP traffic on port 5060 (run during a test call)
tcpdump -i eth0 -n -s 0 port 5060 -w /tmp/sip-capture.pcap

# Then analyze with sngrep (if installed) or download the pcap
sngrep -I /tmp/sip-capture.pcap

# Or use ngrep for real-time SIP message viewing
ngrep -W byline -d eth0 port 5060

Step 3: Check Asterisk Logs by Call ID

Goal: Follow the call through Asterisk's internal processing -- channel creation, dialplan execution, bridge setup, hangup cause.

Find the call in Asterisk logs:

# Using the uniqueid from Step 1, search the Asterisk message log
grep 'UNIQUEID_FROM_STEP_1' /var/log/asterisk/messages

# If you have the Asterisk call identifier from a log line (C-XXXXXXXX format, not the SIP Call-ID header), search by that
grep 'C-XXXXXXXX' /var/log/asterisk/messages

# Search by phone number to find the Call-ID first
grep 'PHONE_NUMBER' /var/log/asterisk/messages | tail -50

Critical log patterns to look for:

Normal call flow:

-- Executing [s@default:1] Answer()
-- Executing [s@default:2] Dial(SIP/agent_100,,tT)
-- SIP/agent_100-00001234 answered SIP/trunk_provider-00001235
-- Channel SIP/trunk_provider-00001235 joined bridge
-- Channel SIP/agent_100-00001234 joined bridge

Jitter buffer resyncing (IAX2 audio issues):

chan_iax2.c: Resyncing the jb -- Loss: 0.0024  Delay: 15234  Jit: 0.03

If the delay values are in the thousands or tens of thousands, the IAX2 jitter buffer is struggling. This causes choppy, delayed, or garbled audio on conference bridges.

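# Count jitter buffer resyncs today
# (the date pattern must match dateformat in logger.conf; this one assumes dateformat=%F %T.
#  With Asterisk's default "%b %e %T" timestamps, use "$(date '+%b %e')" instead)
grep "$(date +%Y-%m-%d)" /var/log/asterisk/messages | grep -c 'Resyncing the jb'
# More than 100/day = problem; more than 1000/day = critical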

Strict RTP source switching (NAT issue):

res_rtp_asterisk.c: Strict RTP switching source address to 198.51.100.25:12345

This means Asterisk received RTP from a different IP than expected. Common when agents are behind NAT. Usually harmless (Asterisk adapts), but if it happens repeatedly on the same call, it indicates unstable NAT mapping.

Dial failures:

app_dial.c: Called SIP/trunk_provider/441234567890
app_dial.c: SIP/trunk_provider-0001 is circuit-busy

The trunk is at capacity or not responding.

Hangup cause in the h extension:

-- Executing [h@default:1] NoOp("SIP/trunk-0001", "Hangup cause: 16")

Cause 16 is normal clearing. Anything else warrants investigation (see the full reference in Section 7).
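
To see how often causes other than 16 occur, the log lines can be tallied. This sketch assumes the dialplan logs the cause with the exact text `Hangup cause: N`, as in the `NoOp` example above (that wording comes from the dialplan, not from Asterisk itself); the function name `count_hangup_causes` is hypothetical.

```shell
# Hypothetical helper: reads Asterisk log lines on stdin and prints
# "cause count" pairs, sorted by cause code, for every "Hangup cause: N" match.
count_hangup_causes() {
    grep -o 'Hangup cause: [0-9]*' \
        | awk '{ n[$3]++ } END { for (c in n) print c, n[c] }' \
        | sort -n
}
```

Typical use would be `count_hangup_causes < /var/log/asterisk/messages`; any line other than cause 16 points at calls worth tracing.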

Channel errors:

func_hangupcause.c: Unable to find information for channel
bridge_channel.c: Channel SIP/agent_100-0001 left bridge

These indicate abnormal channel teardown -- the call was not cleanly terminated.


Step 4: Inspect Carrier-Level Details

Goal: Determine if the problem is on the carrier/trunk side -- SIP response codes, dial timing, and hangup causes at the network edge.

-- Check carrier log for the specific call (use uniqueid from Step 1)
SELECT call_date, dialstatus, hangup_cause, sip_hangup_cause,
       sip_hangup_reason, dial_time, answered_time, channel
FROM vicidial_carrier_log
WHERE uniqueid = 'UNIQUEID_FROM_STEP_1';

-- Or search by trunk name for recent failures
SELECT call_date, dialstatus, hangup_cause, sip_hangup_cause,
       sip_hangup_reason, channel
FROM vicidial_carrier_log
WHERE channel LIKE '%TRUNK_NAME%'
ORDER BY call_date DESC
LIMIT 20;

-- Aggregate trunk health: failure rate in the last hour
SELECT
    SUBSTRING_INDEX(channel, '-', 1) AS trunk,
    COUNT(*) AS total_calls,
    SUM(CASE WHEN dialstatus = 'ANSWER' THEN 1 ELSE 0 END) AS answered,
    SUM(CASE WHEN dialstatus = 'BUSY' THEN 1 ELSE 0 END) AS busy,
    SUM(CASE WHEN dialstatus = 'NOANSWER' THEN 1 ELSE 0 END) AS noanswer,
    SUM(CASE WHEN dialstatus = 'CHANUNAVAIL' THEN 1 ELSE 0 END) AS unavail,
    SUM(CASE WHEN dialstatus = 'CONGESTION' THEN 1 ELSE 0 END) AS congestion
FROM vicidial_carrier_log
WHERE call_date > NOW() - INTERVAL 1 HOUR
GROUP BY trunk
ORDER BY total_calls DESC;

Interpreting carrier log fields:

Field | Values | Meaning
dialstatus | ANSWER | Call was answered normally
dialstatus | BUSY | Destination returned 486 Busy
dialstatus | NOANSWER | Ring timeout -- no answer within dial_time
dialstatus | CHANUNAVAIL | Trunk is down or unregistered
dialstatus | CONGESTION | Network congestion or invalid number (usually 503/404)
dialstatus | CANCEL | Caller hung up before answer
hangup_cause | 1-127 | PRI/Q.931 cause code (see Section 7)
sip_hangup_cause | 200-699 | SIP response code (see Section 6)
sip_hangup_reason | Text | Human-readable reason from the carrier
dial_time | Seconds | How long the call rang before answer/hangup
answered_time | Seconds | Duration of the answered portion

Red flags:


Step 5: Analyze Audio and Recordings

Goal: Listen to the actual call audio to confirm the reported symptom and identify the audio problem type.

Locate the recording:

# Recordings are stored by date. Find by the call date from Step 1:
find /var/spool/asterisk/monitorDONE/YYYYMMDD/ -name "*UNIQUEID*" -o -name "*PHONE_NUMBER*"

# Or search the ViciDial recordings table:
mysql -u USER -pPASSWORD DATABASE -e "
    SELECT recording_id, filename, location, start_time, length_in_sec
    FROM recording_log
    WHERE lead_id = LEAD_ID
    ORDER BY start_time DESC
    LIMIT 10;
"

Audio analysis checklist:

Listen for | Indicates
Silence on one side only | One-way audio (NAT / firewall blocking RTP in one direction)
Complete silence both sides | RTP not flowing at all (ports blocked, codec mismatch, wrong media IP)
Choppy / cutting in and out | Packet loss > 2% or high jitter > 30ms
Robotic / metallic voice | Extreme jitter, jitter buffer underruns
Echo | Impedance mismatch at PSTN gateway, or acoustic echo from agent's speaker/mic
Background static / hiss | Low-quality codec (G.729) + packet loss, or analog line noise
Delayed audio (walkie-talkie) | High latency > 300ms (satellite link, distant agent, VPN overhead)
Audio fine then sudden drop | Network route change, NAT mapping timeout, SIP session timer mismatch

Automated audio analysis (if you have the service):

# If you have deployed the AI audio analysis service (Tutorial 02):
curl "http://YOUR_MONITORING_SERVER:8084/analyze?recording_url=http://YOUR_VICIDIAL_SERVER/RECORDINGS/path/to/file.wav"

# This returns NISQA MOS scores, silence detection, and AI-generated analysis

Step 6: Check Network Quality (Smokeping / RTCP)

Goal: Determine if the network path between your server and the carrier/agent has latency, packet loss, or jitter that explains audio problems.

Check Smokeping for trunk provider latency:

If you have Smokeping deployed (see Tutorial 01):

  1. Open http://YOUR_MONITORING_SERVER:8081
  2. Navigate to your SIP providers target group
  3. Look for:
    • Latency spikes that coincide with the reported call time
    • Packet loss (gaps in the graph) at the same time
    • Baseline comparison -- is today's latency higher than the 7-day average?

Check live RTP statistics:

# Show RTP channel statistics for all active calls
asterisk -rx 'sip show channelstats'

Output format:

Peer             Call ID      Duration Recv: Pack  Lost       (     %) Jitter Send: Pack  Lost       (     %) Jitter
203.0.113.50     2cb43240375  00:44:44 0000000134K 0000000008 ( 0.01%) 0.0019 0000000134K 0000000016 ( 0.01%) 0.0004
198.51.100.25    9af21c9f-99  00:01:58 0000005766  0000000000 ( 0.00%) 0.0000 0000005892  0000000001 ( 0.02%) 0.0000

Interpreting the output:

Metric | Good | Warning | Critical
Packet Loss | < 0.5% | 0.5% - 2% | > 2%
Jitter | < 10ms | 10 - 30ms | > 30ms
Latency (from sip show peer) | < 80ms | 80 - 150ms | > 150ms
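
The thresholds in the table can be encoded once and reused in scripts or alert checks. This is a minimal sketch, assuming the value has already been converted to the table's units (percent for loss, milliseconds for jitter and latency); the function name `classify_metric` is hypothetical.

```shell
# Hypothetical helper: classify one measurement against the table above.
#   $1 = metric name: loss (percent), jitter (ms) or latency (ms)
#   $2 = measured value
# Prints good, warning or critical; prints "unknown" for any other metric name.
classify_metric() {
    awk -v m="$1" -v v="$2" 'BEGIN {
        if      (m == "loss")    { warn = 0.5; crit = 2 }
        else if (m == "jitter")  { warn = 10;  crit = 30 }
        else if (m == "latency") { warn = 80;  crit = 150 }
        else { print "unknown"; exit 1 }
        if      (v > crit)  print "critical"
        else if (v >= warn) print "warning"
        else                print "good"
    }'
}
```

For example, `classify_metric loss 0.01` corresponds to the first sample row of the `channelstats` output above.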

Quick network tests:

# Ping the trunk provider (check latency and loss)
ping -c 20 TRUNK_PROVIDER_IP

# Traceroute to identify where latency is introduced
traceroute -n TRUNK_PROVIDER_IP

# Check for asymmetric routing (can cause one-way audio)
mtr --report --report-cycles 50 TRUNK_PROVIDER_IP

# Verify RTP port range is open in firewall
iptables -L INPUT -n --line-numbers | grep -E "udp.*10000|udp.*20000"
# You should see an ACCEPT rule for UDP ports 10000:20000 (or your configured range)

# Check RTP configuration
asterisk -rx 'rtp show settings'

Check RTP port range matches firewall:

# Read the configured RTP port range
grep -E 'rtpstart|rtpend' /etc/asterisk/rtp.conf
# Example output:
#   rtpstart=10000
#   rtpend=20000

# Verify firewall allows that UDP range
iptables -S INPUT | grep -E 'udp.*dport.*(10000|20000)'
# Must see: -A INPUT -p udp --dport 10000:20000 -j ACCEPT

If there is a mismatch between the RTP port range and the firewall rule, media will be partially or completely blocked.
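
The comparison between `rtp.conf` and the firewall can be automated. The sketch below assumes the firewall rules were saved from `iptables -S INPUT` to a file and that the RTP rule is written as a single `--dport START:END` range; a wider range or a `multiport` rule would be reported as a mismatch even though it may be fine. The function name `rtp_range_matches_firewall` is hypothetical.

```shell
# Hypothetical helper: compare the RTP range in rtp.conf with saved firewall rules.
#   $1 = path to rtp.conf
#   $2 = path to a file holding the output of `iptables -S INPUT`
# Prints "match START:END" or "mismatch START:END".
rtp_range_matches_firewall() {
    # Read rtpstart / rtpend, ignoring trailing ";" comments and whitespace.
    start=$(awk -F= '/^[[:space:]]*rtpstart[[:space:]]*=/ { v = $2; sub(/;.*/, "", v); gsub(/[^0-9]/, "", v); print v; exit }' "$1")
    end=$(awk -F= '/^[[:space:]]*rtpend[[:space:]]*=/ { v = $2; sub(/;.*/, "", v); gsub(/[^0-9]/, "", v); print v; exit }' "$1")
    # Look for a UDP ACCEPT rule whose destination port range is exactly START:END.
    if grep -Eq -- "-p udp .*--dport ${start}:${end} .*-j ACCEPT" "$2"; then
        echo "match ${start}:${end}"
    else
        echo "mismatch ${start}:${end}"
    fi
}
```

A typical run would be `iptables -S INPUT > /tmp/rules.txt` followed by `rtp_range_matches_firewall /etc/asterisk/rtp.conf /tmp/rules.txt`.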


Step 7: Verify Agent State and SIP Registration

Goal: Check if the problem is agent-specific -- their softphone, network connection, or registration state.

Check agent SIP registration:

# Show detailed SIP peer information for a specific agent extension
asterisk -rx 'sip show peer AGENT_EXTENSION'

Key fields to check:

Field | What to look for
Status | OK (XXms) -- the ms value is latency. Under 100ms is good; over 150ms will cause noticeable delay
Addr->IP | Agent's public IP address. If it is a private address (10.x, 172.16-31.x, 192.168.x), registration is coming through a NAT without proper traversal
Useragent | Software version. Old softphones (eyeBeam 1.x, X-Lite 3.x) have known SIP bugs
Codecs | Must match what the trunk supports. If agent offers only G.729 and trunk requires G.711, audio will fail or require transcoding
Qualify | Must be enabled for Asterisk to detect when the agent goes offline
Nat | Should show force_rport+comedia for agents behind NAT
ACL | If configured, verify agent's IP is within the allowed range

Check if agent is LAGGED:

# Show all SIP peers and their qualification status
asterisk -rx 'sip show peers' | grep -E 'LAGGED|UNREACHABLE|UNKNOWN'

LAGGED means the qualify response took longer than the configured threshold (default 2000ms). This usually means:

-- Check LAGGED agents in ViciDial's live agent table
SELECT user, server_ip, status, callerid, last_update_time,
       TIMESTAMPDIFF(SECOND, last_update_time, NOW()) AS seconds_since_update
FROM vicidial_live_agents
WHERE status = 'LAGGED'
   OR TIMESTAMPDIFF(SECOND, last_update_time, NOW()) > 30;

An agent whose last_update_time is more than 30 seconds old is effectively frozen -- they will not receive calls even if they appear logged in.

Check agent live status in ViciDial:

-- All agents currently logged in with their state
SELECT la.user, vu.full_name, la.status, la.callerid,
       la.campaign_id, la.phone_login, la.server_ip,
       TIMESTAMPDIFF(SECOND, la.last_update_time, NOW()) AS idle_seconds
FROM vicidial_live_agents la
JOIN vicidial_users vu ON la.user = vu.user
ORDER BY la.status, idle_seconds DESC;

Status | Meaning
READY | Waiting for a call
INCALL | On an active call
PAUSED | Manually paused
CLOSER | Waiting for inbound calls
QUEUE | Call ringing to agent
LAGGED | SIP registration delayed
DEAD | Connection lost -- agent will not receive calls

4. Common Issues and Solutions

4.1 One-Way Audio

Symptom: One party can hear the other, but not vice versa. Or both parties hear silence.

Root cause: RTP (audio) packets are flowing in one direction only. Almost always a NAT/firewall issue.

Diagnosis:

# 1. Check the agent's SIP peer for NAT settings
asterisk -rx 'sip show peer AGENT_EXT' | grep -E 'Nat|Addr|Status'

# 2. Check if strict RTP is interfering
grep 'Strict RTP' /var/log/asterisk/messages | tail -10

# 3. Check RTP settings
grep strictrtp /etc/asterisk/rtp.conf

Fix:

; In sip.conf or sip-vicidial.conf, for the agent's peer definition:
[agent_100]
nat=force_rport,comedia    ; Force symmetric RTP and rport
qualify=yes                ; Keep NAT pinholes open
qualifyfreq=30             ; Qualify every 30 seconds
directmedia=no             ; Force media through Asterisk (never direct)

For rtp.conf, if strict RTP is causing issues:

; Try seqno mode instead of full strict RTP
strictrtp=seqno

If the carrier's SDP has a private IP in the media address:

; Add to the trunk peer definition
nat=force_rport,comedia

This tells Asterisk to ignore the SDP media IP and use the IP from which it actually receives RTP packets.


4.2 LAGGED Agents

Symptom: Agent shows as "LAGGED" in sip show peers. They appear logged in but do not receive calls.

Diagnosis:

# Check qualify time for the agent
asterisk -rx 'sip show peer AGENT_EXT' | grep -E 'Status|Qualify|Addr'

# Check all LAGGED peers
asterisk -rx 'sip show peers' | grep LAGGED

Common causes and fixes:

Cause | Fix
Agent's internet is slow | Ask agent to run a speed test, check for downloads/streaming
Agent is on VPN | VPN adds latency; try split tunneling or direct connection
Qualify threshold too aggressive | Increase qualifyfreq and qualify timeout
DNS resolution delay | Use IP addresses instead of hostnames in SIP config
Server overloaded | Check top / uptime -- high load delays qualify responses

; Increase qualify tolerance for remote agents:
[agent_100]
qualify=5000         ; Allow up to 5 seconds (default is 2000ms)
qualifyfreq=60       ; Check every 60 seconds instead of 30

4.3 Call Drops Mid-Conversation

Symptom: Calls are answered and both parties can hear each other, then the call suddenly disconnects during the conversation.

Diagnosis:

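-- Find mid-call disconnects (abnormal dispositions)
-- (inbound log shown; DCMX is an outbound status, so repeat against vicidial_log for outbound calls)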
SELECT call_date, phone_number, length_in_sec, status, term_reason, uniqueid
FROM vicidial_closer_log
WHERE status IN ('DISMX', 'DCMX')
  AND call_date > NOW() - INTERVAL 24 HOUR
ORDER BY call_date DESC;

# Check if there is a pattern in the call duration
# SIP session timers expire at fixed intervals (commonly 1800s = 30 min)
# If calls consistently drop at the same duration, it is a timer issue

# Check Asterisk log for the specific call
grep 'UNIQUEID' /var/log/asterisk/messages | grep -i 'hangup\|destroy\|bye'

Common causes:

Pattern | Likely cause | Fix
Drops at exactly 30 min (1800s) | SIP session timer mismatch | Set session-timers=refuse on the trunk or session-expires=7200
Drops at random times | NAT mapping timeout (typically 60-300s for UDP) | Enable SIP keepalives: qualify=yes, qualifyfreq=25
Drops correlate with network events | Route change, ISP failover | Check Smokeping for latency spikes at call time
Drops only on specific trunk | Carrier issue | Contact carrier with Call-IDs and timestamps
Multiple agents affected simultaneously | Server issue | Check uptime, disk space, Asterisk core dumps
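
Whether drops cluster at a fixed duration is easy to check from the `length_in_sec` values of the affected calls. This is a minimal sketch, assuming one duration in seconds per line on stdin; it groups durations into 10-second buckets (an arbitrary choice, so a cluster that straddles a bucket edge is split in two), and the function name `dominant_drop_duration` is hypothetical.

```shell
# Hypothetical helper: reads call durations (seconds, one per line) on stdin and
# prints the 10-second bucket that holds the most calls. Ties go to the lower bucket.
dominant_drop_duration() {
    awk '
        NF { b = int($1 / 10) * 10; n[b]++; total++ }
        END {
            if (total == 0) { print "no data"; exit }
            for (b in n)
                if (n[b] > best || (n[b] == best && b + 0 < top + 0)) { best = n[b]; top = b }
            printf "%d-%ds: %d of %d calls\n", top, top + 9, best, total
        }
    '
}
```

If most dropped calls land in one bucket such as 1800-1809s, a timer is the likely cause; an even spread points toward the NAT or network rows of the table.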

4.4 Trunk UNREACHABLE

Symptom: sip show peers shows the trunk as UNREACHABLE. All outbound calls through that trunk fail.

Diagnosis:

# 1. Check trunk status
asterisk -rx 'sip show peers' | grep TRUNK_NAME

# 2. Ping the trunk provider
ping -c 5 TRUNK_PROVIDER_IP

# 3. Check if the IP is whitelisted in the firewall
iptables -S INPUT | grep TRUNK_PROVIDER_IP

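# 4. Check if SIP port 5060 is reachable
# (UDP is connectionless: nc reports success unless an ICMP "port unreachable"
#  comes back, so treat a pass as inconclusive and confirm with step 6)
nc -zvu TRUNK_PROVIDER_IP 5060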

# 5. Check DNS resolution (if trunk uses hostname)
dig TRUNK_HOSTNAME

# 6. Manually send a SIP OPTIONS to test connectivity
sipvicious_svmap TRUNK_PROVIDER_IP  # or use sipsak if installed
sipsak -vv -s sip:TRUNK_PROVIDER_IP:5060

Fix by cause:

Cause | Fix
Provider is down | Contact provider; route traffic to backup trunk
IP address changed | Update sip-vicidial.conf with new IP; reload SIP
Firewall blocking | Add iptables -I INPUT -s TRUNK_IP -j ACCEPT; save rules
DNS resolution failure | Use IP instead of hostname; fix /etc/resolv.conf
SIP credentials expired | Update secret= in trunk config; reload SIP
Provider requires re-registration | Add register => line to sip.conf; reload

# After fixing, reload SIP configuration (does not drop active calls):
asterisk -rx 'sip reload'

# Verify the trunk comes back:
asterisk -rx 'sip show peer TRUNK_NAME' | grep Status

4.5 Codec Mismatches

Symptom: Calls connect (SIP 200 OK) but there is no audio. Or audio is distorted with clicking/popping sounds.

Diagnosis:

# Check what codecs the trunk supports vs what it is offered
asterisk -rx 'sip show peer TRUNK_NAME' | grep -A5 Codecs

# Check active call codec negotiation
asterisk -rx 'core show channels verbose' | grep -E 'Codec|Format'

# Look for transcoding (uses CPU, degrades quality)
asterisk -rx 'core show translation'

Fix:

; Ensure trunk and agent peers agree on codecs
; In the trunk peer definition:
[trunk_provider]
disallow=all
allow=alaw          ; G.711 A-law (European standard)
allow=ulaw          ; G.711 Mu-law (North American standard)
allow=g729          ; G.729 (bandwidth-efficient, requires license)

; In the agent peer definition -- match the trunk:
[agent_100]
disallow=all
allow=alaw
allow=ulaw

Best practices for codec ordering:


4.6 NAT and Firewall Issues

Symptom: Agents behind home routers or corporate firewalls experience one-way audio, registration drops, or LAGGED status.

Comprehensive NAT checklist:

# 1. Check the agent's registered address. With NAT traversal working it shows the
#    router's public IP; a private address (10.x, 172.16-31.x, 192.168.x) means NAT is not being handled
asterisk -rx 'sip show peer AGENT_EXT' | grep 'Addr->IP'

# 2. Check if the SIP ALG is interfering (common on consumer routers)
# The agent needs to disable SIP ALG on their router
# Signs of SIP ALG: mangled Contact headers, wrong port in Via

# 3. Verify NAT settings in the peer config
grep -A 20 '\[AGENT_EXT\]' /etc/asterisk/sip-vicidial.conf | grep nat

# 4. Check if RTP ports are open
iptables -L INPUT -n | grep -E "udp.*(10000|20000)"

Required peer settings for NAT'd agents:

[agent_100]
nat=force_rport,comedia
qualify=yes
qualifyfreq=25
directmedia=no
; directmedia=no forces ALL media through Asterisk.
; Without this, Asterisk may try to send RTP directly between
; the trunk and agent -- which fails if the agent is behind NAT.

Required global settings:

; In [general] section of sip.conf:
localnet=10.0.0.0/8          ; Define your local networks
localnet=172.16.0.0/12
localnet=192.168.0.0/16
externaddr=YOUR_PUBLIC_IP     ; Your server's public IP

Agent-side fixes:

  1. Disable SIP ALG on the router (Application Layer Gateway -- it mangles SIP headers)
  2. Move SIP to a non-standard port if 5060 is blocked or rewritten (some ISPs block 5060)
  3. Enable STUN in the softphone if available
  4. Use TCP for SIP signaling if UDP is unreliable (some firewalls drop UDP after timeout)

4.7 High Jitter and Choppy Audio

Symptom: Audio cuts in and out, sounds robotic, or has an "underwater" quality.

Diagnosis:

# Check live RTP stats for all active calls
asterisk -rx 'sip show channelstats'

# Check for IAX2 jitter buffer issues (if using IAX trunks or ConfBridge with IAX2)
grep -c 'Resyncing the jb' /var/log/asterisk/messages

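# Check today's jitter buffer resyncs
# (assumes Asterisk's default log timestamp "%b %e %T"; %e is the space-padded day
#  of month, so %d would miss days 1-9. Adjust if logger.conf sets another dateformat)
grep "$(date '+%b %e')" /var/log/asterisk/messages | grep -c 'Resyncing the jb'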

Jitter buffer resync thresholds:

Count (per day) | Severity | Action
< 10 | Normal | No action needed
10 - 100 | Warning | Monitor; check agent connections
100 - 1000 | Problem | Identify affected channels; check clock sync
> 1000 | Critical | Likely clock desync in ConfBridge/MeetMe IAX2 loopback
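
The severity bands can be wrapped in a small function so the daily count feeds straight into a report or alert. This is a sketch only: the table's bands share their edge values, so the choice here to treat exactly 100 as "problem" and exactly 1000 as "problem" is an assumption, and the function name `resync_severity` is hypothetical.

```shell
# Hypothetical helper: map a daily "Resyncing the jb" count to the severity
# bands in the table above.
#   $1 = number of resync log lines counted for one day
resync_severity() {
    if   [ "$1" -gt 1000 ]; then echo "critical"
    elif [ "$1" -ge 100 ];  then echo "problem"
    elif [ "$1" -ge 10 ];   then echo "warning"
    else                         echo "normal"
    fi
}
```

It would typically be called with the output of one of the `grep -c 'Resyncing the jb'` commands above as its argument.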

Fix for IAX2 jitter buffer issues:

; In iax.conf, disable the jitter buffer for the local loopback:
[general]
jitterbuffer=no
forcejitterbuffer=no

Fix for agent-side jitter:

The jitter is happening on the agent's internet connection. You cannot fix their ISP, but you can mitigate:

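; Enable the Asterisk-side jitter buffer. In chan_sip these options are read
; from the [general] section of sip.conf, not from individual peer sections.
[general]
jbenable=yes        ; Required; the other jb* options have no effect without it
jbimpl=adaptive
jbmaxsize=200       ; Maximum jitter buffer in milliseconds
jbresyncthreshold=1000
jblog=no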

4.8 DISMX / DCMX Dispositions

Symptom: Calls show disposition DISMX (inbound) or DCMX (outbound) -- indicating an abnormal disconnect by the system rather than by either party.

What these mean:

Code | Full name | Meaning
DISMX | Disconnect - Manager eXternal | Inbound call disconnected abnormally (not by caller or agent)
DCMX | Disconnect - Campaign Manager eXternal | Outbound call disconnected abnormally

Common causes:

Investigation:

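-- Check if DISMX/DCMX correlates with specific agents
-- (DCMX is the outbound status; repeat these queries against vicidial_log to cover outbound calls)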
SELECT user, COUNT(*) AS abnormal_count
FROM vicidial_closer_log
WHERE status IN ('DISMX', 'DCMX')
  AND call_date > NOW() - INTERVAL 7 DAY
GROUP BY user
ORDER BY abnormal_count DESC
LIMIT 10;
-- If one agent has significantly more than others, it is their connection

-- Check if DISMX/DCMX correlates with time of day
SELECT HOUR(call_date) AS hour, COUNT(*) AS count
FROM vicidial_closer_log
WHERE status IN ('DISMX', 'DCMX')
  AND call_date > NOW() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;
-- Spikes at specific hours suggest network congestion patterns

4.9 Conference Bridge Issues

Symptom: Agents join a conference bridge but hear echo, garbled audio, or no audio at all. MeetMe/ConfBridge conferences show zombie channels.

Diagnosis:

# Check active conferences
asterisk -rx 'confbridge list'          # For ConfBridge
asterisk -rx 'meetme list'              # For MeetMe

# Check for zombie conferences (conferences with 0 or 1 participant that persist)
asterisk -rx 'confbridge list' | awk '$2 <= 1 {print}'

# Check DAHDI timing (required for MeetMe)
asterisk -rx 'dahdi show status'
# Should show timer device with accuracy close to 100%

# Check for stuck channels in conferences
asterisk -rx 'core show channels' | grep -i conf

Fix zombie conferences:

# Kick all users from a specific conference
asterisk -rx 'confbridge kick CONF_NUMBER all'

# Or for MeetMe:
asterisk -rx 'meetme kick CONF_NUMBER all'

Fix DAHDI timing (MeetMe):

# Check if DAHDI timer module is loaded
lsmod | grep dahdi

# If not loaded:
modprobe dahdi
dahdi_genconf
dahdi_cfg -vv

# Verify timing accuracy
cat /proc/dahdi/timer

4.10 Recording Failures

Symptom: Calls are not being recorded, or recordings are 0 bytes, or recordings contain only silence.

Diagnosis:

# Check disk space (most common cause)
df -h /var/spool/asterisk/monitorDONE/
# Treat > 90% as urgent: once the volume fills up, recordings fail silently

# Check recording directory permissions
ls -la /var/spool/asterisk/monitor/
# Owner should be asterisk:asterisk with write permission

# Check for 0-byte recording files today
find /var/spool/asterisk/monitorDONE/$(date +%Y%m%d)/ -size 0 -name "*.wav" | wc -l

# Check Asterisk recording settings
asterisk -rx 'core show settings' | grep -i record

# Verify MixMonitor/Monitor is running on active calls
asterisk -rx 'core show channels verbose' | grep -i mix

Common fixes:

Cause | Fix
Disk full | Clean old recordings: find /var/spool/asterisk/monitorDONE/ -mtime +90 -name "*.wav" -delete
Wrong permissions | chown -R asterisk:asterisk /var/spool/asterisk/monitor/
Missing sox/lame | Install: yum install sox or zypper install sox
Recording format wrong | Check mixmon_format in ViciDial system settings
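
Because a full disk is the most common cause, the `df` check is worth scripting so it can run from cron. This is a minimal sketch, assuming POSIX-format output from `df -P` (one header line, then one line per filesystem, with the usage percentage in the fifth column); the function name `disk_usage_status` and the 90% default are illustrative.

```shell
# Hypothetical helper: reads `df -P <path>` output on stdin and reports whether
# usage has reached the threshold.
#   $1 = threshold in percent (defaults to 90)
disk_usage_status() {
    awk -v limit="${1:-90}" '
        NR == 2 {
            use = $5; sub(/%/, "", use)          # "95%" -> "95"
            if (use + 0 >= limit + 0) print "urgent " use "%"
            else                      print "ok " use "%"
        }
    '
}
```

A typical call would be `df -P /var/spool/asterisk/monitorDONE/ | disk_usage_status 90`.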

5. Decision Trees

5.1 "Caller Reports No Audio" Flowchart

CALLER REPORTS NO AUDIO
│
├── Is it ONE-WAY or BOTH-WAY silence?
│   │
│   ├── ONE-WAY (caller hears agent but agent can't hear caller, or vice versa)
│   │   │
│   │   ├── Check: Is agent behind NAT?
│   │   │   ├── YES → Verify nat=force_rport,comedia and directmedia=no
│   │   │   │         Also check: Is SIP ALG disabled on agent's router?
│   │   │   └── NO  → Check firewall: Are RTP ports (10000-20000 UDP) open?
│   │   │
│   │   ├── Check: Does carrier SDP show private IP in c= line?
│   │   │   ├── YES → Add nat=force_rport,comedia to trunk peer config
│   │   │   └── NO  → Check codec negotiation in SDP INVITE vs 200 OK
│   │   │
│   │   └── Check: Is strictrtp enabled?
│   │       ├── YES → Try strictrtp=seqno or strictrtp=no temporarily
│   │       └── NO  → Escalate: capture RTP with tcpdump, analyze packet flow
│   │
│   └── BOTH-WAY (complete silence for both parties)
│       │
│       ├── Check: Are there ANY RTP packets flowing?
│       │   ├── NO  → Firewall is blocking ALL RTP. Check iptables UDP rules.
│       │   └── YES → Codec mismatch. Check 'core show translation' for errors.
│       │
│       ├── Check: Is the media IP reachable?
│       │   ├── NO  → Carrier or agent has wrong externaddr/externip
│       │   └── YES → Check if directmedia=yes is causing media bypass issues
│       │
│       └── Check: Did the call actually connect? (200 OK + ACK in Homer)
│           ├── NO  → SIP signaling issue, not media issue
│           └── YES → Media path is broken. Capture and compare SDP from both legs.

5.2 "Calls Keep Dropping" Flowchart

CALLS KEEP DROPPING
│
├── Check: Are ALL calls dropping or just some?
│   │
│   ├── ALL CALLS (every call on the system)
│   │   │
│   │   ├── Check: Is Asterisk running?
│   │   │   ├── NO  → Restart: systemctl start asterisk
│   │   │   │         Check /var/log/asterisk/ for crash dumps
│   │   │   └── YES → Check server load: uptime, top, df -h
│   │   │
│   │   ├── Check: Is disk full?
│   │   │   ├── YES → Emergency cleanup of old recordings/logs
│   │   │   └── NO  → Check: Are ALL trunks UNREACHABLE?
│   │   │
│   │   └── Check: Is there a network outage?
│   │       ├── YES → Contact datacenter/ISP
│   │       └── NO  → Check Asterisk error log for repeated errors
│   │
│   ├── CALLS ON ONE TRUNK ONLY
│   │   │
│   │   ├── Check: sip show peers | grep TRUNK_NAME
│   │   │   ├── UNREACHABLE → Trunk is down (see Section 4.4)
│   │   │   └── OK          → Trunk is up but failing
│   │   │
│   │   ├── Check: carrier_log for sip_hangup_cause patterns
│   │   │   ├── 503 consistently → Carrier is overloaded
│   │   │   ├── 403 consistently → Authentication failure
│   │   │   └── Various codes   → Check each code in Section 6
│   │   │
│   │   └── Contact the carrier with Call-IDs and timestamps
│   │
│   └── CALLS FOR ONE AGENT ONLY
│       │
│       ├── Check: sip show peer AGENT_EXT
│       │   ├── LAGGED      → Agent's internet is slow (Section 4.2)
│       │   ├── UNREACHABLE → Agent disconnected
│       │   └── OK (high ms)→ Marginal connection, drops under load
│       │
│       ├── Check: Does the agent's call always drop at the same duration?
│       │   ├── YES (e.g., always at 30 min) → SIP session timer issue
│       │   └── NO (random times)            → Likely NAT mapping timeout; check qualify/keepalive settings
│       │
│       └── Ask agent: Are they on WiFi, VPN, or shared internet?
│           Any of these can cause intermittent drops.
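
To answer "does the call always drop at the same duration?" without eyeballing rows, the call lengths can be bucketed by minute. This is a minimal sketch; `drop_duration_mode` is a hypothetical helper that expects one length in seconds per line, such as the length_in_sec column of vicidial_closer_log.

```shell
# Hypothetical helper: report the most common call length, rounded to the
# nearest minute, and what share of the calls fall into that bucket.
drop_duration_mode() {
    awk '
        /^[0-9]+$/ { bucket = int(($1 + 30) / 60); count[bucket]++; total++ }
        END {
            if (total == 0) { print "no data"; exit }
            best = ""
            for (b in count) if (best == "" || count[b] > count[best]) best = b
            printf "%d min: %d of %d calls (%d%%)\n", best, count[best], total, count[best] * 100 / total
        }'
}
```

Usage, with placeholder names: `mysql -N DATABASE -e "SELECT length_in_sec FROM vicidial_closer_log WHERE user='AGENT' AND call_date >= CURDATE()" | drop_duration_mode`. A large share in one bucket suggests a timer; an even spread suggests the random-drop branch.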

5.3 "Trunk Down" Flowchart

TRUNK IS UNREACHABLE
│
├── Step 1: Can you ping the trunk IP?
│   │
│   ├── NO (100% loss)
│   │   │
│   │   ├── Check: Has the provider's IP changed?
│   │   │   ├── dig PROVIDER_HOSTNAME → Compare with config
│   │   │   └── If changed → Update sip-vicidial.conf, reload SIP
│   │   │
│   │   ├── Check: Is the IP blocked by your firewall?
│   │   │   ├── iptables -S INPUT | grep PROVIDER_IP
│   │   │   └── If missing → Add: iptables -I INPUT -s IP -j ACCEPT
│   │   │
│   │   └── Provider may be down → Check their status page
│   │       └── Route traffic to backup trunk while waiting
│   │
│   └── YES (ping works)
│       │
│       ├── Check: Can you reach SIP port 5060?
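│       │   ├── nc -zvu PROVIDER_IP 5060
│       │   │   (UDP probes are only a hint: "open" just means no ICMP reject came back)
│       │   ├── If CLOSED → Provider firewall is blocking you
│       │   │   └── Verify your server's IP is whitelisted with provider
│       │   └── If OPEN → Port looks reachable but qualify fails; confirm with tcpdump that OPTIONS get replies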
│       │
│       ├── Check: Is the SIP registration failing?
│       │   ├── asterisk -rx 'sip show registry'
│       │   ├── If "Rejected" → Wrong username/password
│       │   ├── If "Timeout"  → Network issue or wrong port
│       │   └── If "No Registry" → This trunk does not use registration
│       │
│       └── Check: Is there a TLS/SRTP mismatch?
│           └── Try connecting without encryption to test
│
├── Step 2: Do you have a backup trunk?
│   │
│   ├── YES → Reroute traffic via dialplan or ViciDial carrier settings
│   └── NO  → This is a single point of failure. Plan for redundancy.
│
└── Step 3: Document and notify
    ├── Record the time, trunk name, and symptoms
    ├── Open a ticket with the carrier
    └── Set up monitoring to alert on trunk state changes (see Tutorial 01)
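
Step 1 of the flowchart can be turned into a repeatable check by parsing the ping summary line. The helpers below are a sketch under the assumption that ping prints the usual "N% packet loss" summary; `ping_loss_pct` and `trunk_ping_verdict` are hypothetical names.

```shell
# Hypothetical helpers for Step 1 of the trunk-down flowchart.

# Extract the packet-loss percentage from ping output read on stdin.
ping_loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p' | head -n 1
}

# Map the loss figure onto the first branch of the flowchart.
trunk_ping_verdict() {
    case "$1" in
        "")        echo "unknown: no ping summary found" ;;
        100|100.0) echo "NO: check DNS, firewall, and provider status" ;;
        0|0.0)     echo "YES: ping works, test SIP port and registration next" ;;
        *)         echo "PARTIAL: ${1}% loss, run mtr to locate the lossy hop" ;;
    esac
}
```

Usage: `trunk_ping_verdict "$(ping -c 10 TRUNK_IP | ping_loss_pct)"`.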

6. SIP Response Code Reference

1xx -- Provisional (Information)

| Code | Name | What It Means |
| --- | --- | --- |
| 100 | Trying | The request has been received and is being processed. Next-hop server is working on it. |
| 180 | Ringing | The destination phone is ringing. The caller should hear ring-back tone. |
| 181 | Call is Being Forwarded | The call is being redirected to another destination. |
| 182 | Queued | The call has been queued because the destination is temporarily unavailable. |
| 183 | Session Progress | Early media is available (e.g., the carrier is playing an in-band ringback or announcement). |
| 199 | Early Dialog Terminated | A provisional dialog was terminated before the final response. |

2xx -- Success

| Code | Name | What It Means |
| --- | --- | --- |
| 200 | OK | The request was successful. For INVITE, this means the call was answered. For REGISTER, registration accepted. |
| 202 | Accepted | The request has been accepted for processing (used for REFER/MESSAGE). |
| 204 | No Notification | The request was successful, but the notification normally associated with it will not be sent. |

3xx -- Redirection

| Code | Name | What It Means |
| --- | --- | --- |
| 300 | Multiple Choices | The destination can be reached at multiple addresses; the caller should choose. |
| 301 | Moved Permanently | The user is no longer at this address. Update your records. |
| 302 | Moved Temporarily | The user is temporarily at a different address. Try the Contact header. |
| 305 | Use Proxy | The request must be routed through the specified proxy. |
| 380 | Alternative Service | The call failed but an alternative service (e.g., voicemail) is available. |

4xx -- Client Error (Your Side)

| Code | Name | What It Means | Common Cause |
| --- | --- | --- | --- |
| 400 | Bad Request | Malformed SIP message. | Broken SIP ALG, buggy softphone, or truncated packet. |
| 401 | Unauthorized | Authentication required (used by registrars). | Missing or wrong secret= in peer config. |
| 403 | Forbidden | The server understood the request but refuses to fulfill it. | IP not whitelisted, credentials revoked, or calling number blocked. |
| 404 | Not Found | The destination user/number does not exist. | Wrong number, DID not configured on trunk, or typo in dialplan. |
| 405 | Method Not Allowed | The SIP method (e.g., INFO, MESSAGE) is not supported. | Trying to use a method the proxy does not implement. |
| 406 | Not Acceptable | The response content is not acceptable (based on Accept header). | Codec or content-type negotiation failure. |
| 407 | Proxy Authentication Required | Authentication required by the proxy. | Similar to 401, but from an intermediate proxy. |
| 408 | Request Timeout | The server could not respond in time. | Network latency, overloaded server, or DNS timeout. |
| 410 | Gone | The user existed but is no longer available at this URI. | Deactivated account or ported number. |
| 412 | Conditional Request Failed | The SIP-If-Match precondition of a PUBLISH request failed. | Stale or unknown entity-tag in an event-state publication. Rare on call traffic. |
| 413 | Request Entity Too Large | The SIP message body is too large. | Oversized SDP with too many codec lines. |
| 415 | Unsupported Media Type | The server does not support the content type. | Wrong SDP format or non-SDP body. |
| 416 | Unsupported URI Scheme | The URI scheme (e.g., tel:) is not supported. | Use sip: instead of tel: for the destination. |
| 420 | Bad Extension | The server does not support a required SIP extension. | Remove unsupported Require: headers. |
| 421 | Extension Required | The server needs a specific extension that the client did not provide. | Add the required extension. |
| 422 | Session Interval Too Small | The Session-Expires value is too small. | Increase session-timers interval in sip.conf. |
| 423 | Interval Too Brief | The registration expiry time is too short. | Increase the Expires value in REGISTER. |
| 424 | Bad Location Information | The location information in the request is malformed. | E911/location service configuration issue. |
| 428 | Use Identity Header | The server requires an Identity header for authentication. | STIR/SHAKEN configuration required. |
| 429 | Provide Referrer Identity | A Referred-By header is needed for REFER. | Add Referred-By when doing attended transfers. |
| 433 | Anonymity Disallowed | The server does not accept anonymous calls. | Remove Privacy header or present valid caller ID. |
| 436 | Bad Identity-Info | The Identity-Info header URI is invalid. | STIR/SHAKEN certificate issue. |
| 437 | Unsupported Certificate | The certificate used for Identity validation is not supported. | Update STIR/SHAKEN certificates. |
| 438 | Invalid Identity Header | The Identity header is present but invalid. | STIR/SHAKEN signing issue. |
| 439 | First Hop Lacks Outbound Support | The first proxy does not support the outbound extension. | Proxy configuration issue. |
| 440 | Max-Breadth Exceeded | The server cannot fork the request to more destinations. | Too many simultaneous ring targets. |
| 469 | Bad Info Package | The Info-Package header references an unknown package. | Remove unsupported INFO packages. |
| 470 | Consent Needed | The server requires consent for this operation. | User consent/privacy configuration. |
| 480 | Temporarily Unavailable | The user is registered but currently not answering. | Agent busy, DND enabled, or phone in power-save mode. |
| 481 | Call/Transaction Does Not Exist | The server received a BYE or CANCEL for a nonexistent call. | Race condition, or the call was already terminated. |
| 482 | Loop Detected | The server detected a routing loop. | Misconfigured proxy or dialplan creating circular route. |
| 483 | Too Many Hops | The Max-Forwards counter reached zero. | Routing loop or excessively deep proxy chain. |
| 484 | Address Incomplete | The destination address is too short or incomplete. | Missing digits in dialed number. |
| 485 | Ambiguous | The destination address matches multiple users. | Ambiguous number routing configuration. |
| 486 | Busy Here | The destination is busy. | Agent is on another call. |
| 487 | Request Terminated | The INVITE was cancelled by a CANCEL request. | Caller hung up before the call was answered. Normal. |
| 488 | Not Acceptable Here | No common codec or media capability could be negotiated. | Codec mismatch between trunk and destination. Fix allow= settings. |
| 489 | Bad Event | The Event header references an unknown event package. | Subscription event type not supported. |
| 491 | Request Pending | The server has a pending request and cannot process another. | Re-INVITE collision. Usually resolves automatically. |
| 493 | Undecipherable | The server cannot decrypt S/MIME body. | Encryption key mismatch. |
| 494 | Security Agreement Required | A security mechanism negotiation is needed. | TLS/IPSEC configuration required. |

5xx -- Server Error (Their Side)

| Code | Name | What It Means | Common Cause |
| --- | --- | --- | --- |
| 500 | Server Internal Error | The server encountered an unexpected condition. | Carrier software crash or overload. |
| 501 | Not Implemented | The server does not support the requested functionality. | SIP method not implemented by the carrier. |
| 502 | Bad Gateway | The server received an invalid response from a downstream server. | Carrier's upstream route is broken. |
| 503 | Service Unavailable | The server is temporarily unable to handle the request. | Carrier overloaded, maintenance, or all circuits busy. |
| 504 | Server Time-out | The server did not receive a response from a downstream server. | Carrier's upstream provider is not responding. |
| 505 | Version Not Supported | The SIP version in the request is not supported. | Version mismatch (rare -- almost everything is SIP/2.0). |
| 513 | Message Too Large | The SIP message exceeds the server's maximum size limit. | Oversized request body. Reduce codec offerings. |
| 555 | Push Notification Service Not Supported | The push notification service is not available. | Mobile push notification issue. |
| 580 | Precondition Failure | A required precondition (QoS, security) was not met. | QoS reservation failure. |

6xx -- Global Failure

| Code | Name | What It Means | Common Cause |
| --- | --- | --- | --- |
| 600 | Busy Everywhere | The user is busy at all known locations. | All of the user's devices are in use. |
| 603 | Decline | The user explicitly declined the call. | Call rejection or DND. |
| 604 | Does Not Exist Anywhere | The destination number does not exist on any server. | Invalid or decommissioned number. |
| 606 | Not Acceptable | The user's capabilities do not match the request. | No compatible codecs or media types. |
| 607 | Unwanted | The call has been identified as unwanted (spam). | STIR/SHAKEN attestation or spam filter. |
| 608 | Rejected | The call was rejected by a policy or intermediary. | Carrier-level call blocking. |
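
When a carrier log shows a mix of codes, it helps to group them by class before looking up each one. The sketch below follows the section headings above ("your side" for 4xx, "their side" for 5xx); `sip_code_class` and `tally_sip_codes` are hypothetical helper names.

```shell
# Hypothetical helpers: classify SIP response codes by their first digit.
sip_code_class() {
    case "$1" in
        1[0-9][0-9]) echo "provisional" ;;
        2[0-9][0-9]) echo "success" ;;
        3[0-9][0-9]) echo "redirection" ;;
        4[0-9][0-9]) echo "client error (your side)" ;;
        5[0-9][0-9]) echo "server error (their side)" ;;
        6[0-9][0-9]) echo "global failure" ;;
        *)           echo "not a SIP response code" ;;
    esac
}

# Read one code per line on stdin and print a count per class, largest first.
tally_sip_codes() {
    while read -r code; do
        sip_code_class "$code"
    done | sort | uniq -c | sort -rn
}
```

Usage, with placeholder names: `mysql -N DATABASE -e "SELECT sip_hangup_cause FROM vicidial_carrier_log WHERE call_date > NOW() - INTERVAL 1 HOUR" | tally_sip_codes`.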

7. Asterisk Hangup Cause Reference

These are the Q.850 cause codes (the values carried in Q.931/PRI signaling) that Asterisk uses internally. They are available in the h extension through the HANGUPCAUSE channel variable and are stored in the hangup_cause field of vicidial_carrier_log.

Normal Operation (1-31)

| Cause | Name | What It Means | Action |
| --- | --- | --- | --- |
| 1 | Unallocated Number | The dialed number is not assigned to any route. | Check the number is valid; verify trunk routing. |
| 2 | No Route to Network | The carrier cannot find a route to the destination network. | Carrier routing issue. Try another trunk. |
| 3 | No Route to Destination | The carrier cannot route to the specific destination. | Number may be invalid for that carrier or region. |
| 4 | Send Special Information Tone | An operator intercept recording should be played. | Usually means the number is disconnected. |
| 5 | Misdialed Trunk Prefix | The trunk prefix (e.g., international dialing code) is wrong. | Fix the dial pattern in the outbound route. |
| 6 | Channel Unacceptable | The requested channel cannot be used. | Try a different channel or trunk. |
| 7 | Call Awarded and Being Delivered | The call is being connected (used in interworking). | Informational; no action needed. |
| 8 | Preemption | A higher-priority call preempted this one. | Rare; usually only in military/government networks. |
| 9 | Preemption - Circuit Reserved | The circuit was reserved for a higher-priority call. | Same as above. |
| 16 | Normal Clearing | The call was hung up normally by one of the parties. | This is the expected hangup cause for normal calls. |
| 17 | User Busy | The destination phone is busy. | Normal -- the person is on another call. |
| 18 | No User Responding | The destination did not respond to the call setup at all within the allowed time. | The device may be offline or unreachable. Check registration and the network path. |
| 19 | No Answer from User | The destination was alerted (the phone rang) but nobody answered. | Normal -- ring timeout expired. Check ring timeout. |
| 20 | Subscriber Absent | The destination user is not registered/reachable. | Mobile is off, or SIP user not registered. |
| 21 | Call Rejected | The destination explicitly rejected the call. | The callee pressed reject, or a call screening rule blocked it. |
| 22 | Number Changed | The number has been changed to a new number. | Update your records with the new number. |
| 23 | Redirection to New Destination | The call is being redirected. | The call is being forwarded. |
| 25 | Exchange Routing Error | A routing error occurred in the exchange. | Carrier-side routing misconfiguration. |
| 26 | Non-Selected User Clearing | The called party was not selected (hunt group scenario). | Normal for hunt groups and ring groups. |
| 27 | Destination Out of Order | The destination is unreachable due to a fault. | Could be a dead phone, severed line, or crashed PBX. |
| 28 | Invalid Number Format | The number format is invalid (too short, wrong prefix). | Fix the dial pattern. Check country code and number length. |
| 29 | Facility Rejected | A requested facility (e.g., call transfer) was rejected. | The network does not support the requested feature. |
| 30 | Response to STATUS ENQUIRY | Informational response to a status check. | Informational; no action needed. |
| 31 | Normal, Unspecified | Normal call clearing with no specific reason. | Usually benign -- similar to cause 16. |

Resource Issues (34-47)

| Cause | Name | What It Means | Action |
| --- | --- | --- | --- |
| 34 | No Circuit/Channel Available | All circuits to the destination are busy. | Trunk capacity exhausted. Wait or add more channels. |
| 38 | Network Out of Order | The network is experiencing a failure. | Major carrier issue. Switch to backup trunk. |
| 41 | Temporary Failure | A temporary failure occurred in the network. | Retry the call. If persistent, contact carrier. |
| 42 | Switching Equipment Congestion | The switching equipment is overloaded. | Carrier is overloaded. Reduce call volume or use alt trunk. |
| 43 | Access Information Discarded | Required information was lost during transit. | Carrier interworking issue. |
| 44 | Requested Circuit Not Available | The specific requested circuit is not available. | Similar to 34. Use any available circuit. |
| 46 | Precedence Call Blocked | A higher-precedence call blocked this one. | Government/military networks only. |
| 47 | Resource Unavailable, Unspecified | Resources are not available (no specific detail). | General capacity issue. Retry or use alt trunk. |

Service Issues (49-69)

| Cause | Name | What It Means | Action |
| --- | --- | --- | --- |
| 49 | Quality of Service Unavailable | The requested QoS cannot be provided. | Network cannot guarantee quality. Try without QoS. |
| 50 | Requested Facility Not Subscribed | The requested feature is not part of your subscription. | Enable the feature with your carrier. |
| 55 | Incoming Calls Barred within CUG | Incoming calls are blocked for this group. | Call restriction setting on the destination. |
| 57 | Bearer Capability Not Authorized | You are not authorized for this type of call (e.g., data). | Check your service subscription with the carrier. |
| 58 | Bearer Capability Not Available | The requested bearer capability is not available. | The circuit type does not support this call type. |
| 63 | Service/Option Not Available | The service is not available (unspecified reason). | Contact carrier for details. |
| 65 | Bearer Capability Not Implemented | The requested bearer type is not implemented. | Use a different call type or codec. |
| 66 | Channel Type Not Implemented | The requested channel type is not supported. | Use a different channel type. |
| 69 | Requested Facility Not Implemented | The requested network feature is not implemented. | Feature not available on this network. |

Invalid Messages (79-100)

| Cause | Name | What It Means | Action |
| --- | --- | --- | --- |
| 79 | Service/Option Not Implemented | The service option is valid but not implemented. | Feature request to carrier. |
| 81 | Invalid Call Reference | The call reference value is not valid. | Protocol error -- usually a software bug. |
| 82 | Identified Channel Does Not Exist | The referenced channel does not exist. | Configuration mismatch or timing issue. |
| 83 | A Suspended Call Exists | There is already a suspended call on this reference. | Resume the existing call first. |
| 84 | Call Identity In Use | The call identity is already in use. | Race condition in call setup. |
| 85 | No Call Suspended | There is no call to resume. | The call was already terminated. |
| 86 | Call Has Been Cleared | The referenced call has already been cleared. | Normal in some race conditions. |
| 87 | User Not Member of CUG | The user is not part of the Closed User Group. | Add the user to the group. |
| 88 | Incompatible Destination | The destination is incompatible with this call type. | Codec or service mismatch. |
| 95 | Invalid Message, Unspecified | An invalid message was received (no specific detail). | Protocol error. Check SIP message format. |
| 96 | Mandatory IE Missing | A required information element is missing from the message. | Broken SIP implementation. Update software. |
| 97 | Message Type Non-Existent | The message type is not recognized. | Protocol version mismatch. |
| 98 | Message Type Incompatible | The message type is incompatible with the call state. | State machine error. Usually a bug. |
| 99 | IE Non-Existent or Not Implemented | An information element is not recognized or implemented. | Usually non-fatal. May cause feature limitation. |
| 100 | Invalid IE Contents | An information element has invalid content. | Corrupted or malformed message. |

Protocol Errors (101-127)

| Cause | Name | What It Means | Action |
| --- | --- | --- | --- |
| 101 | Message Not Compatible with Call State | The message was received at the wrong time. | State machine error. Usually resolves on retry. |
| 102 | Recovery on Timer Expiry | A protocol timer expired and recovery was attempted. | Network congestion or slow processing. |
| 103 | Parameter Non-Existent | A parameter does not exist (passed through). | Interworking issue between networks. |
| 111 | Protocol Error, Unspecified | A protocol error occurred (no specific detail). | Check logs for more detail. May need software update. |
| 127 | Interworking, Unspecified | An error occurred at the boundary between networks. | Carrier gateway issue. Contact carrier. |
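
For quick triage, a cause number can be mapped to the group headings used in this section. The function below is a sketch that mirrors those ranges exactly as listed here; `hangup_cause_group` is a hypothetical name, and numbers that fall between the listed ranges are reported as "unlisted".

```shell
# Hypothetical helper: map a hangup cause to this section's group headings.
hangup_cause_group() {
    local c="$1"
    # Reject empty or non-numeric input before doing integer comparisons.
    case "$c" in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    if   [ "$c" -ge 1 ]   && [ "$c" -le 31 ];  then echo "normal operation"
    elif [ "$c" -ge 34 ]  && [ "$c" -le 47 ];  then echo "resource issue"
    elif [ "$c" -ge 49 ]  && [ "$c" -le 69 ];  then echo "service issue"
    elif [ "$c" -ge 79 ]  && [ "$c" -le 100 ]; then echo "invalid message"
    elif [ "$c" -ge 101 ] && [ "$c" -le 127 ]; then echo "protocol error"
    else echo "unlisted"
    fi
}
```

Usage: `hangup_cause_group 34` prints `resource issue`, which points at trunk capacity rather than at the dialed number.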

8. ViciDial Status and Term Reason Reference

Call Disposition Statuses

| Status | Full Name | Description |
| --- | --- | --- |
| A | Answering Machine | In a stock ViciDial install this is the agent disposition for an answering machine, not a generic "answered" flag. Check vicidial_statuses if your site has redefined it |
| DROP | Drop | Call was abandoned by the caller while waiting in queue |
| XDROP | Extended Drop | Call was dropped by the system after exceeding the max wait time |
| NANQUE | No Agent in Queue | Call arrived but no agents were logged in or available for the inbound group |
| AFTHRS | After Hours | Call arrived outside the configured business hours |
| DISMX | Disconnect Manager External | Inbound call disconnected abnormally (not by caller or agent) |
| DCMX | Disconnect Campaign Manager External | Outbound call disconnected abnormally |
| INCALL | In Call | Call is currently active (should not persist after call ends) |
| QUEUE | Queue | Call is currently waiting in queue (should not persist after call ends) |
| DISPO | Disposition | Agent is in the disposition screen for this call |

Term Reason Values

| Term Reason | Meaning | Diagnostic Value |
| --- | --- | --- |
| CALLER | The caller hung up | Normal -- customer ended the call |
| AGENT | The agent hung up | Normal -- agent ended the call after completing the interaction |
| NONE | No termination reason recorded | System event -- could indicate a crash, timeout, or unclean disconnect |
| ABANDON | Caller abandoned the queue | Caller hung up while waiting for an agent. Check queue times. |
| NOAGENT | No agent available | No agents were logged in or in READY state for this inbound group |
| AFTERHOURS | After hours routing triggered | Call came in outside business hours. Check after-hours config. |

9. Tool Commands Quick Reference

Asterisk CLI Commands

# === SIP STATUS ===
asterisk -rx 'sip show peers'                    # All SIP peers with status
asterisk -rx 'sip show peers' | grep LAGGED       # Find LAGGED agents
asterisk -rx 'sip show peers' | grep UNREACHABLE  # Find dead trunks/agents
asterisk -rx 'sip show peer EXTENSION'             # Detailed info for one peer
asterisk -rx 'sip show registry'                   # SIP trunk registrations
asterisk -rx 'sip show channelstats'               # Live RTP stats (loss, jitter)

# === CALLS & CHANNELS ===
asterisk -rx 'core show channels'                  # Active channels summary
asterisk -rx 'core show channels concise'          # Machine-readable channel list
asterisk -rx 'core show channels verbose'          # Detailed channel info
asterisk -rx 'core show channel SIP/peer-id'       # Single channel deep dive

# === CONFERENCES ===
asterisk -rx 'confbridge list'                     # ConfBridge conferences
asterisk -rx 'meetme list'                         # MeetMe conferences
asterisk -rx 'confbridge kick CONF all'            # Kick all from a conference

# === CODEC & MEDIA ===
asterisk -rx 'core show translation'               # Codec translation paths
asterisk -rx 'rtp show settings'                   # RTP configuration
asterisk -rx 'core show codecs'                    # Available codecs

# === DIAGNOSTICS ===
asterisk -rx 'core show uptime'                    # Asterisk uptime
asterisk -rx 'core show version'                   # Asterisk version
asterisk -rx 'core show settings'                  # Global settings
asterisk -rx 'core show sysinfo'                   # System resource info

# === DEBUGGING ===
asterisk -rx 'sip set debug on'                    # Enable SIP debug (VERBOSE)
asterisk -rx 'sip set debug off'                   # Disable SIP debug
asterisk -rx 'rtp set debug on'                    # Enable RTP debug (VERY VERBOSE)
asterisk -rx 'rtp set debug off'                   # Disable RTP debug

# === RELOAD (safe, does not drop calls) ===
asterisk -rx 'sip reload'                          # Reload SIP config
asterisk -rx 'dialplan reload'                     # Reload dialplan
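
The three grep commands under SIP STATUS each take a separate snapshot of the peer table. A single pass over one snapshot keeps the counts consistent with each other. This is a minimal sketch; `summarize_peers` is a hypothetical helper that reads 'sip show peers' output on stdin.

```shell
# Hypothetical helper: count peer states from one "sip show peers" snapshot.
summarize_peers() {
    awk '
        /LAGGED/      { lagged++ }
        /UNREACHABLE/ { unreachable++ }
        / OK \(/      { ok++ }
        END { printf "OK=%d LAGGED=%d UNREACHABLE=%d\n", ok, lagged, unreachable }'
}
```

Usage: `asterisk -rx 'sip show peers' | summarize_peers`.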

MySQL/MariaDB Diagnostic Queries

-- Active calls right now
SELECT COUNT(*) AS active_calls FROM vicidial_auto_calls;

-- Agents currently logged in
SELECT user, status, campaign_id, phone_login,
       TIMESTAMPDIFF(SECOND, last_update_time, NOW()) AS idle_sec
FROM vicidial_live_agents
ORDER BY status, idle_sec DESC;

-- Call volume by hour (today)
SELECT HOUR(call_date) AS hr, COUNT(*) AS calls
FROM vicidial_closer_log
WHERE call_date >= CURDATE()
GROUP BY hr ORDER BY hr;

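-- Trunk failure rate (last hour)
-- Note: SUBSTRING_INDEX cuts at the first hyphen, so trunk names that contain a hyphen are truncated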
SELECT SUBSTRING_INDEX(channel, '-', 1) AS trunk,
       COUNT(*) AS total,
       SUM(dialstatus != 'ANSWER') AS failed,
       ROUND(SUM(dialstatus != 'ANSWER') / COUNT(*) * 100, 1) AS fail_pct
FROM vicidial_carrier_log
WHERE call_date > NOW() - INTERVAL 1 HOUR
GROUP BY trunk ORDER BY fail_pct DESC;

-- Average queue time by inbound group (today)
SELECT campaign_id, COUNT(*) AS calls,
       ROUND(AVG(queue_seconds), 0) AS avg_queue_sec,
       MAX(queue_seconds) AS max_queue_sec
FROM vicidial_closer_log
WHERE call_date >= CURDATE()
GROUP BY campaign_id ORDER BY avg_queue_sec DESC;

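-- Abnormal disconnect trending (by day, last 7 days)
-- Note: this reads the inbound log only. DCMX is described in Section 8 as an outbound status,
-- so run the same query against vicidial_log to trend outbound disconnects.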
SELECT DATE(call_date) AS day,
       SUM(status = 'DISMX') AS dismx,
       SUM(status = 'DCMX') AS dcmx,
       COUNT(*) AS total_calls,
       ROUND((SUM(status IN ('DISMX','DCMX')) / COUNT(*)) * 100, 2) AS abnormal_pct
FROM vicidial_closer_log
WHERE call_date > NOW() - INTERVAL 7 DAY
GROUP BY day ORDER BY day;

Network and System Commands

# === NETWORK ===
ping -c 10 TRUNK_IP                       # Basic latency test
traceroute -n TRUNK_IP                     # Route path
mtr --report -c 50 TRUNK_IP               # Continuous route + loss report
nc -zvu TRUNK_IP 5060                      # Probe SIP port (UDP result is only a hint)
tcpdump -i eth0 -n port 5060 -c 100       # Capture 100 SIP packets
ngrep -W byline -d eth0 port 5060         # Real-time SIP message dump

# === SYSTEM HEALTH ===
uptime                                     # Load average
df -h                                      # Disk space
free -m                                    # Memory usage
top -bn1 | head -20                        # CPU/process overview
iostat -x 1 5                              # Disk I/O stats

# === LOG ANALYSIS ===
# Count Asterisk errors today
# (%e space-pads single-digit days, matching Asterisk's default dateformat "%b %e %T")
grep "$(date +'%b %e')" /var/log/asterisk/messages | grep -ci error

# Find all WARNING and ERROR lines in the last 100 lines
tail -100 /var/log/asterisk/messages | grep -iE 'WARNING|ERROR'

# Count jitter buffer resyncs (IAX2 audio issues)
grep -c 'Resyncing the jb' /var/log/asterisk/messages

# Find strict RTP source switches (NAT issues)
grep -c 'Strict RTP switching' /var/log/asterisk/messages

# === FIREWALL ===
iptables -L INPUT -n --line-numbers        # Show all INPUT rules with numbers
iptables -S INPUT | grep TRUNK_IP          # Check if trunk IP is whitelisted
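
The "is disk full?" check from the dropped-calls flowchart can be reduced to a one-line filter over df output. The sketch assumes the POSIX layout produced by `df -P` and mount points without spaces; `disk_over_threshold` is a hypothetical helper name and 90 is an arbitrary default.

```shell
# Hypothetical helper: read `df -P` output on stdin and print every mount
# point whose usage is at or above the given percentage (default 90).
disk_over_threshold() {
    local limit="${1:-90}"
    awk -v limit="$limit" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6, use "%"
        }'
}
```

Usage: `df -P | disk_over_threshold 90`. No output means no filesystem is above the threshold.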

10. Building Your Own Diagnostic CLI Tool

Typing these commands repeatedly is tedious and error-prone. Wrap your most-used diagnostics into a single shell script:

#!/bin/bash
# voip-diag -- VoIP Diagnostic Tool
# Usage: voip-diag <command>

case "${1:-help}" in

    # Show SIP peer status overview
    sip)
        echo "=== SIP Peers ==="
        # Take one snapshot so the listing and the counts agree
        peers="$(asterisk -rx 'sip show peers')"
        echo "$peers" | head -50
        echo ""
        echo "LAGGED:      $(echo "$peers" | grep -c LAGGED)"
        echo "UNREACHABLE: $(echo "$peers" | grep -c UNREACHABLE)"
        echo "OK:          $(echo "$peers" | grep -c 'OK (')"
        ;;

    # Show active calls
    calls)
        echo "=== Active Calls ==="
        asterisk -rx 'core show channels'
        ;;

    # Show logged-in agents
    agents)
        mysql -u USER -pPASSWORD DATABASE -e "
            SELECT user, status, campaign_id,
                   TIMESTAMPDIFF(SECOND, last_update_time, NOW()) AS idle_sec
            FROM vicidial_live_agents
            ORDER BY status, idle_sec DESC;" 2>/dev/null
        ;;

    # Show RTP channel statistics
    rtp)
        echo "=== RTP Channel Stats ==="
        asterisk -rx 'sip show channelstats'
        ;;

    # Check a specific trunk
    trunk)
        if [ -z "$2" ]; then
            echo "Usage: voip-diag trunk TRUNK_NAME"
            exit 1
        fi
        echo "=== Trunk: $2 ==="
        asterisk -rx "sip show peer $2" | grep -E 'Status|Addr|Codecs|Qualify'
        echo ""
        echo "=== Recent Carrier Log ==="
        mysql -u USER -pPASSWORD DATABASE -e "
            SELECT call_date, dialstatus, hangup_cause, sip_hangup_cause
            FROM vicidial_carrier_log
            WHERE channel LIKE '%$2%'
            ORDER BY call_date DESC LIMIT 10;" 2>/dev/null
        ;;

    # Check a specific agent
    agent)
        if [ -z "$2" ]; then
            echo "Usage: voip-diag agent EXTENSION"
            exit 1
        fi
        echo "=== Agent: $2 ==="
        asterisk -rx "sip show peer $2" | grep -E 'Status|Addr|Useragent|Codecs|Nat|Qualify'
        echo ""
        echo "=== ViciDial Status ==="
        mysql -u USER -pPASSWORD DATABASE -e "
            SELECT user, status, campaign_id, phone_login,
                   TIMESTAMPDIFF(SECOND, last_update_time, NOW()) AS idle_sec
            FROM vicidial_live_agents
            WHERE phone_login = '$2';" 2>/dev/null
        ;;

    # Investigate a specific call by phone number
    call)
        if [ -z "$2" ]; then
            echo "Usage: voip-diag call PHONE_NUMBER"
            exit 1
        fi
        echo "=== Inbound Calls for $2 ==="
        mysql -u USER -pPASSWORD DATABASE -e "
            SELECT call_date, length_in_sec, status, term_reason, uniqueid, user
            FROM vicidial_closer_log
            WHERE phone_number LIKE '%$2%'
            ORDER BY call_date DESC LIMIT 10;" 2>/dev/null
        echo ""
        echo "=== Outbound Calls for $2 ==="
        mysql -u USER -pPASSWORD DATABASE -e "
            SELECT call_date, length_in_sec, status, term_reason, uniqueid, user
            FROM vicidial_log
            WHERE phone_number LIKE '%$2%'
            ORDER BY call_date DESC LIMIT 10;" 2>/dev/null
        ;;

    # Show system health
    health)
        echo "=== System Health ==="
        echo "Uptime: $(uptime)"
        echo "Disk:   $(df -h / | tail -1)"
        echo "Memory: $(free -m | grep Mem | awk '{printf "%dMB used / %dMB total (%.0f%%)\n", $3, $2, $3/$2*100}')"
        echo ""
        echo "=== Asterisk ==="
        asterisk -rx 'core show uptime'
        echo "Active channels: $(asterisk -rx 'core show channels concise' | wc -l)"
        echo ""
        echo "=== Errors Today ==="
        # %e space-pads single-digit days, matching Asterisk's default dateformat
        today="$(date +'%b %e')"
        echo "Asterisk errors: $(grep "$today" /var/log/asterisk/messages 2>/dev/null | grep -ci error)"
        echo "JB resyncs:      $(grep "$today" /var/log/asterisk/messages 2>/dev/null | grep -c 'Resyncing the jb')"
        echo "RTP switches:    $(grep "$today" /var/log/asterisk/messages 2>/dev/null | grep -c 'Strict RTP')"
        ;;

    help|*)
        cat <<'EOF'
VoIP Diagnostic Tool
====================
Usage: voip-diag <command> [args]

Commands:
  sip              Show SIP peers status summary
  calls            Show active calls
  agents           Show logged-in agents
  rtp              Show live RTP statistics
  trunk NAME       Check a specific trunk
  agent EXT        Check a specific agent extension
  call NUMBER      Look up a phone number in call logs
  health           System health overview
  help             Show this help

Examples:
  voip-diag sip
  voip-diag trunk my_provider
  voip-diag agent 100
  voip-diag call 441234567890
EOF
        ;;
esac
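
The trunk, agent, and call commands place "$2" directly inside SQL text, so a stray quote or wildcard in the argument changes the query. One possible guard, shown here as a sketch, is to accept only characters that normally appear in peer names, extensions, and phone numbers. `valid_arg` is a hypothetical helper, and the allowed character set is an assumption to adjust for your naming scheme.

```shell
# Hypothetical helper: succeed only if the argument is non-empty and made up
# of letters, digits, underscore, dot, plus, or hyphen.
valid_arg() {
    case "$1" in
        ""|*[!A-Za-z0-9_.+-]*) return 1 ;;
        *)                     return 0 ;;
    esac
}
```

Usage inside the script, before any query that uses the argument: `valid_arg "$2" || { echo "Invalid argument: $2"; exit 1; }`.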

Install it:

# Save the script
sudo cp voip-diag /usr/local/bin/voip-diag
sudo chmod +x /usr/local/bin/voip-diag

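# Replace the USER, PASSWORD, and DATABASE placeholders in the script
# with your ViciDial MySQL credentials. A password passed with -p on the
# command line is visible in the process list while the query runs.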
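
One way to keep the password out of the script and off the command line is a MySQL option file that only the current user can read. The sketch below writes such a file; `write_mysql_cnf` and the file path are hypothetical, and passwords containing special characters may need quoting inside the file.

```shell
# Hypothetical helper: write a [client] option file with mode 600.
write_mysql_cnf() {
    local path="$1" user="$2" pass="$3"
    (
        umask 077   # new files are created without group/other access
        printf '[client]\nuser=%s\npassword=%s\n' "$user" "$pass" > "$path"
    )
    chmod 600 "$path"   # also tighten a file that already existed
}
```

Usage: run `write_mysql_cnf /root/.voip-diag.cnf USER PASSWORD` once, then call `mysql --defaults-extra-file=/root/.voip-diag.cnf DATABASE -e "..."` in the script. The `--defaults-extra-file` option must be the first option on the mysql command line.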

11. Appendix: Configuration File Locations

Asterisk Configuration Files

| File | Purpose | When to check |
| --- | --- | --- |
| /etc/asterisk/sip.conf | Global SIP settings | NAT, codecs, qualify defaults |
| /etc/asterisk/sip-vicidial.conf | SIP peers and trunks (ViciDial-managed) | Agent/trunk registration, NAT settings |
| /etc/asterisk/extensions.conf | Main dialplan | Call routing logic |
| /etc/asterisk/extensions-vicidial.conf | ViciDial dialplan additions | Carrier routing, outbound call flow |
| /etc/asterisk/rtp.conf | RTP port range, strict RTP, DTLS | One-way audio, media issues |
| /etc/asterisk/iax.conf | IAX2 peer config and jitter buffer | IAX trunk issues, jitter buffer resyncs |
| /etc/asterisk/logger.conf | Log file configuration | If logs are missing or too verbose |
| /etc/asterisk/modules.conf | Module loading | If a feature is not available |

ViciDial Key Files

| File | Purpose |
| --- | --- |
| /etc/astguiclient.conf | ViciDial server settings (server_ip, DB credentials) |
| /srv/www/htdocs/agc/ or /var/www/html/agc/ | Agent interface web files |
| /srv/www/htdocs/vicidial/ or /var/www/html/vicidial/ | Admin interface web files |
| /var/spool/asterisk/monitor/ | Active recording directory |
| /var/spool/asterisk/monitorDONE/ | Completed recordings (organized by date) |

Log Files

| File | Contains | Rotation |
| --- | --- | --- |
| /var/log/asterisk/messages | Asterisk notices, warnings, errors | Check logrotate config |
| /var/log/asterisk/queue_log | Queue events (join, leave, abandon) | Grows indefinitely if not rotated |
| /var/log/astguiclient/*.log | ViciDial process logs | Daily rotation by ViciDial |

Key Database Tables

| Table | Purpose | Key fields |
|---|---|---|
| vicidial_closer_log | Inbound call records | phone_number, status, term_reason, uniqueid, user |
| vicidial_log | Outbound call records | phone_number, status, term_reason, uniqueid, user |
| vicidial_carrier_log | Carrier-level call details | dialstatus, hangup_cause, sip_hangup_cause, channel |
| vicidial_agent_log | Agent state changes | user, event, event_time, pause_sec, wait_sec, talk_sec |
| vicidial_live_agents | Current agent state (real-time) | user, status, last_update_time, campaign_id |
| vicidial_auto_calls | Currently active calls (real-time) | phone_number, status, campaign_id |
| recording_log | Recording file locations | filename, location, start_time, lead_id |
| vicidial_inbound_dids | DID routing configuration | did_pattern, did_route, group_id |
| vicidial_inbound_groups | Inbound group settings | group_id, group_name, no_agent_action |
| system_settings | Global ViciDial configuration | Various server-wide settings |
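
Since inbound and outbound records live in separate tables, a lookup by phone number has to query both. The following sketch builds such a query using only the columns listed above; the function name `build_call_lookup_sql` is hypothetical, and the digits-only check is an assumption about how numbers are stored on your system.

```shell
#!/usr/bin/env bash
# Build a query that finds a phone number in both call-record tables.
# Prints SQL to stdout; it does not connect to the database itself.
build_call_lookup_sql() {
    local phone="$1"
    # Accept digits only so the value can be embedded in the SQL safely.
    case "$phone" in
        ''|*[!0-9]*)
            echo "error: phone number must be digits only" >&2
            return 1
            ;;
    esac
    cat <<SQL
SELECT 'outbound' AS direction, phone_number, status, term_reason, uniqueid, user
  FROM vicidial_log WHERE phone_number = '$phone'
UNION ALL
SELECT 'inbound' AS direction, phone_number, status, term_reason, uniqueid, user
  FROM vicidial_closer_log WHERE phone_number = '$phone';
SQL
}
```

The output can be piped into the mysql client, for example `build_call_lookup_sql 441234567890 | mysql -u "$DB_USER" -p "$DB_NAME"`, where `DB_USER` and `DB_NAME` are placeholders for your own ViciDial database credentials. The `uniqueid` values it returns are the natural key for follow-up queries against vicidial_carrier_log.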

Final Notes

When to escalate

Not every problem can be solved from the command line. Escalate when:

Building institutional knowledge

Every incident you diagnose is a learning opportunity. Maintain a simple incident log:

Date: 2026-03-13
Symptom: Agent reports one-way audio on all calls
Root cause: SIP ALG enabled on agent's new router
Fix: Disabled SIP ALG, added nat=force_rport,comedia to peer config
Time to resolve: 45 minutes

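After about 50 entries, you will have a searchable knowledge base of symptoms, root causes, and fixes. Recurring problems can then be matched against past entries instead of being diagnosed from scratch, which shortens later incidents considerably.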
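
A plain text file is enough to keep this log consistent and searchable. The sketch below writes records in the five-field layout shown above and retrieves whole records by keyword. The function names and the `INCIDENT_LOG` path are hypothetical; adapt them to wherever your team keeps notes.

```shell
#!/usr/bin/env bash
# Hypothetical location for the incident log; override via the environment.
INCIDENT_LOG="${INCIDENT_LOG:-$HOME/voip-incidents.log}"

# Append one record. Arguments: symptom, root cause, fix, time to resolve.
log_incident() {
    {
        printf 'Date: %s\n' "$(date +%F)"
        printf 'Symptom: %s\n' "$1"
        printf 'Root cause: %s\n' "$2"
        printf 'Fix: %s\n' "$3"
        printf 'Time to resolve: %s\n' "$4"
        printf '\n'                      # blank line separates records
    } >> "$INCIDENT_LOG"
}

# Print every full record containing the keyword (case-insensitive).
find_incident() {
    # RS="" puts awk in paragraph mode: each blank-line-separated
    # record is read as a single unit, so a match prints the whole entry.
    awk -v kw="$1" 'BEGIN { RS=""; ORS="\n\n" }
        index(tolower($0), tolower(kw)) > 0' "$INCIDENT_LOG"
}
```

For example, `log_incident "One-way audio on all calls" "SIP ALG enabled on router" "Disabled SIP ALG" "45 minutes"` adds an entry, and `find_incident "sip alg"` later returns it in full. Matching is done on the literal keyword rather than a regular expression, so characters such as `=` or `.` in a search term need no escaping.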

Recommended monitoring stack

To catch problems before users report them, deploy:

  1. Prometheus + Grafana for metrics (trunk status, agent counts, call volume)
  2. Loki for centralized log aggregation (search all server logs from one UI)
  3. Homer for SIP capture (full ladder diagrams for any call)
  4. Smokeping for continuous latency monitoring to all SIP providers
  5. Custom Asterisk exporter for Asterisk-specific metrics

See Tutorial 01 for the complete deployment guide.
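
To give a sense of what the custom exporter in item 5 might collect, here is a minimal sketch that turns `sip show peers` output into peer counts in Prometheus text format. It is not the exporter from Tutorial 01: the function name and the metric name `asterisk_sip_peers` are hypothetical, and it assumes chan_sip, whose Status column reports `OK (N ms)`, `LAGGED (N ms)`, or `UNREACHABLE` for monitored peers.

```shell
#!/usr/bin/env bash
# Read `asterisk -rx "sip show peers"` output on stdin and print
# per-state peer counts in Prometheus text exposition format.
sip_peers_to_metrics() {
    awk '
        / OK \(/       { ok++ }
        / LAGGED \(/   { lagged++ }
        / UNREACHABLE/ { unreachable++ }
        END {
            print "# TYPE asterisk_sip_peers gauge"
            printf "asterisk_sip_peers{state=\"ok\"} %d\n", ok
            printf "asterisk_sip_peers{state=\"lagged\"} %d\n", lagged
            printf "asterisk_sip_peers{state=\"unreachable\"} %d\n", unreachable
        }'
}
```

A typical invocation would be `asterisk -rx "sip show peers" | sip_peers_to_metrics`, run from cron with the output written to a file that your Prometheus setup scrapes. Peers without `qualify` enabled show as unmonitored and are not counted by this sketch.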


This runbook is based on procedures used to troubleshoot a production ViciDial call center fleet handling thousands of daily calls across multiple servers and seven SIP providers. Every command was tested in production. Every decision tree was refined through real incidents.
