01

Type a Slash Command

In Claude Code, type /health or /call-investigate 4412345678 — the skill activates with your arguments.

02

Claude Runs the Playbook

The skill contains investigation steps, SQL queries, SSH commands, and decision trees. Claude executes them across your servers.

03

Get a Structured Report

Results come back as a clear summary with flags for CRITICAL, WARNING, and OK — plus root cause analysis and recommended fixes.

/health /agents /calls /trunk-status /call-drops 🔍 /call-investigate /audio-quality /listen-recording /network-check /lagged /replication /agent-ranks 📞 /did-lookup 📊 /reports 🛡 /audit-server

Live Monitoring

/health

Quick health check across all ViciDial production servers. Shows Asterisk, MySQL, disk, uptime, fail2ban, replication.

Tools: Bash(ssh *)

Server Health Check

Run a quick health check across all production ViciDial servers. Use SSH config names (svr-a, svr-c, svr-d, svr-b, svr-e).

If $ARGUMENTS is provided, check only those servers. Otherwise check all production servers.

For each server, run ONE ssh command that gathers:

  1. hostname and uptime
  2. asterisk -rx "core show channels" | tail -1 (active calls)
  3. asterisk -rx "sip show peers" | tail -1 (SIP peers)
  4. mysqladmin status 2>/dev/null | head -1 (MySQL uptime/threads/queries)
  5. df -h / | tail -1 (disk usage)
  6. fail2ban-client status 2>/dev/null | head -2 (fail2ban)

Combine all into a single SSH command per server to minimize round-trips.

Present results in a clean table format. Flag any issues:

  • Disk > 80% = WARNING
  • No active Asterisk channels when agents should be online = WARNING
  • fail2ban not running = CRITICAL
  • MySQL not responding = CRITICAL

Also check replication on the svr-replica server (ssh svr-replica):

  • mysql -u report_user -pReportPass -e "SHOW ALL SLAVES STATUS\G" | grep -E "Connection_name|Slave_IO|Slave_SQL|Seconds_Behind"

Server reference:

  • svr-a (10.0.1.10) — UK primary, Asterisk 18, openSUSE
  • svr-c (10.0.1.20) — Romania, Asterisk 16, openSUSE
  • svr-d (10.0.1.30) — France, Asterisk 13, openSUSE
  • svr-b (10.0.1.40) — Italy, Asterisk 11, CentOS 7 (use asterisk -rx not rasterisk)
  • svr-e (10.0.1.50) — UK standalone, Asterisk 20, openSUSE

/agents

Show all logged-in ViciDial agents across all servers with status, pause codes, calls today.

Tools: Bash(ssh *)

Agent Status Overview

Show all logged-in ViciDial agents across production servers.

If $ARGUMENTS specifies a server (svr-a, svr-c, svr-d, svr-b, svr-e), check only that one. Otherwise check all.

For each server, query via SSH:

SELECT
  vla.user,
  vu.full_name,
  vla.status,
  vla.campaign_id,
  vla.closer_campaigns,
  vla.calls_today,
  vla.pause_code,
  TIMESTAMPDIFF(SECOND, vla.last_state_change, NOW()) as secs_in_state
FROM vicidial_live_agents vla
LEFT JOIN vicidial_users vu ON vla.user = vu.user
ORDER BY vla.status, vla.user;

MySQL access:

  • svr-a/svr-c/svr-d/svr-e: mysql asterisk (root, no pw via SSH)
  • svr-b: mysql -u dbuser -pSecurePass asterisk

Present as a table grouped by server, showing:

  • Agent ID, Name, Status, Campaign, Pause Code (if paused), Time in State, Calls Today

Summary line per server: X agents logged in, Y on calls, Z paused, W waiting

Flag:

  • Agents paused > 30 min = highlight
  • LAGGED agents = CRITICAL
  • Agents with 0 calls after 2+ hours = note

/calls

Show live calls across all ViciDial servers. Shows active channels, inbound queue, agents on calls.

Tools: Bash(ssh *)

Live Calls Overview

Show live call status across all production ViciDial servers. Use SSH config names.

If $ARGUMENTS is provided (e.g., "svr-a" or "svr-b"), check only those servers. Otherwise check all.

For each server, run:

  1. asterisk -rx "core show channels verbose" — active channels
  2. Query vicidial_live_agents for agents currently on calls:
    mysql -e "SELECT user, status, campaign_id, closer_campaigns, calls_today FROM vicidial_live_agents WHERE status IN ('INCALL','QUEUE','CLOSER') ORDER BY status;" asterisk
    
  3. Query inbound queue:
    mysql -e "SELECT campaign_id, COUNT(*) as waiting FROM vicidial_auto_calls WHERE status='LIVE' GROUP BY campaign_id;" asterisk
    

Present a summary:

  • Total active channels per server
  • Agents on calls (by status)
  • Calls waiting in queue
  • Any DROP or problem statuses

Server MySQL credentials:

  • svr-a/svr-c/svr-d/svr-e: mysql asterisk (as root, no password needed from localhost via SSH)
  • svr-b: mysql -u dbuser -pSecurePass asterisk

/trunk-status

Check SIP trunk status across all ViciDial servers. Shows registration state, latency, active calls per trunk. Use when calls fail to connect, trunks go UNREACHABLE, or provider issues suspected.

Tools: Bash(ssh *)

SIP Trunk Status Check

Check all SIP trunks across production servers. $ARGUMENTS: server name or "all".

Per Server Check

# Show all SIP peers with status
ssh SERVER "asterisk -rx 'sip show peers'"

# Show only trunks (filter out agent extensions)
ssh SERVER "asterisk -rx 'sip show peers' | grep -E 'TrunkVendorB|TrunkVendorC|TrunkVendorA|itaserver|UNREACHABLE'"

# Detailed info for a specific trunk
ssh SERVER "asterisk -rx 'sip show peer TRUNKNAME'"

Trunk Inventory by Server

Server-A (10.0.1.10)

Trunk Provider IP Purpose
mutitel_de 203.0.113.10 Primary inbound
mutitel_nl 203.0.113.11 Outbound
mutitel_uk 203.0.113.12 UK
mutitel_fr 203.0.113.11 France
mutitel_au 203.0.113.13 Australia
TrunkVendorB 203.0.113.14 General
TrunkVendorC 203.0.113.20 UK inbound
trunk_inbound_c1 203.0.113.21 UK inbound

Italy / Server-B (10.0.1.40)

Trunk Provider IP Purpose
TrunkVendorA (various DIDs) sip.trunkvendora.example (198.51.100.x) Italian inbound, alaw only
TrunkVendorB various Italian outbound

Server-E (10.0.1.50)

Trunk Provider IP Purpose
itaserver 10.0.1.40 Inter-server to Italy
TrunkVendorB various UK calls
TrunkVendorC various UK calls

Server-C (10.0.1.20) / Server-D (10.0.1.30)

  • TrunkVendorB + TrunkVendorC trunks (check via SSH)

Quick All-Server Trunk Check

for srv in svr-a svr-c svr-d svr-b svr-e; do
  echo "=== $srv ==="
  ssh $srv "asterisk -rx 'sip show peers' | grep -cE 'OK|UNREACHABLE|UNKNOWN'" 2>/dev/null
  ssh $srv "asterisk -rx 'sip show peers' | grep -E 'UNREACHABLE|UNKNOWN'" 2>/dev/null
  echo ""
done

Troubleshooting UNREACHABLE Trunks

  1. Ping the provider IP: ssh SERVER "ping -c 3 PROVIDER_IP"
  2. Check firewall: ssh SERVER "iptables -S INPUT | grep PROVIDER_IP" (must be whitelisted, final rule is DROP)
  3. Check SIP registration: ssh SERVER "asterisk -rx 'sip show registry'"
  4. Check if provider changed IP: ssh SERVER "dig SIP_HOSTNAME" (e.g., dig sip.trunkvendora.example)
  5. Test SIP OPTIONS: ssh SERVER "asterisk -rx 'sip qualify peer TRUNKNAME'"
  6. Check carrier log for failures:
    SELECT call_date, dialstatus, hangup_cause, sip_hangup_cause
    FROM vicidial_carrier_log
    WHERE channel LIKE '%TRUNKNAME%'
    ORDER BY call_date DESC LIMIT 10;
    

Investigation & Troubleshooting

/call-drops

Investigate call drops, failed calls, and problem dispositions across servers. Analyzes DROP, DISMX, DCMX, TIMEOT, AFTHRS, XDROP and other problem statuses with carrier-level detail.

Tools: Bash(ssh *), Bash(docker *)

Call Drop Investigation

Investigate dropped calls and problem statuses. $ARGUMENTS: server name, status code, date, or "all" for today's overview.

Step 1: Get Today's Problem Status Overview

-- Inbound problem statuses today
SELECT status, COUNT(*) as cnt,
       AVG(queue_seconds) as avg_queue,
       AVG(length_in_sec) as avg_len
FROM vicidial_closer_log
WHERE call_date >= CURDATE()
  AND status IN ('DROP','DISMX','DCMX','TIMEOT','ADCT','AFTHRS','NANQUE','QVMAIL','HXFER','XDROP','IQNANQ','PDROP','AB','BLOCK')
GROUP BY status ORDER BY cnt DESC;

-- Same but by hour (spot patterns)
SELECT HOUR(call_date) as hr, status, COUNT(*) as cnt
FROM vicidial_closer_log
WHERE call_date >= CURDATE()
  AND status NOT IN ('SALE','NI','A','CALLBK','XFER','DNC')
GROUP BY hr, status ORDER BY hr, cnt DESC;

Step 2: Investigate Specific Status Types

DROP (Call dropped from queue — no agent answered)

-- Check drop details: how long did they wait?
SELECT call_date, phone_number, queue_seconds, campaign_id,
       term_reason, uniqueid
FROM vicidial_closer_log
WHERE status='DROP' AND call_date >= CURDATE()
ORDER BY queue_seconds DESC LIMIT 20;

-- Check ingroup drop settings
SELECT group_id, drop_call_seconds, after_hours_action,
       no_agent_no_queue, no_agents_action
FROM vicidial_inbound_groups WHERE active='Y';

DISMX / DCMX (Disconnect mid-call — abnormal hangup)

-- Find disconnected calls with carrier details
SELECT cl.call_date, cl.phone_number, cl.length_in_sec, cl.user,
       cl.term_reason, cl.uniqueid,
       cr.dialstatus, cr.hangup_cause, cr.sip_hangup_cause, cr.sip_hangup_reason
FROM vicidial_closer_log cl
LEFT JOIN vicidial_carrier_log cr ON cl.uniqueid = cr.uniqueid
WHERE cl.status IN ('DISMX','DCMX') AND cl.call_date >= CURDATE()
ORDER BY cl.call_date DESC LIMIT 20;

TIMEOT (Agent didn't answer in time)

-- Which agents timed out?
SELECT call_date, phone_number, user, queue_seconds, campaign_id
FROM vicidial_closer_log
WHERE status='TIMEOT' AND call_date >= CURDATE()
ORDER BY call_date DESC LIMIT 20;

AFTHRS (After hours — no agents available)

-- Check after-hours config
SELECT group_id, afterhours_action, afterhours_message_filename,
       afterhours_xfer_group, call_time_id
FROM vicidial_inbound_groups WHERE active='Y';

-- Check call time definitions
SELECT call_time_id, call_time_name, ct_default_start, ct_default_stop
FROM vicidial_call_times;

Step 3: Carrier-Level Analysis

-- Today's hangup cause distribution (across all calls)
SELECT hangup_cause, sip_hangup_cause, COUNT(*) as cnt,
       CASE hangup_cause
         WHEN 16 THEN 'Normal'
         WHEN 17 THEN 'Busy'
         WHEN 18 THEN 'No response'
         WHEN 20 THEN 'Subscriber absent'
         WHEN 21 THEN 'Rejected'
         WHEN 31 THEN 'Normal unspec'
         WHEN 34 THEN 'No circuits (congestion)'
         WHEN 38 THEN 'Network down'
         WHEN 127 THEN 'Internal error'
         ELSE 'Other'
       END as meaning
FROM vicidial_carrier_log
WHERE call_date >= CURDATE()
GROUP BY hangup_cause, sip_hangup_cause
ORDER BY cnt DESC LIMIT 20;

-- Failed dial attempts (never answered)
SELECT call_date, channel, dialstatus, hangup_cause, sip_hangup_reason,
       dial_time, answered_time
FROM vicidial_carrier_log
WHERE call_date >= CURDATE()
  AND dialstatus NOT IN ('ANSWER','')
ORDER BY call_date DESC LIMIT 20;

Step 4: Check Ring Group Fallback

# For after-hours / no-agent calls, check if ring group extensions are working
# Server-A: ring group extensions in customexte.conf
ssh SERVER "grep -A10 'exten => RINGGROUP' /etc/asterisk/extensions.conf"

# Check if ring group phones are registered
ssh SERVER "asterisk -rx 'sip show peers' | grep -E '1031|1032|1033|1034|1035'"

# Italy ring groups: region_a, region_b, region_c, region_d, region_e
ssh svr-b "grep -A20 'region_a\|region_b\|region_c' /etc/asterisk/customexte.conf"

Step 5: Compare with Historical Baseline

-- Daily drop rate for last 7 days (is today worse than usual?)
SELECT DATE(call_date) as day,
       COUNT(*) as total_calls,
       SUM(CASE WHEN status='DROP' THEN 1 ELSE 0 END) as drops,
       ROUND(SUM(CASE WHEN status='DROP' THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) as drop_pct
FROM vicidial_closer_log
WHERE call_date >= DATE_SUB(CURDATE(), INTERVAL 7 DAY)
GROUP BY day ORDER BY day;

ViciDial Reports for Drops

# Inbound stats (shows drops, hold times)
http://SERVER_IP/vicidial/AST_CLOSERstats.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&group[]=--ALL--&shift=ALL&SUBMIT=submit

# Carrier log report
http://SERVER_IP/vicidial/AST_carrier_log_report.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&server_ip[]=--ALL--&SUBMIT=submit

# Hangup cause report
http://SERVER_IP/vicidial/AST_hangup_cause_report.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&server_ip[]=--ALL--&SUBMIT=submit

MySQL Access

  • svr-a/svr-c/svr-d/svr-e: ssh SERVER "mysql asterisk -e '...'"
  • svr-b: ssh svr-b "mysql -u dbuser -pSecurePass asterisk -e '...'"
  • Server-Replica: ssh svr-replica "mysql -u report_user -pReportPass replica_asterisk -e '...'"
🔍

/call-investigate

Deep investigation of specific calls by phone number, uniqueid, or agent ID. Traces full call path from DID through routing to agent, checks carrier logs, SIP traces, recordings, and dispositions. Use for any call complaint or incident.

Tools: Bash(ssh *), Bash(docker *), Bash(curl *), Bash(scp *), Bash(sox *), Bash(soxi *)

Call Investigation

Deep-dive into specific calls. $ARGUMENTS: phone number(s), uniqueid(s), agent ID(s), or date range.

GOLDEN RULES — avoid wasted queries

  1. ALL production servers use mysql -udbuser -pSecurePass asterisk (NOT anonymous mysql asterisk). Wrap in single quotes and escape inner double quotes:

    ssh svr-a 'mysql -udbuser -pSecurePass asterisk -N -B -e "SELECT ... WHERE x=\"y\";"'
    

    Use -N -B (no headers, batch) for clean output and easier parsing.

  2. Decision tree — start HERE, don't shotgun queries:

    • Call length 0 sec OR customer complained "got cut off instantly"? → Skip vicidial_closer_log (call likely never reached queue). Go DIRECTLY to call_log by caller_code, then Asterisk log + Homer.
    • Call answered by agent, has talk time? → Start with vicidial_closer_log / vicidial_log.
    • Unknown? → Check DID routing first (tells you WHICH server to investigate).
  3. Known column gotchas — don't re-learn these:

    • call_log table uses start_time, NOT call_date.
    • vicidial_carrier_log does NOT have dead_sec. Valid cols: call_date, channel, server_ip, dialstatus, hangup_cause, sip_hangup_cause, sip_hangup_reason, dial_time, answered_time.
    • vicidial_agent_log does NOT have full_name. Join to vicidial_users for names.
    • vicidial_users does NOT have extension or status columns. Get extension from phones table: SELECT extension,phone_ip FROM phones WHERE extension=AGENT_ID.
    • custom_dual_calls table does NOT have uniqueid or call_date. Only has: id, phone_number, did, last_call_time.
    • vicidial_did_log does NOT have did_pattern. Valid cols: uniqueid, channel, server_ip, caller_id_number, caller_id_name, extension, call_date, did_id, did_route.
    • vicidial_inbound_groups does NOT have icbq_expiration.
    • Inbound calls that didn't answer have caller_code=<CLI> and number_dialed=<DID> in call_log.
  4. Channel ID reuse is common — Asterisk wraps C-XXXXXXXX IDs. ALWAYS include a time filter when greping messages or you'll pull matches from weeks/months ago:

    ssh svr-a 'grep -E "Apr 14 15:0[0-2].*(CHANNEL_ID|UNIQUEID)" /var/log/asterisk/messages'
    
  5. Timezones: Server-A/VPS = Europe/Rome (CET = UTC+1, CEST = UTC+2). Homer partitions are UTC. Subtract 1 or 2 hours from local time to find the right partition.

  6. If a query returns 0 rows, don't retry with variations — move to the next data source. The asterisk messages log + Homer will tell you everything that happened even if SQL tables are empty.

Step 1: Find the Call Records

A. Call_log (raw Asterisk CDR) — use when inbound didn't answer

SELECT start_time, uniqueid, channel, caller_code, number_dialed,
       extension, length_in_sec
FROM call_log
WHERE caller_code LIKE '%LAST_6_DIGITS%'
  AND start_time BETWEEN 'YYYY-MM-DD HH:MM:00' AND 'YYYY-MM-DD HH:MM:59'
ORDER BY start_time;

This finds 0-sec inbounds that vicidial_closer_log misses. Channel tells you which trunk (SIP/mutitel_nl-*, SIP/TrunkVendorC-*, etc.).

B. Inbound queue records (vicidial_closer_log) — only populated when call reached ingroup

SELECT call_date, phone_number, length_in_sec, status, term_reason,
       uniqueid, closecallid, user, campaign_id, queue_seconds
FROM vicidial_closer_log
WHERE phone_number='NUMBER'
  AND call_date >= 'YYYY-MM-DD'
ORDER BY call_date DESC LIMIT 20;

C. Outbound calls (vicidial_log)

SELECT call_date, phone_number, length_in_sec, status, term_reason,
       uniqueid, user, campaign_id
FROM vicidial_log
WHERE phone_number LIKE '%NUMBER%'
  AND call_date >= 'YYYY-MM-DD'
ORDER BY call_date DESC LIMIT 20;

D. By agent (remember: ViciDial agent IDs are 4-digit; numbers like "49" in CRM notes are usually pricing/reference, not agent IDs)

SELECT user, full_name FROM vicidial_users WHERE full_name LIKE '%NAME%' OR user='ID';

Step 2: DID Routing — also confirms WHICH server

SELECT did_id, did_pattern, did_description, did_route, did_agent_a,
       extension, exten_context, group_id
FROM vicidial_inbound_dids
WHERE did_pattern LIKE '%DID%';

Run across all candidate servers (for s in svr-a svr-c svr-d svr-e svr-b; do ...) — the server that returns a row is where the call landed.

On Server-A: all inbound DIDs → extension 999000111custom_router.agi → ingroup.

Step 3: Carrier Log (SIP-level, only if call was answered)

SELECT call_date, channel, server_ip, dialstatus, hangup_cause,
       sip_hangup_cause, sip_hangup_reason, dial_time, answered_time
FROM vicidial_carrier_log
WHERE uniqueid='UNIQUEID';

Often empty for 0-sec inbounds — go to Homer/Asterisk log instead.

Hangup causes: 16=Normal, 17=Busy, 18=NoAnswer, 20=SubscriberAbsent, 21=Rejected, 34=NoCircuit, 38=NetworkOOS, 127=Internal. Dialstatuses: ANSWER, BUSY, NOANSWER, CANCEL, CHANUNAVAIL, CONGESTION.

Step 4: Asterisk Log Trace (authoritative for inbound flow)

# Always time-filter to avoid channel-ID reuse from months ago:
ssh svr-a 'grep -E "Apr 14 15:0[0-2].*(CHANNEL_ID|mutitel_nl-XXXXX)" /var/log/asterisk/messages'

# Broad search by phone — use count first:
ssh svr-a 'grep -c "PHONE" /var/log/asterisk/messages'
# Then pull with head -100 if count is reasonable.

Normal inbound flow (Server-A):

  1. process@trunkinbound — CALLER/CALLED set, block checks
  2. trunkinbound-route:Ringing() + Wait(N) — early-media ringback
  3. agi-DID_route.agi — MixMonitor starts recording
  4. 999000111@default — custom_router.agi decides ingroup
  5. agi-VDAD_ALL_inbound.agi — inserts into queue (park extension)
  6. INGROUP@default — ring to agents

Red flags:

  • AGI Script ... completed, returning 4 on VDAD_ALL_inbound = caller hung up DURING AGI (premature hangup, not server issue)
  • func_hangupcause.c: Unable to find information for channel = abnormal hangup (often caller-initiated)
  • Spawn extension ... exited non-zero = call ended before dialplan finished
  • Strict RTP switching / learning after remote address set = NAT/media IP issue
  • Unable to create channel of type 'sip' (cause 20 - Subscriber absent) = ring group member offline

Step 5: Homer SIP Trace (definitive for hangup direction)

Partition layout: hep_proto_1_call_YYYYMMDD_HHMM — 2-hour UTC buckets. Convert local→UTC first. Other partitions: hep_proto_1_registration_* (REGISTERs), hep_proto_5_* (RTCP).

docker exec homer-db psql -U homer -d homer_data -c "
SELECT tablename FROM pg_tables WHERE tablename LIKE 'hep_proto_1_call_20260414%' ORDER BY tablename;"

# Find the Call-ID
docker exec homer-db psql -U homer -d homer_data -c "
SELECT create_date, data_header->>'callid' as callid,
       protocol_header->>'srcIp' as src, protocol_header->>'dstIp' as dst,
       substring(raw from 1 for 80) as first_line
FROM hep_proto_1_call_20260414_1200
WHERE (raw LIKE '%PHONE_NUMBER%' OR raw LIKE '%DID%')
  AND create_date BETWEEN 'YYYY-MM-DD 12:59:00' AND 'YYYY-MM-DD 13:02:00'
ORDER BY create_date LIMIT 40;"

# Full SIP dialog for a specific Call-ID
docker exec homer-db psql -U homer -d homer_data -c "
SELECT create_date, raw FROM hep_proto_1_call_20260414_1200
WHERE data_header->>'callid'='CALLID' ORDER BY create_date;"

Premature CANCEL detection — the Reason: header (RFC 3326)

When a CANCEL arrives, check its Reason: header:

  • Reason: SIP;cause=487;text="ORIGINATOR_CANCEL" → Caller hung up. NOT a carrier issue.
  • Reason: Q.850;cause=X or no Reason → Possible carrier-initiated.
  • No Reason header + gap < 2 sec → Suspicious, likely carrier.

Baseline check: pull the gap distribution for that trunk/day to see if your call is an outlier:

WITH u AS (
  SELECT data_header->>'callid' as cid,
         MIN(CASE WHEN raw LIKE 'INVITE%' THEN create_date END) as invite_t,
         MIN(CASE WHEN raw LIKE 'CANCEL%' THEN create_date END) as cancel_t
  FROM hep_proto_1_call_YYYYMMDD_HHMM
  WHERE protocol_header->>'srcIp'='TRUNK_IP'
  GROUP BY 1
)
SELECT cid, invite_t, EXTRACT(EPOCH FROM (cancel_t - invite_t))::numeric(6,2) as gap_sec
FROM u WHERE cancel_t IS NOT NULL ORDER BY gap_sec LIMIT 20;

Normal human hangup gap: 12-100+ sec. Suspicious (sub-second): likely carrier/network. 3-10 sec: inconclusive — trust the Reason: header.

Step 6: Recording — check audio actually exists

Path convention: /var/spool/asterisk/monitorDONE/{MP3|ORIG}/YYYYMMDDHHMMSS_DID_CLI-{all.mp3|in.wav|out.wav}

# File listing
ssh SERVER 'ls -la /var/spool/asterisk/monitorDONE/MP3/YYYYMMDDHHMMSS_DID_CLI*'
ssh SERVER 'find /var/spool/asterisk/monitorDONE -name "YYYYMMDDHHMMSS_*"'

# Quick metadata (duration) without downloading
ssh SERVER 'soxi /var/spool/asterisk/monitorDONE/MP3/FILE.mp3'

# Download + audio stats (silence detection)
scp SERVER:/var/spool/asterisk/monitorDONE/ORIG/FILE-in.wav /tmp/
sox /tmp/FILE-in.wav -n stats  # Max level=0 and RMS=-inf means pure silence

Interpreting short/silent recordings:

  • 0-1 sec silent, call never reached 200 OK → expected. MixMonitor starts before answer; no RTP until answer. NOT a bug.
  • Full-duration silent after answer → real audio issue (one-way audio, codec mismatch) — use /audio-quality.
  • Truncated mid-conversation → disconnect mid-call, cross-check with carrier_log.

For DB-registered recordings with full analysis:

curl -s "http://localhost:9090/analyze?uniqueid=UNIQUEID&server=SERVER_KEY" | jq .

Step 7: Agent State at Time of Call

SELECT event_time, user, pause_epoch, wait_epoch, talk_epoch, dispo_epoch,
       status, sub_status, pause_type
FROM vicidial_agent_log
WHERE user='AGENT_ID'
  AND event_time BETWEEN 'YYYY-MM-DD HH:MM:00' AND 'YYYY-MM-DD HH:MM:59'
ORDER BY event_time;

Problem Status Reference

Status Meaning Investigation
DISMX Disconnect mid-call (inbound) carrier_log, network, agent connection
DCMX Disconnect mid-call (outbound) Same as above
DROP Call dropped from queue (timeout) queue timeout, agent availability
TIMEOT Agent didn't answer in time agent alert settings, softphone
ADCT Auto-disconnect campaign dead_max setting
AFTHRS After hours routing ingroup after_hours settings
NANQUE No agent, no queue ingroup no_agent_no_queue setting
QVMAIL Queue voicemail voicemail routing
HXFER Hangup during transfer transfer target availability
XDROP External drop carrier/trunk issue
LAGGED Agent lagged out /lagged skill

MySQL Access (confirmed working)

  • svr-a/svr-c/svr-d/svr-e: ssh SERVER 'mysql -udbuser -pSecurePass asterisk -N -B -e "..."'
  • svr-b (Italy): ssh svr-b 'mysql -u dbuser -pSecurePass asterisk -N -B -e "..."'
  • Server-Replica (read-only, all 3 DBs): ssh svr-replica 'mysql -u report_user -pReportPass replica_asterisk -N -B -e "..."'

Quick 0-sec inbound playbook (proven on YYYY-MM-DD Example Corp call)

  1. call_log by caller_code + time window → get uniqueid + channel + trunk
  2. vicidial_inbound_dids → confirm DID routing/description
  3. grep "Apr DD HH:MM.*CHANNEL_ID" asterisk/messages → full dialplan trace
  4. Homer hep_proto_1_call_YYYYMMDD_HHMM (UTC) by Call-ID → Reason: header proves direction
  5. soxi + sox -n stats on recording → confirms silence is expected (pre-answer) vs real audio issue

DualCall Callback Investigation (proven on YYYY-MM-DD agent 1012/SoftphoneApp case)

When dualcall callbacks disconnect prematurely (answered but BYE within seconds):

Decision: Is it carrier, dialplan, ViciDial, or softphone?

  1. Check AMI log first: grep "sendcron.*hanging up" asterisk/messages at the time — if sendcron hung up the callback channel → ViciDial logic issue
  2. No AMI hangup? → Go to Homer: find who sent BYE (protocol_header->>'srcIp')
  3. BYE from carrier IP → carrier issue, use /trunk-status
  4. BYE from agent phone IP → softphone issue (see below)

Key facts about dualcall callbacks:

  • Callback prefixes: 88 = TrunkVendorC trunk, 79 = TrunkVendorB trunk (in extensions-vicidial.conf)
  • custom_router.agi only handles inbound ROUTING decisions (repeat→dualcall, new→normal). It does NOT control callbacks. Don't waste time investigating it for callback issues.
  • ViciDial EMCB creates TWO simultaneous SIP sessions to the agent's phone:
    1. ConfBridge channel (SIP/AGENT-xxx) — park session, stays alive during callback
    2. Callback channel (SIP/AGENT-yyy) — outbound via TrunkVendorC/TrunkVendorB
  • The softphone must handle concurrent calls. If it can't → premature BYE.
  • Callback channels use simple_bridge (not ConfBridge) — a direct Dial() bridge.

Softphone BYE pattern (confirmed):

200 OK (customer answers) → re-INVITE from agent phone (media renegotiation) → BYE from agent phone (1-7 sec later)

This means the softphone itself is terminating the call. Check:

  1. User-Agent header in BYE: grep 'User-Agent' in Homer raw → identifies softphone (SoftphoneApp, Ooma, etc.)
  2. SDP in re-INVITE: c=IN IP4 10.x.x.x (private IP) with a=X-nat:0 = NAT misconfiguration
  3. SIP peer config: grep -A 15 '\[AGENT_ID\]' /etc/asterisk/sip-vicidial.conf — missing nat=force_rport,comedia causes RTP to be sent to private IP after re-INVITE

Homer trace for callbacks:

# Callback via TrunkVendorC: Server-A→203.0.113.20, agent phone→Server-A
# Callback via TrunkVendorB: Server-A→203.0.113.11, agent phone→Server-A
docker exec homer-db psql -U homer -d homer_data -c "
SELECT create_date, protocol_header->>'srcIp' as src,
       substring(raw from 1 for 80) as first_line
FROM hep_proto_1_call_YYYYMMDD_HHMM
WHERE raw LIKE '%PHONE_NUMBER%'
  AND create_date BETWEEN 'YYYY-MM-DD HH:MM:00' AND 'YYYY-MM-DD HH:MM:59'
ORDER BY create_date LIMIT 40;"

# Get BYE with User-Agent header:
docker exec homer-db psql -U homer -d homer_data -c "
SELECT create_date, protocol_header->>'srcIp' as src, raw
FROM hep_proto_1_call_YYYYMMDD_HHMM
WHERE data_header->>'callid'='CALLID' AND raw LIKE 'BYE%'
ORDER BY create_date LIMIT 2;"

Agent phone info:

-- Get agent SIP extension + phone IP
SELECT extension, phone_ip, protocol FROM phones WHERE extension=AGENT_ID;
-- Get SIP peer config (check for nat= setting)
ssh svr-a "grep -A 15 '\[AGENT_ID\]' /etc/asterisk/sip-vicidial.conf"

Cross-reference to other skills

  • Audio quality complaints (choppy, silent, one-way) → use /audio-quality instead
  • Agent LAGGED disconnects → use /lagged instead
  • Drop patterns across many calls → use /call-drops for bulk analysis
  • Network/jitter/packet loss → use /network-check
  • Agent not receiving calls / routing issues → use /agent-ranks
  • Trunk down / carrier issues → use /trunk-status
  • DID routing questions → use /did-lookup
  • Call distribution fairness → use /call-fairness

/audio-quality

Investigate audio quality issues for specific calls or agents. Uses Homer RTCP, audio analysis service, Asterisk logs, recording playback, codec checks. Use when agents or clients complain about voice quality, one-way audio, choppy audio, echo, or silence.

Tools: Bash(ssh *), Bash(docker *), Bash(curl *), Bash(ping *)

Audio Quality Investigation

Investigate voice quality issues using ALL available tools. $ARGUMENTS can be: phone number(s), agent ID(s), or "all" for a general sweep.

Available Tools on VPS (10.0.2.100 — this machine)

1. Homer RTCP Analysis (PostgreSQL — Docker container postgres)

Query RTCP data from Homer to check packet loss and jitter between endpoints.

# Connect to Homer DB
docker exec -i postgres psql -U homer -d homer_data

# Find RTCP table names (6-hour partitions, UTC time)
docker exec -i postgres psql -U homer -d homer_data -c "\dt hep_proto_5_default_*" | tail -20

# Query RTCP from a specific source IP (e.g., TrunkVendorA 198.51.100.x, agent IP, trunk IP)
docker exec -i postgres psql -U homer -d homer_data -c "
SELECT
  create_date,
  protocol_header->>'srcIp' as src,
  protocol_header->>'dstIp' as dst,
  (raw::jsonb->'sender_information'->>'packets')::bigint as pkts,
  (raw::jsonb->'report_blocks'->0->>'fraction_lost')::bigint as frac_lost,
  (raw::jsonb->'report_blocks'->0->>'ia_jitter')::bigint as jitter,
  (raw::jsonb->'report_blocks'->0->>'packets_lost')::bigint as lost
FROM hep_proto_5_default_YYYYMMDD_HHMM
WHERE protocol_header->>'srcIp' LIKE 'IP_PATTERN%'
  AND create_date > NOW() - INTERVAL '2 hours'
ORDER BY create_date DESC LIMIT 50;
"

CRITICAL: Table partitions are by UTC time. VPS timezone is CET (UTC+1). If it's 14:00 CET, that's 13:00 UTC → use table *_1200 (covers 12:00-18:00 UTC).

  • fraction_lost: 0-255 scale (0=perfect, 255=100% loss). >5 is bad.
  • ia_jitter: In timestamp units. Divide by 8 for milliseconds. >50ms is bad.
  • packets_lost values of 16777215 (2^24-1) are overflow, treat as 0.

2. Audio Analysis Service (FastAPI on localhost:9090)

Analyzes recordings with NISQA neural quality model + Silero VAD.

# Analyze a specific recording (by ViciDial uniqueid)
curl -s "http://localhost:9090/analyze?uniqueid=UNIQUEID&server=SERVERNAME" | jq .

# AI-powered analysis (uses Claude Haiku)
curl -s "http://localhost:9090/ai-analyze?uniqueid=UNIQUEID&server=SERVERNAME" | jq .

Server names: "uk" (Server-A), "romania" (Server-C), "france" (Server-D), "italy" (Echo — BUT Italy is actually Server-B 10.0.1.40, this mapping is broken, see workaround below)

Italy workaround: Download recording directly and analyze locally:

# Download from Italy (VPS IP 10.0.2.100 is whitelisted)
curl -s -L -o /tmp/recording.mp3 "http://10.0.1.40/RECORDINGS/MP3/FILENAME.mp3"
# Then use the /analyze endpoint with a local file path

ORIG recordings (separate in/out legs, better for analysis):

  • Path: /RECORDINGS/ORIG/{filename}-in.wav (caller) and {filename}-out.wav (agent)
  • Retention: ~20 days on Server-A, 7 days on others
  • Italy ORIG needs VPS IP whitelisted in httpd.conf (done YYYY-MM-DD)

3. Asterisk Logs (on production servers via SSH)

# Check for codec issues
ssh SERVER "grep 'Unknown RTP codec' /var/log/asterisk/messages | tail -20"

# Check for RTP source switching (NAT issues)
ssh SERVER "grep 'Strict RTP' /var/log/asterisk/messages | tail -20"

# Check for jitter buffer resyncs (IAX2 issues)
ssh SERVER "grep 'Resyncing the jb' /var/log/asterisk/messages | tail -20"

# Check specific call by uniqueid
ssh SERVER "grep 'UNIQUEID\|CALLERID' /var/log/asterisk/messages | tail -50"

4. SIP Peer Quality (live agent quality)

# Check agent SIP registration quality
ssh SERVER "asterisk -rx 'sip show peer AGENT_EXT'"
# Look for: Status (latency), Useragent (softphone version), codecs

# Live RTP stats for all active channels
ssh SERVER "asterisk -rx 'sip show channelstats'"
# Shows: Recv/Sent packets, Lost packets, Jitter, RTT per channel

5. Codec Verification

# Check what codecs an agent negotiated
ssh SERVER "asterisk -rx 'core show channel SIP/AGENT-CHANNELID'"
# Look for: NativeFormats, ReadFormat, WriteFormat
# If Read≠Write, there's transcoding (quality loss)

# Check trunk codec config
ssh SERVER "grep -A5 'TRUNK_NAME' /etc/asterisk/sip-vicidial.conf"

# Check global codec order
ssh SERVER "grep -E 'allow|disallow' /etc/asterisk/sip.conf | head -10"

6. Network Quality (Smokeping + Ping)

# Smokeping graphs available at http://10.0.2.100:8081/smokeping/
# Direct ping test
ping -c 10 TARGET_IP

# Check UDP buffer overflows (on production server)
ssh SERVER "cat /proc/net/snmp | grep Udp"
# RcvbufErrors > 0 = packets dropped due to small UDP buffers

Investigation Workflow

  1. Find the calls: Query vicidial_closer_log or vicidial_log by phone number
  2. Identify endpoints: Get agent ID → SIP peer → agent IP. Get trunk → trunk IP
  3. Check Homer RTCP: Query for both directions (trunk→server, server→agent, agent→server, server→trunk)
  4. Check Asterisk logs: Look for codec errors, RTP switching, jitter resyncs
  5. Check live SIP quality: sip show peer, sip show channelstats
  6. Listen to recording: Download and analyze via audio analysis service
  7. Check network: Smokeping, ping, UDP buffers

Common Root Causes (from past investigations)

  • Codec mismatch: Agent softphone doesn't offer alaw → ulaw↔alaw transcoding through MeetMe/ConfBridge = quality loss
  • TrunkVendorA packet loss: Transient, check Homer RTCP from 198.51.100.x IPs
  • Old legacy softphone: Missing alaw, sends unknown codec 126, high latency
  • RTP Keepalive disabled: NAT binding timeout → intermittent one-way audio (Italy svr-b has rtpkeepalive=0)
  • UDP buffer overflow: Default rmem_default=212992 too small for busy servers
  • MeetMe (Italy): Always transcodes to slin internally, adds overhead vs ConfBridge

Server-Specific Notes

  • Server-A/Server-C/Server-D/Server-E: ConfBridge, newer Asterisk, ORIG recordings available
  • Italy (Server-B): MeetMe, Asterisk 11, CentOS 7, TrunkVendorA trunk (alaw only), MySQL user: dbuser/SecurePass
  • Italy agents: All behind 192.168.100.1, legacy softphone (old), high latency

/listen-recording

Download and analyze ViciDial call recordings. Supports both MIX (combined) and ORIG (separate in/out legs) formats. Uses NISQA neural quality scoring, Silero VAD silence detection, and optional AI analysis. Use when you need to check actual audio quality of a specific call.

Tools: Bash(ssh *), Bash(curl *), Bash(sox *), Bash(soxi *), Bash(ffprobe *)

Listen to / Analyze Call Recordings

Download and analyze recordings. $ARGUMENTS: uniqueid, phone number, agent ID, or direct filename.

Step 1: Find Recording

-- By uniqueid
SELECT recording_id, filename, location, length_in_sec, start_time, user, lead_id
FROM recording_log
WHERE vicidial_id='UNIQUEID'
ORDER BY start_time DESC LIMIT 5;

-- By phone number (via closer_log → lead_id → recording_log)
SELECT r.filename, r.location, r.length_in_sec, r.start_time, r.user
FROM recording_log r
JOIN vicidial_closer_log cl ON r.lead_id = cl.lead_id
WHERE cl.phone_number LIKE '%NUMBER%'
  AND cl.call_date >= CURDATE()
ORDER BY r.start_time DESC LIMIT 5;

-- By agent today
SELECT filename, location, length_in_sec, start_time
FROM recording_log WHERE user='AGENT_ID'
  AND start_time >= CURDATE()
ORDER BY start_time DESC LIMIT 10;

Step 2: Download Recording

MIX recording (MP3, combined both sides)

# Server-A/Server-C/Server-D/Server-E (use server IP from recording_log.location)
curl -s -o /tmp/call.mp3 "http://SERVER_IP/RECORDINGS/MP3/FILENAME-all.mp3"

# Italy (Server-B) — VPS IP 10.0.2.100 is whitelisted
curl -s -L -o /tmp/call.mp3 "http://10.0.1.40/RECORDINGS/MP3/FILENAME-all.mp3"

ORIG recordings (WAV, separate caller/agent — BETTER for analysis)

# -in.wav = caller audio, -out.wav = agent audio
curl -s -o /tmp/call-in.wav "http://SERVER_IP/RECORDINGS/ORIG/FILENAME-in.wav"
curl -s -o /tmp/call-out.wav "http://SERVER_IP/RECORDINGS/ORIG/FILENAME-out.wav"

# ORIG retention: ~20 days on Server-A, 7 days on others
# Italy ORIG: accessible from VPS (IP whitelisted YYYY-MM-DD)

Step 3: Quick Audio Info

# File info with soxi (for WAV)
soxi /tmp/call-in.wav /tmp/call-out.wav

# Or ffprobe (for MP3)
ffprobe -i /tmp/call.mp3 -show_format -show_streams 2>&1 | grep -E "duration|bit_rate|sample_rate|channels"

# Check if file has actual audio (not just headers)
ls -la /tmp/call*.wav /tmp/call*.mp3
# Files < 1KB = empty/header only (recording failed)

Step 4: Analyze with Audio Analysis Service

# Full NISQA quality analysis (neural network scoring)
curl -s "http://localhost:9090/analyze?uniqueid=UNIQUEID&server=SERVER_KEY" | python3 -m json.tool

# AI-powered analysis with Claude Haiku (interprets NISQA + VAD results)
curl -s "http://localhost:9090/ai-analyze?uniqueid=UNIQUEID&server=SERVER_KEY" | python3 -m json.tool

Server keys: "uk" (Server-A), "romania" (Server-C), "france" (Server-D) Note: "italy" maps to Echo (10.0.1.60) which is IDLE. For Italy, download manually and analyze locally.

Step 5: Manual Audio Analysis with SoX

# Get volume statistics (RMS levels, peak, silence detection)
sox /tmp/call-in.wav -n stat 2>&1
sox /tmp/call-out.wav -n stat 2>&1

# Check for silence segments (>0.5s of silence below -40dB)
sox /tmp/call-in.wav /tmp/trimmed.wav silence 1 0.5 -40d

# Get audio spectrogram (visual check)
sox /tmp/call.mp3 -n spectrogram -o /tmp/spectrogram.png

# Split stereo into channels (if MIX recording)
sox /tmp/call.mp3 /tmp/left.wav remix 1
sox /tmp/call.mp3 /tmp/right.wav remix 2

Step 6: Interpret Results

NISQA Scores (1-5 scale)

Score Quality
4.0+ Excellent
3.5-4.0 Good
3.0-3.5 Fair
2.5-3.0 Poor
< 2.5 Bad

What to look for

  • -out.wav tiny (44 bytes): Agent audio not recorded = likely one-way audio issue
  • -in.wav has audio but -out.wav doesn't: Agent's microphone not working or not reaching server
  • Both have audio but MIX sounds bad: Transcoding issue (check codecs)
  • Silence segments > 3s: Possible packet loss or hold without MOH
  • RMS level < -35dB: Very quiet audio (volume issue)
  • RMS level > -10dB: Clipping/distortion

Cleanup

rm -f /tmp/call*.mp3 /tmp/call*.wav /tmp/trimmed.wav /tmp/spectrogram.png

/network-check

Check network quality to/from ViciDial servers using Homer RTCP, Smokeping, ping, and UDP buffer stats. Use when suspecting network issues, packet loss, high latency, or jitter affecting call quality.

Tools: Bash(ssh *), Bash(docker *), Bash(ping *), Bash(curl *)

Network Quality Check

Check network quality for VoIP infrastructure. $ARGUMENTS: server name, IP address, "trunks", "agents", or "all".

1. Homer RTCP Analysis (Best Source for RTP Quality)

Homer PostgreSQL on VPS has RTCP data from all servers sending via sip-capture-agent.

# Find today's RTCP table partitions (UTC-based, 6-hour windows)
docker exec -i postgres psql -U homer -d homer_data -c "\dt hep_proto_5_default_$(date -u +%Y%m%d)*"

# Check packet loss FROM a specific IP (e.g., trunk provider)
docker exec -i postgres psql -U homer -d homer_data -c "
SELECT
  date_trunc('minute', create_date) as minute,
  protocol_header->>'srcIp' as src,
  COUNT(*) as reports,
  ROUND(AVG((raw::jsonb->'report_blocks'->0->>'fraction_lost')::numeric), 1) as avg_frac_lost,
  ROUND(AVG((raw::jsonb->'report_blocks'->0->>'ia_jitter')::numeric / 8), 1) as avg_jitter_ms
FROM hep_proto_5_default_TABLE_NAME
WHERE protocol_header->>'srcIp' LIKE 'IP_PATTERN%'
  AND create_date > NOW() - INTERVAL '2 hours'
GROUP BY minute, src
ORDER BY minute DESC;
"

# Check packet loss FROM agents (agent office IPs)
# Known agent IPs: 192.168.100.1 (Italy office), 192.168.100.2 (UK office), etc.

# Check ALL sources with loss > 0 in last hour
docker exec -i postgres psql -U homer -d homer_data -c "
SELECT
  protocol_header->>'srcIp' as src,
  protocol_header->>'dstIp' as dst,
  COUNT(*) as reports,
  ROUND(AVG((raw::jsonb->'report_blocks'->0->>'fraction_lost')::numeric), 1) as avg_loss,
  MAX((raw::jsonb->'report_blocks'->0->>'fraction_lost')::bigint) as max_loss,
  ROUND(AVG((raw::jsonb->'report_blocks'->0->>'ia_jitter')::numeric / 8), 1) as avg_jitter_ms
FROM hep_proto_5_default_TABLE_NAME
WHERE create_date > NOW() - INTERVAL '1 hour'
  AND (raw::jsonb->'report_blocks'->0->>'fraction_lost')::bigint > 0
GROUP BY src, dst
ORDER BY avg_loss DESC;
"

Interpreting RTCP:

  • fraction_lost: 0-255 scale. 0=perfect. >5 = noticeable. >25 = bad. >50 = unusable.
  • ia_jitter: Timestamp units. ÷8 = milliseconds. <20ms good. >50ms bad. >100ms = call breaking.
  • Massive jitter spikes (>1000 ts units) = network dropout (agent internet went down briefly)

Known trunk IPs to check:

  • TrunkVendorA: 198.51.100.x (Italy)
  • TrunkVendorB: 203.0.113.10, 203.0.113.11, 203.0.113.12, 203.0.113.14
  • TrunkVendorC: 203.0.113.20, 203.0.113.21

2. Smokeping (VPS — http://10.0.2.100:8081)

Visual latency graphs. Check programmatically:

# Check Smokeping RRD data for a target
docker exec net-monitor ls /var/lib/smokeping/ 2>/dev/null

# Direct ping test (faster than Smokeping)
ping -c 20 -i 0.5 TARGET_IP

3. Direct Ping Tests

# Ping all production servers from VPS
for srv in 10.0.1.10 10.0.1.20 10.0.1.30 10.0.1.40 10.0.1.50; do
  echo -n "$srv: "
  ping -c 5 -W 2 $srv 2>/dev/null | tail -1
done

# Ping trunk providers
for trunk in 203.0.113.10 203.0.113.11 203.0.113.20 198.51.100.13; do
  echo -n "$trunk: "
  ping -c 5 -W 2 $trunk 2>/dev/null | tail -1
done

# Ping from a production server to agent IP
ssh SERVER "ping -c 10 AGENT_IP"

4. UDP Buffer Stats (On Production Servers)

# Check for UDP receive buffer overflows
ssh SERVER "cat /proc/net/snmp | grep Udp"
# InErrors and RcvbufErrors > 0 = packets dropped

# Check current buffer sizes
ssh SERVER "sysctl net.core.rmem_default net.core.rmem_max"
# Should be at least 2097152 (2MB) for VoIP. Default 212992 is too small.

# Check current socket buffer usage
ssh SERVER "ss -u -a | head -20"

5. SIP Peer Latency (Asterisk qualify)

# Check SIP peer response times
ssh SERVER "asterisk -rx 'sip show peers'"
# OK (Xms) — latency. >150ms = bad for voice quality

# Detailed peer stats
ssh SERVER "asterisk -rx 'sip show peer PEERNAME'"

6. Live RTP Channel Stats

# Show per-channel RTP statistics (packet loss, jitter)
ssh SERVER "asterisk -rx 'sip show channelstats'"
# Columns: Recv/Sent count, Lost packets, %, Jitter, RTT

7. MTR (Traceroute + Ping Combined)

# From production server to trunk/agent (shows hop-by-hop loss)
ssh SERVER "mtr --report -c 20 TARGET_IP"

Red Flags to Report

  • Packet loss > 1% = voice quality degraded
  • Packet loss > 5% = calls likely unusable
  • Jitter > 50ms = choppy audio
  • Latency > 200ms = noticeable delay
  • UDP RcvbufErrors increasing = server dropping packets
  • SIP peer UNREACHABLE = trunk down
  • Massive jitter spike followed by zero = network dropout + recovery

/lagged

Investigate agent LAGGED events — when ViciDial kicks agents due to network issues or heartbeat failures. Shows LAGGED history, correlates with network data, identifies root cause.

Tools: Bash(ssh *), Bash(docker *)

Agent LAGGED Investigation

Investigate LAGGED events (ViciDial kicking agents for heartbeat failure). $ARGUMENTS: server name, agent ID, or date.

What Is LAGGED?

ViciDial's conf_exten_check.php sends periodic heartbeat checks. If an agent's browser fails to respond (network dropout, PC freeze, browser crash), the system marks them LAGGED and forces logout. This appears as sub_status=LAGGED, pause_type=SYSTEM in agent logs.

Step 1: Find LAGGED Events

-- Recent LAGGED events
SELECT event_time, user, campaign_id, pause_sec, wait_sec, talk_sec,
       dead_sec, sub_status, pause_type
FROM vicidial_agent_log
WHERE sub_status='LAGGED'
  AND event_time >= CURDATE()
ORDER BY event_time DESC;

-- LAGGED events for specific agent
SELECT event_time, campaign_id, pause_sec, wait_sec, talk_sec, dead_sec
FROM vicidial_agent_log
WHERE user='AGENT_ID' AND sub_status='LAGGED'
  AND event_time >= DATE_SUB(CURDATE(), INTERVAL 7 DAY)
ORDER BY event_time DESC;

-- Login/logout storm detection (many logouts in short time = network issue)
SELECT user, COUNT(*) as events,
       MIN(event_time) as first, MAX(event_time) as last,
       TIMESTAMPDIFF(MINUTE, MIN(event_time), MAX(event_time)) as span_min
FROM vicidial_agent_log
WHERE sub_status='LAGGED' AND event_time >= CURDATE()
GROUP BY user
HAVING COUNT(*) > 2
ORDER BY events DESC;

Step 2: Check ViciDial LAGGED Report

http://SERVER_IP/vicidial/AST_LAGGED_log_report.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&SUBMIT=submit

Step 3: Correlate with Network Data

# Check Homer RTCP for agent IP at the time of LAGGED event
# Agent IPs: 192.168.100.1 (Italy office), 192.168.100.2 (UK office)
docker exec -i postgres psql -U homer -d homer_data -c "
SELECT create_date,
  (raw::jsonb->'report_blocks'->0->>'fraction_lost')::bigint as frac_lost,
  (raw::jsonb->'report_blocks'->0->>'ia_jitter')::bigint as jitter_ts
FROM hep_proto_5_default_TABLE_NAME
WHERE protocol_header->>'srcIp' = 'AGENT_IP'
  AND create_date BETWEEN 'TIMESTAMP_BEFORE' AND 'TIMESTAMP_AFTER'
ORDER BY create_date;
"

Key indicators:

  • Jitter spike > 8000 ts units (1 second+) = internet dropout
  • Multiple consecutive reports with high loss = sustained network issue
  • Gap in RTCP reports = complete connectivity loss

Step 4: Check Dead Channel Detection

-- Campaign dead_max setting (how long before dead channel triggers)
SELECT campaign_id, dead_max, dead_max_dispo
FROM vicidial_campaigns
WHERE campaign_id IN (SELECT DISTINCT campaign_id FROM vicidial_live_agents);
  • dead_max=0: Disabled (any dead channel stays forever — agents won't get auto-kicked for dead channels, only LAGGED heartbeat)
  • dead_max=10: 10 seconds of dead channel triggers auto-dispo

Step 5: Check Agent Connection Quality

# SIP peer status (latency)
ssh SERVER "asterisk -rx 'sip show peer AGENT_EXT'" | grep -E "Status|Addr|Useragent|qualify"

# If agent's qualify response time is >150ms, they're on a bad connection
# If qualify fails entirely, agent's firewall may be blocking it

Common Causes

  1. Office internet dropout: Agent IP shows jitter spike in Homer RTCP, multiple agents at same IP affected simultaneously
  2. WiFi issues: Single agent affected, intermittent. Recommend wired ethernet.
  3. VPN issues: If agent connects via VPN (10.0.2.200), check VPN tunnel stability
  4. Browser crash/freeze: Only that agent affected, no network correlation. Check browser/OS.
  5. CPU overload: Old PC can't keep up with WebRTC + browser. Check if agent uses heavy apps alongside.
  6. Server-side: If ALL agents LAGGED at once, check server load, MySQL, Apache. Check screen -ls for ViciDial processes.

Routing & Configuration

/replication

Check MariaDB replication status on the svr-replica server. Shows IO/SQL running, lag, errors.

Tools: Bash(ssh *)

Replication Status Check

Check MariaDB multi-source replication on the svr-replica server (10.0.2.101).

Run via SSH:

ssh svr-replica "mysql -u report_user -pReportPass -e 'SHOW ALL SLAVES STATUS\G'"

Parse and present:

  1. For each connection (svr-a, svr-c, svr-d):

    • Slave_IO_Running: Yes/No
    • Slave_SQL_Running: Yes/No
    • Seconds_Behind_Master
    • Last_Error (if any)
    • Last_SQL_Error (if any)
    • Master_Log_File and Read_Master_Log_Pos
  2. Flag any issues:

    • IO or SQL not running = CRITICAL
    • Lag > 60s = WARNING
    • Lag > 300s = CRITICAL
    • Any errors = show full error text
  3. If $ARGUMENTS contains "fix", also show:

    • SHOW SLAVE STATUS FOR 'connection_name'\G for the broken connection
    • Suggest fix commands (STOP SLAVE, SET GLOBAL sql_slave_skip_counter, START SLAVE)

Also check disk space on svr-replica: df -h /

/agent-ranks

Check and diagnose agent rank and call routing problems. Shows who gets calls first, rank mismatches, agents missing from ingroups, uneven call distribution. Use when calls go to wrong agent, agent claims they never get calls, or routing seems unfair.

Tools: Bash(ssh *)

Agent Rank & Routing Investigation

Check agent ranks, routing configuration, and call distribution. $ARGUMENTS can be: server name, agent ID, ingroup name, or "all".

Step 1: Get Current Agent Ranks Per Ingroup

-- On the target server (via SSH + mysql)
-- Shows which agents are assigned to which ingroups and their rank/weight
SELECT group_id, user_id, rank, weight, calls_today,
       (SELECT full_name FROM vicidial_users WHERE user=user_id) as name
FROM vicidial_inbound_group_agents
WHERE group_id IN (SELECT group_id FROM vicidial_inbound_groups WHERE active='Y')
ORDER BY group_id, rank DESC, user_id;

Step 2: Check Ingroup Routing Configuration

-- How does the ingroup route calls?
SELECT group_id, group_name, next_agent_call, active,
       in_group_rank_order, no_delay_call_route, agent_alert_delay,
       hold_recall_xfer_seconds, afterhours_action, afterhours_message_filename,
       no_agent_no_queue, no_agents_action
FROM vicidial_inbound_groups
WHERE active='Y'
ORDER BY group_id;

Routing methods (next_agent_call):

  • inbound_group_rank — Routes by rank in vicidial_inbound_group_agents (highest rank first). THIS IS WHAT ALL SERVERS USE.
  • longest_wait_time — Agent waiting longest gets next call
  • random — Random selection
  • campaign_rank — Uses campaign-level ranks
  • fewest_calls — Agent with fewest calls today

Step 3: Check Who's Actually Logged In and Selected the Ingroup

-- Agents must SELECT an ingroup as a closer group when logging in
-- If they didn't select it, they won't get calls from it
SELECT user, status, closer_campaigns, campaign_id, calls_today,
       (SELECT full_name FROM vicidial_users WHERE user=vla.user) as name
FROM vicidial_live_agents vla
ORDER BY user;

Common problem: Agent is ranked 9 in the ingroup but didn't select it in their closer campaigns → never gets calls from it.

Step 4: Check Call Distribution (Is Routing Working Fairly?)

-- Today's inbound call distribution by agent
SELECT user, COUNT(*) as calls,
       AVG(length_in_sec) as avg_duration,
       SUM(CASE WHEN status IN ('SALE','XFER') THEN 1 ELSE 0 END) as converted
FROM vicidial_closer_log
WHERE call_date >= CURDATE()
  AND campaign_id IN ('normal','dualcall','city_uk')  -- adjust per server
GROUP BY user ORDER BY calls DESC;

-- Hourly breakdown to see if routing changed during the day
SELECT HOUR(call_date) as hr, user, COUNT(*) as calls
FROM vicidial_closer_log
WHERE call_date >= CURDATE()
GROUP BY hr, user ORDER BY hr, calls DESC;

Step 5: Check for Ranking Issues

-- Find agents with different ranks across ingroups (inconsistency)
SELECT a.user_id, a.group_id, a.rank as rank_in_group,
       (SELECT full_name FROM vicidial_users WHERE user=a.user_id) as name
FROM vicidial_inbound_group_agents a
WHERE a.group_id IN (SELECT group_id FROM vicidial_inbound_groups WHERE active='Y')
ORDER BY a.user_id, a.group_id;

-- Find agents NOT assigned to any ingroup (won't get inbound calls)
SELECT user, full_name FROM vicidial_users
WHERE active='Y' AND user_level >= 1
  AND user NOT IN (SELECT DISTINCT user_id FROM vicidial_inbound_group_agents)
ORDER BY user;

-- Find agents with rank but not currently logged in (wasted ranking)
SELECT a.user_id, a.group_id, a.rank
FROM vicidial_inbound_group_agents a
WHERE a.user_id NOT IN (SELECT user FROM vicidial_live_agents)
ORDER BY a.rank DESC;

Step 6: Simulate Next-Call Routing

-- Who would get the next call right now?
-- For inbound_group_rank routing: highest rank + longest wait time wins
SELECT vla.user, vu.full_name, vla.status, vla.last_state_change,
       TIMESTAMPDIFF(SECOND, vla.last_state_change, NOW()) as wait_seconds,
       iga.rank, iga.weight
FROM vicidial_live_agents vla
JOIN vicidial_inbound_group_agents iga ON vla.user = iga.user_id
JOIN vicidial_users vu ON vla.user = vu.user
WHERE iga.group_id = 'INGROUP_NAME'
  AND vla.status = 'READY'
  AND FIND_IN_SET('INGROUP_NAME', REPLACE(vla.closer_campaigns, ' ', ','))
ORDER BY iga.rank DESC, wait_seconds DESC
LIMIT 5;

Server-Specific Config

Server-A (UK)

  • Ingroups: normal (first-time), dualcall (repeat), city_uk (legacy)
  • All use next_agent_call=inbound_group_rank
  • Ranks: 9 (top) → 5 (bottom), same across all 3 ingroups
  • Campaign: UKCAPM
  • MySQL: ssh svr-a "mysql asterisk -e '...'"

Italy (Server-B)

  • Multiple ingroups per region
  • Ring groups in customexte.conf: region_a, region_b, region_c, region_d, region_e
  • Campaign: CAMP_TEST
  • MySQL: ssh svr-b "mysql -u dbuser -pSecurePass asterisk -e '...'"

Server-C (Romania) / Server-D (France)

  • MySQL: ssh svr-c "mysql asterisk -e '...'" / ssh svr-d "mysql asterisk -e '...'"

ViciDial Report URLs for Ranks

# Real-time agent status with ingroup info
http://SERVER_IP/vicidial/realtime_report.php?RR=4&group[]=--ALL--&ALLINGROUPstats=1

# Agent performance by ingroup
http://SERVER_IP/vicidial/AST_agent_performance_detail.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&group[]=INGROUP&shift=ALL&SUBMIT=submit

Lookup & Reporting

📞

/did-lookup

Look up DID routing, company name mapping, call history for a DID number. Useful when a specific phone number has issues or you need to know which company/ingroup a DID routes to.

Tools: Bash(ssh *)

DID Lookup

Look up DID configuration, routing, company mapping, and call history. $ARGUMENTS: DID number (full or partial), company name, or "unrouted" for orphan DIDs.

Step 1: Find the DID

-- Search by number pattern
SELECT did_id, did_pattern, did_description, did_route, did_agent_a,
       extension, exten_context, group_id, menu_id, filter_action
FROM vicidial_inbound_dids
WHERE did_pattern LIKE '%NUMBER%'
ORDER BY did_pattern;

-- Count total active DIDs
SELECT COUNT(*) as total_dids FROM vicidial_inbound_dids WHERE did_active='Y';

Step 2: Check Company Name Mapping (Server-A — DualCall System)

-- Server-A has did_routing_map table for CRM display
SELECT did, company_name FROM did_routing_map
WHERE did LIKE '%NUMBER%' OR company_name LIKE '%SEARCH%'
ORDER BY company_name;

-- DIDs without company mapping (might need to be added)
SELECT d.did_pattern FROM vicidial_inbound_dids d
LEFT JOIN did_routing_map m ON d.did_pattern = m.did
WHERE m.did IS NULL AND d.did_active='Y'
ORDER BY d.did_pattern LIMIT 20;

Step 3: Check Call History for This DID

-- Recent calls to this DID
SELECT call_date, phone_number, length_in_sec, status, term_reason, user, queue_seconds
FROM vicidial_closer_log
WHERE campaign_id IN (
  SELECT group_id FROM vicidial_inbound_dids WHERE did_pattern LIKE '%NUMBER%'
)
  AND call_date >= DATE_SUB(CURDATE(), INTERVAL 7 DAY)
ORDER BY call_date DESC LIMIT 20;

-- Or if you know the exact DID, check via did_inbound_filter_log

Step 4: Trace Routing Path

On Server-A, all DIDs route through:

DID → did_route=EXTEN → extension 999000111 → custom_router.agi
  → First call: "standard" ingroup
  → Repeat (7 days): "dualcall" ingroup
  → Company name set in CALLERID(name)

On Italy (Server-B), DIDs route through TrunkVendorA SIP trunks to various ingroups/ring groups.

# Check Asterisk dialplan routing for the DID
ssh SERVER "asterisk -rx 'dialplan show CONTEXT' | grep DID_NUMBER"

# Check dualcall history for this DID (Server-A)
ssh svr-a "mysql asterisk -e \"SELECT * FROM custom_dual_calls WHERE did_number LIKE '%NUMBER%' ORDER BY call_date DESC LIMIT 10;\""

Step 5: Manage Company Mapping (Server-A)

-- Add or update company name for a DID
INSERT INTO did_routing_map (did, company_name) VALUES ('44XXXXXXXXXX', 'Company Name')
ON DUPLICATE KEY UPDATE company_name = VALUES(company_name);

-- Bulk view all mappings
SELECT did, company_name FROM did_routing_map ORDER BY company_name;

MySQL Access

  • Server-A: ssh svr-a "mysql asterisk -e '...'"
  • Server-B: ssh svr-b "mysql -u dbuser -pSecurePass asterisk -e '...'"
  • Server-C/Server-D/Server-E: ssh SERVER "mysql asterisk -e '...'"
📊

/reports

Generate ViciDial report URLs or pull report data directly via SQL. Quick access to agent performance, inbound stats, call exports, carrier logs, LAGGED events, and 90+ other built-in reports. Use when user asks for any ViciDial report.

Tools: Bash(ssh *), Bash(curl *)

ViciDial Reports

Generate report URLs or pull data directly. $ARGUMENTS: report type (e.g., "agent performance", "inbound", "drops", "carrier"), server, date range, agent ID.

Auth: HTTP Basic admin / password (all servers) (all servers)

Most Common Reports (with URL templates)

Agent Performance (THE KEY REPORT)

http://SERVER_IP/vicidial/AST_agent_performance_detail.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&group[]=--ALL--&shift=ALL&SUBMIT=submit

Shows: calls, talk time, pause time, wait time, dispo time per agent. Add &file_download=1 for call CSV, &file_download=2 for pause CSV.

Agent Time Detail (Most Accurate)

http://SERVER_IP/vicidial/AST_agent_time_detail.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&group[]=--ALL--&shift=ALL&time_in_sec=1&SUBMIT=submit

Single Agent Deep Dive

http://SERVER_IP/vicidial/user_stats.php?user=AGENT_ID&begin_date=YYYY-MM-DD&end_date=YYYY-MM-DD&pause_code_rpt=1&SUBMIT=submit

Inbound Stats (CLOSERstats)

http://SERVER_IP/vicidial/AST_CLOSERstats.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&group[]=--ALL--&shift=ALL&DID=Y&SUBMIT=submit

Real-Time Dashboard

http://SERVER_IP/vicidial/realtime_report.php?RR=4&group[]=--ALL--&ALLINGROUPstats=1

Carrier Log (SIP-level detail)

http://SERVER_IP/vicidial/AST_carrier_log_report.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&server_ip[]=--ALL--&SUBMIT=submit

Hangup Cause Distribution

http://SERVER_IP/vicidial/AST_hangup_cause_report.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&server_ip[]=--ALL--&SUBMIT=submit

LAGGED Agent Report

http://SERVER_IP/vicidial/AST_LAGGED_log_report.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&SUBMIT=submit

Call Export (CSV)

http://SERVER_IP/vicidial/call_report_export.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&campaign[]=--ALL--&group[]=--ALL--&status[]=--ALL--&header_row=YES&run_export=run_export

DID Stats

http://SERVER_IP/vicidial/AST_DIDstats.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&DID[]=--ALL--&SUBMIT=submit

Inbound Summary Hourly

http://SERVER_IP/vicidial/AST_CLOSERsummary_hourly.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&group[]=--ALL--&shift=ALL&SUBMIT=submit

Agent Status Detail (Disposition Breakdown)

http://SERVER_IP/vicidial/AST_agent_status_detail.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&group[]=--ALL--&shift=ALL&SUBMIT=submit

Server Performance

http://SERVER_IP/vicidial/AST_server_performance.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&SUBMIT=submit

Timeclock

http://SERVER_IP/vicidial/timeclock_report.php?query_date=YYYY-MM-DD&end_date=YYYY-MM-DD&SUBMIT=submit

Direct SQL (When Reports Aren't Enough)

Quick agent summary today

SELECT user, COUNT(*) as calls, SUM(length_in_sec) as total_talk,
       AVG(length_in_sec) as avg_talk,
       SUM(CASE WHEN status='SALE' THEN 1 ELSE 0 END) as sales
FROM vicidial_closer_log WHERE call_date >= CURDATE()
GROUP BY user ORDER BY calls DESC;

Pause code analysis

SELECT user, sub_status as pause_code, COUNT(*) as times, SUM(pause_sec) as total_sec
FROM vicidial_agent_log
WHERE event_time >= CURDATE() AND sub_status != ''
GROUP BY user, sub_status ORDER BY user, total_sec DESC;

Inbound by DID

SELECT d.did_pattern, d.did_description, COUNT(*) as calls,
       AVG(cl.length_in_sec) as avg_dur
FROM vicidial_closer_log cl
JOIN vicidial_inbound_dids d ON cl.campaign_id = d.group_id
WHERE cl.call_date >= CURDATE()
GROUP BY d.did_pattern, d.did_description ORDER BY calls DESC LIMIT 20;

Server IPs

Server IP Notes
Server-A 10.0.1.10 UK primary
Server-C 10.0.1.20 Romania
Server-D 10.0.1.30 France
Server-B 10.0.1.40 Italy
Server-E 10.0.1.50 UK standalone

Global Params for All Reports

  • &search_archived_data=checked — Include archived data
  • &DB=1 — Debug mode (shows SQL)
  • &file_download=1 — CSV download
  • &shift=ALL — All shifts
  • &group[]=--ALL-- — All groups/campaigns

Custom Dashboard

🛡

/audit-server

Deep audit of a specific ViciDial server - security, config, performance, disk, logs. Use with server name as argument.

Tools: Bash(ssh *)

Server Audit

Perform a comprehensive audit of the specified server. $ARGUMENTS must be a server name (svr-a, svr-c, svr-d, svr-b, svr-e, echo).

Run the following checks via SSH (combine into minimal SSH commands):

1. System

  • uptime, free -h, df -h, nproc
  • Top 5 processes by CPU: ps aux --sort=-%cpu | head -6
  • Top 5 processes by memory: ps aux --sort=-%mem | head -6
  • Load average trend: cat /proc/loadavg

2. Asterisk

  • asterisk -rx "core show channels" | tail -1
  • asterisk -rx "sip show peers" — count OK vs UNREACHABLE
  • asterisk -rx "sip show channelstats" — any high jitter/loss
  • Check for errors in last 100 lines: tail -100 /var/log/asterisk/messages | grep -i "error\|warning\|failed"

3. Database

  • mysqladmin status
  • mysql -e "SHOW PROCESSLIST" asterisk | wc -l (connection count)
  • Check for long queries: mysql -e "SELECT * FROM information_schema.PROCESSLIST WHERE TIME > 30 AND COMMAND != 'Sleep'" asterisk
  • Disk usage: du -sh /var/lib/mysql/

4. Security

  • fail2ban-client status (is it running? how many jails?)
  • fail2ban-client status sshd (banned IPs)
  • Check for recent auth failures: grep "Failed password" /var/log/auth.log 2>/dev/null | tail -5 or /var/log/secure
  • Open ports: ss -tlnp | grep -v 127.0.0

5. ViciDial

  • Screen sessions: screen -ls
  • ls /var/spool/asterisk/monitor/ | wc -l (recording queue)
  • Recording disk usage: du -sh /var/spool/asterisk/monitorDONE/ 2>/dev/null
  • Crontab: crontab -l | grep -v "^#" | grep -v "^$" | wc -l active cron jobs

6. Logs

  • Recent errors: journalctl -p err --since "1 hour ago" --no-pager | tail -20 (or /var/log/messages on CentOS)
  • Asterisk crashes: ls -lt /tmp/core.* 2>/dev/null | head -5

Present findings in organized sections. Flag CRITICAL (needs immediate action), WARNING (should fix soon), INFO (for awareness).

Server-specific notes:

  • svr-b: CentOS 7 (use yum, /var/log/secure, no journalctl --since), Asterisk 11, MeetMe
  • svr-a/svr-c/svr-d/svr-e: openSUSE, newer Asterisk, ConfBridge

Want these skills for your infrastructure?

I can build custom Claude Code skills tailored to your VoIP setup — same approach, your servers, your workflows.

Get a Free Consultation