
SIP Trunk Failover & Load Balancing for ViciDial

Infrastructure & DevOps Intermediate 14 min read #50

Master multi-trunk redundancy, active-active load balancing, and failover logic to ensure zero-downtime calling in your ViciDial environment

Prerequisites

Before implementing SIP trunk failover and load balancing, ensure you have:

Understanding ViciDial Trunk Architecture

Current Trunk Configuration

ViciDial defines outbound trunks as carriers (Admin → Carriers). Their account and dialplan entries are written into Asterisk's sip-vicidial.conf and extensions-vicidial.conf, and per-call dial results are recorded in the vicidial_carrier_log table. A carrier's dialplan entry normally sends each call out a single trunk with no automatic failover. This creates a single point of failure: if your primary trunk drops, calls fail until someone intervenes manually.

The key components you'll interact with:

Load Balancing vs. Failover

This tutorial implements active-active load balancing with intelligent failover.

Step 1: Prepare Your SIP Trunks

Verify Trunk Connectivity

First, confirm both trunks are provisioned and responsive:

# SSH into your ViciDial server
ssh root@your-vicidial-server

# Test trunk 1 connectivity (replace with your carrier IP/domain)
asterisk -rx "sip show peers" | grep -E "trunk1|trunk2"

# Expected output:
# trunk1/trunk1                          xxx.xxx.xxx.xxx:5060    OK (49 ms)
# trunk2/trunk2                          yyy.yyy.yyy.yyy:5060    OK (52 ms)
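If you want to run this check from a script instead of reading the output by eye, the POSIX shell function below is one way to do it. It is a sketch that assumes the usual sip show peers layout (a "Name/username" header row, one row per peer with the status near the end, and an "N sip peers" summary row); the name not_ok_peers is made up for this example. Peers with qualify disabled show "Unmonitored" and are reported too, which is deliberate here because failover depends on qualify.

```shell
# not_ok_peers (hypothetical helper): print each peer from `sip show peers`
# output whose status is not "OK (... ms)". Reads the CLI output on stdin.
not_ok_peers() {
    awk '
        $1 ~ /^Name\// { next }    # column header row
        / sip peers / { next }     # trailing summary row
        NF == 0 { next }           # blank lines
        {
            name = $1
            sub(/\/.*/, "", name)  # "trunk1/trunk1" becomes "trunk1"
            if ($0 !~ / OK \(/) print name
        }'
}

# Usage on a live server:
#   asterisk -rx "sip show peers" | not_ok_peers
```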

If either trunk shows "UNREACHABLE", contact your carrier and verify:

Document Trunk Details

Create a reference file with your trunk specifications:

cat > /root/trunk_config.txt << 'EOF'
TRUNK 1 (Primary)
Provider: VoIP Provider A
IP: 203.0.113.10
Port: 5060
Protocol: UDP
Username: your_account_1
Password: your_pass_1
Capacity: 100 concurrent calls

TRUNK 2 (Secondary)
Provider: VoIP Provider B
IP: 198.51.100.20
Port: 5060
Protocol: UDP
Username: your_account_2
Password: your_pass_2
Capacity: 100 concurrent calls
EOF

chmod 600 /root/trunk_config.txt

Step 2: Configure SIP Peers with Health Monitoring

Edit sip-vicidial.conf

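Back up the file, then open it. On standard ViciDial installs, sip-vicidial.conf and extensions-vicidial.conf are regenerated from the carrier settings in the admin panel, so hand edits to them can be overwritten; if that applies to your server, put the same peer and dialplan entries in the carrier's Account Entry and Dialplan Entry fields instead: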

cp /etc/asterisk/sip-vicidial.conf /etc/asterisk/sip-vicidial.conf.backup
nano /etc/asterisk/sip-vicidial.conf

Locate your trunk definitions and add the health monitoring parameters. If the trunk definitions don't exist yet, add them in full:

; TRUNK 1 - Primary Carrier
[trunk1]
type=peer
host=203.0.113.10
port=5060
username=your_account_1
fromuser=your_account_1
secret=your_pass_1
fromhost=203.0.113.10
dtmfmode=rfc2833
disallow=all
allow=ulaw
allow=alaw
qualify=yes
qualifyfreq=60
qualifysmoothing=yes
qualifygap=3
maxretries=2
retryinterval=6
timert1=500
nat=force_rport,comedia
insecure=port,invite
context=from-external-vicidial

; TRUNK 2 - Secondary Carrier
[trunk2]
type=peer
host=198.51.100.20
port=5060
username=your_account_2
fromuser=your_account_2
secret=your_pass_2
fromhost=198.51.100.20
dtmfmode=rfc2833
disallow=all
allow=ulaw
allow=alaw
qualify=yes
qualifyfreq=60
qualifysmoothing=yes
qualifygap=3
maxretries=2
retryinterval=6
timert1=500
nat=force_rport,comedia
insecure=port,invite
context=from-external-vicidial

Critical parameters explained:

Reload SIP Configuration

asterisk -rx "sip reload"
asterisk -rx "sip show peers" | grep -E "trunk1|trunk2"

Expect output showing both trunks as "OK" (or "UNREACHABLE" if not connected):

trunk1/trunk1                          203.0.113.10:5060      OK (45 ms)
trunk2/trunk2                          198.51.100.20:5060     OK (48 ms)

Step 3: Configure Dialplan with Intelligent Failover

Edit extensions-vicidial.conf

Open the dialplan configuration:

cp /etc/asterisk/extensions-vicidial.conf /etc/asterisk/extensions-vicidial.conf.backup
nano /etc/asterisk/extensions-vicidial.conf

Find the outbound dialing context or create a new one. Here's a production-grade dialplan that implements load balancing with failover:

; Outbound Dialing Context with Load Balancing and Failover
[outbound-vicidial-lb]
exten => _1NXXNXXXXXX,1,NoOp(Load Balance Outbound: ${EXTEN})
exten => _1NXXNXXXXXX,n,Set(DIAL_ATTEMPTS=0)
exten => _1NXXNXXXXXX,n,Set(MAX_ATTEMPTS=2)
; Pass the dialed number as ARG1: inside the subroutine, ${EXTEN} is "s"
exten => _1NXXNXXXXXX,n(attempt),GoSub(try-trunk,s,1(${EXTEN}))
exten => _1NXXNXXXXXX,n,NoOp(Call Result: ${CALL_RESULT})
; The subroutine returns RETRY when both trunks failed and attempts remain
exten => _1NXXNXXXXXX,n,GotoIf($["${GOSUB_RETVAL}" = "RETRY"]?attempt)
exten => _1NXXNXXXXXX,n,Hangup()

; Subroutine: attempt the call on trunk1, then trunk2 (ARG1 = number to dial)
[try-trunk]
exten => s,1,Set(DIAL_ATTEMPTS=$[${DIAL_ATTEMPTS} + 1])
exten => s,n,NoOp(Dial Attempt ${DIAL_ATTEMPTS} of ${MAX_ATTEMPTS})
exten => s,n,Set(TRUNK_SELECT=trunk1)

; Skip trunk1 if qualify has marked it unreachable
exten => s,n,GotoIf($["${DEVICESTATE(SIP/trunk1)}" = "UNAVAILABLE"]?try-trunk2)
exten => s,n,Set(CALL_RESULT=ATTEMPTING_TRUNK1)
exten => s,n,Dial(SIP/${ARG1}@trunk1,,gM(vicidial-record^${UNIQUEID}))
exten => s,n,Set(CALL_RESULT=${DIALSTATUS})
exten => s,n,GotoIf($["${DIALSTATUS}" = "ANSWER"]?success)
; BUSY, NOANSWER, etc. come from the called party: don't redial on trunk2
exten => s,n,GotoIf($["${DIALSTATUS}" != "CHANUNAVAIL" & "${DIALSTATUS}" != "CONGESTION"]?done)

; Try trunk2 if trunk1 failed
exten => s,n(try-trunk2),Set(CALL_RESULT=ATTEMPTING_TRUNK2)
exten => s,n,Set(TRUNK_SELECT=trunk2)
exten => s,n,GotoIf($["${DEVICESTATE(SIP/trunk2)}" = "UNAVAILABLE"]?check-retry)
exten => s,n,Dial(SIP/${ARG1}@trunk2,,gM(vicidial-record^${UNIQUEID}))
exten => s,n,Set(CALL_RESULT=${DIALSTATUS})
exten => s,n,GotoIf($["${DIALSTATUS}" = "ANSWER"]?success)
exten => s,n,GotoIf($["${DIALSTATUS}" != "CHANUNAVAIL" & "${DIALSTATUS}" != "CONGESTION"]?done)

; Both trunks failed at the trunk level: retry if attempts remain
exten => s,n(check-retry),GotoIf($[${DIAL_ATTEMPTS} < ${MAX_ATTEMPTS}]?retry)
exten => s,n,NoOp(All trunks exhausted)
exten => s,n(done),Return()

exten => s,n(retry),Wait(2)
exten => s,n,Return(RETRY)

exten => s,n(success),NoOp(Call connected via ${TRUNK_SELECT})
exten => s,n,Return()

Important: Replace vicidial-record with your actual recording macro name. Check your existing extensions-vicidial.conf for the correct macro.
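The decision made at each trunk can be modeled in a few lines of shell, which is handy for reasoning about which DIALSTATUS values should trigger failover. This sketch assumes the rule that only trunk-level failures (CHANUNAVAIL and CONGESTION) move a call to the next trunk, while BUSY, NOANSWER and CANCEL are treated as the called party's outcome, since redialing those on another trunk would ring the same person twice. The function names are illustrative and are not part of ViciDial or Asterisk.

```shell
# should_failover: succeed (exit 0) only for DIALSTATUS values that point at
# the trunk rather than the called party.
should_failover() {
    case "$1" in
        CHANUNAVAIL|CONGESTION) return 0 ;;
        *) return 1 ;;
    esac
}

# final_attempt: given "trunk:DIALSTATUS" results in the order the trunks
# are tried, print the trunk and status that ended the call, or
# "none EXHAUSTED" if every trunk failed at the trunk level.
final_attempt() {
    for attempt in "$@"; do
        trunk=${attempt%%:*}
        status=${attempt#*:}
        if ! should_failover "$status"; then
            echo "$trunk $status"
            return 0
        fi
    done
    echo "none EXHAUSTED"
}

# Examples:
#   final_attempt trunk1:CHANUNAVAIL trunk2:ANSWER   -> trunk2 ANSWER
#   final_attempt trunk1:BUSY trunk2:ANSWER          -> trunk1 BUSY
```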

Load Balancing with Round-Robin

For true load balancing (distributing calls across both trunks), add this alternative context:

[outbound-vicidial-rr]
exten => _1NXXNXXXXXX,1,NoOp(Round-Robin Load Balance: ${EXTEN})
; Read and increment a counter kept in AstDB (a missing key counts as 0)
exten => _1NXXNXXXXXX,n,Set(CALL_COUNT=${DB(vicidial/call_count)})
exten => _1NXXNXXXXXX,n,Set(CALL_COUNT=$[${IF(${ISNULL(${CALL_COUNT})}?0:${CALL_COUNT})} + 1])
exten => _1NXXNXXXXXX,n,Set(DB(vicidial/call_count)=${CALL_COUNT})

; Determine which trunk based on modulo (odd/even calls)
exten => _1NXXNXXXXXX,n,GotoIf($[${CALL_COUNT} % 2 = 0]?use-trunk1:use-trunk2)

; Fail over only on trunk-level failures, not on BUSY/NOANSWER
exten => _1NXXNXXXXXX,n(use-trunk1),Dial(SIP/${EXTEN}@trunk1)
exten => _1NXXNXXXXXX,n,GotoIf($["${DIALSTATUS}" = "CHANUNAVAIL" | "${DIALSTATUS}" = "CONGESTION"]?fail-to-trunk2:end)

exten => _1NXXNXXXXXX,n(use-trunk2),Dial(SIP/${EXTEN}@trunk2)
exten => _1NXXNXXXXXX,n,GotoIf($["${DIALSTATUS}" = "CHANUNAVAIL" | "${DIALSTATUS}" = "CONGESTION"]?fail-to-trunk1:end)

exten => _1NXXNXXXXXX,n(fail-to-trunk2),Dial(SIP/${EXTEN}@trunk2)
exten => _1NXXNXXXXXX,n,Goto(end)

exten => _1NXXNXXXXXX,n(fail-to-trunk1),Dial(SIP/${EXTEN}@trunk1)

exten => _1NXXNXXXXXX,n(end),Hangup()
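To see how the odd/even split distributes traffic, the modulo step can be reproduced with plain shell arithmetic. This is an illustration only: rr_trunk and rr_distribution are hypothetical names, and the sketch ignores the failover branches. Because the dialplan reads and writes the AstDB counter in two separate steps, two calls starting at the same moment can read the same value, so the live split is approximately rather than exactly 50/50.

```shell
# rr_trunk: trunk chosen for a given call counter (even -> trunk1, odd -> trunk2)
rr_trunk() {
    if [ $(( $1 % 2 )) -eq 0 ]; then echo trunk1; else echo trunk2; fi
}

# rr_distribution: tally which trunk calls 1..N start on
rr_distribution() {
    n=1; t1=0; t2=0
    while [ "$n" -le "$1" ]; do
        if [ "$(rr_trunk "$n")" = trunk1 ]; then t1=$((t1 + 1)); else t2=$((t2 + 1)); fi
        n=$((n + 1))
    done
    echo "trunk1=$t1 trunk2=$t2"
}

# rr_distribution 10   -> trunk1=5 trunk2=5
```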

Reload Dialplan

asterisk -rx "dialplan reload"
asterisk -rx "dialplan show outbound-vicidial-lb" | head -20

You should see your dialplan extensions listed without errors.

Step 4: Configure ViciDial Carrier Management

Access the Database Directly

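You can configure carriers via the web interface, but for automation and precision you can also work in the database directly. Some of the columns used in this step (failover_carrier_id, concurrent_limit, cost_per_minute) may not exist on your server: in stock ViciDial installs, carrier definitions live in vicidial_server_carriers, while vicidial_carrier_log records per-call dial results. Run DESCRIBE on each table before executing the statements below.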

mysql -u root -p asterisk

Check existing carriers:

SELECT carrier_id, carrier_name, active, failover_carrier_id FROM vicidial_carrier_log LIMIT 10;

Example output:

+------------+--------------+--------+---------------------+
| carrier_id | carrier_name | active | failover_carrier_id |
+------------+--------------+--------+---------------------+
| 1          | Carrier A    | Y      | 2                   |
| 2          | Carrier B    | Y      | 1                   |
+------------+--------------+--------+---------------------+

Create Carrier Entries if Missing

INSERT INTO vicidial_carrier_log 
(carrier_id, carrier_name, active, failover_carrier_id, calls_today, concurrent_limit, cost_per_minute)
VALUES 
(1, 'Carrier A - Trunk1', 'Y', 2, 0, 100, 0.02),
(2, 'Carrier B - Trunk2', 'Y', 1, 0, 100, 0.025);

Configure Campaign to Use Multiple Carriers

Update your campaign to use both carriers:

UPDATE vicidial_campaigns 
SET carrier_id = '1,2'
WHERE campaign_id = 'YOUR_CAMPAIGN_ID';

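The comma-separated carrier IDs are meant to tell ViciDial to use both carriers. This assumes your vicidial_campaigns table has a carrier_id column; if DESCRIBE shows none, route the campaign through its dial prefix and the matching carrier dialplan entries instead.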

Step 5: Implement Real-Time Trunk Health Monitoring

Create Monitoring Script

This script checks trunk health and updates the database:

cat > /usr/local/bin/check_trunk_health.sh << 'SCRIPT'
#!/bin/bash

# SIP Trunk Health Monitor for ViciDial
# Runs every 5 minutes via cron

# cron's default PATH usually lacks /usr/sbin, where the asterisk binary lives
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

LOG_FILE="/var/log/asterisk/trunk_monitor.log"
DB_USER="root"
DB_PASS="your_mysql_password"
DB_NAME="asterisk"

log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}

# Print a peer's qualify status (OK, UNREACHABLE, LAGGED, UNKNOWN)
peer_status() {
    asterisk -rx "sip show peer $1" 2>/dev/null | awk '/Status/ {print $3; exit}'
}

# Set the active flag ('Y' or 'N') for one carrier_id
set_carrier_active() {
    mysql -u "$DB_USER" -p"$DB_PASS" "$DB_NAME" -e \
        "UPDATE vicidial_carrier_log SET active='$2' WHERE carrier_id='$1';" 2>/dev/null
}

# Each entry is peer_name:carrier_id
for pair in trunk1:1 trunk2:2; do
    trunk=${pair%%:*}
    carrier_id=${pair#*:}
    status=$(peer_status "$trunk")
    log_message "${trunk} Status: ${status:-UNKNOWN}"

    if [ "$status" = "OK" ]; then
        set_carrier_active "$carrier_id" Y
        log_message "${trunk} marked as ACTIVE"
    else
        set_carrier_active "$carrier_id" N
        log_message "${trunk} marked as INACTIVE - FAILOVER ENGAGED"
    fi
done
SCRIPT

chmod +x /usr/local/bin/check_trunk_health.sh

Schedule Health Checks via Cron

crontab -e

Add this line:

*/5 * * * * /usr/local/bin/check_trunk_health.sh > /dev/null 2>&1

This runs health checks every 5 minutes.
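It helps to keep the two detection paths apart. Asterisk's own qualify checks (qualifyfreq=60 in Step 2) decide what the peer status and DEVICESTATE report, while the database flag only changes when this cron job runs. As rough arithmetic, ignoring the few seconds a qualify request takes to time out, the database can trail a real outage by up to one qualify interval plus one cron interval. The helper name below is hypothetical.

```shell
# max_flag_delay: rough worst case, in seconds, between a trunk going down and
# the cron job recording it. $1 = qualifyfreq, $2 = cron interval (seconds).
max_flag_delay() {
    echo $(( $1 + $2 ))
}

# qualifyfreq=60 with a */5 cron job: max_flag_delay 60 300 -> 360 (6 minutes)
```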

Step 6: ViciDial Web Interface Configuration

Configure Carriers via Admin Panel

  1. Log into ViciDial at https://your-vicidial-server/vicidial/admin.php

  2. Navigate to Admin → Carriers (or Admin → Carrier Management)

  3. For each carrier, set:

    • Carrier Name: Match your dialplan (e.g., "trunk1", "trunk2")
    • Failover Carrier: Select the backup carrier (if your ViciDial version provides this field)
    • Active: Yes
    • Concurrent Call Limit: 100 or your trunk capacity (if your ViciDial version provides this field)
  4. Go to Campaigns and select your campaign

  5. In Carrier Selection, set to use both carriers (usually "1,2" or select multi-check)

Monitor Real-Time Trunk Status

In the ViciDial interface, check Reports → Carrier Report to see:

Step 7: Load Balancing Strategy Selection

Strategy 1: Priority-Based Failover (Recommended for Most Setups)

Primary trunk handles all calls; secondary only activates on failure:

[outbound-priority-failover]
exten => _1NXXNXXXXXX,1,NoOp(Priority Failover: Primary then Secondary)
exten => _1NXXNXXXXXX,n,Dial(SIP/${EXTEN}@trunk1,,gM(vicidial-record^${UNIQUEID}))
; Only a trunk-level failure moves the call to trunk2; BUSY/NOANSWER end here
exten => _1NXXNXXXXXX,n,GotoIf($["${DIALSTATUS}" = "CHANUNAVAIL" | "${DIALSTATUS}" = "CONGESTION"]?secondary:end)
exten => _1NXXNXXXXXX,n(secondary),Dial(SIP/${EXTEN}@trunk2,,gM(vicidial-record^${UNIQUEID}))
exten => _1NXXNXXXXXX,n(end),Hangup()

Advantages: Simpler, works with most carrier billing models
Disadvantages: Doesn't distribute load

Strategy 2: Active-Active Round-Robin (Best for Load Distribution)

Alternates calls between trunks:

[outbound-active-active]
exten => _1NXXNXXXXXX,1,NoOp(Active-Active Load Balance)
exten => _1NXXNXXXXXX,n,Set(CALL_COUNT=${DB(vicidial/call_index)})
exten => _1NXXNXXXXXX,n,Set(CALL_COUNT=$[${IF(${ISNULL(${CALL_COUNT})}?0:${CALL_COUNT})} + 1])
exten => _1NXXNXXXXXX,n,Set(DB(vicidial/call_index)=${CALL_COUNT})

; Even counts start on trunk1, odd counts on trunk2; the other trunk is the fallback
exten => _1NXXNXXXXXX,n,Set(FIRST=${IF($[${CALL_COUNT} % 2 = 0]?trunk1:trunk2)})
exten => _1NXXNXXXXXX,n,Set(SECOND=${IF($["${FIRST}" = "trunk1"]?trunk2:trunk1)})

exten => _1NXXNXXXXXX,n,Dial(SIP/${EXTEN}@${FIRST})
exten => _1NXXNXXXXXX,n,GotoIf($["${DIALSTATUS}" = "CHANUNAVAIL" | "${DIALSTATUS}" = "CONGESTION"]?fallback:end)

exten => _1NXXNXXXXXX,n(fallback),Dial(SIP/${EXTEN}@${SECOND})
exten => _1NXXNXXXXXX,n,GotoIf($["${DIALSTATUS}" = "ANSWER"]?end)
exten => _1NXXNXXXXXX,n,NoOp(Both trunks failed: ${DIALSTATUS})
exten => _1NXXNXXXXXX,n(end),Hangup()

Advantages: Even distribution, maximizes trunk utilization
Disadvantages: Requires both trunks to be reliable; some carriers may not support simultaneous calls from same account

Strategy 3: Weighted Load Balancing (For Asymmetric Capacity)

If one trunk has higher capacity, send more calls to it:

[outbound-weighted-lb]
exten => _1NXXNXXXXXX,1,NoOp(Weighted Load: 70% Trunk1, 30% Trunk2)
exten => _1NXXNXXXXXX,n,Set(CALL_COUNT=${DB(vicidial/weighted_count)})
exten => _1NXXNXXXXXX,n,Set(CALL_COUNT=$[${IF(${ISNULL(${CALL_COUNT})}?0:${CALL_COUNT})} + 1])
exten => _1NXXNXXXXXX,n,Set(DB(vicidial/weighted_count)=${CALL_COUNT})

; 7 of every 10 counts (remainders 0-6) go to trunk1
exten => _1NXXNXXXXXX,n,GotoIf($[${CALL_COUNT} % 10 < 7]?try1:try2)

exten => _1NXXNXXXXXX,n(try1),Dial(SIP/${EXTEN}@trunk1)
exten => _1NXXNXXXXXX,n,GotoIf($["${DIALSTATUS}" = "CHANUNAVAIL" | "${DIALSTATUS}" = "CONGESTION"]?try2:end)

exten => _1NXXNXXXXXX,n(try2),Dial(SIP/${EXTEN}@trunk2)
exten => _1NXXNXXXXXX,n(end),Hangup()
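The 70/30 rule can be checked the same way. The sketch below (weighted_trunk and weighted_distribution are hypothetical names) takes the trunk1 weight out of 10 as a parameter, so a split such as 80/20 is a one-argument change. The pattern is bursty rather than interleaved: within each block of ten counts, remainders 1 through 6 go to trunk1 back to back, 7 through 9 go to trunk2, and remainder 0 goes to trunk1.

```shell
# weighted_trunk: $1 = call counter, $2 = trunk1 weight out of 10
weighted_trunk() {
    if [ $(( $1 % 10 )) -lt "$2" ]; then echo trunk1; else echo trunk2; fi
}

# weighted_distribution: tally calls 1..$1 for a trunk1 weight of $2
weighted_distribution() {
    n=1; t1=0; t2=0
    while [ "$n" -le "$1" ]; do
        if [ "$(weighted_trunk "$n" "$2")" = trunk1 ]; then t1=$((t1 + 1)); else t2=$((t2 + 1)); fi
        n=$((n + 1))
    done
    echo "trunk1=$t1 trunk2=$t2"
}

# weighted_distribution 100 7   -> trunk1=70 trunk2=30
```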

Step 8: Testing and Validation

Test Failover Manually

# Inspect trunk1 directly
asterisk -rx "sip show peer trunk1"

# Place a test call straight out of trunk1
asterisk -rx "channel originate SIP/trunk1/12125551234 extension 6000@demo"

# Check active channels
asterisk -rx "core show channels verbose"

# Simulate a trunk1 failure by dropping traffic to the carrier IP
iptables -I OUTPUT -d 203.0.113.10 -j DROP

# Wait for the next qualify cycle (up to qualifyfreq seconds), then check
asterisk -rx "sip show peers" | grep trunk1

# Expected: trunk1 now shows UNREACHABLE

# Send a call through the failover context; it should leave via trunk2
asterisk -rx "channel originate Local/12125551234@outbound-vicidial-lb extension 6000@demo"

# Restore trunk1
iptables -D OUTPUT -d 203.0.113.10 -j DROP
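A forgotten DROP rule would leave the primary carrier blocked in production, so it can be safer to wrap the test so the rule is always removed. The function below is a hypothetical helper, not a ViciDial or Asterisk tool; it assumes iptables manages your firewall (adapt it for firewalld or nftables) and lets you override the iptables command through an IPTABLES variable for a dry run. It does not handle Ctrl-C, so an interrupted run still needs the manual iptables -D command.

```shell
# with_trunk_blocked IP COMMAND [ARGS...]
# Insert a DROP rule for IP, run COMMAND, then delete the rule whether or not
# COMMAND succeeded. Returns COMMAND's exit status.
with_trunk_blocked() {
    ip=$1
    shift
    ${IPTABLES:-iptables} -I OUTPUT -d "$ip" -j DROP || return 1
    "$@" && rc=0 || rc=$?
    ${IPTABLES:-iptables} -D OUTPUT -d "$ip" -j DROP
    return "$rc"
}

# Example: block trunk1, wait past one qualify cycle, then place a test call
#   with_trunk_blocked 203.0.113.10 sh -c '
#       sleep 90
#       asterisk -rx "sip show peer trunk1" | grep Status
#       asterisk -rx "channel originate Local/12125551234@outbound-vicidial-lb extension 6000@demo"'
```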

Monitor Real-Time Call Routing

Watch Asterisk logs during a test call:

tail -f /var/log/asterisk/messages | grep -i "dial\|trunk1\|trunk2"

During an outbound call, look for the verbose "Executing" lines for your NoOp() and Dial() steps, followed by a "Called" line for each trunk the call is sent to. If failover triggers, the Dial() to trunk1 ends with a DIALSTATUS such as CHANUNAVAIL or CONGESTION and is followed by a second Dial() to trunk2.

If no verbose lines reach the messages file, your logger.conf may not route verbose output there; watch the live console with asterisk -rvvv instead.

Database Verification

Check the carrier log for call routing:

SELECT DATE_FORMAT(call_date, '%Y-%m-%d %H:%i:%S') as call_time, 
       carrier_id, 
       call_duration, 
       status
FROM vicidial_carrier_log 
WHERE call_date > DATE_SUB(NOW(), INTERVAL 1 HOUR)
ORDER BY call_date DESC LIMIT 20;

Step 9: Performance Tuning and Optimization

Adjust Asterisk RTP Settings

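To make sure Asterisk has enough RTP ports under high load, check /etc/asterisk/rtp.conf (DTMF mode is a per-peer setting in sip-vicidial.conf, as in Step 2, not an rtp.conf option):

[general]
rtpstart=10000
rtpend=20000
rtcpinterval=5000

Reload the RTP module so the new range takes effect:

asterisk -rx "module reload res_rtp_asterisk.so"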
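As a rough sizing check, each RTP stream takes an even port for RTP and the next odd port for RTCP, so a port range holds about half as many simultaneous streams as it has ports. A connected ViciDial call can use more than one stream (for example the customer leg and the agent's leg), so compare the result with your peak number of call legs rather than calls. The helper name is hypothetical.

```shell
# rtp_stream_capacity: approximate number of simultaneous RTP streams
# for a port range. $1 = rtpstart, $2 = rtpend (values from rtp.conf).
rtp_stream_capacity() {
    echo $(( ($2 - $1) / 2 ))
}

# rtp_stream_capacity 10000 20000   -> 5000
```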

Optimize Dialplan Performance

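Cache trunk status in the Asterisk database (AstDB) and read it with the DB() function to reduce MySQL queries from the dialplan. Call this subroutine with GoSub(set-trunk-vars,s,1):

[set-trunk-vars]
exten => s,1,NoOp(Initialize trunk variables)
exten => s,n,Set(TRUNK1_STATUS=${DB(trunk_status/trunk1)})
exten => s,n,Set(TRUNK2_STATUS=${DB(trunk_status/trunk2)})
exten => s,n,Return()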
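Nothing else in this tutorial writes the trunk_status keys that [set-trunk-vars] reads. One option, sketched below under that assumption, is to have the health-check script push each peer's status into AstDB with the Asterisk CLI command database put. publish_trunk_status is a hypothetical helper, and the ASTERISK_BIN override exists only so the function can be tried without a running Asterisk.

```shell
# publish_trunk_status TRUNK STATUS
# Store STATUS (e.g. OK or UNREACHABLE) under AstDB key trunk_status/TRUNK.
# An empty status is stored as UNKNOWN so the CLI command stays well-formed.
publish_trunk_status() {
    ${ASTERISK_BIN:-asterisk} -rx "database put trunk_status $1 ${2:-UNKNOWN}"
}

# Example, e.g. inside the cron health check:
#   publish_trunk_status trunk1 "$(asterisk -rx 'sip show peer trunk1' | awk '/Status/ {print $3; exit}')"
```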

Monitor Asterisk Memory and CPU

# Check Asterisk process resource usage
ps aux | grep asterisk | grep -v grep

# Expected output shows reasonable CPU and MEM usage
# Example: asterisk  2890  1.2 15.4 ...

If CPU exceeds 80%, increase Asterisk priority or reduce dialplan complexity.

Troubleshooting

Symptoms: Calls Failing on Primary Trunk But Secondary Seems OK

Diagnosis:

# Check if failover is actually triggering
asterisk -rx "sip show peers" | grep -E "trunk1|trunk2"

# Look for specific error messages
grep -i "busy\|unavailable\|congestion" /var/log/asterisk/messages | tail -20

# Check MySQL database for carrier status
mysql -u root -p asterisk -e "SELECT carrier_id, active FROM vicidial_carrier_log LIMIT 5;"

Solutions:

  1. Verify SIP connectivity — Retest from Step 1
  2. Check firewall rules — Ensure UDP 5060 is open to trunk IPs
  3. Inspect carrier logs — Request CDR from your provider
  4. Adjust qualify parameters — Reduce qualifyfreq to 30 seconds if detection lag is critical

Symptoms: Both Trunks Show OK but Calls Not Routing

Diagnosis:

# Verify dialplan is loaded
asterisk -rx "dialplan show outbound-vicidial-lb" | head -5

# Reload, then check the log for dialplan parsing errors
asterisk -rx "dialplan reload"
grep -iE "error|warning" /var/log/asterisk/messages | tail -10

# Test the extension manually by sending a call through the context
asterisk -rx "channel originate Local/12125551234@outbound-vicidial-lb extension 6000@demo"
asterisk -rx "core show channels verbose"

Solutions:

  1. Reload dialplan: asterisk -rx "dialplan reload"
  2. Verify context name: Ensure the campaign references the correct context
  3. Check extension matching: Confirm the dialed number, including any campaign dial prefix, matches your extension pattern (_1NXXNXXXXXX only matches 11-digit numbers that start with 1)
  4. Review web interface logs: Check /vicidial/admin.php for errors

Symptoms: High Latency or One-Way Audio

Diagnosis:

# Count UDP sockets bound in the default RTP range (10000-20000)
ss -uan | awk 'NR > 1 { n = split($4, a, ":"); p = a[n] + 0; if (p >= 10000 && p <= 20000) c++ } END { print c + 0 }'
# Should be less than your rtpend - rtpstart (10,000 in default config)

# Test NAT/codec issues
asterisk -rx "sip show peer trunk1" | grep -E "Codec|NAT"

Solutions:

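  1. Add codec restrictions in sip-vicidial.conf (disallow=all must come before the allow lines, or it cancels them):
    disallow=all
    allow=ulaw
    
  2. Check NAT and firewall settings: one-way audio usually means RTP is blocked or sent to the wrong address, so open the RTP port range to your carriers and, if the server is behind NAT, set externaddr and localnet in the [general] section of sip.conf
  3. Adjust timert1 upward (to 1000ms) for carriers with high latency, so SIP requests are not retransmitted prematurely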

Symptoms: Carrier Health Monitoring Script Not Running

Diagnosis:

# Check cron execution (Debian/Ubuntu log to /var/log/syslog, CentOS/RHEL to /var/log/cron)
grep check_trunk_health /var/log/syslog /var/log/cron 2>/dev/null | tail -5

# Run script manually to test
/usr/local/bin/check_trunk_health.sh

# Check for errors
tail -n 10 /var/log/asterisk/trunk_monitor.log

Solutions:

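  1. Check database credentials: Ensure DB_PASS in the script is correct; mysql errors are discarded (2>/dev/null), so a wrong password fails silently
  2. Verify file permissions: chmod 755 /usr/local/bin/check_trunk_health.sh
  3. Check the cron service: systemctl restart cron (the service is named crond on CentOS/RHEL)
  4. Check PATH: cron runs jobs with a minimal PATH that often omits /usr/sbin, where the asterisk binary is usually installed; set PATH at the top of the script or call /usr/sbin/asterisk directly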

Symptoms: Calls Alternating Between Trunks Unpredictably

Diagnosis:

# Check trunk status oscillation
for i in {1..10}; do
  asterisk -rx "sip show peer trunk1" | grep Status
  sleep 1
done

# Check Asterisk qualify counters
asterisk -rx "sip show peer trunk1" | grep -E "Qualify|Keepalive"
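Watching lines scroll by makes it easy to miss a single flip. The hypothetical helper below counts how many times the status changes in a sampled sequence. It compares the first word of each line, so feed it just the status word, as in the usage comment. A nonzero count over a short window while the carrier reports no outage points toward qualify timing rather than a dead trunk.

```shell
# count_status_changes: read one status per line on stdin and print how
# many times consecutive values differ. Blank lines are ignored.
count_status_changes() {
    awk 'NF {
             if (seen && $1 != prev) changes++
             prev = $1
             seen = 1
         }
         END { print changes + 0 }'
}

# Usage: sample trunk1 once per second for 30 seconds
#   for i in $(seq 30); do
#       asterisk -rx "sip show peer trunk1" | awk '/Status/ {print $3; exit}'
#       sleep 1
#   done | count_status_changes
```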

Solutions:

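  1. Increase qualifyfreq: Change from 60 to 120+ seconds to reduce false-positive failures
  2. Loosen the qualify threshold: Set qualify to an explicit response limit in milliseconds (for example qualify=3000) so slow replies are not reported as LAGGED; in chan_sip, qualifygap is a [general] setting (milliseconds between groups of peers being qualified), not a count of consecutive failures
  3. Contact carrier: Ask if they're sending periodic disconnects for load balancing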

Summary

You now have a production-grade SIP trunk failover and load balancing system for ViciDial with:

Multi-trunk redundancy — Automatic failover to secondary trunks
Real-time health monitoring — Asterisk qualify pings every 60 seconds
Multiple load balancing strategies — Priority-based, round-robin, and weighted options
Database-driven configuration — Persistent carrier management across restarts
Monitoring and alerting — Health check script runs every 5 minutes
Testing procedures — Manual failover testing included
Troubleshooting guides — Common issues and resolutions documented

Key takeaways for production deployment:

  1. Always test failover in non-peak hours before full rollout
  2. Monitor trunk status for 48 hours after deployment to catch edge cases
  3. Keep carrier contact information accessible for emergencies
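  4. Don't rely on the 5-minute health check for fast failover: Asterisk's qualify marks a dead trunk unreachable within about one qualifyfreq interval (60 seconds here), and the cron job can add up to 5 more minutes before the database reflects it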
  5. Document your specific trunk IPs, usernames, and failover carrier IDs in a secure location
  6. Review logs daily during the first week, then weekly thereafter

Your ViciDial system is now protected against single-trunk failures, ensuring your call center operates reliably even when one carrier experiences outages.
