MariaDB Multi-Source Replication for VoIP Call Centers
3 Masters to 1 Read-Only Replica with Database Renaming
Table of Contents
- Introduction
- Architecture Overview
- Prerequisites
- Master-Side Configuration
- Replica-Side Configuration
- Multi-Source Filter Configuration
- Initial Data Load
- Starting Replication
- Health Monitoring
- Cron Setup
- Firewall and Security
- Parallel Replication Tuning
- Handling Schema Differences
- Error Skipping Strategy
- Troubleshooting
- Maintenance Operations
- Performance Considerations
Introduction
VoIP call centers running ViciDial or similar Asterisk-based dialers present a unique database challenge: each production server maintains its own asterisk database containing call detail records (CDRs), agent performance logs, recording metadata, and campaign statistics. These databases are under constant heavy write pressure from live call traffic -- thousands of INSERT and UPDATE operations per minute during peak hours.
Running analytical queries directly on production is dangerous. A slow report query can lock MyISAM tables, block agent screen updates, and cause call handling delays. Multiply that risk across three or more production servers and you have a recipe for outages.
Multi-source replication solves this cleanly. By replicating all production databases into a single read-only replica, you get:
- Centralized reporting -- One database server to query for cross-site analytics, agent comparisons, and unified dashboards (Grafana, Metabase, custom PHP reports).
- Zero production impact -- Reports never touch production. The replica absorbs all read traffic.
- Disaster recovery data -- The replica holds a near-real-time copy of all critical tables. Not a backup, but a valuable secondary copy.
- Operational isolation -- Report users get restricted accounts on the replica with query timeouts, so a runaway JOIN cannot harm anything.
MariaDB's multi-source replication (available since 10.0) is purpose-built for this fan-in topology. Each master connection is independently configured, monitored, and recoverable. Combined with replicate_rewrite_db, you can map every master's identically-named asterisk database to a unique name on the replica -- no schema changes needed on production.
This tutorial walks through the complete setup as deployed in a production VoIP environment: three ViciDial servers (running MariaDB 10.6 through 10.11) replicating to a Debian 12 replica running MariaDB 10.11. Every configuration file, monitoring script, and troubleshooting procedure comes from a working deployment handling tens of millions of replicated events.
Architecture Overview
┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ MASTER 1 (master1) │ │ MASTER 2 (master2) │ │ MASTER 3 (master3) │
│ MariaDB 10.11 │ │ MariaDB 10.11 │ │ MariaDB 10.6 │
│ server_id = 1 │ │ server_id = 2 │ │ server_id = 3 │
│ DB: asterisk │ │ DB: asterisk │ │ DB: asterisk │
│ binlog: MIXED │ │ binlog: MIXED │ │ binlog: MIXED │
│ │ │ │ │ │
│ ViciDial + Asterisk │ │ ViciDial + Asterisk │ │ ViciDial + Asterisk │
│ (Live call traffic) │ │ (Live call traffic) │ │ (Live call traffic) │
└──────────┬────────────┘ └──────────┬────────────┘ └──────────┬────────────┘
│ :3306 │ :3306 │ :3306
│ binlog stream │ binlog stream │ binlog stream
│ (compressed) │ (compressed) │ (compressed)
│ │ │
└────────────┐ │ ┌───────────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────┐
│ REPLICA SERVER │
│ MariaDB 10.11.14 │
│ Debian 12 │
│ server_id = 10 │
│ read_only = ON │
│ │
│ Connection "master1": │
│ asterisk ──rewrite──▶ master1_asterisk │
│ IO thread + 4 parallel SQL threads │
│ │
│ Connection "master2": │
│ asterisk ──rewrite──▶ master2_asterisk │
│ IO thread + 4 parallel SQL threads │
│ │
│ Connection "master3": │
│ asterisk ──rewrite──▶ master3_asterisk │
│ IO thread + 4 parallel SQL threads │
│ │
│ No binary log (terminal slave) │
│ log_slave_updates = OFF │
│ │
│ Consumers: │
│ • Grafana dashboards │
│ • Report scripts (PHP/Python) │
│ • Read-only user accounts │
│ │
└──────────────────────────────────────────────────────┘
Key design decisions:
| Decision | Rationale |
|---|---|
| Position-based replication (not GTID) | Simpler to set up; GTID adds complexity with no benefit for a terminal read-only slave |
log_slave_updates = OFF |
Replica never needs to act as a master; disabling saves disk I/O |
skip_slave_start = ON |
Prevents broken connections from auto-starting after a crash; manual start allows inspection |
slave_compressed_protocol = ON |
Reduces bandwidth 60-70% for WAN replication between data centers |
| 4 parallel SQL threads, optimistic mode | Maximizes apply throughput on multi-core hardware without strict ordering overhead |
slave_skip_errors for transient errors |
ViciDial's MEMORY tables and non-transactional MyISAM writes produce harmless conflicts |
| 12 tables filtered per connection | Transient/MEMORY tables that reset on restart; replicating them wastes resources and causes errors |
Prerequisites
Software Versions
- Replica: MariaDB 10.5+ (10.11 recommended for best parallel replication support)
- Masters: MariaDB 10.0+ (multi-source is a MariaDB feature; MySQL does not support it natively)
- OS: Any Linux distribution. This guide uses Debian 12 for the replica.
Version compatibility note: MariaDB handles cross-version replication well when the replica runs the same or newer version than the masters. Replicating from 10.6 masters to a 10.11 replica works without issues. Avoid replicating from a newer master to an older replica.
Hardware Sizing (Replica)
For a call center replicating ~100 million events across 3 connections:
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Storage | 2x production data size | 3x production data size (SSD) |
| Network | 100 Mbps | 1 Gbps |
The replica needs enough RAM for MyISAM key buffers (ViciDial uses MyISAM extensively) and InnoDB buffer pool (for any InnoDB tables). Budget 2 GB for key_buffer_size and 1 GB for innodb_buffer_pool_size as starting points.
Network Requirements
- The replica must be able to reach each master on port 3306 (or your configured MariaDB port).
- Each master's firewall must allow inbound connections from the replica IP on port 3306.
- Latency under 100ms is ideal; multi-source replication works fine across continents but higher latency increases
Seconds_Behind_Master.
Master-Side Configuration
Each master server needs three things: binary logging enabled, a unique server ID, and a dedicated replication user.
1. Enable Binary Logging
Add or verify these settings in your master's MariaDB configuration (typically /etc/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf):
[mysqld]
server-id = 1 # Unique per master: 1, 2, 3, etc.
log_bin = mysql-bin
binlog_format = MIXED # MIXED is the ViciDial default
expire_logs_days = 7 # Keep binlogs for 7 days
log_bin_compress = ON # Compress binlog events (10.2.3+)
log_bin_compress_min_len = 256 # Only compress events > 256 bytes
Important: Every server in the replication topology (masters and replica) must have a globally unique server_id. A common scheme:
| Server | server_id |
|---|---|
| Master 1 | 1 |
| Master 2 | 2 |
| Master 3 | 3 |
| Replica | 10 |
Restart MariaDB after changing the configuration:
systemctl restart mariadb
Verify binary logging is active:
SHOW MASTER STATUS\G
You should see output like:
File: mysql-bin.000042
Position: 12345678
Binlog_Do_DB:
Binlog_Ignore_DB:
2. Create the Replication User
On each master, create a dedicated user with only REPLICATION SLAVE privilege. Lock it down to the replica's IP address:
CREATE USER 'repl_replica'@'REPLICA_IP'
IDENTIFIED BY 'REPL_PASSWORD';
GRANT REPLICATION SLAVE ON *.*
TO 'repl_replica'@'REPLICA_IP';
FLUSH PRIVILEGES;
Security notes:
- Use a strong, unique password (32+ characters recommended).
- The
REPLICATION SLAVEprivilege is the minimum required. Do not grantSELECT,SUPER, or any other privilege. - Restrict the user to the replica's exact IP address, not
%. - Use the same username on all masters for consistency, but each master has its own independent user entry.
Verify the grant:
SHOW GRANTS FOR 'repl_replica'@'REPLICA_IP';
Expected output:
GRANT REPLICATION SLAVE ON *.* TO `repl_replica`@`REPLICA_IP`
3. Record Binary Log Position
Before setting up replication, note each master's current binary log file and position. You will need these values during the CHANGE MASTER step.
SHOW MASTER STATUS\G
Record the File and Position values for each master. If you are doing an initial data load with mysqldump --master-data, the dump file will contain the position automatically.
Replica-Side Configuration
The replica configuration is split across two files for clarity:
50-server.cnf-- Server identity, resource tuning, replication engine settings60-multisource.cnf-- Per-connection rewrite rules and table filters
Main Server Configuration: 50-server.cnf
Place this file at /etc/mysql/mariadb.conf.d/50-server.cnf (Debian/Ubuntu) or include it from your main my.cnf.
[mysqld]
# ─── Identity ───
server-id = 10
pid-file = /run/mysqld/mysqld.pid
basedir = /usr
# ─── Network ───
bind-address = 0.0.0.0
skip-name-resolve
port = 3306
max_connections = 200
connect_timeout = 60
wait_timeout = 28800
interactive_timeout = 28800
max_allowed_packet = 16M
# ─── Security ───
read_only = ON
# ─── Character Set ───
character_set_server = utf8mb4
collation_server = utf8mb4_unicode_520_ci
# ─── MyISAM Tuning ───
# ViciDial uses MyISAM extensively. Size key_buffer to ~25% of RAM.
default_storage_engine = MyISAM
key_buffer_size = 2G
myisam_sort_buffer_size = 128M
myisam_recover_options = BACKUP,FORCE
myisam_use_mmap = 1
# ─── InnoDB (for any InnoDB tables that replicate) ───
innodb_buffer_pool_size = 1G
innodb_file_per_table = ON
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 8M
# ─── Memory & Cache ───
table_open_cache = 4096
table_definition_cache = 16384
sort_buffer_size = 4M
read_buffer_size = 4M
read_rnd_buffer_size = 16M
join_buffer_size = 1M
thread_cache_size = 50
max_heap_table_size = 512M
tmp_table_size = 256M
open_files_limit = 24576
query_cache_size = 0
query_cache_type = 0
# ─── Replication — Relay Logs ───
relay_log = /var/lib/mysql/relay-bin
relay_log_recovery = ON
relay_log_purge = ON
max_relay_log_size = 256M
# ─── Replication — Parallel Apply ───
slave_parallel_threads = 4
slave_parallel_mode = optimistic
# ─── Replication — Network ───
slave_net_timeout = 60
slave_compressed_protocol = ON
# ─── Replication — Safety ───
skip_slave_start = ON
slave_skip_errors = 1032,1062,1146,1269,1396,1932
# ─── Replication — GTID ───
gtid_strict_mode = OFF
# ─── Replication — No binlog on replica (terminal slave) ───
log_slave_updates = OFF
# ─── Binary Log Expiry ───
expire_logs_days = 3
# ─── Slow Query Log ───
slow_query_log = 1
slow_query_log_file = /var/lib/mysql/replica-slow.log
long_query_time = 5
log_slow_verbosity = query_plan,explain
# ─── Query Timeout Safety Net ───
max_statement_time = 300
# ─── Error Log ───
log_error = /var/log/mysql/error.log
log_warnings = 2
Key settings explained:
| Setting | Value | Purpose |
|---|---|---|
read_only = ON |
Prevents accidental writes from non-SUPER users | Safety net for a reporting replica |
skip_slave_start = ON |
Replication does not auto-start on server boot | Allows you to inspect state before starting |
slave_compressed_protocol = ON |
Compresses the replication stream | Reduces WAN bandwidth by 60-70% |
slave_parallel_threads = 4 |
4 SQL applier threads per connection | Increases apply throughput |
slave_parallel_mode = optimistic |
Threads apply in parallel, roll back on conflict | Best throughput for MyISAM workloads |
log_slave_updates = OFF |
Do not write applied events to local binlog | Saves disk I/O; this is a terminal slave |
relay_log_recovery = ON |
Reprocess relay logs after crash | Prevents corruption from incomplete relay logs |
max_statement_time = 300 |
Kill queries running longer than 5 minutes | Protects against runaway report queries |
key_buffer_size = 2G |
MyISAM index cache | Must be large enough for ViciDial's index-heavy tables |
query_cache_size = 0 |
Disabled | Query cache causes mutex contention under replication writes |
slave_skip_errors |
1032,1062,1146,... | Skip transient errors from MEMORY tables and MyISAM quirks |
Multi-Source Filter Configuration
Filter Configuration: 60-multisource.cnf
Place this file at /etc/mysql/mariadb.conf.d/60-multisource.cnf:
[mysqld]
# ─── Multi-Source Replication Filters ───
# Map each master's 'asterisk' database to a unique local database
master1.replicate_rewrite_db = asterisk->master1_asterisk
master2.replicate_rewrite_db = asterisk->master2_asterisk
master3.replicate_rewrite_db = asterisk->master3_asterisk
# Skip transient/MEMORY tables for all connections
master1.replicate_ignore_table = asterisk.vicidial_live_agents
master1.replicate_ignore_table = asterisk.live_sip_channels
master1.replicate_ignore_table = asterisk.live_channels
master1.replicate_ignore_table = asterisk.vicidial_auto_calls
master1.replicate_ignore_table = asterisk.server_updater
master1.replicate_ignore_table = asterisk.web_client_sessions
master1.replicate_ignore_table = asterisk.vicidial_hopper
master1.replicate_ignore_table = asterisk.vicidial_campaign_server_status
master1.replicate_ignore_table = asterisk.parked_channels
master1.replicate_ignore_table = asterisk.vicidial_manager
master1.replicate_ignore_table = asterisk.cid_channels_recent
master1.replicate_ignore_table = asterisk.sip_packets
master2.replicate_ignore_table = asterisk.vicidial_live_agents
master2.replicate_ignore_table = asterisk.live_sip_channels
master2.replicate_ignore_table = asterisk.live_channels
master2.replicate_ignore_table = asterisk.vicidial_auto_calls
master2.replicate_ignore_table = asterisk.server_updater
master2.replicate_ignore_table = asterisk.web_client_sessions
master2.replicate_ignore_table = asterisk.vicidial_hopper
master2.replicate_ignore_table = asterisk.vicidial_campaign_server_status
master2.replicate_ignore_table = asterisk.parked_channels
master2.replicate_ignore_table = asterisk.vicidial_manager
master2.replicate_ignore_table = asterisk.cid_channels_recent
master2.replicate_ignore_table = asterisk.sip_packets
master3.replicate_ignore_table = asterisk.vicidial_live_agents
master3.replicate_ignore_table = asterisk.live_sip_channels
master3.replicate_ignore_table = asterisk.live_channels
master3.replicate_ignore_table = asterisk.vicidial_auto_calls
master3.replicate_ignore_table = asterisk.server_updater
master3.replicate_ignore_table = asterisk.web_client_sessions
master3.replicate_ignore_table = asterisk.vicidial_hopper
master3.replicate_ignore_table = asterisk.vicidial_campaign_server_status
master3.replicate_ignore_table = asterisk.parked_channels
master3.replicate_ignore_table = asterisk.vicidial_manager
master3.replicate_ignore_table = asterisk.cid_channels_recent
master3.replicate_ignore_table = asterisk.sip_packets
Understanding the Filters
replicate_rewrite_db is the cornerstone of this setup. Every master has a database called asterisk. Without rewriting, all three masters would try to write into the same asterisk database on the replica, causing collisions. The rewrite rule transparently maps:
master1: asterisk → master1_asterisk
master2: asterisk → master2_asterisk
master3: asterisk → master3_asterisk
The connection name prefix (e.g., master1.) scopes each filter to its specific replication connection. This is the MariaDB multi-source syntax -- each connection has its own independent filter set.
replicate_ignore_table filters out tables that should never be replicated:
| Table | Type | Why Skip |
|---|---|---|
vicidial_live_agents |
MEMORY | Volatile agent state; rebuilt on Asterisk restart |
live_sip_channels |
MEMORY | Active SIP channel tracking; changes thousands of times per minute |
live_channels |
MEMORY | Active Asterisk channel list |
vicidial_auto_calls |
MEMORY | Predictive dialer's active call queue |
server_updater |
MyISAM | Server heartbeat timestamps; master-specific |
web_client_sessions |
MEMORY | Agent web UI sessions |
vicidial_hopper |
MyISAM | Dialer lead hopper; extremely high churn |
vicidial_campaign_server_status |
MEMORY | Campaign runtime stats |
parked_channels |
MEMORY | Parked call tracking |
vicidial_manager |
MEMORY | Asterisk AMI command queue |
cid_channels_recent |
MEMORY | Recent caller ID cache |
sip_packets |
MyISAM | SIP capture data; extremely high volume (can generate GB/day) |
Critical detail: The
replicate_ignore_tablefilter uses the original database name (asterisk), not the rewritten name. MariaDB applies ignore filters before the rewrite. If you usedmaster1_asterisk.vicidial_live_agents, the filter would not match.
Why These Tables Cause Replication Errors
MEMORY tables are emptied when the master's MariaDB restarts. The master then logs DELETE or TRUNCATE statements in the binlog. If the replica has already lost those rows (or never had them), you get:
- Error 1032 (
Can't find record in ...) -- Trying to delete or update a row that does not exist on the replica. - Error 1062 (
Duplicate entry ...) -- Trying to insert a row that already exists.
The vicidial_hopper table sees thousands of INSERT/DELETE cycles per minute as the dialer loads and processes leads. Replicating it is pointless for reporting and a constant source of errors.
Initial Data Load
Before starting replication, you need to seed the replica with a snapshot of each master's data. This establishes the baseline from which replication will continue.
Step 1: Dump Each Master
On each master, run mysqldump with --master-data to embed the binary log position in the dump file:
# On Master 1
mysqldump \
--master-data=2 \
--single-transaction \
--routines \
--triggers \
--databases asterisk \
--ignore-table=asterisk.vicidial_live_agents \
--ignore-table=asterisk.live_sip_channels \
--ignore-table=asterisk.live_channels \
--ignore-table=asterisk.vicidial_auto_calls \
--ignore-table=asterisk.server_updater \
--ignore-table=asterisk.web_client_sessions \
--ignore-table=asterisk.vicidial_hopper \
--ignore-table=asterisk.vicidial_campaign_server_status \
--ignore-table=asterisk.parked_channels \
--ignore-table=asterisk.vicidial_manager \
--ignore-table=asterisk.cid_channels_recent \
--ignore-table=asterisk.sip_packets \
| gzip > /tmp/master1_dump.sql.gz
Important flags:
--master-data=2-- Writes theCHANGE MASTER TOstatement as a comment in the dump. You will extract the log file and position from this.--single-transaction-- For InnoDB tables, provides a consistent snapshot without locking. MyISAM tables will still lock briefly.--ignore-table-- Skip the same MEMORY and transient tables we filter in replication. No point dumping them.
For large databases (50GB+): Consider using
mariadb-backup(Mariabackup) instead ofmysqldump. It performs a physical backup that is significantly faster for large datasets. The binary log position is recorded inxtrabackup_binlog_info.
Step 2: Transfer Dumps to Replica
scp -P 9322 /tmp/master1_dump.sql.gz REPLICA_IP:/tmp/
Repeat for each master.
Step 3: Create Target Databases and Import
On the replica, create the renamed databases and import each dump:
# Create the target databases
mariadb -e "CREATE DATABASE IF NOT EXISTS master1_asterisk;"
mariadb -e "CREATE DATABASE IF NOT EXISTS master2_asterisk;"
mariadb -e "CREATE DATABASE IF NOT EXISTS master3_asterisk;"
# Import Master 1 (rewrite database name during import)
zcat /tmp/master1_dump.sql.gz \
| sed 's/`asterisk`/`master1_asterisk`/g' \
| mariadb
# Import Master 2
zcat /tmp/master2_dump.sql.gz \
| sed 's/`asterisk`/`master2_asterisk`/g' \
| mariadb
# Import Master 3
zcat /tmp/master3_dump.sql.gz \
| sed 's/`asterisk`/`master3_asterisk`/g' \
| mariadb
Step 4: Extract Binary Log Positions
Find the binary log coordinates embedded in each dump file:
zcat /tmp/master1_dump.sql.gz | head -30 | grep "CHANGE MASTER"
You will see a line like:
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=12345678;
Record the MASTER_LOG_FILE and MASTER_LOG_POS values for each master. You will need them in the next section.
Starting Replication
Step 1: Configure Each Connection
MariaDB multi-source replication uses named connections. Each CHANGE MASTER statement includes a connection name that matches the prefixes in your 60-multisource.cnf filter configuration.
-- Connection: master1
CHANGE MASTER 'master1' TO
MASTER_HOST = 'MASTER1_IP',
MASTER_PORT = 3306,
MASTER_USER = 'repl_replica',
MASTER_PASSWORD = 'REPL_PASSWORD',
MASTER_LOG_FILE = 'mysql-bin.000042',
MASTER_LOG_POS = 12345678,
MASTER_CONNECT_RETRY = 10,
MASTER_HEARTBEAT_PERIOD = 30;
-- Connection: master2
CHANGE MASTER 'master2' TO
MASTER_HOST = 'MASTER2_IP',
MASTER_PORT = 3306,
MASTER_USER = 'repl_replica',
MASTER_PASSWORD = 'REPL_PASSWORD',
MASTER_LOG_FILE = 'mysql-bin.000055',
MASTER_LOG_POS = 87654321,
MASTER_CONNECT_RETRY = 10,
MASTER_HEARTBEAT_PERIOD = 30;
-- Connection: master3
CHANGE MASTER 'master3' TO
MASTER_HOST = 'MASTER3_IP',
MASTER_PORT = 3306,
MASTER_USER = 'repl_replica',
MASTER_PASSWORD = 'REPL_PASSWORD',
MASTER_LOG_FILE = 'mysql-bin.000371',
MASTER_LOG_POS = 44556677,
MASTER_CONNECT_RETRY = 10,
MASTER_HEARTBEAT_PERIOD = 30;
Parameters explained:
| Parameter | Value | Purpose |
|---|---|---|
MASTER_CONNECT_RETRY |
10 | Retry connection every 10 seconds if it drops |
MASTER_HEARTBEAT_PERIOD |
30 | Master sends a heartbeat every 30 seconds if idle, preventing timeout disconnects |
MASTER_LOG_FILE / MASTER_LOG_POS |
From dump | The exact position to start reading binlog events |
Step 2: Start All Connections
-- Start all replication connections at once
START ALL SLAVES;
Or start them individually for more control:
START SLAVE 'master1';
START SLAVE 'master2';
START SLAVE 'master3';
Step 3: Verify Replication Status
Check the status of all connections:
SHOW ALL SLAVES STATUS\G
For each connection, verify:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
Last_IO_Error:
Last_SQL_Error:
Replicate_Rewrite_DB: asterisk->master1_asterisk
The key indicators:
| Field | Expected | Problem If |
|---|---|---|
Slave_IO_Running |
Yes |
Connecting = network/auth issue; No = fatal error |
Slave_SQL_Running |
Yes |
No = SQL apply error (check Last_SQL_Error) |
Seconds_Behind_Master |
0 |
High number = replica is lagging |
Last_IO_Error |
(empty) | Any text here indicates a connection problem |
Last_SQL_Error |
(empty) | Any text here indicates a data apply problem |
You can also check a single connection:
SHOW SLAVE 'master1' STATUS\G
Health Monitoring
Replication Health Check Script
Save this script as /usr/local/bin/check_replication.sh:
#!/bin/bash
# ─────────────────────────────────────────────────────────
# MariaDB Multi-Source Replication Health Monitor
# Checks all named replication connections and logs alerts
# ─────────────────────────────────────────────────────────
# Configuration
CONNECTIONS=("master1" "master2" "master3")
ALERT_LAG=60 # seconds behind master before alerting
LOG_FILE="/var/log/replication_alerts.log"
ERRORS=0
for CONN in "${CONNECTIONS[@]}"; do
STATUS=$(mariadb -e "SHOW SLAVE '$CONN' STATUS\G" 2>/dev/null)
if [ -z "$STATUS" ]; then
continue # Connection not configured yet
fi
IO=$(echo "$STATUS" | grep 'Slave_IO_Running:' | awk '{print $2}')
SQL=$(echo "$STATUS" | grep 'Slave_SQL_Running:' | awk '{print $2}')
LAG=$(echo "$STATUS" | grep 'Seconds_Behind_Master:' | awk '{print $2}')
ERR=$(echo "$STATUS" | grep 'Last_SQL_Error:' | sed 's/.*Last_SQL_Error: //')
# Check IO and SQL threads
if [ "$IO" != "Yes" ] || [ "$SQL" != "Yes" ]; then
echo "$(date): REPLICATION BROKEN [$CONN] — IO=$IO SQL=$SQL Error: $ERR" \
>> "$LOG_FILE"
ERRORS=1
fi
# Check replication lag
if [ "$LAG" != "NULL" ] && [ "$LAG" -gt "$ALERT_LAG" ] 2>/dev/null; then
echo "$(date): REPLICATION LAG [$CONN] — ${LAG}s behind" \
>> "$LOG_FILE"
fi
done
if [ $ERRORS -eq 1 ]; then
exit 1
fi
exit 0
Make it executable:
chmod +x /usr/local/bin/check_replication.sh
What the Script Checks
For each named connection:
- IO Thread -- Is the connection to the master alive and receiving binlog events?
- SQL Thread -- Is the replica successfully applying events to local tables?
- Replication Lag -- Is the replica falling behind? Alerts if lag exceeds 60 seconds.
- SQL Errors -- Captures and logs the specific error text for broken connections.
The script exits with code 1 if any connection is broken, making it compatible with monitoring systems (Nagios, Zabbix, Prometheus node_exporter textfile collector) that check exit codes.
Extending the Monitor
For production deployments, consider adding:
# Email alert (requires mailutils or msmtp)
if [ $ERRORS -eq 1 ]; then
tail -5 "$LOG_FILE" | mail -s "REPLICATION ALERT: $(hostname)" [email protected]
fi
# Prometheus textfile metric
echo "mariadb_replication_ok{connection=\"$CONN\"} $([ \"$IO\" = \"Yes\" ] && [ \"$SQL\" = \"Yes\" ] && echo 1 || echo 0)" \
> /var/lib/prometheus/node-exporter/replication.prom
Cron Setup
Schedule the health monitor to run every 5 minutes:
crontab -e
Add the following line:
*/5 * * * * /usr/local/bin/check_replication.sh
Log Rotation
Create /etc/logrotate.d/replication-alerts:
/var/log/replication_alerts.log {
weekly
rotate 8
compress
missingok
notifempty
create 640 root root
}
This keeps 8 weeks of alert history before rotating out old logs.
Firewall and Security
Replica-Side Firewall (UFW)
The replica should only accept MariaDB connections from the master servers and your monitoring/reporting infrastructure:
# Allow SSH (adjust port as needed)
ufw allow 9322/tcp comment 'SSH'
# Allow MariaDB from each master
ufw allow from MASTER1_IP to any port 3306 comment 'Master1 MariaDB'
ufw allow from MASTER2_IP to any port 3306 comment 'Master2 MariaDB'
ufw allow from MASTER3_IP to any port 3306 comment 'Master3 MariaDB'
# Allow MariaDB from reporting tools (Grafana, report servers, etc.)
ufw allow from GRAFANA_IP to any port 3306 comment 'Grafana'
# Enable the firewall
ufw enable
Verify:
ufw status numbered
Expected output:
To Action From
-- ------ ----
[ 1] 9322/tcp ALLOW IN Anywhere
[ 2] 3306 ALLOW IN MASTER1_IP
[ 3] 3306 ALLOW IN MASTER2_IP
[ 4] 3306 ALLOW IN MASTER3_IP
[ 5] 3306 ALLOW IN GRAFANA_IP
Master-Side Firewall
On each master, allow the replica to connect on port 3306:
# On each master server
ufw allow from REPLICA_IP to any port 3306 comment 'Replication replica'
Database User Security
Create separate users on the replica for different consumers:
-- Read-only reporting user with query timeout
CREATE USER 'report_user'@'ALLOWED_IP'
IDENTIFIED BY 'REPORT_PASSWORD';
GRANT SELECT ON master1_asterisk.* TO 'report_user'@'ALLOWED_IP';
GRANT SELECT ON master2_asterisk.* TO 'report_user'@'ALLOWED_IP';
GRANT SELECT ON master3_asterisk.* TO 'report_user'@'ALLOWED_IP';
-- Apply a per-user query timeout (prevents runaway queries)
ALTER USER 'report_user'@'ALLOWED_IP'
WITH MAX_STATEMENT_TIME 5;
FLUSH PRIVILEGES;
User separation guidelines:
| User | Access | Timeout | Purpose |
|---|---|---|---|
repl_replica (on masters) |
REPLICATION SLAVE only |
N/A | Replication connection |
report_user (on replica) |
SELECT on all replica DBs |
5 seconds | Automated report scripts |
grafana_ro (on replica) |
SELECT on all replica DBs |
5 seconds | Grafana dashboards |
root (on replica) |
Full local access | 300 seconds | Administration only |
Parallel Replication Tuning
MariaDB's parallel replication applies binlog events using multiple threads, significantly improving throughput for write-heavy workloads like ViciDial.
Configuration
The two key settings in 50-server.cnf:
slave_parallel_threads = 4
slave_parallel_mode = optimistic
Parallel Mode Options
| Mode | Behavior | Best For |
|---|---|---|
conservative |
Only parallelizes transactions from the same group commit | InnoDB-heavy workloads; safest |
optimistic |
Applies transactions in parallel, retries on conflict | Mixed MyISAM/InnoDB; best throughput |
aggressive |
Maximizes parallelism; higher conflict rate | Benchmarking; not recommended for production |
minimal |
Very limited parallelism | Debugging |
Why optimistic for ViciDial:
ViciDial's workload is predominantly MyISAM with non-transactional writes. The optimistic mode allows the replica to apply events in parallel and only serializes when it detects a conflict (e.g., two events modifying the same row). With ViciDial's write patterns -- different agents, different calls, different campaigns -- conflicts are rare, and throughput is excellent.
Thread Count Guidelines
| Replica CPU Cores | Suggested slave_parallel_threads |
Notes |
|---|---|---|
| 2-4 | 2 | Conservative; leave cores for queries |
| 4-8 | 4 | Good balance for 3 master connections |
| 8-16 | 8 | Higher throughput if lag is a concern |
| 16+ | 12 | Diminishing returns above 12 |
The slave_parallel_threads setting is global, not per-connection. With 3 connections and 4 threads, each connection's events are processed by the shared thread pool. In practice, this provides excellent throughput because the connections rarely all have heavy traffic simultaneously.
Monitoring Parallel Performance
Check how many transactions have been retried due to conflicts:
SHOW ALL SLAVES STATUS\G
Look at the Retried_transactions field. A small number of retries is normal and expected in optimistic mode. If retries are growing rapidly (thousands per hour), consider switching to conservative mode or investigating the conflicting queries.
Handling Schema Differences
Cross-Version Replication
When masters run different MariaDB versions (e.g., 10.6 and 10.11), keep these points in mind:
The replica should run the same or newer version than the oldest master. MariaDB maintains backward compatibility in the binlog format, so a 10.11 replica can apply events from a 10.6 master.
New system table columns introduced in newer MariaDB versions may appear in DDL statements. The
slave_skip_errorssetting handles most of these gracefully.Storage engine differences are generally not an issue. ViciDial uses MyISAM on all servers. If a master has some InnoDB tables, they replicate as InnoDB on the replica.
Table Structure Differences Between Masters
In a multi-server call center, each server may be running a slightly different ViciDial version or have custom tables. This is fine:
- Tables that exist on Master 1 but not Master 2 simply appear in
master1_asteriskand not inmaster2_asterisk. - If a table has extra columns on one master, the replica's copy of that table will match whichever master it replicates from.
- DDL statements (
ALTER TABLE,CREATE TABLE) replicate and apply to the correct renamed database.
Adding Custom Tables
If you create custom tables on a master (e.g., for a CRM integration), they will automatically replicate to the corresponding database on the replica, as long as they are in the asterisk database and not in the ignore list.
Error Skipping Strategy
Configured Skip Errors
The slave_skip_errors setting in 50-server.cnf tells MariaDB to silently skip specific error codes instead of stopping replication:
slave_skip_errors = 1032,1062,1146,1269,1396,1932
| Error Code | Message | Common Cause |
|---|---|---|
| 1032 | Can't find record |
DELETE/UPDATE for a row that does not exist on the replica (MEMORY table cleared, or initial sync mismatch) |
| 1062 | Duplicate entry |
INSERT for a row that already exists (MEMORY table rebuild, or initial sync overlap) |
| 1146 | Table doesn't exist |
Statement references a table not present on the replica (custom table on one master, or filtered table) |
| 1269 | Can't revoke all privileges |
User management statement that does not apply cleanly |
| 1396 | Operation CREATE USER failed |
User already exists or doesn't exist for CREATE/DROP |
| 1932 | Table ... doesn't exist in engine |
Table file missing on disk but exists in metadata (crash recovery artifact) |
When Is Error Skipping Safe?
Error skipping is safe when:
- The affected tables are transient (MEMORY tables, hopper, live tracking tables).
- The replica is read-only and used only for reporting.
- You are not using the replica for failover or promotion to master.
Error skipping is NOT safe when:
- The replica might be promoted to master.
- You need bit-for-bit consistency between master and replica.
- The errors involve critical data tables (CDRs, agent logs, recording metadata).
Manual Error Skipping
If replication stops on an error that is not in the skip list, investigate first:
-- Check the error
SHOW SLAVE 'master1' STATUS\G
If the error is safe to skip (e.g., a one-time DDL issue), skip it manually:
-- Stop the specific connection
STOP SLAVE 'master1';
-- Skip one event
SET GLOBAL master1.sql_slave_skip_counter = 1;
-- Restart
START SLAVE 'master1';
Warning: Never blindly skip errors on critical data tables. If
vicidial_closer_logorvicidial_log(your CDR tables) are throwing errors, investigate the root cause. Skipping CDR events means lost call records.
Troubleshooting
Problem: Slave_IO_Running: Connecting
The IO thread is trying to connect to the master but cannot establish a connection.
Diagnostic steps:
# 1. Test network connectivity from replica to master
mariadb -h MASTER1_IP -P 3306 -u repl_replica -p -e "SELECT 1;"
# 2. Check if MariaDB is listening on the master
ssh MASTER1 "ss -tlnp | grep 3306"
# 3. Check master's bind-address (must be 0.0.0.0, not 127.0.0.1)
ssh MASTER1 "grep bind-address /etc/my.cnf /etc/mysql/mariadb.conf.d/*.cnf"
# 4. Check firewall on the master
ssh MASTER1 "ufw status | grep 3306"
# 5. Verify the replication user exists and has the right password
ssh MASTER1 "mariadb -e \"SELECT User, Host FROM mysql.user WHERE User = 'repl_replica';\""
Common fixes:
| Cause | Fix |
|---|---|
Master's bind-address = 127.0.0.1 |
Change to 0.0.0.0 and restart MariaDB |
| Firewall blocking port 3306 | ufw allow from REPLICA_IP to any port 3306 |
| Wrong password | Recreate the user with the correct password |
| Master not running | systemctl start mariadb on the master |
| DNS resolution failure | Use IP addresses, not hostnames; ensure skip-name-resolve is set |
Problem: Slave_SQL_Running: No
The SQL thread has stopped due to an error applying a binlog event.
SHOW SLAVE 'master1' STATUS\G
-- Look at: Last_SQL_Error, Last_SQL_Errno
Common errors and fixes:
| Error | Typical Cause | Fix |
|---|---|---|
| 1032: Can't find record | Row deleted on master, never existed on replica | Add 1032 to slave_skip_errors or skip manually |
| 1062: Duplicate entry | Row inserted on master, already exists on replica | Add 1062 to slave_skip_errors or skip manually |
| 1146: Table doesn't exist | Table exists on master but not on replica | Create the table on the replica, or add 1146 to skip list |
| 1007: Database already exists | CREATE DATABASE replayed |
Skip the event |
| Schema mismatch | Column added on master but not on replica | Usually self-resolves (the ALTER will replicate); skip if stuck |
Problem: High Seconds_Behind_Master
The replica is lagging behind one or more masters.
Diagnostic steps:
-- Check which connection is lagging
SHOW ALL SLAVES STATUS\G
-- Check for long-running queries on the replica that might be blocking
SHOW PROCESSLIST;
-- Check for lock waits
SHOW ENGINE INNODB STATUS\G
Common causes and fixes:
| Cause | Fix |
|---|---|
| Slow disk I/O on replica | Move to SSD; check iostat -x 1 |
| Report query holding table locks | Add max_statement_time to limit query duration |
| Single-threaded apply bottleneck | Increase slave_parallel_threads |
| Master had a burst of large operations | Wait; the replica will catch up |
| Network saturation | Check bandwidth; slave_compressed_protocol helps |
Problem: Relay Log Corruption After Crash
If the replica crashes (power loss, OOM kill), relay logs may be incomplete.
-- relay_log_recovery = ON handles this automatically
-- But if it doesn't, manually reset:
STOP SLAVE 'master1';
-- Reset and re-fetch from the last known good position
RESET SLAVE 'master1';
-- Reconfigure with the current master position
CHANGE MASTER 'master1' TO
MASTER_HOST = 'MASTER1_IP',
MASTER_USER = 'repl_replica',
MASTER_PASSWORD = 'REPL_PASSWORD',
MASTER_LOG_FILE = 'mysql-bin.000042',
MASTER_LOG_POS = 12345678;
START SLAVE 'master1';
To find the correct position after a crash, check the relay log info:
-- The Exec_Master_Log_Pos from before the crash
-- is stored in the relay-log.info file or the mysql.slave_master_info table
SELECT * FROM mysql.slave_master_info WHERE Connection_name = 'master1'\G
Problem: Binlog File No Longer Available on Master
If the replica was offline too long, the master may have purged the binlog file that the replica needs.
Last_IO_Error: Got fatal error 1236 from master ... binlog truncated
Fix: You need to re-initialize that connection with a fresh dump:
- Stop the affected connection:
STOP SLAVE 'master1'; - Take a new dump from the master with
--master-data=2 - Import it into the replica database
CHANGE MASTERwith the new position from the dumpSTART SLAVE 'master1';
Prevention: Set expire_logs_days = 7 (or higher) on masters so binlogs are retained long enough to survive replica downtime.
Maintenance Operations
Stopping and Starting Replication
-- Stop all connections (e.g., before replica maintenance)
STOP ALL SLAVES;
-- Start all connections
START ALL SLAVES;
-- Stop/start a single connection
STOP SLAVE 'master1';
START SLAVE 'master1';
After Replica Server Reboot
Because skip_slave_start = ON, replication does not auto-start. After rebooting the replica:
# 1. Check that MariaDB is running
systemctl status mariadb
# 2. Inspect replication state before starting
mariadb -e "SHOW ALL SLAVES STATUS\G" | grep -E 'Connection_name|IO_Running|SQL_Running|Seconds_Behind|Last.*Error'
# 3. Start all connections
mariadb -e "START ALL SLAVES;"
# 4. Verify
mariadb -e "SHOW ALL SLAVES STATUS\G" | grep -E 'Connection_name|IO_Running|SQL_Running|Seconds_Behind'
Adding a New Master
To add a fourth master to an existing setup:
- Add filter rules to
60-multisource.cnf:
master4.replicate_rewrite_db = asterisk->master4_asterisk
master4.replicate_ignore_table = asterisk.vicidial_live_agents
master4.replicate_ignore_table = asterisk.live_sip_channels
# ... (same 12 tables as other connections)
- Restart MariaDB on the replica:
systemctl restart mariadb
- Create the database, import the initial dump, and configure the connection:
CREATE DATABASE IF NOT EXISTS master4_asterisk;
CHANGE MASTER 'master4' TO
MASTER_HOST = 'MASTER4_IP',
MASTER_USER = 'repl_replica',
MASTER_PASSWORD = 'REPL_PASSWORD',
MASTER_LOG_FILE = 'mysql-bin.000001',
MASTER_LOG_POS = 12345;
START SLAVE 'master4';
- Update the monitoring script's
CONNECTIONSarray.
Removing a Master
STOP SLAVE 'master3';
RESET SLAVE 'master3' ALL;
The ALL keyword removes the connection configuration entirely. The replicated data in master3_asterisk remains on disk until you explicitly drop it.
Performance Considerations
Disk I/O
The replica receives write traffic from all masters simultaneously. With 3 masters, expect 3x the write I/O of a single server. Key recommendations:
- Use SSDs. Spinning disks cannot keep up with multi-source write loads from busy call centers.
- Separate relay logs and data files onto different disks if possible (use a symlink or mount point for
/var/lib/mysql/relay-bin*). - Monitor I/O wait:
iostat -x 1should show%utilbelow 80% sustained.
Memory
With ViciDial's MyISAM tables, the key_buffer_size is critical. The key buffer caches index blocks for all MyISAM tables across all databases:
-- Check key buffer utilization
SHOW STATUS LIKE 'Key%';
Calculate hit ratio:
Key cache hit ratio = 1 - (Key_reads / Key_read_requests) * 100
Target 99%+ hit ratio. If Key_reads is growing relative to Key_read_requests, increase key_buffer_size.
Network Bandwidth
With slave_compressed_protocol = ON, a busy ViciDial server generating 10 MB/s of binlog data compresses to roughly 3-4 MB/s over the wire. For 3 masters, budget ~10-12 MB/s sustained bandwidth to the replica.
Table Open Cache
Each replicated database has its own set of tables. With 3 databases of ~400 tables each, you need table_open_cache = 4096 or higher to avoid table open/close churn:
-- Check if the cache is too small
SHOW STATUS LIKE 'Opened_tables';
If Opened_tables is growing rapidly, increase table_open_cache.
Query Performance on Replica
Report queries on the replica compete with replication writes for disk I/O and locks. Best practices:
- Set
max_statement_time = 300globally to kill runaway queries. - Create per-user timeouts (
ALTER USER ... WITH MAX_STATEMENT_TIME 5) for automated report scripts. - Schedule heavy reports during off-peak hours (early morning, weekends).
- Add indexes on the replica for your specific report queries -- these do not need to exist on the masters.
Replica-only indexes: You can safely
CREATE INDEXon the replica for report optimization. Sinceread_only = ONprevents application writes andlog_slave_updates = OFFprevents the DDL from propagating, the index stays local. However, be aware that if the master runsALTER TABLEon the same table, the replica's version of the table will be rebuilt, potentially dropping your custom index. Monitor this and recreate indexes as needed.
Appendix A: Complete File Reference
Files on Each Master
| File | Contents |
|---|---|
/etc/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf |
server_id, log_bin, binlog_format, expire_logs_days |
Files on Replica
| File | Contents |
|---|---|
/etc/mysql/mariadb.conf.d/50-server.cnf |
Server config, replication engine, resource tuning |
/etc/mysql/mariadb.conf.d/60-multisource.cnf |
Per-connection rewrite and ignore rules |
/usr/local/bin/check_replication.sh |
Health monitoring script |
/var/log/replication_alerts.log |
Alert log written by health monitor |
/var/lib/mysql/replica-slow.log |
Slow query log |
/var/log/mysql/error.log |
MariaDB error log |
/etc/logrotate.d/replication-alerts |
Log rotation config |
User Accounts
| User | Server | Privileges | Purpose |
|---|---|---|---|
repl_replica |
Each master | REPLICATION SLAVE |
Binlog streaming |
report_user |
Replica | SELECT on all *_asterisk DBs |
Report scripts |
grafana_ro |
Replica | SELECT on all *_asterisk DBs |
Grafana dashboards |
Appendix B: Quick Reference Commands
-- Show all replication connections
SHOW ALL SLAVES STATUS\G
-- Show one connection
SHOW SLAVE 'master1' STATUS\G
-- Start/stop all
START ALL SLAVES;
STOP ALL SLAVES;
-- Start/stop one
START SLAVE 'master1';
STOP SLAVE 'master1';
-- Skip one event on a connection
STOP SLAVE 'master1';
SET GLOBAL master1.sql_slave_skip_counter = 1;
START SLAVE 'master1';
-- Remove a connection entirely
STOP SLAVE 'master3';
RESET SLAVE 'master3' ALL;
-- Check relay log space usage
SHOW ALL SLAVES STATUS\G
-- Look at: Relay_Log_Space (bytes per connection)
-- Check replication filters in effect
SHOW ALL SLAVES STATUS\G
-- Look at: Replicate_Rewrite_DB, Replicate_Ignore_Table
-- Quick health one-liner
mariadb -e "SHOW ALL SLAVES STATUS\G" \
| grep -E 'Connection_name|IO_Running|SQL_Running|Seconds_Behind|Last.*Error:'
Appendix C: Capacity Planning
Estimating Replica Storage
Replica disk = (Master1 data size) + (Master2 data size) + (Master3 data size)
+ relay log space (~1 GB per connection)
+ temp tables for queries
+ 20% headroom
For a typical ViciDial deployment with 2 years of CDR history:
| Component | Size |
|---|---|
master1_asterisk |
15-40 GB |
master2_asterisk |
15-40 GB |
master3_asterisk |
15-40 GB |
| Relay logs | ~3 GB |
| Slow query log | < 1 GB |
| System + overhead | ~5 GB |
| Total | 60-130 GB |
When to Archive
ViciDial's vicidial_log, vicidial_closer_log, and recording_log tables grow indefinitely. On the replica, consider:
- Partitioning by month (replica-only) for fast
DROP PARTITIONcleanup. - Archive cron that moves records older than N days to archive tables.
- Setting up data retention on the masters (ViciDial's built-in archiving moves old records to
_archivetables).
This tutorial documents a production multi-source replication deployment for VoIP call centers. Adapt connection names, IP addresses, and table lists to your specific environment. Always test configuration changes on a staging system before applying to production.