
MariaDB Multi-Source Replication for VoIP Call Centers



3 Masters to 1 Read-Only Replica with Database Renaming


Table of Contents

  1. Introduction
  2. Architecture Overview
  3. Prerequisites
  4. Master-Side Configuration
  5. Replica-Side Configuration
  6. Multi-Source Filter Configuration
  7. Initial Data Load
  8. Starting Replication
  9. Health Monitoring
  10. Cron Setup
  11. Firewall and Security
  12. Parallel Replication Tuning
  13. Handling Schema Differences
  14. Error Skipping Strategy
  15. Troubleshooting
  16. Maintenance Operations
  17. Performance Considerations

Introduction

VoIP call centers running ViciDial or similar Asterisk-based dialers present a unique database challenge: each production server maintains its own asterisk database containing call detail records (CDRs), agent performance logs, recording metadata, and campaign statistics. These databases are under constant heavy write pressure from live call traffic -- thousands of INSERT and UPDATE operations per minute during peak hours.

Running analytical queries directly on production is dangerous. A slow report query can lock MyISAM tables, block agent screen updates, and cause call handling delays. Multiply that risk across three or more production servers and you have a recipe for outages.

Multi-source replication solves this cleanly. By replicating all production databases into a single read-only replica, you get:

  - Complete isolation of analytical load from live call handling
  - A single server to point dashboards and report scripts at
  - Freedom to run long, heavy queries without risking a production table lock

MariaDB's multi-source replication (available since 10.0) is purpose-built for this fan-in topology. Each master connection is independently configured, monitored, and recoverable. Combined with replicate_rewrite_db, you can map every master's identically-named asterisk database to a unique name on the replica -- no schema changes needed on production.

This tutorial walks through the complete setup as deployed in a production VoIP environment: three ViciDial servers (running MariaDB 10.6 through 10.11) replicating to a Debian 12 replica running MariaDB 10.11. Every configuration file, monitoring script, and troubleshooting procedure comes from a working deployment handling tens of millions of replicated events.


Architecture Overview

 ┌───────────────────────┐    ┌───────────────────────┐    ┌───────────────────────┐
 │   MASTER 1 (master1)  │    │   MASTER 2 (master2)  │    │   MASTER 3 (master3)  │
 │   MariaDB 10.11       │    │   MariaDB 10.11       │    │   MariaDB 10.6        │
 │   server_id = 1       │    │   server_id = 2       │    │   server_id = 3       │
 │   DB: asterisk        │    │   DB: asterisk        │    │   DB: asterisk        │
 │   binlog: MIXED       │    │   binlog: MIXED       │    │   binlog: MIXED       │
 │                       │    │                       │    │                       │
 │  ViciDial + Asterisk  │    │  ViciDial + Asterisk  │    │  ViciDial + Asterisk  │
 │  (Live call traffic)  │    │  (Live call traffic)  │    │  (Live call traffic)  │
 └──────────┬────────────┘    └──────────┬────────────┘    └──────────┬────────────┘
            │ :3306                      │ :3306                      │ :3306
            │ binlog stream              │ binlog stream              │ binlog stream
            │ (compressed)               │ (compressed)               │ (compressed)
            │                            │                            │
            └────────────┐               │               ┌───────────┘
                         │               │               │
                         ▼               ▼               ▼
              ┌──────────────────────────────────────────────────┐
              │              REPLICA SERVER                      │
              │              MariaDB 10.11.14                    │
              │              Debian 12                           │
              │              server_id = 10                      │
              │              read_only = ON                      │
              │                                                  │
              │   Connection "master1":                          │
              │     asterisk  ──rewrite──▶  master1_asterisk     │
              │     IO thread + 4 parallel SQL threads           │
              │                                                  │
              │   Connection "master2":                          │
              │     asterisk  ──rewrite──▶  master2_asterisk     │
              │     IO thread + 4 parallel SQL threads           │
              │                                                  │
              │   Connection "master3":                          │
              │     asterisk  ──rewrite──▶  master3_asterisk     │
              │     IO thread + 4 parallel SQL threads           │
              │                                                  │
              │   No binary log (terminal slave)                 │
              │   log_slave_updates = OFF                        │
              │                                                  │
              │   Consumers:                                     │
              │     • Grafana dashboards                         │
              │     • Report scripts (PHP/Python)                │
              │     • Read-only user accounts                    │
              │                                                  │
              └──────────────────────────────────────────────────┘

Key design decisions:

| Decision | Rationale |
|----------|-----------|
| Position-based replication (not GTID) | Simpler to set up; GTID adds complexity with no benefit for a terminal read-only slave |
| log_slave_updates = OFF | Replica never needs to act as a master; disabling saves disk I/O |
| skip_slave_start = ON | Prevents broken connections from auto-starting after a crash; manual start allows inspection |
| slave_compressed_protocol = ON | Reduces bandwidth 60-70% for WAN replication between data centers |
| 4 parallel SQL threads, optimistic mode | Maximizes apply throughput on multi-core hardware without strict ordering overhead |
| slave_skip_errors for transient errors | ViciDial's MEMORY tables and non-transactional MyISAM writes produce harmless conflicts |
| 12 tables filtered per connection | Transient/MEMORY tables that reset on restart; replicating them wastes resources and causes errors |

Prerequisites

Software Versions

| Role | Version |
|------|---------|
| Masters 1-2 | MariaDB 10.11 |
| Master 3 | MariaDB 10.6 |
| Replica | MariaDB 10.11.14 on Debian 12 |

Version compatibility note: MariaDB handles cross-version replication well when the replica runs the same or newer version than the masters. Replicating from 10.6 masters to a 10.11 replica works without issues. Avoid replicating from a newer master to an older replica.
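
The "same or newer" rule can be checked mechanically with sort -V. A small sketch (the patch-level version strings below are illustrative): the replica is treated as compatible only if the master's version sorts first or the two are equal.

```shell
# Succeeds (exit 0) when the replica is the same version as the master
# or newer -- i.e. the master's version sorts first under sort -V.
replica_can_replicate_from() {
    master_ver="$1"; replica_ver="$2"
    oldest=$(printf '%s\n%s\n' "$master_ver" "$replica_ver" | sort -V | head -n1)
    [ "$oldest" = "$master_ver" ]
}

replica_can_replicate_from "10.6.22" "10.11.14" && echo "10.6 master -> 10.11 replica: supported"
replica_can_replicate_from "11.4.2"  "10.11.14" || echo "11.4 master -> 10.11 replica: avoid"
```

Collect the real version strings with mariadb -e "SELECT VERSION();" on each server before wiring anything up.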

Hardware Sizing (Replica)

For a call center replicating ~100 million events across 3 connections:

| Resource | Minimum | Recommended |
|----------|---------|-------------|
| CPU | 4 cores | 8 cores |
| RAM | 8 GB | 16 GB |
| Storage | 2x production data size | 3x production data size (SSD) |
| Network | 100 Mbps | 1 Gbps |

The replica needs enough RAM for MyISAM key buffers (ViciDial uses MyISAM extensively) and InnoDB buffer pool (for any InnoDB tables). Budget 2 GB for key_buffer_size and 1 GB for innodb_buffer_pool_size as starting points.

Network Requirements

  - TCP port 3306 must be reachable from the replica to every master (see Firewall and Security below)
  - A stable link between sites; slave_compressed_protocol cuts the binlog stream's bandwidth needs by roughly 60-70%
  - Latency is not critical -- replication is asynchronous and tolerates WAN round-trips

Master-Side Configuration

Each master server needs three things: binary logging enabled, a unique server ID, and a dedicated replication user.

1. Enable Binary Logging

Add or verify these settings in your master's MariaDB configuration (typically /etc/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf):

[mysqld]
server-id              = 1                  # Unique per master: 1, 2, 3, etc.
log_bin                = mysql-bin
binlog_format          = MIXED              # MIXED is the ViciDial default
expire_logs_days       = 7                  # Keep binlogs for 7 days
log_bin_compress       = ON                 # Compress binlog events (10.2.3+)
log_bin_compress_min_len = 256              # Only compress events > 256 bytes

Important: Every server in the replication topology (masters and replica) must have a globally unique server_id. A common scheme:

| Server | server_id |
|--------|-----------|
| Master 1 | 1 |
| Master 2 | 2 |
| Master 3 | 3 |
| Replica | 10 |
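
A duplicate server_id makes replication misbehave in confusing ways, so it is worth verifying mechanically. A small sketch; the id list is what you would collect from each server with mariadb -e "SELECT @@server_id;":

```shell
# server_id values gathered from the three masters and the replica
ids="1 2 3 10"

# uniq -d prints only duplicated values; empty output means all unique
dupes=$(printf '%s\n' $ids | sort -n | uniq -d)
if [ -z "$dupes" ]; then
    echo "server_id values are unique"
else
    echo "DUPLICATE server_id found: $dupes" >&2
fi
```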

Restart MariaDB after changing the configuration:

systemctl restart mariadb

Verify binary logging is active:

SHOW MASTER STATUS\G

You should see output like:

File: mysql-bin.000042
Position: 12345678
Binlog_Do_DB:
Binlog_Ignore_DB:

2. Create the Replication User

On each master, create a dedicated user with only REPLICATION SLAVE privilege. Lock it down to the replica's IP address:

CREATE USER 'repl_replica'@'REPLICA_IP'
  IDENTIFIED BY 'REPL_PASSWORD';

GRANT REPLICATION SLAVE ON *.*
  TO 'repl_replica'@'REPLICA_IP';

FLUSH PRIVILEGES;

Security notes:

  - Grant only REPLICATION SLAVE; the replication user needs no other privilege
  - Restrict the account to the replica's IP address rather than a '%' wildcard host
  - Use a long, randomly generated password; the replica stores it on disk in its connection metadata

Verify the grant:

SHOW GRANTS FOR 'repl_replica'@'REPLICA_IP';

Expected output:

GRANT REPLICATION SLAVE ON *.* TO `repl_replica`@`REPLICA_IP`

3. Record Binary Log Position

Before setting up replication, note each master's current binary log file and position. You will need these values during the CHANGE MASTER step.

SHOW MASTER STATUS\G

Record the File and Position values for each master. If you are doing an initial data load with mysqldump --master-data, the dump file will contain the position automatically.


Replica-Side Configuration

The replica configuration is split across two files for clarity:

  - 50-server.cnf -- server identity, tuning, and global replication settings
  - 60-multisource.cnf -- the per-connection rewrite and ignore filters

Main Server Configuration: 50-server.cnf

Place this file at /etc/mysql/mariadb.conf.d/50-server.cnf (Debian/Ubuntu) or include it from your main my.cnf.

[mysqld]

# ─── Identity ───
server-id               = 10
pid-file                = /run/mysqld/mysqld.pid
basedir                 = /usr

# ─── Network ───
bind-address            = 0.0.0.0
skip-name-resolve
port                    = 3306
max_connections         = 200
connect_timeout         = 60
wait_timeout            = 28800
interactive_timeout     = 28800
max_allowed_packet      = 16M

# ─── Security ───
read_only               = ON

# ─── Character Set ───
character_set_server    = utf8mb4
collation_server        = utf8mb4_unicode_520_ci

# ─── MyISAM Tuning ───
# ViciDial uses MyISAM extensively. Size key_buffer to ~25% of RAM.
default_storage_engine  = MyISAM
key_buffer_size         = 2G
myisam_sort_buffer_size = 128M
myisam_recover_options  = BACKUP,FORCE
myisam_use_mmap         = 1

# ─── InnoDB (for any InnoDB tables that replicate) ───
innodb_buffer_pool_size = 1G
innodb_file_per_table   = ON
innodb_flush_method     = O_DIRECT
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size  = 8M

# ─── Memory & Cache ───
table_open_cache        = 4096
table_definition_cache  = 16384
sort_buffer_size        = 4M
read_buffer_size        = 4M
read_rnd_buffer_size    = 16M
join_buffer_size        = 1M
thread_cache_size       = 50
max_heap_table_size     = 512M
tmp_table_size          = 256M
open_files_limit        = 24576
query_cache_size        = 0
query_cache_type        = 0

# ─── Replication — Relay Logs ───
relay_log               = /var/lib/mysql/relay-bin
relay_log_recovery      = ON
relay_log_purge         = ON
max_relay_log_size      = 256M

# ─── Replication — Parallel Apply ───
slave_parallel_threads  = 4
slave_parallel_mode     = optimistic

# ─── Replication — Network ───
slave_net_timeout       = 60
slave_compressed_protocol = ON

# ─── Replication — Safety ───
skip_slave_start        = ON
slave_skip_errors       = 1032,1062,1146,1269,1396,1932

# ─── Replication — GTID ───
gtid_strict_mode        = OFF

# ─── Replication — No binlog on replica (terminal slave) ───
log_slave_updates       = OFF

# ─── Binary Log Expiry ───
expire_logs_days        = 3

# ─── Slow Query Log ───
slow_query_log          = 1
slow_query_log_file     = /var/lib/mysql/replica-slow.log
long_query_time         = 5
log_slow_verbosity      = query_plan,explain

# ─── Query Timeout Safety Net ───
max_statement_time      = 300

# ─── Error Log ───
log_error               = /var/log/mysql/error.log
log_warnings            = 2

Key settings explained:

| Setting | Value | Purpose |
|---------|-------|---------|
| read_only | ON | Prevents accidental writes from non-SUPER users; safety net for a reporting replica |
| skip_slave_start | ON | Replication does not auto-start on server boot; lets you inspect state before starting |
| slave_compressed_protocol | ON | Compresses the replication stream; reduces WAN bandwidth by 60-70% |
| slave_parallel_threads | 4 | Pool of parallel SQL applier threads (shared across connections); increases apply throughput |
| slave_parallel_mode | optimistic | Threads apply in parallel, roll back on conflict; best throughput for MyISAM workloads |
| log_slave_updates | OFF | Do not write applied events to the local binlog; saves disk I/O on a terminal slave |
| relay_log_recovery | ON | Reprocess relay logs after a crash; prevents corruption from incomplete relay logs |
| max_statement_time | 300 | Kill queries running longer than 5 minutes; protects against runaway report queries |
| key_buffer_size | 2G | MyISAM index cache; must be large enough for ViciDial's index-heavy tables |
| query_cache_size | 0 | Disabled; the query cache causes mutex contention under replication writes |
| slave_skip_errors | 1032,1062,1146,... | Skip transient errors from MEMORY tables and MyISAM quirks |

Multi-Source Filter Configuration

Filter Configuration: 60-multisource.cnf

Place this file at /etc/mysql/mariadb.conf.d/60-multisource.cnf:

[mysqld]
# ─── Multi-Source Replication Filters ───
# Map each master's 'asterisk' database to a unique local database

master1.replicate_rewrite_db = asterisk->master1_asterisk
master2.replicate_rewrite_db = asterisk->master2_asterisk
master3.replicate_rewrite_db = asterisk->master3_asterisk

# Skip transient/MEMORY tables for all connections
master1.replicate_ignore_table = asterisk.vicidial_live_agents
master1.replicate_ignore_table = asterisk.live_sip_channels
master1.replicate_ignore_table = asterisk.live_channels
master1.replicate_ignore_table = asterisk.vicidial_auto_calls
master1.replicate_ignore_table = asterisk.server_updater
master1.replicate_ignore_table = asterisk.web_client_sessions
master1.replicate_ignore_table = asterisk.vicidial_hopper
master1.replicate_ignore_table = asterisk.vicidial_campaign_server_status
master1.replicate_ignore_table = asterisk.parked_channels
master1.replicate_ignore_table = asterisk.vicidial_manager
master1.replicate_ignore_table = asterisk.cid_channels_recent
master1.replicate_ignore_table = asterisk.sip_packets

master2.replicate_ignore_table = asterisk.vicidial_live_agents
master2.replicate_ignore_table = asterisk.live_sip_channels
master2.replicate_ignore_table = asterisk.live_channels
master2.replicate_ignore_table = asterisk.vicidial_auto_calls
master2.replicate_ignore_table = asterisk.server_updater
master2.replicate_ignore_table = asterisk.web_client_sessions
master2.replicate_ignore_table = asterisk.vicidial_hopper
master2.replicate_ignore_table = asterisk.vicidial_campaign_server_status
master2.replicate_ignore_table = asterisk.parked_channels
master2.replicate_ignore_table = asterisk.vicidial_manager
master2.replicate_ignore_table = asterisk.cid_channels_recent
master2.replicate_ignore_table = asterisk.sip_packets

master3.replicate_ignore_table = asterisk.vicidial_live_agents
master3.replicate_ignore_table = asterisk.live_sip_channels
master3.replicate_ignore_table = asterisk.live_channels
master3.replicate_ignore_table = asterisk.vicidial_auto_calls
master3.replicate_ignore_table = asterisk.server_updater
master3.replicate_ignore_table = asterisk.web_client_sessions
master3.replicate_ignore_table = asterisk.vicidial_hopper
master3.replicate_ignore_table = asterisk.vicidial_campaign_server_status
master3.replicate_ignore_table = asterisk.parked_channels
master3.replicate_ignore_table = asterisk.vicidial_manager
master3.replicate_ignore_table = asterisk.cid_channels_recent
master3.replicate_ignore_table = asterisk.sip_packets

Understanding the Filters

replicate_rewrite_db is the cornerstone of this setup. Every master has a database called asterisk. Without rewriting, all three masters would try to write into the same asterisk database on the replica, causing collisions. The rewrite rule transparently maps:

master1: asterisk  →  master1_asterisk
master2: asterisk  →  master2_asterisk
master3: asterisk  →  master3_asterisk

The connection name prefix (e.g., master1.) scopes each filter to its specific replication connection. This is the MariaDB multi-source syntax -- each connection has its own independent filter set.

replicate_ignore_table filters out tables that should never be replicated:

| Table | Type | Why Skip |
|-------|------|----------|
| vicidial_live_agents | MEMORY | Volatile agent state; rebuilt on Asterisk restart |
| live_sip_channels | MEMORY | Active SIP channel tracking; changes thousands of times per minute |
| live_channels | MEMORY | Active Asterisk channel list |
| vicidial_auto_calls | MEMORY | Predictive dialer's active call queue |
| server_updater | MyISAM | Server heartbeat timestamps; master-specific |
| web_client_sessions | MEMORY | Agent web UI sessions |
| vicidial_hopper | MyISAM | Dialer lead hopper; extremely high churn |
| vicidial_campaign_server_status | MEMORY | Campaign runtime stats |
| parked_channels | MEMORY | Parked call tracking |
| vicidial_manager | MEMORY | Asterisk AMI command queue |
| cid_channels_recent | MEMORY | Recent caller ID cache |
| sip_packets | MyISAM | SIP capture data; extremely high volume (can generate GB/day) |

Critical detail: The replicate_ignore_table filter uses the original database name (asterisk), not the rewritten name. MariaDB applies ignore filters before the rewrite. If you used master1_asterisk.vicidial_live_agents, the filter would not match.
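
The 36 ignore lines in 60-multisource.cnf are identical apart from the connection prefix. Rather than hand-editing them, a short generator loop can emit the whole block (a sketch; adjust the connection names and table list to your deployment):

```shell
# Emit one replicate_ignore_table line per connection/table pair.
TABLES="vicidial_live_agents live_sip_channels live_channels vicidial_auto_calls
server_updater web_client_sessions vicidial_hopper vicidial_campaign_server_status
parked_channels vicidial_manager cid_channels_recent sip_packets"

for conn in master1 master2 master3; do
    for t in $TABLES; do
        echo "${conn}.replicate_ignore_table = asterisk.${t}"
    done
    echo   # blank line between connection groups
done
```

Redirect the output into /etc/mysql/mariadb.conf.d/60-multisource.cnf (alongside the rewrite rules) and restart MariaDB to apply.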

Why These Tables Cause Replication Errors

MEMORY tables are emptied when the master's MariaDB restarts. The master then logs DELETE or TRUNCATE statements in the binlog. If the replica has already lost those rows (or never had them), you get:

  - Error 1032 (Can't find record) when a DELETE or UPDATE targets a row the replica does not have
  - Error 1062 (Duplicate entry) when the master re-inserts rows the replica still holds

The vicidial_hopper table sees thousands of INSERT/DELETE cycles per minute as the dialer loads and processes leads. Replicating it is pointless for reporting and a constant source of errors.


Initial Data Load

Before starting replication, you need to seed the replica with a snapshot of each master's data. This establishes the baseline from which replication will continue.

Step 1: Dump Each Master

On each master, run mysqldump with --master-data to embed the binary log position in the dump file:

# On Master 1
mysqldump \
  --master-data=2 \
  --single-transaction \
  --routines \
  --triggers \
  --databases asterisk \
  --ignore-table=asterisk.vicidial_live_agents \
  --ignore-table=asterisk.live_sip_channels \
  --ignore-table=asterisk.live_channels \
  --ignore-table=asterisk.vicidial_auto_calls \
  --ignore-table=asterisk.server_updater \
  --ignore-table=asterisk.web_client_sessions \
  --ignore-table=asterisk.vicidial_hopper \
  --ignore-table=asterisk.vicidial_campaign_server_status \
  --ignore-table=asterisk.parked_channels \
  --ignore-table=asterisk.vicidial_manager \
  --ignore-table=asterisk.cid_channels_recent \
  --ignore-table=asterisk.sip_packets \
  | gzip > /tmp/master1_dump.sql.gz

Important flags:

For large databases (50GB+): Consider using mariadb-backup (Mariabackup) instead of mysqldump. It performs a physical backup that is significantly faster for large datasets. The binary log position is recorded in xtrabackup_binlog_info.

Step 2: Transfer Dumps to Replica

scp -P 9322 /tmp/master1_dump.sql.gz REPLICA_IP:/tmp/

Repeat for each master.

Step 3: Create Target Databases and Import

On the replica, create the renamed databases and import each dump:

# Create the target databases
mariadb -e "CREATE DATABASE IF NOT EXISTS master1_asterisk;"
mariadb -e "CREATE DATABASE IF NOT EXISTS master2_asterisk;"
mariadb -e "CREATE DATABASE IF NOT EXISTS master3_asterisk;"

# Import Master 1 (rewrite database name during import)
zcat /tmp/master1_dump.sql.gz \
  | sed 's/`asterisk`/`master1_asterisk`/g' \
  | mariadb

# Import Master 2
zcat /tmp/master2_dump.sql.gz \
  | sed 's/`asterisk`/`master2_asterisk`/g' \
  | mariadb

# Import Master 3
zcat /tmp/master3_dump.sql.gz \
  | sed 's/`asterisk`/`master3_asterisk`/g' \
  | mariadb

Step 4: Extract Binary Log Positions

Find the binary log coordinates embedded in each dump file:

zcat /tmp/master1_dump.sql.gz | head -30 | grep "CHANGE MASTER"

You will see a line like:

-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=12345678;

Record the MASTER_LOG_FILE and MASTER_LOG_POS values for each master. You will need them in the next section.
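
Turning that comment line into shell variables avoids copy/paste mistakes. A sketch using the sample line above:

```shell
# The header line as extracted from the dump in the previous step
line="-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=12345678;"

# Pull out the two coordinates with sed capture groups
file=$(printf '%s\n' "$line" | sed "s/.*MASTER_LOG_FILE='\([^']*\)'.*/\1/")
pos=$(printf '%s\n' "$line"  | sed "s/.*MASTER_LOG_POS=\([0-9]*\).*/\1/")

echo "MASTER_LOG_FILE = '$file', MASTER_LOG_POS = $pos"
# -> MASTER_LOG_FILE = 'mysql-bin.000042', MASTER_LOG_POS = 12345678
```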


Starting Replication

Step 1: Configure Each Connection

MariaDB multi-source replication uses named connections. Each CHANGE MASTER statement includes a connection name that matches the prefixes in your 60-multisource.cnf filter configuration.

-- Connection: master1
CHANGE MASTER 'master1' TO
  MASTER_HOST     = 'MASTER1_IP',
  MASTER_PORT     = 3306,
  MASTER_USER     = 'repl_replica',
  MASTER_PASSWORD = 'REPL_PASSWORD',
  MASTER_LOG_FILE = 'mysql-bin.000042',
  MASTER_LOG_POS  = 12345678,
  MASTER_CONNECT_RETRY = 10,
  MASTER_HEARTBEAT_PERIOD = 30;

-- Connection: master2
CHANGE MASTER 'master2' TO
  MASTER_HOST     = 'MASTER2_IP',
  MASTER_PORT     = 3306,
  MASTER_USER     = 'repl_replica',
  MASTER_PASSWORD = 'REPL_PASSWORD',
  MASTER_LOG_FILE = 'mysql-bin.000055',
  MASTER_LOG_POS  = 87654321,
  MASTER_CONNECT_RETRY = 10,
  MASTER_HEARTBEAT_PERIOD = 30;

-- Connection: master3
CHANGE MASTER 'master3' TO
  MASTER_HOST     = 'MASTER3_IP',
  MASTER_PORT     = 3306,
  MASTER_USER     = 'repl_replica',
  MASTER_PASSWORD = 'REPL_PASSWORD',
  MASTER_LOG_FILE = 'mysql-bin.000371',
  MASTER_LOG_POS  = 44556677,
  MASTER_CONNECT_RETRY = 10,
  MASTER_HEARTBEAT_PERIOD = 30;

Parameters explained:

| Parameter | Value | Purpose |
|-----------|-------|---------|
| MASTER_CONNECT_RETRY | 10 | Retry the connection every 10 seconds if it drops |
| MASTER_HEARTBEAT_PERIOD | 30 | Master sends a heartbeat every 30 seconds if idle, preventing timeout disconnects |
| MASTER_LOG_FILE / MASTER_LOG_POS | From dump | The exact position to start reading binlog events |

Step 2: Start All Connections

-- Start all replication connections at once
START ALL SLAVES;

Or start them individually for more control:

START SLAVE 'master1';
START SLAVE 'master2';
START SLAVE 'master3';

Step 3: Verify Replication Status

Check the status of all connections:

SHOW ALL SLAVES STATUS\G

For each connection, verify:

Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
Last_IO_Error:
Last_SQL_Error:
Replicate_Rewrite_DB: asterisk->master1_asterisk

The key indicators:

| Field | Expected | Problem If |
|-------|----------|------------|
| Slave_IO_Running | Yes | Connecting = network/auth issue; No = fatal error |
| Slave_SQL_Running | Yes | No = SQL apply error (check Last_SQL_Error) |
| Seconds_Behind_Master | 0 | High number = replica is lagging |
| Last_IO_Error | (empty) | Any text here indicates a connection problem |
| Last_SQL_Error | (empty) | Any text here indicates a data apply problem |
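
For a quick terminal check, you can filter the verbose \G output down to just those indicator fields (a convenience sketch):

```shell
# Keep only the health-relevant fields from SHOW ALL SLAVES STATUS\G
status_summary() {
    grep -E 'Connection_name|Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_IO_Error|Last_SQL_Error'
}

command -v mariadb >/dev/null &&
    mariadb -e "SHOW ALL SLAVES STATUS\G" | status_summary
```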

You can also check a single connection:

SHOW SLAVE 'master1' STATUS\G

Health Monitoring

Replication Health Check Script

Save this script as /usr/local/bin/check_replication.sh:

#!/bin/bash
# ─────────────────────────────────────────────────────────
# MariaDB Multi-Source Replication Health Monitor
# Checks all named replication connections and logs alerts
# ─────────────────────────────────────────────────────────

# Configuration
CONNECTIONS=("master1" "master2" "master3")
ALERT_LAG=60          # seconds behind master before alerting
LOG_FILE="/var/log/replication_alerts.log"
ERRORS=0

for CONN in "${CONNECTIONS[@]}"; do
    STATUS=$(mariadb -e "SHOW SLAVE '$CONN' STATUS\G" 2>/dev/null)

    if [ -z "$STATUS" ]; then
        continue  # Connection not configured yet
    fi

    IO=$(echo "$STATUS" | grep 'Slave_IO_Running:' | awk '{print $2}')
    SQL=$(echo "$STATUS" | grep 'Slave_SQL_Running:' | awk '{print $2}')
    LAG=$(echo "$STATUS" | grep 'Seconds_Behind_Master:' | awk '{print $2}')
    ERR=$(echo "$STATUS" | grep 'Last_SQL_Error:' | sed 's/.*Last_SQL_Error: //')

    # Check IO and SQL threads
    if [ "$IO" != "Yes" ] || [ "$SQL" != "Yes" ]; then
        echo "$(date): REPLICATION BROKEN [$CONN] — IO=$IO SQL=$SQL Error: $ERR" \
            >> "$LOG_FILE"
        ERRORS=1
    fi

    # Check replication lag
    if [ "$LAG" != "NULL" ] && [ "$LAG" -gt "$ALERT_LAG" ] 2>/dev/null; then
        echo "$(date): REPLICATION LAG [$CONN] — ${LAG}s behind" \
            >> "$LOG_FILE"
    fi
done

if [ $ERRORS -eq 1 ]; then
    exit 1
fi
exit 0

Make it executable:

chmod +x /usr/local/bin/check_replication.sh

What the Script Checks

For each named connection:

  1. IO Thread -- Is the connection to the master alive and receiving binlog events?
  2. SQL Thread -- Is the replica successfully applying events to local tables?
  3. Replication Lag -- Is the replica falling behind? Alerts if lag exceeds 60 seconds.
  4. SQL Errors -- Captures and logs the specific error text for broken connections.

The script exits with code 1 if any connection is broken, making it compatible with monitoring systems (Nagios, Zabbix, Prometheus node_exporter textfile collector) that check exit codes.

Extending the Monitor

For production deployments, consider adding:

# Email alert (requires mailutils or msmtp; substitute your alert address)
if [ $ERRORS -eq 1 ]; then
    tail -5 "$LOG_FILE" | mail -s "REPLICATION ALERT: $(hostname)" ADMIN_EMAIL
fi

# Prometheus textfile metric -- emit one line per connection from inside
# the for-loop (truncate the file once before the loop with: > "$PROM_FILE")
PROM_FILE="/var/lib/prometheus/node-exporter/replication.prom"
OK=0
[ "$IO" = "Yes" ] && [ "$SQL" = "Yes" ] && OK=1
echo "mariadb_replication_ok{connection=\"$CONN\"} $OK" >> "$PROM_FILE"

Cron Setup

Schedule the health monitor to run every 5 minutes:

crontab -e

Add the following line:

*/5 * * * * /usr/local/bin/check_replication.sh

Log Rotation

Create /etc/logrotate.d/replication-alerts:

/var/log/replication_alerts.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
    create 640 root root
}

This keeps 8 weeks of alert history before rotating out old logs.


Firewall and Security

Replica-Side Firewall (UFW)

The replica should only accept MariaDB connections from the master servers and your monitoring/reporting infrastructure:

# Allow SSH (adjust port as needed)
ufw allow 9322/tcp comment 'SSH'

# Allow MariaDB from each master
ufw allow from MASTER1_IP to any port 3306 comment 'Master1 MariaDB'
ufw allow from MASTER2_IP to any port 3306 comment 'Master2 MariaDB'
ufw allow from MASTER3_IP to any port 3306 comment 'Master3 MariaDB'

# Allow MariaDB from reporting tools (Grafana, report servers, etc.)
ufw allow from GRAFANA_IP to any port 3306 comment 'Grafana'

# Enable the firewall
ufw enable

Verify:

ufw status numbered

Expected output:

     To                         Action      From
     --                         ------      ----
[ 1] 9322/tcp                   ALLOW IN    Anywhere
[ 2] 3306                       ALLOW IN    MASTER1_IP
[ 3] 3306                       ALLOW IN    MASTER2_IP
[ 4] 3306                       ALLOW IN    MASTER3_IP
[ 5] 3306                       ALLOW IN    GRAFANA_IP

Master-Side Firewall

On each master, allow the replica to connect on port 3306:

# On each master server
ufw allow from REPLICA_IP to any port 3306 comment 'Replication replica'

Database User Security

Create separate users on the replica for different consumers:

-- Read-only reporting user with query timeout
CREATE USER 'report_user'@'ALLOWED_IP'
  IDENTIFIED BY 'REPORT_PASSWORD';

GRANT SELECT ON master1_asterisk.* TO 'report_user'@'ALLOWED_IP';
GRANT SELECT ON master2_asterisk.* TO 'report_user'@'ALLOWED_IP';
GRANT SELECT ON master3_asterisk.* TO 'report_user'@'ALLOWED_IP';

-- Apply a per-user query timeout (prevents runaway queries)
ALTER USER 'report_user'@'ALLOWED_IP'
  WITH MAX_STATEMENT_TIME 5;

FLUSH PRIVILEGES;

User separation guidelines:

| User | Access | Timeout | Purpose |
|------|--------|---------|---------|
| repl_replica (on masters) | REPLICATION SLAVE only | N/A | Replication connection |
| report_user (on replica) | SELECT on all replica DBs | 5 seconds | Automated report scripts |
| grafana_ro (on replica) | SELECT on all replica DBs | 5 seconds | Grafana dashboards |
| root (on replica) | Full local access | 300 seconds | Administration only |

Parallel Replication Tuning

MariaDB's parallel replication applies binlog events using multiple threads, significantly improving throughput for write-heavy workloads like ViciDial.

Configuration

The two key settings in 50-server.cnf:

slave_parallel_threads  = 4
slave_parallel_mode     = optimistic

Parallel Mode Options

| Mode | Behavior | Best For |
|------|----------|----------|
| conservative | Only parallelizes transactions from the same group commit | InnoDB-heavy workloads; safest |
| optimistic | Applies transactions in parallel, retries on conflict | Mixed MyISAM/InnoDB; best throughput |
| aggressive | Maximizes parallelism; higher conflict rate | Benchmarking; not recommended for production |
| minimal | Very limited parallelism | Debugging |

Why optimistic for ViciDial:

ViciDial's workload is predominantly MyISAM with non-transactional writes. The optimistic mode allows the replica to apply events in parallel and only serializes when it detects a conflict (e.g., two events modifying the same row). With ViciDial's write patterns -- different agents, different calls, different campaigns -- conflicts are rare, and throughput is excellent.

Thread Count Guidelines

| Replica CPU Cores | Suggested slave_parallel_threads | Notes |
|-------------------|----------------------------------|-------|
| 2-4 | 2 | Conservative; leave cores for queries |
| 4-8 | 4 | Good balance for 3 master connections |
| 8-16 | 8 | Higher throughput if lag is a concern |
| 16+ | 12 | Diminishing returns above 12 |

The slave_parallel_threads setting is global, not per-connection. With 3 connections and 4 threads, each connection's events are processed by the shared thread pool. In practice, this provides excellent throughput because the connections rarely all have heavy traffic simultaneously.
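
If lag keeps climbing during peak hours, the pool can be resized at runtime. MariaDB refuses to change slave_parallel_threads while any slave is running, so the sequence looks like this (a sketch against a live server; run as a privileged user):

```shell
# Raise the shared applier pool from 4 to 8 threads at runtime.
# slave_parallel_threads can only be changed while all slaves are stopped.
mariadb <<'SQL'
STOP ALL SLAVES;
SET GLOBAL slave_parallel_threads = 8;
START ALL SLAVES;
SQL
```

Remember to update slave_parallel_threads in 50-server.cnf as well, or the new value is lost on the next restart.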

Monitoring Parallel Performance

Check how many transactions have been retried due to conflicts:

SHOW ALL SLAVES STATUS\G

Look at the Retried_transactions field. A small number of retries is normal and expected in optimistic mode. If retries are growing rapidly (thousands per hour), consider switching to conservative mode or investigating the conflicting queries.


Handling Schema Differences

Cross-Version Replication

When masters run different MariaDB versions (e.g., 10.6 and 10.11), keep these points in mind:

  1. The replica should run the same or newer version than the oldest master. MariaDB maintains backward compatibility in the binlog format, so a 10.11 replica can apply events from a 10.6 master.

  2. New system table columns introduced in newer MariaDB versions may appear in DDL statements. The slave_skip_errors setting handles most of these gracefully.

  3. Storage engine differences are generally not an issue. ViciDial uses MyISAM on all servers. If a master has some InnoDB tables, they replicate as InnoDB on the replica.

Table Structure Differences Between Masters

In a multi-server call center, each server may be running a slightly different ViciDial version or have custom tables. This is fine:

  - Each connection replicates into its own database, so the three schemas never need to match one another
  - Replication only requires that each table on the replica is compatible with its own master's copy
  - A table that exists on only one master simply exists in only that master's database on the replica

Adding Custom Tables

If you create custom tables on a master (e.g., for a CRM integration), they will automatically replicate to the corresponding database on the replica, as long as they are in the asterisk database and not in the ignore list.


Error Skipping Strategy

Configured Skip Errors

The slave_skip_errors setting in 50-server.cnf tells MariaDB to silently skip specific error codes instead of stopping replication:

slave_skip_errors = 1032,1062,1146,1269,1396,1932

| Error Code | Message | Common Cause |
|------------|---------|--------------|
| 1032 | Can't find record | DELETE/UPDATE for a row that does not exist on the replica (MEMORY table cleared, or initial sync mismatch) |
| 1062 | Duplicate entry | INSERT for a row that already exists (MEMORY table rebuild, or initial sync overlap) |
| 1146 | Table doesn't exist | Statement references a table not present on the replica (custom table on one master, or filtered table) |
| 1269 | Can't revoke all privileges | User management statement that does not apply cleanly |
| 1396 | Operation CREATE USER failed | User already exists or doesn't exist for CREATE/DROP |
| 1932 | Table ... doesn't exist in engine | Table file missing on disk but exists in metadata (crash recovery artifact) |

When Is Error Skipping Safe?

Error skipping is safe when:

  - The affected tables are ephemeral real-time tables (MEMORY tables, or live-status tables already in the ignore list).
  - The errors stem from the initial sync window (rows that changed while the dump was being taken).
  - The skipped codes are limited to the known-benign list above and fire only occasionally.

Error skipping is NOT safe when:

  - The errors hit CDR or log tables (vicidial_log, vicidial_closer_log, recording_log) -- skipped events there mean lost call records.
  - The same error fires continuously, which indicates real data drift rather than a transient mismatch.
  - You cannot explain why the error is occurring.

Manual Error Skipping

If replication stops on an error that is not in the skip list, investigate first:

-- Check the error
SHOW SLAVE 'master1' STATUS\G

If the error is safe to skip (e.g., a one-time DDL issue), skip it manually:

-- Stop the specific connection
STOP SLAVE 'master1';

-- Skip one event (select the connection, then set the counter)
SET @@default_master_connection = 'master1';
SET GLOBAL sql_slave_skip_counter = 1;

-- Restart
START SLAVE 'master1';

Warning: Never blindly skip errors on critical data tables. If vicidial_closer_log or vicidial_log (your CDR tables) are throwing errors, investigate the root cause. Skipping CDR events means lost call records.


Troubleshooting

Problem: Slave_IO_Running: Connecting

The IO thread is trying to connect to the master but cannot establish a connection.

Diagnostic steps:

# 1. Test network connectivity from replica to master
mariadb -h MASTER1_IP -P 3306 -u repl_replica -p -e "SELECT 1;"

# 2. Check if MariaDB is listening on the master
ssh MASTER1 "ss -tlnp | grep 3306"

# 3. Check master's bind-address (must be 0.0.0.0, not 127.0.0.1)
ssh MASTER1 "grep bind-address /etc/my.cnf /etc/mysql/mariadb.conf.d/*.cnf"

# 4. Check firewall on the master
ssh MASTER1 "ufw status | grep 3306"

# 5. Verify the replication user exists and has the right password
ssh MASTER1 "mariadb -e \"SELECT User, Host FROM mysql.user WHERE User = 'repl_replica';\""

Common fixes:

| Cause | Fix |
|-------|-----|
| Master's bind-address = 127.0.0.1 | Change to 0.0.0.0 and restart MariaDB |
| Firewall blocking port 3306 | ufw allow from REPLICA_IP to any port 3306 |
| Wrong password | Recreate the user with the correct password |
| Master not running | systemctl start mariadb on the master |
| DNS resolution failure | Use IP addresses, not hostnames; ensure skip-name-resolve is set |

Problem: Slave_SQL_Running: No

The SQL thread has stopped due to an error applying a binlog event.

SHOW SLAVE 'master1' STATUS\G
-- Look at: Last_SQL_Error, Last_SQL_Errno

Common errors and fixes:

| Error | Typical Cause | Fix |
|-------|---------------|-----|
| 1032: Can't find record | Row deleted on master, never existed on replica | Add 1032 to slave_skip_errors or skip manually |
| 1062: Duplicate entry | Row inserted on master, already exists on replica | Add 1062 to slave_skip_errors or skip manually |
| 1146: Table doesn't exist | Table exists on master but not on replica | Create the table on the replica, or add 1146 to skip list |
| 1007: Database already exists | CREATE DATABASE replayed | Skip the event |
| Schema mismatch | Column added on master but not on replica | Usually self-resolves (the ALTER will replicate); skip if stuck |

Problem: High Seconds_Behind_Master

The replica is lagging behind one or more masters.

Diagnostic steps:

-- Check which connection is lagging
SHOW ALL SLAVES STATUS\G

-- Check for long-running queries on the replica that might be blocking
SHOW PROCESSLIST;

-- Check for lock waits
SHOW ENGINE INNODB STATUS\G

Common causes and fixes:

| Cause | Fix |
|-------|-----|
| Slow disk I/O on replica | Move to SSD; check iostat -x 1 |
| Report query holding table locks | Add max_statement_time to limit query duration |
| Single-threaded apply bottleneck | Increase slave_parallel_threads |
| Master had a burst of large operations | Wait; the replica will catch up |
| Network saturation | Check bandwidth; slave_compressed_protocol helps |
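To tell whether a lagging connection is catching up or falling further behind, sample Seconds_Behind_Master twice a minute or so apart. A minimal parsing sketch (the field and connection names match the SHOW ALL SLAVES STATUS output used throughout this tutorial):

```shell
# Sketch: extract Seconds_Behind_Master for one connection from
# `SHOW ALL SLAVES STATUS\G` output read on stdin. Shrinking values
# between two samples mean the replica is catching up.
parse_lag() {  # parse_lag <connection_name>
  awk -v conn="$1" '
    $1 == "Connection_name:"        { current = $2 }
    $1 == "Seconds_Behind_Master:" && current == conn { print $2 }
  '
}

# Live usage (hypothetical invocation):
#   mariadb -e "SHOW ALL SLAVES STATUS\G" | parse_lag master1
```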

Problem: Relay Log Corruption After Crash

If the replica crashes (power loss, OOM kill), relay logs may be incomplete.

-- relay_log_recovery = ON handles this automatically
-- But if it doesn't, manually reset:

STOP SLAVE 'master1';

-- Reset and re-fetch from the last known good position
RESET SLAVE 'master1';

-- Reconfigure with the current master position
CHANGE MASTER 'master1' TO
  MASTER_HOST     = 'MASTER1_IP',
  MASTER_USER     = 'repl_replica',
  MASTER_PASSWORD = 'REPL_PASSWORD',
  MASTER_LOG_FILE = 'mysql-bin.000042',
  MASTER_LOG_POS  = 12345678;

START SLAVE 'master1';

To find the last executed position after a crash, check the replication state:

-- Exec_Master_Log_Pos is persisted in the connection's relay-log.info
-- state file in the datadir (for named connections MariaDB appends the
-- connection name to the file name), and is also shown here:
SHOW SLAVE 'master1' STATUS\G
-- Look at: Relay_Master_Log_File, Exec_Master_Log_Pos

Problem: Binlog File No Longer Available on Master

If the replica was offline too long, the master may have purged the binlog file that the replica needs.

Last_IO_Error: Got fatal error 1236 from master ... binlog truncated

Fix: You need to re-initialize that connection with a fresh dump:

  1. Stop the affected connection: STOP SLAVE 'master1';
  2. Take a new dump from the master with --master-data=2
  3. Import it into the replica database
  4. CHANGE MASTER with the new position from the dump
  5. START SLAVE 'master1';
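For step 4, a dump taken with --master-data=2 contains the coordinates you need as a commented-out CHANGE MASTER line near the top of the file. A small helper can pull them out (a sketch -- the file path in the usage comment is illustrative):

```shell
# Sketch: extract MASTER_LOG_FILE and MASTER_LOG_POS from the
# commented CHANGE MASTER header that --master-data=2 writes.
dump_coords() {  # dump_coords <dumpfile>  -> "binlog_file position"
  grep -m1 '^-- CHANGE MASTER TO' "$1" \
    | sed -E "s/.*MASTER_LOG_FILE='([^']+)', MASTER_LOG_POS=([0-9]+).*/\1 \2/"
}

# Usage (hypothetical path):
#   read -r logfile logpos < <(dump_coords /root/master1.sql)
```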

Prevention: Set expire_logs_days = 7 (or higher) on masters so binlogs are retained long enough to survive replica downtime.
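In each master's server config that looks like the fragment below (the 7-day value is this tutorial's suggestion; size it to the longest replica outage you expect to survive):

```ini
[mysqld]
# Keep at least a week of binlogs so a replica can be offline
# for several days without losing its replication position.
expire_logs_days = 7
```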


Maintenance Operations

Stopping and Starting Replication

-- Stop all connections (e.g., before replica maintenance)
STOP ALL SLAVES;

-- Start all connections
START ALL SLAVES;

-- Stop/start a single connection
STOP SLAVE 'master1';
START SLAVE 'master1';

After Replica Server Reboot

Because skip_slave_start = ON, replication does not auto-start. After rebooting the replica:

# 1. Check that MariaDB is running
systemctl status mariadb

# 2. Inspect replication state before starting
mariadb -e "SHOW ALL SLAVES STATUS\G" | grep -E 'Connection_name|IO_Running|SQL_Running|Seconds_Behind|Last.*Error'

# 3. Start all connections
mariadb -e "START ALL SLAVES;"

# 4. Verify
mariadb -e "SHOW ALL SLAVES STATUS\G" | grep -E 'Connection_name|IO_Running|SQL_Running|Seconds_Behind'

Adding a New Master

To add a fourth master to an existing setup:

  1. Add filter rules to 60-multisource.cnf:
master4.replicate_rewrite_db = asterisk->master4_asterisk

master4.replicate_ignore_table = asterisk.vicidial_live_agents
master4.replicate_ignore_table = asterisk.live_sip_channels
# ... (same 12 tables as other connections)
  2. Restart MariaDB on the replica:
systemctl restart mariadb
  3. Create the database, import the initial dump, and configure the connection:
CREATE DATABASE IF NOT EXISTS master4_asterisk;

CHANGE MASTER 'master4' TO
  MASTER_HOST     = 'MASTER4_IP',
  MASTER_USER     = 'repl_replica',
  MASTER_PASSWORD = 'REPL_PASSWORD',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS  = 12345;

START SLAVE 'master4';
  4. Update the monitoring script's CONNECTIONS array.
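If your health-check script follows the pattern from the Health Monitoring section and keeps its connection list in a bash array, the update is a one-line change (the array name is assumed from this tutorial's script; adjust to yours):

```shell
# Connection list consumed by check_replication.sh -- one entry per
# configured master connection (array name assumed from this tutorial).
CONNECTIONS=("master1" "master2" "master3" "master4")
```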

Removing a Master

STOP SLAVE 'master3';
RESET SLAVE 'master3' ALL;

The ALL keyword removes the connection configuration entirely. The replicated data in master3_asterisk remains on disk until you explicitly drop it.


Performance Considerations

Disk I/O

The replica receives write traffic from all masters simultaneously. With 3 masters, expect 3x the write I/O of a single server. Put the datadir on SSD or NVMe storage, keep relay logs on the same fast volume, and watch iostat -x 1 for sustained device utilization during peak call hours.

Memory

With ViciDial's MyISAM tables, the key_buffer_size is critical. The key buffer caches index blocks for all MyISAM tables across all databases:

-- Check key buffer utilization
SHOW STATUS LIKE 'Key%';

Calculate hit ratio:

Key cache hit ratio (%) = (1 - Key_reads / Key_read_requests) * 100

Target 99%+ hit ratio. If Key_reads is growing relative to Key_read_requests, increase key_buffer_size.
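A small helper makes the formula concrete (pure arithmetic -- in production the two inputs come from the SHOW STATUS output above):

```shell
# Key cache hit ratio, in percent, from the two MyISAM counters.
key_hit_ratio() {  # key_hit_ratio <Key_reads> <Key_read_requests>
  LC_ALL=C awk -v r="$1" -v q="$2" \
    'BEGIN { printf "%.2f\n", (1 - r / q) * 100 }'
}

# key_hit_ratio 1000 1000000 prints 99.90 -- comfortably above target
```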

Network Bandwidth

With slave_compressed_protocol = ON, a busy ViciDial server generating 10 MB/s of binlog data compresses to roughly 3-4 MB/s over the wire. For 3 masters, budget ~10-12 MB/s sustained bandwidth to the replica.

Table Open Cache

Each replicated database has its own set of tables. With 3 databases of ~400 tables each, you need table_open_cache = 4096 or higher to avoid table open/close churn:

-- Check if the cache is too small
SHOW STATUS LIKE 'Opened_tables';

If Opened_tables is growing rapidly, increase table_open_cache.

Query Performance on Replica

Report queries on the replica compete with replication writes for disk I/O and locks. Best practices:

  - Set max_statement_time for report users so a runaway query cannot hold table locks indefinitely.
  - Schedule heavy aggregate reports outside peak call hours, when replication write pressure is highest.
  - Add replica-only indexes for frequent report filters.

Replica-only indexes: You can safely CREATE INDEX on the replica for report optimization. Since read_only = ON prevents application writes and log_slave_updates = OFF prevents the DDL from propagating, the index stays local. However, be aware that if the master runs ALTER TABLE on the same table, the replica's version of the table will be rebuilt, potentially dropping your custom index. Monitor this and recreate indexes as needed.
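As a concrete (hypothetical) example, an index supporting per-campaign date-range reports -- the table and column names assume the stock ViciDial schema, so verify them against your version first:

```sql
-- Replica-only: speeds up per-campaign date-range reports.
-- Stays local because read_only blocks application writes and
-- log_slave_updates = OFF keeps it out of any downstream binlog.
CREATE INDEX idx_campaign_date
    ON master1_asterisk.vicidial_log (campaign_id, call_date);
```

After any ALTER TABLE replicates from the master, SHOW INDEX FROM master1_asterisk.vicidial_log is a quick way to confirm the index survived.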


Appendix A: Complete File Reference

Files on Each Master

| File | Contents |
|------|----------|
| /etc/my.cnf or /etc/mysql/mariadb.conf.d/50-server.cnf | server_id, log_bin, binlog_format, expire_logs_days |

Files on Replica

| File | Contents |
|------|----------|
| /etc/mysql/mariadb.conf.d/50-server.cnf | Server config, replication engine, resource tuning |
| /etc/mysql/mariadb.conf.d/60-multisource.cnf | Per-connection rewrite and ignore rules |
| /usr/local/bin/check_replication.sh | Health monitoring script |
| /var/log/replication_alerts.log | Alert log written by health monitor |
| /var/lib/mysql/replica-slow.log | Slow query log |
| /var/log/mysql/error.log | MariaDB error log |
| /etc/logrotate.d/replication-alerts | Log rotation config |

User Accounts

| User | Server | Privileges | Purpose |
|------|--------|------------|---------|
| repl_replica | Each master | REPLICATION SLAVE | Binlog streaming |
| report_user | Replica | SELECT on all *_asterisk DBs | Report scripts |
| grafana_ro | Replica | SELECT on all *_asterisk DBs | Grafana dashboards |

Appendix B: Quick Reference Commands

-- Show all replication connections
SHOW ALL SLAVES STATUS\G

-- Show one connection
SHOW SLAVE 'master1' STATUS\G

-- Start/stop all
START ALL SLAVES;
STOP ALL SLAVES;

-- Start/stop one
START SLAVE 'master1';
STOP SLAVE 'master1';

-- Skip one event on a connection
STOP SLAVE 'master1';
SET @@default_master_connection = 'master1';
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE 'master1';

-- Remove a connection entirely
STOP SLAVE 'master3';
RESET SLAVE 'master3' ALL;

-- Check relay log space usage
SHOW ALL SLAVES STATUS\G
-- Look at: Relay_Log_Space (bytes per connection)

-- Check replication filters in effect
SHOW ALL SLAVES STATUS\G
-- Look at: Replicate_Rewrite_DB, Replicate_Ignore_Table

-- Quick health one-liner
mariadb -e "SHOW ALL SLAVES STATUS\G" \
  | grep -E 'Connection_name|IO_Running|SQL_Running|Seconds_Behind|Last.*Error:'

Appendix C: Capacity Planning

Estimating Replica Storage

Replica disk = (Master1 data size) + (Master2 data size) + (Master3 data size)
             + relay log space (~1 GB per connection)
             + temp tables for queries
             + 20% headroom
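The formula can be run as a quick shell sketch; the ~3 GB relay-log (1 GB per connection, 3 connections) and ~5 GB temp-space allowances are rough figures from this tutorial, and the inputs are illustrative:

```shell
# Estimate replica disk (GB) for three masters: sum the data sizes,
# add ~3 GB relay logs + ~5 GB temp space, then 20% headroom.
estimate_replica_disk() {  # estimate_replica_disk <m1_gb> <m2_gb> <m3_gb>
  awk -v a="$1" -v b="$2" -v c="$3" \
    'BEGIN { printf "%d\n", (a + b + c + 3 + 5) * 1.2 }'
}

# estimate_replica_disk 40 40 40 prints 153 (GB)
```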

For a typical ViciDial deployment with 2 years of CDR history:

| Component | Size |
|-----------|------|
| master1_asterisk | 15-40 GB |
| master2_asterisk | 15-40 GB |
| master3_asterisk | 15-40 GB |
| Relay logs | ~3 GB |
| Slow query log | < 1 GB |
| System + overhead | ~5 GB |
| Total | 60-130 GB |

When to Archive

ViciDial's vicidial_log, vicidial_closer_log, and recording_log tables grow indefinitely. On the replica, consider:

  1. Partitioning by month (replica-only) for fast DROP PARTITION cleanup.
  2. Archive cron that moves records older than N days to archive tables.
  3. Setting up data retention on the masters (ViciDial's built-in archiving moves old records to _archive tables).

This tutorial documents a production multi-source replication deployment for VoIP call centers. Adapt connection names, IP addresses, and table lists to your specific environment. Always test configuration changes on a staging system before applying to production.
