ViciDial Database Performance & Optimization
The Complete Guide to Diagnosing, Tuning, and Scaling Your ViciDial Database
Table of Contents
- Why the Database Is the Bottleneck
- Diagnosing Performance Problems
- MariaDB / my.cnf Tuning
- Index Optimization
- Query Cache Tuning
- Table Maintenance
- Archiving Old Data
- Replication for Read Offloading
- Partitioning Large Tables
- The sip_packets Table Problem
- Monitoring Database Health
- Emergency Procedures
- Performance Benchmarking
1. Why the Database Is the Bottleneck
Every ViciDial installation eventually hits the same wall: the database becomes the single point of contention that limits scalability. Understanding why requires knowing how ViciDial uses MySQL/MariaDB internally.
The MyISAM Architecture Problem
ViciDial uses the MyISAM storage engine for nearly all tables. MyISAM provides raw read speed but comes with a fundamental limitation: table-level locking. When any process writes to a MyISAM table, the entire table is locked -- no other process can read or write until the lock releases. InnoDB uses row-level locking, but ViciDial cannot use it (more on that below).
ViciDial is a real-time system that reads AND writes constantly, every second, to the same small set of hot tables.
The Hot Tables
These tables see the most read/write activity and are the primary source of lock contention:
| Table | Operations/Second | Purpose |
|---|---|---|
vicidial_live_agents |
50-200+ | Updated every second per logged-in agent. Every agent screen refresh reads this. Every call assignment writes to it. This is the single hottest table in ViciDial. |
vicidial_live_inbound_agents |
20-100+ | Tracks which agents are available for inbound calls. Updated on every status change. |
vicidial_live_calls |
10-50+ | Tracks active calls in progress. Written on call start, updated during call, deleted on hangup. |
vicidial_auto_calls |
10-50+ | The auto-dialer's queue. Rows inserted when calls are placed, updated with status, deleted when calls end. |
vicidial_manager |
100-500+ | Command queue between web interface and Asterisk. Rapid INSERT/DELETE cycle. |
The Real-Time Polling Problem
ViciDial's agent web interface polls the server every second via AJAX. With 50 agents, that is 50 SELECTs per second against the same tables where the dialer is simultaneously doing INSERTs, UPDATEs, and DELETEs. With MyISAM's table-level locking, you get cascading lock waits:
| 1234 | vici | SELECT ... FROM vicidial_live_agents | 0.5 sec |
| 1235 | vici | UPDATE vicidial_live_agents SET ... | Waiting for table level lock |
| 1236 | vici | SELECT ... FROM vicidial_live_agents | Waiting for table level lock |
| 1237 | vici | SELECT ... FROM vicidial_live_agents | Waiting for table level lock |
| 1238 | vici | UPDATE vicidial_live_agents SET ... | Waiting for table level lock |
Why Not InnoDB or Clustering?
Converting tables to InnoDB has been tested by multiple community members: it breaks the auto-dialer. The dialer's timing-sensitive operations fail under InnoDB's transaction overhead. MySQL/MariaDB clustering (Galera, NDB) is also incompatible because it requires InnoDB. The only viable scaling strategy is master/slave replication to offload reads to a separate server.
The Compounding Effect
Performance degrades non-linearly. A system running well at 30 agents may collapse at 50 because lock wait times increase with every concurrent query, table scans slow as log tables grow, and report queries compete with real-time queries for locks. This is why optimization is not optional beyond 20-30 agents.
2. Diagnosing Performance Problems
Before tuning anything, identify exactly what is slow and why. Random my.cnf changes without diagnosis are worse than doing nothing -- they waste time and can introduce new problems.
2.1 Show Running Queries
The single most useful diagnostic command:
mysqladmin -u root -p processlist
Or from within MySQL:
SHOW FULL PROCESSLIST;
Look for:
- "Waiting for table level lock" -- MyISAM lock contention (most common ViciDial bottleneck)
- "Copying to tmp table" -- Missing index or large result set
- "Sending data" -- Table scan (long times = problem)
- Queries running > 2-3 seconds against real-time tables
2.2 Check Slow Query Count
SHOW GLOBAL STATUS LIKE 'Slow_queries';
This shows the cumulative count since server start. If the rate exceeds 1 slow query per minute, there is a problem.
2.3 Enable and Analyze the Slow Query Log
Enable temporarily (runtime, no restart required):
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1; -- Log queries taking >1 second
SET GLOBAL log_queries_not_using_indexes = 1; -- Also log queries doing full scans
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';
Enable permanently in my.cnf (under [mysqld]):
slow_query_log = 1
long_query_time = 1
log_queries_not_using_indexes = 1
slow_query_log_file = /var/log/mysql/slow-query.log
Analyze with pt-query-digest (from Percona Toolkit):
# Install
apt-get install percona-toolkit # Debian/Ubuntu
yum install percona-toolkit # CentOS/RHEL
# Top 10 slowest query patterns
pt-query-digest /var/log/mysql/slow-query.log
# Last hour only
pt-query-digest --since '1h' /var/log/mysql/slow-query.log
Or with mysqldumpslow (built-in, simpler):
mysqldumpslow -s at -t 10 /var/log/mysql/slow-query.log
2.4 Check Table Lock Contention
SHOW GLOBAL STATUS LIKE 'Table_locks%';
Output:
+----------------------------+-----------+
| Variable_name | Value |
+----------------------------+-----------+
| Table_locks_immediate | 845923 |
| Table_locks_waited | 1247 |
+----------------------------+-----------+
Interpretation: The ratio of Table_locks_waited / Table_locks_immediate should be below 1%. If Table_locks_waited is growing by more than a few per minute, you have lock contention. This is the hallmark ViciDial database problem.
-- Check lock wait ratio as percentage
SELECT
ROUND(
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Table_locks_waited') /
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Table_locks_immediate') * 100, 2
) AS lock_wait_pct;
2.5 Check MyISAM Key Buffer Efficiency
SHOW GLOBAL STATUS LIKE 'Key_read%';
+-------------------+-----------+
| Variable_name | Value |
+-------------------+-----------+
| Key_read_requests | 94523871 |
| Key_reads | 3456 |
+-------------------+-----------+
Interpretation: The ratio Key_reads / Key_read_requests should be less than 0.01 (1 in 100). This is your key buffer cache hit rate. If Key_reads is high relative to requests, your key_buffer_size is too small and indexes are being read from disk instead of RAM.
-- Calculate cache miss rate
SELECT
ROUND(
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Key_reads') /
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Key_read_requests') * 100, 4
) AS key_cache_miss_pct;
- < 0.1% = Excellent. Key buffer is well-sized.
- 0.1% - 1% = Acceptable but could improve.
- > 1% = Key buffer is too small. Increase
key_buffer_size.
2.6 Check Key Buffer Usage
SELECT ROUND((1 - (
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Key_blocks_unused') *
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_VARIABLES WHERE VARIABLE_NAME = 'key_cache_block_size')
) / (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_VARIABLES WHERE VARIABLE_NAME = 'key_buffer_size')
) * 100, 2) AS key_buffer_used_pct;
Below 50% = oversized (wasting RAM). Above 90% = too small (increase key_buffer_size).
2.7 Real-Time Monitoring with mytop
apt-get install mytop # or: yum install mytop
mytop -u root -p PASSWORD -d asterisk
Live top-like view. Press f for full query text, k to kill a query.
2.8 Check Temporary Table Usage
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
If Created_tmp_disk_tables / Created_tmp_tables > 25%, increase tmp_table_size and max_heap_table_size.
2.9 Check Table Sizes
Find the largest tables consuming space and causing slow scans:
SELECT
table_name,
engine,
ROUND(data_length / 1024 / 1024, 1) AS data_mb,
ROUND(index_length / 1024 / 1024, 1) AS index_mb,
ROUND((data_length + index_length) / 1024 / 1024, 1) AS total_mb,
table_rows
FROM information_schema.tables
WHERE table_schema = 'asterisk'
ORDER BY (data_length + index_length) DESC
LIMIT 20;
Common offenders: vicidial_log, vicidial_closer_log, call_log, vicidial_agent_log, vicidial_carrier_log, recording_log, vicidial_list, sip_packets (Section 10), server_performance.
3. MariaDB / my.cnf Tuning
The my.cnf file controls how MariaDB allocates memory, handles connections, caches data, and manages storage engines. ViciDial has specific requirements that differ from typical web application tuning.
Important: Make changes one at a time, restart MariaDB, and monitor for at least 30 minutes before making the next change. Document every change so you can roll back if something gets worse.
3.1 Locating Your my.cnf
# Show which files MariaDB reads (in order of precedence)
mysql --help --verbose 2>/dev/null | grep -A 1 "Default options"
# Typical locations:
# /etc/my.cnf -- Primary on CentOS/RHEL/ViciBox
# /etc/mysql/my.cnf -- Primary on Debian/Ubuntu
# /etc/mysql/mariadb.conf.d/ -- Drop-in directory on Debian/Ubuntu
All settings below go under the [mysqld] section.
3.2 Small Server (4-8 GB RAM, up to 30 Agents)
This profile is for a single all-in-one ViciDial server (web, Asterisk, database on one machine) with 4-8 GB total RAM.
[mysqld]
# ============================================================
# ViciDial MariaDB -- Small Server (4-8 GB RAM, up to 30 agents)
# ============================================================
# --- Connection Handling ---
max_connections = 500 # PHP scripts + cron + Asterisk manager
thread_cache_size = 50 # Reuse idle threads
max_allowed_packet = 64M # Large list uploads
wait_timeout = 300 # Close idle connections after 5 min
interactive_timeout = 300
# --- MyISAM Key Buffer (most critical setting) ---
key_buffer_size = 512M # Cache for MyISAM indexes (~25% of RAM for all-in-one)
# Monitor Key_reads/Key_read_requests -- should be < 0.01
myisam_sort_buffer_size = 64M # Buffer for REPAIR/ALTER/CREATE INDEX
myisam_repair_threads = 1 # 1 is safest
bulk_insert_buffer_size = 64M # Bulk INSERT...SELECT and LOAD DATA
# --- Temporary Tables (MUST match each other) ---
tmp_table_size = 256M # In-memory temp table limit
max_heap_table_size = 256M # Exceeding this spills to disk (very slow)
# --- Table Cache ---
table_open_cache = 2048 # ViciDial has 200+ tables
table_definition_cache = 1024
# --- Query Cache (see Section 5 for details) ---
query_cache_type = 1
query_cache_size = 64M # WARNING: Never exceed 128M (mutex contention)
query_cache_limit = 2M # Max single cached result
# --- Per-Connection Buffers (allocated per connection -- be conservative) ---
join_buffer_size = 2M
sort_buffer_size = 2M
read_buffer_size = 1M
read_rnd_buffer_size = 1M
# --- Logging ---
slow_query_log = 1
long_query_time = 2
slow_query_log_file = /var/log/mysql/slow-query.log
# --- InnoDB (minimal -- ViciDial uses MyISAM) ---
innodb_buffer_pool_size = 128M
innodb_log_file_size = 48M
innodb_flush_log_at_trx_commit = 2 # Flush to OS cache each commit, disk every second
# --- Safety ---
max_connect_errors = 1000000 # Prevent blocking from connection errors
max_statement_time = 300 # Kill queries running > 5 min (MariaDB 10.1+)
3.3 Medium Server (16-32 GB RAM, 30-100 Agents)
This is typically a dedicated database server or a larger all-in-one system.
[mysqld]
# ============================================================
# ViciDial MariaDB -- Medium Server (16-32 GB RAM, 30-100 agents)
# ============================================================
# --- Connection Handling ---
max_connections = 1000
thread_cache_size = 100
max_allowed_packet = 64M
wait_timeout = 300
interactive_timeout = 300
# --- MyISAM Key Buffer ---
key_buffer_size = 2G # Most MyISAM indexes will fit in RAM
myisam_sort_buffer_size = 256M
myisam_repair_threads = 2 # Parallel repair on multi-core
bulk_insert_buffer_size = 256M
# --- Temporary Tables ---
tmp_table_size = 1G # Large enough for 30-day report temp tables
max_heap_table_size = 1G
# --- Table Cache ---
table_open_cache = 4096
table_definition_cache = 2048
# --- Query Cache ---
query_cache_type = 1
query_cache_size = 128M # Practical upper limit (mutex contention above this)
query_cache_limit = 4M
# --- Per-Connection Buffers ---
join_buffer_size = 4M
sort_buffer_size = 4M
read_buffer_size = 2M
read_rnd_buffer_size = 2M
# --- Logging ---
slow_query_log = 1
long_query_time = 1
slow_query_log_file = /var/log/mysql/slow-query.log
log_queries_not_using_indexes = 1
# --- InnoDB ---
innodb_buffer_pool_size = 256M
innodb_log_file_size = 128M
innodb_flush_log_at_trx_commit = 2
innodb_file_per_table = 1
# --- Safety ---
max_connect_errors = 1000000
max_statement_time = 300
# --- Thread Handling (MariaDB 5.5+) ---
thread_handling = pool-of-threads # Better than one-thread-per-connection
thread_pool_size = 8 # Match CPU core count
3.4 Large Server (64 GB+ RAM, 100+ Agents)
This is a dedicated database server for large multi-server ViciDial clusters.
[mysqld]
# ============================================================
# ViciDial MariaDB -- Large Server (64 GB+ RAM, 100+ agents)
# ============================================================
# --- Connection Handling ---
max_connections = 2000
thread_cache_size = 256
max_allowed_packet = 64M
wait_timeout = 300
interactive_timeout = 300
# --- MyISAM Key Buffer ---
key_buffer_size = 4G # 4 GB cap -- diminishing returns above this;
# values >4G caused instability pre-MySQL 5.1.23
myisam_sort_buffer_size = 512M
myisam_repair_threads = 4
bulk_insert_buffer_size = 512M
# --- Temporary Tables ---
tmp_table_size = 2G
max_heap_table_size = 2G
# --- Table Cache ---
table_open_cache = 8192
table_definition_cache = 4096
# --- Query Cache: DISABLED ---
query_cache_type = 0 # With 100+ agents, the global mutex becomes
query_cache_size = 0 # the dominant bottleneck. Both=0 skips mutex entirely.
# --- Per-Connection Buffers ---
join_buffer_size = 8M
sort_buffer_size = 8M
read_buffer_size = 4M
read_rnd_buffer_size = 4M
# --- Logging ---
slow_query_log = 1
long_query_time = 1
slow_query_log_file = /var/log/mysql/slow-query.log
log_queries_not_using_indexes = 1
# --- InnoDB ---
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_file_per_table = 1
# --- Safety ---
max_connect_errors = 1000000
max_statement_time = 300
# --- Thread Handling ---
thread_handling = pool-of-threads
thread_pool_size = 16 # Match CPU core count
# --- Memory Budget ---
# key_buffer: 4G + innodb_pool: 1G + tmp: 2G (shared cap)
# Per-thread worst case: 24MB * 2000 = 48G (but <5% active)
# Realistic peak: ~12-15 GB. Remaining ~50 GB goes to OS file
# cache, which is critical -- MyISAM .MYD files are cached by
# the OS, not by MySQL.
3.5 Applying Changes
# Always backup first
cp /etc/my.cnf /etc/my.cnf.bak.$(date +%Y%m%d)
# Edit the file
vim /etc/my.cnf
# Check syntax before restarting
mariadbd --help --verbose 2>&1 | head -5
# Should not show errors
# Restart MariaDB
systemctl restart mariadb # or: systemctl restart mysql
# Verify it started cleanly
systemctl status mariadb
tail -20 /var/log/mysql/error.log
# Confirm settings took effect
mysql -e "SHOW VARIABLES LIKE 'key_buffer_size';"
mysql -e "SHOW VARIABLES LIKE 'query_cache_size';"
3.6 Settings You Can Change Without Restart
Some variables can be changed at runtime for immediate testing:
-- Test a new key buffer size without restart
SET GLOBAL key_buffer_size = 2147483648; -- 2 GB
-- Test query cache changes
SET GLOBAL query_cache_size = 0; -- Disable
-- Test temporary table limits
SET GLOBAL tmp_table_size = 1073741824; -- 1 GB
SET GLOBAL max_heap_table_size = 1073741824;
-- Note: Runtime changes are lost on restart.
-- Add to my.cnf to persist.
4. Index Optimization
Indexes are the difference between a query scanning 10 million rows (minutes) and looking up 50 rows (milliseconds). ViciDial ships with reasonable default indexes, but as your data grows and you run custom reports, additional indexes become essential.
4.1 Check Existing Indexes
ViciDial's default indexes are optimized for real-time dialer operations, not reporting queries. Before adding indexes, see what exists:
-- Show all indexes on a table
SHOW INDEX FROM vicidial_closer_log;
-- Show table structure with indexes
SHOW CREATE TABLE vicidial_closer_log\G
4.2 Critical Indexes to Add
These address the most common slow query patterns. Verified against actual ViciDial query patterns.
vicidial_closer_log (Inbound Call Records)
-- Speed up reports that filter by campaign and date range.
-- This is the most impactful single index for inbound reporting.
CREATE INDEX idx_campaign_calldate_status
ON vicidial_closer_log (campaign_id, call_date, status);
-- Speed up lookups by phone number (call history, DID lookups)
CREATE INDEX idx_phone_calldate
ON vicidial_closer_log (phone_number, call_date);
-- Speed up agent performance reports filtering by user and date
CREATE INDEX idx_user_calldate
ON vicidial_closer_log (user, call_date);
vicidial_log (Outbound Call Records)
-- Same pattern: campaign + date range is the most common report filter
CREATE INDEX idx_campaign_calldate_status
ON vicidial_log (campaign_id, call_date, status);
-- Phone number lookups for call history
CREATE INDEX idx_phone_calldate
ON vicidial_log (phone_number, call_date);
-- Agent performance
CREATE INDEX idx_user_calldate
ON vicidial_log (user, call_date);
vicidial_agent_log (Agent Activity)
-- Agent performance reports: filter by user and time range
CREATE INDEX idx_user_eventtime
ON vicidial_agent_log (user, event_time);
-- Campaign-level agent stats
CREATE INDEX idx_campaign_eventtime
ON vicidial_agent_log (campaign_id, event_time);
-- Composite index for the agent performance detail report
-- (AST_agent_performance_detail.php), which filters on
-- user, sub_status, campaign_id and sums time columns.
CREATE INDEX idx_usr_subst_cmpid
ON vicidial_agent_log (user, sub_status, campaign_id, event_time);
call_log (Asterisk CDR)
-- Speed up joins between call_log and vicidial_log/closer_log
-- using uniqueid (the Asterisk channel unique identifier).
CREATE INDEX idx_uniqueid
ON call_log (uniqueid);
-- Date range queries for carrier analysis
CREATE INDEX idx_changrp_starttime
ON call_log (channel_group, start_time);
vicidial_carrier_log (SIP Carrier Details)
-- Carrier performance reports: filter by date
CREATE INDEX idx_calldate
ON vicidial_carrier_log (call_date);
-- Carrier + date for per-trunk analysis
CREATE INDEX idx_carrier_calldate
ON vicidial_carrier_log (carrier_id, call_date);
recording_log (Recording Metadata)
-- Lookup recordings by lead_id (common from the agent interface)
CREATE INDEX idx_lead_id
ON recording_log (lead_id, start_time);
-- Lookup recordings by vicidial_id (links to vicidial_log/closer_log)
CREATE INDEX idx_vicidial_id
ON recording_log (vicidial_id);
4.3 Verify Indexes Are Being Used
EXPLAIN SELECT COUNT(*) FROM vicidial_closer_log
WHERE campaign_id = 'london'
AND call_date BETWEEN '2026-03-01' AND '2026-03-13';
-- "key" column: NULL = full scan (bad), idx_name = index used (good)
-- "rows" column: lower is better
4.4 Index Guidelines
Every index slows writes. Apply these rules:
- Log tables (vicidial_log, vicidial_closer_log, call_log, vicidial_agent_log): Safe to add indexes. Write-once tables where read benefit far outweighs INSERT overhead.
- Real-time tables (vicidial_live_agents, vicidial_auto_calls, vicidial_live_calls, vicidial_manager): Do NOT add indexes. Updated hundreds of times per second -- additional indexes amplify lock contention.
- Reference tables (vicidial_campaigns, vicidial_inbound_groups, vicidial_users): Read-heavy, write-rare. Add indexes freely.
Find unused indexes (wasting key buffer and slowing writes):
SELECT table_name, index_name, rows_read
FROM information_schema.index_statistics
WHERE table_schema = 'asterisk'
ORDER BY rows_read ASC LIMIT 20;
5. Query Cache Tuning
The query cache is one of the most misunderstood MySQL features, and getting it wrong for ViciDial can make performance significantly worse.
5.1 How It Works (and Why It Hurts)
The query cache stores complete SELECT result sets keyed by exact SQL text. The catch: any write to a table invalidates ALL cached queries referencing that table. On ViciDial, where vicidial_live_agents is written every second per agent, cache entries for that table are invalidated almost immediately.
Worse, the query cache uses a single global mutex. Every cache check, insert, and invalidation requires this lock. On multi-core servers, this serializes all queries. Symptoms: high CPU but low throughput, "Waiting for query cache lock" in SHOW PROCESSLIST. This is documented in MariaDB MDEV-18764 and is why MySQL 8.0 removed the query cache entirely.
5.2 When to Enable vs Disable
Keep ON (< 50 agents, 4 or fewer cores, MariaDB 10.1/10.5):
query_cache_type = 1
query_cache_size = 64M # Never exceed 128M
query_cache_limit = 2M
Disable (100+ agents, 8+ cores, heavy outbound dialing, or MariaDB 10.6+):
query_cache_type = 0
query_cache_size = 0
Both must be 0. Setting only one still allocates memory or acquires the mutex. When both are 0, MariaDB skips the mutex entirely. Some of the largest ViciDial installations (700+ agents) run with query cache off.
5.3 Monitoring Query Cache Effectiveness
SHOW GLOBAL STATUS LIKE 'Qcache%';
Calculate cache hit rate:
SELECT ROUND(
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Qcache_hits') /
((SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Qcache_hits') +
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'Com_select')
) * 100, 2) AS query_cache_hit_pct;
- > 30%: Cache is providing value, keep it.
- 10-30%: Marginal. Test with cache off.
- < 10%: Wasting memory, disable it.
If Qcache_lowmem_prunes is growing rapidly, the cache is too small or your workload is too write-heavy for effective caching.
5.6 Version-Specific Behavior
| MariaDB Version | Query Cache Status | Recommendation |
|---|---|---|
| 10.1.x | Fully supported, enabled by default | Use 64M for small, up to 128M for medium |
| 10.5.x | Fully supported, disabled by default | Enable only if measured benefit > 20% hit rate |
| 10.6.x | Fully supported, disabled by default | Same as 10.5; optimizer improvements reduce benefit |
| 10.11.x | Fully supported, disabled by default | Test with cache off first; enable only if hit rate justifies it |
| 11.x+ | Still available (unlike MySQL 8.0 which removed it) | Default to off; MariaDB may deprecate in future |
MySQL 8.0+ note: If you are running MySQL 8.0 instead of MariaDB, the query cache is completely removed. The query_cache_* variables do not exist. No action needed.
6. Table Maintenance
MyISAM tables require periodic maintenance to stay healthy. Unlike InnoDB (which handles most maintenance internally), MyISAM tables fragment over time, develop inconsistencies from unclean shutdowns, and accumulate dead space from deleted rows.
6.1 The Built-In Optimizer: AST_DB_optimize.pl
ViciDial ships with a daily optimization script. This should be running via cron on one server only in a multi-server cluster:
# ViciDial default -- runs daily at 1:03 AM
3 1 * * * /usr/share/astguiclient/AST_DB_optimize.pl
This script runs OPTIMIZE TABLE on the key ViciDial tables. It reclaims space from deleted rows and rebuilds indexes for optimal performance.
Verify it is running:
# Check if it is in crontab
crontab -l | grep optimize
# Check last execution in logs
grep -i optimize /var/log/astguiclient/process_log 2>/dev/null | tail -5
Critical warning: OPTIMIZE TABLE on MyISAM locks the table for the entire operation. On large tables (10M+ rows), this can take minutes to hours. Never run it during active calling hours.
6.2 Manual Optimization
For tables that need immediate attention:
-- Check table integrity first
CHECK TABLE vicidial_log;
CHECK TABLE vicidial_closer_log;
CHECK TABLE call_log;
CHECK TABLE vicidial_agent_log;
-- Optimize (defragment + rebuild indexes)
-- WARNING: This locks the table. Run off-hours only.
OPTIMIZE TABLE vicidial_log;
OPTIMIZE TABLE vicidial_closer_log;
6.3 Table Repair
From SQL (preferred):
REPAIR TABLE vicidial_log QUICK; -- Index-only, try first
REPAIR TABLE vicidial_log EXTENDED; -- If QUICK fails
REPAIR TABLE vicidial_log USE_FRM; -- When .MYI is destroyed
From command line (when MySQL cannot start):
systemctl stop mariadb
cd /var/lib/mysql/asterisk/
myisamchk -r -q vicidial_log.MYI # Quick recovery (try first)
myisamchk -r vicidial_log.MYI # Full recovery
myisamchk --safe-recover vicidial_log.MYI # Slowest, handles worst corruption
# Speed up with: --sort_buffer_size=1G --key_buffer_size=1G
systemctl start mariadb
6.4 Batch Repair and Optimization with mysqlcheck
# Check all tables in the asterisk database
mysqlcheck -u root -p --check asterisk
# Auto-repair any corrupted tables found
mysqlcheck -u root -p --auto-repair asterisk
# Optimize all tables (locks each one sequentially)
mysqlcheck -u root -p --optimize asterisk
# Combined: check, repair if needed, then optimize
mysqlcheck -u root -p --auto-repair --optimize asterisk
6.5 Detecting Table Fragmentation
-- Show fragmented space per table
SELECT
table_name,
engine,
ROUND(data_length / 1024 / 1024, 1) AS data_mb,
ROUND(data_free / 1024 / 1024, 1) AS fragmented_mb,
ROUND(data_free / data_length * 100, 1) AS frag_pct
FROM information_schema.tables
WHERE table_schema = 'asterisk'
AND data_free > 0
AND engine = 'MyISAM'
ORDER BY data_free DESC
LIMIT 15;
Tables with frag_pct > 20% benefit from OPTIMIZE TABLE. Below 10%, the overhead of optimization is not justified.
6.6 Maintenance Schedule
| Task | Frequency | When to Run | Command |
|---|---|---|---|
| AST_DB_optimize.pl | Daily | 1-3 AM (low traffic) | Cron (already configured) |
| CHECK TABLE on hot tables | Weekly | Sunday 2 AM | Custom cron script |
| OPTIMIZE TABLE on log tables | Monthly | First Sunday 3 AM | Custom cron script |
| mysqlcheck --auto-repair | After crash/power loss | Immediately | Manual |
| Full mysqlcheck --optimize | Quarterly | Planned maintenance window | Manual with announcement |
| Slow query log review | Weekly | Business hours | pt-query-digest |
Weekly check script (/usr/local/bin/vicidial-weekly-check.sh, cron: 0 2 * * 0):
#!/bin/bash
TABLES="vicidial_log vicidial_closer_log call_log vicidial_agent_log vicidial_carrier_log recording_log"
for T in $TABLES; do
STATUS=$(mysql -u root -N -e "CHECK TABLE asterisk.$T QUICK;" | awk '{print $NF}')
[ "$STATUS" != "OK" ] && mysql -u root -e "REPAIR TABLE asterisk.$T QUICK;"
done
7. Archiving Old Data
This is often the single biggest performance improvement you can make. A ViciDial system that has been running for two years without archiving might have 50-100 million rows in its log tables. Every report query, every index lookup, every table scan gets slower as these tables grow. Archiving moves old data to _archive tables, keeping the active tables lean.
7.1 Which Tables Grow Unbounded
These tables accumulate data indefinitely unless archived:
| Table | Growth Rate | 1-Year Size (50 agents) |
|---|---|---|
vicidial_log |
~1,000-5,000 rows/day | 10-25 GB |
vicidial_closer_log |
~500-2,000 rows/day | 5-15 GB |
call_log |
~2,000-10,000 rows/day | 15-40 GB |
vicidial_agent_log |
~5,000-20,000 rows/day | 20-50 GB |
vicidial_carrier_log |
~1,000-5,000 rows/day | 10-25 GB |
vicidial_log_extended |
~1,000-5,000 rows/day | 5-15 GB |
vicidial_dial_log |
~2,000-10,000 rows/day | 10-30 GB |
vicidial_drop_log |
~100-500 rows/day | 1-5 GB |
server_performance |
~1,440 rows/day (1/min) | 2-5 GB |
Without archiving, after two years these tables can total 100-300 GB of data, with indexes adding another 30-50%.
7.2 ADMIN_archive_log_tables.pl
ViciDial includes a built-in archiving script that moves old records from active tables into corresponding _archive tables (e.g., vicidial_log -> vicidial_log_archive) and then deletes them from the active table.
Location: /usr/share/astguiclient/ADMIN_archive_log_tables.pl
Key options:
--days=XX Number of days to keep in active tables (default: 732 / 2 years)
--months=XX Number of months to keep (alternative to --days)
--daily Archive only call_log, vicidial_log_extended, vicidial_dial_log,
vicidial_drop_log -- keeps only last 24 hours
--wipe-all Delete ALL records instead of archiving (DANGEROUS)
--skip-optimize Skip the OPTIMIZE TABLE step after archiving
--skip-purge Archive but do not delete from active tables
Recommended cron configurations:
# OPTION A: Monthly archiving, keep 6 months of data
# Run on the 1st of each month at 1:30 AM
30 1 1 * * /usr/share/astguiclient/ADMIN_archive_log_tables.pl --months=6
# OPTION B: Daily archiving, keep 90 days (recommended for busy systems)
# Run nightly at 1:20 AM -- processes less data per run, finishes faster
20 1 * * * /usr/share/astguiclient/ADMIN_archive_log_tables.pl --days=90
# OPTION C: Split daily + monthly strategy
# Daily: archive high-volume tables aggressively (24-hour retention)
20 1 * * * /usr/share/astguiclient/ADMIN_archive_log_tables.pl --daily
# Monthly: archive all other tables with 180-day retention
30 1 1 * * /usr/share/astguiclient/ADMIN_archive_log_tables.pl --days=180
Critical warnings:
- The script locks tables during archiving. On large tables (10M+ rows) being archived for the first time, this can take hours. During this time, the dialer is effectively unusable. Plan the first run during a maintenance window.
- Schedule it when the system is idle. For 24-hour call centers, the lowest-traffic window is typically 1-4 AM.
- The first run is the slowest. If you have 2 years of data and set --days=90, the first run must process ~21 months of data. Subsequent daily runs process only one day of data and finish in minutes.
7.3 Retention Policy Recommendations
| Environment | Retention | Rationale |
|---|---|---|
| Small (< 20 agents) | 180 days | Low data volume, 6 months covers most audit needs |
| Medium (20-50 agents) | 90 days | Balance between history and performance |
| Large (50-100 agents) | 60 days | Performance-critical, archive tables still have history |
| Very Large (100+ agents) | 30 days | Maximum performance, use replica for historical reports |
The _archive tables contain all the historical data. Reports can be modified to query archive tables when longer history is needed. The key insight is that 95% of report queries look at the last 7-30 days. There is no reason to scan through years of data for every dashboard refresh.
7.4 Manual Archiving (When the Script Is Not Enough)
For very large tables or specific archive requirements:
-- Step 1: Create the archive table if it does not exist
CREATE TABLE IF NOT EXISTS vicidial_log_archive LIKE vicidial_log;
-- Step 2: Copy old records to archive (in chunks to avoid long locks)
-- Archive records older than 90 days, 100,000 rows at a time
INSERT INTO vicidial_log_archive
SELECT * FROM vicidial_log
WHERE call_date < DATE_SUB(NOW(), INTERVAL 90 DAY)
LIMIT 100000;
-- Step 3: Delete the archived records
DELETE FROM vicidial_log
WHERE call_date < DATE_SUB(NOW(), INTERVAL 90 DAY)
LIMIT 100000;
-- Step 4: Repeat steps 2-3 until no more rows match
-- Check how many remain:
SELECT COUNT(*) FROM vicidial_log
WHERE call_date < DATE_SUB(NOW(), INTERVAL 90 DAY);
-- Step 5: Optimize after all deletes are done
OPTIMIZE TABLE vicidial_log;
Chunked archiving script (for initial large jobs):
#!/bin/bash
# Archives 100K rows per iteration to avoid long table locks
DB="asterisk"; TABLE="vicidial_log"; ARCHIVE="${TABLE}_archive"; DAYS=90; CHUNK=100000
mysql -u root "$DB" -e "CREATE TABLE IF NOT EXISTS $ARCHIVE LIKE $TABLE;"
while true; do
REMAINING=$(mysql -u root -N "$DB" -e \
"SELECT COUNT(*) FROM $TABLE WHERE call_date < DATE_SUB(NOW(), INTERVAL $DAYS DAY);")
echo "$(date): $REMAINING rows remaining"
[ "$REMAINING" -eq 0 ] && break
mysql -u root "$DB" -e \
"INSERT INTO $ARCHIVE SELECT * FROM $TABLE WHERE call_date < DATE_SUB(NOW(), INTERVAL $DAYS DAY) LIMIT $CHUNK;"
mysql -u root "$DB" -e \
"DELETE FROM $TABLE WHERE call_date < DATE_SUB(NOW(), INTERVAL $DAYS DAY) LIMIT $CHUNK;"
sleep 2 # Let other queries through
done
mysql -u root "$DB" -e "OPTIMIZE TABLE $TABLE;"
7.5 Purging Archive Tables
Archive tables themselves eventually need pruning:
-- Delete archive data older than 2 years
DELETE FROM vicidial_log_archive
WHERE call_date < DATE_SUB(NOW(), INTERVAL 2 YEAR)
LIMIT 500000;
-- Repeat until 0 rows affected, then:
OPTIMIZE TABLE vicidial_log_archive;
7.6 The server_performance Table
This table is often overlooked. It stores server stats every minute and is never archived by the standard script. After a year, it has 500,000+ rows:
-- Check size
SELECT COUNT(*) FROM server_performance;
-- Purge old entries (keep 30 days)
DELETE FROM server_performance
WHERE start_time < DATE_SUB(NOW(), INTERVAL 30 DAY)
LIMIT 100000;
-- Repeat until done, then:
OPTIMIZE TABLE server_performance;
8. Replication for Read Offloading
Setting up a read replica is the most architecturally sound way to improve ViciDial database performance. It separates the write-heavy dialer workload from read-heavy reporting queries, eliminating the primary source of lock contention.
8.1 Architecture: Master/Slave Only
ViciDial only supports master/slave replication. Master/Master fails because MyISAM has no transaction support (conflict resolution is impossible), auto-increment IDs collide, and the dialer's non-deterministic writes conflict. All dialer processes connect to the master; reports and dashboards connect to the slave.
8.2 Master Configuration
Add to my.cnf on the master (production ViciDial server):
[mysqld]
server_id = 1 # Unique per server in topology
log_bin = /var/lib/mysql/mysql-bin
binlog_format = MIXED # STATEMENT fails on non-deterministic queries; ROW is too large
expire_logs_days = 7 # Auto-purge old binlogs
max_binlog_size = 512M
sync_binlog = 0 # OS flush (faster; MyISAM has no txn guarantees anyway)
binlog_do_db = asterisk # Only replicate the asterisk database
Create the replication user:
-- On the master
CREATE USER 'repl_user'@'REPLICA_IP' IDENTIFIED BY 'StrongPassword123!';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'REPLICA_IP';
FLUSH PRIVILEGES;
Get the initial replication position:
-- On the master
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
Note the File and Position values. Keep this session open (do not close it until the dump is complete).
Take the initial dump (in a second terminal):
# Full dump of the asterisk database
mysqldump -u root -p --single-transaction --master-data=2 \
--databases asterisk > /tmp/master-dump.sql
# If using MyISAM tables (which ViciDial does), use --lock-all-tables instead:
mysqldump -u root -p --lock-all-tables --master-data=2 \
--databases asterisk > /tmp/master-dump.sql
Release the lock (in the first terminal):
UNLOCK TABLES;
Transfer the dump to the replica:
scp -P 9322 /tmp/master-dump.sql replica_server:/tmp/
8.3 Replica Configuration
Add to my.cnf on the replica:
[mysqld]
server_id = 10
read_only = 1 # Prevent accidental writes
relay_log = /var/lib/mysql/relay-bin
relay_log_purge = 1
slave_skip_errors = 1032,1062,1690 # 1032=row not found, 1062=dup key, 1690=range
# Normal with MyISAM's lack of txn consistency
log_slave_updates = 0 # Save disk; enable only if chaining replicas
# Replica is read-optimized
key_buffer_size = 4G
query_cache_type = 1
query_cache_size = 128M # More effective on replica (lower invalidation rate)
Load the initial dump:
mysql -u root -p < /tmp/master-dump.sql
Configure and start replication:
-- On the replica
CHANGE MASTER TO
MASTER_HOST = 'MASTER_IP',
MASTER_PORT = 3306,
MASTER_USER = 'repl_user',
MASTER_PASSWORD = 'StrongPassword123!',
MASTER_LOG_FILE = 'mysql-bin.000042', -- From SHOW MASTER STATUS
MASTER_LOG_POS = 15423876; -- From SHOW MASTER STATUS
START SLAVE;
-- Verify
SHOW SLAVE STATUS\G
Check these fields in SHOW SLAVE STATUS:
Slave_IO_Running: Yes -- Must be Yes
Slave_SQL_Running: Yes -- Must be Yes
Seconds_Behind_Master: 0 -- Should be 0 or very low
Last_IO_Error: -- Must be empty
Last_SQL_Error: -- Must be empty
8.4 Pointing Reports to the Replica
ViciDial has a built-in setting: Admin -> System Settings -> Report Database Server. Set it to the replica IP. For Grafana and custom reports, point them at the replica directly.
8.5 Handling Replica Lag
SHOW SLAVE STATUS\G -- Check Seconds_Behind_Master
- 0-5 seconds: Normal.
- 5-30 seconds: Acceptable for reports.
- 30+ seconds: Investigate (slow replica hardware, competing report queries, network latency).
Enable parallel replication (MariaDB 10.0+):
slave_parallel_threads = 4 # Match available CPU cores
slave_parallel_mode = optimistic
8.6 Multi-Source Replication
Multiple ViciDial servers can replicate to one reporting replica using MariaDB's multi-source replication (10.0+). Each connection uses replicate_rewrite_db to map each server's asterisk database to a unique name (e.g., alpha_asterisk). This enables centralized cross-site reporting from a single server.
9. Partitioning Large Tables
Table partitioning divides a single table into multiple physical segments based on a partition key (typically a date column). Each partition is stored as a separate file on disk. This provides two major benefits for ViciDial:
- Faster queries: When a query includes the partition key (e.g.,
WHERE call_date BETWEEN ...), MariaDB only scans the relevant partitions ("partition pruning"), skipping all others. - Faster archiving: Dropping a partition is nearly instantaneous, compared to
DELETEwhich must remove rows one by one and update indexes.
9.1 When to Partition
Partitioning is worth the complexity for tables that:
- Have more than 50 million rows
- Are queried primarily by date range
- Need fast archiving/purging of old data
For tables under 10 million rows, proper indexing (Section 4) and archiving (Section 7) are sufficient.
9.2 Partitioning vicidial_log by Month
-- Step 1: Verify table engine and current structure
SHOW CREATE TABLE vicidial_log\G
-- Step 2: Create a new partitioned table
-- NOTE: The partition key must be part of the primary key or unique key.
-- This often requires altering the primary key.
-- First, check the current primary key:
SHOW INDEX FROM vicidial_log WHERE Key_name = 'PRIMARY';
-- If the primary key does not include call_date, we need to modify it.
-- Example: if primary key is (uniqueid), we change to (uniqueid, call_date):
ALTER TABLE vicidial_log
DROP PRIMARY KEY,
ADD PRIMARY KEY (uniqueid, call_date);
-- Step 3: Add partitions by month using RANGE on UNIX_TIMESTAMP
ALTER TABLE vicidial_log
PARTITION BY RANGE (TO_DAYS(call_date)) (
PARTITION p2026_01 VALUES LESS THAN (TO_DAYS('2026-02-01')),
PARTITION p2026_02 VALUES LESS THAN (TO_DAYS('2026-03-01')),
PARTITION p2026_03 VALUES LESS THAN (TO_DAYS('2026-04-01')),
PARTITION p2026_04 VALUES LESS THAN (TO_DAYS('2026-05-01')),
PARTITION p2026_05 VALUES LESS THAN (TO_DAYS('2026-06-01')),
PARTITION p2026_06 VALUES LESS THAN (TO_DAYS('2026-07-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
9.3 Partitioning call_log
-- call_log typically uses start_time as the date column
ALTER TABLE call_log
PARTITION BY RANGE (TO_DAYS(start_time)) (
PARTITION p2026_01 VALUES LESS THAN (TO_DAYS('2026-02-01')),
PARTITION p2026_02 VALUES LESS THAN (TO_DAYS('2026-03-01')),
PARTITION p2026_03 VALUES LESS THAN (TO_DAYS('2026-04-01')),
PARTITION p2026_04 VALUES LESS THAN (TO_DAYS('2026-05-01')),
PARTITION p2026_05 VALUES LESS THAN (TO_DAYS('2026-06-01')),
PARTITION p2026_06 VALUES LESS THAN (TO_DAYS('2026-07-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
9.4 Fast Archiving with DROP PARTITION
Instead of DELETE FROM vicidial_log WHERE call_date < '2026-01-01' (which scans and deletes millions of rows):
-- Near-instantaneous removal of an entire month of data
ALTER TABLE vicidial_log DROP PARTITION p2025_12;
This is orders of magnitude faster than DELETE because it simply removes the partition's data files from disk without scanning rows.
9.5 Adding New Partitions
Add new partitions before p_future fills up:
ALTER TABLE vicidial_log
REORGANIZE PARTITION p_future INTO (
PARTITION p2026_07 VALUES LESS THAN (TO_DAYS('2026-08-01')),
PARTITION p2026_08 VALUES LESS THAN (TO_DAYS('2026-09-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
Automate with a monthly cron job that calculates 3 months ahead and runs REORGANIZE PARTITION.
9.6 Verifying Partition Pruning
-- Check that queries only scan relevant partitions
EXPLAIN PARTITIONS
SELECT COUNT(*) FROM vicidial_log
WHERE call_date BETWEEN '2026-03-01' AND '2026-03-31';
The partitions column should show only p2026_03, not all partitions.
9.7 Partitioning Caveats
- Primary key must include the partition key. This can break ViciDial's assumption of unique primary keys if the original PK does not include the date column. Test thoroughly.
- MyISAM partitioned tables lock all partitions before pruning on MariaDB versions before 10.5. Upgrade MariaDB if you are on an older version.
- Replication compatibility. If the master is partitioned, the slave should also be partitioned the same way. Or, replicate with
binlog_format = ROWto avoid partition-dependent SQL statements. - Backup complexity. Partitioned tables produce multiple .MYD and .MYI files per partition. Ensure your backup scripts handle this.
- Test on a non-production server first. Partitioning a live ViciDial table wrong can break the dialer.
10. The sip_packets Table Problem
The sip_packets table is one of the most common causes of unexpected disk usage on ViciDial servers. It stores raw SIP packet captures and can grow to tens of gigabytes without anyone noticing until the disk is full.
10.1 Checking the Size
SELECT
table_name,
ROUND(data_length / 1024 / 1024, 1) AS data_mb,
ROUND(index_length / 1024 / 1024, 1) AS index_mb,
ROUND((data_length + index_length) / 1024 / 1024, 1) AS total_mb,
table_rows
FROM information_schema.tables
WHERE table_schema = 'asterisk'
AND table_name = 'sip_packets';
If this returns several gigabytes, you have the problem.
10.2 What Fills It
The AST_sip_event_logging.pl process captures SIP messages and inserts them into sip_packets. On a busy server (1,000+ calls/day), this creates several hundred thousand rows per day at 1-3 KB each.
ps aux | grep -i sip_event # Check if running
crontab -l | grep -i sip_event # Check cron
systemctl status sip_event_logging # Check systemd service
10.3 Safe Truncation
If you do not need SIP capture data (most installations do not), truncate the table:
-- Check size first
SELECT COUNT(*) FROM sip_packets;
-- TRUNCATE is much faster than DELETE -- it drops and recreates the table
TRUNCATE TABLE sip_packets;
TRUNCATE is nearly instantaneous regardless of table size, compared to DELETE FROM sip_packets which would scan and delete every row.
10.4 Disabling SIP Capture
crontab -e # Remove sip_event_logging lines
systemctl stop sip_event_logging # Stop service
systemctl disable sip_event_logging # Prevent restart
Also check Admin -> System Settings -> SIP Event Logging and set to disabled.
10.5 If You Need SIP Capture
Keep it running but add a purge cron to retain only 3 days:
0 2 * * * mysql -u root asterisk -e "DELETE FROM sip_packets WHERE sip_call_date < DATE_SUB(NOW(), INTERVAL 3 DAY) LIMIT 500000;"
For longer retention, send SIP captures to a dedicated Homer server instead.
11. Monitoring Database Health
Ongoing monitoring catches problems before they become outages. A slow query that takes 5 seconds today will take 50 seconds in 6 months as the table grows.
11.1 Prometheus mysqld_exporter Setup
The mysqld_exporter from Prometheus provides comprehensive MySQL/MariaDB metrics.
Install:
# Download latest release
cd /opt
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.16.0/mysqld_exporter-0.16.0.linux-amd64.tar.gz
tar xzf mysqld_exporter-0.16.0.linux-amd64.tar.gz
mv mysqld_exporter-0.16.0.linux-amd64 mysqld_exporter
Create MySQL user for the exporter:
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'ExporterPass123!';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
Create credentials file:
cat > /opt/mysqld_exporter/.my.cnf << 'EOF'
[client]
user=exporter
password=ExporterPass123!
EOF
chmod 600 /opt/mysqld_exporter/.my.cnf
Systemd service (/etc/systemd/system/mysqld-exporter.service):
[Unit]
Description=Prometheus MySQL Exporter
After=network.target mariadb.service
[Service]
User=prometheus
ExecStart=/opt/mysqld_exporter/mysqld_exporter \
--config.my-cnf=/opt/mysqld_exporter/.my.cnf \
--web.listen-address=:9104 \
--collect.info_schema.tables --collect.info_schema.processlist \
--collect.global_status --collect.global_variables
Restart=always
[Install]
WantedBy=multi-user.target
11.2 Key Metrics to Monitor
These are the most important metrics from mysqld_exporter for ViciDial:
| Metric | Meaning | Alert Threshold |
|---|---|---|
mysql_up |
Database reachable (1=yes, 0=no) | Alert if 0 for > 30 seconds |
mysql_global_status_threads_connected |
Current open connections | Alert if > 80% of max_connections |
mysql_global_status_slow_queries |
Cumulative slow query count | Alert if rate > 1/minute |
mysql_global_status_table_locks_waited |
Lock waits (MyISAM contention) | Alert if rate > 10/minute |
mysql_global_status_key_reads |
Key cache misses (disk reads) | Alert if miss ratio > 1% |
mysql_global_status_created_tmp_disk_tables |
Temp tables on disk | Alert if > 25% of total |
mysql_global_status_questions |
Total queries/second | Baseline monitoring |
mysql_global_status_bytes_received |
Network traffic in | Capacity planning |
mysql_global_status_bytes_sent |
Network traffic out | Capacity planning |
mysql_slave_status_seconds_behind_master |
Replication lag (on replica) | Alert if > 60 seconds |
mysql_slave_status_slave_io_running |
Replication IO thread (on replica) | Alert if 0 |
mysql_slave_status_slave_sql_running |
Replication SQL thread (on replica) | Alert if 0 |
11.3 Grafana Dashboard
Import Grafana dashboard ID 7362 or 14057 for MySQL/MariaDB. Key custom PromQL panels for ViciDial:
- Lock ratio:
rate(mysql_global_status_table_locks_waited[5m]) / rate(mysql_global_status_table_locks_immediate[5m]) * 100 - Key cache miss:
rate(mysql_global_status_key_reads[5m]) / rate(mysql_global_status_key_read_requests[5m]) * 100 - Connection %:
mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100 - Slow/min:
rate(mysql_global_status_slow_queries[5m]) * 60
11.4 Simple Shell-Based Monitoring
If Prometheus is not available, a cron-based check (/usr/local/bin/mysql-health-check.sh, run every 5 min):
#!/bin/bash
CONN=$(mysql -u root -N -e "SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME='Threads_connected';")
[ "$CONN" -gt 400 ] && echo "High connections: $CONN" | mail -s "[DB ALERT] $(hostname)" [email protected]
LOCKED=$(mysql -u root -N -e "SELECT COUNT(*) FROM information_schema.PROCESSLIST WHERE STATE LIKE '%lock%';")
[ "$LOCKED" -gt 10 ] && mysql -u root -e "SHOW FULL PROCESSLIST;" >> /var/log/mysql/lock-events.log
DISK=$(df /var/lib/mysql | tail -1 | awk '{print $5}' | tr -d '%')
[ "$DISK" -gt 85 ] && echo "Disk at ${DISK}%" | mail -s "[DB ALERT] $(hostname)" [email protected]
11.5 mysqltuner.pl
The mysqltuner.pl tool provides a comprehensive one-time analysis with actionable recommendations:
# Download and run
wget https://raw.githubusercontent.com/major/MySQLTuner-perl/master/mysqltuner.pl
perl mysqltuner.pl --user root --pass YOUR_PASSWORD
# ViciDial includes a similar tool:
# The mysql-tuning.sh in the extras directory
chmod +x /usr/share/astguiclient/extras/mysql-tuning.sh
/usr/share/astguiclient/extras/mysql-tuning.sh
Run mysqltuner.pl after the server has been running for at least 24 hours under normal load. Results on a freshly started server are meaningless.
12. Emergency Procedures
When the database is down or critically degraded, every minute costs calls. These procedures are for emergency stabilization, not optimal long-term configuration.
12.1 MariaDB Will Not Start
Check the error log first:
tail -50 /var/log/mysql/error.log # or /var/log/mysqld.log
Common failures:
| Error | Fix |
|---|---|
| "Table is marked as crashed" | systemctl stop mariadb && cd /var/lib/mysql/asterisk/ && myisamchk -r TABLE.MYI && systemctl start mariadb |
| "Unable to lock ./ibdata1" | Another mysqld running: pkill -9 mysqld && sleep 2 && systemctl start mariadb |
| "Can't open file TABLE.MYI" | If corrupt: myisamchk -r TABLE.MYI. If missing: REPAIR TABLE asterisk.TABLE USE_FRM; |
InnoDB corruption:
# Add temporarily to my.cnf: innodb_force_recovery = 1
# Start MariaDB, immediately dump: mysqldump --all-databases > /tmp/emergency.sql
# Remove the setting, restore from dump
# Try levels 1-3 first. Levels 4-6 are DANGEROUS (permanent corruption risk).
12.2 Disk Full Emergency
When the disk fills, MariaDB stops accepting writes, the dialer freezes, and agents disconnect.
Immediate space recovery (in priority order):
du -sh /var/lib/mysql/asterisk/* | sort -rh | head -20 # 1. Find what's using space
mysql -u root -e "TRUNCATE TABLE asterisk.sip_packets;" # 2. Often the biggest win
mysql -u root -e "PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 2 DAY);" # 3. Binlogs
> /var/log/mysql/slow-query.log # 4. Slow query log
mysql -u root -e "DELETE FROM asterisk.server_performance WHERE start_time < DATE_SUB(NOW(), INTERVAL 7 DAY) LIMIT 500000;"
WARNING: Before purging binary logs, verify replica has processed them: SHOW SLAVE STATUS\G on replica, check Relay_Master_Log_File.
Prevent recurrence: Add expire_logs_days = 3 to my.cnf.
12.3 Replication Broken
Symptoms: Reports showing stale data, Slave_IO_Running: No or Slave_SQL_Running: No.
-- On the replica, check status
SHOW SLAVE STATUS\G
IO thread stopped: Check Last_IO_Error in SHOW SLAVE STATUS. Common causes: master moved past needed binlog (re-sync needed), authentication error (verify repl user on master).
SQL thread stopped: Check Last_SQL_Error. Skip a single error:
STOP SLAVE; SET GLOBAL sql_slave_skip_counter = 1; START SLAVE;
If errors persist, full resync is safer: STOP SLAVE; RESET SLAVE ALL; then re-dump (Section 8.2/8.3).
12.4 Killing Runaway Queries
When a report query has been running for 10 minutes and is locking tables:
-- Find the offending query
SHOW FULL PROCESSLIST;
-- Kill it by thread ID
KILL 12345;
-- If KILL does not work (rare):
KILL QUERY 12345; -- Kills only the query, not the connection
Prevent future occurrences:
-- Set a query timeout (MariaDB 10.1+)
SET GLOBAL max_statement_time = 300; -- 5-minute timeout
-- Add to my.cnf to persist
-- max_statement_time = 300
12.5 Emergency Bulk Repair
When multiple tables crash after power failure, stop MariaDB and repair all MyISAM tables:
systemctl stop mariadb
cd /var/lib/mysql/asterisk/
for MYI in *.MYI; do
myisamchk -c "$MYI" || myisamchk -r --sort_buffer_size=512M "$MYI" || \
myisamchk --safe-recover --sort_buffer_size=512M "$MYI"
done
systemctl start mariadb
13. Performance Benchmarking
Measuring performance before and after changes is essential. Without measurements, you cannot tell whether a change helped, hurt, or did nothing.
13.1 Baseline Measurements
Before making any changes, capture baseline metrics:
-- Current query performance
SHOW GLOBAL STATUS LIKE 'Questions';
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW GLOBAL STATUS LIKE 'Table_locks_waited';
SHOW GLOBAL STATUS LIKE 'Key_read%';
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
-- Record these values and the current timestamp
-- Check again after 1 hour
-- Calculate rates: (value_after - value_before) / seconds_elapsed
Record these values and check again after 1 hour to calculate rates.
13.2 Benchmarking Specific ViciDial Queries
These are the actual query patterns that matter for ViciDial performance. Time them before and after changes.
Real-time agent screen query (runs every second):
-- Time the agent screen lookup
SET @start = NOW(6);
SELECT * FROM vicidial_live_agents WHERE user = 'agent001';
SELECT TIMESTAMPDIFF(MICROSECOND, @start, NOW(6)) / 1000 AS ms;
-- Target: < 5 ms
Inbound report query (common slow query):
-- Time a typical inbound report
SET @start = NOW(6);
SELECT
campaign_id, status, COUNT(*) as calls,
AVG(length_in_sec) as avg_length
FROM vicidial_closer_log
WHERE call_date BETWEEN '2026-03-01' AND '2026-03-13'
GROUP BY campaign_id, status;
SELECT TIMESTAMPDIFF(MICROSECOND, @start, NOW(6)) / 1000 AS ms;
-- Target: < 2000 ms (2 seconds) for 30-day range
Agent performance report:
SET @start = NOW(6);
SELECT
user,
SUM(talk_sec) as total_talk,
SUM(pause_sec) as total_pause,
SUM(wait_sec) as total_wait,
COUNT(*) as events
FROM vicidial_agent_log
WHERE event_time BETWEEN '2026-03-01' AND '2026-03-13'
GROUP BY user;
SELECT TIMESTAMPDIFF(MICROSECOND, @start, NOW(6)) / 1000 AS ms;
-- Target: < 3000 ms for 30-day range
Call log lookup by uniqueid (join pattern):
SET @start = NOW(6);
SELECT cl.*, vcl.status, vcl.user
FROM call_log cl
LEFT JOIN vicidial_closer_log vcl ON cl.uniqueid = vcl.uniqueid
WHERE cl.uniqueid = '1741234567.12345';
SELECT TIMESTAMPDIFF(MICROSECOND, @start, NOW(6)) / 1000 AS ms;
-- Target: < 50 ms
13.3 Synthetic Benchmarking with mysqlslap
# Concurrency test: 50 clients, 200 queries each
mysqlslap --user=root --password=PASS \
--concurrency=50 --iterations=3 --number-of-queries=200 \
--create-schema=test_benchmark --auto-generate-sql
# Simulate ViciDial read pattern
mysqlslap --user=root --password=PASS \
--concurrency=100 --iterations=5 \
--query="SELECT * FROM asterisk.vicidial_live_agents LIMIT 50;" \
--number-of-queries=1000
Compare "Average number of seconds" before and after changes. Lower is better.
13.4 Before/After Methodology
- Capture baseline during normal business hours
- Run query benchmarks 3 times, record median
- Make one change only
- Wait 2+ hours for caches to warm
- Capture new metrics at same time of day, similar load
- Run same benchmarks, compare
Do not make multiple changes at once, benchmark on cold caches, or compare different-load days.
13.5 Quick Performance Health Summary
Run this single query to get an overview of database health at any point:
SELECT
-- Key buffer efficiency
ROUND((1 - (
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Key_reads') /
GREATEST((SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Key_read_requests'), 1)
)) * 100, 2) AS key_cache_hit_pct,
-- Lock contention
ROUND(
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Table_locks_waited') /
GREATEST((SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Table_locks_immediate'), 1)
* 100, 4
) AS lock_wait_pct,
-- Temp tables on disk
ROUND(
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Created_tmp_disk_tables') /
GREATEST((SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Created_tmp_tables'), 1)
* 100, 2
) AS tmp_disk_pct,
-- Connection usage
ROUND(
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Threads_connected') /
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_VARIABLES WHERE VARIABLE_NAME = 'max_connections')
* 100, 2
) AS connection_used_pct,
-- Slow queries per hour (approximate)
ROUND(
(SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Slow_queries') /
GREATEST((SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME = 'Uptime') / 3600, 1)
, 2) AS slow_per_hour;
Target values:
| Metric | Good | Warning | Critical |
|---|---|---|---|
| key_cache_hit_pct | > 99.9% | 99-99.9% | < 99% |
| lock_wait_pct | < 0.1% | 0.1-1% | > 1% |
| tmp_disk_pct | < 10% | 10-25% | > 25% |
| connection_used_pct | < 50% | 50-80% | > 80% |
| slow_per_hour | < 1 | 1-10 | > 10 |
Quick Reference: Priority Order of Changes
If you are starting fresh and need to prioritize, implement these changes in order of impact:
| Priority | Change | Expected Impact | Risk | Section |
|---|---|---|---|---|
| 1 | Archive old data (--days=90) | 50-80% improvement on report queries | Low (data preserved in _archive tables) | 7 |
| 2 | Tune key_buffer_size | 20-40% improvement on index lookups | Very low (restart required) | 3 |
| 3 | Add indexes on log tables | 10x improvement on specific report queries | Low (slightly slower INSERTs) | 4 |
| 4 | Truncate sip_packets | Recover gigabytes of disk space | Very low (SIP capture data is rarely needed) | 10 |
| 5 | Set up read replica for reports | Eliminates report lock contention entirely | Medium (operational complexity) | 8 |
| 6 | Tune query cache (or disable) | 5-15% improvement on multi-core systems | Low (reversible) | 5 |
| 7 | Enable slow query log | No direct improvement, but identifies targets | Very low | 2 |
| 8 | Regular table maintenance | Prevents degradation over time | Low (must schedule off-hours) | 6 |
| 9 | Partition large tables | Faster archiving and query pruning | Medium (schema change, test first) | 9 |
| 10 | Set max_statement_time | Prevents runaway query lockups | Very low | 3 |
References
ViciDial Forum Threads: Optimizing MySQL | MySQL Clustering | Query Cache Tuning | my.cnf Recommendations | live_agent Table Locks | MEMORY Tables | Archive Recommendations | OPTIMIZE TABLE Hangs | Adding Slave DB
ViciDial Source: ADMIN_archive_log_tables.pl | mysql-tuning.sh | AST_DB_optimize.pl | ViciWiki DB Structure
MariaDB Documentation: Optimizing key_buffer_size | Query Cache | RANGE Partitioning | Setting Up Replication | InnoDB Recovery | REPAIR TABLE | Slow Query Log | MDEV-18764 Query Cache Contention
Tools: Prometheus mysqld_exporter | MySQLTuner-perl | Dialer.one MySQL Tuning Guide | Percona pt-query-digest
Tutorial verified against MariaDB 10.1, 10.5, 10.11, and ViciDial SVN trunk (2024-2026). Settings tested on production systems running 14 to 130+ concurrent agents across multiple servers.