MySQL MTS replication: hitting slave_pending_jobs_size_max

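Test steps:

Stop replication on the slave: stop slave;

On the master, create a large table with 4 million rows.

Start replication on the slave: start slave;

The slave's error log then prints the following continuously: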

2018-12-06T10:40:52.616289+08:00 4 [Note] Multi-threaded slave: Coordinator has waited 2431 times hitting slave_pending_jobs_size_max; current event size = 8207.
2018-12-06T10:40:52.647618+08:00 4 [Note] Multi-threaded slave: Coordinator has waited 2441 times hitting slave_pending_jobs_size_max; current event size = 8207.
2018-12-06T10:40:52.679589+08:00 4 [Note] Multi-threaded slave: Coordinator has waited 2451 times hitting slave_pending_jobs_size_max; current event size = 8207.
2018-12-06T10:40:52.711510+08:00 4 [Note] Multi-threaded slave: Coordinator has waited 2461 times hitting slave_pending_jobs_size_max; current event size = 8207.
2018-12-06T10:40:52.750250+08:00 4 [Note] Multi-threaded slave: Coordinator has waited 2471 times hitting slave_pending_jobs_size_max; current event size = 8207.
2018-12-06T10:40:52.785731+08:00 4 [Note] Multi-threaded slave: Coordinator has waited 2481 times hitting slave_pending_jobs_size_max; current event size = 8207.
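To gauge how quickly the coordinator is stalling, the wait counter in these notes can be turned into a rate. The following is a minimal sketch that assumes the log lines have exactly the format shown above; the names NOTE_RE, parse_notes and waits_per_second are made up for illustration and are not part of MySQL.

```python
import re
from datetime import datetime

# Matches the MTS coordinator note in the format shown in the log excerpt.
NOTE_RE = re.compile(
    r"^(?P<ts>\S+) \d+ \[Note\] Multi-threaded slave: Coordinator has waited "
    r"(?P<waits>\d+) times hitting slave_pending_jobs_size_max; "
    r"current event size = (?P<size>\d+)\.$"
)

def parse_notes(lines):
    """Return (timestamp, wait_count, event_size) for every matching line."""
    notes = []
    for line in lines:
        m = NOTE_RE.match(line.strip())
        if m:
            ts = datetime.fromisoformat(m.group("ts"))
            notes.append((ts, int(m.group("waits")), int(m.group("size"))))
    return notes

def waits_per_second(notes):
    """Average growth of the wait counter between the first and last note."""
    (t0, w0, _), (t1, w1, _) = notes[0], notes[-1]
    elapsed = (t1 - t0).total_seconds()
    return (w1 - w0) / elapsed if elapsed > 0 else 0.0

# First and last lines of the excerpt above.
sample = [
    "2018-12-06T10:40:52.616289+08:00 4 [Note] Multi-threaded slave: "
    "Coordinator has waited 2431 times hitting slave_pending_jobs_size_max; "
    "current event size = 8207.",
    "2018-12-06T10:40:52.785731+08:00 4 [Note] Multi-threaded slave: "
    "Coordinator has waited 2481 times hitting slave_pending_jobs_size_max; "
    "current event size = 8207.",
]
notes = parse_notes(sample)
print(f"{waits_per_second(notes):.0f} waits per second")
```

A rate that stays high while the slave catches up suggests the pending-jobs limit, not the workers, is the bottleneck.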

 

A search shows that this message comes in two forms.

The first:


Last_Error: Cannot schedule event Rows_query, relay-log name ./db-s18-relay-bin.000448, position 419156572 to Worker thread because its size 18483519 exceeds 16777216 of slave_pending_jobs_size_max.

The second:


[Note] Multi-threaded slave: Coordinator has waited 701 times hitting slave_pending_jobs_size_max; current event size = 8167.

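Bug report: https://bugs.mysql.com/bug.php?id=68462

Both messages point to the size of slave_pending_jobs_size_max as the likely cause. Its official default is 16M, and it can be changed dynamically.

Notes on the slave-pending-jobs-size-max parameter: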

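With multi-threaded replication, this is the maximum amount of memory that events pending in the worker queues may occupy. The default is 16M; if memory is plentiful or replication lag is large, it can be raised. Note that this value must be larger than the master's max_allowed_packet.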
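The sizing rule above comes down to simple arithmetic. Here is a small sketch of it, using the default of 16777216 bytes and the 16777216*8 value that this post sets later. The master's max_allowed_packet of 64 MiB is an assumed figure for illustration only, since the post does not state the real one, and the helper names are hypothetical.

```python
MIB = 1024 * 1024

def to_mib(n_bytes):
    """Convert a byte count to MiB for easier reading."""
    return n_bytes / MIB

def exceeds_master_packet(pending_jobs_size_max, master_max_allowed_packet):
    """True if the slave's limit is larger than the master's max_allowed_packet."""
    return pending_jobs_size_max > master_max_allowed_packet

default_limit = 16777216           # default slave_pending_jobs_size_max
raised_limit = 16777216 * 8        # value set later in this post
assumed_master_packet = 64 * MIB   # ASSUMPTION: not taken from the post

for limit in (default_limit, raised_limit):
    ok = exceeds_master_packet(limit, assumed_master_packet)
    print(f"{limit} bytes = {to_mib(limit):.0f} MiB, larger than master packet: {ok}")
```

Under that assumed packet size, the 16 MiB default would violate the rule while 128 MiB would satisfy it; with a different max_allowed_packet on the master the result changes accordingly.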
slave-pending-jobs-size-max comes into play in the following cases:


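1.- If the event size on its own exceeds the limit on pending jobs (the configured slave-pending-jobs-size-max), an "event too large" error is reported and the coordinator returns;


2.- If the event size plus the size of the jobs already pending exceeds slave-pending-jobs-size-max, the coordinator waits until the pending queue shrinks;


3.- If the current worker's queue is full, the coordinator also waits.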
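The three cases above can be sketched as a single decision function. This is a simplified model of the description in this post, not the server's actual source code: the function name, the Decision enum and the worker queue capacity of 16384 are all assumptions made for illustration.

```python
from enum import Enum

class Decision(Enum):
    ERROR_TOO_LARGE = "error: event exceeds slave_pending_jobs_size_max"
    WAIT_PENDING_SIZE = "wait: pending jobs would exceed the limit"
    WAIT_WORKER_QUEUE = "wait: worker queue is full"
    SCHEDULE = "schedule the event on the worker"

def coordinator_decision(event_size, pending_size, limit,
                         worker_queue_len, worker_queue_cap):
    """Model of what the coordinator does with one event (hypothetical)."""
    # Case 1: the event alone is bigger than the limit, so it can never fit.
    if event_size > limit:
        return Decision.ERROR_TOO_LARGE
    # Case 2: the event fits in principle, but not next to what is pending.
    if event_size + pending_size > limit:
        return Decision.WAIT_PENDING_SIZE
    # Case 3: the chosen worker has no free slot in its queue.
    if worker_queue_len >= worker_queue_cap:
        return Decision.WAIT_WORKER_QUEUE
    return Decision.SCHEDULE

LIMIT = 16777216   # default slave_pending_jobs_size_max
QUEUE_CAP = 16384  # ASSUMPTION: illustrative worker queue capacity

# Sizes taken from the two messages quoted earlier in this post.
print(coordinator_decision(18483519, 0, LIMIT, 0, QUEUE_CAP).value)
print(coordinator_decision(8207, 16777000, LIMIT, 0, QUEUE_CAP).value)
print(coordinator_decision(8207, 0, LIMIT, QUEUE_CAP, QUEUE_CAP).value)
print(coordinator_decision(8207, 0, LIMIT, 0, QUEUE_CAP).value)
```

In this model the first message (the Last_Error) corresponds to case 1, while the repeated "Coordinator has waited" note corresponds to case 2: an 8207-byte event is small, but the queue is already close to the 16M limit.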

Check that slave_pending_jobs_size_max is still at its default:

+-----------------------------+----------+
| Variable_name               | Value    |
+-----------------------------+----------+
| slave_pending_jobs_size_max | 16777216 |
+-----------------------------+----------+

Increase the parameter:

root@localhost:3306.sock [(none)]>set global slave_pending_jobs_size_max=16777216*8;
Query OK, 0 rows affected (0.02 sec)

root@localhost:3306.sock [(none)]>show variables like '%job%';
+-----------------------------+-----------+
| Variable_name               | Value     |
+-----------------------------+-----------+
| slave_pending_jobs_size_max | 134217728 |
+-----------------------------+-----------+
1 row in set (0.00 sec)

After re-running the test, the messages no longer appear.

 
WL#11348: Defaults: Increase Slave's Multi-Threaded Event Applier Buffer. It looks like the next MySQL 8.0 release will raise the default.