[MXS-1222] 200% CPU (2 threads) usage on BinLogRouter Created: 2017-04-17  Updated: 2020-08-25  Resolved: 2017-09-27

Status: Closed
Project: MariaDB MaxScale
Component/s: binlogrouter
Affects Version/s: 1.4.1
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Michaël de groot Assignee: Unassigned
Resolution: Incomplete Votes: 1
Labels: None


 Description   

Hi,

I have a MaxScale BinLogServer instance that occasionally runs at 200% CPU usage. Threads is configured to 2, so it is consuming as much CPU as possible.

This is the output of gdb:

[root@dbbck02 ~]# gdb -ex "set pagination 0" -ex "thread apply all bt" \
>  --batch -p 38662          
[New LWP 38663]
[New LWP 38665]
[New LWP 38666]
[New LWP 38667]
[New LWP 38668]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__memcpy_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:273
273     ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: No such file or directory.
Thread 6 (Thread 0x7f7af5462700 (LWP 38668)):
#0  0x00007f7b139b9e23 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f7b1458ad5d in poll_waitevents (arg=0x1) at /home/vagrant/workspace/server/core/poll.c:591
#2  0x0000000000404e8b in worker_thread_main (arg=0x1) at /home/vagrant/workspace/server/core/gateway.c:1024
#3  0x00007f7b140c76ba in start_thread (arg=0x7f7af5462700) at pthread_create.c:333
#4  0x00007f7b139b982d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 5 (Thread 0x7f7af5c63700 (LWP 38667)):
#0  0x00007f7b140d0c1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f7b1459795c in thread_millisleep (ms=100) at /home/vagrant/workspace/server/core/thread.c:74
#2  0x00007f7b145805c1 in hkthread (data=0x0) at /home/vagrant/workspace/server/core/housekeeper.c:277
#3  0x00007f7b140c76ba in start_thread (arg=0x7f7af5c63700) at pthread_create.c:333
#4  0x00007f7b139b982d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 4 (Thread 0x7f7b08b9d700 (LWP 38666)):
#0  0x00007f7b140d0c1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000000000406efe in log_flush_cb (arg=0x7ffce5d1afa0) at /home/vagrant/workspace/server/core/gateway.c:2066
#2  0x00007f7b140c76ba in start_thread (arg=0x7f7b08b9d700) at pthread_create.c:333
#3  0x00007f7b139b982d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 3 (Thread 0x7f7b0ba15700 (LWP 38665)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007f7b0c1927a2 in inline_mysql_cond_timedwait (abstime=0x7f7b0ba14dc0, mutex=0x7f7b0d25cca0 <LOCK_checkpoint>, that=0x7f7b0d25cce0 <COND_checkpoint>, src_file=<optimized out>, src_line=<optimized out>) at /home/buildbot/buildbot/build/include/mysql/psi/mysql_thread.h:1020
#2  my_service_thread_sleep (control=control@entry=0x7f7b0ca27420 <checkpoint_control>, sleep_time=<optimized out>) at /home/buildbot/buildbot/build/storage/maria/ma_servicethread.c:118
#3  0x00007f7b0c18acf7 in ma_checkpoint_background (arg=0x1e) at /home/buildbot/buildbot/build/storage/maria/ma_checkpoint.c:705
#4  0x00007f7b140c76ba in start_thread (arg=0x7f7b0ba15700) at pthread_create.c:333
#5  0x00007f7b139b982d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 2 (Thread 0x7f7b0e348700 (LWP 38663)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f7b145997dd in skygw_message_wait (mes=0x21f5db0) at /home/vagrant/workspace/utils/skygw_utils.cc:790
#2  0x00007f7b14584889 in thr_filewriter_fun (data=0x21f6250) at /home/vagrant/workspace/server/core/log_manager.cc:2281
#3  0x00007f7b140c76ba in start_thread (arg=0x7f7b0e348700) at pthread_create.c:333
#4  0x00007f7b139b982d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 1 (Thread 0x7f7b14a42780 (LWP 38662)):
#0  __memcpy_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:273
#1  0x00007f7b1456b759 in gwbuf_make_contiguous (orig=0x23213c0) at /home/vagrant/workspace/server/core/buffer.c:834
#2  0x00007f7b145875d2 in modutil_get_complete_packets (p_readbuf=0x7ffce5d17b88) at /home/vagrant/workspace/server/core/modutil.c:576
#3  0x00007f7b08fb21d0 in gw_read_backend_event (dcb=0x7f7aec000d20) at /home/vagrant/workspace/server/modules/protocol/mysql_backend.c:512
#4  0x00007f7b1458bb01 in process_pollq (thread_id=0) at /home/vagrant/workspace/server/core/poll.c:999
#5  0x00007f7b1458b2e5 in poll_waitevents (arg=0x0) at /home/vagrant/workspace/server/core/poll.c:722
#6  0x0000000000406c73 in main (argc=9, argv=0x7ffce5d1d4b8) at /home/vagrant/workspace/server/core/gateway.c:1975

I have 2 "equal" instances of MaxScale running, I observe this issue only on 1 of the 2. The network path of maxscale to the target server is the only obvious difference. Both instances have mariadb10-compatibility=1 while they are replicating from an old percona 5.6 version.

The instance starts to hang at the middle of a large transaction. It remains hanging for a couple of hours, and then, it seams magically, continues. The binlog where it hangs on ends up being equal on both instances.

In the log, I see frequent disconnects on the node that has the issue:

2017-04-17 01:14:03   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Table Map Event 19) received 600 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 01:14:13   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035013, pos 6410894
2017-04-17 01:14:13   notice : Binlog_Service: Request binlog records from mysql-bin.035013 at position 6410894 from master server ml-dbrw-public.customer.com:3306
2017-04-17 01:14:13   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 01:14:13   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 01:24:13   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 599 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 01:24:23   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035013, pos 48287084
2017-04-17 01:24:23   notice : Binlog_Service: Request binlog records from mysql-bin.035013 at position 48287084 from master server ml-dbrw-public.customer.com:3306
2017-04-17 01:24:23   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 01:24:23   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 01:44:30   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Table Map Event 19) received 409 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 01:44:40   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035018, pos 3190820
2017-04-17 01:44:40   notice : Binlog_Service: Request binlog records from mysql-bin.035018 at position 3190820 from master server ml-dbrw-public.customer.com:3306
2017-04-17 01:44:40   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 01:44:40   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 01:54:40   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 535 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 01:54:50   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035018, pos 123140995
2017-04-17 01:54:50   notice : Binlog_Service: Request binlog records from mysql-bin.035018 at position 123140995 from master server ml-dbrw-public.customer.com:3306
2017-04-17 01:54:50   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 01:54:50   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 02:04:53   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 600 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 02:05:03   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035019, pos 5392684
2017-04-17 02:05:03   notice : Binlog_Service: Request binlog records from mysql-bin.035019 at position 5392684 from master server ml-dbrw-public.customer.com:3306
2017-04-17 02:05:03   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 02:05:03   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 04:23:05   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Query Event 2) received 588 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 04:23:16   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035023, pos 3566661
2017-04-17 04:23:16   notice : Binlog_Service: Request binlog records from mysql-bin.035023 at position 3566661 from master server ml-dbrw-public.customer.com:3306
2017-04-17 04:23:16   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 04:23:16   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 04:38:17   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Table Map Event 19) received 438 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 04:38:27   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035024, pos 8269275
2017-04-17 04:38:27   notice : Binlog_Service: Request binlog records from mysql-bin.035024 at position 8269275 from master server ml-dbrw-public.customer.com:3306
2017-04-17 04:38:27   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 04:38:27   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 04:48:27   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Table Map Event 19) received 600 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 04:48:37   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035024, pos 8351622
2017-04-17 04:48:37   notice : Binlog_Service: Request binlog records from mysql-bin.035024 at position 8351622 from master server ml-dbrw-public.customer.com:3306
2017-04-17 04:48:37   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 04:48:37   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 04:58:37   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 469 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 04:58:47   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035024, pos 84358664
2017-04-17 04:58:47   notice : Binlog_Service: Request binlog records from mysql-bin.035024 at position 84358664 from master server ml-dbrw-public.customer.com:3306
2017-04-17 04:58:47   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 04:58:47   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 05:39:08   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 600 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 05:39:18   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035025, pos 60123421
2017-04-17 05:39:18   notice : Binlog_Service: Request binlog records from mysql-bin.035025 at position 60123421 from master server ml-dbrw-public.customer.com:3306
2017-04-17 05:39:18   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 05:39:18   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 05:51:36   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 600 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 05:51:46   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035025, pos 137705181
2017-04-17 05:51:46   notice : Binlog_Service: Request binlog records from mysql-bin.035025 at position 137705181 from master server ml-dbrw-public.customer.com:3306
2017-04-17 05:51:46   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 05:51:46   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 08:00:10   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 343 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 08:00:20   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035048, pos 386554911
2017-04-17 08:00:20   notice : Binlog_Service: Request binlog records from mysql-bin.035048 at position 386554911 from master server ml-dbrw-public.customer.com:3306
2017-04-17 08:00:20   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 08:00:20   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 08:37:21   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Transaction ID Event (2 Phase Commit) 16) received 477 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 08:37:31   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035078, pos 67939334
2017-04-17 08:37:31   notice : Binlog_Service: Request binlog records from mysql-bin.035078 at position 67939334 from master server ml-dbrw-public.customer.com:3306
2017-04-17 08:37:31   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 08:37:31   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 10:21:36   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 600 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 10:21:46   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035078, pos 348744796
2017-04-17 10:21:46   notice : Binlog_Service: Request binlog records from mysql-bin.035078 at position 348744796 from master server ml-dbrw-public.customer.com:3306
2017-04-17 10:21:46   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 10:21:46   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 10:54:03   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 600 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 10:54:13   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035078, pos 506971295
2017-04-17 10:54:13   notice : Binlog_Service: Request binlog records from mysql-bin.035078 at position 506971295 from master server ml-dbrw-public.customer.com:3306
2017-04-17 10:54:13   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 10:54:13   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 11:09:21   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Update Rows Event (v2) 31) received 559 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 11:09:31   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035079, pos 174500715
2017-04-17 11:09:31   notice : Binlog_Service: Request binlog records from mysql-bin.035079 at position 174500715 from master server ml-dbrw-public.customer.com:3306
2017-04-17 11:09:31   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 11:09:31   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 11:19:31   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Update Rows Event (v2) 31) received 600 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 11:19:42   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035079, pos 179126059
2017-04-17 11:19:42   notice : Binlog_Service: Request binlog records from mysql-bin.035079 at position 179126059 from master server ml-dbrw-public.customer.com:3306
2017-04-17 11:19:42   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 11:19:42   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 12:43:01   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Transaction ID Event (2 Phase Commit) 16) received 534 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 12:43:11   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035092, pos 14280668
2017-04-17 12:43:11   notice : Binlog_Service: Request binlog records from mysql-bin.035092 at position 14280668 from master server ml-dbrw-public.customer.com:3306
2017-04-17 12:43:11   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 12:43:11   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 13:18:15   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 359 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 13:18:25   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035096, pos 18680470
2017-04-17 13:18:25   notice : Binlog_Service: Request binlog records from mysql-bin.035096 at position 18680470 from master server ml-dbrw-public.customer.com:3306
2017-04-17 13:18:25   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 13:18:25   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log
2017-04-17 14:44:33   error  : No event received from master ml-dbrw-public.customer.com:3306 in heartbeat period (300 seconds), last event (Write Rows Event (v2) 30) received 494 seconds ago. Assuming connection is dead and reconnecting.
2017-04-17 14:44:43   notice : Binlog_Service: attempting to connect to master server ml-dbrw-public.customer.com:3306, binlog mysql-bin.035099, pos 131672424
2017-04-17 14:44:43   notice : Binlog_Service: Request binlog records from mysql-bin.035099 at position 131672424 from master server ml-dbrw-public.customer.com:3306
2017-04-17 14:44:43   notice : Binlog_Service: identity seen by the master: server_id: 5000, uuid: 2f804b99-2056-11e7-9c12-0cc47ac2d22c
2017-04-17 14:44:43   notice : Binlog_Service: identity seen by the slaves: server_id: 205, uuid: cd1ee52b-2e96-11e6-82a7-0cc47a7765f4, hostname: mldb03.customer.com, MySQL version: 5.6.14-62.0-log

I assume that the issue has somethingt to do with the disconnects and the ETL-like process (large transactions) that occur. Please tell me what is needed to investigate, I can send you binlogs or coredump that I generated in a private channel.

Thanks,
Michaël



 Comments   
Comment by markus makela [ 2017-04-17 ]

The 1.4.1 to 1.4.5 diff on the binlogrouter source alone is about 40 commits, I'd strongly recommend updating to 1.4.5 as it contains a significant amount of bugfixes related to the binlogrouter.

Here's the diff:

[markusjm@localhost MaxScale]$ git log --oneline maxscale-1.4.1..maxscale-1.4.5 -- server/modules/routing/binlog/
 
e4028f4 Fix debug assertion on packet length in blr_master.c
ead1c1e MXS-958: a new binlog file is created after CHANGE MASTER if there is no pending transaction
e5b4572 MXS-958: the specified binlog file is created in $binlogdir while configuring binlog server for the first time
445086a Use of MXS_FREE not allowed here
89aa5b8 MXS-960: In BRL, accept passwords with "," in them.
426f237 MXS-961: error in replication stream and checksum calculation
6243daf Removing error messages while executing RESET SLAVE
9436e7f Transaction Safety default is OFF
eb1da43 Typo fix
080e780 Added refresh users on START SLAVE
63f6a63 MXS-584: added support for SET @@session.autocommit = ON | OFF
c4da795 MXS-741 When no events are read from binlog file, the catchup routine is called only if master server is connected.
4bcebe5 Added support for SET autocommit=1
2668645 Code review fix
008510f Addition of SELECT USER()
d18e707 SET NAMES XXX added
098077f Transaction Safety default is ON
cc90c24 Removed useless comparisons
06a5774 Remove unnecessary comparisons
3f99be5 Transaction safety is again disabled by default
affba3c Merge branch '1.2.1-binlog_router_trx' into release-1.4.2
2855389 Non-critical errors in binlogrouter are now properly logged as warnings
aabcc4f Added clarifying comments to event distribution code
25f4238 Merge branch '1.2.1-binlog_router_trx' into release-1.4.2
edb1c18 Added missing check for error packets
bb2d549 Slaves that are behind the master no longer cause an error to be logged
6988c0b Fixed rotate events never being distributed
440bc04 Added more information to binlogrouter error messages
df9f8eb Improved binlogrouter event distribution error messages
335b1e0 Added master pos to log message
52ef898 Made sending of events more strict
aa7e346 Fixed event position updating
cf62ac0 Added more information to the incomplete transaction error message
fe84f95 Removed messages about incomplete transaction events
bb94ffb Added more information to duplicate event error message
0ce4847 Added logging for safe event and current event mismatch
ed93565 Added DCB write queue to error message
0493196 Disconnect slave if duplicate event detected
51e6000 Add duplicate event detection & logging.
5070b81 Reformat binlog router.

Most of these commits touch on transaction safety and how it works but it is very possible that some of these changes will fix the problem you are seeing.

Comment by Johan Wikman [ 2017-05-09 ]

michaeldg Have you updated to 1.4.5? Is the problem still present?

Comment by Michaël de groot [ 2017-05-09 ]

I don't think so, but perhaps wagnerbianchi, Kurt or ben saw this issue returning.

Comment by Wagner Bianchi (Inactive) [ 2017-05-09 ]

Hey Guys,

Yes, regarding what the customer reported and comparing versions currently, Maxscale instances are running on version 1.4.5. Regarding what Alina added, she is not seeing any alerts anymore.

mariadb@xxxxxxx:~$ /usr/bin/maxscale --version
MaxScale 1.4.5

By the way, regarding the chat I had with Markus some days ago, I think if we see again the issue regarding MXS consuming lots of CPU, it seems to be that 2 threads configured currently are not enough and we should configure it to auto, as server has a good number of resources we can use.

[root@xxxx ~]# cat /proc/cpuinfo | grep processor | wc -l
48

do you guys agree?

Cheers,

Bianchi

Comment by Johan Wikman [ 2017-05-09 ]

Be careful with auto. Up until version 2.0 there's quite a lot of internal locking going on, so with too many threads the performance will drop.

So, it's safer to add threads one by one and see how the behavior changes. In 2.1 the use of auto is becoming feasible and in 2.2 it's probably going to be the default.

Comment by Johan Wikman [ 2017-09-21 ]

michaeldg wagnerbianchi
Guys, is this relevant still?

Comment by Johan Wikman [ 2017-09-27 ]

As the associated support case has been closed, I'll close this one as well.
If not ok, please reopen.

Generated at Thu Feb 08 04:05:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.