[MXS-2016] Alternate Maxscale Instance hung during failover , CPU usage 500% Created: 2018-08-16  Updated: 2019-01-28  Resolved: 2019-01-28

Status: Closed
Project: MariaDB MaxScale
Component/s: failover
Affects Version/s: 2.2.9
Fix Version/s: 2.2.12

Type: Bug Priority: Major
Reporter: Pramod Mahto Assignee: Johan Wikman
Resolution: Fixed Votes: 1
Labels: None

Sprint: MXS-SPRINT-65, MXS-SPRINT-66

 Description   

Maxscale instance hang during failover , having multiple instance running on different node with
passive=0 and passive=1 value .

Maxscale Instance hang was configured with passive=1.

Stacktrace from the Hang Maxscale Instance:-

 
[Thread debugging using libthread_db enabled]
0x00007f06e60e1334 in __lll_lock_wait () from /lib64/libpthread.so.0
 
Thread 9 (Thread 0x7f06e4a68700 (LWP 6038)):
#0  0x00007f06e65c3a6e in thr_filewriter_fun (data=0x927e00) at /home/ec2-user/MaxScale/server/core/log_manager.cc:2353
#1  0x00007f06e60daaa1 in start_thread () from /lib64/libpthread.so.0
#2  0x00007f06e517cbbd in clone () from /lib64/libc.so.6
 
Thread 8 (Thread 0x7f06e2748700 (LWP 6039)):
#0  0x00007f06e60dd8a9 in pthread_mutex_unlock () from /lib64/libpthread.so.0
#1  0x00007f06e65e6d0a in simple_mutex_unlock (sm=0x927928) at /home/ec2-user/MaxScale/server/core/skygw_utils.cc:496
#2  0x00007f06e65c2cbf in blockbuf_get_writepos (priority=5, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1019
#3  logmanager_write_log (priority=5, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:838
#4  log_write (priority=5, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1309
#5  mxs_log_message (priority=5, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:3039
#6  0x00007f06e65c93ff in mon_log_state_change (ptr=0x93d560) at /home/ec2-user/MaxScale/server/core/monitor.cc:1520
#7  0x00007f06e65cc89d in mon_process_state_changes (monitor=0x93c110, script=0x0, events=4194303) at /home/ec2-user/MaxScale/server/core/monitor.cc:1765
#8  0x00007f06e361d453 in monitorMain (arg=0x93e300) at /home/ec2-user/MaxScale/server/modules/monitor/mariadbmon/mariadbmon.cc:2510
#9  0x00007f06e60daaa1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f06e517cbbd in clone () from /lib64/libc.so.6
 
Thread 7 (Thread 0x7f06e1d47700 (LWP 6040)):
#0  0x00007f06e60e200d in nanosleep () from /lib64/libpthread.so.0
#1  0x00000000004061bf in log_flush_cb (arg=<value optimized out>) at /home/ec2-user/MaxScale/server/core/gateway.cc:2388
#2  0x00007f06e60daaa1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f06e517cbbd in clone () from /lib64/libc.so.6
 
Thread 6 (Thread 0x7f06e1346700 (LWP 6041)):
#0  0x00007f06e60e200d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f06e65e7ffe in thread_millisleep (ms=<value optimized out>) at /home/ec2-user/MaxScale/server/core/thread.cc:70
#2  0x00007f06e65ba275 in hkthread (data=<value optimized out>) at /home/ec2-user/MaxScale/server/core/housekeeper.cc:242
#3  0x00007f06e60daaa1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f06e517cbbd in clone () from /lib64/libc.so.6
 
Thread 5 (Thread 0x7f06d3fff700 (LWP 6042)):
#0  0x00007f06e60e14c5 in __lll_unlock_wake () from /lib64/libpthread.so.0
#1  0x00007f06e60dd947 in _L_unlock_657 () from /lib64/libpthread.so.0
#2  0x00007f06e60dd8af in pthread_mutex_unlock () from /lib64/libpthread.so.0
#3  0x00007f06e65e6d0a in simple_mutex_unlock (sm=0x927928) at /home/ec2-user/MaxScale/server/core/skygw_utils.cc:496
#4  0x00007f06e65c2cbf in blockbuf_get_writepos (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1019
#5  logmanager_write_log (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:838
#6  log_write (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1309
#7  mxs_log_message (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:3039
#8  0x00007f06e3a48734 in select_connect_backend_servers (router_nservers=<value optimized out>, max_nslaves=2, session=0x7f06ccf17470, config=<value optimized out>, backends=std::list = {...}, current_master=std::tr1::shared_ptr (empty) 0x0, sescmd_list=0x0, expected_responses=0x0, type=ALL) at /home/ec2-user/MaxScale/server/modules/routing/readwritesplit/rwsplit_select_backends.cc:306
#9  0x00007f06e3a41f31 in RWSplitSession::create (router=0x9589a0, session=0x7f06ccf17470) at /home/ec2-user/MaxScale/server/modules/routing/readwritesplit/readwritesplit.cc:797
#10 0x00007f06e3a42149 in newSession (router_inst=<value optimized out>, session=<value optimized out>) at /home/ec2-user/MaxScale/server/modules/routing/readwritesplit/readwritesplit.cc:869
#11 0x00007f06e65e421e in session_alloc_body (service=0x92d0f0, client_dcb=0x7f06cdd059c0, session=0x7f06ccf17470) at /home/ec2-user/MaxScale/server/core/session.cc:167
#12 0x00007f06e29f521a in gw_read_do_authentication (dcb=0x7f06cdd059c0) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:742
#13 gw_read_client_event (dcb=0x7f06cdd059c0) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:503
#14 0x00007f06e65b51bc in dcb_process_poll_events (dcb=0x7f06cdd059c0, events=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3098
#15 0x00007f06e65b5388 in dcb_handler (dcb=0x7f06cdd059c0, events=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3183
#16 0x00007f06e65edb6b in maxscale::Worker::poll_waitevents (this=0x933c10) at /home/ec2-user/MaxScale/server/core/worker.cc:1205
#17 0x00007f06e65ede41 in maxscale::Worker::run (this=0x933c10) at /home/ec2-user/MaxScale/server/core/worker.cc:885
#18 0x00007f06e60daaa1 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f06e517cbbd in clone () from /lib64/libc.so.6
 
Thread 4 (Thread 0x7f06d35fe700 (LWP 6043)):
#0  0x00007f06e60e1334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f06e60dc5d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f06e60dc4a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f06e65e718f in simple_mutex_lock (sm=0x927928, block=<value optimized out>) at /home/ec2-user/MaxScale/server/core/skygw_utils.cc:463
#4  0x00007f06e65c2d1a in blockbuf_get_writepos (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1044
#5  logmanager_write_log (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:838
#6  log_write (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1309
#7  mxs_log_message (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:3039
#8  0x00007f06e3a48734 in select_connect_backend_servers (router_nservers=<value optimized out>, max_nslaves=2, session=0x7f06c4ddb920, config=<value optimized out>, backends=std::list = {...}, current_master=std::tr1::shared_ptr (empty) 0x0, sescmd_list=0x0, expected_responses=0x0, type=ALL) at /home/ec2-user/MaxScale/server/modules/routing/readwritesplit/rwsplit_select_backends.cc:306
#9  0x00007f06e3a41f31 in RWSplitSession::create (router=0x9589a0, session=0x7f06c4ddb920) at /home/ec2-user/MaxScale/server/modules/routing/readwritesplit/readwritesplit.cc:797
#10 0x00007f06e3a42149 in newSession (router_inst=<value optimized out>, session=<value optimized out>) at /home/ec2-user/MaxScale/server/modules/routing/readwritesplit/readwritesplit.cc:869
#11 0x00007f06e65e421e in session_alloc_body (service=0x92d0f0, client_dcb=0x7f06c55a5450, session=0x7f06c4ddb920) at /home/ec2-user/MaxScale/server/core/session.cc:167
#12 0x00007f06e29f521a in gw_read_do_authentication (dcb=0x7f06c55a5450) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:742
#13 gw_read_client_event (dcb=0x7f06c55a5450) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:503
#14 0x00007f06e65b51bc in dcb_process_poll_events (dcb=0x7f06c55a5450, events=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3098
#15 0x00007f06e65b5388 in dcb_handler (dcb=0x7f06c55a5450, events=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3183
#16 0x00007f06e65edb6b in maxscale::Worker::poll_waitevents (this=0x92a770) at /home/ec2-user/MaxScale/server/core/worker.cc:1205
#17 0x00007f06e65ede41 in maxscale::Worker::run (this=0x92a770) at /home/ec2-user/MaxScale/server/core/worker.cc:885
#18 0x00007f06e60daaa1 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f06e517cbbd in clone () from /lib64/libc.so.6
 
Thread 3 (Thread 0x7f06d2bfd700 (LWP 6044)):
#0  0x00007f06e60e1334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f06e60dc5d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f06e60dc4a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f06e65e718f in simple_mutex_lock (sm=0x7f06c93e4358, block=<value optimized out>) at /home/ec2-user/MaxScale/server/core/skygw_utils.cc:463
#4  0x00007f06e65c2cd5 in blockbuf_get_writepos (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1025
#5  logmanager_write_log (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:838
#6  log_write (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1309
#7  mxs_log_message (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:3039
#8  0x00007f06e65e45ec in session_alloc_body (service=0x92d0f0, client_dcb=0x7f06c8d91d90, session=0x7f06c915bde0) at /home/ec2-user/MaxScale/server/core/session.cc:172
#9  0x00007f06e29f521a in gw_read_do_authentication (dcb=0x7f06c8d91d90) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:742
#10 gw_read_client_event (dcb=0x7f06c8d91d90) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:503
#11 0x00007f06e65b51bc in dcb_process_poll_events (dcb=0x7f06c8d91d90, events=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3098
#12 0x00007f06e65b5388 in dcb_handler (dcb=0x7f06c8d91d90, events=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3183
#13 0x00007f06e65edb6b in maxscale::Worker::poll_waitevents (this=0x92e210) at /home/ec2-user/MaxScale/server/core/worker.cc:1205
#14 0x00007f06e65ede41 in maxscale::Worker::run (this=0x92e210) at /home/ec2-user/MaxScale/server/core/worker.cc:885
#15 0x00007f06e60daaa1 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f06e517cbbd in clone () from /lib64/libc.so.6
 
Thread 2 (Thread 0x7f06d21fc700 (LWP 6045)):
#0  0x00007f06e517d1b3 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f06e669b181 in MHD_epoll (daemon=0x990d60, may_block=<value optimized out>) at daemon.c:4267
#2  0x00007f06e669bf15 in MHD_select_thread (cls=0x990d60) at daemon.c:4544
#3  0x00007f06e60daaa1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f06e517cbbd in clone () from /lib64/libc.so.6
 
Thread 1 (Thread 0x7f06e6b08c00 (LWP 6037)):
#0  0x00007f06e60e1334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f06e60dc5d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f06e60dc4a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f06e65e718f in simple_mutex_lock (sm=0x7f06cd294768, block=<value optimized out>) at /home/ec2-user/MaxScale/server/core/skygw_utils.cc:463
#4  0x00007f06e65c2cd5 in blockbuf_get_writepos (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1025
#5  logmanager_write_log (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:838
#6  log_write (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1309
#7  mxs_log_message (priority=3, modname=<value optimized out>, file=<value optimized out>, line=<value optimized out>, function=<value optimized out>, format=<value optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:3039
#8  0x00007f06e65e45ec in session_alloc_body (service=0x92d0f0, client_dcb=0x14206a0, session=0x14df240) at /home/ec2-user/MaxScale/server/core/session.cc:172
#9  0x00007f06e29f521a in gw_read_do_authentication (dcb=0x14206a0) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:742
#10 gw_read_client_event (dcb=0x14206a0) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:503
#11 0x00007f06e65b51bc in dcb_process_poll_events (dcb=0x14206a0, events=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3098
#12 0x00007f06e65b5388 in dcb_handler (dcb=0x14206a0, events=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3183
#13 0x00007f06e65edb6b in maxscale::Worker::poll_waitevents (this=0x938160) at /home/ec2-user/MaxScale/server/core/worker.cc:1205
#14 0x00007f06e65ede41 in maxscale::Worker::run (this=0x938160) at /home/ec2-user/MaxScale/server/core/worker.cc:885
#15 0x0000000000408a72 in main (argc=<value optimized out>, argv=<value optimized out>) at /home/ec2-user/MaxScale/server/core/gateway.cc:2251



 Comments   
Comment by Johan Wikman [ 2018-08-16 ]

Threads 5 and 8 appear to unlock the same mutex, which would indicate that they both immediately before that were of the opinion that they owned it. That should be impossible.

Comment by Dipti Joshi (Inactive) [ 2018-10-11 ]

johan.wikman Is this duplicate of https://jira.mariadb.org/browse/MXS-1953?

Comment by Johan Wikman [ 2018-10-12 ]

No, this is not related to that.

Comment by Richard Stracke [ 2018-11-17 ]

Same issue.
Also Thread 4,5 and 8 have the same mutex

 
 
sm=0x7f7e08009da8
 
Thread 8 (Thread 0x7f7e0c993700 (LWP 100823)):
#0  0x00007f7e1364a42d in __lll_lock_wait () from /usr/lib64/libpthread.so.0
#1  0x00007f7e13645dcb in _L_lock_812 () from /usr/lib64/libpthread.so.0
#2  0x00007f7e13645c98 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#3  0x00007f7e13b584f7 in simple_mutex_lock (sm=0x7f7e08009da8, block=<optimized out>) at /home/ec2-user/MaxScale/server/core/skygw_utils.cc:463
#4  0x00007f7e13b2fe5d in blockbuf_get_writepos (flush=false, str_len=129, p_bb=<synthetic pointer>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1025
#5  logmanager_write_log (str=0x7f7e0c991048 "notice : Server changed state: node2[10.56.198.20:6603]: new_master. [Slave, Running] -> [Master, Running]", str_len=<optimized out>, prefix_len=9, flush=LOG_FLUSH_NO, priority=5) at /home/ec2-user/MaxScale/server/core/log_manager.cc:838
#6  log_write (file=<optimized out>, line=<optimized out>, function=<optimized out>, flush=LOG_FLUSH_NO, str=0x7f7e0c991048 "notice : Server changed state: node2[10.56.198.20:6603]: new_master. [Slave, Running] -> [Master, Running]", len=<optimized out>, prefix_len=9, priority=5) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1309
#7  mxs_log_message (priority=5, modname=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>, format=<optimized out>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:3039
#8  0x00007f7e13b35fe1 in mon_log_state_change (ptr=ptr@entry=0x118aa00) at /home/ec2-user/MaxScale/server/core/monitor.cc:1520
#9  0x00007f7e13b3906b in mon_process_state_changes (monitor=0x1189490, script=0x0, events=4194303) at /home/ec2-user/MaxScale/server/core/monitor.cc:1765
#10 0x00007f7e0d86cc91 in monitorMain (arg=<optimized out>) at /home/ec2-user/MaxScale/server/modules/monitor/mariadbmon/mariadbmon.cc:2510
#11 0x00007f7e13643e25 in start_thread () from /usr/lib64/libpthread.so.0
#12 0x00007f7e1162634d in clone () from /usr/lib64/libc.so.6
 
 
Thread 5 (Thread 0x7f7e06ffd700 (LWP 100826)):
#0  0x00007f7e1364a42d in __lll_lock_wait () from /usr/lib64/libpthread.so.0
#1  0x00007f7e13645dcb in _L_lock_812 () from /usr/lib64/libpthread.so.0
#2  0x00007f7e13645c98 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#3  0x00007f7e13b584f7 in simple_mutex_lock (sm=0x7f7e08009da8, block=<optimized out>) at /home/ec2-user/MaxScale/server/core/skygw_utils.cc:463
#4  0x00007f7e13b2fe5d in blockbuf_get_writepos (flush=true, str_len=219, p_bb=<synthetic pointer>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1025
#5  logmanager_write_log (str=0x7f7e06ff9508 "error  : [readconnroute] Failed to create new routing session. Couldn't find eligible candidate server. Freeing allocated resources. (subsequent similar messages suppressed for 10000 milliseconds)", str_len=<optimized out>, prefix_len=9, flush=LOG_FLUSH_YES, priority=3) at /home/ec2-user/MaxScale/server/core/log_manager.cc:838
#6  log_write (file=0x7f7e0dea5d88 "/home/ec2-user/MaxScale/server/modules/routing/readconnroute/readconnroute.c", line=233463176, function=0x7f7e0dea5f7f <__func__.24786> "newSession", flush=LOG_FLUSH_YES, str=0x7f7e06ff9508 "error  : [readconnroute] Failed to create new routing session. Couldn't find eligible candidate server. Freeing allocated resources. (subsequent similar messages suppressed for 10000 milliseconds)", len=<optimized out>, prefix_len=9, priority=3) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1309
#7  mxs_log_message (priority=priority@entry=3, modname=modname@entry=0x7f7e0dea5bcd "readconnroute", file=file@entry=0x7f7e0dea5d88 "/home/ec2-user/MaxScale/server/modules/routing/readconnroute/readconnroute.c", line=line@entry=416, function=function@entry=0x7f7e0dea5f7f <__func__.24786> "newSession", format=format@entry=0x7f7e0dea5d18 "Failed to create new routing session. Couldn't find eligible candidate server. Freeing allocated resources.") at /home/ec2-user/MaxScale/server/core/log_manager.cc:3039
#8  0x00007f7e0dea54d0 in newSession (instance=0x118cbb0, session=0x7f7dfc2657f0) at /home/ec2-user/MaxScale/server/modules/routing/readconnroute/readconnroute.c:415
#9  0x00007f7e13b55111 in session_alloc_body (service=0x117d0a0, client_dcb=0x7f7dfc2831e0, session=0x7f7dfc2657f0) at /home/ec2-user/MaxScale/server/core/session.cc:167
#10 0x00007f7e0cc3ea83 in gw_read_do_authentication (nbytes_read=<optimized out>, read_buffer=0x7f7dfc1c7330, dcb=0x7f7dfc2831e0) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:742
#11 gw_read_client_event (dcb=0x7f7dfc2831e0) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:503
#12 0x00007f7e13b227aa in dcb_process_poll_events (dcb=dcb@entry=0x7f7dfc2831e0, events=events@entry=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3098
#13 0x00007f7e13b22948 in dcb_handler (dcb=0x7f7dfc2831e0, events=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3183
#14 0x00007f7e13b5f0d1 in maxscale::Worker::poll_waitevents (this=this@entry=0x1189930) at /home/ec2-user/MaxScale/server/core/worker.cc:1205
#15 0x00007f7e13b5f342 in maxscale::Worker::run (this=0x1189930) at /home/ec2-user/MaxScale/server/core/worker.cc:885
#16 0x00007f7e13643e25 in start_thread () from /usr/lib64/libpthread.so.0
#17 0x00007f7e1162634d in clone () from /usr/lib64/libc.so.6
 
Thread 4 (Thread 0x7f7e067fc700 (LWP 100827)):
#0  0x00007f7e1364a42d in __lll_lock_wait () from /usr/lib64/libpthread.so.0
#1  0x00007f7e13645dcb in _L_lock_812 () from /usr/lib64/libpthread.so.0
#2  0x00007f7e13645c98 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#3  0x00007f7e13b584f7 in simple_mutex_lock (sm=0x7f7e08009da8, block=<optimized out>) at /home/ec2-user/MaxScale/server/core/skygw_utils.cc:463
#4  0x00007f7e13b2fe5d in blockbuf_get_writepos (flush=true, str_len=155, p_bb=<synthetic pointer>) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1025
#5  logmanager_write_log (str=0x7f7e067f8548 "error  : [readconnroute] Failed to create new routing session. Couldn't find eligible candidate server. Freeing allocated resources.", str_len=<optimized out>, prefix_len=9, flush=LOG_FLUSH_YES, priority=3) at /home/ec2-user/MaxScale/server/core/log_manager.cc:838
#6  log_write (file=0x7f7e0dea5d88 "/home/ec2-user/MaxScale/server/modules/routing/readconnroute/readconnroute.c", line=233463176, function=0x7f7e0dea5f7f <__func__.24786> "newSession", flush=LOG_FLUSH_YES, str=0x7f7e067f8548 "error  : [readconnroute] Failed to create new routing session. Couldn't find eligible candidate server. Freeing allocated resources.", len=<optimized out>, prefix_len=9, priority=3) at /home/ec2-user/MaxScale/server/core/log_manager.cc:1309
#7  mxs_log_message (priority=priority@entry=3, modname=modname@entry=0x7f7e0dea5bcd "readconnroute", file=file@entry=0x7f7e0dea5d88 "/home/ec2-user/MaxScale/server/modules/routing/readconnroute/readconnroute.c", line=line@entry=416, function=function@entry=0x7f7e0dea5f7f <__func__.24786> "newSession", format=format@entry=0x7f7e0dea5d18 "Failed to create new routing session. Couldn't find eligible candidate server. Freeing allocated resources.") at /home/ec2-user/MaxScale/server/core/log_manager.cc:3039
#8  0x00007f7e0dea54d0 in newSession (instance=0x118cbb0, session=0x7f7df04b2860) at /home/ec2-user/MaxScale/server/modules/routing/readconnroute/readconnroute.c:415
#9  0x00007f7e13b55111 in session_alloc_body (service=0x117d0a0, client_dcb=0x7f7df0519f30, session=0x7f7df04b2860) at /home/ec2-user/MaxScale/server/core/session.cc:167
#10 0x00007f7e0cc3ea83 in gw_read_do_authentication (nbytes_read=<optimized out>, read_buffer=0x7f7df0586b60, dcb=0x7f7df0519f30) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:742
#11 gw_read_client_event (dcb=0x7f7df0519f30) at /home/ec2-user/MaxScale/server/modules/protocol/MySQL/mariadbclient/mysql_client.cc:503
#12 0x00007f7e13b227aa in dcb_process_poll_events (dcb=dcb@entry=0x7f7df0519f30, events=events@entry=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3098
#13 0x00007f7e13b22948 in dcb_handler (dcb=0x7f7df0519f30, events=5) at /home/ec2-user/MaxScale/server/core/dcb.cc:3183
#14 0x00007f7e13b5f0d1 in maxscale::Worker::poll_waitevents (this=this@entry=0x1189c20) at /home/ec2-user/MaxScale/server/core/worker.cc:1205
#15 0x00007f7e13b5f342 in maxscale::Worker::run (this=0x1189c20) at /home/ec2-user/MaxScale/server/core/worker.cc:885
#16 0x00007f7e13643e25 in start_thread () from /usr/lib64/libpthread.so.0
#17 0x00007f7e1162634d in clone () from /usr/lib64/libc.so.6

In the log the same message was written very often in the same time.

2018-11-17 03:13:41   error  : [readwritesplit] Couldn't find suitable Master from 3 candidates.
2018-11-17 03:13:41   error  : [readwritesplit] Couldn't find suitable Master from 3 candidates.
2018-11-17 03:13:41   error  : [readwritesplit] Couldn't find suitable Master from 3 candidates.
2018-11-17 03:13:41   error  : [readwritesplit] Couldn't find suitable Master from 3 candidates.
2018-11-17 03:13:41   error  : [readwritesplit] Couldn't find suitable Master from 3 candidates.
2018-11-17 03:13:41   error  : [readwritesplit] Couldn't find suitable Master from 3 candidates.
2018-11-17 03:13:41   error  : [readwritesplit] Couldn't find suitable Master from 3 candidates. (subsequent similar messages suppressed for 10000 milliseconds)

The CPU 400% issue happens during failover

Comment by Johan Wikman [ 2018-11-26 ]

This is likely to have been fixed in 2.2.12 by commit https://github.com/mariadb-corporation/MaxScale/commit/33fa9b26fe080e0be1e4367784ba3382b6308c7f

Generated at Thu Feb 08 04:11:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.