[MXS-2567] CDC module can wind up in an infinite loop Created: 2019-06-19  Updated: 2019-10-04  Resolved: 2019-10-04

Status: Closed
Project: MariaDB MaxScale
Component/s: cdc
Affects Version/s: 2.3.4, 2.3.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Jonas Seiler Assignee: markus makela
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

docker image mariadb/maxscale


Sprint: MXS-SPRINT-91

 Description   

Hello,

I want to stream a MySQL dump to Kafka. I use a setup similar to the one in https://mariadb.com/resources/blog/real-time-data-streaming-to-kafka-with-maxscale-cdc/. It does work, but MaxScale's CPU usage stays at 100% even when no events are being streamed.
The workflow is as follows:

  • start the MaxScale and MariaDB docker containers
  • set up users, rights, etc.
  • start MaxScale as a read slave
  • load a DB dump into MariaDB
  • start the cdc script (a rough sketch of the Kafka-forwarding part is shown after this list)
    • this is when the infinite loop happens
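
For illustration, here is a minimal sketch of such a forwarding script, assuming the pipeline from the blog post above: MaxScale's bundled cdc.py client emits change events as newline-delimited JSON, and that output is piped into this script, which publishes each event to Kafka via the kafka-python package. The script name cdc_to_kafka.py, the broker address, and the topic name are placeholders.

#!/usr/bin/env python3
# cdc_to_kafka.py -- illustrative sketch only.
# Assumed usage (following the blog post): cdc.py ... <database>.<table> | python3 cdc_to_kafka.py
# Requires the kafka-python package; broker and topic below are placeholders.

import json
import sys

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",                   # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    try:
        event = json.loads(line)                      # one JSON change event per line
    except json.JSONDecodeError:
        continue                                      # defensively skip non-JSON lines
    producer.send("cdc-events", event)                # placeholder topic name

producer.flush()

The forwarding itself is trivial; for this bug the relevant part is only that a CDC client connects to the avro-service and requests data, which is when the warnings in the log below start appearing.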

The problem seems to be tied to the dump itself; it doesn't happen for all dumps.
It appears that the CDC component gets stuck in an infinite loop.
Here is the MaxScale log:

2019-06-19 09:47:23   notice : (2) avro-service: new connection from [::ffff:172.28.0.4]
2019-06-19 09:47:23   warning: (8) Events have already been injected to current DCB, discarding existing.
2019-06-19 09:47:23   warning: (8) Events have already been injected to current DCB, discarding existing.
2019-06-19 09:47:23   warning: (8) Events have already been injected to current DCB, discarding existing.
2019-06-19 09:47:23   warning: (8) Events have already been injected to current DCB, discarding existing.
2019-06-19 09:47:23   warning: (8) Events have already been injected to current DCB, discarding existing.
2019-06-19 09:47:23   warning: (8) Events have already been injected to current DCB, discarding existing.
2019-06-19 09:47:23   warning: (8) Events have already been injected to current DCB, discarding existing.
2019-06-19 09:47:23   warning: (8) Events have already been injected to current DCB, discarding existing.
2019-06-19 09:47:23   warning: (8) Events have already been injected to current DCB, discarding existing.
2019-06-19 09:47:23   warning: (8) Events have already been injected to current DCB, discarding existing. (subsequent similar messages suppressed for 10000 milliseconds)

Following the warning led me to the following piece of code, which suggests it might be an infinite loop:

MaxScale/server/core/dcb.cc (dcb_handler, around line 3225):

static uint32_t dcb_handler(DCB* dcb, uint32_t events)
{
    this_thread.current_dcb = dcb;
    uint32_t rv = dcb_process_poll_events(dcb, events);

    // When all I/O events have been handled, we will immediately
    // process an added fake event. As the handling of a fake event
    // may lead to the addition of another fake event we loop until
    // there is no fake event or the dcb has been closed.

    while ((dcb->n_close == 0) && (dcb->fake_event != 0))
    {
        events = dcb->fake_event;
        dcb->fake_event = 0;

        rv |= dcb_process_poll_events(dcb, events);
    }

    this_thread.current_dcb = NULL;

    return rv;
}
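
To make the suspected failure mode concrete, here is a minimal model of that loop's control flow (plain Python, purely illustrative, not MaxScale code): if handling a fake event always re-injects another fake event and the DCB is never closed, the loop can never drain the fake event and spins forever, which would match the 100% CPU usage and the repeated warnings above.

# Illustrative model of dcb_handler(); the stand-in process_poll_events()
# and its re-injection behaviour are assumptions, not taken from MaxScale.

class Dcb:
    def __init__(self):
        self.n_close = 0       # 0 means the DCB is still open
        self.fake_event = 0    # non-zero means a fake event is pending

def process_poll_events(dcb, events):
    # Hypothetical handler: suppose the CDC/avrorouter path re-injects a fake
    # event on every call because it still considers data to be pending.
    dcb.fake_event = 1
    return events

def dcb_handler(dcb, events):
    rv = process_poll_events(dcb, events)
    iterations = 0
    while dcb.n_close == 0 and dcb.fake_event != 0:
        events, dcb.fake_event = dcb.fake_event, 0
        rv |= process_poll_events(dcb, events)
        iterations += 1
        if iterations >= 5:    # guard only so this demo terminates
            print("would keep looping: fake_event is re-injected on every pass")
            break
    return rv

dcb_handler(Dcb(), events=1)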

Due to data privacy reasons I cannot provide the dump, though.
Maybe the replication diagnostics are helpful:

	Service:                             replication-service
	Router:                              binlogrouter
	State:                               Started
	Master connection DCB:               0x5613cf39b4b0
	Master connection state:                     Binlog Dump
	Binlog directory:                            /var/lib/maxscale
	Heartbeat period (seconds):                  300
	Number of master connects:                   1
	Number of delayed reconnects:                0
	Number of connect retries:                   0
	Connect retry interval:                      60
	Connect retry count limit:                   1000
	Current binlog file:                         binlog.000003
	Current binlog position:                     7351830
	Number of slave servers:                     0
	No. of binlog events received this session:  36943
	Total no. of binlog events received:         36943
	No. of bad CRC received from master:         0
	Number of binlog events per minute
	Current        5        10       15       30 Min Avg
	      0       0.0      0.0      0.0      0.0
	Number of fake binlog events:                0
	Number of artificial binlog events:          3
	Number of binlog events in error:            0
	Number of binlog rotate events:              2
	Number of heartbeat events:                  0
	Number of packets received:                  168
	Number of residual data packets:             0
	Average events per packet:                   219.9
	Last event from master at:                   Wed Jun 19 09:40:15 2019 (16 seconds ago)
	Last event from master:                      0x2, Query Event
	Last binlog event timestamp:                 1560937215 (Wed Jun 19 09:40:15 2019)
	Events received:
		Invalid                                  0
		Start Event V3                           0
		Query Event                              7392
		Stop Event                               2
		Rotate Event                             3
		Integer Session Variable                 0
		Load Event                               0
		Slave Event                              0
		Create File Event                        0
		Append Block Event                       0
		Exec Load Event                          0
		Delete File Event                        0
		New Load Event                           0
		Rand Event                               0
		User Variable Event                      0
		Format Description Event                 3
		Transaction ID Event (2 Phase Commit)    38
		Begin Load Query Event                   0
		Execute Load Query Event                 0
		Table Map Event                          7221
		Write Rows Event (v0)                    0
		Update Rows Event (v0)                   0
		Delete Rows Event (v0)                   0
		Write Rows Event (v1)                    7627
		Update Rows Event (v1)                   0
		Delete Rows Event (v1)                   0
		Incident Event                           0
		Heartbeat Event                          0
		Ignorable Event                          0
		Rows Query Event                         0
		Write Rows Event (v2)                    0
		Update Rows Event (v2)                   0
		Delete Rows Event (v2)                   0
		GTID Event                               0
		Anonymous GTID Event                     0
		Previous GTIDS Event                     0
		MariaDB 10 Annotate Rows Event                      7221
		MariaDB 10 Binlog Checkpoint Event                  3
		MariaDB 10 GTID Event                               7430
		MariaDB 10 GTID List Event                          3
		MariaDB 10 Start Encryption Event                   0
	Started:                             Wed Jun 19 09:40:07 2019
	Root user access:                    Disabled
	Backend databases:
		[mariadb_test]:3306    Protocol: mariadbbackend    Name: binlog_router_master_host
	Total connections:                   3
	Currently connected:                 2



 Comments   
Comment by markus makela [ 2019-06-19 ]

Have you tried the latest 2.3 release?

Comment by Jonas Seiler [ 2019-06-19 ]

Sure, it still happens.

Comment by markus makela [ 2019-07-03 ]

My initial thought was that this was fixed by 1efe3e0b2058f8c837e4a57c6589d6e10b068870 but that is already in 2.3.8. Perhaps this is a different problem.

Comment by markus makela [ 2019-09-27 ]

Haven't been able to reproduce this with 2.4. Can you check whether you can reproduce it with 2.4?

Comment by markus makela [ 2019-10-04 ]

Closing as Cannot Reproduce. If this still happens with 2.4, please let us know and we'll reopen the issue.
