  1. MariaDB MaxScale
  2. MXS-956

MaxScale crash: Removing DCB 0x7fbf94016760 but was in state DCB_STATE_DISCONNECTED which is not legal for a call to dcb_close

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.1, 2.0.2
    • Fix Version/s: 2.0.3
    • Component/s: readwritesplit
    • Sprint: 2016-21, 2016-22, 2016-23

    Description

      MaxScale crashed with the following error messages in the log:

      2016-10-25 05:52:31   error  : 140461134051072 [dcb_close] Error : Removing DCB 0x7fbf94016760 but was in state DCB_STATE_DISCONNECTED which is not legal for a call to dcb_close. 
      2016-10-25 05:52:31   error  : 140461125658368 [dcb_close] Error : Removing DCB 0x7fbf94012ca0 but was in state DCB_STATE_DISCONNECTED which is not legal for a call to dcb_close.
      

      Core dump:

      2016-10-31 21:13:48   error  : 140245437769472 [dcb_close] Error : Removing DCB 0x13627f0 but was in state DCB_STATE_DISCONNECTED which is not legal for a call to dcb_close. 
      2016-10-31 21:13:48   error  : Fatal: MaxScale 2.0.1 received fatal signal 6. Attempting backtrace.
      2016-10-31 21:13:48   error  : Commit ID: fa2a66719554d13a00db5c81c5c9ffd5b3a2ce14 System name: Linux Release string: Ubuntu 16.04.1 LTS
      2016-10-31 21:13:48   error  :   /usr/bin/maxscale() [0x403ca7] 
      2016-10-31 21:13:48   error  :   /lib/x86_64-linux-gnu/libpthread.so.0(+0x113e0) [0x7f8d7ce183e0] 
      2016-10-31 21:13:48   error  :   /lib/x86_64-linux-gnu/libpthread.so.0(raise+0x29) [0x7f8d7ce182b9] 
      2016-10-31 21:13:48   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(dcb_close+0x108) [0x7f8d7d2c2060] 
      2016-10-31 21:13:48   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so(+0x32e7) [0x7f8d75e472e7] 
      2016-10-31 21:13:48   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libMySQLClient.so(+0x36b4) [0x7f8d74f6a6b4] 
      2016-10-31 21:13:48   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x32ce0) [0x7f8d7d2bfce0] 
      2016-10-31 21:13:48   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(+0x32951) [0x7f8d7d2bf951] 
      2016-10-31 21:13:48   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(dcb_process_zombies+0x224) [0x7f8d7d2bf75a] 
      2016-10-31 21:13:48   error  :   /usr/lib/x86_64-linux-gnu/maxscale/libmaxscale-common.so.1.0.0(poll_waitevents+0x73e) [0x7f8d7d2d6484] 
      2016-10-31 21:13:48   error  :   /usr/bin/maxscale(worker_thread_main+0x2a) [0x404d8f] 
      2016-10-31 21:13:48   error  :   /lib/x86_64-linux-gnu/libpthread.so.0(+0x770a) [0x7f8d7ce0e70a] 
      2016-10-31 21:13:48   error  :   /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8d7c70082d] 
      

      [maxscale]
      threads=4
       
      [Splitter Service]
      type=service
      router=readwritesplit
      servers=galera1-alb, galera2-alb, galera3-alb
      router_options=master_failure_mode=error_on_write
      user=maxscale-alb
      passwd=*snip*
       
      [Splitter Listener]
      type=listener
      service=Splitter Service
      protocol=MySQLClient
      port=3306
      address=0.0.0.0
       
      [galera1-alb]
      type=server
      address=10.14.0.6
      port=3306
      protocol=MySQLBackend
      priority=1
       
      [galera2-alb]
      type=server
      address=10.14.0.7
      port=3306
      protocol=MySQLBackend
      priority=2
       
      [galera3-alb]
      type=server
      address=10.14.0.8
      port=3306
      protocol=MySQLBackend
      priority=3
       
      [Galera Monitor]
      type=monitor
      module=galeramon
      #disable_master_role_setting=true
      monitor_interval=500
      servers=galera1-alb, galera2-alb, galera3-alb
      user=maxscale-alb
      passwd=*snip*
       
      [CLI]
      type=service
      router=cli
       
      [CLI Listener]
      type=listener
      service=CLI
      protocol=maxscaled
      address=localhost
      port=6603
      

      root@maxscale1:~# maxadmin -pmariadb show sessions
      Session 1 (0x719310)
      	State:               Listener Session
      	Service:             Splitter Service (0x7056c0)
      	Client DCB:          0x7186b0
      	Connected:           Tue Oct 25 07:55:33 2016
      Session 2 (0x721000)
      	State:               Listener Session
      	Service:             CLI (0x702910)
      	Client DCB:          0x7193b0
      	Connected:           Tue Oct 25 07:55:33 2016
      Session 25 (0x7f7b58012140)
      	State:               Session ready for routing
      	Service:             Splitter Service (0x7056c0)
      	Client DCB:          0x7f7b60015d10
      	Client Address:      powerdns_alb@10.14.0.110
      	Connected:           Tue Oct 25 07:55:52 2016
      	Idle:                34 seconds
      Session 26 (0x7f7b600149f0)
      	State:               Session ready for routing
      	Service:             Splitter Service (0x7056c0)
      	Client DCB:          0x7f7b60014310
      	Client Address:      powerdns_alb@10.14.0.110
      	Connected:           Tue Oct 25 07:55:52 2016
      	Idle:                33 seconds
      Session 540 (0x7f7b58013450)
      	State:               Session ready for routing
      	Service:             CLI (0x702910)
      	Client DCB:          0x7f7b58012360
      	Client Address:      127.0.0.1
      	Connected:           Tue Oct 25 08:41:16 2016
      	Idle:                0 seconds
      Session 23 (0x7f7b60015f50)
      	State:               Session ready for routing
      	Service:             Splitter Service (0x7056c0)
      	Client DCB:          0x7f7b600157e0
      	Client Address:      powerdns_alb@10.14.0.110
      	Connected:           Tue Oct 25 07:55:52 2016
      	Idle:                2721 seconds
      Session 24 (0x7f7b600141a0)
      	State:               Session ready for routing
      	Service:             Splitter Service (0x7056c0)
      	Client DCB:          0x7f7b60016170
      	Client Address:      powerdns_alb@10.14.0.110
      	Connected:           Tue Oct 25 07:55:52 2016
      	Idle:                4 seconds
      

      root@maxscale1:~# maxadmin -pmariadb show servers
      Server 0x705120 (galera1-alb)
      	Server:                              10.14.0.6
      	Status:                              Master, Synced, Running
      	Protocol:                            MySQLBackend
      	Port:                                3306
      	Server Version:                      10.1.18-MariaDB-1~xenial
      	Node Id:                             0
      	Master Id:                           -1
      	Slave Ids:                           
      	Repl Depth:                          0
      	Server Parameters:
      	                                       priority	1
      	Number of connections:               531
      	Current no. of conns:                3
      	Current no. of operations:           0
      Server 0x704be0 (galera2-alb)
      	Server:                              10.14.0.7
      	Status:                              Slave, Synced, Running
      	Protocol:                            MySQLBackend
      	Port:                                3306
      	Server Version:                      10.1.18-MariaDB-1~xenial
      	Node Id:                             1
      	Master Id:                           -1
      	Slave Ids:                           
      	Repl Depth:                          0
      	Server Parameters:
      	                                       priority	2
      	Number of connections:               545
      	Current no. of conns:                3
      	Current no. of operations:           0
      Server 0x704700 (galera3-alb)
      	Server:                              10.14.0.8
      	Status:                              Slave, Synced, Running
      	Protocol:                            MySQLBackend
      	Port:                                3306
      	Server Version:                      10.1.18-MariaDB-1~xenial
      	Node Id:                             2
      	Master Id:                           -1
      	Slave Ids:                           
      	Repl Depth:                          0
      	Server Parameters:
      	                                       priority	3
      	Number of connections:               547
      	Current no. of conns:                4
      	Current no. of operations:           0
        
      

          Activity

            markus makela added a comment:

            With the fix mentioned earlier, the crash is prevented and this is OK for 2.0.3.

            markus makela added a comment:

            We've fixed a minor bug with the error_on_write mode and some of the error handling. The code is also a bit simpler and all connections and the related backend references are closed at the same time.

            The latest packages for the 2.0.3 release candidate can be found here: http://max-tst-01.mariadb.com/ci-repository/2.0-release-dec12/mariadb-maxscale/

            The packages were built from commit 15a8675fca53da3417b8c0155e43d91e1173f208.
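For illustration only, and not taken from the commit above: a minimal C sketch of what closing a connection and its backend reference in one place could look like. Every name here (`sync_conn_t`, `sync_backend_ref_t`, `sync_close_backend`, `sync_close_all`, `sync_in_sync`) is hypothetical and is not MaxScale's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT MaxScale's real structures. */
typedef struct {
    bool open;               /* true while the backend connection is live */
} sync_conn_t;

typedef struct {
    sync_conn_t *conn;       /* NULL once the connection has been closed */
    bool in_use;             /* true while the session routes to this backend */
} sync_backend_ref_t;

/* Close one backend. The connection and the reference are updated in the
 * same function, so one cannot be changed without the other. Calling it
 * twice is harmless. */
static void sync_close_backend(sync_backend_ref_t *ref)
{
    assert(ref != NULL);
    if (ref->conn != NULL) {
        ref->conn->open = false;
        ref->conn = NULL;
    }
    ref->in_use = false;
}

/* Close every backend of a session in a single pass. */
static void sync_close_all(sync_backend_ref_t *refs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        sync_close_backend(&refs[i]);
    }
}

/* Invariant check: a reference is in use exactly when it still holds an
 * open connection. */
static bool sync_in_sync(const sync_backend_ref_t *ref)
{
    bool has_open_conn = (ref->conn != NULL && ref->conn->open);
    return ref->in_use == has_open_conn;
}
```

Routing every close through a single helper is one way to keep the "reference in use" flag and the connection state from drifting apart. Whether the real change is structured like this is an assumption.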

            markus makela added a comment:

            A discussion on IRC pointed out that the temporary fix does indeed prevent the crash. It again pointed to some strange behavior in the error handling but it also uncovered a bug that could've allowed masters with inconsistent state to be used. Simplifying the connection closing logic in the error handler should give us a better guarantee that the backend server references and the actual connections stay in sync.

            markus makela added a comment:

            Based on a chat on IRC, MaxScale still crashes. I've added some extra logging about whether the DCB which is being closed has a corresponding backend server reference. I've also added a somewhat temporary fix which doesn't close the DCB in closeSession if it isn't in a valid state.

            If possible, please test with this new package: http://max-tst-01.mariadb.com/ci-repository/2.0-markusjm-dec1/mariadb-maxscale/

            The packages are built from commit 1272cccf537aad8b824e1df978e0454e5b3b6c40.
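For illustration only, and not the actual patch: a minimal C sketch of a state-guarded close, assuming a simplified two-state DCB. The names `guard_dcb_t`, `guard_dcb_close` and `guard_close_if_valid` are hypothetical. The stand-in close routine uses `assert` to mimic a close function that aborts when called in an illegal state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins; NOT the real MaxScale types. */
typedef enum {
    GUARD_DCB_POLLING,        /* connection is live and may be closed */
    GUARD_DCB_DISCONNECTED    /* connection has already been torn down */
} guard_dcb_state_t;

typedef struct {
    guard_dcb_state_t state;
    int close_calls;          /* counts how often the close really ran */
} guard_dcb_t;

/* Stand-in for a close routine that aborts on an illegal state. */
static void guard_dcb_close(guard_dcb_t *dcb)
{
    assert(dcb->state != GUARD_DCB_DISCONNECTED);
    dcb->close_calls++;
    dcb->state = GUARD_DCB_DISCONNECTED;
}

/* Close the DCB only if closing is still legal.
 * Returns true if the close was performed, false if it was skipped. */
static bool guard_close_if_valid(guard_dcb_t *dcb)
{
    if (dcb == NULL || dcb->state == GUARD_DCB_DISCONNECTED) {
        fprintf(stderr, "skipping close: DCB %p is not in a closable state\n",
                (void *)dcb);
        return false;
    }
    guard_dcb_close(dcb);
    return true;
}
```

A guard like this avoids the abort but does not explain why an already-disconnected DCB reached the close path, which is why it is only a stopgap.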

            markus makela added a comment:

            The log messages suggest that a DCB unrelated to the session (one that was already closed) gets processed, which triggers the reconnection logic. The reconnection appears to be what causes these errors.

            I've built a new package from commit ee3c42cff781bec3bbbf1898f5b248aaee92fefa with more detailed error messages about where the DCB was closed and where the second close attempt is being made. It also adds extra checks before the DCB is closed and warns if a reconnection occurs when it shouldn't.

            The packages can be found here: http://max-tst-01.mariadb.com/ci-repository/2.0-markusjm-nov29/mariadb-maxscale/
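For illustration only: one common C technique for reporting both where an object was first closed and where a later close was attempted is to record the call site with `__FILE__` and `__LINE__`. The sketch below assumes a simplified DCB. `trace_dcb_t`, `trace_close_at` and `TRACE_CLOSE` are hypothetical names, and this is not claimed to be how the commit above implements its messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in; NOT the real MaxScale DCB. */
typedef struct {
    bool closed;
    const char *closed_file;  /* source file of the first close */
    int closed_line;          /* source line of the first close */
} trace_dcb_t;

/* Record the first close site. On a repeated close, report both the
 * original site and the site of the new attempt, then refuse.
 * Returns true if this call performed the close. */
static bool trace_close_at(trace_dcb_t *dcb, const char *file, int line)
{
    assert(dcb != NULL);
    if (dcb->closed) {
        fprintf(stderr, "%s:%d: DCB %p was already closed at %s:%d\n",
                file, line, (void *)dcb, dcb->closed_file, dcb->closed_line);
        return false;
    }
    dcb->closed = true;
    dcb->closed_file = file;
    dcb->closed_line = line;
    return true;
}

/* Convenience macro that captures the caller's location. */
#define TRACE_CLOSE(dcb) trace_close_at((dcb), __FILE__, __LINE__)
```

With this in place, a second `TRACE_CLOSE(dcb)` logs both locations instead of aborting, which narrows down which code path closed the DCB first.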


            People

              Assignee: markus makela
              Reporter: Marlin Cremers (mcremers)