MariaDB Server > MDEV-7026

Race in InnoDB/XtraDB mutex implementation can stall or hang the server

Details

    Description

      This failure is seen in Buildbot. It is a race that occurs somewhat rarely,
      but it can be consistently reproduced manually by running tests in a loop.

      The failure can occur in different tests, for example like this:

      http://buildbot.askmonty.org/buildbot/builders/bintar-trusty-p8-debug/builds/173

      worker[2] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.2 - pid: 31323, winpid: 31323] to create a pid file.
      rpl.rpl_incident 'stmt'                  w2 [ fail ]
              Test ended at 2014-10-31 12:04:27
       
      CURRENT_TEST: rpl.rpl_incident
       
       
      Failed to start mysqld.2
      mysqltest failed but provided no output
      The result from queries just before the failure was:
      < snip >
      call mtr.add_suppression("Slave SQL.*The incident LOST_EVENTS occured on the master.* 1590");
      include/wait_for_slave_sql_error.inc [errno=1590]
      Last_SQL_Error = 'The incident LOST_EVENTS occured on the master. Message: <none>'
      **** On Slave ****
      SELECT * FROM t1;
      a
      1
      2
      3
      SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
      START SLAVE;
      SELECT * FROM t1;
      a
      1
      2
      3
      4
      include/check_slave_is_running.inc
      DROP TABLE t1;
      include/rpl_end.inc

      More failures are seen in a cross-reference search:

      http://buildbot.askmonty.org/buildbot/reports/cross_reference#branch=&revision=&platform=-p8&dt=&bbnum=&typ=&info=&fail_name=&fail_variant=&fail_info_short=&fail_info_full=Failed+to+start+mysqld&limit=100

      It looks like it only fails in debug builds on Power8, but this is not 100%
      confirmed.

      Looking into the error log, we consistently see that it hangs during creation
      of the Doublewrite buffer:

      InnoDB: Doublewrite buffer not found: creating new

      Normally a message follows stating that the doublewrite buffer was created
      successfully, but when the error occurs, the server instead hangs at this
      point.

      Here is a GDB stacktrace from when it hangs:

      Thread 1 (Thread 0x3fff82fb5d90 (LWP 74233)):
      #0  0x00003fff82f22190 in __pthread_cond_wait (cond=0x10029bf3150, mutex=0x10029bf30c8) at pthread_cond_wait.c:187
      #1  0x0000000010b4f0ac in safe_cond_wait (cond=0x10029bf3150, mp=0x10029bf30a0, file=0x10d1a638 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c", line=207)
          at /home/skysql/bzr/knielsen/5.5/mysys/thr_mutex.c:493
      #2  0x0000000010a7eae4 in os_cond_wait (cond=0x10029bf3150, mutex=0x10029bf30a0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c:207
      #3  0x0000000010a7f6d8 in os_event_wait_low (event=0x10029bf30a0, reset_sig_count=2) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c:609
      #4  0x00000000108f544c in sync_array_wait_event (arr=0x100292584d0, index=0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/sync/sync0arr.c:459
      #5  0x00000000108f9a20 in mutex_spin_wait (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/sync/sync0sync.c:636
      #6  0x00000000109e48d4 in mutex_enter_func (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/include/sync0sync.ic:222
      #7  0x00000000109e4a18 in pfs_mutex_enter_func (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/include/sync0sync.ic:251
      #8  0x00000000109ee980 in fil_mutex_enter_and_prepare_for_io (space_id=0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c:985
      #9  0x00000000109f97c8 in _fil_io (type=11, sync=0, space_id=0, zip_size=0, block_offset=157, byte_offset=0, len=16384, buf=0x3fff7b538000, message=0x3fff7af542c0, trx=0x0)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c:5446
      #10 0x00000000109933fc in buf_flush_write_block_low (bpage=0x3fff7af542c0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1192
      #11 0x0000000010993c64 in buf_flush_page (buf_pool=0x10029c07798, bpage=0x3fff7af542c0, flush_type=BUF_FLUSH_LIST) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1400
      #12 0x0000000010994010 in buf_flush_try_neighbors (space=0, offset=157, flush_type=BUF_FLUSH_LIST, n_flushed=48, n_to_flush=18446744073709551614)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1541
      #13 0x00000000109944c8 in buf_flush_page_and_try_neighbors (bpage=0x3fff7af542c0, flush_type=BUF_FLUSH_LIST, n_to_flush=18446744073709551614, count=0x3fffd3831ed0)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1638
      #14 0x00000000109949dc in buf_flush_flush_list_batch (buf_pool=0x10029c07798, min_n=18446744073709551614, lsn_limit=18446744073709551615) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1768
      #15 0x0000000010994b48 in buf_flush_batch (buf_pool=0x10029c07798, flush_type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=18446744073709551615)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1848
      #16 0x00000000109952a8 in buf_flush_list (min_n=18446744073709551614, lsn_limit=18446744073709551615) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:2065
      #17 0x0000000010a53a1c in log_preflush_pool_modified_pages (new_oldest=18446744073709551615, sync=1) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/log/log0log.c:1874
      #18 0x0000000010a548c8 in log_make_checkpoint_at (lsn=18446744073709551615, write_always=1) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/log/log0log.c:2313
      #19 0x000000001091f0e8 in trx_sys_create_doublewrite_buf () at /home/skysql/bzr/knielsen/5.5/storage/xtradb/trx/trx0sys.c:407
      #20 0x00000000108f2fbc in innobase_start_or_create_for_mysql () at /home/skysql/bzr/knielsen/5.5/storage/xtradb/srv/srv0start.c:2080
      #21 0x0000000010867b90 in innobase_init (p=0x100292581c0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/handler/ha_innodb.cc:3225
      #22 0x000000001047364c in ha_initialize_handlerton (plugin=0x100291a3d68) at /home/skysql/bzr/knielsen/5.5/sql/handler.cc:472
      #23 0x0000000010255b68 in plugin_initialize (tmp_root=0x3fffd3833690, plugin=0x100291a3d68, argc=0x11126e70 <remaining_argc>, argv=0x10029128880, options_only=false)
          at /home/skysql/bzr/knielsen/5.5/sql/sql_plugin.cc:1372
      #24 0x0000000010256718 in plugin_init (argc=0x11126e70 <remaining_argc>, argv=0x10029128880, flags=0) at /home/skysql/bzr/knielsen/5.5/sql/sql_plugin.cc:1613
      #25 0x000000001015a080 in init_server_components () at /home/skysql/bzr/knielsen/5.5/sql/mysqld.cc:4337
      #26 0x000000001015b808 in mysqld_main (argc=100, argv=0x10029128880) at /home/skysql/bzr/knielsen/5.5/sql/mysqld.cc:4934
      #27 0x00000000101509d8 in main (argc=15, argv=0x3fffd3833f98) at /home/skysql/bzr/knielsen/5.5/sql/main.cc:25

      Here is how I reproduced it, running the test in a loop and waiting for it to
      hang:

      bzr branch lp:maria/5.5
      cd 5.5
      export PATH=/opt/at7.0/bin:$PATH EXTRA_FLAGS="-O2 -Wuninitialized -DFORCE_INIT_OF_VARS" EXTRA_CONFIGS="--with-embedded-privilege-control" AM_EXTRA_MAKEFLAGS="VERBOSE=1"
      cmake . -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=ON
      time make -j4 package
      (cd mysql-test && for i in `seq 40000` ; do echo XXX $i XXX ; time ./mtr --parallel=4 rpl.last_insert_id rpl.rpl_row_blob_innodb --verbose-restart || exit 1 ; done)

      Attachments

        Issue Links

          Activity

            knielsen Kristian Nielsen created issue -
            svoj Sergey Vojtovich made changes -
            knielsen Kristian Nielsen made changes -

            knielsen Kristian Nielsen added a comment -

            The problem is apparently that a waiting thread is not woken up when an
            InnoDB mutex is released:

            	while (!event->is_set && event->signal_count == reset_sig_count)
            		os_cond_wait(&(event->cond_var), &(event->os_mutex));

            (gdb) p event->is_set
            $3 = 0
            (gdb) p event->signal_count
            $4 = 2
            (gdb) p reset_signal_count
            No symbol "reset_signal_count" in current context.
            (gdb) p reset_sig_count
            $5 = 2
            (gdb) p !event->is_set && event->signal_count == reset_sig_count
            $6 = 1

            The wakeup condition is not true...

            (gdb) p *mutex
            $8 = {event = 0x10029bf30a0, lock_word = 0 '\000', waiters = 1, list = {prev = 0x10029bf2f70, next = 0x11953928 <srv_misc_tmpfile_mutex>}, 
              cfile_name = 0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", cline = 1673, thread_id = 18446744073709551615, magic_n = 979585, count_os_wait = 1, count_using = 1611, 
              count_spin_loop = 6, count_spin_rounds = 175, count_os_yield = 3, lspent_time = 0, lmax_spent_time = 0, mutex_type = 0, cmutex_name = 0x10d02228 "&fil_system->mutex", pfs_psi = 0x3fff81ff2330}
            (gdb) p mutex->lock_word
            $9 = 0 '\000'

            But the mutex is not locked (lock_word == 0).

            Looks like some issue with the InnoDB mutex implementation...
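The state captured above (the mutex free with lock_word == 0, yet waiters == 1 and the thread asleep with no signal pending) matches a lost-wakeup interleaving in which the releaser's read of waiters is reordered before its store to lock_word. A minimal, deterministic replay of that suspected interleaving; the names (mutex_state, replay_lost_wakeup) are illustrative, not the actual XtraDB symbols:

```c
/* Deterministic replay of the suspected lost-wakeup interleaving.
   The releaser's waiters load is performed BEFORE its lock_word store
   becomes visible -- exactly the reordering that a missing full memory
   barrier permits on a weakly ordered CPU. */
#include <assert.h>

struct mutex_state {
    int lock_word;   /* 1 = locked, 0 = free */
    int waiters;     /* waiter announced itself before sleeping */
    int event_set;   /* os_event was signaled */
    int sleeping;    /* waiter blocked in sync_array_wait_event() */
};

static void replay_lost_wakeup(struct mutex_state *s)
{
    /* Releaser, step 1 (reordered): reads waiters early, sees 0. */
    int observed_waiters = s->waiters;

    /* Waiter: announces itself, re-checks the lock, goes to sleep. */
    s->waiters = 1;
    if (s->lock_word != 0)      /* lock still looks held */
        s->sleeping = 1;        /* os_event_wait_low() */

    /* Releaser, step 2: the lock_word store finally completes, but
       the decision whether to signal was already made on stale data. */
    s->lock_word = 0;
    if (observed_waiters)
        s->event_set = 1;       /* never executed: the wakeup is lost */
}
```

After the replay the state is exactly what GDB shows above: the mutex is free, waiters is set, and the thread sleeps with no signal pending.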

            svoj Sergey Vojtovich added a comment -

            There is a comment in mutex_exit_func():

                    /* A problem: we assume that mutex_reset_lock word
                    is a memory barrier, that is when we read the waiters
                    field next, the read must be serialized in memory
                    after the reset. A speculative processor might
                    perform the read first, which could leave a waiting
                    thread hanging indefinitely.
             
                    Our current solution call every second
                    sync_arr_wake_threads_if_sema_free()
                    to wake up possible hanging threads if
                    they are missed in mutex_signal_object. */

            Even though we added "os_isync" between the "lock_word" store and the
            "waiters" load, it is not enough. Only a full memory barrier guarantees
            StoreLoad ordering.

            So we still rely on the error monitor thread to wake up such stale
            threads. But the error monitor thread uses mutexes itself and may get
            stuck just like other threads. To work around this, Monty added
            log_get_lsn_nowait(), which basically does a trylock instead of a lock.
            This was added in revision 4271
            (monty@mariadb.org-20140819162835-sorv0ogd39f7mui8) of 5.5.

            In this case I suspect the error monitor thread is either not yet
            started or is stuck on a different mutex (other than the log_get_lsn
            mutex).
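The StoreLoad requirement can be illustrated with the classic store-buffering litmus test, sketched here with C11 atomics (toy code, not XtraDB): each thread plays a releaser that stores its own flag (the "lock_word" role) and then loads the other thread's flag (the "waiters" role). With seq_cst fences between the store and the load, the outcome r1 == 0 && r2 == 0 is forbidden by the C11 memory model; a lighter acquire/release fence, in the spirit of os_isync, would not rule it out.

```c
/* Store-buffering litmus test: with a seq_cst (full) fence between each
   thread's store and load, both threads cannot miss each other's store. */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2;

static void *t1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);   /* "lock_word" store */
    atomic_thread_fence(memory_order_seq_cst);            /* full barrier */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);  /* "waiters" load */
    return NULL;
}

static void *t2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Run one iteration; returns 1 if the forbidden outcome was observed. */
static int sb_iteration(void)
{
    pthread_t a, b;
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return r1 == 0 && r2 == 0;
}
```

Downgrading the fences to memory_order_release/memory_order_acquire re-admits the r1 == r2 == 0 outcome, which is the lost-wakeup window in the mutex code.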
            svoj Sergey Vojtovich added a comment - See also https://mariadb.atlassian.net/browse/MDEV-6654
            svoj Sergey Vojtovich made changes -
            elenst Elena Stepanova made changes -
            Assignee Jan Lindström [ jplindst ]
            knielsen Kristian Nielsen added a comment - - edited

            I wrote up a detailed analysis on maria-developers@ of what appears to be the
            cause of the bug.

            https://lists.launchpad.net/maria-developers/msg07860.html

            Incidentally, it appears that this same bug is also what is
            causing various strange hangs for a number of users lately.

            I agree that in this case, the error monitor thread is not yet started (it was
            not seen in the full stack trace). And in the user hang cases, that the error
            monitor thread itself gets stuck on a mutex (I saw this in the case I
            investigated).

            However, I strongly disagree that the whole idea of using error monitor
            thread, or trylock, is in any way an acceptable solution. If the mutex
            implementation is broken, it should be fixed, not worked around with horrible
            hacks.

            serg Sergei Golubchik made changes -
            Fix Version/s 10.0 [ 16000 ]
            Fix Version/s 5.5 [ 15800 ]
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Blocker [ 1 ]
            serg Sergei Golubchik made changes -
            Affects Version/s 10.0.14 [ 17101 ]
            Affects Version/s 5.5.40 [ 17100 ]
            Affects Version/s 5.5 [ 15800 ]
            Affects Version/s 10.0 [ 16000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Sergey Vojtovich [ svoj ]

            svoj Sergey Vojtovich added a comment -

            Kristian, please review the fix for this bug.
            svoj Sergey Vojtovich made changes -
            Assignee Sergey Vojtovich [ svoj ] Kristian Nielsen [ knielsen ]
            Status Open [ 1 ] In Review [ 10002 ]

            svoj Sergey Vojtovich added a comment -

            ...second patch:

            Date: Wed, 12 Nov 2014 11:49:27 +0000 (UTC)
            From: Sergey Vojtovich <svoj@mariadb.org>
            To: commits@mariadb.org
            Subject: [Commits] Rev 4346: MDEV-7026 - Occasional hang during startup on Power8 in lp:maria/5.5
            knielsen Kristian Nielsen added a comment - - edited

            Review sent to maria-developers@:
            https://lists.launchpad.net/maria-developers/msg07896.html
            knielsen Kristian Nielsen made changes -
            Assignee Kristian Nielsen [ knielsen ] Sergey Vojtovich [ svoj ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            svoj Sergey Vojtovich added a comment -

            Kristian, please review the updated patch.
            svoj Sergey Vojtovich made changes -
            Assignee Sergey Vojtovich [ svoj ] Kristian Nielsen [ knielsen ]
            Status Stalled [ 10000 ] In Review [ 10002 ]
            knielsen Kristian Nielsen added a comment - New review sent to maria-developers@: https://lists.launchpad.net/maria-developers/msg07913.html
            knielsen Kristian Nielsen made changes -
            Assignee Kristian Nielsen [ knielsen ] Sergey Vojtovich [ svoj ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            knielsen Kristian Nielsen added a comment - - edited

            Patch to clarify suggestions in previously made review:

            http://lists.askmonty.org/pipermail/commits/2014-November/007050.html
            lp:~maria-captains/maria/5.5-mdev7026

            knielsen Kristian Nielsen made changes -
            Summary Occasional hang during startup on Power8 Race in InnoDB/XtraDB mutex implementation can stall or hang the server

            svoj Sergey Vojtovich added a comment -

            Fixed in 10.0.15, 5.5.41:

            revno: 4367
            revision-id: knielsen@knielsen-hq.org-20141119125646-njq02h7irmu9s71b
            parent: sergii@pisem.net-20141118231952-fk3o69075dn3ravv
            committer: Kristian Nielsen <knielsen@knielsen-hq.org>
            branch nick: mariadb-5.5
            timestamp: Wed 2014-11-19 13:56:46 +0100
            message:
              MDEV-7026: Race in InnoDB/XtraDB mutex implementation can stall or hang the server.
             
              The bug was that full memory barrier was missing in the code that ensures that
              a waiter on an InnoDB mutex will not go to sleep unless it is guaranteed to be
              woken up again by another thread currently holding the mutex. This made
              possible a race where a thread could get stuck waiting for a mutex that is in
              fact no longer locked. If that thread was also holding other critical locks,
              this could stall the entire server. There is an error monitor thread that can
              break the stall, it runs about once per second. But if the error monitor
              thread itself got stuck or was not running, then the entire server could hang
              infinitely.
             
              This was introduced on i386/amd64 platforms in 5.5.40 and 10.0.13 by an
              incorrect patch that tried to fix the similar problem for PowerPC.
             
              This commit reverts the incorrect PowerPC patch, and instead implements a fix
              for PowerPC that does not change i386/amd64 behaviour, making PowerPC work
              similarly to i386/amd64.

            svoj Sergey Vojtovich made changes -
            Component/s Platform Power [ 10137 ]
            Component/s Storage Engine - InnoDB [ 10129 ]
            Component/s Storage Engine - XtraDB [ 10135 ]
            Fix Version/s 5.5.41 [ 17600 ]
            Fix Version/s 10.0.15 [ 17300 ]
            Fix Version/s 5.5 [ 15800 ]
            Fix Version/s 10.0 [ 16000 ]
            Resolution Fixed [ 1 ]
Status: Stalled [ 10000 ] → Closed [ 6 ]
            stewart-ibm Stewart Smith added a comment -

The use of __sync_synchronize() here is incorrect: it generates a sync instruction, a heavyweight barrier that is only needed if you're writing device drivers.

__sync_lock_release() is sufficient.

Realistically, __atomic_test_and_set() and __atomic_clear() should be used these days.


svoj Sergey Vojtovich added a comment -

stewart-ibm, right. A good mutex implementation should never need to issue a heavyweight sync.

In 5.5 and 10.0 the old mutex implementation was kept intact as much as possible, and for that implementation this sync is a must.

In 10.1 there was an aim (MDEV-6654) to reimplement InnoDB mutexes. This was suspended because InnoDB also reimplemented its mutexes in MySQL 5.7. We'll probably get back to this when we have InnoDB from 5.7.

            stewart-ibm Stewart Smith added a comment -

You should be able to remove the __sync_synchronize() and everything should still work. There's no reason for the sync instruction to be there.


knielsen Kristian Nielsen added a comment -

I do not plan to revisit this issue myself. But be sure to read the mailing list discussion on this in the archive; there was a very detailed discussion of what kinds of barriers are needed in the current implementation, and why.


svoj Sergey Vojtovich added a comment -

stewart-ibm, we added __sync_synchronize() to make sure "lock_word" is stored before "waiters" is loaded. That is to ensure StoreLoad order.

You say it's not needed. If so, how can we ensure StoreLoad order in this case?


            People

              svoj Sergey Vojtovich
              knielsen Kristian Nielsen
