Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7026

Race in InnoDB/XtraDB mutex implementation can stall or hang the server

    XMLWordPrintable

Details

    Description

      This failure is seen in Buildbot. It is a race that occurs somewhat rarely,
      but it can be consistently reproduced manually by running tests in a loop.

      The failure can occur in different tests, for example like this:

      http://buildbot.askmonty.org/buildbot/builders/bintar-trusty-p8-debug/builds/173

      worker[2] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.2 - pid: 31323, winpid: 31323] to create a pid file.
      rpl.rpl_incident 'stmt'                  w2 [ fail ]
              Test ended at 2014-10-31 12:04:27
       
      CURRENT_TEST: rpl.rpl_incident
       
       
      Failed to start mysqld.2
      mysqltest failed but provided no output
      The result from queries just before the failure was:
      < snip >
      call mtr.add_suppression("Slave SQL.*The incident LOST_EVENTS occured on the master.* 1590");
      include/wait_for_slave_sql_error.inc [errno=1590]
      Last_SQL_Error = 'The incident LOST_EVENTS occured on the master. Message: <none>'
      **** On Slave ****
      SELECT * FROM t1;
      a
      1
      2
      3
      SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
      START SLAVE;
      SELECT * FROM t1;
      a
      1
      2
      3
      4
      include/check_slave_is_running.inc
      DROP TABLE t1;
      include/rpl_end.inc

      More failures are seen in a cross-reference search:

      http://buildbot.askmonty.org/buildbot/reports/cross_reference#branch=&revision=&platform=-p8&dt=&bbnum=&typ=&info=&fail_name=&fail_variant=&fail_info_short=&fail_info_full=Failed+to+start+mysqld&limit=100

      It looks like it only fails in debug builds on Power8, but this is not 100%
      confirmed.

      Looking into the error log, we consistently see that it hangs during creation
      of the Doublewrite buffer:

      InnoDB: Doublewrite buffer not found: creating new

      Normally there should be a message after this that the doublewrite buffer was
      created successfully, but when the error occurs, instead the server hangs at
      this point.

      Here is a GDB stacktrace from when it hangs:

      Thread 1 (Thread 0x3fff82fb5d90 (LWP 74233)):
      #0  0x00003fff82f22190 in __pthread_cond_wait (cond=0x10029bf3150, mutex=0x10029bf30c8) at pthread_cond_wait.c:187
      #1  0x0000000010b4f0ac in safe_cond_wait (cond=0x10029bf3150, mp=0x10029bf30a0, file=0x10d1a638 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c", line=207)
          at /home/skysql/bzr/knielsen/5.5/mysys/thr_mutex.c:493
      #2  0x0000000010a7eae4 in os_cond_wait (cond=0x10029bf3150, mutex=0x10029bf30a0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c:207
      #3  0x0000000010a7f6d8 in os_event_wait_low (event=0x10029bf30a0, reset_sig_count=2) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/os/os0sync.c:609
      #4  0x00000000108f544c in sync_array_wait_event (arr=0x100292584d0, index=0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/sync/sync0arr.c:459
      #5  0x00000000108f9a20 in mutex_spin_wait (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/sync/sync0sync.c:636
      #6  0x00000000109e48d4 in mutex_enter_func (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/include/sync0sync.ic:222
      #7  0x00000000109e4a18 in pfs_mutex_enter_func (mutex=0x10029bf2ed8, file_name=0x10d01150 "/home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c", line=985)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/include/sync0sync.ic:251
      #8  0x00000000109ee980 in fil_mutex_enter_and_prepare_for_io (space_id=0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c:985
      #9  0x00000000109f97c8 in _fil_io (type=11, sync=0, space_id=0, zip_size=0, block_offset=157, byte_offset=0, len=16384, buf=0x3fff7b538000, message=0x3fff7af542c0, trx=0x0)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/fil/fil0fil.c:5446
      #10 0x00000000109933fc in buf_flush_write_block_low (bpage=0x3fff7af542c0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1192
      #11 0x0000000010993c64 in buf_flush_page (buf_pool=0x10029c07798, bpage=0x3fff7af542c0, flush_type=BUF_FLUSH_LIST) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1400
      #12 0x0000000010994010 in buf_flush_try_neighbors (space=0, offset=157, flush_type=BUF_FLUSH_LIST, n_flushed=48, n_to_flush=18446744073709551614)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1541
      #13 0x00000000109944c8 in buf_flush_page_and_try_neighbors (bpage=0x3fff7af542c0, flush_type=BUF_FLUSH_LIST, n_to_flush=18446744073709551614, count=0x3fffd3831ed0)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1638
      #14 0x00000000109949dc in buf_flush_flush_list_batch (buf_pool=0x10029c07798, min_n=18446744073709551614, lsn_limit=18446744073709551615) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1768
      #15 0x0000000010994b48 in buf_flush_batch (buf_pool=0x10029c07798, flush_type=BUF_FLUSH_LIST, min_n=18446744073709551614, lsn_limit=18446744073709551615)
          at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:1848
      #16 0x00000000109952a8 in buf_flush_list (min_n=18446744073709551614, lsn_limit=18446744073709551615) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/buf/buf0flu.c:2065
      #17 0x0000000010a53a1c in log_preflush_pool_modified_pages (new_oldest=18446744073709551615, sync=1) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/log/log0log.c:1874
      #18 0x0000000010a548c8 in log_make_checkpoint_at (lsn=18446744073709551615, write_always=1) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/log/log0log.c:2313
      #19 0x000000001091f0e8 in trx_sys_create_doublewrite_buf () at /home/skysql/bzr/knielsen/5.5/storage/xtradb/trx/trx0sys.c:407
      #20 0x00000000108f2fbc in innobase_start_or_create_for_mysql () at /home/skysql/bzr/knielsen/5.5/storage/xtradb/srv/srv0start.c:2080
      #21 0x0000000010867b90 in innobase_init (p=0x100292581c0) at /home/skysql/bzr/knielsen/5.5/storage/xtradb/handler/ha_innodb.cc:3225
      #22 0x000000001047364c in ha_initialize_handlerton (plugin=0x100291a3d68) at /home/skysql/bzr/knielsen/5.5/sql/handler.cc:472
      #23 0x0000000010255b68 in plugin_initialize (tmp_root=0x3fffd3833690, plugin=0x100291a3d68, argc=0x11126e70 <remaining_argc>, argv=0x10029128880, options_only=false)
          at /home/skysql/bzr/knielsen/5.5/sql/sql_plugin.cc:1372
      #24 0x0000000010256718 in plugin_init (argc=0x11126e70 <remaining_argc>, argv=0x10029128880, flags=0) at /home/skysql/bzr/knielsen/5.5/sql/sql_plugin.cc:1613
      #25 0x000000001015a080 in init_server_components () at /home/skysql/bzr/knielsen/5.5/sql/mysqld.cc:4337
      #26 0x000000001015b808 in mysqld_main (argc=100, argv=0x10029128880) at /home/skysql/bzr/knielsen/5.5/sql/mysqld.cc:4934
      #27 0x00000000101509d8 in main (argc=15, argv=0x3fffd3833f98) at /home/skysql/bzr/knielsen/5.5/sql/main.cc:25

      Here is how I reproduced it, running the test in a loop and waiting for it to
      hang:

      bzr branch lp:maria/5.5
      cd 5.5
      export PATH=/opt/at7.0/bin:$PATH EXTRA_FLAGS="-O2 -Wuninitialized -DFORCE_INIT_OF_VARS" EXTRA_CONFIGS="--with-embedded-privilege-control" AM_EXTRA_MAKEFLAGS="VERBOSE=1"
      cmake . -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=ON
      time make -j4 package
      (cd mysql-test && for i in `seq 40000` ; do echo XXX $i XXX ; time ./mtr --parallel=4 rpl.last_insert_id rpl.rpl_row_blob_innodb --verbose-restart || exit 1 ; done)

      Attachments

        Issue Links

          Activity

            People

              svoj Sergey Vojtovich
              knielsen Kristian Nielsen
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.