Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13497

concurrently used sequence hangs in SEQUENCE::next_value

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.3.0
    • 10.3.1
    • Sequences
    • None

    Description

      A workload where many workers concurrently pull a number from a InnoDB sequence and insert the number into another InnoDB table using transactions, occasionally causes all worker threads to hang in SEQUENCE::next_value. Everything is fine when no transactions are used (autocommit) or when the sequence object is using the MyISAM engine.

      How to repeat:

      You need Lua-enabled sysbench, i.e. from https://github.com/hgxl64/sysbench-mariadb.git. Then use the attached test script sequence_native.lua:

      mysql -S ... -u root -e "drop database if exists sbtest"
      mysql -S ... -u root -e "create database sbtest"
      sysbench --test=sequence_native.lua --oltp-tables-count=32 \
       --mysql-table-engine=InnoDB --sequence-cache=10 \
       --mysql-user=root --mysql-socket=... prepare
      sysbench --test=sequence_native.lua --oltp-tables-count=32 \
       --trx-size=10 --num-threads=2 --report-interval=1 --max-time=0 \
       --max-requests=0 --mysql-user=root --mysql-socket=... run
      

      You might need to alter the sequence cache size or number of threads. The Lua script understands a few options:
      --sequence-cache is used for the CACHE option when creating the sequence
      --sequence-engine is used for the ENGINE option of the sequence
      --trx-size gives the number of inserts in each transaction

      My guess is, that something goes wrong if the sequence cache runs empty for multiple worker threads and they have to dive for the base table at the same time. The bigger the sequence cache is, the more worker threads are needed to trigger the hang. I.e. with Cache=1000 I need 16 threads.

      Attached is also my.cnf and a stack trace from a hanging mysqld.

      Attachments

        1. hang.trace.txt.gz
          8 kB
        2. my.cnf
          0.3 kB
        3. sequence_native.lua
          2 kB

        Issue Links

          Activity

            This looks like an InnoDB bug. From the hang.trace.txt.gz:

            #4  0x00007fde473f63ab in lock_wait_suspend_thread (thr=thr@entry=0x7fdb8801d8a0) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/lock/lock0wait.cc:337
            #5  0x00007fde47486b77 in row_mysql_handle_errors (new_err=new_err@entry=0x7fdbec090d30, trx=trx@entry=0x7fde4409ed58, thr=thr@entry=0x7fdb8801d8a0, savept=savept@entry=0x0) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/row/row0mysql.cc:795
            #6  0x00007fde474a75ea in row_search_mvcc (buf=buf@entry=0x7fdb88018460 "\377\301\031\262", mode=mode@entry=PAGE_CUR_G, prebuilt=0x7fdb8801cca0, match_mode=match_mode@entry=0, direction=direction@entry=0) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/row/row0sel.cc:5653
            #7  0x00007fde473b210a in ha_innobase::index_read (this=0x7fdb88017830, buf=0x7fdb88018460 "\377\301\031\262", key_ptr=<optimized out>, key_len=<optimized out>, find_flag=<optimized out>) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/handler/ha_innodb.cc:9945
            #8  0x00007fde4739acb7 in ha_innobase::index_first (this=<optimized out>, buf=<optimized out>) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/handler/ha_innodb.cc:10373
            #9  0x00007fde473b16b9 in ha_innobase::rnd_next (this=0x7fdb88017830, buf=<optimized out>) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/handler/ha_innodb.cc:10473
            #10 0x00007fde47217467 in handler::ha_rnd_next (this=this@entry=0x7fdb88017830, buf=0x7fdb88018460 "\377\301\031\262") at /home/axel/mariadb-source/mariadb-10.3.0/sql/handler.cc:2593
            #11 0x00007fde47217535 in handler::update_first_row (this=0x7fdb88017830, new_data=0x7fdb88018420 "\377\211L\262") at /home/axel/mariadb-source/mariadb-10.3.0/sql/handler.cc:6041
            #12 0x00007fde477a7d18 in ha_sequence::write_row (this=0x7fdb88017fb0, buf=0x7fdb88018420 "\377\211L\262") at /home/axel/mariadb-source/mariadb-10.3.0/sql/ha_sequence.cc:216
            

            There should be no InnoDB table or record locking for the SEQUENCE pseudo-table.

            marko Marko Mäkelä added a comment - This looks like an InnoDB bug. From the hang.trace.txt.gz : #4 0x00007fde473f63ab in lock_wait_suspend_thread (thr=thr@entry=0x7fdb8801d8a0) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/lock/lock0wait.cc:337 #5 0x00007fde47486b77 in row_mysql_handle_errors (new_err=new_err@entry=0x7fdbec090d30, trx=trx@entry=0x7fde4409ed58, thr=thr@entry=0x7fdb8801d8a0, savept=savept@entry=0x0) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/row/row0mysql.cc:795 #6 0x00007fde474a75ea in row_search_mvcc (buf=buf@entry=0x7fdb88018460 "\377\301\031\262", mode=mode@entry=PAGE_CUR_G, prebuilt=0x7fdb8801cca0, match_mode=match_mode@entry=0, direction=direction@entry=0) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/row/row0sel.cc:5653 #7 0x00007fde473b210a in ha_innobase::index_read (this=0x7fdb88017830, buf=0x7fdb88018460 "\377\301\031\262", key_ptr=<optimized out>, key_len=<optimized out>, find_flag=<optimized out>) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/handler/ha_innodb.cc:9945 #8 0x00007fde4739acb7 in ha_innobase::index_first (this=<optimized out>, buf=<optimized out>) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/handler/ha_innodb.cc:10373 #9 0x00007fde473b16b9 in ha_innobase::rnd_next (this=0x7fdb88017830, buf=<optimized out>) at /home/axel/mariadb-source/mariadb-10.3.0/storage/innobase/handler/ha_innodb.cc:10473 #10 0x00007fde47217467 in handler::ha_rnd_next (this=this@entry=0x7fdb88017830, buf=0x7fdb88018460 "\377\301\031\262") at /home/axel/mariadb-source/mariadb-10.3.0/sql/handler.cc:2593 #11 0x00007fde47217535 in handler::update_first_row (this=0x7fdb88017830, new_data=0x7fdb88018420 "\377\211L\262") at /home/axel/mariadb-source/mariadb-10.3.0/sql/handler.cc:6041 #12 0x00007fde477a7d18 in ha_sequence::write_row (this=0x7fdb88017fb0, buf=0x7fdb88018420 "\377\211L\262") at /home/axel/mariadb-source/mariadb-10.3.0/sql/ha_sequence.cc:216 There should be no InnoDB table or record locking for the SEQUENCE pseudo-table.

            I implemented the InnoDB part of SEQUENCE in MDEV-10139 in the bb-10.2-ext branch, which is in the merge workflow: 10.2→bb-10.2-ext→10.3.
            Since the last merge from bb-10.2-ext before the 10.3.0 release, there are multiple fixes to SEQUENCE bugs.

            Because the test did not involve restarting the server, we can rule out this bug:
            MDEV-13015 After restart, InnoDB wrongly thinks that a SEQUENCE is a TABLE

            The stack trace indicates that InnoDB wrongly tried to acquire a transactional lock on a sequence. Sequences are not supposed to use any transactions or transactional locks. They are being persisted via the InnoDB redo log. Concurrency control inside InnoDB is supposed to rely solely on the buf_page_t::lock on the page that stores the state of the sequence. The sequence is internally maintained in a 1-page, 1-record clustered index that is updated-in-place.

            I believe that this bug was fixed in this MDEV-10139 follow-up commit that includes the following change:

            diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc
            index 106845f73fa..18ae2a8a4e8 100644
            --- a/storage/innobase/row/row0sel.cc
            +++ b/storage/innobase/row/row0sel.cc
            @@ -4481,8 +4481,6 @@ row_search_mvcc(
             	      || MVCC::is_view_active(trx->read_view)
             	      || srv_read_only_mode);
             
            -	trx_start_if_not_started(trx, false);
            -
             	if (trx->isolation_level <= TRX_ISO_READ_COMMITTED
             	    && prebuilt->select_lock_type != LOCK_NONE
             	    && trx->mysql_thd != NULL
            @@ -4533,9 +4531,14 @@ row_search_mvcc(
             			fputc('\n', stderr);
             			ut_error;
             		}
            +	} else if (prebuilt->table->no_rollback()) {
            +		/* NO_ROLLBACK tables do not support MVCC or locking. */
            +		prebuilt->select_lock_type = LOCK_NONE;
            +		prebuilt->sql_stat_start = FALSE;
             	} else if (prebuilt->select_lock_type == LOCK_NONE) {
             		/* This is a consistent read */
             		/* Assign a read view for the query */
            +		trx_start_if_not_started(trx, false);
             
             		if (!srv_read_only_mode) {
             			trx_assign_read_view(trx);
            @@ -4543,6 +4546,7 @@ row_search_mvcc(
             
             		prebuilt->sql_stat_start = FALSE;
             	} else {
            +		trx_start_if_not_started(trx, false);
             wait_table_again:
             		err = lock_table(0, prebuilt->table,
             				 prebuilt->select_lock_type == LOCK_S
            

            marko Marko Mäkelä added a comment - I implemented the InnoDB part of SEQUENCE in MDEV-10139 in the bb-10.2-ext branch, which is in the merge workflow: 10.2→bb-10.2-ext→10.3. Since the last merge from bb-10.2-ext before the 10.3.0 release , there are multiple fixes to SEQUENCE bugs. Because the test did not involve restarting the server, we can rule out this bug: MDEV-13015 After restart, InnoDB wrongly thinks that a SEQUENCE is a TABLE The stack trace indicates that InnoDB wrongly tried to acquire a transactional lock on a sequence. Sequences are not supposed to use any transactions or transactional locks. They are being persisted via the InnoDB redo log. Concurrency control inside InnoDB is supposed to rely solely on the buf_page_t::lock on the page that stores the state of the sequence. The sequence is internally maintained in a 1-page, 1-record clustered index that is updated-in-place. I believe that this bug was fixed in this MDEV-10139 follow-up commit that includes the following change: diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc index 106845f73fa..18ae2a8a4e8 100644 --- a/storage/innobase/row/row0sel.cc +++ b/storage/innobase/row/row0sel.cc @@ -4481,8 +4481,6 @@ row_search_mvcc( || MVCC::is_view_active(trx->read_view) || srv_read_only_mode); - trx_start_if_not_started(trx, false); - if (trx->isolation_level <= TRX_ISO_READ_COMMITTED && prebuilt->select_lock_type != LOCK_NONE && trx->mysql_thd != NULL @@ -4533,9 +4531,14 @@ row_search_mvcc( fputc('\n', stderr); ut_error; } + } else if (prebuilt->table->no_rollback()) { + /* NO_ROLLBACK tables do not support MVCC or locking. */ + prebuilt->select_lock_type = LOCK_NONE; + prebuilt->sql_stat_start = FALSE; } else if (prebuilt->select_lock_type == LOCK_NONE) { /* This is a consistent read */ /* Assign a read view for the query */ + trx_start_if_not_started(trx, false); if (!srv_read_only_mode) { trx_assign_read_view(trx); @@ -4543,6 +4546,7 @@ row_search_mvcc( prebuilt->sql_stat_start = FALSE; } else { + trx_start_if_not_started(trx, false); wait_table_again: err = lock_table(0, prebuilt->table, prebuilt->select_lock_type == LOCK_S

            axel, please confirm if this bug is repeatable with

            git checkout 8acf4d6f783e5a5f736624064989b88867143685~
            

            and fixed in

            git checkout 8acf4d6f783e5a5f736624064989b88867143685
            

            If this is the case, please close the bug as fixed in 10.3.1, because that fix is present in 10.3.1.

            marko Marko Mäkelä added a comment - axel , please confirm if this bug is repeatable with git checkout 8acf4d6f783e5a5f736624064989b88867143685~ and fixed in git checkout 8acf4d6f783e5a5f736624064989b88867143685 If this is the case, please close the bug as fixed in 10.3.1, because that fix is present in 10.3.1.

            I confirmed that the bug was fixed by commit 8acf4d6f783e5a5f736624064989b88867143685. I can immediately reproduce the hang with the parent of that commit, and not with that commit.

            marko Marko Mäkelä added a comment - I confirmed that the bug was fixed by commit 8acf4d6f783e5a5f736624064989b88867143685 . I can immediately reproduce the hang with the parent of that commit, and not with that commit.

            People

              axel Axel Schwenke
              axel Axel Schwenke
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.