Details

    Description

      We have a server running several MariaDB (10.0.22) processes, with tables using the TokuDB engine (tokudb-7.5.7), replicating from a master in a normal master-slave setup.

      We realised that one of the processes was lagging a lot and found a query that was stuck. The process was completely stuck (STOP SLAVE didn't even work; it just hung trying to kill the replication thread). The only solution was to kill -9 mysqld.

      After migrating the table to InnoDB the problem appears to be gone.
      The full bug report is here: https://bugs.launchpad.net/percona-server/+bug/1621852 but Percona also suggested opening an issue here, as it might be something between MariaDB and the TokuDB plugin.

      Attachments

        Activity

          elenst Elena Stepanova added a comment:

          plinux, Percona suspects MariaDB parallel replication to be the cause of the problem. Could you please review their assessment?

          marostegui Manuel Arostegui added a comment:

          Note that we do have parallel replication disabled:

          MariaDB SANITARIUM localhost (none) > show global variables like 'slave_parallel_mode';
          Empty set (0.00 sec)
          yongmei2004 Xie Yongmei added a comment (edited):

          Hi, I will try to explain my understanding of this issue.
          This is Xie Yongmei, from the Alibaba RDS team.

          The root cause of this issue might be the way the rangelock's waiting list is signalled.

          The current rangelock design is as follows (a sketch in code follows this list):

          1) each write transaction must acquire a rangelock before modifying the index tree (actually the FT in TokuDB), to prevent concurrent read/write operations on the index rows.
          2) a read-only query acquires a rangelock in the cursor-get callback, for snapshot reads.

          3) the process of acquiring a rangelock (in toku_db_get_range_lock):
          I. call toku_db_start_range_lock to get the rangelock (in fact, it has trylock semantics)

          • if there is a conflict, it notifies the locktree to track the request in its pending list.

          II. if the lock is granted, or a deadlock is detected, toku_db_start_range_lock just returns.
          III. if there is a conflict, toku_db_get_range_lock calls toku_db_wait_range_lock, which waits on the condition variable in the request's own context.

          4) the process of releasing a rangelock when a transaction commits or aborts:
          I. release the rangelock it held
          II. retry all the rangelock requests waiting on the same locktree

          • signal the condition variable in the request's context for each retry that succeeds.
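
          A minimal sketch of this protocol, using plain pthreads (the struct layouts, the placeholder conflict/retry checks, and the -1 error code are simplified assumptions, not the actual TokuDB code):

          #include <pthread.h>
          #include <time.h>
          #include <list>

          // Simplified model of the rangelock protocol described above.
          struct lock_request {
              pthread_cond_t cv = PTHREAD_COND_INITIALIZER; // per-request condition
                                                            // variable, defined in the
                                                            // request's own context
              bool granted = false;
          };

          struct locktree_info {
              pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; // centralized mutex
              std::list<lock_request*> pending;                  // centralized waiting list
          };

          static bool ranges_conflict(lock_request*) { return true; } // placeholder
          static bool retry_succeeds(lock_request*)  { return true; } // placeholder

          // 3) acquire with trylock semantics: on conflict, enqueue in the pending list
          int start_range_lock(locktree_info* info, lock_request* req) {
              pthread_mutex_lock(&info->mutex);
              if (ranges_conflict(req))
                  info->pending.push_back(req); // tracked now; the wait happens later
              else
                  req->granted = true;
              int r = req->granted ? 0 : -1;    // -1 stands in for "not granted"
              pthread_mutex_unlock(&info->mutex);
              return r;
          }

          // III. wait in the request's own context; note that `granted` is NOT
          // re-checked before sleeping -- this is where the signal can be missed
          void wait_range_lock(locktree_info* info, lock_request* req,
                               const timespec* deadline) {
              pthread_mutex_lock(&info->mutex);
              pthread_cond_timedwait(&req->cv, &info->mutex, deadline);
              pthread_mutex_unlock(&info->mutex);
          }

          // 4) release on commit/abort: retry every waiter, signal the ones that win
          void release_range_lock(locktree_info* info) {
              pthread_mutex_lock(&info->mutex);
              for (auto it = info->pending.begin(); it != info->pending.end(); ) {
                  if (retry_succeeds(*it)) {
                      (*it)->granted = true;
                      pthread_cond_signal(&(*it)->cv); // a no-op if the waiting
                                                       // transaction has not gone
                                                       // to sleep yet
                      it = info->pending.erase(it);
                  } else {
                      ++it;
                  }
              }
              pthread_mutex_unlock(&info->mutex);
          }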

          The following scenario can happen:
          t1: txn1 calls toku_db_start_range_lock, finds a conflict, and has the locktree's pending list track its request.
          t2: txn2 commits; it releases the rangelock that txn1 is waiting for, retries the acquisition on txn1's behalf (possible because the request has been tracked in the locktree's pending list since t1), and signals txn1 to proceed.
          t3: txn1 calls toku_db_wait_range_lock to sleep on its own condition variable; but it has missed the signal, so it won't wake up until the lock-wait timeout expires.

          The example above shows that txn1 can wait for a long time even though no rangelock conflict remains.
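
          The lost wakeup can be reproduced deterministically with plain pthreads, outside of TokuDB; the 5-second deadline below stands in for the lock-wait timeout (an illustration of the pattern, not TokuDB code):

          #include <pthread.h>
          #include <stdio.h>
          #include <time.h>

          static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
          static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
          static bool granted = false;

          static void* releaser(void*) {  // plays txn2 at time t2
              pthread_mutex_lock(&mtx);
              granted = true;             // "rangelock acquired on txn1's behalf"
              pthread_cond_signal(&cv);   // nobody is waiting yet: the signal is lost
              pthread_mutex_unlock(&mtx);
              return nullptr;
          }

          int main() {
              pthread_t t;                // force the t1 < t2 < t3 ordering by letting
              pthread_create(&t, nullptr, releaser, nullptr); // txn2 finish before
              pthread_join(t, nullptr);                       // txn1 starts to wait

              pthread_mutex_lock(&mtx);   // txn1 at time t3
              timespec deadline;
              clock_gettime(CLOCK_REALTIME, &deadline);
              deadline.tv_sec += 5;
              // The buggy pattern: sleeping without re-checking `granted` first.
              // With `while (!granted)` around the wait, we would not sleep at all.
              pthread_cond_timedwait(&cv, &mtx, &deadline); // sleeps the full 5 s
              pthread_mutex_unlock(&mtx);
              printf("woke up after timeout, granted=%d\n", (int)granted);
              return 0;
          }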

          The TokuDB rangelock implementation is unusual: it uses a centralized waiting list (the locktree's pending list) and a centralized mutex, but each rangelock request has its own condition variable, defined in its own context, and sleeps on that condition variable.

          So the wakeup process is tricky: the transaction releasing a rangelock is responsible for acquiring the rangelock on behalf of the blocked transaction and signalling it to proceed.

          A rough workaround is as follows:
          Before sleeping, the request should verify, with m_info->mutex held, whether the rangelock conflict still exists.
          If the conflict has disappeared, remove the request from the locktree's pending list and return granted; otherwise sleep on its condition variable.
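
          In terms of the sketch above, the workaround amounts to re-checking the request's state with m_info->mutex held before going to sleep (again an illustration under the same assumptions, not the actual patch):

          #include <pthread.h>
          #include <time.h>

          struct lock_request {
              pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
              bool granted = false;
          };

          struct locktree_info {
              pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; // m_info->mutex
          };

          // Returns 0 if the lock is (or became) granted, or the ETIMEDOUT value
          // from pthread_cond_timedwait on lock-wait timeout. Removing the request
          // from the locktree's pending list is elided here.
          int wait_range_lock_fixed(locktree_info* info, lock_request* req,
                                    const timespec* deadline) {
              int r = 0;
              pthread_mutex_lock(&info->mutex);
              // Re-check before sleeping: if the conflict disappeared between
              // start_range_lock() and this point, the signal has already been
              // sent, and sleeping now would mean waiting out the full timeout.
              while (!req->granted && r == 0)
                  r = pthread_cond_timedwait(&req->cv, &info->mutex, deadline);
              pthread_mutex_unlock(&info->mutex);
              return req->granted ? 0 : r;
          }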


          marostegui Manuel Arostegui added a comment:

          This is an update from Percona at https://bugs.launchpad.net/bugs/1621852:

          So far we believe this may be related to lock tree stalls. MariaDB
          imported a patch a while ago to address this, and we have been reviewing
          and improving the patch for Percona Server. This work is being tracked
          at https://jira.percona.com/browse/TDB-3; please log in to
          Percona JIRA and 'watch' it for future updates. Marking it as Opinion here
          as there is no other accurately matching state.
          danblack Daniel Black added a comment (edited):

          TDB-3 is fixed. Merged into MariaDB as https://github.com/MariaDB/server/commit/d145d1b6

          Closing may have been overly keen. Please check, but I did follow the patches to the above commit.

          People

            Assignee: plinux Lixun Peng
            Reporter: marostegui Manuel Arostegui