  1. MariaDB Server
  2. MDEV-25590

Deadlock during ongoing transaction and RSU

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5.9
    • Fix Version/s: None
    • Component/s: wsrep
    • Labels:
      None
    • Environment:

      Description

      Test case

      Prerequisite: the Galera provider is loaded. A single cluster node is enough to reproduce the problem.

      session_1:
      create database test;
      use test;
      create table t1(id int primary key auto_increment, k int);
      insert into t1(k) values (1),(2),(3),(101),(102),(103);
      begin;
      update t1 set k=k+1 where id<100;

      session_2:

      use test;
      set wsrep_OSU_method=RSU;
      alter table t1 add key(k);

      session_1:

      commit;

      Result:
      Both sessions hang: session_1 never returns from COMMIT and session_2 never returns from ALTER TABLE.

      Expected result:
      When session_1 commits, session_2 should continue and perform ALTER TABLE.

      I investigated this a little and here are my findings:
      1. session_1: holds the MDL lock.
      2. session_2: RSU starts, which causes a Galera desync and then a pause. The pause enters the LocalOrder lock with seqno N.
      3. session_2: blocks on the MDL lock held by session_1.
      4. session_1: commits. It tries to replicate, but session_2 still holds the LocalOrder lock, so neither session can proceed.
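
The four steps above form a wait-for cycle. A minimal sketch in Python makes the cycle explicit. It is a toy model, not server code: the `holds` and `waits_on` mappings and the `find_cycle` helper are hypothetical names introduced only for illustration.

```python
# Toy model of the findings: which session holds which lock,
# and which lock each session is waiting on.
holds = {
    "MDL lock": "session_1",         # step 1
    "LocalOrder lock": "session_2",  # step 2
}
waits_on = {
    "session_2": "MDL lock",         # step 3
    "session_1": "LocalOrder lock",  # step 4
}

def find_cycle(start):
    """Follow 'waits on lock -> lock held by session' edges from start.

    Returns the sessions that form a cycle, or None if the chain ends
    at a session that is not waiting on anything.
    """
    path = []
    session = start
    while session in waits_on:
        if session in path:
            return path[path.index(session):]
        path.append(session)
        session = holds.get(waits_on[session])
    return None
```

Calling `find_cycle("session_1")` walks session_1 -> LocalOrder lock -> session_2 -> MDL lock -> session_1 and reports both sessions as part of the cycle, which matches the observed hang.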

      I also see that commit 37deed3f37561f264f65e162146bbc2ad35fb1a2 introduced Galera 4. With Galera 3, when we called wsrep_to_isolation_begin(), we set thd->wsrep_exec_mode = TOTAL_ORDER regardless of whether the method was TOI or RSU. Then, when session 2 detected the deadlock, the abort action was performed if thd->wsrep_exec_mode == TOTAL_ORDER. So the abort was performed for both TOI and RSU.
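
The Galera 3 behaviour described above can be sketched as a small Python model. This is a simplification under assumptions, not the server's C++ code: the `Thd` class, the `LOCAL_STATE` placeholder value, and the `should_abort` helper are hypothetical; only `wsrep_exec_mode` and `TOTAL_ORDER` come from the description.

```python
from enum import Enum

class ExecMode(Enum):
    LOCAL_STATE = 1   # placeholder for "not in isolation" (assumption)
    TOTAL_ORDER = 2   # set by wsrep_to_isolation_begin() in Galera 3

class Thd:
    """Minimal stand-in for the server's THD object."""
    def __init__(self):
        self.wsrep_exec_mode = ExecMode.LOCAL_STATE

def to_isolation_begin(thd, osu_method):
    # Galera 3: the same mode is set for both TOI and RSU,
    # so osu_method does not influence the result.
    thd.wsrep_exec_mode = ExecMode.TOTAL_ORDER

def should_abort(thd):
    # The abort decision only looks at the exec mode.
    return thd.wsrep_exec_mode == ExecMode.TOTAL_ORDER
```

In this model `should_abort` returns `True` after `to_isolation_begin(thd, "RSU")` just as it does for `"TOI"`, which is why the RSU case was covered under Galera 3.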

      With Galera 4, wsrep_exec_mode no longer exists.
      Instead, wsrep::client_state::m_toi and wsrep::client_state::m_rsu were introduced, and we set them accordingly in wsrep_to_isolation_begin(). The logic that previously checked thd->wsrep_exec_mode was refactored to check wsrep_thd_is_toi(), which returns true only when wsrep::client_state::m_toi is set.
      This looks like a mechanical refactoring in which wsrep::client_state::m_rsu was simply omitted.
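
To illustrate the suspected omission, here is a toy Python model of the Galera 4 flags. It is a sketch under assumptions, not the actual wsrep-lib or server code: `ClientState`, `abort_check_current`, and `abort_check_with_rsu` are hypothetical names, and the second check is only an illustration of what "also considering m_rsu" would mean, not a confirmed or proposed patch.

```python
class ClientState:
    """Minimal stand-in for wsrep::client_state with its two flags."""
    def __init__(self):
        self.m_toi = False
        self.m_rsu = False

def to_isolation_begin(cs, osu_method):
    # Galera 4: a separate flag is set depending on the OSU method.
    if osu_method == "TOI":
        cs.m_toi = True
    elif osu_method == "RSU":
        cs.m_rsu = True
    else:
        raise ValueError("unknown OSU method: " + osu_method)

def wsrep_thd_is_toi(cs):
    # True only when m_toi is set, as described in the report.
    return cs.m_toi

def abort_check_current(cs):
    # Refactored check: RSU is not considered.
    return wsrep_thd_is_toi(cs)

def abort_check_with_rsu(cs):
    # Hypothetical variant that mirrors the Galera 3 behaviour.
    return cs.m_toi or cs.m_rsu
```

For a client state prepared with `to_isolation_begin(cs, "RSU")`, `abort_check_current(cs)` is `False` while `abort_check_with_rsu(cs)` is `True`; for `"TOI"` both are `True`. That difference is the gap the report points at.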

            People

            Assignee:
            Unassigned
            Reporter:
            Kamil Holubicki
            Votes:
            2
            Watchers:
            5
