MDEV-19749

MDL scalability regression after backup locks

    Description

      Background

      FLUSH TABLES WITH READ LOCK (FTWRL) is a predecessor of the BACKUP STAGE facility (MDEV-5336). Upon FTWRL completion, connections are still able to issue DQL, but DDL and DML are blocked. In other words, no connection is able to modify tables or commit active transactions.

      This was achieved by taking two MDL_lock-s: the global read lock (GLOBAL X-lock) and the global commit lock (COMMIT X-lock). Correspondingly, statements that intend to modify data have to take protection against these locks; a GLOBAL S-lock and a COMMIT S-lock were acquired for this purpose.

      These two locks were separate entities: they didn't share data structures or locking primitives, and thus they were separate contention points.

      With BACKUP STAGE, introduced by commit 7a9dfdd, connections have to take protection against an ongoing FTWRL or BACKUP STAGE. This is in many ways similar to how it used to work before with the GLOBAL S-lock and COMMIT S-lock. The culprit of this regression is that the GLOBAL and COMMIT namespaces were combined into a single BACKUP namespace to make the code simpler. Now we have a single contention point with doubled load on the BACKUP lock internals; in other words, system throughput is halved.

      MDL_lock internals

      For the purpose of protection against an ongoing FTWRL or BACKUP STAGE, MDL_BACKUP_DML/MDL_BACKUP_TRANS_DML and MDL_BACKUP_COMMIT have to be acquired in the BACKUP namespace. These locks are mutually compatible; multiple connections are allowed to hold them concurrently.
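
      For illustration, this is roughly how a statement requests such protection through the MDL API (simplified; exact call sites, lock types and durations vary between versions, so treat the snippet as a sketch rather than the actual code):

      MDL_request protection_request;

      /* Protection lives in the BACKUP namespace; the key carries no db/table name. */
      protection_request.init(MDL_key::BACKUP, "", "", MDL_BACKUP_DML,
                              MDL_STATEMENT);

      /* Blocks only if FTWRL or a conflicting BACKUP STAGE is granted or pending. */
      if (thd->mdl_context.acquire_lock(&protection_request,
                                        thd->variables.lock_wait_timeout))
        return true;                                  /* error or timeout */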

      When there is no FTWRL or BACKUP STAGE ongoing, the critical section is fairly simple, roughly speaking:

      wrlock(&backup->m_rwlock);          /* protects the shared BACKUP MDL_lock state */
      if (!(backup->granted_bitmap & ticket->incompatible_granted_bitmap) &&
          !(backup->waiting_bitmap & ticket->incompatible_waiting_bitmap))
      {
        /* no conflicting granted or pending lock: grant immediately */
        backup->granted_list.add(ticket);
        backup->granted_bitmap|= ticket->type_bit;
      }
      unlock(&backup->m_rwlock);
      

      In other words: make sure there is no ongoing or pending FTWRL or BACKUP STAGE, and add the current connection to the lock holders.

      Proposed solution

      Multi-instance MDL_lock, which gives multiple contention points. Compatible locks (like MDL_BACKUP_COMMIT) will go into one specific instance, whereas heavyweight locks (like COMMIT X-lock aka MDL_BACKUP_WAIT_COMMIT) will have to expose themselves via all instances.

      MDL_BACKUP_COMMIT example:

      backup= backup_instances[connection_id % num_instances]; /* each connection maps to one instance */
      wrlock(&backup->m_rwlock);
      if (!(backup->granted_bitmap & ticket->incompatible_granted_bitmap) &&
          !(backup->waiting_bitmap & ticket->incompatible_waiting_bitmap))
      {
        backup->granted_list.add(ticket);
        backup->granted_bitmap|= ticket->type_bit;
      }
      unlock(&backup->m_rwlock);
      

      MDL_BACKUP_WAIT_COMMIT example (rough sketch; more complex in reality):

      /* heavyweight locks have to be visible in every instance */
      for (i= 0; i < num_instances; i++)
      {
        backup= backup_instances[i];
        wrlock(&backup->m_rwlock);
        if (!(backup->granted_bitmap & ticket->incompatible_granted_bitmap) &&
            !(backup->waiting_bitmap & ticket->incompatible_waiting_bitmap))
        {
          backup->granted_list.add(ticket);
          backup->granted_bitmap|= ticket->type_bit;
        }
        unlock(&backup->m_rwlock);
      }
      

      Alternative solutions

      This can also be fixed by implementing something similar to MySQL WL#7306 "Improve MDL performance and scalability by implementing lock-free lock acquisition for DML". It adds an atomic variable in front of the critical section; based on that atomic variable, the critical section can be skipped when there are no concurrent heavyweight locks (a rough sketch follows the cons below).
      Cons:
      1. overcomplicated heavyweight lock handling: such locks have to materialise tickets that were not added to granted_list for the purpose of deadlock detection
      2. although it is much faster compared to the original critical section, it is still a single contention point
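
      Below is a minimal, self-contained sketch of this fast-path idea in generic C++ (hypothetical class and member names, not the actual WL#7306 or MariaDB code): DML protection is granted by touching an atomic counter, and the mutex-protected critical section is entered only while FTWRL or BACKUP STAGE is granted or pending.

      #include <atomic>
      #include <condition_variable>
      #include <cstdint>
      #include <mutex>

      class backup_lock_sketch
      {
        std::atomic<uint64_t> dml_holders{0};  /* granted MDL_BACKUP_DML-style locks */
        std::atomic<bool> heavyweight{false};  /* FTWRL/BACKUP STAGE granted or pending */
        std::mutex m;
        std::condition_variable cond;

        void notify_if_drained()
        {
          std::lock_guard<std::mutex> lk(m);
          cond.notify_all();
        }

      public:
        void acquire_dml_protection()
        {
          /* Fast path: optimistically register, then verify nothing heavy is around. */
          dml_holders.fetch_add(1, std::memory_order_seq_cst);
          if (!heavyweight.load(std::memory_order_seq_cst))
            return;                            /* common case: critical section skipped */

          /* Back off and take the ordinary slow path under the mutex. */
          release_dml_protection();
          std::unique_lock<std::mutex> lk(m);
          cond.wait(lk, [this] { return !heavyweight.load(); });
          /* heavyweight changes only under the mutex, so this cannot race. */
          dml_holders.fetch_add(1, std::memory_order_relaxed);
        }

        void release_dml_protection()
        {
          /* If FTWRL/BACKUP STAGE is waiting and we were the last holder, wake it. */
          if (dml_holders.fetch_sub(1, std::memory_order_seq_cst) == 1 &&
              heavyweight.load(std::memory_order_seq_cst))
            notify_if_drained();
        }

        void acquire_backup_stage()            /* the heavyweight side, e.g. FTWRL */
        {
          std::unique_lock<std::mutex> lk(m);
          cond.wait(lk, [this] { return !heavyweight.load(); });   /* one at a time */
          heavyweight.store(true, std::memory_order_seq_cst);
          /* New DML protections now divert to the slow path; wait for old ones. */
          cond.wait(lk, [this] { return dml_holders.load() == 0; });
        }

        void release_backup_stage()
        {
          std::lock_guard<std::mutex> lk(m);
          heavyweight.store(false, std::memory_order_seq_cst);
          cond.notify_all();
        }
      };

      Note how con #1 shows up even in this toy version: a ticket granted on the fast path never enters granted_list, so the heavyweight side would have to materialise such locks for the deadlock detector.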

      Complications

      • Galera code in MDL (most probably shouldn't be there)
      • Replication code in MDL (most probably shouldn't be there)
      • MDL deadlock detector

      Extra stuff that should be moved out of the critical section (a minimal generic illustration follows this list)

      • ticket->m_time
      • performance schema handling
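
      A minimal, generic C++ illustration of the idea (hypothetical names, not the actual MariaDB code): per-ticket bookkeeping is done before the lock is taken, so only shared state is touched while serialised.

      #include <chrono>
      #include <mutex>

      struct ticket_sketch
      {
        std::chrono::steady_clock::time_point m_time;   /* creation timestamp */
      };

      void grant_sketch(ticket_sketch *ticket, std::mutex &lock_mutex)
      {
        /* Cheap per-ticket work (timestamps, instrumentation) stays outside... */
        ticket->m_time= std::chrono::steady_clock::now();

        std::lock_guard<std::mutex> guard(lock_mutex);
        /* ...so the critical section only updates the shared lock state. */
      }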

      One can argue that these consume just one CPU tick and are nothing compared to the rest of the critical section. However, the hot path is pretty straightforward too, and here is a story about how villagers lost their happy holiday...

      Once upon a time, 101 bright Villa Riba software developers were standing in a line for a bathroom. For brushing, of course. They spent 3 minutes each on average to complete their things. A full round completes in roughly 300 minutes, with a total wait time of 15150 developer minutes.

      At the same time, 101 bright Villa Baggio software developers were also standing in a line for a bathroom. In addition to 3 minutes of brushing, they did 1 minute of shaving. A full round completes in roughly 400 minutes, with a total wait time of 20200 developer minutes.

      Developers are expensive and they like their pay, so Villa Baggio had to pay for an extra 84 hours. The happy bearded Villa Riba developers are celebrating a fiesta, while the Villa Baggio developers are still queueing for the bathroom, and they don't have money for a fiesta anyway.

      This is the cost of adding small things to critical sections: 1 extra minute in a critical section becomes 84 hours idling for the whole system.
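
      For reference, the arithmetic behind the story: with a per-visit time of t minutes, the i-th developer in line (i = 0..100) waits i*t minutes, so the total wait is t*(0 + 1 + ... + 100) = 5050*t developer minutes. That gives 15150 minutes for t = 3 and 20200 minutes for t = 4; the difference is 5050 minutes, roughly 84 hours.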

            People

              monty Michael Widenius
              svoj Sergey Vojtovich
