Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19749

MDL scalability regression after backup locks

Details

    Description

      In 7a9dfdd GLOBAL and COMMIT namespaces were combined into BACKUP, which doubled load on BACKUP lock mutex.

      Can be fixed by implementing something similar to MySQL WL#7306 "Improve MDL performance and scalability by implementing lock-free lock acquisition for DML".

      Attachments

        Issue Links

          Activity

            svoj Sergey Vojtovich created issue -
            svoj Sergey Vojtovich made changes -
            Field Original Value New Value
            Summary Scalability regression after backup locks MDL scalability regression after backup locks
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            serg Sergei Golubchik made changes -
            Assignee Sergey Vojtovich [ svoj ]
            marko Marko Mäkelä made changes -

            The following patch has been used during some performance testing to remove the scalability bottleneck due to backup locks, so that we can better highlight scalability bottlenecks inside storage engines.

            diff --git a/sql/handler.cc b/sql/handler.cc
            index eb7b5b8..012ef20 100644
            --- a/sql/handler.cc
            +++ b/sql/handler.cc
            @@ -1567,7 +1567,7 @@ int ha_commit_trans(THD *thd, bool all)
               DBUG_PRINT("info", ("is_real_trans: %d  rw_trans:  %d  rw_ha_count: %d",
                                   is_real_trans, rw_trans, rw_ha_count));
             
            -  if (rw_trans)
            +  if (0 && rw_trans)
               {
                 /*
                   Acquire a metadata lock which will ensure that COMMIT is blocked
            diff --git a/sql/sql_base.cc b/sql/sql_base.cc
            index c41e08e..f9e3f34 100644
            --- a/sql/sql_base.cc
            +++ b/sql/sql_base.cc
            @@ -2100,7 +2100,7 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx)
               }
             
               if (!(flags & MYSQL_OPEN_HAS_MDL_LOCK) &&
            -      table->s->table_category < TABLE_CATEGORY_INFORMATION)
            +      table->s->table_category < TABLE_CATEGORY_INFORMATION && 0)
               {
                 /*
                   We are not under LOCK TABLES and going to acquire write-lock/
            
            

            marko Marko Mäkelä added a comment - The following patch has been used during some performance testing to remove the scalability bottleneck due to backup locks, so that we can better highlight scalability bottlenecks inside storage engines. diff --git a/sql/handler.cc b/sql/handler.cc index eb7b5b8..012ef20 100644 --- a/sql/handler.cc +++ b/sql/handler.cc @@ -1567,7 +1567,7 @@ int ha_commit_trans(THD *thd, bool all) DBUG_PRINT("info", ("is_real_trans: %d rw_trans: %d rw_ha_count: %d", is_real_trans, rw_trans, rw_ha_count));   - if (rw_trans) + if (0 && rw_trans) { /* Acquire a metadata lock which will ensure that COMMIT is blocked diff --git a/sql/sql_base.cc b/sql/sql_base.cc index c41e08e..f9e3f34 100644 --- a/sql/sql_base.cc +++ b/sql/sql_base.cc @@ -2100,7 +2100,7 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx) }   if (!(flags & MYSQL_OPEN_HAS_MDL_LOCK) && - table->s->table_category < TABLE_CATEGORY_INFORMATION) + table->s->table_category < TABLE_CATEGORY_INFORMATION && 0) { /* We are not under LOCK TABLES and going to acquire write-lock/
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 97498 ] MariaDB v4 [ 141332 ]
            marko Marko Mäkelä made changes -
            Affects Version/s 10.5 [ 23123 ]
            Affects Version/s 10.6 [ 24028 ]
            Affects Version/s 10.7 [ 24805 ]
            Affects Version/s 10.8 [ 26121 ]
            Affects Version/s 10.9 [ 26905 ]
            Affects Version/s 10.10 [ 27530 ]
            Labels performance regression-10.4

            Possibly, this locking can be removed as part of implementing backup in the server process (MDEV-14992).

            marko Marko Mäkelä added a comment - Possibly, this locking can be removed as part of implementing backup in the server process ( MDEV-14992 ).
            marko Marko Mäkelä made changes -
            monty Michael Widenius added a comment - - edited

            I do not think it is possible to remove the backup lock in ha_commit_trans.
            This lock is essential to ensure that we can get a consistent backup, in any scenario (as far as I know).
            As long we support external backup tools, we need to have it, even if we have in server backup.
            It is also needed the lock to be able to create a consistent snapshot of the server.

            The lock is also critically needed for binlogs.

            It would be nice to be able to 'know in advance' that a backup will take place so that we could ignore all backup locks when there will not be any lock for existing transactions.
            What could be possible be done is to have a flag that disables all the backup locks. When we do BACKUP STAGE START, we would then enable the lock and wait for all transactions that was started outside of the lock to complete before continue.

            monty Michael Widenius added a comment - - edited I do not think it is possible to remove the backup lock in ha_commit_trans. This lock is essential to ensure that we can get a consistent backup, in any scenario (as far as I know). As long we support external backup tools, we need to have it, even if we have in server backup. It is also needed the lock to be able to create a consistent snapshot of the server. The lock is also critically needed for binlogs. It would be nice to be able to 'know in advance' that a backup will take place so that we could ignore all backup locks when there will not be any lock for existing transactions. What could be possible be done is to have a flag that disables all the backup locks. When we do BACKUP STAGE START, we would then enable the lock and wait for all transactions that was started outside of the lock to complete before continue.
            marko Marko Mäkelä made changes -
            Sprint Server 12.1 dev sprint [ 793 ]
            Assignee Marko Mäkelä [ marko ]
            monty Michael Widenius added a comment - - edited

            Disabling backup locks at rw_trans would make backups, both internal and externa impossible.
            This required to get a constant backup point for all engines and replication.
            It would also disable flush tables with read lock.

            One possible solution is to not do any backup locks before 'backup start' and have 'backup start' inform all threads that a backup is about to start.
            The there would be a wait until all current transactions has ended and all transactions are 'backup aware' and have started to take backup locks.

            Rough estimate for getting this done is about one week.

            Another option is to see if we can speed up the backup lock in ha_commit_trans() so that if there are no conflicting locks there would never be a thread context switch.
            Maybe this could be done faster with two atomic increments when backup is not running

            monty Michael Widenius added a comment - - edited Disabling backup locks at rw_trans would make backups, both internal and externa impossible. This required to get a constant backup point for all engines and replication. It would also disable flush tables with read lock. One possible solution is to not do any backup locks before 'backup start' and have 'backup start' inform all threads that a backup is about to start. The there would be a wait until all current transactions has ended and all transactions are 'backup aware' and have started to take backup locks. Rough estimate for getting this done is about one week. Another option is to see if we can speed up the backup lock in ha_commit_trans() so that if there are no conflicting locks there would never be a thread context switch. Maybe this could be done faster with two atomic increments when backup is not running
            marko Marko Mäkelä added a comment -

            I’m not familiar with this code, so it could take me significantly more than a week to understand the logic deep enough in order to fix this.

            I have a rough idea how we could minimize atomic writes or read-modify-writes of a shared memory location in the fast path. It would go something like this:

            1. The BACKUP statement sets a global flag or switches function pointers to note that a critical phase of backup is about to start.
            2. As a result of this, ha_commit_trans() and open_table() will start to register the THD as "backup aware" and acquire the locks as they would if the work-around patch from 2021-07-21 were not present.
            3. The BACKUP statement will poll until all active THD objects are registered "backup aware".
            4. Once the backup lock is released, the global atomic flag will be reset. As a result, open_table() and ha_commit_trans() will resume fast operation.
            marko Marko Mäkelä added a comment - I’m not familiar with this code, so it could take me significantly more than a week to understand the logic deep enough in order to fix this. I have a rough idea how we could minimize atomic writes or read-modify-writes of a shared memory location in the fast path. It would go something like this: The BACKUP statement sets a global flag or switches function pointers to note that a critical phase of backup is about to start. As a result of this, ha_commit_trans() and open_table() will start to register the THD as "backup aware" and acquire the locks as they would if the work-around patch from 2021-07-21 were not present. The BACKUP statement will poll until all active THD objects are registered "backup aware". Once the backup lock is released, the global atomic flag will be reset. As a result, open_table() and ha_commit_trans() will resume fast operation.
            monty Michael Widenius added a comment -

            I am working on the following solution (which is very much like Marko's suggestion) to remove some of the issues with BACKUP STAGE BLOCK_COMMIT;

            Adding before MDL_REQUEST_INIT() in ha_commit_trans()

            If (!(thd->backup_state= backup_is_running))

            { my_atomic_add32(&thd_in_commit, 1, MY_MEMORY_ORDER_RELAXED)) }

            else

            { MDL_REQUEST_INIT(...) }

            and when releasing:

            if (mdl_backup.ticket)

            { .... }

            else if (thd->backup_state)

            { my_atomic_add32(&thd_in_commit, -1, MY_MEMORY_ORDER_RELAXED)) thd->backup_state= 0; }

            If we have the above, we could in BACKUP_STAGE_START do something like

            backup_running= 1;
            do

            { while (thd_in_commit) sleep(1); }

            while (some_thd_has_backup_state_set())

            The effect would be to replace the MDL_backup lock in ha_commit_trans() with two atomic increment and two thread-local memory assignments when backup is not running. The cost is a slightly slower start of the backup.

            monty Michael Widenius added a comment - I am working on the following solution (which is very much like Marko's suggestion) to remove some of the issues with BACKUP STAGE BLOCK_COMMIT; Adding before MDL_REQUEST_INIT() in ha_commit_trans() If (!(thd->backup_state= backup_is_running)) { my_atomic_add32(&thd_in_commit, 1, MY_MEMORY_ORDER_RELAXED)) } else { MDL_REQUEST_INIT(...) } and when releasing: if (mdl_backup.ticket) { .... } else if (thd->backup_state) { my_atomic_add32(&thd_in_commit, -1, MY_MEMORY_ORDER_RELAXED)) thd->backup_state= 0; } If we have the above, we could in BACKUP_STAGE_START do something like backup_running= 1; do { while (thd_in_commit) sleep(1); } while (some_thd_has_backup_state_set()) The effect would be to replace the MDL_backup lock in ha_commit_trans() with two atomic increment and two thread-local memory assignments when backup is not running. The cost is a slightly slower start of the backup.
            monty Michael Widenius made changes -
            Assignee Marko Mäkelä [ marko ] Michael Widenius [ monty ]
            monty Michael Widenius made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            marko Marko Mäkelä added a comment -

            monty, my idea was that the "fast" code path (no backup is running) would not run any atomic read-modify-write operations. This idea would work if the logic that handles the BACKUP statement could somehow count all active THD objects and wait until the atomic counter has reached that value, instead of the counter being 0.

            marko Marko Mäkelä added a comment - monty , my idea was that the "fast" code path (no backup is running) would not run any atomic read-modify-write operations. This idea would work if the logic that handles the BACKUP statement could somehow count all active THD objects and wait until the atomic counter has reached that value, instead of the counter being 0.
            monty Michael Widenius made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 10.6 [ 24028 ]
            serg Sergei Golubchik made changes -
            Sprint Server 12.1 dev sprint [ 793 ]
            monty Michael Widenius made changes -
            Status In Progress [ 3 ] In Testing [ 10301 ]
            monty Michael Widenius added a comment -

            Pushed to bb-10.11-monty for testing

            monty Michael Widenius added a comment - Pushed to bb-10.11-monty for testing
            marko Marko Mäkelä added a comment -

            monty, I revised the logic a little and created https://github.com/MariaDB/server/pull/3966 for this, currently as a draft. It seems to me that the global atomic counter may be redundant. I am still reviewing the logic for potential race conditions.

            marko Marko Mäkelä added a comment - monty , I revised the logic a little and created https://github.com/MariaDB/server/pull/3966 for this, currently as a draft. It seems to me that the global atomic counter may be redundant. I am still reviewing the logic for potential race conditions.
            marko Marko Mäkelä made changes -
            Assignee Michael Widenius [ monty ] Marko Mäkelä [ marko ]
            marko Marko Mäkelä made changes -
            Status In Testing [ 10301 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Michael Widenius [ monty ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            People

              monty Michael Widenius
              svoj Sergey Vojtovich
              Votes:
              1 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.