MariaDB Server
MDEV-19749

MDL scalability regression after backup locks

Details

    Description

      In commit 7a9dfdd, the GLOBAL and COMMIT namespaces were combined into BACKUP, which doubled the load on the BACKUP lock mutex.

      This can be fixed by implementing something similar to MySQL WL#7306, "Improve MDL performance and scalability by implementing lock-free lock acquisition for DML".

      Activity

            The following patch has been used during some performance testing to remove the scalability bottleneck due to backup locks, so that we can better highlight scalability bottlenecks inside storage engines.

            diff --git a/sql/handler.cc b/sql/handler.cc
            index eb7b5b8..012ef20 100644
            --- a/sql/handler.cc
            +++ b/sql/handler.cc
            @@ -1567,7 +1567,7 @@ int ha_commit_trans(THD *thd, bool all)
               DBUG_PRINT("info", ("is_real_trans: %d  rw_trans:  %d  rw_ha_count: %d",
                                   is_real_trans, rw_trans, rw_ha_count));
             
            -  if (rw_trans)
            +  if (0 && rw_trans)
               {
                 /*
                   Acquire a metadata lock which will ensure that COMMIT is blocked
            diff --git a/sql/sql_base.cc b/sql/sql_base.cc
            index c41e08e..f9e3f34 100644
            --- a/sql/sql_base.cc
            +++ b/sql/sql_base.cc
            @@ -2100,7 +2100,7 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx)
               }
             
               if (!(flags & MYSQL_OPEN_HAS_MDL_LOCK) &&
            -      table->s->table_category < TABLE_CATEGORY_INFORMATION)
            +      table->s->table_category < TABLE_CATEGORY_INFORMATION && 0)
               {
                 /*
                   We are not under LOCK TABLES and going to acquire write-lock/
            
            

            marko Marko Mäkelä added a comment

            Possibly, this locking can be removed as part of implementing backup in the server process (MDEV-14992).

            marko Marko Mäkelä added a comment
            monty Michael Widenius added a comment - - edited

            I do not think it is possible to remove the backup lock in ha_commit_trans().
            This lock is essential to ensure that we can get a consistent backup in any scenario (as far as I know).
            As long as we support external backup tools, we need to have it, even if we have in-server backup.
            The lock is also needed to be able to create a consistent snapshot of the server.

            The lock is also critically needed for binlogs.

            It would be nice to be able to 'know in advance' that a backup will take place, so that we could skip all backup locks when there will not be any lock for existing transactions.
            What could possibly be done is to have a flag that disables all the backup locks. When we do BACKUP STAGE START, we would then enable the lock and wait for all transactions that were started outside of the lock to complete before continuing.

            monty Michael Widenius added a comment - - edited

            Disabling backup locks at rw_trans would make backups, both internal and external, impossible.
            They are required to get a consistent backup point for all engines and replication.
            It would also break FLUSH TABLES WITH READ LOCK.

            One possible solution is to not take any backup locks before 'backup start' and have 'backup start' inform all threads that a backup is about to start.
            Then there would be a wait until all current transactions have ended and all transactions are 'backup aware' and have started to take backup locks.

            A rough estimate for getting this done is about one week.

            Another option is to see if we can speed up the backup lock in ha_commit_trans() so that, if there are no conflicting locks, there would never be a thread context switch.
            Maybe this could be done faster with two atomic increments when backup is not running.


            I’m not familiar with this code, so it could take me significantly more than a week to understand the logic deeply enough to fix this.

            I have a rough idea how we could minimize atomic writes or read-modify-writes of a shared memory location in the fast path. It would go something like this:

            1. The BACKUP statement sets a global flag or switches function pointers to note that a critical phase of backup is about to start.
            2. As a result of this, ha_commit_trans() and open_table() will start to register the THD as "backup aware" and acquire the locks as they would if the work-around patch from 2021-07-21 were not present.
            3. The BACKUP statement will poll until all active THD objects are registered "backup aware".
            4. Once the backup lock is released, the global atomic flag will be reset. As a result, open_table() and ha_commit_trans() will resume fast operation.
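            As a sketch of how the four steps above could fit together, here is a minimal self-contained model; every name in it (backup_phase_active, ThdState, enter_commit, and so on) is a hypothetical stand-in, not an identifier from the MariaDB source tree:

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Step 1: the BACKUP statement raises this flag (a real implementation
// might switch function pointers instead).
std::atomic<bool> backup_phase_active{false};

struct ThdState
{
  bool backup_aware= false;     // step 2: this THD has observed the flag
  bool holds_backup_mdl= false; // stands in for holding MDL_key::BACKUP
};

// Fast path in ha_commit_trans()/open_table(): while no backup is
// running there is no write to shared memory at all.
void enter_commit(ThdState &thd)
{
  if (backup_phase_active.load(std::memory_order_acquire))
  {
    thd.backup_aware= true;
    thd.holds_backup_mdl= true;
  }
}

// Step 3: after raising the flag, poll until every active THD has
// registered. A server would sleep and retry; this single pass just
// reports whether the wait is over.
bool all_thds_backup_aware(const std::vector<ThdState*> &active_thds)
{
  backup_phase_active.store(true, std::memory_order_release);
  for (const ThdState *thd : active_thds)
    if (!thd->backup_aware)
      return false;
  return true;
}

// Step 4: once the backup lock is released, resume fast operation.
void backup_stage_end()
{
  backup_phase_active.store(false, std::memory_order_release);
}
```

            The polling in step 3 is what makes the fast path safe: a THD that raced past the flag check before step 1 keeps the BACKUP statement waiting until it has re-entered and registered.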
            marko Marko Mäkelä added a comment

            I am working on the following solution (which is very much like Marko's suggestion) to remove some of the issues with BACKUP STAGE BLOCK_COMMIT;

            Adding before MDL_REQUEST_INIT() in ha_commit_trans():

                if (!(thd->backup_state= backup_is_running))
                {
                  my_atomic_add32(&thd_in_commit, 1, MY_MEMORY_ORDER_RELAXED);
                }
                else
                {
                  MDL_REQUEST_INIT(...);
                }

            and when releasing:

                if (mdl_backup.ticket)
                {
                  ....
                }
                else if (thd->backup_state)
                {
                  my_atomic_add32(&thd_in_commit, -1, MY_MEMORY_ORDER_RELAXED);
                  thd->backup_state= 0;
                }

            If we have the above, we could in BACKUP STAGE START do something like:

                backup_running= 1;
                do
                {
                  while (thd_in_commit)
                    sleep(1);
                }
                while (some_thd_has_backup_state_set());

            The effect would be to replace the MDL_backup lock in ha_commit_trans() with two atomic increments and two thread-local memory assignments when backup is not running. The cost is a slightly slower start of the backup.
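            Read one way (with thd->backup_state meaning "this commit took the fast path"), the pseudo-code above can be turned into a compilable sketch; std::atomic stands in for my_atomic_add32, and all declarations here are illustrative rather than the server's actual ones:

```cpp
#include <atomic>
#include <cassert>

std::atomic<int>  thd_in_commit{0};      // commits currently on the fast path
std::atomic<bool> backup_is_running{false};

// ha_commit_trans() entry: one relaxed atomic increment instead of the
// backup MDL while no backup is running.
bool begin_commit(bool &fast_path)
{
  if ((fast_path= !backup_is_running.load(std::memory_order_relaxed)))
    thd_in_commit.fetch_add(1, std::memory_order_relaxed);
  return fast_path;             // false: caller must take the backup MDL
}

// Release: undo the increment only if the fast path was taken.
void end_commit(bool &fast_path)
{
  if (fast_path)
  {
    thd_in_commit.fetch_sub(1, std::memory_order_relaxed);
    fast_path= false;
  }
}

// BACKUP STAGE START: stop new fast-path commits, then drain the
// in-flight ones (the proposal sleeps between polls; the outer loop
// over per-THD backup_state would catch racing threads).
void backup_stage_start()
{
  backup_is_running.store(true, std::memory_order_relaxed);
  while (thd_in_commit.load(std::memory_order_relaxed) != 0)
    ; // sleep(1) in the real proposal
}
```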

            monty Michael Widenius added a comment

            monty, my idea was that the "fast" code path (no backup is running) would not run any atomic read-modify-write operations. This idea would work if the logic that handles the BACKUP statement could somehow count all active THD objects and wait until the atomic counter has reached that value, instead of the counter being 0.

            marko Marko Mäkelä added a comment

            Pushed to bb-10.11-monty for testing

            monty Michael Widenius added a comment

            monty, I revised the logic a little and created https://github.com/MariaDB/server/pull/3966 for this, currently as a draft. It seems to me that the global atomic counter may be redundant. I am still reviewing the logic for potential race conditions.

            marko Marko Mäkelä added a comment

            As far as I understand from the preliminary performance test results, monty replaced the MDL_key::BACKUP acquisition in ha_commit_trans() but not in open_tables(), where the relevant section starts as follows:

                enum enum_mdl_type mdl_type= MDL_BACKUP_DML;
             
                if (table->s->table_category != TABLE_CATEGORY_USER)
                  mdl_type= MDL_BACKUP_SYS_DML;
                else if (table->s->online_backup)
                  mdl_type= MDL_BACKUP_TRANS_DML;
             
                if (table_list->mdl_request.is_write_lock_request() &&
                    ! (flags & (MYSQL_OPEN_IGNORE_GLOBAL_READ_LOCK |
                                MYSQL_OPEN_FORCE_SHARED_MDL |
                                MYSQL_OPEN_FORCE_SHARED_HIGH_PRIO_MDL |
                                MYSQL_OPEN_SKIP_SCOPED_MDL_LOCK)) &&
                    ! ot_ctx->has_protection_against_grl(mdl_type))
                {
                  MDL_request protection_request;
                  MDL_deadlock_handler mdl_deadlock_handler(ot_ctx);
             
                  if (thd->has_read_only_protection())
                  {
                    MYSQL_UNBIND_TABLE(table->file);
                    table->vcol_cleanup_expr(thd);
                    tc_release_table(table);
                    DBUG_RETURN(TRUE);
                  }
             
                  MDL_REQUEST_INIT(&protection_request, MDL_key::BACKUP, "", "", mdl_type,
                                   MDL_STATEMENT);
            

            It seems to me that it could be useful to check flags and possibly the presence of any pending BACKUP STAGE or FLUSH TABLES WITH READ LOCK request before assigning mdl_type or executing the rest of the above logic.
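            As a sketch of that idea, the protection request could be gated on a single pending-backup flag; backup_or_ftwrl_pending and the function below are hypothetical, not existing server identifiers:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flag, raised by BACKUP STAGE / FLUSH TABLES WITH READ LOCK.
std::atomic<bool> backup_or_ftwrl_pending{false};

// Mirror of the quoted condition: keep the existing early-outs on the
// open flags, and additionally skip the MDL_BACKUP_* protection request
// while no backup or FTWRL is pending.
bool need_backup_protection(bool is_write_lock_request,
                            unsigned open_flags, unsigned skip_mask)
{
  if (!is_write_lock_request || (open_flags & skip_mask))
    return false;
  return backup_or_ftwrl_pending.load(std::memory_order_acquire);
}
```

            The hard part, as in ha_commit_trans(), would be the handover: requests already past the check when the flag is raised must still be waited for.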

            marko Marko Mäkelä added a comment
            monty Michael Widenius added a comment -

            Eugine, please do not use std::array() as this allocates memory that is not tracked.
            Instead use:
            uint32 m_type_counter[MDL_BACKUP_END];

            We should also use uint32 instead of uint32_t in MariaDB code.

            marko Marko Mäkelä added a comment -

            To my understanding, std::array is a fixed-size container similar to the basic array data type in C or C++. The following existing declaration in sql/mdl.cc, which had been added in an earlier attempt to do something about this performance regression,

            std::array<uint32_t, MDL_BACKUP_END> m_type_counters;
            

            should be equivalent to the ‘traditional’ array that you are proposing. As far as I understand, a benefit of std::array over ‘traditional’ arrays is compatibility with the algorithms in #include <algorithm>. Any allocation would be managed outside the std::array. Unlike std::vector, a std::array cannot be grown afterwards, and it does not keep track of the number of elements actually in use.
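            The "no hidden allocation" point can be checked directly: std::array wraps a plain C array with no heap allocation or size bookkeeping, so on mainstream compilers the two declarations occupy the same storage (MDL_BACKUP_END is hard-coded here for the sketch):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <type_traits>

constexpr std::size_t MDL_BACKUP_END= 14;  // stand-in for the server enum

std::array<std::uint32_t, MDL_BACKUP_END> counters_std{};
std::uint32_t counters_c[MDL_BACKUP_END]= {};

// No vtable, no pointer to heap storage, no stored element count:
static_assert(sizeof(counters_std) == sizeof(counters_c),
              "same storage as a C array");
static_assert(std::is_trivially_copyable<decltype(counters_std)>::value,
              "plain bytes, memcpy-able like a C array");
```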

            marko Marko Mäkelä added a comment -

            The numeric value of MDL_BACKUP_END is 14. I was wondering whether the array m_type_counters and m_bitmap could be combined somehow. As far as I understand, m_bitmap is a ‘lookup table’ of m_type_counters, which in turn is a ‘lookup table’ of m_list. We do seem to need at least uint16_t for each element of m_type_counters, because it is imaginable to have 2¹⁶ concurrent requests (from concurrently executing threads) on a particular metadata lock.

            I was also thinking about how hard it would be to remove m_bitmap. This would involve replacing hog_lock_types_bitmap(), incompatible_waiting_types_bitmap() and incompatible_granted_types_bitmap(), and refactoring their callers MDL_lock::reschedule_waiters() and MDL_lock::can_grant_lock(). The idea would be to consult m_type_counters directly. This looks rather complex. We might upgrade our build requirements from generic x86-64 to x86-64-v2 (which includes SSE4.2), or create target-specific SIMD optimizations, such as making use of the SSE4.1 pmaxuw instruction to determine whether any pending requests of the lock types of interest exist. This would require m_type_counters to store uint16_t instead of uint32_t, and the elements would have to be ordered in a particular way, because only AVX-512 introduced mask registers.
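            To make "consult m_type_counters directly" concrete, here is a scalar sketch that recomputes a bitmap from the counters on demand (hypothetical helper names; a SIMD variant would test all 14 counters in one pmaxuw-style comparison):

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned MDL_BACKUP_END= 14;   // stand-in for the server enum

// What dropping m_bitmap would amount to: derive the bitmap from the
// per-type request counters whenever it is needed.
std::uint16_t bitmap_from_counters(const std::uint16_t (&counters)[MDL_BACKUP_END])
{
  std::uint16_t bitmap= 0;
  for (unsigned type= 0; type < MDL_BACKUP_END; type++)
    if (counters[type])
      bitmap= std::uint16_t(bitmap | (1u << type));
  return bitmap;
}

// can_grant_lock()-style test: no request of an incompatible type exists.
bool compatible(std::uint16_t incompatible_types,
                const std::uint16_t (&counters)[MDL_BACKUP_END])
{
  return (bitmap_from_counters(counters) & incompatible_types) == 0;
}
```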

            svoj Sergey Vojtovich added a comment -

            marko, what you suggest makes sense, but it is unrelated to this issue.

            marko Marko Mäkelä added a comment -

            svoj, right, my comments from today are only related to an earlier change that had been tagged with MDEV-19749 and that monty was apparently commenting on.

            monty Michael Widenius added a comment -

            Proposed fix for improving the BACKUP BLOCK_COMMIT lock pushed to bb-10.11-monty

            monty Michael Widenius added a comment -

            @marko, there is no point in using std::array for something that does not need it!
            We should avoid using std:: constructs, as these are harder to debug than what we already have and can cause more code to be generated because of template usage.


            People

              monty Michael Widenius
              svoj Sergey Vojtovich
              Votes: 1
              Watchers: 9

