[MDEV-19749] MDL scalability regression after backup locks - Jira

Details

Type: Bug
Status: In Review
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL)
Fix Version/s: 10.11
Component/s: Locking
Labels:
- performance
- regression-10.4

Description

In commit 7a9dfdd, the GLOBAL and COMMIT namespaces were combined into BACKUP, which doubled the load on the BACKUP lock mutex.

This can be fixed by implementing something similar to MySQL WL#7306 "Improve MDL performance and scalability by implementing lock-free lock acquisition for DML".
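
As a rough illustration of what a lock-free acquisition path for DML could look like, the following C++ sketch packs an "obtrusive lock present" bit and a count of DML-style holders into one atomic word, so that the uncontended case is a single compare-and-swap with no mutex. The class FastPathLock and all of its members are hypothetical names chosen for this example. This is neither the WL#7306 implementation nor MariaDB code, and a real MDL subsystem would also need a slow path with waiting and deadlock detection.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of a lock with a lock-free shared ("DML") path.
class FastPathLock
{
  // Bit 0: an obtrusive (backup-style) lock is held.
  // Higher bits: number of unobtrusive holders, counted in units of HOLDER.
  static constexpr uint64_t OBTRUSIVE= 1;
  static constexpr uint64_t HOLDER= 2;
  std::atomic<uint64_t> state{0};

public:
  // Fast path: one CAS when no obtrusive lock is present.
  // Returns false when the caller would have to fall back to a slow path.
  bool try_acquire_shared()
  {
    uint64_t old= state.load(std::memory_order_relaxed);
    do
    {
      if (old & OBTRUSIVE)
        return false;
    }
    while (!state.compare_exchange_weak(old, old + HOLDER,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed));
    return true;
  }

  void release_shared()
  {
    uint64_t old= state.fetch_sub(HOLDER, std::memory_order_release);
    assert(old >= HOLDER);
    (void) old;
  }

  // Obtrusive side: succeeds only when there are no holders of any kind.
  bool try_acquire_exclusive()
  {
    uint64_t expected= 0;
    return state.compare_exchange_strong(expected, OBTRUSIVE,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed);
  }

  void release_exclusive()
  {
    state.fetch_and(~OBTRUSIVE, std::memory_order_release);
  }

  uint64_t holders() const
  {
    return state.load(std::memory_order_relaxed) / HOLDER;
  }
};
```

In this model every shared acquisition still performs an atomic read-modify-write on one shared word, so it removes the mutex but not the cache-line contention between committing threads.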

Attachments

Issue Links

is caused by

MDEV-5336 Implement BACKUP STAGE for safe external backups

Closed

relates to

MDEV-14992 BACKUP: in-server backup

Open

Activity

Sergey Vojtovich created issue - 2019-06-13 14:30

Sergey Vojtovich made changes - 2019-06-17 19:24

Field	Original Value	New Value
Summary	Scalability regression after backup locks	MDL scalability regression after backup locks

Julien Fritsch made changes - 2020-05-29 09:46

Fix Version/s		10.3 [ 22126 ]
Fix Version/s		10.4 [ 22408 ]

Julien Fritsch made changes - 2020-05-29 10:10

Fix Version/s		10.6 [ 24028 ]
Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]
Fix Version/s	10.5 [ 23123 ]

Sergei Golubchik made changes - 2020-08-16 12:28

Assignee

Sergey Vojtovich [ svoj ]

Marko Mäkelä made changes - 2021-02-11 09:36

Link

This issue is caused by ~~MDEV-5336~~ [ ~~MDEV-5336~~ ]

Marko Mäkelä added a comment - 2021-07-21 05:35

The following patch has been used during some performance testing to remove the scalability bottleneck due to backup locks, so that we can better highlight scalability bottlenecks inside storage engines.

diff --git a/sql/handler.cc b/sql/handler.cc
index eb7b5b8..012ef20 100644
--- a/sql/handler.cc
+++ b/sql/handler.cc
@@ -1567,7 +1567,7 @@ int ha_commit_trans(THD *thd, bool all)
   DBUG_PRINT("info", ("is_real_trans: %d  rw_trans:  %d  rw_ha_count: %d",
                       is_real_trans, rw_trans, rw_ha_count));
 
-  if (rw_trans)
+  if (0 && rw_trans)
   {
     /*
       Acquire a metadata lock which will ensure that COMMIT is blocked
diff --git a/sql/sql_base.cc b/sql/sql_base.cc
index c41e08e..f9e3f34 100644
--- a/sql/sql_base.cc
+++ b/sql/sql_base.cc
@@ -2100,7 +2100,7 @@ bool open_table(THD *thd, TABLE_LIST *table_list, Open_table_context *ot_ctx)
   }
 
   if (!(flags & MYSQL_OPEN_HAS_MDL_LOCK) &&
-      table->s->table_category < TABLE_CATEGORY_INFORMATION)
+      table->s->table_category < TABLE_CATEGORY_INFORMATION && 0)
   {
     /*
       We are not under LOCK TABLES and going to acquire write-lock/

Sergei Golubchik made changes - 2021-12-06 21:33

Workflow

MariaDB v3 [ 97498 ]

MariaDB v4 [ 141332 ]

Marko Mäkelä made changes - 2022-06-27 08:27

Affects Version/s		10.5 [ 23123 ]
Affects Version/s		10.6 [ 24028 ]
Affects Version/s		10.7 [ 24805 ]
Affects Version/s		10.8 [ 26121 ]
Affects Version/s		10.9 [ 26905 ]
Affects Version/s		10.10 [ 27530 ]
Labels		performance regression-10.4

Marko Mäkelä added a comment - 2023-12-12 14:15

Possibly, this locking can be removed as part of implementing backup in the server process (MDEV-14992).

Marko Mäkelä made changes - 2023-12-12 14:15

Link

This issue relates to MDEV-14992 [ MDEV-14992 ]

Michael Widenius added a comment - 2024-12-17 07:27 - edited

I do not think it is possible to remove the backup lock in ha_commit_trans().
This lock is essential to ensure that we can get a consistent backup in any scenario (as far as I know).
As long as we support external backup tools, we need to have it, even if we have in-server backup.
The lock is also needed to be able to create a consistent snapshot of the server.

The lock is also critically needed for binlogs.

It would be nice to be able to 'know in advance' that a backup will take place, so that we could ignore all backup locks when there will not be any lock for existing transactions.
What could possibly be done is to have a flag that disables all the backup locks. When we do BACKUP STAGE START, we would then enable the lock and wait for all transactions that were started outside of the lock to complete before continuing.

Marko Mäkelä made changes - 2025-04-03 08:59

Sprint		Server 12.1 dev sprint [ 793 ]
Assignee		Marko Mäkelä [ marko ]

Michael Widenius added a comment - 2025-04-03 12:04 - edited

Disabling backup locks at rw_trans would make backups, both internal and external, impossible.
The lock is required to get a consistent backup point for all engines and replication.
Disabling it would also disable FLUSH TABLES WITH READ LOCK.

One possible solution is to not take any backup locks before 'backup start' and have 'backup start' inform all threads that a backup is about to start.
Then there would be a wait until all current transactions have ended and all transactions are 'backup aware' and have started to take backup locks.

Rough estimate for getting this done is about one week.

Another option is to see if we can speed up the backup lock in ha_commit_trans() so that if there are no conflicting locks there would never be a thread context switch.
Maybe this could be done faster with two atomic increments when backup is not running.

Marko Mäkelä added a comment - 6 days ago

I’m not familiar with this code, so it could take me significantly more than a week to understand the logic deeply enough to fix this.

I have a rough idea how we could minimize atomic writes or read-modify-writes of a shared memory location in the fast path. It would go something like this:

The BACKUP statement sets a global flag or switches function pointers to note that a critical phase of backup is about to start.
As a result of this, ha_commit_trans() and open_table() will start to register the THD as "backup aware" and acquire the locks as they would if the work-around patch from 2021-07-21 were not present.
The BACKUP statement will poll until all active THD objects are registered "backup aware".
Once the backup lock is released, the global atomic flag will be reset. As a result, open_table() and ha_commit_trans() will resume fast operation.
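
The four steps above can be modelled in a few lines of C++. The sketch below is only an illustration under stated assumptions: Session stands in for THD, checkpoint() for the entry to ha_commit_trans() or open_table(), and backup_epoch for the global flag. All of these names are hypothetical and none of them exist in MariaDB. It also assumes that every active session eventually reaches a checkpoint, and it leaves open how idle sessions would be handled.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical global flag: 0 means that no backup is pending.
// Each backup is assumed to use a fresh nonzero epoch value.
std::atomic<uint64_t> backup_epoch{0};

struct Session  // stands in for THD
{
  // Written only by the owning session, using a plain atomic store.
  std::atomic<uint64_t> seen_epoch{0};

  // Models the entry to ha_commit_trans() or open_table().
  // Returns true when the backup lock may be skipped (fast path).
  bool checkpoint()
  {
    uint64_t e= backup_epoch.load();  // a load, not a read-modify-write
    if (e == 0)
      return true;
    if (seen_epoch.load(std::memory_order_relaxed) != e)
      seen_epoch.store(e);            // register as "backup aware"
    return false;                     // caller must acquire the backup lock
  }
};

void backup_begin(uint64_t epoch)
{
  assert(epoch != 0);
  backup_epoch.store(epoch);
}

// One polling pass of the BACKUP side over all active sessions.
bool all_backup_aware(const std::vector<Session*> &active)
{
  uint64_t e= backup_epoch.load();
  assert(e != 0);
  for (const Session *s : active)
    if (s->seen_epoch.load() != e)
      return false;
  return true;
}

void backup_end()
{
  backup_epoch.store(0);  // sessions resume the fast path
}
```

A session that entered the fast path just before backup_begin() has not yet published the new epoch, so the polling loop keeps waiting until that session reaches its next checkpoint, by which time its earlier fast-path operation has finished. When no backup is pending, the only shared-memory access in checkpoint() is one atomic load.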

Michael Widenius added a comment - 4 days ago

I am working on the following solution (which is very much like Marko's suggestion) to remove some of the issues with BACKUP STAGE BLOCK_COMMIT.

Adding before MDL_REQUEST_INIT() in ha_commit_trans():

if (!(thd->backup_state= backup_is_running))
{
  my_atomic_add32(&thd_in_commit, 1, MY_MEMORY_ORDER_RELAXED);
}
else
{
  MDL_REQUEST_INIT(...);
}

and when releasing:

if (mdl_backup.ticket)
{
  ....
}
else if (thd->backup_state)
{
  my_atomic_add32(&thd_in_commit, -1, MY_MEMORY_ORDER_RELAXED);
  thd->backup_state= 0;
}

If we have the above, we could in BACKUP_STAGE_START do something like

backup_is_running= 1;
do
{
  while (thd_in_commit)
    sleep(1);
}
while (some_thd_has_backup_state_set());

The effect would be to replace the MDL_backup lock in ha_commit_trans() with two atomic increments and two thread-local memory assignments when backup is not running. The cost is a slightly slower start of the backup.
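
For readers who want to experiment with the idea, here is a minimal standalone C++ model of the counter scheme, using std::atomic instead of the server's atomics. It is a sketch, not the patch: Session, commit_enter(), commit_leave(), backup_stage_start() and backup_end() are hypothetical names, and Session::counted only loosely stands in for thd->backup_state. Unlike the pseudocode above, this model increments the counter before checking the flag and uses the default sequentially consistent ordering. That is a deliberate substitution so that a commit cannot slip past a backup that starts between the check and the increment.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

std::atomic<bool> backup_is_running{false};
std::atomic<int> thd_in_commit{0};

struct Session  // stands in for THD
{
  bool counted= false;  // true while this session is included in thd_in_commit
};

// Returns true when the commit may proceed without the MDL backup lock.
// On false, the caller would have to take the backup lock (not modelled).
bool commit_enter(Session &s)
{
  thd_in_commit.fetch_add(1);     // announce the commit first
  if (!backup_is_running.load())  // then check for a starting backup
  {
    s.counted= true;
    return true;
  }
  thd_in_commit.fetch_sub(1);     // back off to the slow path
  return false;
}

void commit_leave(Session &s)
{
  if (s.counted)
  {
    s.counted= false;
    thd_in_commit.fetch_sub(1);
  }
}

// Models the start of the backup: raise the flag, then wait for all
// fast-path commits that were already announced to drain.
void backup_stage_start()
{
  backup_is_running.store(true);
  while (thd_in_commit.load() != 0)
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}

void backup_end()
{
  backup_is_running.store(false);
}
```

With this ordering, either the committing thread observes the flag and backs off, or the backup observes a nonzero counter and waits. The fast path still performs two atomic read-modify-write operations on a shared counter.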

Michael Widenius made changes - 4 days ago

Assignee

Marko Mäkelä [ marko ]

Michael Widenius [ monty ]

Michael Widenius made changes - 4 days ago

Status

Open [ 1 ]

In Progress [ 3 ]

Marko Mäkelä added a comment - 4 days ago

monty, my idea was that the "fast" code path (no backup is running) would not run any atomic read-modify-write operations. This idea would work if the logic that handles the BACKUP statement could somehow count all active THD objects and wait until the atomic counter has reached that value, instead of the counter being 0.

Michael Widenius made changes - 4 days ago

Fix Version/s		10.11 [ 27614 ]
Fix Version/s	10.6 [ 24028 ]

Sergei Golubchik made changes - 4 days ago

Sprint

Server 12.1 dev sprint [ 793 ]

Michael Widenius made changes - 3 days ago

Status

In Progress [ 3 ]

In Testing [ 10301 ]

Michael Widenius added a comment - 3 days ago

Pushed to bb-10.11-monty for testing

Marko Mäkelä added a comment - 2 days ago

monty, I revised the logic a little and created https://github.com/MariaDB/server/pull/3966 for this, currently as a draft. It seems to me that the global atomic counter may be redundant. I am still reviewing the logic for potential race conditions.

Marko Mäkelä made changes - 2 days ago

Assignee

Michael Widenius [ monty ]

Marko Mäkelä [ marko ]

Marko Mäkelä made changes - 2 days ago

Status

In Testing [ 10301 ]

Stalled [ 10000 ]

Marko Mäkelä made changes - 2 days ago

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Marko Mäkelä made changes - 2 days ago

Assignee	Marko Mäkelä [ marko ]	Michael Widenius [ monty ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

People

Assignee: Michael Widenius

Reporter: Sergey Vojtovich

Votes: 1

Watchers: 9

Dates

Created: 2019-06-13 14:30

Updated: 3 days ago 11:47
