[MDEV-7818] Deadlock occurring with parallel replication and FTWRL - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.0(EOL)
Fix Version/s: 10.0.23, 10.1.9
Component/s: Replication
Labels:
- parallelslave
- verified

Sprint:
10.0.22, 10.1.9-1, 10.1.9-2

Description

Thread 5 is locked by global read lock but FTWRL never completes and stays in waiting for commit lock; replication thread stays locked forever until FTWRL is killed

MariaDB(db-02)[(none)]> show processlist\G

*************************** 1. row ***************************

      Id: 3

    User: system user

    Host:

      db: NULL

 Command: Connect

    Time: 12612

   State: Waiting for prior transaction to start commit before starting next transaction

    Info: NULL

Progress: 0.000

*************************** 2. row ***************************

      Id: 4

    User: system user

    Host:

      db: NULL

 Command: Connect

    Time: 12612

   State: Waiting for prior transaction to commit

    Info: NULL

Progress: 0.000

*************************** 3. row ***************************

      Id: 5

    User: system user

    Host:

      db: crypto_data

 Command: Connect

    Time: 12612

   State: Waiting for global read lock

    Info: INSERT INTO `market_orders_ok` (`label`, `marketid`, `ordertype`, `price`, `quantity`, `total`) VALU

Progress: 0.000

*************************** 4. row ***************************

      Id: 6

    User: system user

    Host:

      db: NULL

 Command: Connect

    Time: 12612

   State: Waiting for prior transaction to start commit before starting next transaction

    Info: NULL

Progress: 0.000

*************************** 5. row ***************************

      Id: 7

    User: system user

    Host:

      db: NULL

 Command: Connect

    Time: 1651381

   State: Waiting for master to send event

    Info: NULL

Progress: 0.000

*************************** 6. row ***************************

      Id: 8

    User: system user

    Host:

      db: NULL

 Command: Connect

    Time: 13263

   State: Waiting for room in worker thread event queue

    Info: NULL

Progress: 0.000

*************************** 7. row ***************************

      Id: 39

    User: monitoring

    Host: localhost

      db: NULL

 Command: Sleep

    Time: 59

   State:

    Info: NULL

Progress: 0.000

*************************** 8. row ***************************

      Id: 1568378

    User: tanje6acfc0c1213cfb63

    Host: localhost

      db: NULL

 Command: Sleep

    Time: 3

   State:

    Info: NULL

Progress: 0.000

*************************** 9. row ***************************

      Id: 27871929

    User: backup

    Host: localhost

      db: NULL

 Command: Query

    Time: 12612

   State: Waiting for commit lock

    Info: FLUSH TABLES WITH READ LOCK

Progress: 0.000

*************************** 10. row ***************************

      Id: 28082858

    User: tanje6acfc0c1213cfb63

    Host: localhost

      db: NULL

 Command: Query

    Time: 0

   State: init

    Info: show processlist

Progress: 0.000

10 rows in set (0.00 sec)

Attachments

Issue Links

relates to

MDEV-8318 Assertion `!pool->busy' failed in pool_mark_busy(rpl_parallel_thread_pool*) on concurrent FTWRL

Closed

Activity

Ascending order - Click to sort in descending order

View 5 older comments

Kristian Nielsen added a comment - 2015-06-02 13:03

Current idea:

When FTWRL starts, it first checks all parallel replication worker
threads. It finds the most recent GTID started by any of them. It then sets
a flag to tell the threads not to start on any newer GTIDs, and then waits
for all earlier GTIDs to fully commit. It also sets a flag to tell START
SLAVE, STOP SLAVE, and the SQL thread to not start any new slave activity.

Once all worker threads have reached their designated point, FTWLR continues
to take the global read lock. Once that is obtained, it clears the flags and
signals worker threads and other slave code that it can proceed. At this
point, the lock is held, so no real activity will be possible until the lock
is cleared with UNLOCK TABLES.

This should hopefully fix the deadlock, at least I got the test case of
Elena to pass with a preliminary patch along these lines.

Some care will probably be needed to guard against other deadlocks against
concurrent START SLAVE / STOP SLAVE, hopefully I can get that solved.

Kristian Nielsen added a comment - 2015-06-02 13:03 Current idea: When FTWRL starts, it first checks all parallel replication worker threads. It finds the most recent GTID started by any of them. It then sets a flag to tell the threads not to start on any newer GTIDs, and then waits for all earlier GTIDs to fully commit. It also sets a flag to tell START SLAVE, STOP SLAVE, and the SQL thread to not start any new slave activity. Once all worker threads have reached their designated point, FTWLR continues to take the global read lock. Once that is obtained, it clears the flags and signals worker threads and other slave code that it can proceed. At this point, the lock is held, so no real activity will be possible until the lock is cleared with UNLOCK TABLES. This should hopefully fix the deadlock, at least I got the test case of Elena to pass with a preliminary patch along these lines. Some care will probably be needed to guard against other deadlocks against concurrent START SLAVE / STOP SLAVE, hopefully I can get that solved.

Kristian Nielsen added a comment - 2015-06-09 17:51 - edited

I have a fix for this now (that just removes the deadlock, without breaking
existing backup tools or otherwise changing user-visible behaviour).

It is in three parts. The actual fix is the second patch. The first patch is
just unrelated refactoring of the parallel replication code to make the
second patch cleaner. The third patch is a test case.

http://lists.askmonty.org/pipermail/commits/2015-June/008059.html
http://lists.askmonty.org/pipermail/commits/2015-June/008060.html
http://lists.askmonty.org/pipermail/commits/2015-June/008061.html

Elena: You asked some days ago - this patch should be ready, if you want to
apply it in further testing.

EDIT: Updated with newest version of patches after fixing ~~MDEV-8318~~.

Kristian Nielsen added a comment - 2015-06-09 17:51 - edited I have a fix for this now (that just removes the deadlock, without breaking existing backup tools or otherwise changing user-visible behaviour). It is in three parts. The actual fix is the second patch. The first patch is just unrelated refactoring of the parallel replication code to make the second patch cleaner. The third patch is a test case. http://lists.askmonty.org/pipermail/commits/2015-June/008059.html http://lists.askmonty.org/pipermail/commits/2015-June/008060.html http://lists.askmonty.org/pipermail/commits/2015-June/008061.html Elena: You asked some days ago - this patch should be ready, if you want to apply it in further testing. EDIT: Updated with newest version of patches after fixing MDEV-8318 .

Elena Stepanova added a comment - 2015-06-14 18:45

See ~~MDEV-8318~~. I made it a separate report not to pollute this one too much.

Elena Stepanova added a comment - 2015-06-14 18:45 See MDEV-8318 . I made it a separate report not to pollute this one too much.

Michael Widenius added a comment - 2015-10-22 10:02

Review of ~~MDEV-7818~~: Deadlock occurring with parallel replication and FTWRL

http://lists.askmonty.org/pipermail/commits/2015-June/008059.html
In this patch you added:

+/*
+ Do not start parallel execution of this event group until all prior groups
+ have reached the commit phase that are not safe to run in parallel with.
+*/
+static bool
+do_gco_wait(rpl_group_info *rgi, group_commit_orderer *gco,
+ bool *did_enter_cond, PSI_stage_info *old_stage)

You did however remove this comment:

/*
Register ourself to wait for the previous commit, if we need to do
such registration and that previous commit has not already
occured.
-
Also do not start parallel execution of this event group until all
prior groups have reached the commit phase that are not safe to run
in parallel with.
*/

Why did you remove the first part of the comment?
Is that not true anymore?

In sql/rpl_parallel.h you added:

+ /*
+ The largest sub_id that has started its transaction. Protected by
+ LOCK_parallel_entry.
+
+ (Transactions can start out-of-order, so this value signifies that no
+ transactions with larger sub_id have started, but not necessarily that all
+ transactions with smaller sub_id have started).
+ */
+ uint64 largest_started_sub_id;

But this is not used anywhere.

http://lists.askmonty.org/pipermail/commits/2015-June/008060.html

+static void
+do_ftwrl_wait(rpl_group_info *rgi,
+ bool *did_enter_cond, PSI_stage_info *old_stage)
+{
+ THD *thd= rgi->thd;
+ rpl_parallel_entry *entry= rgi->parallel_entry;
+ uint64 sub_id= rgi->gtid_sub_id;
+
+ mysql_mutex_assert_owner(&entry->LOCK_parallel_entry);

Would be good to have a comment for the following test, like:

/*
If ftwrl is activate, wait until ftwrl is finnished if we are a
new transaction that started after the ftwrl command was given
*/

Note that if you would store MAX_INT in entry->pause_sub_id when it's not
active, then the following test would be much easier:

+ if (unlikely(entry->pause_sub_id > 0) && sub_id > entry->pause_sub_id)
+ {
+ thd->ENTER_COND(&entry->COND_parallel_entry, &entry->LOCK_parallel_entry,
+ &stage_waiting_for_ftwrl, old_stage);

As you don't have an EXIT_COND() matching the above ENTER_COND, please
add a comment that EXIT_COND() will be called by the caller
(handle_rpl_parallel_thread())

<cut>

Wouldn't a better name for the following function be:
rpl_unpause_after_ftwrl(THD *thd) ?

void
+rpl_unpause_for_ftwrl(THD *thd)
+{
+ uint32 i;
+ rpl_parallel_thread_pool *pool= &global_rpl_thread_pool;
+
+ DBUG_ASSERT(pool->busy);
+
+ for (i= 0; i < pool->count; ++i)
+ {
+ rpl_parallel_entry *e;
+ rpl_parallel_thread *rpt= pool->threads[i];
+
+ mysql_mutex_lock(&rpt->LOCK_rpl_thread);
+ if (!rpt->current_owner)
+

{ + mysql_mutex_unlock(&rpt->LOCK_rpl_thread); + continue; Can the above thread ever have e->pause_sub_id != 0 ? For example if the thread did have a current_owner during pause and not an owner now ? + }

+ e= rpt->current_entry;
+ mysql_mutex_lock(&e->LOCK_parallel_entry);
+ mysql_mutex_unlock(&rpt->LOCK_rpl_thread);
+ e->pause_sub_id= 0;
+ mysql_cond_broadcast(&e->COND_parallel_entry);

Don't you need to unlock mysql_mutex_lock(&e->LOCK_parallel_entry) here ?

+ }

+ mysql_mutex_lock(&pool->LOCK_rpl_thread_pool);
+ pool_mark_not_busy(pool);
+ mysql_cond_broadcast(&pool->COND_rpl_thread_pool);
+ mysql_mutex_unlock(&pool->LOCK_rpl_thread_pool);

As pool_mark_not_busy() is already doing the broadcast, you can remove
it from above.

If we always have to take the above mutexes when calling pool_mark_not_busy()
it would be better if we move the lock and unlock inside pool_mark_not_busy()

<cut>
+rpl_pause_for_ftwrl(THD *thd)
+{
<cut>

+ for (i= 0; i < pool->count; ++i)
+ {
+ PSI_stage_info old_stage;
+ rpl_parallel_entry *e;
+ rpl_parallel_thread *rpt= pool->threads[i];
+
+ mysql_mutex_lock(&rpt->LOCK_rpl_thread);
+ if (!rpt->current_owner)
+

{ + mysql_mutex_unlock(&rpt->LOCK_rpl_thread); + continue; + }

+ e= rpt->current_entry;
+ mysql_mutex_lock(&e->LOCK_parallel_entry);
+ mysql_mutex_unlock(&rpt->LOCK_rpl_thread);
+ ++e->need_sub_id_signal;
+ if (!e->pause_sub_id)
+ e->pause_sub_id= e->largest_started_sub_id;

Why the above test?
In which case can pause_sub_id be != 0 here ?
If it's != 0, what does it mean?

+ thd->ENTER_COND(&e->COND_parallel_entry, &e->LOCK_parallel_entry,
+ &stage_waiting_for_ftwrl_threads_to_pause, &old_stage);
+ while (e->last_committed_sub_id < e->pause_sub_id && !err)
+ {
+ if (thd->check_killed())
+

{ + thd->send_kill_message(); + err= 1; + break; + }

+ mysql_cond_wait(&e->COND_parallel_entry, &e->LOCK_parallel_entry);
+ };
+ -e>need_sub_id_signal;
+ thd->EXIT_COND(&old_stage);
+ if (err)
+ break;
+ }
+
+ if (err)
+ rpl_unpause_for_ftwrl(thd);
+ return err;
+}

@@ -1106,7 +1302,14 @@ rpl_parallel_change_thread_count(rpl_parallel_thread_pool *pool,
*/
for (i= 0; i < pool->count; ++i)
{

rpl_parallel_thread *rpt= pool->get_thread(NULL, NULL);
+ rpl_parallel_thread *rpt;
+
+ mysql_mutex_lock(&pool->LOCK_rpl_thread_pool);
+ while ((rpt= pool->free_list) == NULL)
+ mysql_cond_wait(&pool->COND_rpl_thread_pool, &pool->LOCK_rpl_thread_pool);
+ pool->free_list= rpt->next;
+ mysql_mutex_unlock(&pool->LOCK_rpl_thread_pool);
+ mysql_mutex_lock(&rpt->LOCK_rpl_thread);

I see you used the original code from get_thread(), but you don't reset
rpt->current_owner or rpt->current_entry anymore. Is that not needed?

@@ -1496,8 +1703,14 @@ rpl_parallel_thread_pool::get_thread(rpl_parallel_thread **owner,
rpl_parallel_thread *rpt;

mysql_mutex_lock(&LOCK_rpl_thread_pool);

while ((rpt= free_list) == NULL)
+ for (;
+ { + while (unlikely(busy)) + mysql_cond_wait(&COND_rpl_thread_pool, &LOCK_rpl_thread_pool); + if ((rpt= free_list) != NULL) + break; mysql_cond_wait(&COND_rpl_thread_pool, &LOCK_rpl_thread_pool); + }

Why not use:

while (unlikely(busy) || !(rpt= free_list))
mysql_cond_wait(&COND_rpl_thread_pool, &LOCK_rpl_thread_pool);

— a/sql/sql_parse.cc
+++ b/sql/sql_parse.cc
@@ -4259,6 +4259,17 @@ case SQLCOM_PREPARE:
break;
}

+ if (lex->type & REFRESH_READ_LOCK)
+

{ + /* + We need to pause any parallel replication slave workers during FLUSH + TABLES WITH READ LOCK. Otherwise we might cause a deadlock, as + worker threads eun run in arbitrary order but need to commit in a + specific given order. + */ + if (rpl_pause_for_ftwrl(thd)) + goto error; + }

/*
reload_acl_and_cache() will tell us if we are allowed to write to the
binlog or not.
@@ -4289,6 +4300,8 @@ case SQLCOM_PREPARE:
if (!res)
my_ok(thd);
}
+ if (lex->type & REFRESH_READ_LOCK)
+ rpl_unpause_for_ftwrl(thd);

break;
}

Why have the above code in sql_parse.cc, instead of in
reload_acl_and_cache() around the call for

'thd->global_read_lock.lock_global_read_lock(thd))'

This would avoid calling rpl_pause_for_ftwrl(thd) if tables are locked
and in other possible error cases detected early.

On the other hand, you may also need to take care of the code just above
that calls

flush_tables_with_read_lock(thd, all_tables)

This happens when you give tables as a argument to FLUSH TABLES WITH READ LOCK.

I would assume that this can also cause a dead lock.

The easy fix is probably to add rpl_pause / rpl_unpause also to this
function just before it's calling lock_table_names().

------------

http://lists.askmonty.org/pipermail/commits/2015-June/008061.html

I didn't see test for doing FLUSH TABLES WITH READ LOCK, 2 in a row.

This should detect if there ever was any issues with
rpl_unpause_for_ftwrl() not beeing properly called after
rpl_pause...().

Michael Widenius added a comment - 2015-10-22 10:02 Review of MDEV-7818 : Deadlock occurring with parallel replication and FTWRL http://lists.askmonty.org/pipermail/commits/2015-June/008059.html In this patch you added: +/* + Do not start parallel execution of this event group until all prior groups + have reached the commit phase that are not safe to run in parallel with. +*/ +static bool +do_gco_wait(rpl_group_info *rgi, group_commit_orderer *gco, + bool *did_enter_cond, PSI_stage_info *old_stage) You did however remove this comment: /* Register ourself to wait for the previous commit, if we need to do such registration and that previous commit has not already occured. - Also do not start parallel execution of this event group until all prior groups have reached the commit phase that are not safe to run in parallel with. */ Why did you remove the first part of the comment? Is that not true anymore? In sql/rpl_parallel.h you added: + /* + The largest sub_id that has started its transaction. Protected by + LOCK_parallel_entry. + + (Transactions can start out-of-order, so this value signifies that no + transactions with larger sub_id have started, but not necessarily that all + transactions with smaller sub_id have started). + */ + uint64 largest_started_sub_id; But this is not used anywhere. http://lists.askmonty.org/pipermail/commits/2015-June/008060.html +static void +do_ftwrl_wait(rpl_group_info *rgi, + bool *did_enter_cond, PSI_stage_info *old_stage) +{ + THD *thd= rgi->thd; + rpl_parallel_entry *entry= rgi->parallel_entry; + uint64 sub_id= rgi->gtid_sub_id; + + mysql_mutex_assert_owner(&entry->LOCK_parallel_entry); Would be good to have a comment for the following test, like: /* If ftwrl is activate, wait until ftwrl is finnished if we are a new transaction that started after the ftwrl command was given */ Note that if you would store MAX_INT in entry->pause_sub_id when it's not active, then the following test would be much easier: + if (unlikely(entry->pause_sub_id > 0) && sub_id > entry->pause_sub_id) + { + thd->ENTER_COND(&entry->COND_parallel_entry, &entry->LOCK_parallel_entry, + &stage_waiting_for_ftwrl, old_stage); As you don't have an EXIT_COND() matching the above ENTER_COND, please add a comment that EXIT_COND() will be called by the caller (handle_rpl_parallel_thread()) <cut> Wouldn't a better name for the following function be: rpl_unpause_after_ftwrl(THD *thd) ? void +rpl_unpause_for_ftwrl(THD *thd) +{ + uint32 i; + rpl_parallel_thread_pool *pool= &global_rpl_thread_pool; + + DBUG_ASSERT(pool->busy); + + for (i= 0; i < pool->count; ++i) + { + rpl_parallel_entry *e; + rpl_parallel_thread *rpt= pool->threads [i] ; + + mysql_mutex_lock(&rpt->LOCK_rpl_thread); + if (!rpt->current_owner) + { + mysql_mutex_unlock(&rpt->LOCK_rpl_thread); + continue; Can the above thread ever have e->pause_sub_id != 0 ? For example if the thread did have a current_owner during pause and not an owner now ? + } + e= rpt->current_entry; + mysql_mutex_lock(&e->LOCK_parallel_entry); + mysql_mutex_unlock(&rpt->LOCK_rpl_thread); + e->pause_sub_id= 0; + mysql_cond_broadcast(&e->COND_parallel_entry); Don't you need to unlock mysql_mutex_lock(&e->LOCK_parallel_entry) here ? + } + mysql_mutex_lock(&pool->LOCK_rpl_thread_pool); + pool_mark_not_busy(pool); + mysql_cond_broadcast(&pool->COND_rpl_thread_pool); + mysql_mutex_unlock(&pool->LOCK_rpl_thread_pool); As pool_mark_not_busy() is already doing the broadcast, you can remove it from above. If we always have to take the above mutexes when calling pool_mark_not_busy() it would be better if we move the lock and unlock inside pool_mark_not_busy() <cut> +rpl_pause_for_ftwrl(THD *thd) +{ <cut> + for (i= 0; i < pool->count; ++i) + { + PSI_stage_info old_stage; + rpl_parallel_entry *e; + rpl_parallel_thread *rpt= pool->threads [i] ; + + mysql_mutex_lock(&rpt->LOCK_rpl_thread); + if (!rpt->current_owner) + { + mysql_mutex_unlock(&rpt->LOCK_rpl_thread); + continue; + } + e= rpt->current_entry; + mysql_mutex_lock(&e->LOCK_parallel_entry); + mysql_mutex_unlock(&rpt->LOCK_rpl_thread); + ++e->need_sub_id_signal; + if (!e->pause_sub_id) + e->pause_sub_id= e->largest_started_sub_id; Why the above test? In which case can pause_sub_id be != 0 here ? If it's != 0, what does it mean? + thd->ENTER_COND(&e->COND_parallel_entry, &e->LOCK_parallel_entry, + &stage_waiting_for_ftwrl_threads_to_pause, &old_stage); + while (e->last_committed_sub_id < e->pause_sub_id && !err) + { + if (thd->check_killed()) + { + thd->send_kill_message(); + err= 1; + break; + } + mysql_cond_wait(&e->COND_parallel_entry, &e->LOCK_parallel_entry); + }; + - e >need_sub_id_signal; + thd->EXIT_COND(&old_stage); + if (err) + break; + } + + if (err) + rpl_unpause_for_ftwrl(thd); + return err; +} @@ -1106,7 +1302,14 @@ rpl_parallel_change_thread_count(rpl_parallel_thread_pool *pool, */ for (i= 0; i < pool->count; ++i) { rpl_parallel_thread *rpt= pool->get_thread(NULL, NULL); + rpl_parallel_thread *rpt; + + mysql_mutex_lock(&pool->LOCK_rpl_thread_pool); + while ((rpt= pool->free_list) == NULL) + mysql_cond_wait(&pool->COND_rpl_thread_pool, &pool->LOCK_rpl_thread_pool); + pool->free_list= rpt->next; + mysql_mutex_unlock(&pool->LOCK_rpl_thread_pool); + mysql_mutex_lock(&rpt->LOCK_rpl_thread); I see you used the original code from get_thread(), but you don't reset rpt->current_owner or rpt->current_entry anymore. Is that not needed? @@ -1496,8 +1703,14 @@ rpl_parallel_thread_pool::get_thread(rpl_parallel_thread **owner, rpl_parallel_thread *rpt; mysql_mutex_lock(&LOCK_rpl_thread_pool); while ((rpt= free_list) == NULL) + for (; + { + while (unlikely(busy)) + mysql_cond_wait(&COND_rpl_thread_pool, &LOCK_rpl_thread_pool); + if ((rpt= free_list) != NULL) + break; mysql_cond_wait(&COND_rpl_thread_pool, &LOCK_rpl_thread_pool); + } Why not use: while (unlikely(busy) || !(rpt= free_list)) mysql_cond_wait(&COND_rpl_thread_pool, &LOCK_rpl_thread_pool); — a/sql/sql_parse.cc +++ b/sql/sql_parse.cc @@ -4259,6 +4259,17 @@ case SQLCOM_PREPARE: break; } + if (lex->type & REFRESH_READ_LOCK) + { + /* + We need to pause any parallel replication slave workers during FLUSH + TABLES WITH READ LOCK. Otherwise we might cause a deadlock, as + worker threads eun run in arbitrary order but need to commit in a + specific given order. + */ + if (rpl_pause_for_ftwrl(thd)) + goto error; + } /* reload_acl_and_cache() will tell us if we are allowed to write to the binlog or not. @@ -4289,6 +4300,8 @@ case SQLCOM_PREPARE: if (!res) my_ok(thd); } + if (lex->type & REFRESH_READ_LOCK) + rpl_unpause_for_ftwrl(thd); break; } Why have the above code in sql_parse.cc, instead of in reload_acl_and_cache() around the call for 'thd->global_read_lock.lock_global_read_lock(thd))' This would avoid calling rpl_pause_for_ftwrl(thd) if tables are locked and in other possible error cases detected early. On the other hand, you may also need to take care of the code just above that calls flush_tables_with_read_lock(thd, all_tables) This happens when you give tables as a argument to FLUSH TABLES WITH READ LOCK. I would assume that this can also cause a dead lock. The easy fix is probably to add rpl_pause / rpl_unpause also to this function just before it's calling lock_table_names(). ------------ http://lists.askmonty.org/pipermail/commits/2015-June/008061.html I didn't see test for doing FLUSH TABLES WITH READ LOCK, 2 in a row. This should detect if there ever was any issues with rpl_unpause_for_ftwrl() not beeing properly called after rpl_pause...().

Kristian Nielsen added a comment - 2015-11-14 11:40

Pushed to 10.0 and 10.1:

http://lists.askmonty.org/pipermail/commits/2015-November/008627.html
http://lists.askmonty.org/pipermail/commits/2015-November/008628.html

Kristian Nielsen added a comment - 2015-11-14 11:40 Pushed to 10.0 and 10.1: http://lists.askmonty.org/pipermail/commits/2015-November/008627.html http://lists.askmonty.org/pipermail/commits/2015-November/008628.html

MariaDB Server

Deadlock occurring with parallel replication and FTWRL

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration