[MDEV-27039] Trying to lock mutex ... when the mutex was already locked | SIGABRT in safe_mutex_lock | Hangs Created: 2021-11-13 Updated: 2022-03-10 Resolved: 2022-01-03 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Storage Engine - Spider |
| Affects Version/s: | 10.6, 10.7, 10.8 |
| Fix Version/s: | 10.6.6, 10.7.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Roel Van de Paar | Assignee: | Andrei Elkin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | cross-mysqld-interaction, hang, regression | ||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
Very similar to
Leads to:
Bug confirmed present in: Bug (or feature/syntax) confirmed not present in (unless sporadicity caused it not to crash/fail): |
| Comments |
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-12-21 ] | |||||||||||||||||||||||||||||||||||||||
|
The bug is reproducible on 10.6.3 but not on 10.6.0. | |||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-12-21 ] | |||||||||||||||||||||||||||||||||||||||
|
The result of git-bisect:
| |||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-12-21 ] | |||||||||||||||||||||||||||||||||||||||
|
Elkin The bug seems to be due to the change in the replication component. Please check it. | |||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2021-12-21 ] | |||||||||||||||||||||||||||||||||||||||
|
In particular, this change which is not explained in the commit comment and is not present in the upstream patch. It comes from a different upstream commit and I suspect it's just wrong there. | |||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-12-21 ] | |||||||||||||||||||||||||||||||||||||||
|
What happens in the above back trace: LOCK_global_system_variables is acquired in sys_var::update(). Then, spider_param_init_sql_alloc_size() tries to access the Spider's system variable spider_init_sql_alloc_size and tries to take LOCK_global_system_variables lock (dead lock!). | |||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-12-21 ] | |||||||||||||||||||||||||||||||||||||||
|
As serg pointed out, removing ha_flush_logs() fixes the problem. The following makeshift patch also fix the problem (note: this cannot be applied as is):
| |||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-12-21 ] | |||||||||||||||||||||||||||||||||||||||
|
Elkin Please check comments made by serg and me. If the log flushing is necessary for setting the binlog_checksum variable, we may have to change the Spider implementation. However, if it is wrongly there, please remove it. | |||||||||||||||||||||||||||||||||||||||
| Comment by Nayuta Yanagisawa (Inactive) [ 2021-12-21 ] | |||||||||||||||||||||||||||||||||||||||
|
Elkin FYI, | |||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2021-12-21 ] | |||||||||||||||||||||||||||||||||||||||
|
I suspect that ha_flush_logs() there is not needed. | |||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2021-12-26 ] | |||||||||||||||||||||||||||||||||||||||
|
Elkin FYI, some related info (testcase, notes) in | |||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-12-27 ] | |||||||||||||||||||||||||||||||||||||||
|
Indeed, ha_flush_logs() is unnecessary. Unlike the upstream, where it arrived from, Mariadb exploits different | |||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-12-27 ] | |||||||||||||||||||||||||||||||||||||||
|
Howdy, Sergei. I'm inviting you to check 'cos you've been looking into anyway. Thank you. /ndrei. | |||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2021-12-27 ] | |||||||||||||||||||||||||||||||||||||||
|
the commit de548dbb0f2089 is, of course, ok, as discussed above. | |||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-12-27 ] | |||||||||||||||||||||||||||||||||||||||
|
serg, no need to add more tests than those that the commit message points. They test PURGE and more. | |||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2021-12-27 ] | |||||||||||||||||||||||||||||||||||||||
|
what about the test for | |||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-12-28 ] | |||||||||||||||||||||||||||||||||||||||
|
serg, You're right. My focus here was too much replication-centric. I adding the description based one to spider 's own suite. | |||||||||||||||||||||||||||||||||||||||
| Comment by Andrei Elkin [ 2021-12-29 ] | |||||||||||||||||||||||||||||||||||||||
|
A regression test is added as a separate commit (to be fixed up at future pushing). Cheers. | |||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2022-01-02 ] | |||||||||||||||||||||||||||||||||||||||
|
2611abb3562 is to to push too, thanks! |