[MDEV-8125] Assertion `!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id' failed at 10.1-patched/sql/rpl_parallel.cc:189: void finish_event_group(rpl_parallel_thread*, uint64, rpl_parallel_entry*, rpl_group_info*)

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.1 (EOL)
Fix Version/s: 10.1.5
Component/s: Replication
Labels: None

Description

_Note: line numbers from commit 2f25c653ade1e73aa2b1aa77af9a4898bacb2330, patched with http://lists.askmonty.org/pipermail/commits/2015-May/007819.html_

10.1-patched/sql/rpl_parallel.cc:189: void finish_event_group(rpl_parallel_thread*, uint64, rpl_parallel_entry*, rpl_group_info*): Assertion `!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id' failed.
150508 19:54:55 [ERROR] mysqld got signal 6 ;

#0  0x00007f6bf0861f8c in pthread_kill () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f6bf2c882e4 in my_write_core (sig=6) at /home/elenst/git/10.1-patched/mysys/stacktrace.c:456
#2  0x00007f6bf2644aef in handle_fatal_signal (sig=6) at /home/elenst/git/10.1-patched/sql/signal_handler.cc:266
#3  <signal handler called>
#4  0x00007f6befecc425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f6befecfb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f6befec50ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f6befec5192 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x00007f6bf25b0c7b in finish_event_group (rpt=0x7f6bc3c5dc40, sub_id=14255, entry=0x7f6bc3c8c370, rgi=0x7f6bc1ecd000) at /home/elenst/git/10.1-patched/sql/rpl_parallel.cc:189
#9  0x00007f6bf25b2da8 in handle_rpl_parallel_thread (arg=0x7f6bc3c5dc40) at /home/elenst/git/10.1-patched/sql/rpl_parallel.cc:980
#10 0x00007f6bf085ce9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f6beff89cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()
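
The failed condition is a plain boolean check on a group_commit_orderer ("gco") object. The C++ sketch below restates it on a hypothetical, heavily simplified model: `GcoModel` and `gco_condition_holds` are illustrative names, not MariaDB code, and only the two fields named in the assertion (`next_gco`, `last_sub_id`) are modelled. It only shows which field values make the condition false for a given `sub_id` (such as the 14255 in frame #8); it does not model how the server reached that state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a group_commit_orderer ("gco").
// Only the two fields named in the assertion are modelled; the real
// MariaDB struct has more members, and this is not its definition.
struct GcoModel {
  const GcoModel *next_gco = nullptr;  // successor gco in the chain, if any
  std::uint64_t last_sub_id = 0;       // value compared against sub_id
};

// Restates the asserted condition:
//   !tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id
// i.e. a gco that already has a successor must have last_sub_id strictly
// greater than the sub_id of the event group being finished.
bool gco_condition_holds(const GcoModel &gco, std::uint64_t sub_id) {
  return gco.next_gco == nullptr || gco.last_sub_id > sub_id;
}
```

For example, a gco that has a successor and whose `last_sub_id` equals the `sub_id` being finished makes the condition false, which is the kind of state that would trip the assertion in finish_event_group().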

To reproduce, clone lp:~elenst/randgen/rqg-mdev8125, change into the rqg-mdev8125 directory, and run:

perl ./runall-trials.pl --trials=10 --threads=8 --duration=200 --queries=100M --reporters=Backtrace,ErrorLog,ReplicationThreadRestarter --mysqld=--log_bin_trust_function_creators=1 --engine=InnoDB --grammar=mdev8125.yy --gendata=conf/runtime/metadata_stability.zz --rpl_mode=statement --mysqld=--slave-parallel-mode=optimistic --mysqld=--slave-parallel-threads=20 --basedir=<basedir> --vardir=<vardir>

It will run the same test 10 times (or until the first crash).
The master vardir will be <vardir> and the slave vardir will be <vardir>_slave; data and logs will be in these folders, respectively.

If it sporadically crashes for another reason, you can run the same line with the --force option. In this case it will run all 10 trials and save the vardirs for the failed attempts.

It can also hit MDEV-8113 and hang. Visually, it will just stop printing anything for a long time, which shouldn't normally happen: trials are fairly short (200 seconds each, plus some time to sync the servers) and the output is verbose enough. It usually prints complaints about semantic errors (duplicate column names and so on); as long as it doesn't abort, that's normal.
Currently the tool doesn't recognize this type of deadlock, so the only reliable way out of it is to interrupt everything and start again.
But if you already have a patch for that bug, it should help.

Issue Links

relates to: MDEV-21107 Assertion `!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id' failed in finish_event_group (Confirmed)

Activity

Elena Stepanova added a comment - 2015-05-09 11:14

Please let me know if it's not reproducible, I'll set it up on perro or will think of something else.

Kristian Nielsen added a comment - 2015-05-19 21:40 - edited

It looks like this may be a duplicate of MDEV-8113, just with different symptoms.

At least, with the MDEV-8113 patch applied, and with --slave-skip-errors=1359,1360, I was able to run 60 trials in a row without errors. Without the MDEV-8113 patch, I got the assertion once (as well as the MDEV-8113 hang a handful of times).

I also checked the point of the crash. It happens right after the slave SQL thread is restarted, when the first event after the restart is a DDL, and the assertion fires on the transaction following that initial DDL.

The MDEV-8113 problem was an error in the handling of the group_commit_orderer object when the initial event is a DDL, and this assertion concerns wrong handling of group_commit_orderer. So it seems reasonable to mark this as a duplicate of MDEV-8113.

Elena Stepanova added a comment - 2015-05-19 21:45

I agree; I also haven't been able to reproduce it on a server with the patch for MDEV-8113. Please feel free to close it; I will re-open it or create another one if I encounter it again.

Kristian Nielsen added a comment - 2015-05-19 21:46

Duplicate of MDEV-8113, as per the discussion above.

People

Assignee: Kristian Nielsen
Reporter: Elena Stepanova
Votes: 0
Watchers: 1

Dates

Created: 2015-05-08 20:41
Updated: 2019-11-21 07:58
Resolved: 2015-05-19 21:46

MariaDB Server
