[MDEV-14146] galera.GAL-480 fails with a deadlock or a crash Created: 2017-10-26  Updated: 2017-10-30  Resolved: 2017-10-30

Status: Closed
Project: MariaDB Server
Component/s: Galera, Tests
Affects Version/s: 10.2
Fix Version/s: 10.2.10

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Seppo Jaakola
Resolution: Fixed Votes: 0
Labels: None
Environment: libgalera 25.3.14

 Description   

10.2 e99c7c8334f842a debug build

galera.GAL-480 'innodb'                  w2 [ fail ]
        Test ended at 2017-10-26 03:58:20
 
CURRENT_TEST: galera.GAL-480
mysqltest: At line 13: query 'ALTER TABLE t1 DROP COLUMN f1' failed: 1213: Deadlock found when trying to get lock; try restarting transaction
 
The result from queries just before the failure was:
connection node_1;
CREATE TABLE t1 (f1 CHAR(10), f0 integer) ENGINE=InnoDB;
FLUSH TABLE t1 FOR EXPORT;
UNLOCK TABLES;
ALTER TABLE t1 DROP COLUMN f1;
SET SESSION wsrep_osu_method='RSU';
ALTER TABLE t1 ADD COLUMN f1 CHAR(10);
ALTER TABLE t1 DROP COLUMN f1;
 
 - saving '/home/elenst/git/10.2/mysql-test/var/2/log/galera.GAL-480-innodb/' to '/home/elenst/git/10.2/mysql-test/var/log/galera.GAL-480-innodb/'
worker[2] > Restart  - not started
worker[2] > Restart  - not started
***Warnings generated in error logs during shutdown after running tests: galera.GAL-480
 
2017-10-26  3:58:20 140310951712512 [ERROR] WSREP: Node desync failed.: 11 (Resource temporarily unavailable)
2017-10-26  3:58:20 140310951712512 [Warning] WSREP: RSU desync failed 3 for schema: test, query: ALTER TABLE t1 DROP COLUMN f1
2017-10-26  3:58:20 140310951712512 [Warning] WSREP: ALTER TABLE isolation failure

Monty also reported seeing a crash:

- galera.GAL-480
  Fails with a core dump:
#5  0x00007fc37dfe4458 in abort () from /lib64/libc.so.6
#6  0x00007fc37940911e in galera::FSM<galera::Replicator::State, galera::ReplicatorSMM::Transition, galera::EmptyGuard, galera::EmptyAction>::shift_to (this=this@entry=0x41b9fb0, state=state@entry=galera::Replicator::S_DONOR) at galera/src/fsm.hpp:81
#7  0x00007fc3793fdeb2 in galera::ReplicatorSMM::desync (this=0x41b9f60) at galera/src/replicator_smm.cpp:1659
#8  0x00007fc37940e778 in galera_desync (gh=<optimized out>) at galera/src/wsrep_provider.cpp:1004
#9  0x00000000008cdb12 in wsrep_RSU_begin (thd=0x7fc30c000a98, db_=0x0, table_=0x0) at /my/maria-10.2/sql/wsrep_mysqld.cc:1591
#10 0x00000000008ce60f in wsrep_to_isolation_begin (thd=0x7fc30c000a98, db_=0x0, table_=0x0, table_list=0x7fc30c011380) at /my/maria-10.2/sql/wsrep_mysqld.cc:1728
#11 0x000000000084508d in Sql_cmd_alter_table::execute (this=0x7fc30c011f40, thd=0x7fc30c000a98) at /my/maria-10.2/sql/sql_al
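
The backtrace shows abort() being reached from FSM::shift_to while desync() tries to move the replicator to S_DONOR. The following is only a toy model of that shape, not Galera's code: a state machine that rejects any transition missing from its table. The class name, the exception, and the transition table are all hypothetical, and the real transition set and the state the node was in at the time are not known from this report.

```python
class InvalidTransition(RuntimeError):
    """Stands in for the abort() seen in frame #5 of the backtrace."""


class ToyFSM:
    """Toy state machine; NOT the galera::FSM implementation."""

    def __init__(self, initial, transitions):
        self.state = initial
        self._transitions = set(transitions)

    def shift_to(self, target):
        # Refuse any (current, target) pair that is not in the table.
        if (self.state, target) not in self._transitions:
            raise InvalidTransition(f"no transition {self.state} -> {target}")
        self.state = target


# Hypothetical transition table, invented for illustration only.
ALLOWED = {("SYNCED", "DONOR"), ("DONOR", "JOINED"), ("JOINED", "SYNCED")}

fsm = ToyFSM("SYNCED", ALLOWED)
fsm.shift_to("DONOR")  # first desync-like request is accepted

try:
    fsm.shift_to("DONOR")  # no DONOR -> DONOR edge in this toy table
    rejected = False
except InvalidTransition:
    rejected = True

print(fsm.state, rejected)
```

In this model a rejected transition surfaces as an exception. In the backtrace the equivalent point is a process abort, which is why the failure shows up as a core dump rather than as an SQL error.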



 Comments   
Comment by Seppo Jaakola [ 2017-10-30 ]

Which Galera library version was used when reproducing this issue?

I am using Galera library builds from git@github.com:MariaDB/galera.git. With a build from the current HEAD of the 3.x branch in this repo, the issue does not surface, no matter how many rounds I run on my laptop.

However, when I build from the latest release tag, mariadb-25.3.19, I see the crash on the very first try.

Comment by Elena Stepanova [ 2017-10-30 ]

According to my note in the email to the mailing list, it was 25.3.14. Here it goes:

Here are also some failures encountered outside buildbot. It's possible that some of them are caused by using an old Galera library (25.3.14). If that's the reason, please indicate that in JIRA and feel free to close as fixed or not-a-bug.

Comment by Seppo Jaakola [ 2017-10-30 ]

The galera.GAL-480 test was introduced in Galera version 25.3.20, and earlier Galera versions are expected to fail this test.
Please change your test system to use Galera library 25.3.20 or later to make this test pass. (The latest version available now is 25.3.22.)
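
A test setup could guard against this by checking the library version before running the test. The sketch below is an assumption about how such a check might look, not part of the test suite: the function names are hypothetical, and it assumes the version string (for example from the wsrep_provider_version status variable or a release tag) contains a dotted three-part number.

```python
import re

# Minimum Galera version stated in this report for galera.GAL-480.
MIN_GALERA = (25, 3, 20)


def parse_galera_version(text):
    """Extract the first x.y.z number from a version string or tag."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_gal_480(text):
    """True if the given Galera version is 25.3.20 or later."""
    return parse_galera_version(text) >= MIN_GALERA


# Versions mentioned in this report.
for version in ("25.3.14", "mariadb-25.3.19", "25.3.20", "25.3.22"):
    print(version, supports_gal_480(version))
```

Comparing integer tuples avoids the string-comparison pitfall where "25.3.9" would sort after "25.3.20".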
