[MDEV-9560] Mariadb 10.1 Crashes when replicating from 10.0 Created: 2016-02-15  Updated: 2016-03-21  Resolved: 2016-02-23

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0.23, 10.1.11
Fix Version/s: 10.0.25, 10.1.12

Type: Bug Priority: Major
Reporter: Sander Pilon Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: replication

Attachments: File mysql-bin.056373.decoded.2    
Issue Links:
Duplicate
duplicates MDEV-9567 mysqlbinlog fails to read binlog even... Closed

 Description   

When replicating from our 10.0 server, MariaDB 10.1.11 crashes. After the crash, it will not startup again unless all relay logs are deleted.

Servers running Ubuntu, both with 200G ram.

2016-02-15 15:12:47 140328625293056 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table Voorraadbeheer.InklaarLog; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.056373, end_log_pos 217054452, Gtid 0-765-1299574704, Internal MariaDB error code: 1105
2016-02-15 15:12:47 140328625293056 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'Voorraadbeheer.InklaarPrijs', Gtid 0-765-1299574704, Internal MariaDB error code: 1610
160215 15:12:47 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.11-MariaDB-1~trusty-log
key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=0
max_threads=502
thread_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1118986 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f95a6c2b008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fa0cddcb590 thread_stack 0x80000
2016-02-15 15:12:47 140328649512704 [Note] Slave I/O thread: connected to master 'slave@jen.managementboek.nl:3306',replication started in log 'mysql-bin.056376' at position 692280882
addr2line: 'mysqld': No such file
mysqld(my_print_stacktrace+0x2e)[0x559e045013fe]
mysqld(handle_fatal_signal+0x38d)[0x559e0402ef9d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fa0d02e5340]
mysqld(_Z10unpack_rowP14rpl_group_infoP5TABLEjPKhPK9st_bitmapPS4_PmS4_+0x89)[0x559e0410f0b9]
mysqld(_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0x220)[0x559e0410be80]
mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x31c)[0x559e040fe2cc]
mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14rpl_group_infoP19rpl_parallel_thread+0x1e1)[0x559e03e3a2e1]
mysqld(handle_slave_sql+0x261b)[0x559e03e3dd1b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fa0d02dd182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fa0cfa0047d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): is an invalid pointer
Connection ID (thread ID): 5
Status: NOT_KILLED

gdb shows the following:

(gdb) bt
#0  0x0000555555be30b9 in unpack_row(rpl_group_info*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*) ()
#1  0x0000555555bdfe80 in Update_rows_log_event::do_exec_row(rpl_group_info*) ()
#2  0x0000555555bd22cc in Rows_log_event::do_apply_event(rpl_group_info*) ()
#3  0x000055555590e2e1 in apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*) ()
#4  0x0000555555911d1b in handle_slave_sql ()
#5  0x00007ffff6afe182 in start_thread (arg=0x7ffff5c85700) at pthread_create.c:312
#6  0x00007ffff622147d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Slave my.cnf

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
back_log=1500
 
# # UTF-8
character_set_server = utf8
collation_server = utf8_general_ci
 
#
# * Logging and Replication
#
#log_slow_queries       = /var/log/mysql/mysql-slow.log
long_query_time         = 1
log-queries-not-using-indexes
log-output              = TABLE
log_warnings            = 2
 
sync_binlog             = 0
binlog_checksum         = crc32
binlog_commit_wait_usec = 50000
binlog_commit_wait_count= 20
server-id               = 345 
log_bin                 = /data/bigmomma/mysql/log/mysql-bin.log
binlog_format           = MIXED
expire_logs_days        = 7
max_binlog_size         = 1G
log-slave-updates
slave_compressed_protocol = 1
relay-log               = /data/bigmomma/mysql/log/mysql-relay.log
 
 
# INNODB #
innodb_log_files_in_group      = 2
innodb_flush_log_at_trx_commit = 0
innodb_buffer_pool_size        = 40G
innodb_file_per_table
innodb_log_file_size=1G
innodb_flush_method=O_DIRECT
innodb_file_format=barracuda
innodb_io_capacity = 20000
innodb_buffer_pool_instances=10
innodb_log_buffer_size=512M
#innodb_adaptive_checkpoint = keep_average
#innodb_flush_neighbor_pages = 0
group_concat_max_len = 8096
 
[mysqldump]
quick
quote-names
max_allowed_packet      = 16M
ignore-table            = telefoongids.white_subscriber
ignore-table            = telefoongids.pink_subscriber

Master my.cnf

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock
 
# Here is entries for some specific programs
# The following values assume you have at least 32M ram
 
# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket          = /var/run/mysqld/mysqld.sock
nice            = 0
 
[mysqld]
replicate-wild-ignore-table = %.Voorafgaand%_Klanten
replicate-wild-ignore-table = tmp.%
transaction-isolation =  READ-COMMITTED
 
user            = mysql
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
port            = 3306
basedir         = /usr
datadir         = /var/lib/mysql
tmpdir          = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
 
log-error = /data/bigmomma/mysql/log/error.log
innodb_stats_on_metadata = 0
lock_wait_timeout = 20
innodb_lock_wait_timeout = 20
 
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address            = 0.0.0.0
 
# GTIDs
master_info_repository = table
relay_log_info_repository = table
master-verify-checksum = on
slave-sql-verify-checksum = on
 
 
#
# * Fine Tuning
#
key_buffer              = 16M
#max_allowed_packet     = 16M
thread_stack            = 512K
thread_cache_size       = 8
myisam-recover         = BACKUP
 
# SAFETY #
max_allowed_packet             = 64M
max_connect_errors             = 1000000
sysdate_is_now                 = 1
 
max_connections=500
event_scheduler=ON
table_cache     = 3000
 
tmp_table_size                 = 64M
max_heap_table_size            = 64M
query_cache_type               = 1
query_cache_size               = 8M
max_connections                = 500
thread_cache_size              = 50
open_files_limit               = 65535
table_definition_cache         = 4096
table_open_cache               = 4096
 
max_sp_recursion_depth         = 2
 
back_log=1500
 
 
# UTF-8
character_set_server = utf8
collation_server = utf8_general_ci
 
 
#
# * Logging and Replication
#
#log_slow_queries       = /var/log/mysql/mysql-slow.log
long_query_time         = 1
log-queries-not-using-indexes
log-output              = TABLE
log_warnings            = 2
 
sync_binlog             = 0
binlog_checksum         = crc32
binlog_commit_wait_usec = 50000
binlog_commit_wait_count= 20
server-id               = 765
log_bin                 = /data/bigmomma/mysql/log/mysql-bin.log
binlog_format           = MIXED
expire_logs_days        = 7
max_binlog_size         = 1G
log-slave-updates
slave_compressed_protocol = 1
relay-log               = /data/bigmomma/mysql/log/mysql-relay.log
 
 
# INNODB #
innodb_log_files_in_group      = 2
innodb_flush_log_at_trx_commit = 0
innodb_buffer_pool_size        = 40G
innodb_file_per_table
innodb_log_file_size=1G
innodb_flush_method=O_DIRECT
innodb_file_format=barracuda
innodb_io_capacity = 20000
innodb_buffer_pool_instances=10
innodb_log_buffer_size=32M
 
group_concat_max_len = 8096
 
[mysqldump]
quick
quote-names
max_allowed_packet      = 16M
gtid                    = ON
ignore-table            = telefoongids.white_subscriber
ignore-table            = telefoongids.pink_subscriber
 
[mysql]
#no-auto-rehash # faster start of mysql but no tab completition
 
[isamchk]
key_buffer              = 16M



 Comments   
Comment by Elena Stepanova [ 2016-02-15 ]

CrewOne,

Can you paste the binlog event on which it chokes (mysql-bin.056373, end_log_pos 217054452, Gtid 0-765-1299574704), and the output of

USE Voorraadbeheer;
SHOW CREATE TABLE InklaarPrijs;
SHOW CREATE TABLE InklaarLog;
SHOW TRIGGERS LIKE 'InklaarPrijs';
SHOW TRIGGERS LIKE 'InklaarLog';

and also attach your cnf file(s) from master and slave?

You said the server would not start again – what happens when it attempts to start? And what happens after you delete the relay logs?

Thanks.

Comment by Sander Pilon [ 2016-02-15 ]

When I delete the relay logs, the server starts. But obviously the replication does not.

I can't restart the replication after that because I do not know the correct position. I can restore the backup again and start the replication - but eventually it will crash. (Usually it will run for a while and replicate a bit before it crashes.)

But this is the second time this happened. Last week we had the exact same situation. This time is the same except with a fresh backup/restore cycle.

I've never managed to get any 10.1 version stable - not as a master and not as a slave (replicating from 10.0.) The idea was to replicate from 10.0 to 10.1 and see how stable 10.1 performs, but so far no luck.

We also tried 10.1 as a master once, but that crashed as well. So we went back to 10.0.

Comment by Sander Pilon [ 2016-02-15 ]

CREATE TABLE `InklaarPrijs` (
  `EAN` char(13) NOT NULL,
  `DT` timestamp(3) NULL DEFAULT NULL,
  `Listprijs` decimal(7,2) DEFAULT NULL,
  `Valuta` char(8) DEFAULT NULL,
  `KortingMP` decimal(5,2) DEFAULT NULL,
  PRIMARY KEY (`EAN`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
 
CREATE TABLE `InklaarLog` (
  `DT` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),
  `EAN` char(13) NOT NULL,
  `Listprijs` decimal(7,2) DEFAULT NULL,
  `Valuta` char(8) DEFAULT NULL,
  `KortingMP` decimal(5,2) DEFAULT NULL,
  PRIMARY KEY (`DT`,`EAN`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
 
Before_Insert_Inkaarlog
BEGIN
	DECLARE kortingmp_ncoi DECIMAL(5,2);
	IF NEW.KortingMP IS NOT NULL THEN
		SELECT U.Kortingmp_NCOI INTO kortingmp_ncoi
		FROM Manpo.Boeken B
			JOIN Manpo.Uitgevers U USING ( UitgeverID )
		WHERE B.EAN = NEW.EAN;
		
		IF kortingmp_ncoi IS NOT NULL AND NEW.KortingMP = kortingmp_ncoi THEN
			SET NEW.KortingMP = null;
		END IF;
	END IF;
END
 
After_Insert_Inklaarlog
BEGIN
	INSERT INTO Voorraadbeheer.InklaarPrijs ( EAN, DT, Listprijs, Valuta, KortingMP )
		VALUES ( NEW.EAN, NEW.DT, NEW.Listprijs, NEW.Valuta, NEW.KortingMP )
		ON DUPLICATE KEY UPDATE DT=VALUES(DT),
			Listprijs = IFNULL(VALUES(Listprijs), Listprijs),
			Valuta = IFNULL(VALUES(Valuta), Valuta),
			KortingMP = IFNULL(VALUES(KortingMP), KortingMP);
END

Comment by Sander Pilon [ 2016-02-15 ]

This does not seem to do anything:

mysqlbinlog --stop-position 217054452 mysql-bin.056376   | tail -n 20
# at 217040113
# at 217041126
# at 217042139
# at 217043152
# at 217044165
# at 217045178
# at 217046191
# at 217047204
# at 217048217
# at 217049230
# at 217050243
# at 217051256
# at 217052269
# at 217053282
# at 217054295
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Comment by Elena Stepanova [ 2016-02-15 ]

Thanks.

This does not seem to do anything

Just dump the whole log then. Besides, we are more interested in the log 373, that's where the crash happened at the first place. Although, you can dump both (or all four of them).

Please use mysqlbinlog --verbose --base64-output=DECODE-ROWS mysql-bin.056373.

Regarding the structure of tables and triggers, is it certainly the same on master and slave?

Comment by Sander Pilon [ 2016-02-16 ]

I attached what I hoped is the relevant section. Also the complete, zipped, file which extracts to 2.2Gb of data.

file: (98Mb zipped) http://we.tl/cvkJu6nxr8

mysqlbinlog --stop-position=217064452 --verbose --base64-output=DECODE-ROWS mysql-bin.056373

Comment by Sander Pilon [ 2016-02-16 ]

The slave has been freshly restored and the structure has been verified using SQLyog. Even so, it's preferred that MariaDB does not crash - ever - even if structures aren't the same. [Doh... :)]

Comment by Elena Stepanova [ 2016-02-16 ]

CrewOne,
Different structures are not an excuse for the crash, but a hint for us to reproduce the problem and find the root cause. So, are they actually different?

Comment by Sander Pilon [ 2016-02-16 ]

Nope, the same as far as I can tell.

Comment by Elena Stepanova [ 2016-02-17 ]

CrewOne,

Thanks for the data. For future, you can upload files to ftp.askmonty.org/private. 2G is not a problem as long as it's okay for your upload. Also, only MariaDB developers have access to the data there, which allows to maintain some privacy.

I can reproduce at least a part of the problem, initially I thought it was an unrelated issue and filed it separately as MDEV-9567, but, now I start thinking it might be the whole reason. However, I'd like to try to check it. Could you please upload the binary log mysql-bin.056373 itself (not decoded)? Maybe I'll be able to get something this way. If you know which values were actually inserted by that unfortunate event, it could also help – since the event appears to be corrupted, there is no saying what was there.

Comment by Sander Pilon [ 2016-02-17 ]

I will upload a file called MDEV-9560.7z to that FTP shortly, it is the requested binlog file. (from the master - not the slave)

Also, I will have to continue development and that means I have to revert the slave back to a 10.0 server today. Maybe I can setup a 10.1 slave on a seperate machine later, if I can help doing that or running a debug build, let me know.

Comment by Sander Pilon [ 2016-02-17 ]

I have reinstalled 10.0.23 - but after I start it, it seems to crash with more or less the same error.

From gdb:

#0  0x0000555555bbd7b6 in unpack_row(rpl_group_info*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*) ()
#1  0x0000555555bb99dd in Update_rows_log_event::do_exec_row(rpl_group_info*) ()
#2  0x0000555555bab865 in Rows_log_event::do_apply_event(rpl_group_info*) ()
#3  0x00005555558d760d in apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*) ()
#4  0x00005555558da06e in handle_slave_sql ()
#5  0x00007ffff7570182 in start_thread (arg=0x7ffff60c3700) at pthread_create.c:312
#6  0x00007ffff645547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I had to erase all logs and database files, I hope replication will work now.

Comment by Elena Stepanova [ 2016-02-18 ]

CrewOne,

Thanks for the data.
I can now reproduce the crash, but there is something that does not add up.
The only way I know to get the error log that you pasted in the description is to set slave_skip_errors=1105 on slave; but it's not in the slave cnf file.
Is it possible that you have another cnf somewhere which is also used by the slave, or that you set this option on mysqld command line?

Comment by Sander Pilon [ 2016-02-18 ]

Indeed, you are right. I'm sorry, but the slave.cnf you have seems to be missing a part. (I edited out the top bit of comments, but seem to have been a bit too enthousiastic.)

replicate-wild-ignore-table = tmp.%
replicate-wild-ignore-table = Voorraadbeheer.VBMIH_tekst_cache
replicate-wild-ignore-table = Managementboek_Klanten.OrderschermRegels
replicate-wild-ignore-table = Voorraadbeheer.GeprinteVoorraad
replicate-wild-ignore-table = php_sessions.% 
replicate-ignore-db=Computerboek_Nieuwsbrieven
replicate-ignore-db=Juridischboek_Nieuwsbrieven
 
relay-log-space-limit=5G
slave-parallel-threads=4
slave-skip-errors=1105
transaction-isolation =  READ-COMMITTED

That particular line, skipping 1105 was introduced when we tried replicating but it stopped a lot with that error. I remember reading about that somehwere, but honestly... it has been a long time ago. I think it was when 10.1 hit the first GA release.

Are you saying now that if I remove that line, replication will work from 10.0 -> 10.1?

Comment by Elena Stepanova [ 2016-02-18 ]

Okay, thanks, now it makes sense.

Are you saying now that if I remove that line, replication will work from 10.0 -> 10.1?

I don't think so (will tell you for sure later today), but I think it won't crash, at least not this way, more likely replication will just abort as it usually does.
Skipping the error apparently re-shuffles things a bit and makes the server hit an error which it would otherwise never reach.

Comment by Elena Stepanova [ 2016-02-18 ]

So, here is a summary of what we have here. I can still be wrong in some details, if so, serg will know out the truth as he's investigating now.

Problem 1: corrupt event

It happens when 10.0 server executes the following with the binary log enabled in row format:

SET binlog_format=ROW;
DROP TABLE IF EXISTS mdev9560;
CREATE TABLE mdev9560 (ts TIMESTAMP(3));
INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:06.123'); 

The important part here is TIMESTAMP(3).

When mysqlbinlog attempts to read this binlog, it reports a corrupted event:

*!100001 SET @@session.gtid_seq_no=3*//*!*/;
BEGIN
/*!*/;
# at 648
# at 695
#160218 19:41:03 server id 1  end_log_pos 695   Table_map: `test`.`mdev9560` mapped to number 173
#160218 19:41:03 server id 1  end_log_pos 731   Write_rows: table id 173 flags: STMT_END_F
### INSERT INTO `test`.`mdev9560`
### SET
###   @1=1588445526
### INSERT INTO `test`.`mdev9560`
### SET
###   @1=***Corrupted replication event was detected. Not printing the value***
# at 731
#160218 19:41:03 server id 1  end_log_pos 758   Xid = 1190
COMMIT/*!*/;
# at 758
#160218 19:41:09 server id 1  end_log_pos 801   Rotate to mysql-bin.000002  pos: 4
DELIMITER ;

However, replication to another 10.0 works on some reason.

Problem 2: assertion failure on 10.1 debug server

When 10.1 slave running debug server attempts to replicate the events above from 10.0 master, it goes down with the assertion failure:

Stack trace from 10.1 commit d80b8442a68093106e00a9a38b7b2c593002a72c

2016-02-18 19:44:01 140700899031808 [ERROR] Slave SQL: Could not read field '' of table 'test.mdev9560', Gtid 0-1-3, Internal MariaDB error code: 1610
mysqld: /src/10.1/sql/log_event.cc:11696: virtual int Write_rows_log_event::do_exec_row(rpl_group_info*): Assertion `0' failed.
160218 19:44:01 [ERROR] mysqld got signal 6 ;
 
#7  0x00007ff778f951d2 in __assert_fail () from /lib64/libc.so.6
#8  0x000056524fb022a9 in Write_rows_log_event::do_exec_row (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.cc:11696
#9  0x000056524fafdf5b in Rows_log_event::do_apply_event (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.cc:9989
#10 0x000056524f72a043 in Log_event::apply_event (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.h:1343
#11 0x000056524f71fe00 in apply_event_and_update_pos (ev=0x7ff752487070, thd=0x7ff752422870, rgi=0x7ff752418000, rpt=0x0) at /src/10.1/sql/slave.cc:3374
#12 0x000056524f720884 in exec_relay_log_event (thd=0x7ff752422870, rli=0x7ff75b183e60, serial_rgi=0x7ff752418000) at /src/10.1/sql/slave.cc:3704
#13 0x000056524f72397f in handle_slave_sql (arg=0x7ff75b182240) at /src/10.1/sql/slave.cc:4773
#14 0x00007ff77aede0a4 in start_thread () from /lib64/libpthread.so.0
#15 0x00007ff77904c04d in clone () from /lib64/libc.so.6

Problem 3: Replication failure on 10.1 release server

When 10.1 slave is running a non-debug server, it does not have the debug assertion, so it shows the replication failure instead:

From error log

2016-02-18 19:46:15 140401416502016 [ERROR] Slave SQL: Could not read field '' of table 'test.mdev9560', Gtid 0-1-3, Internal MariaDB error code: 1610
2016-02-18 19:46:15 140401416502016 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 731, Gtid 0-1-3, Internal MariaDB error code: 1105
2016-02-18 19:46:15 140401416502016 [Warning] Slave: Unknown error Error_code: 1105
2016-02-18 19:46:15 140401416502016 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 610

Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105

For this crash to happen, a more complicated scenario needs to be run on the 10.0 master (probably it's not minimal though):

SET binlog_format=ROW;
DROP TABLE IF EXISTS mdev9560, mdev9560_log;
 
CREATE TABLE mdev9560 (
   `DT` timestamp(3) NOT NULL,
   `EAN` char(13) NULL,
   `Listprijs` decimal(7,2) DEFAULT NULL,
   `Valuta` char(8) DEFAULT NULL
 );
CREATE TABLE mdev9560_log (
   `EAN` char(13) NOT NULL,
   `DT` timestamp(3) NULL,
   `Listprijs` decimal(7,2) DEFAULT NULL,
   `Valuta` char(8) DEFAULT NULL,
   PRIMARY KEY (`EAN`)
);
 
CREATE TRIGGER tr AFTER INSERT on mdev9560 FOR EACH ROW 
INSERT INTO mdev9560_log ( EAN, DT, Listprijs, Valuta )
 VALUES ( NEW.EAN, NEW.DT, NEW.Listprijs, NEW.Valuta )
 ON DUPLICATE KEY UPDATE DT=VALUES(DT),
 Listprijs = IFNULL(VALUES(Listprijs), Listprijs),
 Valuta = IFNULL(VALUES(Valuta), Valuta);
 
INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:06.123','9999999999999',99.99,'EUR');
INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:07.123','9999999999999',99.99,'EUR');

On slave, server crashes with signal 11 and can't start anymore while replication is enabled (replication starts, crashes again and so on):

2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560', Gtid 0-1-5, Internal MariaDB error code: 1610
2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code
: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1681, Gtid 0-1-5, Internal MariaDB error code: 1105
2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560_log', Gtid 0-1-5, Internal MariaDB 
error code: 1610
2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560_log; Unknown error, Error_
code: 1105; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1739, Gtid 0-1-5, Int
ernal MariaDB error code: 1105
2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560', Gtid 0-1-6, Internal MariaDB erro
r code: 1610
2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code
: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1978, Gtid 0-1-6, Internal MariaDB error code: 1105
2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560_log', Gtid 0-1-6, Internal MariaDB 
error code: 1610
160218 19:56:09 [ERROR] mysqld got signal 11 ;

Stack trace from 10.1.11 binary tarball

#2  <signal handler called>
#3  unpack_row (rgi=rgi@entry=0x7f116d414600, table=0x7f116d489c08, colcnt=4, row_data=0x0, cols=cols@entry=0x7f116cc1a1e8, current_row_end=current_row_end@entry=0x7f116cc1a260, master_reclength=master_reclength@entry=0x7f116cc1a208, row_end=0x7f116d4ed0f2 "") at /home/buildbot/buildbot/build/sql/rpl_record.cc:226
#4  0x00005586fb1a10e8 in unpack_current_row (cols=0x7f116cc1a1e8, rgi=0x7f116d414600, this=0x7f116cc1a148) at /home/buildbot/buildbot/build/sql/log_event.h:4440
#5  Update_rows_log_event::do_exec_row (this=0x7f116cc1a148, rgi=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.cc:12463
#6  0x00005586fb193c92 in Rows_log_event::do_apply_event (this=0x7f116cc1a148, rgi=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.cc:9989
#7  0x00005586faef7ea1 in Log_event::apply_event (this=this@entry=0x7f116cc1a148, rgi=rgi@entry=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.h:1343
#8  0x00005586faeedd23 in apply_event_and_update_pos (ev=ev@entry=0x7f116cc1a148, thd=thd@entry=0x7f116d42d008, rgi=rgi@entry=0x7f116d414600, rpt=rpt@entry=0x0) at /home/buildbot/buildbot/build/sql/slave.cc:3374
#9  0x00005586faef142a in exec_relay_log_event (serial_rgi=0x7f116d414600, rli=0x7f1174e9e9a0, thd=0x7f116d42d008) at /home/buildbot/buildbot/build/sql/slave.cc:3704
#10 handle_slave_sql (arg=0x7f1174e9d000) at /home/buildbot/buildbot/build/sql/slave.cc:4773
#11 0x00007f11994180a4 in start_thread () from /lib64/libpthread.so.0


Since everything starts with a corrupted event on 10.0 master, which is recognized by mysqlbinlog from the same build, I suppose that the root problem is in 10.0. However, the crash is apparently a separate problem, maybe it can be even triggered by some other scenario, not involving timestamps, microseconds, and such.

As already discussed on IRC, I suggest to try to fix the 10.0 part of the problem before 10.0.24 release, so I raise it to a blocker for now. However, it won't fix already existing binary logs. I don't know why 10.0 => 10.0 replication works in the scenarios above, might be just some luck, in which case it's not reliable; but if there is any real logic behind this ability, it would be very good to keep it (and even have it in 10.1 as well), to allow to replicate from older 10.0 binary logs.

Comment by Sergei Golubchik [ 2016-02-18 ]

Problem 1: corrupt event

This is another manifestation of MDEV-5377. The event is not corrupted, but mysqlbinlog cannot parse it properly, and thinks that the event is corrupted.

Old MariaDB temporal data types with microseconds don't store enough metadata in RBR events, so it's only possible to read these RBR event on the slave that has exactly the same table structure as the master.

Because mysqlbinlog has no tables at all, it cannot parse these events correctly.

Comment by Sergei Golubchik [ 2016-02-18 ]

Problem 2: assertion failure on 10.1 debug server

and

Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105

this will be fixed in the next 10.1 release (10.1.12)

Comment by Sergei Golubchik [ 2016-02-18 ]

Problem 3: Replication failure on 10.1 release server

This is a combined effect of MDEV-5377 and MDEV-5528.

Because of MDEV-5377 temporal times could not be always replicated using RBR. And they were incompatible with MySQL temporal types (because MariaDB had them before MySQL). Later we have added MySQL-compatible temporal types and in MDEV-5528 provided an option to use them by default. MySQL-compatible types can be safely replicated using RBR events.

So your 10.0 server creates old MariaDB timestamp(3) field. While your 10.1 server creates new MariaDB timestamp(3). If you use RBR, old timestamp(3) can be only replicated to identically defined table. But your 10.0 and 10.1 tables aren't identical, that's why the replication fails.

I'll see if we can recognize this situation on the slave and automatically create the compatible table structure.

But until then you can start your 10.1 server with --disable-mysql56-temporal-format option. The replication will work.

Comment by Elena Stepanova [ 2016-02-19 ]

CrewOne,

If you start your replication from a clean (empty) datadir, --disable-mysql56-temporal-format mentioned above might be a good workaround for you for now, even before you get fixes for crashes.
If you start replication from an already existing 10.1 directory where tables with microseconds have already been created, the workaround will be a little more complicated, tables will need to be ALTER-ed. If there are many tables, it should be possible to generate ALTER statements automatically based on the contents of INFORMATION_SCHEMA.COLUMNS.

Comment by Alexander Barkov [ 2016-02-19 ]

The patch is Ok to push.
I'd only add tests for the other data types (TIME and DATETIME).
Currently it tests only TIMESTAMP.

Comment by Sander Pilon [ 2016-02-22 ]

Is it safe to assume a fix will be in the next version? Currently replication is working 10.0->10.0 and I can wait a few weeks before trying 10.1 again.

Comment by Sergei Golubchik [ 2016-02-22 ]

Yes, it will be.
By the way, the release schedule is on our jira main page (http://mariadb.org/jira) and next 10.1 is planned for this week.

if you don't see the roadmap, please log in — it's a jira bug JRA-59335 that hopefully will be fixed soon

Generated at Thu Feb 08 07:35:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.