[MDEV-9560] Mariadb 10.1 Crashes when replicating from 10.0 - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.0.23, 10.1.11
Fix Version/s: 10.0.25, 10.1.12
Component/s: Replication
Labels:
- replication

Description

When replicating from our 10.0 server, MariaDB 10.1.11 crashes. After the crash, it will not startup again unless all relay logs are deleted.

Servers running Ubuntu, both with 200G ram.

2016-02-15 15:12:47 140328625293056 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table Voorraadbeheer.InklaarLog; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.056373, end_log_pos 217054452, Gtid 0-765-1299574704, Internal MariaDB error code: 1105

2016-02-15 15:12:47 140328625293056 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'Voorraadbeheer.InklaarPrijs', Gtid 0-765-1299574704, Internal MariaDB error code: 1610

160215 15:12:47 [ERROR] mysqld got signal 11 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.1.11-MariaDB-1~trusty-log

key_buffer_size=16777216

read_buffer_size=131072

max_used_connections=0

max_threads=502

thread_count=1

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1118986 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7f95a6c2b008

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7fa0cddcb590 thread_stack 0x80000

2016-02-15 15:12:47 140328649512704 [Note] Slave I/O thread: connected to master 'slave@jen.managementboek.nl:3306',replication started in log 'mysql-bin.056376' at position 692280882

addr2line: 'mysqld': No such file

mysqld(my_print_stacktrace+0x2e)[0x559e045013fe]

mysqld(handle_fatal_signal+0x38d)[0x559e0402ef9d]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fa0d02e5340]

mysqld(_Z10unpack_rowP14rpl_group_infoP5TABLEjPKhPK9st_bitmapPS4_PmS4_+0x89)[0x559e0410f0b9]

mysqld(_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0x220)[0x559e0410be80]

mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x31c)[0x559e040fe2cc]

mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14rpl_group_infoP19rpl_parallel_thread+0x1e1)[0x559e03e3a2e1]

mysqld(handle_slave_sql+0x261b)[0x559e03e3dd1b]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fa0d02dd182]

/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fa0cfa0047d]

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0x0): is an invalid pointer

Connection ID (thread ID): 5

Status: NOT_KILLED

gdb shows the following:

(gdb) bt

#0  0x0000555555be30b9 in unpack_row(rpl_group_info*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*) ()

#1  0x0000555555bdfe80 in Update_rows_log_event::do_exec_row(rpl_group_info*) ()

#2  0x0000555555bd22cc in Rows_log_event::do_apply_event(rpl_group_info*) ()

#3  0x000055555590e2e1 in apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*) ()

#4  0x0000555555911d1b in handle_slave_sql ()

#5  0x00007ffff6afe182 in start_thread (arg=0x7ffff5c85700) at pthread_create.c:312

#6  0x00007ffff622147d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Slave my.cnf

[client]

port            = 3306

socket          = /var/run/mysqld/mysqld.sock

back_log=1500

# # UTF-8

character_set_server = utf8

collation_server = utf8_general_ci

# * Logging and Replication

#log_slow_queries       = /var/log/mysql/mysql-slow.log

long_query_time         = 1

log-queries-not-using-indexes

log-output              = TABLE

log_warnings            = 2

sync_binlog             = 0

binlog_checksum         = crc32

binlog_commit_wait_usec = 50000

binlog_commit_wait_count= 20

server-id               = 345

log_bin                 = /data/bigmomma/mysql/log/mysql-bin.log

binlog_format           = MIXED

expire_logs_days        = 7

max_binlog_size         = 1G

log-slave-updates

slave_compressed_protocol = 1

relay-log               = /data/bigmomma/mysql/log/mysql-relay.log

# INNODB #

innodb_log_files_in_group      = 2

innodb_flush_log_at_trx_commit = 0

innodb_buffer_pool_size        = 40G

innodb_file_per_table

innodb_log_file_size=1G

innodb_flush_method=O_DIRECT

innodb_file_format=barracuda

innodb_io_capacity = 20000

innodb_buffer_pool_instances=10

innodb_log_buffer_size=512M

#innodb_adaptive_checkpoint = keep_average

#innodb_flush_neighbor_pages = 0

group_concat_max_len = 8096

[mysqldump]

quick

quote-names

max_allowed_packet      = 16M

ignore-table            = telefoongids.white_subscriber

ignore-table            = telefoongids.pink_subscriber

Master my.cnf

[client]

port            = 3306

socket          = /var/run/mysqld/mysqld.sock

# Here is entries for some specific programs

# The following values assume you have at least 32M ram

# This was formally known as [safe_mysqld]. Both versions are currently parsed.

[mysqld_safe]

socket          = /var/run/mysqld/mysqld.sock

nice            = 0

[mysqld]

replicate-wild-ignore-table = %.Voorafgaand%_Klanten

replicate-wild-ignore-table = tmp.%

transaction-isolation =  READ-COMMITTED

user            = mysql

pid-file        = /var/run/mysqld/mysqld.pid

socket          = /var/run/mysqld/mysqld.sock

port            = 3306

basedir         = /usr

datadir         = /var/lib/mysql

tmpdir          = /tmp

lc-messages-dir = /usr/share/mysql

skip-external-locking

log-error = /data/bigmomma/mysql/log/error.log

innodb_stats_on_metadata = 0

lock_wait_timeout = 20

innodb_lock_wait_timeout = 20

# Instead of skip-networking the default is now to listen only on

# localhost which is more compatible and is not less secure.

bind-address            = 0.0.0.0

# GTIDs

master_info_repository = table

relay_log_info_repository = table

master-verify-checksum = on

slave-sql-verify-checksum = on

# * Fine Tuning

key_buffer              = 16M

#max_allowed_packet     = 16M

thread_stack            = 512K

thread_cache_size       = 8

myisam-recover         = BACKUP

# SAFETY #

max_allowed_packet             = 64M

max_connect_errors             = 1000000

sysdate_is_now                 = 1

max_connections=500

event_scheduler=ON

table_cache     = 3000

tmp_table_size                 = 64M

max_heap_table_size            = 64M

query_cache_type               = 1

query_cache_size               = 8M

max_connections                = 500

thread_cache_size              = 50

open_files_limit               = 65535

table_definition_cache         = 4096

table_open_cache               = 4096

max_sp_recursion_depth         = 2

back_log=1500

# UTF-8

character_set_server = utf8

collation_server = utf8_general_ci

# * Logging and Replication

#log_slow_queries       = /var/log/mysql/mysql-slow.log

long_query_time         = 1

log-queries-not-using-indexes

log-output              = TABLE

log_warnings            = 2

sync_binlog             = 0

binlog_checksum         = crc32

binlog_commit_wait_usec = 50000

binlog_commit_wait_count= 20

server-id               = 765

log_bin                 = /data/bigmomma/mysql/log/mysql-bin.log

binlog_format           = MIXED

expire_logs_days        = 7

max_binlog_size         = 1G

log-slave-updates

slave_compressed_protocol = 1

relay-log               = /data/bigmomma/mysql/log/mysql-relay.log

# INNODB #

innodb_log_files_in_group      = 2

innodb_flush_log_at_trx_commit = 0

innodb_buffer_pool_size        = 40G

innodb_file_per_table

innodb_log_file_size=1G

innodb_flush_method=O_DIRECT

innodb_file_format=barracuda

innodb_io_capacity = 20000

innodb_buffer_pool_instances=10

innodb_log_buffer_size=32M

group_concat_max_len = 8096

[mysqldump]

quick

quote-names

max_allowed_packet      = 16M

gtid                    = ON

ignore-table            = telefoongids.white_subscriber

ignore-table            = telefoongids.pink_subscriber

[mysql]

#no-auto-rehash # faster start of mysql but no tab completition

[isamchk]

key_buffer              = 16M

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

mysql-bin.056373.decoded.2
5.68 MB
2016-02-16 10:48

Issue Links

duplicates

MDEV-9567 mysqlbinlog fails to read binlog event which inserts timestamp with wrong number of microseconds

Closed

Activity

Ascending order - Click to sort in descending order

Elena Stepanova added a comment - 2016-02-15 17:28

CrewOne,

Can you paste the binlog event on which it chokes (mysql-bin.056373, end_log_pos 217054452, Gtid 0-765-1299574704), and the output of

USE Voorraadbeheer;

SHOW CREATE TABLE InklaarPrijs;

SHOW CREATE TABLE InklaarLog;

SHOW TRIGGERS LIKE 'InklaarPrijs';

SHOW TRIGGERS LIKE 'InklaarLog';

and also attach your cnf file(s) from master and slave?

You said the server would not start again – what happens when it attempts to start? And what happens after you delete the relay logs?

Thanks.

Elena Stepanova added a comment - 2016-02-15 17:28 CrewOne , Can you paste the binlog event on which it chokes (mysql-bin.056373, end_log_pos 217054452, Gtid 0-765-1299574704), and the output of USE Voorraadbeheer; SHOW CREATE TABLE InklaarPrijs; SHOW CREATE TABLE InklaarLog; SHOW TRIGGERS LIKE 'InklaarPrijs'; SHOW TRIGGERS LIKE 'InklaarLog'; and also attach your cnf file(s) from master and slave? You said the server would not start again – what happens when it attempts to start? And what happens after you delete the relay logs? Thanks.

Sander Pilon added a comment - 2016-02-15 17:54 - edited

When I delete the relay logs, the server starts. But obviously the replication does not.

I can't restart the replication after that because I do not know the correct position. I can restore the backup again and start the replication - but eventually it will crash. (Usually it will run for a while and replicate a bit before it crashes.)

But this is the second time this happened. Last week we had the exact same situation. This time is the same except with a fresh backup/restore cycle.

I've never managed to get any 10.1 version stable - not as a master and not as a slave (replicating from 10.0.) The idea was to replicate from 10.0 to 10.1 and see how stable 10.1 performs, but so far no luck.

We also tried 10.1 as a master once, but that crashed as well. So we went back to 10.0.

Sander Pilon added a comment - 2016-02-15 17:54 - edited When I delete the relay logs, the server starts. But obviously the replication does not. I can't restart the replication after that because I do not know the correct position. I can restore the backup again and start the replication - but eventually it will crash. (Usually it will run for a while and replicate a bit before it crashes.) But this is the second time this happened. Last week we had the exact same situation. This time is the same except with a fresh backup/restore cycle. I've never managed to get any 10.1 version stable - not as a master and not as a slave (replicating from 10.0.) The idea was to replicate from 10.0 to 10.1 and see how stable 10.1 performs, but so far no luck. We also tried 10.1 as a master once, but that crashed as well. So we went back to 10.0.

Sander Pilon added a comment - 2016-02-15 18:01

CREATE TABLE `InklaarPrijs` (

  `EAN` char(13) NOT NULL,

  `DT` timestamp(3) NULL DEFAULT NULL,

  `Listprijs` decimal(7,2) DEFAULT NULL,

  `Valuta` char(8) DEFAULT NULL,

  `KortingMP` decimal(5,2) DEFAULT NULL,

  PRIMARY KEY (`EAN`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8

CREATE TABLE `InklaarLog` (

  `DT` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),

  `EAN` char(13) NOT NULL,

  `Listprijs` decimal(7,2) DEFAULT NULL,

  `Valuta` char(8) DEFAULT NULL,

  `KortingMP` decimal(5,2) DEFAULT NULL,

  PRIMARY KEY (`DT`,`EAN`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8

Before_Insert_Inkaarlog

BEGIN

	DECLARE kortingmp_ncoi DECIMAL(5,2);

	IF NEW.KortingMP IS NOT NULL THEN

		SELECT U.Kortingmp_NCOI INTO kortingmp_ncoi

		FROM Manpo.Boeken B

			JOIN Manpo.Uitgevers U USING ( UitgeverID )

		WHERE B.EAN = NEW.EAN;

		IF kortingmp_ncoi IS NOT NULL AND NEW.KortingMP = kortingmp_ncoi THEN

			SET NEW.KortingMP = null;

		END IF;

	END IF;

END

After_Insert_Inklaarlog

BEGIN

	INSERT INTO Voorraadbeheer.InklaarPrijs ( EAN, DT, Listprijs, Valuta, KortingMP )

		VALUES ( NEW.EAN, NEW.DT, NEW.Listprijs, NEW.Valuta, NEW.KortingMP )

		ON DUPLICATE KEY UPDATE DT=VALUES(DT),

			Listprijs = IFNULL(VALUES(Listprijs), Listprijs),

			Valuta = IFNULL(VALUES(Valuta), Valuta),

			KortingMP = IFNULL(VALUES(KortingMP), KortingMP);

END

Sander Pilon added a comment - 2016-02-15 18:01 CREATE TABLE `InklaarPrijs` ( `EAN` char(13) NOT NULL, `DT` timestamp(3) NULL DEFAULT NULL, `Listprijs` decimal(7,2) DEFAULT NULL, `Valuta` char(8) DEFAULT NULL, `KortingMP` decimal(5,2) DEFAULT NULL, PRIMARY KEY (`EAN`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 CREATE TABLE `InklaarLog` ( `DT` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3), `EAN` char(13) NOT NULL, `Listprijs` decimal(7,2) DEFAULT NULL, `Valuta` char(8) DEFAULT NULL, `KortingMP` decimal(5,2) DEFAULT NULL, PRIMARY KEY (`DT`,`EAN`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 Before_Insert_Inkaarlog BEGIN DECLARE kortingmp_ncoi DECIMAL(5,2); IF NEW.KortingMP IS NOT NULL THEN SELECT U.Kortingmp_NCOI INTO kortingmp_ncoi FROM Manpo.Boeken B JOIN Manpo.Uitgevers U USING ( UitgeverID ) WHERE B.EAN = NEW.EAN; IF kortingmp_ncoi IS NOT NULL AND NEW.KortingMP = kortingmp_ncoi THEN SET NEW.KortingMP = null; END IF; END IF; END After_Insert_Inklaarlog BEGIN INSERT INTO Voorraadbeheer.InklaarPrijs ( EAN, DT, Listprijs, Valuta, KortingMP ) VALUES ( NEW.EAN, NEW.DT, NEW.Listprijs, NEW.Valuta, NEW.KortingMP ) ON DUPLICATE KEY UPDATE DT=VALUES(DT), Listprijs = IFNULL(VALUES(Listprijs), Listprijs), Valuta = IFNULL(VALUES(Valuta), Valuta), KortingMP = IFNULL(VALUES(KortingMP), KortingMP); END

Sander Pilon added a comment - 2016-02-15 18:07

This does not seem to do anything:

mysqlbinlog --stop-position 217054452 mysql-bin.056376   | tail -n 20

# at 217040113

# at 217041126

# at 217042139

# at 217043152

# at 217044165

# at 217045178

# at 217046191

# at 217047204

# at 217048217

# at 217049230

# at 217050243

# at 217051256

# at 217052269

# at 217053282

# at 217054295

DELIMITER ;

# End of log file

ROLLBACK /* added by mysqlbinlog */;

/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Sander Pilon added a comment - 2016-02-15 18:07 This does not seem to do anything: mysqlbinlog --stop-position 217054452 mysql-bin.056376 | tail -n 20 # at 217040113 # at 217041126 # at 217042139 # at 217043152 # at 217044165 # at 217045178 # at 217046191 # at 217047204 # at 217048217 # at 217049230 # at 217050243 # at 217051256 # at 217052269 # at 217053282 # at 217054295 DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Elena Stepanova added a comment - 2016-02-15 19:19 - edited

Thanks.

This does not seem to do anything

Just dump the whole log then. Besides, we are more interested in the log 373, that's where the crash happened at the first place. Although, you can dump both (or all four of them).

Please use mysqlbinlog --verbose --base64-output=DECODE-ROWS mysql-bin.056373.

Regarding the structure of tables and triggers, is it certainly the same on master and slave?

Elena Stepanova added a comment - 2016-02-15 19:19 - edited Thanks. This does not seem to do anything Just dump the whole log then. Besides, we are more interested in the log 373, that's where the crash happened at the first place. Although, you can dump both (or all four of them). Please use mysqlbinlog --verbose --base64-output=DECODE-ROWS mysql-bin.056373 . Regarding the structure of tables and triggers, is it certainly the same on master and slave?

Sander Pilon added a comment - 2016-02-16 10:40 - edited

I attached what I hoped is the relevant section. Also the complete, zipped, file which extracts to 2.2Gb of data.

file: (98Mb zipped) http://we.tl/cvkJu6nxr8

mysqlbinlog --stop-position=217064452 --verbose --base64-output=DECODE-ROWS mysql-bin.056373

Sander Pilon added a comment - 2016-02-16 10:40 - edited I attached what I hoped is the relevant section. Also the complete, zipped, file which extracts to 2.2Gb of data. file: (98Mb zipped) http://we.tl/cvkJu6nxr8 mysqlbinlog --stop-position=217064452 --verbose --base64-output=DECODE-ROWS mysql-bin.056373

Sander Pilon added a comment - 2016-02-16 10:58 - edited

The slave has been freshly restored and the structure has been verified using SQLyog. Even so, it's preferred that MariaDB does not crash - ever - even if structures aren't the same. [Doh... :)]

Sander Pilon added a comment - 2016-02-16 10:58 - edited The slave has been freshly restored and the structure has been verified using SQLyog. Even so, it's preferred that MariaDB does not crash - ever - even if structures aren't the same. [Doh... :)]

Elena Stepanova added a comment - 2016-02-16 12:46

CrewOne,
Different structures are not an excuse for the crash, but a hint for us to reproduce the problem and find the root cause. So, are they actually different?

Elena Stepanova added a comment - 2016-02-16 12:46 CrewOne , Different structures are not an excuse for the crash, but a hint for us to reproduce the problem and find the root cause. So, are they actually different?

Sander Pilon added a comment - 2016-02-16 14:14

Nope, the same as far as I can tell.

Sander Pilon added a comment - 2016-02-16 14:14 Nope, the same as far as I can tell.

Elena Stepanova added a comment - 2016-02-17 02:19 - edited

CrewOne,

Thanks for the data. For future, you can upload files to ftp.askmonty.org/private. 2G is not a problem as long as it's okay for your upload. Also, only MariaDB developers have access to the data there, which allows to maintain some privacy.

I can reproduce at least a part of the problem, initially I thought it was an unrelated issue and filed it separately as ~~MDEV-9567~~, but, now I start thinking it might be the whole reason. However, I'd like to try to check it. Could you please upload the binary log mysql-bin.056373 itself (not decoded)? Maybe I'll be able to get something this way. If you know which values were actually inserted by that unfortunate event, it could also help – since the event appears to be corrupted, there is no saying what was there.

Elena Stepanova added a comment - 2016-02-17 02:19 - edited CrewOne , Thanks for the data. For future, you can upload files to ftp.askmonty.org/private. 2G is not a problem as long as it's okay for your upload. Also, only MariaDB developers have access to the data there, which allows to maintain some privacy. I can reproduce at least a part of the problem, initially I thought it was an unrelated issue and filed it separately as MDEV-9567 , but, now I start thinking it might be the whole reason. However, I'd like to try to check it. Could you please upload the binary log mysql-bin.056373 itself (not decoded)? Maybe I'll be able to get something this way. If you know which values were actually inserted by that unfortunate event, it could also help – since the event appears to be corrupted, there is no saying what was there.

Sander Pilon added a comment - 2016-02-17 10:39

I will upload a file called ~~MDEV-9560~~.7z to that FTP shortly, it is the requested binlog file. (from the master - not the slave)

Also, I will have to continue development and that means I have to revert the slave back to a 10.0 server today. Maybe I can setup a 10.1 slave on a seperate machine later, if I can help doing that or running a debug build, let me know.

Sander Pilon added a comment - 2016-02-17 10:39 I will upload a file called MDEV-9560 .7z to that FTP shortly, it is the requested binlog file. (from the master - not the slave) Also, I will have to continue development and that means I have to revert the slave back to a 10.0 server today. Maybe I can setup a 10.1 slave on a seperate machine later, if I can help doing that or running a debug build, let me know.

Sander Pilon added a comment - 2016-02-17 10:59 - edited

I have reinstalled 10.0.23 - but after I start it, it seems to crash with more or less the same error.

From gdb:

#0  0x0000555555bbd7b6 in unpack_row(rpl_group_info*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*) ()

#1  0x0000555555bb99dd in Update_rows_log_event::do_exec_row(rpl_group_info*) ()

#2  0x0000555555bab865 in Rows_log_event::do_apply_event(rpl_group_info*) ()

#3  0x00005555558d760d in apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*) ()

#4  0x00005555558da06e in handle_slave_sql ()

#5  0x00007ffff7570182 in start_thread (arg=0x7ffff60c3700) at pthread_create.c:312

#6  0x00007ffff645547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I had to erase all logs and database files, I hope replication will work now.

Sander Pilon added a comment - 2016-02-17 10:59 - edited I have reinstalled 10.0.23 - but after I start it, it seems to crash with more or less the same error. From gdb: #0 0x0000555555bbd7b6 in unpack_row(rpl_group_info*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*) () #1 0x0000555555bb99dd in Update_rows_log_event::do_exec_row(rpl_group_info*) () #2 0x0000555555bab865 in Rows_log_event::do_apply_event(rpl_group_info*) () #3 0x00005555558d760d in apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*) () #4 0x00005555558da06e in handle_slave_sql () #5 0x00007ffff7570182 in start_thread (arg=0x7ffff60c3700) at pthread_create.c:312 #6 0x00007ffff645547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 I had to erase all logs and database files, I hope replication will work now.

Elena Stepanova added a comment - 2016-02-18 02:56

CrewOne,

Thanks for the data.
I can now reproduce the crash, but there is something that does not add up.
The only way I know to get the error log that you pasted in the description is to set slave_skip_errors=1105 on slave; but it's not in the slave cnf file.
Is it possible that you have another cnf somewhere which is also used by the slave, or that you set this option on mysqld command line?

Elena Stepanova added a comment - 2016-02-18 02:56 CrewOne , Thanks for the data. I can now reproduce the crash, but there is something that does not add up. The only way I know to get the error log that you pasted in the description is to set slave_skip_errors=1105 on slave; but it's not in the slave cnf file. Is it possible that you have another cnf somewhere which is also used by the slave, or that you set this option on mysqld command line?

Sander Pilon added a comment - 2016-02-18 10:48 - edited

Indeed, you are right. I'm sorry, but the slave.cnf you have seems to be missing a part. (I edited out the top bit of comments, but seem to have been a bit too enthousiastic.)

replicate-wild-ignore-table = tmp.%

replicate-wild-ignore-table = Voorraadbeheer.VBMIH_tekst_cache

replicate-wild-ignore-table = Managementboek_Klanten.OrderschermRegels

replicate-wild-ignore-table = Voorraadbeheer.GeprinteVoorraad

replicate-wild-ignore-table = php_sessions.%

replicate-ignore-db=Computerboek_Nieuwsbrieven

replicate-ignore-db=Juridischboek_Nieuwsbrieven

relay-log-space-limit=5G

slave-parallel-threads=4

slave-skip-errors=1105

transaction-isolation =  READ-COMMITTED

That particular line, skipping 1105 was introduced when we tried replicating but it stopped a lot with that error. I remember reading about that somehwere, but honestly... it has been a long time ago. I think it was when 10.1 hit the first GA release.

Are you saying now that if I remove that line, replication will work from 10.0 -> 10.1?

Sander Pilon added a comment - 2016-02-18 10:48 - edited Indeed, you are right. I'm sorry, but the slave.cnf you have seems to be missing a part. (I edited out the top bit of comments, but seem to have been a bit too enthousiastic.) replicate-wild-ignore-table = tmp.% replicate-wild-ignore-table = Voorraadbeheer.VBMIH_tekst_cache replicate-wild-ignore-table = Managementboek_Klanten.OrderschermRegels replicate-wild-ignore-table = Voorraadbeheer.GeprinteVoorraad replicate-wild-ignore-table = php_sessions.% replicate-ignore-db=Computerboek_Nieuwsbrieven replicate-ignore-db=Juridischboek_Nieuwsbrieven relay-log-space-limit=5G slave-parallel-threads=4 slave-skip-errors=1105 transaction-isolation = READ-COMMITTED That particular line, skipping 1105 was introduced when we tried replicating but it stopped a lot with that error. I remember reading about that somehwere, but honestly... it has been a long time ago. I think it was when 10.1 hit the first GA release. Are you saying now that if I remove that line, replication will work from 10.0 -> 10.1?

Elena Stepanova added a comment - 2016-02-18 12:04

Okay, thanks, now it makes sense.

Are you saying now that if I remove that line, replication will work from 10.0 -> 10.1?

I don't think so (will tell you for sure later today), but I think it won't crash, at least not this way, more likely replication will just abort as it usually does.
Skipping the error apparently re-shuffles things a bit and makes the server hit an error which it would otherwise never reach.

Elena Stepanova added a comment - 2016-02-18 12:04 Okay, thanks, now it makes sense. Are you saying now that if I remove that line, replication will work from 10.0 -> 10.1? I don't think so (will tell you for sure later today), but I think it won't crash, at least not this way, more likely replication will just abort as it usually does. Skipping the error apparently re-shuffles things a bit and makes the server hit an error which it would otherwise never reach.

Elena Stepanova added a comment - 2016-02-18 20:09 - edited

So, here is a summary of what we have here. I can still be wrong in some details, if so, serg will know out the truth as he's investigating now.

Problem 1: corrupt event

It happens when 10.0 server executes the following with the binary log enabled in row format:

SET binlog_format=ROW;

DROP TABLE IF EXISTS mdev9560;

CREATE TABLE mdev9560 (ts TIMESTAMP(3));

INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:06.123');

The important part here is TIMESTAMP(3).

When mysqlbinlog attempts to read this binlog, it reports a corrupted event:

*!100001 SET @@session.gtid_seq_no=3*//*!*/;

BEGIN

/*!*/;

# at 648

# at 695

#160218 19:41:03 server id 1  end_log_pos 695   Table_map: `test`.`mdev9560` mapped to number 173

#160218 19:41:03 server id 1  end_log_pos 731   Write_rows: table id 173 flags: STMT_END_F

### INSERT INTO `test`.`mdev9560`

### SET

###   @1=1588445526

### INSERT INTO `test`.`mdev9560`

### SET

###   @1=***Corrupted replication event was detected. Not printing the value***

# at 731

#160218 19:41:03 server id 1  end_log_pos 758   Xid = 1190

COMMIT/*!*/;

# at 758

#160218 19:41:09 server id 1  end_log_pos 801   Rotate to mysql-bin.000002  pos: 4

DELIMITER ;

However, replication to another 10.0 works on some reason.

Problem 2: assertion failure on 10.1 debug server

When 10.1 slave running debug server attempts to replicate the events above from 10.0 master, it goes down with the assertion failure:

Stack trace from 10.1 commit d80b8442a68093106e00a9a38b7b2c593002a72c
2016-02-18 19:44:01 140700899031808 [ERROR] Slave SQL: Could not read field '' of table 'test.mdev9560', Gtid 0-1-3, Internal MariaDB error code: 1610
mysqld: /src/10.1/sql/log_event.cc:11696: virtual int Write_rows_log_event::do_exec_row(rpl_group_info*): Assertion `0' failed.
160218 19:44:01 [ERROR] mysqld got signal 6 ;

#7 0x00007ff778f951d2 in __assert_fail () from /lib64/libc.so.6
#8 0x000056524fb022a9 in Write_rows_log_event::do_exec_row (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.cc:11696
#9 0x000056524fafdf5b in Rows_log_event::do_apply_event (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.cc:9989
#10 0x000056524f72a043 in Log_event::apply_event (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.h:1343
#11 0x000056524f71fe00 in apply_event_and_update_pos (ev=0x7ff752487070, thd=0x7ff752422870, rgi=0x7ff752418000, rpt=0x0) at /src/10.1/sql/slave.cc:3374
#12 0x000056524f720884 in exec_relay_log_event (thd=0x7ff752422870, rli=0x7ff75b183e60, serial_rgi=0x7ff752418000) at /src/10.1/sql/slave.cc:3704
#13 0x000056524f72397f in handle_slave_sql (arg=0x7ff75b182240) at /src/10.1/sql/slave.cc:4773
#14 0x00007ff77aede0a4 in start_thread () from /lib64/libpthread.so.0
#15 0x00007ff77904c04d in clone () from /lib64/libc.so.6

Problem 3: Replication failure on 10.1 release server

When 10.1 slave is running a non-debug server, it does not have the debug assertion, so it shows the replication failure instead:

From error log
2016-02-18 19:46:15 140401416502016 [ERROR] Slave SQL: Could not read field '' of table 'test.mdev9560', Gtid 0-1-3, Internal MariaDB error code: 1610
2016-02-18 19:46:15 140401416502016 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 731, Gtid 0-1-3, Internal MariaDB error code: 1105
2016-02-18 19:46:15 140401416502016 [Warning] Slave: Unknown error Error_code: 1105
2016-02-18 19:46:15 140401416502016 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 610

Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105

For this crash to happen, a more complicated scenario needs to be run on the 10.0 master (probably it's not minimal though):

SET binlog_format=ROW;

DROP TABLE IF EXISTS mdev9560, mdev9560_log;

CREATE TABLE mdev9560 (

   `DT` timestamp(3) NOT NULL,

   `EAN` char(13) NULL,

   `Listprijs` decimal(7,2) DEFAULT NULL,

   `Valuta` char(8) DEFAULT NULL

);

CREATE TABLE mdev9560_log (

   `EAN` char(13) NOT NULL,

   `DT` timestamp(3) NULL,

   `Listprijs` decimal(7,2) DEFAULT NULL,

   `Valuta` char(8) DEFAULT NULL,

   PRIMARY KEY (`EAN`)

);

CREATE TRIGGER tr AFTER INSERT on mdev9560 FOR EACH ROW

INSERT INTO mdev9560_log ( EAN, DT, Listprijs, Valuta )

 VALUES ( NEW.EAN, NEW.DT, NEW.Listprijs, NEW.Valuta )

 ON DUPLICATE KEY UPDATE DT=VALUES(DT),

 Listprijs = IFNULL(VALUES(Listprijs), Listprijs),

 Valuta = IFNULL(VALUES(Valuta), Valuta);

INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:06.123','9999999999999',99.99,'EUR');

INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:07.123','9999999999999',99.99,'EUR');

On slave, server crashes with signal 11 and can't start anymore while replication is enabled (replication starts, crashes again and so on):

2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560', Gtid 0-1-5, Internal MariaDB error code: 1610

2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code

: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1681, Gtid 0-1-5, Internal MariaDB error code: 1105

2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560_log', Gtid 0-1-5, Internal MariaDB

error code: 1610

2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560_log; Unknown error, Error_

code: 1105; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1739, Gtid 0-1-5, Int

ernal MariaDB error code: 1105

2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560', Gtid 0-1-6, Internal MariaDB erro

r code: 1610

2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code

: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1978, Gtid 0-1-6, Internal MariaDB error code: 1105

2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560_log', Gtid 0-1-6, Internal MariaDB

error code: 1610

160218 19:56:09 [ERROR] mysqld got signal 11 ;

Stack trace from 10.1.11 binary tarball
#2 <signal handler called>
#3 unpack_row (rgi=rgi@entry=0x7f116d414600, table=0x7f116d489c08, colcnt=4, row_data=0x0, cols=cols@entry=0x7f116cc1a1e8, current_row_end=current_row_end@entry=0x7f116cc1a260, master_reclength=master_reclength@entry=0x7f116cc1a208, row_end=0x7f116d4ed0f2 "") at /home/buildbot/buildbot/build/sql/rpl_record.cc:226
#4 0x00005586fb1a10e8 in unpack_current_row (cols=0x7f116cc1a1e8, rgi=0x7f116d414600, this=0x7f116cc1a148) at /home/buildbot/buildbot/build/sql/log_event.h:4440
#5 Update_rows_log_event::do_exec_row (this=0x7f116cc1a148, rgi=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.cc:12463
#6 0x00005586fb193c92 in Rows_log_event::do_apply_event (this=0x7f116cc1a148, rgi=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.cc:9989
#7 0x00005586faef7ea1 in Log_event::apply_event (this=this@entry=0x7f116cc1a148, rgi=rgi@entry=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.h:1343
#8 0x00005586faeedd23 in apply_event_and_update_pos (ev=ev@entry=0x7f116cc1a148, thd=thd@entry=0x7f116d42d008, rgi=rgi@entry=0x7f116d414600, rpt=rpt@entry=0x0) at /home/buildbot/buildbot/build/sql/slave.cc:3374
#9 0x00005586faef142a in exec_relay_log_event (serial_rgi=0x7f116d414600, rli=0x7f1174e9e9a0, thd=0x7f116d42d008) at /home/buildbot/buildbot/build/sql/slave.cc:3704
#10 handle_slave_sql (arg=0x7f1174e9d000) at /home/buildbot/buildbot/build/sql/slave.cc:4773
#11 0x00007f11994180a4 in start_thread () from /lib64/libpthread.so.0

Since everything starts with a corrupted event on 10.0 master, which is recognized by mysqlbinlog from the same build, I suppose that the root problem is in 10.0. However, the crash is apparently a separate problem, maybe it can be even triggered by some other scenario, not involving timestamps, microseconds, and such.

As already discussed on IRC, I suggest to try to fix the 10.0 part of the problem before 10.0.24 release, so I raise it to a blocker for now. However, it won't fix already existing binary logs. I don't know why 10.0 => 10.0 replication works in the scenarios above, might be just some luck, in which case it's not reliable; but if there is any real logic behind this ability, it would be very good to keep it (and even have it in 10.1 as well), to allow to replicate from older 10.0 binary logs.

Elena Stepanova added a comment - 2016-02-18 20:09 - edited So, here is a summary of what we have here. I can still be wrong in some details, if so, serg will know out the truth as he's investigating now. Problem 1: corrupt event It happens when 10.0 server executes the following with the binary log enabled in row format: SET binlog_format=ROW; DROP TABLE IF EXISTS mdev9560; CREATE TABLE mdev9560 (ts TIMESTAMP(3)); INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:06.123'); The important part here is TIMESTAMP(3) . When mysqlbinlog attempts to read this binlog, it reports a corrupted event: *!100001 SET @@session.gtid_seq_no=3*//*!*/; BEGIN /*!*/; # at 648 # at 695 #160218 19:41:03 server id 1 end_log_pos 695 Table_map: `test`.`mdev9560` mapped to number 173 #160218 19:41:03 server id 1 end_log_pos 731 Write_rows: table id 173 flags: STMT_END_F ### INSERT INTO `test`.`mdev9560` ### SET ### @1=1588445526 ### INSERT INTO `test`.`mdev9560` ### SET ### @1=***Corrupted replication event was detected. Not printing the value*** # at 731 #160218 19:41:03 server id 1 end_log_pos 758 Xid = 1190 COMMIT/*!*/; # at 758 #160218 19:41:09 server id 1 end_log_pos 801 Rotate to mysql-bin.000002 pos: 4 DELIMITER ; However, replication to another 10.0 works on some reason. Problem 2: assertion failure on 10.1 debug server When 10.1 slave running debug server attempts to replicate the events above from 10.0 master, it goes down with the assertion failure: Stack trace from 10.1 commit d80b8442a68093106e00a9a38b7b2c593002a72c 2016-02-18 19:44:01 140700899031808 [ERROR] Slave SQL: Could not read field '' of table 'test.mdev9560', Gtid 0-1-3, Internal MariaDB error code: 1610 mysqld: /src/10.1/sql/log_event.cc:11696: virtual int Write_rows_log_event::do_exec_row(rpl_group_info*): Assertion `0' failed. 160218 19:44:01 [ERROR] mysqld got signal 6 ; #7 0x00007ff778f951d2 in __assert_fail () from /lib64/libc.so.6 #8 0x000056524fb022a9 in Write_rows_log_event::do_exec_row (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.cc:11696 #9 0x000056524fafdf5b in Rows_log_event::do_apply_event (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.cc:9989 #10 0x000056524f72a043 in Log_event::apply_event (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.h:1343 #11 0x000056524f71fe00 in apply_event_and_update_pos (ev=0x7ff752487070, thd=0x7ff752422870, rgi=0x7ff752418000, rpt=0x0) at /src/10.1/sql/slave.cc:3374 #12 0x000056524f720884 in exec_relay_log_event (thd=0x7ff752422870, rli=0x7ff75b183e60, serial_rgi=0x7ff752418000) at /src/10.1/sql/slave.cc:3704 #13 0x000056524f72397f in handle_slave_sql (arg=0x7ff75b182240) at /src/10.1/sql/slave.cc:4773 #14 0x00007ff77aede0a4 in start_thread () from /lib64/libpthread.so.0 #15 0x00007ff77904c04d in clone () from /lib64/libc.so.6 Problem 3: Replication failure on 10.1 release server When 10.1 slave is running a non-debug server, it does not have the debug assertion, so it shows the replication failure instead: From error log 2016-02-18 19:46:15 140401416502016 [ERROR] Slave SQL: Could not read field '' of table 'test.mdev9560', Gtid 0-1-3, Internal MariaDB error code: 1610 2016-02-18 19:46:15 140401416502016 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 731, Gtid 0-1-3, Internal MariaDB error code: 1105 2016-02-18 19:46:15 140401416502016 [Warning] Slave: Unknown error Error_code: 1105 2016-02-18 19:46:15 140401416502016 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 610 Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105 For this crash to happen, a more complicated scenario needs to be run on the 10.0 master (probably it's not minimal though): SET binlog_format=ROW; DROP TABLE IF EXISTS mdev9560, mdev9560_log; CREATE TABLE mdev9560 ( `DT` timestamp(3) NOT NULL, `EAN` char(13) NULL, `Listprijs` decimal(7,2) DEFAULT NULL, `Valuta` char(8) DEFAULT NULL ); CREATE TABLE mdev9560_log ( `EAN` char(13) NOT NULL, `DT` timestamp(3) NULL, `Listprijs` decimal(7,2) DEFAULT NULL, `Valuta` char(8) DEFAULT NULL, PRIMARY KEY (`EAN`) ); CREATE TRIGGER tr AFTER INSERT on mdev9560 FOR EACH ROW INSERT INTO mdev9560_log ( EAN, DT, Listprijs, Valuta ) VALUES ( NEW.EAN, NEW.DT, NEW.Listprijs, NEW.Valuta ) ON DUPLICATE KEY UPDATE DT=VALUES(DT), Listprijs = IFNULL(VALUES(Listprijs), Listprijs), Valuta = IFNULL(VALUES(Valuta), Valuta); INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:06.123','9999999999999',99.99,'EUR'); INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:07.123','9999999999999',99.99,'EUR'); On slave, server crashes with signal 11 and can't start anymore while replication is enabled (replication starts, crashes again and so on): 2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560', Gtid 0-1-5, Internal MariaDB error code: 1610 2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code : 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1681, Gtid 0-1-5, Internal MariaDB error code: 1105 2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560_log', Gtid 0-1-5, Internal MariaDB error code: 1610 2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560_log; Unknown error, Error_ code: 1105; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1739, Gtid 0-1-5, Int ernal MariaDB error code: 1105 2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560', Gtid 0-1-6, Internal MariaDB erro r code: 1610 2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code : 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1978, Gtid 0-1-6, Internal MariaDB error code: 1105 2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560_log', Gtid 0-1-6, Internal MariaDB error code: 1610 160218 19:56:09 [ERROR] mysqld got signal 11 ; Stack trace from 10.1.11 binary tarball #2 <signal handler called> #3 unpack_row (rgi=rgi@entry=0x7f116d414600, table=0x7f116d489c08, colcnt=4, row_data=0x0, cols=cols@entry=0x7f116cc1a1e8, current_row_end=current_row_end@entry=0x7f116cc1a260, master_reclength=master_reclength@entry=0x7f116cc1a208, row_end=0x7f116d4ed0f2 "") at /home/buildbot/buildbot/build/sql/rpl_record.cc:226 #4 0x00005586fb1a10e8 in unpack_current_row (cols=0x7f116cc1a1e8, rgi=0x7f116d414600, this=0x7f116cc1a148) at /home/buildbot/buildbot/build/sql/log_event.h:4440 #5 Update_rows_log_event::do_exec_row (this=0x7f116cc1a148, rgi=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.cc:12463 #6 0x00005586fb193c92 in Rows_log_event::do_apply_event (this=0x7f116cc1a148, rgi=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.cc:9989 #7 0x00005586faef7ea1 in Log_event::apply_event (this=this@entry=0x7f116cc1a148, rgi=rgi@entry=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.h:1343 #8 0x00005586faeedd23 in apply_event_and_update_pos (ev=ev@entry=0x7f116cc1a148, thd=thd@entry=0x7f116d42d008, rgi=rgi@entry=0x7f116d414600, rpt=rpt@entry=0x0) at /home/buildbot/buildbot/build/sql/slave.cc:3374 #9 0x00005586faef142a in exec_relay_log_event (serial_rgi=0x7f116d414600, rli=0x7f1174e9e9a0, thd=0x7f116d42d008) at /home/buildbot/buildbot/build/sql/slave.cc:3704 #10 handle_slave_sql (arg=0x7f1174e9d000) at /home/buildbot/buildbot/build/sql/slave.cc:4773 #11 0x00007f11994180a4 in start_thread () from /lib64/libpthread.so.0 Since everything starts with a corrupted event on 10.0 master, which is recognized by mysqlbinlog from the same build, I suppose that the root problem is in 10.0. However, the crash is apparently a separate problem, maybe it can be even triggered by some other scenario, not involving timestamps, microseconds, and such. As already discussed on IRC, I suggest to try to fix the 10.0 part of the problem before 10.0.24 release, so I raise it to a blocker for now. However, it won't fix already existing binary logs. I don't know why 10.0 => 10.0 replication works in the scenarios above, might be just some luck, in which case it's not reliable; but if there is any real logic behind this ability, it would be very good to keep it (and even have it in 10.1 as well), to allow to replicate from older 10.0 binary logs.

Sergei Golubchik added a comment - 2016-02-18 22:18

Problem 1: corrupt event

This is another manifestation of ~~MDEV-5377~~. The event is not corrupted, but mysqlbinlog cannot parse it properly, and thinks that the event is corrupted.

Old MariaDB temporal data types with microseconds don't store enough metadata in RBR events, so it's only possible to read these RBR event on the slave that has exactly the same table structure as the master.

Because mysqlbinlog has no tables at all, it cannot parse these events correctly.

Sergei Golubchik added a comment - 2016-02-18 22:18 Problem 1: corrupt event This is another manifestation of MDEV-5377 . The event is not corrupted, but mysqlbinlog cannot parse it properly, and thinks that the event is corrupted. Old MariaDB temporal data types with microseconds don't store enough metadata in RBR events, so it's only possible to read these RBR event on the slave that has exactly the same table structure as the master. Because mysqlbinlog has no tables at all, it cannot parse these events correctly.

Sergei Golubchik added a comment - 2016-02-18 23:19 - edited

Problem 2: assertion failure on 10.1 debug server

and

Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105

this will be fixed in the next 10.1 release (10.1.12)

Sergei Golubchik added a comment - 2016-02-18 23:19 - edited Problem 2: assertion failure on 10.1 debug server and Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105 this will be fixed in the next 10.1 release (10.1.12)

Sergei Golubchik added a comment - 2016-02-18 23:27

Problem 3: Replication failure on 10.1 release server

This is a combined effect of ~~MDEV-5377~~ and ~~MDEV-5528~~.

Because of ~~MDEV-5377~~ temporal times could not be always replicated using RBR. And they were incompatible with MySQL temporal types (because MariaDB had them before MySQL). Later we have added MySQL-compatible temporal types and in ~~MDEV-5528~~ provided an option to use them by default. MySQL-compatible types can be safely replicated using RBR events.

So your 10.0 server creates old MariaDB timestamp(3) field. While your 10.1 server creates new MariaDB timestamp(3). If you use RBR, old timestamp(3) can be only replicated to identically defined table. But your 10.0 and 10.1 tables aren't identical, that's why the replication fails.

I'll see if we can recognize this situation on the slave and automatically create the compatible table structure.

But until then you can start your 10.1 server with --disable-mysql56-temporal-format option. The replication will work.

Sergei Golubchik added a comment - 2016-02-18 23:27 Problem 3: Replication failure on 10.1 release server This is a combined effect of MDEV-5377 and MDEV-5528 . Because of MDEV-5377 temporal times could not be always replicated using RBR. And they were incompatible with MySQL temporal types (because MariaDB had them before MySQL). Later we have added MySQL-compatible temporal types and in MDEV-5528 provided an option to use them by default. MySQL-compatible types can be safely replicated using RBR events. So your 10.0 server creates old MariaDB timestamp(3) field. While your 10.1 server creates new MariaDB timestamp(3). If you use RBR, old timestamp(3) can be only replicated to identically defined table. But your 10.0 and 10.1 tables aren't identical, that's why the replication fails. I'll see if we can recognize this situation on the slave and automatically create the compatible table structure. But until then you can start your 10.1 server with --disable-mysql56-temporal-format option. The replication will work.

Elena Stepanova added a comment - 2016-02-19 01:07

CrewOne,

If you start your replication from a clean (empty) datadir, --disable-mysql56-temporal-format mentioned above might be a good workaround for you for now, even before you get fixes for crashes.
If you start replication from an already existing 10.1 directory where tables with microseconds have already been created, the workaround will be a little more complicated, tables will need to be ALTER-ed. If there are many tables, it should be possible to generate ALTER statements automatically based on the contents of INFORMATION_SCHEMA.COLUMNS.

Elena Stepanova added a comment - 2016-02-19 01:07 CrewOne , If you start your replication from a clean (empty) datadir, --disable-mysql56-temporal-format mentioned above might be a good workaround for you for now, even before you get fixes for crashes. If you start replication from an already existing 10.1 directory where tables with microseconds have already been created, the workaround will be a little more complicated, tables will need to be ALTER-ed. If there are many tables, it should be possible to generate ALTER statements automatically based on the contents of INFORMATION_SCHEMA.COLUMNS .

Alexander Barkov added a comment - 2016-02-19 17:53

The patch is Ok to push.
I'd only add tests for the other data types (TIME and DATETIME).
Currently it tests only TIMESTAMP.

Alexander Barkov added a comment - 2016-02-19 17:53 The patch is Ok to push. I'd only add tests for the other data types (TIME and DATETIME). Currently it tests only TIMESTAMP.

Sander Pilon added a comment - 2016-02-22 10:32

Is it safe to assume a fix will be in the next version? Currently replication is working 10.0->10.0 and I can wait a few weeks before trying 10.1 again.

Sander Pilon added a comment - 2016-02-22 10:32 Is it safe to assume a fix will be in the next version? Currently replication is working 10.0->10.0 and I can wait a few weeks before trying 10.1 again.

Sergei Golubchik added a comment - 2016-02-22 10:52

Yes, it will be.
By the way, the release schedule is on our jira main page (http://mariadb.org/jira) and next 10.1 is planned for this week.

if you don't see the roadmap, please log in — it's a jira bug JRA-59335 that hopefully will be fixed soon

Sergei Golubchik added a comment - 2016-02-22 10:52 Yes, it will be. By the way, the release schedule is on our jira main page ( http://mariadb.org/jira ) and next 10.1 is planned for this week. if you don't see the roadmap, please log in — it's a jira bug JRA-59335 that hopefully will be fixed soon

People

Assignee:: Sergei Golubchik

Reporter:: Sander Pilon

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2016-02-15 16:19

Updated:: 2016-03-21 10:43

Resolved:: 2016-02-23 12:00

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

Problem 1: corrupt event

Problem 2: assertion failure on 10.1 debug server

Problem 3: Replication failure on 10.1 release server

Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105

Problem 1: corrupt event

Problem 2: assertion failure on 10.1 debug server

Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105

Problem 3: Replication failure on 10.1 release server

People

Dates

Git Integration