Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9560

Mariadb 10.1 Crashes when replicating from 10.0

Details

    Description

      When replicating from our 10.0 server, MariaDB 10.1.11 crashes. After the crash, it will not startup again unless all relay logs are deleted.

      Servers running Ubuntu, both with 200G ram.

      2016-02-15 15:12:47 140328625293056 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table Voorraadbeheer.InklaarLog; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.056373, end_log_pos 217054452, Gtid 0-765-1299574704, Internal MariaDB error code: 1105
      2016-02-15 15:12:47 140328625293056 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'Voorraadbeheer.InklaarPrijs', Gtid 0-765-1299574704, Internal MariaDB error code: 1610
      160215 15:12:47 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see http://kb.askmonty.org/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.1.11-MariaDB-1~trusty-log
      key_buffer_size=16777216
      read_buffer_size=131072
      max_used_connections=0
      max_threads=502
      thread_count=1
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1118986 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0x7f95a6c2b008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fa0cddcb590 thread_stack 0x80000
      2016-02-15 15:12:47 140328649512704 [Note] Slave I/O thread: connected to master 'slave@jen.managementboek.nl:3306',replication started in log 'mysql-bin.056376' at position 692280882
      addr2line: 'mysqld': No such file
      mysqld(my_print_stacktrace+0x2e)[0x559e045013fe]
      mysqld(handle_fatal_signal+0x38d)[0x559e0402ef9d]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fa0d02e5340]
      mysqld(_Z10unpack_rowP14rpl_group_infoP5TABLEjPKhPK9st_bitmapPS4_PmS4_+0x89)[0x559e0410f0b9]
      mysqld(_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0x220)[0x559e0410be80]
      mysqld(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x31c)[0x559e040fe2cc]
      mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14rpl_group_infoP19rpl_parallel_thread+0x1e1)[0x559e03e3a2e1]
      mysqld(handle_slave_sql+0x261b)[0x559e03e3dd1b]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fa0d02dd182]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fa0cfa0047d]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): is an invalid pointer
      Connection ID (thread ID): 5
      Status: NOT_KILLED

      gdb shows the following:

      (gdb) bt
      #0  0x0000555555be30b9 in unpack_row(rpl_group_info*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*) ()
      #1  0x0000555555bdfe80 in Update_rows_log_event::do_exec_row(rpl_group_info*) ()
      #2  0x0000555555bd22cc in Rows_log_event::do_apply_event(rpl_group_info*) ()
      #3  0x000055555590e2e1 in apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*) ()
      #4  0x0000555555911d1b in handle_slave_sql ()
      #5  0x00007ffff6afe182 in start_thread (arg=0x7ffff5c85700) at pthread_create.c:312
      #6  0x00007ffff622147d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

      Slave my.cnf

      [client]
      port            = 3306
      socket          = /var/run/mysqld/mysqld.sock
      back_log=1500
       
      # # UTF-8
      character_set_server = utf8
      collation_server = utf8_general_ci
       
      #
      # * Logging and Replication
      #
      #log_slow_queries       = /var/log/mysql/mysql-slow.log
      long_query_time         = 1
      log-queries-not-using-indexes
      log-output              = TABLE
      log_warnings            = 2
       
      sync_binlog             = 0
      binlog_checksum         = crc32
      binlog_commit_wait_usec = 50000
      binlog_commit_wait_count= 20
      server-id               = 345 
      log_bin                 = /data/bigmomma/mysql/log/mysql-bin.log
      binlog_format           = MIXED
      expire_logs_days        = 7
      max_binlog_size         = 1G
      log-slave-updates
      slave_compressed_protocol = 1
      relay-log               = /data/bigmomma/mysql/log/mysql-relay.log
       
       
      # INNODB #
      innodb_log_files_in_group      = 2
      innodb_flush_log_at_trx_commit = 0
      innodb_buffer_pool_size        = 40G
      innodb_file_per_table
      innodb_log_file_size=1G
      innodb_flush_method=O_DIRECT
      innodb_file_format=barracuda
      innodb_io_capacity = 20000
      innodb_buffer_pool_instances=10
      innodb_log_buffer_size=512M
      #innodb_adaptive_checkpoint = keep_average
      #innodb_flush_neighbor_pages = 0
      group_concat_max_len = 8096
       
      [mysqldump]
      quick
      quote-names
      max_allowed_packet      = 16M
      ignore-table            = telefoongids.white_subscriber
      ignore-table            = telefoongids.pink_subscriber

      Master my.cnf

      [client]
      port            = 3306
      socket          = /var/run/mysqld/mysqld.sock
       
      # Here is entries for some specific programs
      # The following values assume you have at least 32M ram
       
      # This was formally known as [safe_mysqld]. Both versions are currently parsed.
      [mysqld_safe]
      socket          = /var/run/mysqld/mysqld.sock
      nice            = 0
       
      [mysqld]
      replicate-wild-ignore-table = %.Voorafgaand%_Klanten
      replicate-wild-ignore-table = tmp.%
      transaction-isolation =  READ-COMMITTED
       
      user            = mysql
      pid-file        = /var/run/mysqld/mysqld.pid
      socket          = /var/run/mysqld/mysqld.sock
      port            = 3306
      basedir         = /usr
      datadir         = /var/lib/mysql
      tmpdir          = /tmp
      lc-messages-dir = /usr/share/mysql
      skip-external-locking
       
      log-error = /data/bigmomma/mysql/log/error.log
      innodb_stats_on_metadata = 0
      lock_wait_timeout = 20
      innodb_lock_wait_timeout = 20
       
      #
      # Instead of skip-networking the default is now to listen only on
      # localhost which is more compatible and is not less secure.
      bind-address            = 0.0.0.0
       
      # GTIDs
      master_info_repository = table
      relay_log_info_repository = table
      master-verify-checksum = on
      slave-sql-verify-checksum = on
       
       
      #
      # * Fine Tuning
      #
      key_buffer              = 16M
      #max_allowed_packet     = 16M
      thread_stack            = 512K
      thread_cache_size       = 8
      myisam-recover         = BACKUP
       
      # SAFETY #
      max_allowed_packet             = 64M
      max_connect_errors             = 1000000
      sysdate_is_now                 = 1
       
      max_connections=500
      event_scheduler=ON
      table_cache     = 3000
       
      tmp_table_size                 = 64M
      max_heap_table_size            = 64M
      query_cache_type               = 1
      query_cache_size               = 8M
      max_connections                = 500
      thread_cache_size              = 50
      open_files_limit               = 65535
      table_definition_cache         = 4096
      table_open_cache               = 4096
       
      max_sp_recursion_depth         = 2
       
      back_log=1500
       
       
      # UTF-8
      character_set_server = utf8
      collation_server = utf8_general_ci
       
       
      #
      # * Logging and Replication
      #
      #log_slow_queries       = /var/log/mysql/mysql-slow.log
      long_query_time         = 1
      log-queries-not-using-indexes
      log-output              = TABLE
      log_warnings            = 2
       
      sync_binlog             = 0
      binlog_checksum         = crc32
      binlog_commit_wait_usec = 50000
      binlog_commit_wait_count= 20
      server-id               = 765
      log_bin                 = /data/bigmomma/mysql/log/mysql-bin.log
      binlog_format           = MIXED
      expire_logs_days        = 7
      max_binlog_size         = 1G
      log-slave-updates
      slave_compressed_protocol = 1
      relay-log               = /data/bigmomma/mysql/log/mysql-relay.log
       
       
      # INNODB #
      innodb_log_files_in_group      = 2
      innodb_flush_log_at_trx_commit = 0
      innodb_buffer_pool_size        = 40G
      innodb_file_per_table
      innodb_log_file_size=1G
      innodb_flush_method=O_DIRECT
      innodb_file_format=barracuda
      innodb_io_capacity = 20000
      innodb_buffer_pool_instances=10
      innodb_log_buffer_size=32M
       
      group_concat_max_len = 8096
       
      [mysqldump]
      quick
      quote-names
      max_allowed_packet      = 16M
      gtid                    = ON
      ignore-table            = telefoongids.white_subscriber
      ignore-table            = telefoongids.pink_subscriber
       
      [mysql]
      #no-auto-rehash # faster start of mysql but no tab completition
       
      [isamchk]
      key_buffer              = 16M

      Attachments

        Issue Links

          Activity

            CrewOne,

            Can you paste the binlog event on which it chokes (mysql-bin.056373, end_log_pos 217054452, Gtid 0-765-1299574704), and the output of

            USE Voorraadbeheer;
            SHOW CREATE TABLE InklaarPrijs;
            SHOW CREATE TABLE InklaarLog;
            SHOW TRIGGERS LIKE 'InklaarPrijs';
            SHOW TRIGGERS LIKE 'InklaarLog';

            and also attach your cnf file(s) from master and slave?

            You said the server would not start again – what happens when it attempts to start? And what happens after you delete the relay logs?

            Thanks.

            elenst Elena Stepanova added a comment - CrewOne , Can you paste the binlog event on which it chokes (mysql-bin.056373, end_log_pos 217054452, Gtid 0-765-1299574704), and the output of USE Voorraadbeheer; SHOW CREATE TABLE InklaarPrijs; SHOW CREATE TABLE InklaarLog; SHOW TRIGGERS LIKE 'InklaarPrijs'; SHOW TRIGGERS LIKE 'InklaarLog'; and also attach your cnf file(s) from master and slave? You said the server would not start again – what happens when it attempts to start? And what happens after you delete the relay logs? Thanks.
            CrewOne Sander Pilon added a comment - - edited

            When I delete the relay logs, the server starts. But obviously the replication does not.

            I can't restart the replication after that because I do not know the correct position. I can restore the backup again and start the replication - but eventually it will crash. (Usually it will run for a while and replicate a bit before it crashes.)

            But this is the second time this happened. Last week we had the exact same situation. This time is the same except with a fresh backup/restore cycle.

            I've never managed to get any 10.1 version stable - not as a master and not as a slave (replicating from 10.0.) The idea was to replicate from 10.0 to 10.1 and see how stable 10.1 performs, but so far no luck.

            We also tried 10.1 as a master once, but that crashed as well. So we went back to 10.0.

            CrewOne Sander Pilon added a comment - - edited When I delete the relay logs, the server starts. But obviously the replication does not. I can't restart the replication after that because I do not know the correct position. I can restore the backup again and start the replication - but eventually it will crash. (Usually it will run for a while and replicate a bit before it crashes.) But this is the second time this happened. Last week we had the exact same situation. This time is the same except with a fresh backup/restore cycle. I've never managed to get any 10.1 version stable - not as a master and not as a slave (replicating from 10.0.) The idea was to replicate from 10.0 to 10.1 and see how stable 10.1 performs, but so far no luck. We also tried 10.1 as a master once, but that crashed as well. So we went back to 10.0.
            CrewOne Sander Pilon added a comment -

            CREATE TABLE `InklaarPrijs` (
              `EAN` char(13) NOT NULL,
              `DT` timestamp(3) NULL DEFAULT NULL,
              `Listprijs` decimal(7,2) DEFAULT NULL,
              `Valuta` char(8) DEFAULT NULL,
              `KortingMP` decimal(5,2) DEFAULT NULL,
              PRIMARY KEY (`EAN`)
            ) ENGINE=InnoDB DEFAULT CHARSET=utf8
             
            CREATE TABLE `InklaarLog` (
              `DT` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),
              `EAN` char(13) NOT NULL,
              `Listprijs` decimal(7,2) DEFAULT NULL,
              `Valuta` char(8) DEFAULT NULL,
              `KortingMP` decimal(5,2) DEFAULT NULL,
              PRIMARY KEY (`DT`,`EAN`)
            ) ENGINE=InnoDB DEFAULT CHARSET=utf8
             
            Before_Insert_Inkaarlog
            BEGIN
            	DECLARE kortingmp_ncoi DECIMAL(5,2);
            	IF NEW.KortingMP IS NOT NULL THEN
            		SELECT U.Kortingmp_NCOI INTO kortingmp_ncoi
            		FROM Manpo.Boeken B
            			JOIN Manpo.Uitgevers U USING ( UitgeverID )
            		WHERE B.EAN = NEW.EAN;
            		
            		IF kortingmp_ncoi IS NOT NULL AND NEW.KortingMP = kortingmp_ncoi THEN
            			SET NEW.KortingMP = null;
            		END IF;
            	END IF;
            END
             
            After_Insert_Inklaarlog
            BEGIN
            	INSERT INTO Voorraadbeheer.InklaarPrijs ( EAN, DT, Listprijs, Valuta, KortingMP )
            		VALUES ( NEW.EAN, NEW.DT, NEW.Listprijs, NEW.Valuta, NEW.KortingMP )
            		ON DUPLICATE KEY UPDATE DT=VALUES(DT),
            			Listprijs = IFNULL(VALUES(Listprijs), Listprijs),
            			Valuta = IFNULL(VALUES(Valuta), Valuta),
            			KortingMP = IFNULL(VALUES(KortingMP), KortingMP);
            END

            CrewOne Sander Pilon added a comment - CREATE TABLE `InklaarPrijs` ( `EAN` char(13) NOT NULL, `DT` timestamp(3) NULL DEFAULT NULL, `Listprijs` decimal(7,2) DEFAULT NULL, `Valuta` char(8) DEFAULT NULL, `KortingMP` decimal(5,2) DEFAULT NULL, PRIMARY KEY (`EAN`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8   CREATE TABLE `InklaarLog` ( `DT` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3), `EAN` char(13) NOT NULL, `Listprijs` decimal(7,2) DEFAULT NULL, `Valuta` char(8) DEFAULT NULL, `KortingMP` decimal(5,2) DEFAULT NULL, PRIMARY KEY (`DT`,`EAN`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8   Before_Insert_Inkaarlog BEGIN DECLARE kortingmp_ncoi DECIMAL(5,2); IF NEW.KortingMP IS NOT NULL THEN SELECT U.Kortingmp_NCOI INTO kortingmp_ncoi FROM Manpo.Boeken B JOIN Manpo.Uitgevers U USING ( UitgeverID ) WHERE B.EAN = NEW.EAN; IF kortingmp_ncoi IS NOT NULL AND NEW.KortingMP = kortingmp_ncoi THEN SET NEW.KortingMP = null; END IF; END IF; END   After_Insert_Inklaarlog BEGIN INSERT INTO Voorraadbeheer.InklaarPrijs ( EAN, DT, Listprijs, Valuta, KortingMP ) VALUES ( NEW.EAN, NEW.DT, NEW.Listprijs, NEW.Valuta, NEW.KortingMP ) ON DUPLICATE KEY UPDATE DT=VALUES(DT), Listprijs = IFNULL(VALUES(Listprijs), Listprijs), Valuta = IFNULL(VALUES(Valuta), Valuta), KortingMP = IFNULL(VALUES(KortingMP), KortingMP); END
            CrewOne Sander Pilon added a comment -

            This does not seem to do anything:

            mysqlbinlog --stop-position 217054452 mysql-bin.056376   | tail -n 20
            # at 217040113
            # at 217041126
            # at 217042139
            # at 217043152
            # at 217044165
            # at 217045178
            # at 217046191
            # at 217047204
            # at 217048217
            # at 217049230
            # at 217050243
            # at 217051256
            # at 217052269
            # at 217053282
            # at 217054295
            DELIMITER ;
            # End of log file
            ROLLBACK /* added by mysqlbinlog */;
            /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
            /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

            CrewOne Sander Pilon added a comment - This does not seem to do anything: mysqlbinlog --stop-position 217054452 mysql-bin.056376 | tail -n 20 # at 217040113 # at 217041126 # at 217042139 # at 217043152 # at 217044165 # at 217045178 # at 217046191 # at 217047204 # at 217048217 # at 217049230 # at 217050243 # at 217051256 # at 217052269 # at 217053282 # at 217054295 DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
            elenst Elena Stepanova added a comment - - edited

            Thanks.

            This does not seem to do anything

            Just dump the whole log then. Besides, we are more interested in the log 373, that's where the crash happened at the first place. Although, you can dump both (or all four of them).

            Please use mysqlbinlog --verbose --base64-output=DECODE-ROWS mysql-bin.056373.

            Regarding the structure of tables and triggers, is it certainly the same on master and slave?

            elenst Elena Stepanova added a comment - - edited Thanks. This does not seem to do anything Just dump the whole log then. Besides, we are more interested in the log 373, that's where the crash happened at the first place. Although, you can dump both (or all four of them). Please use mysqlbinlog --verbose --base64-output=DECODE-ROWS mysql-bin.056373 . Regarding the structure of tables and triggers, is it certainly the same on master and slave?
            CrewOne Sander Pilon added a comment - - edited

            I attached what I hoped is the relevant section. Also the complete, zipped, file which extracts to 2.2Gb of data.

            file: (98Mb zipped) http://we.tl/cvkJu6nxr8

            mysqlbinlog --stop-position=217064452 --verbose --base64-output=DECODE-ROWS mysql-bin.056373

            CrewOne Sander Pilon added a comment - - edited I attached what I hoped is the relevant section. Also the complete, zipped, file which extracts to 2.2Gb of data. file: (98Mb zipped) http://we.tl/cvkJu6nxr8 mysqlbinlog --stop-position=217064452 --verbose --base64-output=DECODE-ROWS mysql-bin.056373
            CrewOne Sander Pilon added a comment - - edited

            The slave has been freshly restored and the structure has been verified using SQLyog. Even so, it's preferred that MariaDB does not crash - ever - even if structures aren't the same. [Doh... :)]

            CrewOne Sander Pilon added a comment - - edited The slave has been freshly restored and the structure has been verified using SQLyog. Even so, it's preferred that MariaDB does not crash - ever - even if structures aren't the same. [Doh... :)]

            CrewOne,
            Different structures are not an excuse for the crash, but a hint for us to reproduce the problem and find the root cause. So, are they actually different?

            elenst Elena Stepanova added a comment - CrewOne , Different structures are not an excuse for the crash, but a hint for us to reproduce the problem and find the root cause. So, are they actually different?
            CrewOne Sander Pilon added a comment -

            Nope, the same as far as I can tell.

            CrewOne Sander Pilon added a comment - Nope, the same as far as I can tell.
            elenst Elena Stepanova added a comment - - edited

            CrewOne,

            Thanks for the data. For future, you can upload files to ftp.askmonty.org/private. 2G is not a problem as long as it's okay for your upload. Also, only MariaDB developers have access to the data there, which allows to maintain some privacy.

            I can reproduce at least a part of the problem, initially I thought it was an unrelated issue and filed it separately as MDEV-9567, but, now I start thinking it might be the whole reason. However, I'd like to try to check it. Could you please upload the binary log mysql-bin.056373 itself (not decoded)? Maybe I'll be able to get something this way. If you know which values were actually inserted by that unfortunate event, it could also help – since the event appears to be corrupted, there is no saying what was there.

            elenst Elena Stepanova added a comment - - edited CrewOne , Thanks for the data. For future, you can upload files to ftp.askmonty.org/private. 2G is not a problem as long as it's okay for your upload. Also, only MariaDB developers have access to the data there, which allows to maintain some privacy. I can reproduce at least a part of the problem, initially I thought it was an unrelated issue and filed it separately as MDEV-9567 , but, now I start thinking it might be the whole reason. However, I'd like to try to check it. Could you please upload the binary log mysql-bin.056373 itself (not decoded)? Maybe I'll be able to get something this way. If you know which values were actually inserted by that unfortunate event, it could also help – since the event appears to be corrupted, there is no saying what was there.
            CrewOne Sander Pilon added a comment -

            I will upload a file called MDEV-9560.7z to that FTP shortly, it is the requested binlog file. (from the master - not the slave)

            Also, I will have to continue development and that means I have to revert the slave back to a 10.0 server today. Maybe I can setup a 10.1 slave on a seperate machine later, if I can help doing that or running a debug build, let me know.

            CrewOne Sander Pilon added a comment - I will upload a file called MDEV-9560 .7z to that FTP shortly, it is the requested binlog file. (from the master - not the slave) Also, I will have to continue development and that means I have to revert the slave back to a 10.0 server today. Maybe I can setup a 10.1 slave on a seperate machine later, if I can help doing that or running a debug build, let me know.
            CrewOne Sander Pilon added a comment - - edited

            I have reinstalled 10.0.23 - but after I start it, it seems to crash with more or less the same error.

            From gdb:

            #0  0x0000555555bbd7b6 in unpack_row(rpl_group_info*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*) ()
            #1  0x0000555555bb99dd in Update_rows_log_event::do_exec_row(rpl_group_info*) ()
            #2  0x0000555555bab865 in Rows_log_event::do_apply_event(rpl_group_info*) ()
            #3  0x00005555558d760d in apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*) ()
            #4  0x00005555558da06e in handle_slave_sql ()
            #5  0x00007ffff7570182 in start_thread (arg=0x7ffff60c3700) at pthread_create.c:312
            #6  0x00007ffff645547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

            I had to erase all logs and database files, I hope replication will work now.

            CrewOne Sander Pilon added a comment - - edited I have reinstalled 10.0.23 - but after I start it, it seems to crash with more or less the same error. From gdb: #0 0x0000555555bbd7b6 in unpack_row(rpl_group_info*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, unsigned char const*) () #1 0x0000555555bb99dd in Update_rows_log_event::do_exec_row(rpl_group_info*) () #2 0x0000555555bab865 in Rows_log_event::do_apply_event(rpl_group_info*) () #3 0x00005555558d760d in apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*) () #4 0x00005555558da06e in handle_slave_sql () #5 0x00007ffff7570182 in start_thread (arg=0x7ffff60c3700) at pthread_create.c:312 #6 0x00007ffff645547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 I had to erase all logs and database files, I hope replication will work now.

            CrewOne,

            Thanks for the data.
            I can now reproduce the crash, but there is something that does not add up.
            The only way I know to get the error log that you pasted in the description is to set slave_skip_errors=1105 on slave; but it's not in the slave cnf file.
            Is it possible that you have another cnf somewhere which is also used by the slave, or that you set this option on mysqld command line?

            elenst Elena Stepanova added a comment - CrewOne , Thanks for the data. I can now reproduce the crash, but there is something that does not add up. The only way I know to get the error log that you pasted in the description is to set slave_skip_errors=1105 on slave; but it's not in the slave cnf file. Is it possible that you have another cnf somewhere which is also used by the slave, or that you set this option on mysqld command line?
            CrewOne Sander Pilon added a comment - - edited

            Indeed, you are right. I'm sorry, but the slave.cnf you have seems to be missing a part. (I edited out the top bit of comments, but seem to have been a bit too enthousiastic.)

            replicate-wild-ignore-table = tmp.%
            replicate-wild-ignore-table = Voorraadbeheer.VBMIH_tekst_cache
            replicate-wild-ignore-table = Managementboek_Klanten.OrderschermRegels
            replicate-wild-ignore-table = Voorraadbeheer.GeprinteVoorraad
            replicate-wild-ignore-table = php_sessions.% 
            replicate-ignore-db=Computerboek_Nieuwsbrieven
            replicate-ignore-db=Juridischboek_Nieuwsbrieven
             
            relay-log-space-limit=5G
            slave-parallel-threads=4
            slave-skip-errors=1105
            transaction-isolation =  READ-COMMITTED

            That particular line, skipping 1105 was introduced when we tried replicating but it stopped a lot with that error. I remember reading about that somehwere, but honestly... it has been a long time ago. I think it was when 10.1 hit the first GA release.

            Are you saying now that if I remove that line, replication will work from 10.0 -> 10.1?

            CrewOne Sander Pilon added a comment - - edited Indeed, you are right. I'm sorry, but the slave.cnf you have seems to be missing a part. (I edited out the top bit of comments, but seem to have been a bit too enthousiastic.) replicate-wild-ignore-table = tmp.% replicate-wild-ignore-table = Voorraadbeheer.VBMIH_tekst_cache replicate-wild-ignore-table = Managementboek_Klanten.OrderschermRegels replicate-wild-ignore-table = Voorraadbeheer.GeprinteVoorraad replicate-wild-ignore-table = php_sessions.% replicate-ignore-db=Computerboek_Nieuwsbrieven replicate-ignore-db=Juridischboek_Nieuwsbrieven   relay-log-space-limit=5G slave-parallel-threads=4 slave-skip-errors=1105 transaction-isolation = READ-COMMITTED That particular line, skipping 1105 was introduced when we tried replicating but it stopped a lot with that error. I remember reading about that somehwere, but honestly... it has been a long time ago. I think it was when 10.1 hit the first GA release. Are you saying now that if I remove that line, replication will work from 10.0 -> 10.1?

            Okay, thanks, now it makes sense.

            Are you saying now that if I remove that line, replication will work from 10.0 -> 10.1?

            I don't think so (will tell you for sure later today), but I think it won't crash, at least not this way, more likely replication will just abort as it usually does.
            Skipping the error apparently re-shuffles things a bit and makes the server hit an error which it would otherwise never reach.

            elenst Elena Stepanova added a comment - Okay, thanks, now it makes sense. Are you saying now that if I remove that line, replication will work from 10.0 -> 10.1? I don't think so (will tell you for sure later today), but I think it won't crash, at least not this way, more likely replication will just abort as it usually does. Skipping the error apparently re-shuffles things a bit and makes the server hit an error which it would otherwise never reach.
            elenst Elena Stepanova added a comment - - edited

            So, here is a summary of what we have here. I can still be wrong in some details, if so, serg will know out the truth as he's investigating now.

            Problem 1: corrupt event

            It happens when 10.0 server executes the following with the binary log enabled in row format:

            SET binlog_format=ROW;
            DROP TABLE IF EXISTS mdev9560;
            CREATE TABLE mdev9560 (ts TIMESTAMP(3));
            INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:06.123'); 

            The important part here is TIMESTAMP(3).

            When mysqlbinlog attempts to read this binlog, it reports a corrupted event:

            *!100001 SET @@session.gtid_seq_no=3*//*!*/;
            BEGIN
            /*!*/;
            # at 648
            # at 695
            #160218 19:41:03 server id 1  end_log_pos 695   Table_map: `test`.`mdev9560` mapped to number 173
            #160218 19:41:03 server id 1  end_log_pos 731   Write_rows: table id 173 flags: STMT_END_F
            ### INSERT INTO `test`.`mdev9560`
            ### SET
            ###   @1=1588445526
            ### INSERT INTO `test`.`mdev9560`
            ### SET
            ###   @1=***Corrupted replication event was detected. Not printing the value***
            # at 731
            #160218 19:41:03 server id 1  end_log_pos 758   Xid = 1190
            COMMIT/*!*/;
            # at 758
            #160218 19:41:09 server id 1  end_log_pos 801   Rotate to mysql-bin.000002  pos: 4
            DELIMITER ;

            However, replication to another 10.0 works on some reason.

            Problem 2: assertion failure on 10.1 debug server

            When 10.1 slave running debug server attempts to replicate the events above from 10.0 master, it goes down with the assertion failure:

            Stack trace from 10.1 commit d80b8442a68093106e00a9a38b7b2c593002a72c

            2016-02-18 19:44:01 140700899031808 [ERROR] Slave SQL: Could not read field '' of table 'test.mdev9560', Gtid 0-1-3, Internal MariaDB error code: 1610
            mysqld: /src/10.1/sql/log_event.cc:11696: virtual int Write_rows_log_event::do_exec_row(rpl_group_info*): Assertion `0' failed.
            160218 19:44:01 [ERROR] mysqld got signal 6 ;
             
            #7  0x00007ff778f951d2 in __assert_fail () from /lib64/libc.so.6
            #8  0x000056524fb022a9 in Write_rows_log_event::do_exec_row (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.cc:11696
            #9  0x000056524fafdf5b in Rows_log_event::do_apply_event (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.cc:9989
            #10 0x000056524f72a043 in Log_event::apply_event (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.h:1343
            #11 0x000056524f71fe00 in apply_event_and_update_pos (ev=0x7ff752487070, thd=0x7ff752422870, rgi=0x7ff752418000, rpt=0x0) at /src/10.1/sql/slave.cc:3374
            #12 0x000056524f720884 in exec_relay_log_event (thd=0x7ff752422870, rli=0x7ff75b183e60, serial_rgi=0x7ff752418000) at /src/10.1/sql/slave.cc:3704
            #13 0x000056524f72397f in handle_slave_sql (arg=0x7ff75b182240) at /src/10.1/sql/slave.cc:4773
            #14 0x00007ff77aede0a4 in start_thread () from /lib64/libpthread.so.0
            #15 0x00007ff77904c04d in clone () from /lib64/libc.so.6

            Problem 3: Replication failure on 10.1 release server

            When 10.1 slave is running a non-debug server, it does not have the debug assertion, so it shows the replication failure instead:

            From error log

            2016-02-18 19:46:15 140401416502016 [ERROR] Slave SQL: Could not read field '' of table 'test.mdev9560', Gtid 0-1-3, Internal MariaDB error code: 1610
            2016-02-18 19:46:15 140401416502016 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 731, Gtid 0-1-3, Internal MariaDB error code: 1105
            2016-02-18 19:46:15 140401416502016 [Warning] Slave: Unknown error Error_code: 1105
            2016-02-18 19:46:15 140401416502016 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 610

            Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105

            For this crash to happen, a more complicated scenario needs to be run on the 10.0 master (probably it's not minimal though):

            SET binlog_format=ROW;
            DROP TABLE IF EXISTS mdev9560, mdev9560_log;
             
            CREATE TABLE mdev9560 (
               `DT` timestamp(3) NOT NULL,
               `EAN` char(13) NULL,
               `Listprijs` decimal(7,2) DEFAULT NULL,
               `Valuta` char(8) DEFAULT NULL
             );
            CREATE TABLE mdev9560_log (
               `EAN` char(13) NOT NULL,
               `DT` timestamp(3) NULL,
               `Listprijs` decimal(7,2) DEFAULT NULL,
               `Valuta` char(8) DEFAULT NULL,
               PRIMARY KEY (`EAN`)
            );
             
            CREATE TRIGGER tr AFTER INSERT on mdev9560 FOR EACH ROW 
            INSERT INTO mdev9560_log ( EAN, DT, Listprijs, Valuta )
             VALUES ( NEW.EAN, NEW.DT, NEW.Listprijs, NEW.Valuta )
             ON DUPLICATE KEY UPDATE DT=VALUES(DT),
             Listprijs = IFNULL(VALUES(Listprijs), Listprijs),
             Valuta = IFNULL(VALUES(Valuta), Valuta);
             
            INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:06.123','9999999999999',99.99,'EUR');
            INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:07.123','9999999999999',99.99,'EUR');

            On slave, server crashes with signal 11 and can't start anymore while replication is enabled (replication starts, crashes again and so on):

            2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560', Gtid 0-1-5, Internal MariaDB error code: 1610
            2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code
            : 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1681, Gtid 0-1-5, Internal MariaDB error code: 1105
            2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560_log', Gtid 0-1-5, Internal MariaDB 
            error code: 1610
            2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560_log; Unknown error, Error_
            code: 1105; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1739, Gtid 0-1-5, Int
            ernal MariaDB error code: 1105
            2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560', Gtid 0-1-6, Internal MariaDB erro
            r code: 1610
            2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code
            : 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1978, Gtid 0-1-6, Internal MariaDB error code: 1105
            2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560_log', Gtid 0-1-6, Internal MariaDB 
            error code: 1610
            160218 19:56:09 [ERROR] mysqld got signal 11 ;

            Stack trace from 10.1.11 binary tarball

            #2  <signal handler called>
            #3  unpack_row (rgi=rgi@entry=0x7f116d414600, table=0x7f116d489c08, colcnt=4, row_data=0x0, cols=cols@entry=0x7f116cc1a1e8, current_row_end=current_row_end@entry=0x7f116cc1a260, master_reclength=master_reclength@entry=0x7f116cc1a208, row_end=0x7f116d4ed0f2 "") at /home/buildbot/buildbot/build/sql/rpl_record.cc:226
            #4  0x00005586fb1a10e8 in unpack_current_row (cols=0x7f116cc1a1e8, rgi=0x7f116d414600, this=0x7f116cc1a148) at /home/buildbot/buildbot/build/sql/log_event.h:4440
            #5  Update_rows_log_event::do_exec_row (this=0x7f116cc1a148, rgi=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.cc:12463
            #6  0x00005586fb193c92 in Rows_log_event::do_apply_event (this=0x7f116cc1a148, rgi=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.cc:9989
            #7  0x00005586faef7ea1 in Log_event::apply_event (this=this@entry=0x7f116cc1a148, rgi=rgi@entry=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.h:1343
            #8  0x00005586faeedd23 in apply_event_and_update_pos (ev=ev@entry=0x7f116cc1a148, thd=thd@entry=0x7f116d42d008, rgi=rgi@entry=0x7f116d414600, rpt=rpt@entry=0x0) at /home/buildbot/buildbot/build/sql/slave.cc:3374
            #9  0x00005586faef142a in exec_relay_log_event (serial_rgi=0x7f116d414600, rli=0x7f1174e9e9a0, thd=0x7f116d42d008) at /home/buildbot/buildbot/build/sql/slave.cc:3704
            #10 handle_slave_sql (arg=0x7f1174e9d000) at /home/buildbot/buildbot/build/sql/slave.cc:4773
            #11 0x00007f11994180a4 in start_thread () from /lib64/libpthread.so.0


            Since everything starts with a corrupted event on 10.0 master, which is recognized by mysqlbinlog from the same build, I suppose that the root problem is in 10.0. However, the crash is apparently a separate problem, maybe it can be even triggered by some other scenario, not involving timestamps, microseconds, and such.

            As already discussed on IRC, I suggest to try to fix the 10.0 part of the problem before 10.0.24 release, so I raise it to a blocker for now. However, it won't fix already existing binary logs. I don't know why 10.0 => 10.0 replication works in the scenarios above, might be just some luck, in which case it's not reliable; but if there is any real logic behind this ability, it would be very good to keep it (and even have it in 10.1 as well), to allow to replicate from older 10.0 binary logs.

            elenst Elena Stepanova added a comment - - edited So, here is a summary of what we have here. I can still be wrong in some details, if so, serg will know out the truth as he's investigating now. Problem 1: corrupt event It happens when 10.0 server executes the following with the binary log enabled in row format: SET binlog_format=ROW; DROP TABLE IF EXISTS mdev9560; CREATE TABLE mdev9560 (ts TIMESTAMP(3)); INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:06.123'); The important part here is TIMESTAMP(3) . When mysqlbinlog attempts to read this binlog, it reports a corrupted event: *!100001 SET @@session.gtid_seq_no=3*//*!*/; BEGIN /*!*/; # at 648 # at 695 #160218 19:41:03 server id 1 end_log_pos 695 Table_map: `test`.`mdev9560` mapped to number 173 #160218 19:41:03 server id 1 end_log_pos 731 Write_rows: table id 173 flags: STMT_END_F ### INSERT INTO `test`.`mdev9560` ### SET ### @1=1588445526 ### INSERT INTO `test`.`mdev9560` ### SET ### @1=***Corrupted replication event was detected. Not printing the value*** # at 731 #160218 19:41:03 server id 1 end_log_pos 758 Xid = 1190 COMMIT/*!*/; # at 758 #160218 19:41:09 server id 1 end_log_pos 801 Rotate to mysql-bin.000002 pos: 4 DELIMITER ; However, replication to another 10.0 works on some reason. Problem 2: assertion failure on 10.1 debug server When 10.1 slave running debug server attempts to replicate the events above from 10.0 master, it goes down with the assertion failure: Stack trace from 10.1 commit d80b8442a68093106e00a9a38b7b2c593002a72c 2016-02-18 19:44:01 140700899031808 [ERROR] Slave SQL: Could not read field '' of table 'test.mdev9560', Gtid 0-1-3, Internal MariaDB error code: 1610 mysqld: /src/10.1/sql/log_event.cc:11696: virtual int Write_rows_log_event::do_exec_row(rpl_group_info*): Assertion `0' failed. 160218 19:44:01 [ERROR] mysqld got signal 6 ;   #7 0x00007ff778f951d2 in __assert_fail () from /lib64/libc.so.6 #8 0x000056524fb022a9 in Write_rows_log_event::do_exec_row (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.cc:11696 #9 0x000056524fafdf5b in Rows_log_event::do_apply_event (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.cc:9989 #10 0x000056524f72a043 in Log_event::apply_event (this=0x7ff752487070, rgi=0x7ff752418000) at /src/10.1/sql/log_event.h:1343 #11 0x000056524f71fe00 in apply_event_and_update_pos (ev=0x7ff752487070, thd=0x7ff752422870, rgi=0x7ff752418000, rpt=0x0) at /src/10.1/sql/slave.cc:3374 #12 0x000056524f720884 in exec_relay_log_event (thd=0x7ff752422870, rli=0x7ff75b183e60, serial_rgi=0x7ff752418000) at /src/10.1/sql/slave.cc:3704 #13 0x000056524f72397f in handle_slave_sql (arg=0x7ff75b182240) at /src/10.1/sql/slave.cc:4773 #14 0x00007ff77aede0a4 in start_thread () from /lib64/libpthread.so.0 #15 0x00007ff77904c04d in clone () from /lib64/libc.so.6 Problem 3: Replication failure on 10.1 release server When 10.1 slave is running a non-debug server, it does not have the debug assertion, so it shows the replication failure instead: From error log 2016-02-18 19:46:15 140401416502016 [ERROR] Slave SQL: Could not read field '' of table 'test.mdev9560', Gtid 0-1-3, Internal MariaDB error code: 1610 2016-02-18 19:46:15 140401416502016 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 731, Gtid 0-1-3, Internal MariaDB error code: 1105 2016-02-18 19:46:15 140401416502016 [Warning] Slave: Unknown error Error_code: 1105 2016-02-18 19:46:15 140401416502016 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 610 Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105 For this crash to happen, a more complicated scenario needs to be run on the 10.0 master (probably it's not minimal though): SET binlog_format=ROW; DROP TABLE IF EXISTS mdev9560, mdev9560_log;   CREATE TABLE mdev9560 ( `DT` timestamp(3) NOT NULL, `EAN` char(13) NULL, `Listprijs` decimal(7,2) DEFAULT NULL, `Valuta` char(8) DEFAULT NULL ); CREATE TABLE mdev9560_log ( `EAN` char(13) NOT NULL, `DT` timestamp(3) NULL, `Listprijs` decimal(7,2) DEFAULT NULL, `Valuta` char(8) DEFAULT NULL, PRIMARY KEY (`EAN`) );   CREATE TRIGGER tr AFTER INSERT on mdev9560 FOR EACH ROW INSERT INTO mdev9560_log ( EAN, DT, Listprijs, Valuta ) VALUES ( NEW.EAN, NEW.DT, NEW.Listprijs, NEW.Valuta ) ON DUPLICATE KEY UPDATE DT=VALUES(DT), Listprijs = IFNULL(VALUES(Listprijs), Listprijs), Valuta = IFNULL(VALUES(Valuta), Valuta);   INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:06.123','9999999999999',99.99,'EUR'); INSERT INTO mdev9560 VALUES ('2016-02-15 12:50:07.123','9999999999999',99.99,'EUR'); On slave, server crashes with signal 11 and can't start anymore while replication is enabled (replication starts, crashes again and so on): 2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560', Gtid 0-1-5, Internal MariaDB error code: 1610 2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code : 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1681, Gtid 0-1-5, Internal MariaDB error code: 1105 2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560_log', Gtid 0-1-5, Internal MariaDB error code: 1610 2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560_log; Unknown error, Error_ code: 1105; Unknown error, Error_code: 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1739, Gtid 0-1-5, Int ernal MariaDB error code: 1105 2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560', Gtid 0-1-6, Internal MariaDB erro r code: 1610 2016-02-18 19:56:09 139713576487680 [Warning] Slave SQL: Could not execute Write_rows_v1 event on table test.mdev9560; Unknown error, Error_code : 1105; handler error No Error!; the event's master log mysql-bin.000001, end_log_pos 1978, Gtid 0-1-6, Internal MariaDB error code: 1105 2016-02-18 19:56:09 139713576487680 [ERROR] Slave SQL: Could not read field 'Valuta' of table 'test.mdev9560_log', Gtid 0-1-6, Internal MariaDB error code: 1610 160218 19:56:09 [ERROR] mysqld got signal 11 ; Stack trace from 10.1.11 binary tarball #2 <signal handler called> #3 unpack_row (rgi=rgi@entry=0x7f116d414600, table=0x7f116d489c08, colcnt=4, row_data=0x0, cols=cols@entry=0x7f116cc1a1e8, current_row_end=current_row_end@entry=0x7f116cc1a260, master_reclength=master_reclength@entry=0x7f116cc1a208, row_end=0x7f116d4ed0f2 "") at /home/buildbot/buildbot/build/sql/rpl_record.cc:226 #4 0x00005586fb1a10e8 in unpack_current_row (cols=0x7f116cc1a1e8, rgi=0x7f116d414600, this=0x7f116cc1a148) at /home/buildbot/buildbot/build/sql/log_event.h:4440 #5 Update_rows_log_event::do_exec_row (this=0x7f116cc1a148, rgi=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.cc:12463 #6 0x00005586fb193c92 in Rows_log_event::do_apply_event (this=0x7f116cc1a148, rgi=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.cc:9989 #7 0x00005586faef7ea1 in Log_event::apply_event (this=this@entry=0x7f116cc1a148, rgi=rgi@entry=0x7f116d414600) at /home/buildbot/buildbot/build/sql/log_event.h:1343 #8 0x00005586faeedd23 in apply_event_and_update_pos (ev=ev@entry=0x7f116cc1a148, thd=thd@entry=0x7f116d42d008, rgi=rgi@entry=0x7f116d414600, rpt=rpt@entry=0x0) at /home/buildbot/buildbot/build/sql/slave.cc:3374 #9 0x00005586faef142a in exec_relay_log_event (serial_rgi=0x7f116d414600, rli=0x7f1174e9e9a0, thd=0x7f116d42d008) at /home/buildbot/buildbot/build/sql/slave.cc:3704 #10 handle_slave_sql (arg=0x7f1174e9d000) at /home/buildbot/buildbot/build/sql/slave.cc:4773 #11 0x00007f11994180a4 in start_thread () from /lib64/libpthread.so.0 Since everything starts with a corrupted event on 10.0 master, which is recognized by mysqlbinlog from the same build, I suppose that the root problem is in 10.0. However, the crash is apparently a separate problem, maybe it can be even triggered by some other scenario, not involving timestamps, microseconds, and such. As already discussed on IRC, I suggest to try to fix the 10.0 part of the problem before 10.0.24 release, so I raise it to a blocker for now. However, it won't fix already existing binary logs. I don't know why 10.0 => 10.0 replication works in the scenarios above, might be just some luck, in which case it's not reliable; but if there is any real logic behind this ability, it would be very good to keep it (and even have it in 10.1 as well), to allow to replicate from older 10.0 binary logs.

            Problem 1: corrupt event

            This is another manifestation of MDEV-5377. The event is not corrupted, but mysqlbinlog cannot parse it properly, and thinks that the event is corrupted.

            Old MariaDB temporal data types with microseconds don't store enough metadata in RBR events, so it's only possible to read these RBR event on the slave that has exactly the same table structure as the master.

            Because mysqlbinlog has no tables at all, it cannot parse these events correctly.

            serg Sergei Golubchik added a comment - Problem 1: corrupt event This is another manifestation of MDEV-5377 . The event is not corrupted, but mysqlbinlog cannot parse it properly, and thinks that the event is corrupted. Old MariaDB temporal data types with microseconds don't store enough metadata in RBR events, so it's only possible to read these RBR event on the slave that has exactly the same table structure as the master. Because mysqlbinlog has no tables at all, it cannot parse these events correctly.
            serg Sergei Golubchik added a comment - - edited

            Problem 2: assertion failure on 10.1 debug server

            and

            Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105

            this will be fixed in the next 10.1 release (10.1.12)

            serg Sergei Golubchik added a comment - - edited Problem 2: assertion failure on 10.1 debug server and Problem 4: Hard crash on 10.1 release server running with slave-skip-errors=1105 this will be fixed in the next 10.1 release (10.1.12)

            Problem 3: Replication failure on 10.1 release server

            This is a combined effect of MDEV-5377 and MDEV-5528.

            Because of MDEV-5377 temporal times could not be always replicated using RBR. And they were incompatible with MySQL temporal types (because MariaDB had them before MySQL). Later we have added MySQL-compatible temporal types and in MDEV-5528 provided an option to use them by default. MySQL-compatible types can be safely replicated using RBR events.

            So your 10.0 server creates old MariaDB timestamp(3) field. While your 10.1 server creates new MariaDB timestamp(3). If you use RBR, old timestamp(3) can be only replicated to identically defined table. But your 10.0 and 10.1 tables aren't identical, that's why the replication fails.

            I'll see if we can recognize this situation on the slave and automatically create the compatible table structure.

            But until then you can start your 10.1 server with --disable-mysql56-temporal-format option. The replication will work.

            serg Sergei Golubchik added a comment - Problem 3: Replication failure on 10.1 release server This is a combined effect of MDEV-5377 and MDEV-5528 . Because of MDEV-5377 temporal times could not be always replicated using RBR. And they were incompatible with MySQL temporal types (because MariaDB had them before MySQL). Later we have added MySQL-compatible temporal types and in MDEV-5528 provided an option to use them by default. MySQL-compatible types can be safely replicated using RBR events. So your 10.0 server creates old MariaDB timestamp(3) field. While your 10.1 server creates new MariaDB timestamp(3). If you use RBR, old timestamp(3) can be only replicated to identically defined table. But your 10.0 and 10.1 tables aren't identical, that's why the replication fails. I'll see if we can recognize this situation on the slave and automatically create the compatible table structure. But until then you can start your 10.1 server with --disable-mysql56-temporal-format option. The replication will work.

            CrewOne,

            If you start your replication from a clean (empty) datadir, --disable-mysql56-temporal-format mentioned above might be a good workaround for you for now, even before you get fixes for crashes.
            If you start replication from an already existing 10.1 directory where tables with microseconds have already been created, the workaround will be a little more complicated, tables will need to be ALTER-ed. If there are many tables, it should be possible to generate ALTER statements automatically based on the contents of INFORMATION_SCHEMA.COLUMNS.

            elenst Elena Stepanova added a comment - CrewOne , If you start your replication from a clean (empty) datadir, --disable-mysql56-temporal-format mentioned above might be a good workaround for you for now, even before you get fixes for crashes. If you start replication from an already existing 10.1 directory where tables with microseconds have already been created, the workaround will be a little more complicated, tables will need to be ALTER-ed. If there are many tables, it should be possible to generate ALTER statements automatically based on the contents of INFORMATION_SCHEMA.COLUMNS .

            The patch is Ok to push.
            I'd only add tests for the other data types (TIME and DATETIME).
            Currently it tests only TIMESTAMP.

            bar Alexander Barkov added a comment - The patch is Ok to push. I'd only add tests for the other data types (TIME and DATETIME). Currently it tests only TIMESTAMP.
            CrewOne Sander Pilon added a comment -

            Is it safe to assume a fix will be in the next version? Currently replication is working 10.0->10.0 and I can wait a few weeks before trying 10.1 again.

            CrewOne Sander Pilon added a comment - Is it safe to assume a fix will be in the next version? Currently replication is working 10.0->10.0 and I can wait a few weeks before trying 10.1 again.

            Yes, it will be.
            By the way, the release schedule is on our jira main page (http://mariadb.org/jira) and next 10.1 is planned for this week.

            if you don't see the roadmap, please log in — it's a jira bug JRA-59335 that hopefully will be fixed soon

            serg Sergei Golubchik added a comment - Yes, it will be. By the way, the release schedule is on our jira main page ( http://mariadb.org/jira ) and next 10.1 is planned for this week. if you don't see the roadmap, please log in — it's a jira bug JRA-59335 that hopefully will be fixed soon

            People

              serg Sergei Golubchik
              CrewOne Sander Pilon
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.