Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33303

slave_parallel_mode=optimistic should not report the mode's specific temporary errors

Details

    Description

      I have server mariadb-10.5.16 with slave_parallel_mode=optimistic and see errors in log:
      ```
      2024-01-24 8:00:46 57 [ERROR] mysqld: Can't find record in 'test'
      2024-01-24 10:40:55 56 [ERROR] mysqld: Can't find record in 'test'
      2024-01-24 11:24:48 56 [ERROR] mysqld: Can't find record in 'test'
      2024-01-24 14:44:33 55 [ERROR] mysqld: Can't find record in 'test'
      2024-01-24 15:11:35 58 [ERROR] mysqld: Can't find record in 'test'
      2024-01-24 15:11:51 57 [ERROR] mysqld: Can't find record in 'test'
      2024-01-24 15:11:51 57 [ERROR] mysqld: Can't find record in 'test'
      2024-01-24 15:33:25 57 [ERROR] mysqld: Can't find record in 'test'
      ```

      how to repeat:
      ```
      drop database if exists my_table_test;
      create database if not exists my_table_test;
      use my_table_test;
      drop table if exists test;
      create table test(a int, b varchar(123));
      insert into test values(1, 'asdf');
      update test set b='zxf1' where a=1;
      update test set b='\n' where a=1;
      ```

      Replication is not crash and still works.
      ```
      Slave_IO_State: Waiting for master to send event
      Master_Host: 127.0.0.1
      Master_User: liujia2
      Master_Port: 3633
      Connect_Retry: 60
      Master_Log_File: mysql-bin.000052
      Read_Master_Log_Pos: 219035686
      Relay_Log_File: mysql-develop-0001-relay-bin.000104
      Relay_Log_Pos: 168065139
      Relay_Master_Log_File: mysql-bin.000052
      Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      Replicate_Do_DB:
      Replicate_Ignore_DB:
      Replicate_Do_Table:
      Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
      Replicate_Wild_Ignore_Table:
      Last_Errno: 0
      Last_Error:
      Skip_Counter: 0
      Exec_Master_Log_Pos: 219035209
      Relay_Log_Space: 570350217
      Until_Condition: None
      Until_Log_File:
      Until_Log_Pos: 0
      Master_SSL_Allowed: No
      Master_SSL_CA_File:
      Master_SSL_CA_Path:
      Master_SSL_Cert:
      Master_SSL_Cipher:
      Master_SSL_Key:
      Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
      Last_IO_Errno: 0
      Last_IO_Error:
      Last_SQL_Errno: 0
      Last_SQL_Error:
      Replicate_Ignore_Server_Ids:
      Master_Server_Id: 1
      Master_SSL_Crl:
      Master_SSL_Crlpath:
      Using_Gtid: Slave_Pos
      Gtid_IO_Pos: 0-1-113030212
      Replicate_Do_Domain_Ids:
      Replicate_Ignore_Domain_Ids:
      Parallel_Mode: optimistic
      SQL_Delay: 0
      SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
      Slave_DDL_Groups: 8
      Slave_Non_Transactional_Groups: 2
      Slave_Transactional_Groups: 113030236
      ```

      Attachments

        Issue Links

          Activity

            Ouch, this is not supposed to occur. Thanks for the report, confirmed that this can occur with the very simple test provided.

            Here is a mysql-test-run testcase that reproduces, the below patch makes it easier to trigger:

            --source include/have_innodb.inc
            --source include/have_binlog_format_row.inc
            --source include/master-slave.inc
             
            --connection slave
            --source include/stop_slave.inc
            SET @old_retry=@@GLOBAL.slave_transaction_retries;
            SET GLOBAL slave_parallel_threads=4;
            SET @old_mode= @@GLOBAL.slave_parallel_mode;
            SET GLOBAL slave_parallel_mode= optimistic;
             
            --connection master
            drop database if exists my_table_test;
            create database if not exists my_table_test;
            use my_table_test;
            drop table if exists test;
            create table test(a int, b varchar(123)) ENGINE=InnoDB;
            SET gtid_seq_no= 100;
            insert into test values(1, 'asdf');
            update test set b='zxf1' where a=1;
            update test set b='\n' where a=1;
             
            --connection slave
            --source include/start_slave.inc
             
            --connection master
            --sync_slave_with_master
             
            --connection master
            DROP DATABASE my_table_test;
             
            --connection slave
            --source include/stop_slave.inc
            SET GLOBAL slave_transaction_retries= @old_retry;
            SET GLOBAL slave_parallel_threads=0;
            SET GLOBAL slave_parallel_mode= @old_mode;
            --source include/start_slave.inc
             
            --source include/rpl_end.inc
            

            diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
            index 13a981d4381..426d4ee481e 100644
            --- a/sql/log_event_server.cc
            +++ b/sql/log_event_server.cc
            @@ -7325,6 +7325,7 @@ Rows_log_event::write_row(rpl_group_info *rgi,
                 TODO: Add safety measures against infinite looping. 
                */
             
            +my_sleep(20000);
               if (table->s->sequence)
                 error= update_sequence();
               else while (unlikely(error= table->file->ha_write_row(table->record[0])))
            

            knielsen Kristian Nielsen added a comment - Ouch, this is not supposed to occur. Thanks for the report, confirmed that this can occur with the very simple test provided. Here is a mysql-test-run testcase that reproduces, the below patch makes it easier to trigger: --source include/have_innodb.inc --source include/have_binlog_format_row.inc --source include/master-slave.inc   --connection slave --source include/stop_slave.inc SET @old_retry=@@GLOBAL.slave_transaction_retries; SET GLOBAL slave_parallel_threads=4; SET @old_mode= @@GLOBAL.slave_parallel_mode; SET GLOBAL slave_parallel_mode= optimistic;   --connection master drop database if exists my_table_test; create database if not exists my_table_test; use my_table_test; drop table if exists test; create table test(a int, b varchar(123)) ENGINE=InnoDB; SET gtid_seq_no= 100; insert into test values(1, 'asdf'); update test set b='zxf1' where a=1; update test set b='\n' where a=1;   --connection slave --source include/start_slave.inc   --connection master --sync_slave_with_master   --connection master DROP DATABASE my_table_test;   --connection slave --source include/stop_slave.inc SET GLOBAL slave_transaction_retries= @old_retry; SET GLOBAL slave_parallel_threads=0; SET GLOBAL slave_parallel_mode= @old_mode; --source include/start_slave.inc   --source include/rpl_end.inc diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index 13a981d4381..426d4ee481e 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -7325,6 +7325,7 @@ Rows_log_event::write_row(rpl_group_info *rgi, TODO: Add safety measures against infinite looping. */ +my_sleep(20000); if (table->s->sequence) error= update_sequence(); else while (unlikely(error= table->file->ha_write_row(table->record[0])))

            This was supposed to be fixed in MDEV-13577.
            However, the fix for that overlooked the case where the table has no unique (or primary) key. Then the missing row is caught from an index/table scan returning the error HA_ERR_END_OF_FILE (the patch for MDEV-13577 handled only an index lookup that returns HA_ERR_KEY_NOT_FOUND).

            This patch seems to fix the bug and avoid the errorneous messages in the error log:

            diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
            index 13a981d4381..28b1dd13658 100644
            --- a/sql/log_event_server.cc
            +++ b/sql/log_event_server.cc
            @@ -7814,6 +7815,12 @@ static int row_not_found_error(rpl_group_info *rgi)
                      ? HA_ERR_KEY_NOT_FOUND : HA_ERR_RECORD_CHANGED;
             }
             
            +static int end_of_file_error(rpl_group_info *rgi)
            +{
            +  return rgi->speculation != rpl_group_info::SPECULATE_OPTIMISTIC
            +         ? HA_ERR_END_OF_FILE : HA_ERR_RECORD_CHANGED;
            +}
            +
             /**
               Locate the current row in event's table.
             
            @@ -8061,6 +8068,8 @@ int Rows_log_event::find_row(rpl_group_info *rgi)
                   while ((error= table->file->ha_index_next(table->record[0])))
                   {
                     DBUG_PRINT("info",("no record matching the given row found"));
            +        if (error == HA_ERR_END_OF_FILE)
            +          error= end_of_file_error(rgi);
                     table->file->print_error(error, MYF(0));
                     table->file->ha_index_end();
                     goto end;
            @@ -8097,6 +8106,7 @@ int Rows_log_event::find_row(rpl_group_info *rgi)
                     break;
             
                   case HA_ERR_END_OF_FILE:
            +        error= end_of_file_error(rgi);
                     DBUG_PRINT("info", ("Record not found"));
                     table->file->ha_rnd_end();
                     goto end;
            

            knielsen Kristian Nielsen added a comment - This was supposed to be fixed in MDEV-13577 . However, the fix for that overlooked the case where the table has no unique (or primary) key. Then the missing row is caught from an index/table scan returning the error HA_ERR_END_OF_FILE (the patch for MDEV-13577 handled only an index lookup that returns HA_ERR_KEY_NOT_FOUND). This patch seems to fix the bug and avoid the errorneous messages in the error log: diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index 13a981d4381..28b1dd13658 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -7814,6 +7815,12 @@ static int row_not_found_error(rpl_group_info *rgi) ? HA_ERR_KEY_NOT_FOUND : HA_ERR_RECORD_CHANGED; } +static int end_of_file_error(rpl_group_info *rgi) +{ + return rgi->speculation != rpl_group_info::SPECULATE_OPTIMISTIC + ? HA_ERR_END_OF_FILE : HA_ERR_RECORD_CHANGED; +} + /** Locate the current row in event's table. @@ -8061,6 +8068,8 @@ int Rows_log_event::find_row(rpl_group_info *rgi) while ((error= table->file->ha_index_next(table->record[0]))) { DBUG_PRINT("info",("no record matching the given row found")); + if (error == HA_ERR_END_OF_FILE) + error= end_of_file_error(rgi); table->file->print_error(error, MYF(0)); table->file->ha_index_end(); goto end; @@ -8097,6 +8106,7 @@ int Rows_log_event::find_row(rpl_group_info *rgi) break; case HA_ERR_END_OF_FILE: + error= end_of_file_error(rgi); DBUG_PRINT("info", ("Record not found")); table->file->ha_rnd_end(); goto end;
            liujia123 liujia added a comment -

            I used sysbench to reproduce the error.
            This is the content of the sysbench script, named read_only.lua.
            ```
            require("oltp_common")

            function prepare_statements()
            end

            function event()
            con:query("update test set b='zxf1' where a=1;")
            con:query("update test set b='\n' where a=1;")
            end
            ```

            This is the sysbench command.
            ```
            sysbench ./read_only.lua --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3633 --mysql-user=root --mysql-password=123456 --mysql-db=my_table_test --threads=10 --time=0 --report-interval=1 run > my_table_test.log 2>&1 &
            ```

            Run the sysbench command to quickly reproduce the problem.

            liujia123 liujia added a comment - I used sysbench to reproduce the error. This is the content of the sysbench script, named read_only.lua. ``` require("oltp_common") function prepare_statements() end function event() con:query("update test set b='zxf1' where a=1;") con:query("update test set b='\n' where a=1;") end ``` This is the sysbench command. ``` sysbench ./read_only.lua --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3633 --mysql-user=root --mysql-password=123456 --mysql-db=my_table_test --threads=10 --time=0 --report-interval=1 run > my_table_test.log 2>&1 & ``` Run the sysbench command to quickly reproduce the problem.

            Hi serg,

            Can you review a small patch for MDEV-33303? It's a follow-up to an earlier patch of yours:

              commit bb24663f5ad955a615172512c04779d219bb6645
              Author: Sergei Golubchik <serg@mariadb.org>
              Date:   Wed Jun 20 10:45:57 2018 +0200
             
                  MDEV-13577 slave_parallel_mode=optimistic should not report the mode's specific temporary errors
            

            The MDEV-13577 patch missed one code path which occurs for a table without primary key. So I added similar code for this case.

            Patch in branch bb-10.6-knielsen and in mail: https://lists.mariadb.org/hyperkitty/list/commits@lists.mariadb.org/thread/YBVEM3I4ZC4UMZOPJDU7BCGVJI36CIY6/

            The bug is present in 10.4, but I suggest to fix in 10.6 as the bug is not very serious (it occurs only in rare situations, and only results in a stray message in the error log).

            - Kristian.

            knielsen Kristian Nielsen added a comment - Hi serg , Can you review a small patch for MDEV-33303 ? It's a follow-up to an earlier patch of yours: commit bb24663f5ad955a615172512c04779d219bb6645 Author: Sergei Golubchik <serg@mariadb.org> Date: Wed Jun 20 10:45:57 2018 +0200   MDEV-13577 slave_parallel_mode=optimistic should not report the mode's specific temporary errors The MDEV-13577 patch missed one code path which occurs for a table without primary key. So I added similar code for this case. Patch in branch bb-10.6-knielsen and in mail: https://lists.mariadb.org/hyperkitty/list/commits@lists.mariadb.org/thread/YBVEM3I4ZC4UMZOPJDU7BCGVJI36CIY6/ The bug is present in 10.4, but I suggest to fix in 10.6 as the bug is not very serious (it occurs only in rare situations, and only results in a stray message in the error log). - Kristian.

            People

              knielsen Kristian Nielsen
              liujia123 liujia
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.