Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13577

slave_parallel_mode=optimistic should not report the mode's specific temporary errors

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.1.22, 10.1.25, 10.1(EOL), 10.2(EOL)
    • 10.1.34, 10.2.16, 10.3.8
    • Replication
    • None
    • centos7 x86_64
    • 10.2.14, 10.3.6-1

    Description

      I have server mariadb-10.1.22 with slave_parallel_mode=optimistic and see (bogus?) errors in log:

      2017-08-16  9:34:05 139902864206592 [ERROR] mysqld: Can't find record in 'hr_user_request_log'
      2017-08-16  9:36:03 139902865722112 [ERROR] mysqld: Can't find record in 'sms_log'
      2017-08-16  9:43:11 139902864812800 [ERROR] mysqld: Can't find record in 'sms_log'
      2017-08-17  3:14:05 139902864812800 [ERROR] mysqld: Can't find record in 'sms_log'
      2017-08-17  4:01:17 139902864206592 [ERROR] mysqld: Can't find record in 'sms_log'
      2017-08-17 10:02:02 139902865419008 [ERROR] mysqld: Can't find record in 'sms_log'
      2017-08-17 10:14:06 139902864206592 [ERROR] mysqld: Can't find record in 'reg_user_tracked_stat'
      2017-08-17 11:06:03 139902865419008 [ERROR] mysqld: Can't find record in 'cv_journal'
      

      Replication is not crash and still works.

      CREATE TABLE `sms_log` (
        `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
        `sms_type` enum('','admin_login','admin_vpn','resume_buy','after_registration_reg_user','after_registration_hr','hr_welcome_after_registration','hr_start_after_registration','password_recovery','phone_confirmation','friends_resume','friends_resume_app','invite_reg_user','response_to_a_demo_vacancy_no_read','verification_code','my_resume','link_to_app','hr_basic_500_after_registration','small_town_after_registration','small_region_after_registration','hr_promotional') DEFAULT NULL,
        `receiver_type` enum('hr','reg_user','guest','admin') NOT NULL DEFAULT 'guest',
        `sms_text` mediumtext NOT NULL,
        `date_create` datetime NOT NULL,
        `date_sent` datetime NOT NULL,
        `sms_link_clicked` tinyint(3) NOT NULL DEFAULT '0',
        `sms_source` enum('desktop','admin','api','mobile_app_reg_user','mobile_app_hr','mobile_site_reg_user','mobile_site_hr','cron') DEFAULT NULL,
        `id_town` smallint(5) unsigned NOT NULL DEFAULT '0',
        `mobile_operator` enum('mts','beeline','megafon','tele2','other') NOT NULL DEFAULT 'other',
        `phone` varchar(20) NOT NULL DEFAULT '',
        `id_receiver` int(10) unsigned NOT NULL DEFAULT '0',
        `sms_gate` enum('superjobvip','superjobnew','devinotele') NOT NULL DEFAULT 'superjobnew',
        `sender_type` enum('hr','reg_user','guest','admin') DEFAULT NULL,
        `id_sender` int(10) DEFAULT NULL,
        `id_client` int(10) DEFAULT NULL,
        `last_error` varchar(255) NOT NULL DEFAULT '',
        PRIMARY KEY (`id`),
        KEY `date_create` (`date_create`),
        KEY `date_sent` (`date_sent`),
        KEY `mobile_operator` (`mobile_operator`),
        KEY `id_receiver` (`id_receiver`),
        KEY `id_sender` (`sender_type`,`id_client`,`id_sender`)
      ) ENGINE=InnoDB AUTO_INCREMENT=13685411 DEFAULT CHARSET=utf8
      

      binlog:

      # at 460866962
      #170817  3:14:05 server id 86  end_log_pos 460867000    GTID 0-86-586027225 trans
      /*!100001 SET @@session.gtid_seq_no=586027225*//*!*/;
      BEGIN
      /*!*/;
      # at 460867000
      # at 460867086
      #170817  3:14:05 server id 86  end_log_pos 460867206    Write_rows: table id 25 flags: STMT_END_F
      ### INSERT INTO `site_stat`.`sms_log`
      ### SET
      ###   @1=13676743
      ###   @2=10
      ###   @3=3
      ###   @4='---cleared---'
      ###   @5=2017-08-17 03:14:05
      ###   @6=0000-00-00 00:00:00
      ###   @7=0
      ###   @8=1
      ###   @9=0
      ###   @10=4
      ###   @11='79595959595'
      ###   @12=0
      ###   @13=3
      ###   @14=3
      ###   @15=NULL
      ###   @16=NULL
      ###   @17=''
      # at 460867206
      #170817  3:14:05 server id 86  end_log_pos 460867233    Xid = 678520262
      COMMIT/*!*/;
      # at 460867233
      #170817  3:14:05 server id 86  end_log_pos 460867273    GTID 0-86-586027226 cid=678520266 trans
      /*!100001 SET @@session.gtid_seq_no=586027226*//*!*/;
      BEGIN
      /*!*/;
      # at 460867273
      # at 460867359
      #170817  3:14:05 server id 86  end_log_pos 460867571    Update_rows: table id 25 flags: STMT_END_F
      ### UPDATE `site_stat`.`sms_log`
      ### WHERE
      ###   @1=13676743
      ###   @2=10
      ###   @3=3
      ###   @4='---cleared---'
      ###   @5=2017-08-17 03:14:05
      ###   @6=0000-00-00 00:00:00
      ###   @7=0
      ###   @8=1
      ###   @9=0
      ###   @10=4
      ###   @11='79595959595'
      ###   @12=0
      ###   @13=3
      ###   @14=3
      ###   @15=NULL
      ###   @16=NULL
      ###   @17=''
      ### SET
      ###   @1=13676743
      ###   @2=10
      ###   @3=3
      ###   @4='---cleared---'
      ###   @5=2017-08-17 03:14:05
      ###   @6=2017-08-17 03:14:05
      ###   @7=0
      ###   @8=1
      ###   @9=0
      ###   @10=4
      ###   @11='79595959595'
      ###   @12=0
      ###   @13=3
      ###   @14=3
      ###   @15=NULL
      ###   @16=NULL
      ###   @17=''
      # at 460867571
      #170817  3:14:05 server id 86  end_log_pos 460867598    Xid = 678520266
      COMMIT/*!*/;
      

      MariaDB [(none)]> show variables like 'slave%';
      +-------------------------------+------------+
      | Variable_name                 | Value      |
      +-------------------------------+------------+
      | slave_compressed_protocol     | OFF        |
      | slave_ddl_exec_mode           | IDEMPOTENT |
      | slave_domain_parallel_threads | 0          |
      | slave_exec_mode               | STRICT     |
      | slave_load_tmpdir             | /base/tmp  |
      | slave_max_allowed_packet      | 1073741824 |
      | slave_net_timeout             | 3600       |
      | slave_parallel_max_queued     | 131072     |
      | slave_parallel_mode           | optimistic |
      | slave_parallel_threads        | 4          |
      | slave_run_triggers_for_rbr    | NO         |
      | slave_skip_errors             | OFF        |
      | slave_sql_verify_checksum     | ON         |
      | slave_transaction_retries     | 10         |
      | slave_type_conversions        |            |
      +-------------------------------+------------+
      

      Attachments

        Issue Links

          Activity

            The error as such is easily reproducible with a simple concurrent test which inserts / updates rows with the same PK (see the test below). The result seems to be all right at the end, no inconsistencies; so, I think that the appearance of these errors in optimistic mode is most likely expected – by definition any DML is allowed to run in parallel, so it can happen that UPDATE is attempted before INSERT, causes an error and is retried later.

            I will assign it to Elkin to confirm (or not) that this behavior is indeed expected. Also, if it is, maybe it makes sense to convert the error into a warning or even a note. It is printed on warning level 2, which has become a default in 10.2.

            mdev13577.yy

            query_init:
              CREATE TABLE IF NOT EXISTS `t1` (
                `id` BIGINT UNSIGNED PRIMARY KEY,
                `c` char(120) NOT NULL default ''
              ) ENGINE=InnoDB
            ;
             
            query:
              update | delete | insert | insert;
             
            insert:
              INSERT IGNORE INTO `t1` (`id`) VALUES ( UNIX_TIMESTAMP() );
             
            update:
              UPDATE `t1` SET `c` = _string WHERE `id` = UNIX_TIMESTAMP();
             
            delete:
                DELETE FROM `t1` WHERE `id` = UNIX_TIMESTAMP();
            

            To reproduce,

            git clone https://github.com/elenst/rqg --branch experimental rqg-mdev13577
            cd rqg-mdev13577
            # Place the grammar above somewhere, e.g. ./mdev13577.yy
            perl ./runall-new.pl \
            --threads=8 --duration=600 --queries=100M \
            --grammar=mdev13577.yy --skip-gendata \
            --rpl-mode=row --mysqld=--slave-parallel-mode=optimistic --mysqld=--slave-parallel-threads=4 \
            --mysqld=--log-warnings=2 \
            --vardir=/tmp/mdev13577 \
            --basedir=<your basedir>
            

            (change paths as needed).
            The slave error log will be /tmp/mdev13577_slave/mysqld.err or alike, depending on the vardir path that you choose.

            elenst Elena Stepanova added a comment - The error as such is easily reproducible with a simple concurrent test which inserts / updates rows with the same PK (see the test below). The result seems to be all right at the end, no inconsistencies; so, I think that the appearance of these errors in optimistic mode is most likely expected – by definition any DML is allowed to run in parallel, so it can happen that UPDATE is attempted before INSERT, causes an error and is retried later. I will assign it to Elkin to confirm (or not) that this behavior is indeed expected. Also, if it is, maybe it makes sense to convert the error into a warning or even a note. It is printed on warning level 2, which has become a default in 10.2. mdev13577.yy query_init: CREATE TABLE IF NOT EXISTS `t1` ( `id` BIGINT UNSIGNED PRIMARY KEY, `c` char(120) NOT NULL default '' ) ENGINE=InnoDB ;   query: update | delete | insert | insert;   insert: INSERT IGNORE INTO `t1` (`id`) VALUES ( UNIX_TIMESTAMP() );   update: UPDATE `t1` SET `c` = _string WHERE `id` = UNIX_TIMESTAMP();   delete: DELETE FROM `t1` WHERE `id` = UNIX_TIMESTAMP(); To reproduce, git clone https://github.com/elenst/rqg --branch experimental rqg-mdev13577 cd rqg-mdev13577 # Place the grammar above somewhere, e.g. ./mdev13577.yy perl ./runall-new.pl \ --threads=8 --duration=600 --queries=100M \ --grammar=mdev13577.yy --skip-gendata \ --rpl-mode=row --mysqld=--slave-parallel-mode=optimistic --mysqld=--slave-parallel-threads=4 \ --mysqld=--log-warnings=2 \ --vardir=/tmp/mdev13577 \ --basedir=<your basedir> (change paths as needed). The slave error log will be /tmp/mdev13577_slave/mysqld.err or alike, depending on the vardir path that you choose.

            Elena is right, these errors will occur regularly internally for optimistic parallel replication, and will be handled by retrying the failed transactions.

            But I think the error should never appear in the error log. An error that is handled by optimistic parallel replication as a temporary error that merely causes a transaction retry should not be put into the logs. There is no actual error, and the user cannot do anything about it anyway - and there is a retry count status available to monitor this behaviour.

            Maybe historycally, slave retries (in single-threaded replication) were logged like this, and so optimistic parallel replication logs retries the same way - but it seems wrong, and should be safe to silence.

            knielsen Kristian Nielsen added a comment - Elena is right, these errors will occur regularly internally for optimistic parallel replication, and will be handled by retrying the failed transactions. But I think the error should never appear in the error log. An error that is handled by optimistic parallel replication as a temporary error that merely causes a transaction retry should not be put into the logs. There is no actual error, and the user cannot do anything about it anyway - and there is a retry count status available to monitor this behaviour. Maybe historycally, slave retries (in single-threaded replication) were logged like this, and so optimistic parallel replication logs retries the same way - but it seems wrong, and should be safe to silence.

            I am also voting for not logging these pseudo-errors, we have enough noise in the error log as it is, and if there is nothing users are expected to do, there is no point alerting them. (I also have my own agenda, since the error messages create false positives in my tests, which I have to remember about and work around).

            elenst Elena Stepanova added a comment - I am also voting for not logging these pseudo-errors, we have enough noise in the error log as it is, and if there is nothing users are expected to do, there is no point alerting them. (I also have my own agenda, since the error messages create false positives in my tests, which I have to remember about and work around).
            Elkin Andrei Elkin added a comment -

            Sergei, salute.

            I start with offering you to review the patch whose main accent is
            on my_message_sql(). knielsen would be the 2nd option though I understood he
            apparently okayed an idea to downgrade unnecessary error level in
            case of the slave applier running optimistically.

            I hope you'll find time to check out.
            elenst needs this patch which hurts RockDB testing.

            Cheers,

            Andrei

            Elkin Andrei Elkin added a comment - Sergei, salute. I start with offering you to review the patch whose main accent is on my_message_sql() . knielsen would be the 2nd option though I understood he apparently okayed an idea to downgrade unnecessary error level in case of the slave applier running optimistically. I hope you'll find time to check out. elenst needs this patch which hurts RockDB testing. Cheers, Andrei
            Elkin Andrei Elkin added a comment -

            Fixes are pushed as 7bbe324fc17 commit.

            Elkin Andrei Elkin added a comment - Fixes are pushed as 7bbe324fc17 commit.

            People

              Elkin Andrei Elkin
              chernomorets Sergey Chernomorets
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.