Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15426

Event scheduler disabled on a slave possibly breaking replication

Details

    Description

      The environment is:

      master:
      mariadb 10.1.31
      debian stretch
      event_scheduler=ON

      slave:
      mariadb 10.1.31
      debian stretch
      event_scheduler=OFF

      Replication: ROW based

      The master was originally a standalone server and we wanted to add a slave.
      To do so we stopped mariadb and copied all the content over to the future slave.

      We do run events on the master and we didn't want those to run on the slave, as otherwise row based replication will break.

      When setting up the slave we started it for the first time with event_scheduler=0
      We started replication and it was working finely until an event was created on the master.
      It broke replication with:

                     Last_SQL_Error: Error 'Cannot proceed because system tables used by Event Scheduler were found damaged at server start' on query. Default database: 'tendril'. Query: 'CREATE DEFINER=`root`@`10.64.32.13` event db2093_codfw_wmnet_3306_schema_prep
        on schedule every 1 day starts date(now()) + interval floor(rand() * 23) hour
        do begin
       
          select @server_id := id, @enabled := enabled from servers where host = 'db2093.codfw.wmnet' and port = 3306;
       
          delete from schemata_check where server_id = @server_id;
          insert into schemata_check select @server_id, t.schema_name from db2093_codfw_wmnet_3306_schemata t;
       
        end'
      
      

      At first we thought the mysql.event  table could be corrupted because of the data transfer between the two servers. But it wasn't or doesn't look like it (as per check table event command)
      These are all the actions we have tried to try to get this fixed.

      • Repair event table + restarting mariadb
      • Analyze event table + restarting mariadb
      • Dropping + creating the event table + restarting mariadb
      • mysqldump from the master + load on the slave + restarting mariadb
      • altering event table to make it InnoDB + restarting mariadb
      • mysql_upgrade --force -s

      The table doesn't look like it is corrupted:

      mysql:root@localhost [mysql]> check table event;
      +-------------+-------+----------+----------+
      | Table       | Op    | Msg_type | Msg_text |
      +-------------+-------+----------+----------+
      | mysql.event | check | status   | OK       |
      +-------------+-------+----------+----------+
      1 row in set (0.05 sec)
       
      mysql:root@localhost [mysql]> select count(*) from event;
      +----------+
      | count(*) |
      +----------+
      |     1812 |
      +----------+
      1 row in set (0.00 sec)
       
      mysql:root@localhost [mysql]> select * from event limit 1\G
      *************************** 1. row ***************************
                        db: tendril
                      name: es1011_eqiad_wmnet_3306_usage
                      body: begin
       
          if (get_lock('es1011_eqiad_wmnet_3306_usage', 1) = 0) then
              signal sqlstate value '45000' set message_text = 'get_lock';
          end if;
      

      Yet mariadb thinks so:

       
      mysql:root@localhost [tendril]> show events;
      ERROR 1577 (HY000): Cannot proceed because system tables used by Event Scheduler were found damaged at server start
      

      And upon server restart

      Feb 27 13:26:03 db2093 mysqld[6921]: 2018-02-27 13:26:03 139909807241984 [ERROR] Slave SQL: Error 'Cannot proceed because system tables used by Event Scheduler were found damaged at server start' on query. Default database: 'tendril'. Query: 'CREATE DEFINER=`root`@`10.64.
      Feb 27 13:26:03 db2093 mysqld[6921]:   on schedule every 1 day starts date(now()) + interval floor(rand() * 23) hour
      Feb 27 13:26:03 db2093 mysqld[6921]:   do begin
      Feb 27 13:26:03 db2093 mysqld[6921]:     select @server_id := id, @enabled := enabled from servers where host = 'db2093.codfw.wmnet' and port = 3306;
      Feb 27 13:26:03 db2093 mysqld[6921]:     delete from schemata_check where server_id = @server_id;
      Feb 27 13:26:03 db2093 mysqld[6921]:     insert into schemata_check select @server_id, t.schema_name from db2093_codfw_wmnet_3306_schemata t;
      Feb 27 13:26:03 db2093 mysqld[6921]:   end', Gtid 171966586-171966586-319085319, Internal MariaDB error code: 1577
      Feb 27 13:26:03 db2093 mysqld[6921]: 2018-02-27 13:26:03 139909807241984 [Warning] Slave: Cannot proceed because system tables used by Event Scheduler were found damaged at server start Error_code: 1577
      

      This looks pretty similar to: https://bugs.mysql.com/bug.php?id=70975 which affects 5.7 and is still open.

      We could probably try to see if this doesn't happen if the slave has event_scheduler=ON (even though replication will be broken because of data drifts)

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova added a comment - - edited

            But the error log does complain about it:

            I know, but that's the effect, not the cause. This (admittedly stupid) error message is issued whenever event_scheduler is set to DISABLED and you try to do something event-related. You can get it by simply starting server with event_scheduler=DISABLED and running CREATE EVENT:

            MariaDB [test]> select @@event_scheduler;
            +-------------------+
            | @@event_scheduler |
            +-------------------+
            | DISABLED          |
            +-------------------+
            1 row in set (0.00 sec)
             
            MariaDB [test]> create event ev on schedule every 1 minute do select 1 into @a;
            ERROR 1577 (HY000): Cannot proceed because system tables used by Event Scheduler were found damaged at server start
            

            The question is why it got disabled to begin with. The easy way to achieve it is having it in the config files, that's why I asked about it.
            It can also be disabled upon server startup, but I'd expect it to produce an error message saying that it gets disabled, and why.

            Now it occurred to me – it might be producing the message at the very beginning of startup, before it starts writing to the error log. Could you please look around, I'm not sure where it writes those first few lines in your case – journal, messages, syslog, stdout, etc., depending on how you start it?

            elenst Elena Stepanova added a comment - - edited But the error log does complain about it: I know, but that's the effect, not the cause. This (admittedly stupid) error message is issued whenever event_scheduler is set to DISABLED and you try to do something event-related. You can get it by simply starting server with event_scheduler=DISABLED and running CREATE EVENT : MariaDB [test]> select @@event_scheduler; + -------------------+ | @@event_scheduler | + -------------------+ | DISABLED | + -------------------+ 1 row in set (0.00 sec)   MariaDB [test]> create event ev on schedule every 1 minute do select 1 into @a; ERROR 1577 (HY000): Cannot proceed because system tables used by Event Scheduler were found damaged at server start The question is why it got disabled to begin with. The easy way to achieve it is having it in the config files, that's why I asked about it. It can also be disabled upon server startup, but I'd expect it to produce an error message saying that it gets disabled, and why. Now it occurred to me – it might be producing the message at the very beginning of startup, before it starts writing to the error log. Could you please look around, I'm not sure where it writes those first few lines in your case – journal, messages, syslog, stdout, etc., depending on how you start it?

            There is nothing around the logs that indicates why the event scheduler has been disabled or stopped.
            I have looked all over the logs and possible logs sources.
            All the stuff that happens during startup is posted at https://jira.mariadb.org/browse/MDEV-15426?focusedCommentId=107788&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-107788

            I haven't found anything more than that.

            marostegui Manuel Arostegui added a comment - There is nothing around the logs that indicates why the event scheduler has been disabled or stopped. I have looked all over the logs and possible logs sources. All the stuff that happens during startup is posted at https://jira.mariadb.org/browse/MDEV-15426?focusedCommentId=107788&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-107788 I haven't found anything more than that.

            Could you please paste the output of

            select * from information_schema.system_variables where variable_name='event_scheduler' \G
            

            elenst Elena Stepanova added a comment - Could you please paste the output of select * from information_schema.system_variables where variable_name= 'event_scheduler' \G
            marostegui Manuel Arostegui added a comment - That output will not be relevant anymore, as we fixed the system already as stated at: https://jira.mariadb.org/browse/MDEV-15426?focusedCommentId=107796&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-107796

            Unfortunately, without more details on how to reproduce it, there isn't anything we can work with. I've checked the basic scenario again – it does work: slave created by copying the initial data from the master, master running with event scheduler = ON, slave running with event_scheduler = OFF – nothing bad happens on replicated event creation. We are missing something.
            With what we can see (no errors in the log and event_scheduler set to DISABLED, the most likely reason is that it's done somewhere in server configuration upon startup, maybe there is an extra configuration file which wasn't noticed;
            Hypothetically, there might something non-standard in configuration which causes it (possibly reveals an actual bug), or some error messages about disabling event scheduler didn't make it to the error log on whatever reason.

            If somebody else experiences this problem, please comment, preferably including right away all configuration files, exact server version(s), and the output of select * from information_schema.system_variables where variable_name='event_scheduler'.

            elenst Elena Stepanova added a comment - Unfortunately, without more details on how to reproduce it, there isn't anything we can work with. I've checked the basic scenario again – it does work: slave created by copying the initial data from the master, master running with event scheduler = ON, slave running with event_scheduler = OFF – nothing bad happens on replicated event creation. We are missing something. With what we can see (no errors in the log and event_scheduler set to DISABLED, the most likely reason is that it's done somewhere in server configuration upon startup, maybe there is an extra configuration file which wasn't noticed; Hypothetically, there might something non-standard in configuration which causes it (possibly reveals an actual bug), or some error messages about disabling event scheduler didn't make it to the error log on whatever reason. If somebody else experiences this problem, please comment, preferably including right away all configuration files, exact server version(s), and the output of select * from information_schema.system_variables where variable_name='event_scheduler' .

            People

              Unassigned Unassigned
              marostegui Manuel Arostegui
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.