[MDEV-15426] Event scheduler disabled on a slave possibly breaking replication - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 10.1.31
Fix Version/s: N/A
Component/s: Events, Replication, Server
Labels:
- event
- replication
Environment:
debian stretch

Description

The environment is:

master:
mariadb 10.1.31
debian stretch
event_scheduler=ON

slave:
mariadb 10.1.31
debian stretch
event_scheduler=OFF

Replication: ROW based

The master was originally a standalone server and we wanted to add a slave.
To do so we stopped mariadb and copied all the content over to the future slave.

We do run events on the master and we didn't want those to run on the slave, as otherwise row based replication will break.

When setting up the slave we started it for the first time with event_scheduler=0
We started replication and it was working finely until an event was created on the master.
It broke replication with:

               Last_SQL_Error: Error 'Cannot proceed because system tables used by Event Scheduler were found damaged at server start' on query. Default database: 'tendril'. Query: 'CREATE DEFINER=`root`@`10.64.32.13` event db2093_codfw_wmnet_3306_schema_prep

  on schedule every 1 day starts date(now()) + interval floor(rand() * 23) hour

  do begin

    select @server_id := id, @enabled := enabled from servers where host = 'db2093.codfw.wmnet' and port = 3306;

    delete from schemata_check where server_id = @server_id;

    insert into schemata_check select @server_id, t.schema_name from db2093_codfw_wmnet_3306_schemata t;

  end'

At first we thought the mysql.event table could be corrupted because of the data transfer between the two servers. But it wasn't or doesn't look like it (as per check table event command)
These are all the actions we have tried to try to get this fixed.

Repair event table + restarting mariadb
Analyze event table + restarting mariadb
Dropping + creating the event table + restarting mariadb
mysqldump from the master + load on the slave + restarting mariadb
altering event table to make it InnoDB + restarting mariadb
mysql_upgrade --force -s

The table doesn't look like it is corrupted:

mysql:root@localhost [mysql]> check table event;

+-------------+-------+----------+----------+

| Table       | Op    | Msg_type | Msg_text |

+-------------+-------+----------+----------+

| mysql.event | check | status   | OK       |

+-------------+-------+----------+----------+

1 row in set (0.05 sec)

mysql:root@localhost [mysql]> select count(*) from event;

+----------+

| count(*) |

+----------+

|     1812 |

+----------+

1 row in set (0.00 sec)

mysql:root@localhost [mysql]> select * from event limit 1\G

*************************** 1. row ***************************

                  db: tendril

                name: es1011_eqiad_wmnet_3306_usage

                body: begin

    if (get_lock('es1011_eqiad_wmnet_3306_usage', 1) = 0) then

        signal sqlstate value '45000' set message_text = 'get_lock';

    end if;

Yet mariadb thinks so:

mysql:root@localhost [tendril]> show events;

ERROR 1577 (HY000): Cannot proceed because system tables used by Event Scheduler were found damaged at server start

And upon server restart

Feb 27 13:26:03 db2093 mysqld[6921]: 2018-02-27 13:26:03 139909807241984 [ERROR] Slave SQL: Error 'Cannot proceed because system tables used by Event Scheduler were found damaged at server start' on query. Default database: 'tendril'. Query: 'CREATE DEFINER=`root`@`10.64.

Feb 27 13:26:03 db2093 mysqld[6921]:   on schedule every 1 day starts date(now()) + interval floor(rand() * 23) hour

Feb 27 13:26:03 db2093 mysqld[6921]:   do begin

Feb 27 13:26:03 db2093 mysqld[6921]:     select @server_id := id, @enabled := enabled from servers where host = 'db2093.codfw.wmnet' and port = 3306;

Feb 27 13:26:03 db2093 mysqld[6921]:     delete from schemata_check where server_id = @server_id;

Feb 27 13:26:03 db2093 mysqld[6921]:     insert into schemata_check select @server_id, t.schema_name from db2093_codfw_wmnet_3306_schemata t;

Feb 27 13:26:03 db2093 mysqld[6921]:   end', Gtid 171966586-171966586-319085319, Internal MariaDB error code: 1577

Feb 27 13:26:03 db2093 mysqld[6921]: 2018-02-27 13:26:03 139909807241984 [Warning] Slave: Cannot proceed because system tables used by Event Scheduler were found damaged at server start Error_code: 1577

This looks pretty similar to: https://bugs.mysql.com/bug.php?id=70975 which affects 5.7 and is still open.

We could probably try to see if this doesn't happen if the slave has event_scheduler=ON (even though replication will be broken because of data drifts)

Attachments

Issue Links

relates to

MDEV-17718 Setting event_scheduler to DISABLED causes bogus corruption errors

Closed

Activity

Ascending order - Click to sort in descending order

View 6 older comments

Elena Stepanova added a comment - 2018-03-08 17:53 - edited

But the error log does complain about it:

I know, but that's the effect, not the cause. This (admittedly stupid) error message is issued whenever event_scheduler is set to DISABLED and you try to do something event-related. You can get it by simply starting server with event_scheduler=DISABLED and running CREATE EVENT:

MariaDB [test]> select @@event_scheduler;

+-------------------+

| @@event_scheduler |

+-------------------+

| DISABLED          |

+-------------------+

1 row in set (0.00 sec)

MariaDB [test]> create event ev on schedule every 1 minute do select 1 into @a;

ERROR 1577 (HY000): Cannot proceed because system tables used by Event Scheduler were found damaged at server start

The question is why it got disabled to begin with. The easy way to achieve it is having it in the config files, that's why I asked about it.
It can also be disabled upon server startup, but I'd expect it to produce an error message saying that it gets disabled, and why.

Now it occurred to me – it might be producing the message at the very beginning of startup, before it starts writing to the error log. Could you please look around, I'm not sure where it writes those first few lines in your case – journal, messages, syslog, stdout, etc., depending on how you start it?

Elena Stepanova added a comment - 2018-03-08 17:53 - edited But the error log does complain about it: I know, but that's the effect, not the cause. This (admittedly stupid) error message is issued whenever event_scheduler is set to DISABLED and you try to do something event-related. You can get it by simply starting server with event_scheduler=DISABLED and running CREATE EVENT : MariaDB [test]> select @@event_scheduler; + -------------------+ | @@event_scheduler | + -------------------+ | DISABLED | + -------------------+ 1 row in set (0.00 sec) MariaDB [test]> create event ev on schedule every 1 minute do select 1 into @a; ERROR 1577 (HY000): Cannot proceed because system tables used by Event Scheduler were found damaged at server start The question is why it got disabled to begin with. The easy way to achieve it is having it in the config files, that's why I asked about it. It can also be disabled upon server startup, but I'd expect it to produce an error message saying that it gets disabled, and why. Now it occurred to me – it might be producing the message at the very beginning of startup, before it starts writing to the error log. Could you please look around, I'm not sure where it writes those first few lines in your case – journal, messages, syslog, stdout, etc., depending on how you start it?

Manuel Arostegui added a comment - 2018-03-14 13:06

There is nothing around the logs that indicates why the event scheduler has been disabled or stopped.
I have looked all over the logs and possible logs sources.
All the stuff that happens during startup is posted at https://jira.mariadb.org/browse/MDEV-15426?focusedCommentId=107788&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-107788

I haven't found anything more than that.

Manuel Arostegui added a comment - 2018-03-14 13:06 There is nothing around the logs that indicates why the event scheduler has been disabled or stopped. I have looked all over the logs and possible logs sources. All the stuff that happens during startup is posted at https://jira.mariadb.org/browse/MDEV-15426?focusedCommentId=107788&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-107788 I haven't found anything more than that.

Elena Stepanova added a comment - 2018-03-17 00:34

Could you please paste the output of

select * from information_schema.system_variables where variable_name='event_scheduler' \G

Elena Stepanova added a comment - 2018-03-17 00:34 Could you please paste the output of select * from information_schema.system_variables where variable_name= 'event_scheduler' \G

Manuel Arostegui added a comment - 2018-03-23 16:50

That output will not be relevant anymore, as we fixed the system already as stated at: https://jira.mariadb.org/browse/MDEV-15426?focusedCommentId=107796&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-107796

Manuel Arostegui added a comment - 2018-03-23 16:50 That output will not be relevant anymore, as we fixed the system already as stated at: https://jira.mariadb.org/browse/MDEV-15426?focusedCommentId=107796&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-107796

Elena Stepanova added a comment - 2019-01-22 01:03

Unfortunately, without more details on how to reproduce it, there isn't anything we can work with. I've checked the basic scenario again – it does work: slave created by copying the initial data from the master, master running with event scheduler = ON, slave running with event_scheduler = OFF – nothing bad happens on replicated event creation. We are missing something.
With what we can see (no errors in the log and event_scheduler set to DISABLED, the most likely reason is that it's done somewhere in server configuration upon startup, maybe there is an extra configuration file which wasn't noticed;
Hypothetically, there might something non-standard in configuration which causes it (possibly reveals an actual bug), or some error messages about disabling event scheduler didn't make it to the error log on whatever reason.

If somebody else experiences this problem, please comment, preferably including right away all configuration files, exact server version(s), and the output of select * from information_schema.system_variables where variable_name='event_scheduler'.

Elena Stepanova added a comment - 2019-01-22 01:03 Unfortunately, without more details on how to reproduce it, there isn't anything we can work with. I've checked the basic scenario again – it does work: slave created by copying the initial data from the master, master running with event scheduler = ON, slave running with event_scheduler = OFF – nothing bad happens on replicated event creation. We are missing something. With what we can see (no errors in the log and event_scheduler set to DISABLED, the most likely reason is that it's done somewhere in server configuration upon startup, maybe there is an extra configuration file which wasn't noticed; Hypothetically, there might something non-standard in configuration which causes it (possibly reveals an actual bug), or some error messages about disabling event scheduler didn't make it to the error log on whatever reason. If somebody else experiences this problem, please comment, preferably including right away all configuration files, exact server version(s), and the output of select * from information_schema.system_variables where variable_name='event_scheduler' .

MariaDB Server

Event scheduler disabled on a slave possibly breaking replication

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration