[MDEV-8537] Damaged binlog index Created: 2015-07-24  Updated: 2015-07-27  Resolved: 2015-07-27

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.0.20
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Igor Pashev Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

AWS EC2 / NixOS / Linux-amd64


Attachments: File crabbed-master-mysql-bin.index.gz     File master-main-stderr.log.gz     File master-my.cnf     File slave-main-stderr.log.gz    

 Description   

A few times we got a damaged binlog index on two of our servers running MariaDB 10.0.20.

I can't give exact details, but here is what we observed.

Once the index looked fine, but probably contained invisible characters, and none of the master commands (RESET MASTER, SHOW MASTER STATUS) worked.

Another time the index got some binary data (attached). This happened on both the master and the slave (both running the same version, 10.0.20, with the same config).

Yet another time the index on the master had something odd on the first line, and mysqldump --master-data=1/2 consistently caught SIGSEGV:

# mysqldump --master-data=2 --single-transaction=1 --compact=1 --comments=0 bob_live_tw > /dev/null
Segmentation fault (core dumped)

mysqld printed something weird to stderr (attached).



 Comments   
Comment by Elena Stepanova [ 2015-07-27 ]

ip1981,

Judging by the logs, you have more problems than just binlog index corruption, and they appear to be related.

Do you have some kind of a wrapper around mysqld / mysqld_safe which starts/stops/restarts the server? For example, where do these records in the error log come from?

2015-07-21 14:22:46 --MARK--

and more importantly

ready to mysql

The obvious problem (which might or might not be the reason for the binlog issue, but there is a good chance it is) is that your server does not shut down properly upon restart, so during restart you have concurrent use of essential files. Maybe the binlog index is one of them; if both the "old" and the "new" instances write to it at once, it's not surprising that it ends up containing garbage.

There are visible indications that the concurrency problem does exist.
First, look at the shutdown/restart (a relatively normal one, without a crash or anything):

2015-07-21 18:48:37 --MARK--
150721 20:06:11 [Note] /nix/store/x98iqcgraz75l3738va7kprwsapcggv3-mariadb-10.0.20/bin/mysqld: Normal shutdown
 
150721 20:06:11 [Note] Event Scheduler: Purging the queue. 0 events
ready to mysql
150721 20:06:12 [Note] /nix/store/x98iqcgraz75l3738va7kprwsapcggv3-mariadb-10.0.20/bin/mysqld (mysqld 10.0.20-MariaDB-log) starting as process 26891 ...

So, the normal shutdown has just started; it is not complete (it would have said "Shutdown complete"). But somehow, something decides it's "ready to mysql" and starts the new instance. Apparently, it does not even SIGKILL the previous server before doing so (that would have been a bad idea, but starting a new instance without stopping the old one is even worse):

150721 20:06:12 [Note] /nix/store/x98iqcgraz75l3738va7kprwsapcggv3-mariadb-10.0.20/bin/mysqld (mysqld 10.0.20-MariaDB-log) starting as process 26891 ...
150721 20:06:12 [ERROR] mysqld: Can't lock aria control file '/var/mysql/db/aria_log_control' for exclusive use, error: 11. Will retry for 30 seconds

As the log says, something is still using the aria control file. Given the above, it's apparently the old process, which is still shutting down.
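The "error: 11" behaviour is easy to reproduce outside of MariaDB. Here is a minimal sketch (MariaDB's own locking code differs; this only demonstrates that a non-blocking exclusive lock on a file whose lock is already held fails with EAGAIN, which is errno 11 on Linux):

```python
import errno
import fcntl
import os
import tempfile

# Stand-in for aria_log_control: any regular file works for this demo.
path = tempfile.NamedTemporaryFile(delete=False).name

fd_old = os.open(path, os.O_RDWR)                   # "old" mysqld, still shutting down
fcntl.flock(fd_old, fcntl.LOCK_EX | fcntl.LOCK_NB)  # holds the exclusive lock

fd_new = os.open(path, os.O_RDWR)                   # "new" mysqld, started too early
caught = None
try:
    fcntl.flock(fd_new, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError as e:
    caught = e.errno  # EAGAIN: "Resource temporarily unavailable" (11 on Linux)
```

This is the condition that puts mysqld into its retry loop ("Will retry for 30 seconds"): the lock only becomes available once the old process finally releases it.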

There are other weird artefacts which might be caused by the same issue:

2015-07-21 11:11:15 7fa2e2a37700  InnoDB: Operating system error number 11 in a file operation.
InnoDB: Error number 11 means 'Resource temporarily unavailable'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/operating-system-error-codes.html
150721 11:11:15 [ERROR] InnoDB: File (unknown): 'read' returned OS error 111. Cannot continue operation

InnoDB: Page directory corruption: infimum not pointed to
2015-07-23 19:05:49 7f4a68826700 InnoDB: Page dump in ascii and hex (16384 bytes):

etc.

I suggest getting rid of this problem (e.g. fix the wrapper so that it waits for the shutdown to finish, with all proper checks on the PID), then starting fresh and seeing whether the binlog index corruption reappears. If the wrapper is something we provide that I'm not aware of, please let me know.
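For reference, the "wait for the old PID to disappear" check that the wrapper seems to be missing can be sketched like this (a hypothetical helper, not part of any MariaDB tooling):

```python
import os
import time

def wait_for_shutdown(pid, timeout=60.0, interval=0.5):
    """Poll until process `pid` is gone; True if it exited within `timeout`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.waitpid(pid, os.WNOHANG)  # reap it if it is our zombie child
        except ChildProcessError:
            pass                         # not our child, or already reaped
        try:
            os.kill(pid, 0)              # signal 0: existence check only
        except ProcessLookupError:
            return True                  # fully gone; safe to start a new mysqld
        time.sleep(interval)
    return False                         # still alive; do NOT start a second one
```

The wrapper would send SIGTERM (or run a clean shutdown command), then start the new mysqld only if wait_for_shutdown() returned True, instead of starting it as soon as the old process stops accepting connections.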
We'll be waiting for your feedback.

Comment by Igor Pashev [ 2015-07-27 ]

Yes, you are right. Our server runs under a supervisor, which itself runs under a super-supervisor.
And we recently discovered that the super-supervisor sends SIGKILL to mysqld.

Thank you for your analysis! I'd close this case as "invalid".

Comment by Elena Stepanova [ 2015-07-27 ]

Closing as not-a-bug for now. If the binlog index corruption reappears on a cleaner setup, please comment to re-open.

Generated at Thu Feb 08 07:27:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.