[MDEV-13624] pt-table-checksum crashes server Created: 2017-08-23  Updated: 2017-10-30  Due: 2017-09-12  Resolved: 2017-10-30

Status: Closed
Project: MariaDB Server
Component/s: Plugins
Affects Version/s: 10.1.26
Fix Version/s: 10.1.27

Type: Bug Priority: Major
Reporter: Heinz Wiesinger Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: None
Environment:

CentOS 7, MariaDB 10.1.26, encryption enabled (aws-kms), master-slave replication with one master and one slave


Attachments: error.log, schema.sql, variables.log
Issue Links:
Blocks
is blocked by MDEV-13650 Backport fix for MDEV-13060 (crash wh... Closed

 Description   

I set up a fresh environment, configured encryption and replication, and set up pt-table-checksum to check the replicated database. The database is still empty; there are no records in it yet. Whenever pt-table-checksum runs, it crashes the server.

I have an older environment running 10.1.21. Configuration-wise it is an exact replica of the fresh environment (except for the older MariaDB version). pt-table-checksum works fine there.



 Comments   
Comment by Heinz Wiesinger [ 2017-08-24 ]

I downgraded both master and slave to 10.1.21, and the crashes are indeed gone. So it looks like a regression introduced in a version after 10.1.21.

Comment by Elena Stepanova [ 2017-08-24 ]

It looks like the database is not completely empty; there are at least some tables. Could you please paste the output of SHOW CREATE TABLE `heartbeat`.`patient_medication` and attach your cnf file(s) or the output of SHOW VARIABLES?

Are any special options for pt-table-checksum required?

Comment by Heinz Wiesinger [ 2017-08-25 ]

Right, I mentioned it on IRC but forgot to mention it properly here. There is a database with a couple of tables, but the tables themselves are empty. I attached the schema of the tables involved as well as the output of SHOW VARIABLES.

The pt-table-checksum command we use is

pt-table-checksum -u percona -p password --no-check-plan --databases heartbeat --quiet

Comment by Elena Stepanova [ 2017-08-26 ]

It appears to be one of the numerous unpredictable consequences of the problem with loading both the AWS key management plugin and the server audit plugin.

Reproducible by starting the server with
--encrypt-tmp-files --plugin-load-add=server_audit --plugin-load-add=aws_key_management <valid aws options>, loading the attached dump, and then executing
percona-toolkit-3.0.2/bin/pt-table-checksum -uroot --no-check-plan --quiet --port=3306 --host=127.0.0.1 --databases test

The scenario can obviously be simplified, but there is not much point in doing so now, because merely starting the server with these options already causes all kinds of trouble.

The problem was already fixed in the scope of MDEV-13060, but only in 10.2. I have created a request for a backport to 10.1 in MDEV-13650. This bug is blocked until the backport is done. After that, we will need to re-check it.

Comment by Heinz Wiesinger [ 2017-10-30 ]

I have now been running 10.1.28 on the servers for some time and have not experienced any crashes. It looks like the backport fixed it.

Comment by Elena Stepanova [ 2017-10-30 ]

Thanks for checking!

Comment by Elena Stepanova [ 2017-10-30 ]

According to the above, closing as fixed in the scope of MDEV-13650.

Generated at Thu Feb 08 08:07:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.