[MDEV-24284] InnoDB: Record field 64 len 18446744... Created: 2020-11-26 Updated: 2023-04-11 Resolved: 2023-04-11 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Galera SST, mariabackup, Storage Engine - InnoDB |
| Affects Version/s: | 10.3.25 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Roope Pääkkönen (Inactive) | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Centos 7 |
||
| Issue Links: |
|
||||||||
| Description |
|
We have some mariadb 10.3 instances where the server has log entries similar to below:
Maybe few times per day at most, I haven't found what the exact trigger is The value of "record field" 64 may vary, but the "len" is always this 18446744073709544479. We haven't noticed any operational issues on the servers so far but still interested to learn if this is something to look into deeper I don't really have much information on when/how these might have began to happen. |
| Comments |
| Comment by Marko Mäkelä [ 2020-11-26 ] |
|
The number corresponds to -7137 in 2’s complement 64-bit arithmetics. That might have something to do with the default innodb_page_size=16384. Have the affected systems been initiated by snapshot transfer? If yes, in which way exactly? |
| Comment by Roope Pääkkönen (Inactive) [ 2020-11-26 ] |
|
Hi, yes, these have been set up with mariabackup SST |
| Comment by Marko Mäkelä [ 2020-11-26 ] |
|
Thank you. That could limit the problem to Mariabackup or the way it is used in Galera snapshot transfer. Do you execute any ALTER TABLE or OPTIMIZE TABLE statements? If yes, have you tried setting innodb_log_optimize_ddl=OFF (which |
| Comment by Roope Pääkkönen (Inactive) [ 2020-11-28 ] |
|
Yes, ALTER's are run every now and then, as part of software updates at minimum, OPTIMIZE not. But i don't have any exact query entry as of now that would match with that error message. I haven't tried the optimize_ddl setting, could it help? |
| Comment by Mark Reibert [ 2021-02-11 ] |
|
I am experiencing this too, on a three-node MariaDB Galera cluster (MariaDB version 10.4.14 with Galera 26.4.5). Note I am performing DDL statements as several tables in my DB are partitioned and there is a nightly cron that executes ALTER TABLE to add/drop partitions. Also, innodb_log_optimize_ddl is enabled as that is the default with MariaDB 10.4.14. |
| Comment by Mark Reibert [ 2021-02-12 ] |
|
Update: I set innodb_log_optimize_ddl = OFF on my cluster nodes and the "Record field len" messages are still appearing in the logs. Note I made the change without restarting mysqld since innodb_log_optimize_ddl is a dynamic variable. |
| Comment by Marko Mäkelä [ 2021-02-12 ] |
|
I suspect that this could share a root cause with |
| Comment by Marko Mäkelä [ 2021-02-12 ] |
|
mreibert, some data that has been already permanently corrupted cannot be magically uncorrupted by the change of a configuration parameter that should prevent future corruption. I hope that you are following a systematic approach in order to track down this problem: Always start with a backup of a known good cluster. (CHECK TABLE must pass on all tables, to begin with.) And then try to reproduce the problem, by replaying some load on the cluster and performing the snapshot transfer. otto was able to narrow down his problem in |
| Comment by Marko Mäkelä [ 2021-02-12 ] |
|
One more thought: In the upcoming 10.5.9 release, |
| Comment by Mark Reibert [ 2021-02-12 ] |
|
marko — Thank you for the feedback. As it turns out I am dealing with a rather large database (6TB) so it is not easy to debug as systematically as I would like. A SST alone takes 4–6 hours and CHECK TABLE over all the tables takes is a very long operation, so it's not entirely practical to just "try things". (This is a production system for which I cannot afford any down time.) That being written, I could take one node out of the Galera cluster and ensure it's DB is "clean", then re-introduce it to the cluster and let it IST. After that I could SST the other two nodes (now that I have innodb_log_optimize_ddl = OFF). In any case, I appreciate your thoughts and will continue to "see what I can do" given the size of the DB and restrictions of the environment. |
| Comment by Roope Pääkkönen (Inactive) [ 2021-02-26 ] |
|
We also tried the innodb_log_optimize_ddl=OFF setting on mariadb 10.3.25 but it didn't seem to improve things. And now on another servers that were updated to 10.3.28 from 10.3.25, where previously these messages appeared almost daily, we haven't seen them anymore. |
| Comment by Marko Mäkelä [ 2021-02-26 ] |
|
If this problem occurs due to I agree that with large amounts of data, a statistical approach rather than "try things" has to be followed. Hopefully I am right about |
| Comment by Roope Pääkkönen (Inactive) [ 2021-02-26 ] |
|
What would your recommended course of action be for existing databases, if the issue is fixed with 10.3.28? |
| Comment by Marko Mäkelä [ 2021-02-26 ] |
|
I think that this type of corruption could be fixed by logical dump and restore. I do not know Galera well, but I have understood that it supports a logical snapshot transfer. That should allow one corruption-free cluster node to be created. If this issue was only caused by You could also take a shortcut and assume that the InnoDB system tablespace is not corrupted. In that case, it would suffice to mysqldump, drop, and restore the corrupted table. |
| Comment by Mark Reibert [ 2021-02-26 ] |
|
marko — thank you for the additional information and thoughts. For my large (6TB) Galera cluster I can wipe a node and let it recreate the DB from scratch via a SST, but it seems I should first upgrade to 10.4.18 (I am currently on 10.4.14) on account of |
| Comment by Jan Lindström (Inactive) [ 2021-02-27 ] |
|
Yes, you can recreate DB from scratch via a SST but do not use rsync or mariabackup methods, you need to use mysqldump method. |
| Comment by Mark Reibert [ 2021-03-01 ] |
|
Hello jplindst — Unfortunately with a 6TB database mariabackup is my only practical option given the 6 hours it takes for a SST and the other methods blocking the donor during that time. |
| Comment by Jan Lindström [ 2023-04-11 ] |
|
10.3 will EOL soon. |
| Comment by Marko Mäkelä [ 2023-04-11 ] |
|
I think that this corruption should affect MariaDB 10.4 as well. The root cause could be a hard-to-reproduce InnoDB crash recovery failure. In 10.5, the recovery was rewritten as part of |