[MDEV-32132] DROP INDEX followed by CREATE INDEX may corrupt data Created: 2023-09-08  Updated: 2023-12-01  Resolved: 2023-09-08

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.2, 10.2, 10.3, 10.4
Fix Version/s: 10.4.32

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, crash, upstream

Issue Links:
Relates
relates to MDEV-20934 Infinite loop on innodb_fast_shutdown... Closed
relates to MDEV-29694 Remove the InnoDB change buffer Closed
relates to MDEV-30009 InnoDB shutdown hangs when the change... Closed
relates to MDEV-32044 Mariadb crash after upgrading to 11.0... Closed

 Description   

In MDEV-30009 we reproduced and analyzed a data corruption scenario that has affected some users and customers.

The scenario is as follows (adapting from MDEV-32044):

  1. The InnoDB change buffer was enabled, and some INSERT operations for page 0x60bb were buffered in the past. This was the default until the change buffer was disabled by default in MDEV-27734.
  2. DROP INDEX was executed on this index while buffered changes existed. The entries in the change buffer became stale at this point. This is how Heikki Tuuri designed it; he was a friend of lazy execution at that time.
  3. CREATE INDEX or ALTER TABLE…ADD INDEX was executed on the table, using a server version this bug had not been fixed. Page 0x60bb was allocated for this index. Heikki Tuuri’s design was such that any stale change buffer entries would be removed at the time when a page is reallocated for something else. This was missed by those who implemented and reviewed an optimization of InnoDB ALTER TABLE in MySQL 5.7, which was applied to MariaDB 10.2.2 without any review.
  4. (the rest is different from MDEV-32044): Some changes were buffered for the newly created index page 0x60bb.
  5. At some later point, the stale change buffer entries were wrongly applied to the page 0x60bb in addition to the newly written entries. This would corrupt the secondary index and could cause a crash on change buffer merge, typically due to an unexpected page overflow.

Notes:

  1. Before MDEV-19514 was implemented in 10.5, a change buffer merge could happen at any point of time, even when the affected table was not being accessed by higher-level operations.
  2. Before MDEV-13542 and some related changes were implemented in MariaDB Server 10.6, corruption detected during change buffer merge would lead to a crash. This would also happen during CHECK TABLE. (See MDEV-28349.)

The bug in index creation is that ibuf_set_bitmap_for_bulk_load() would fail to invoke ibuf_delete_recs() to remove any stale buffered entries for the page that is being reused for the being-created secondary index. When MDEV-30009 was fixed in MariaDB Server 10.5.19 and later, we forgot to fix this in older major versions of MariaDB Server.

This ticket is about porting the fix of ibuf_set_bitmap_for_bulk_load() to MariaDB Server 10.4.



 Comments   
Comment by Manuel Arostegui [ 2023-09-14 ]

We believe we've been hit by this bug when upgrading to 10.4.31. We saw one crash on an specific table with:

Sep 02 15:32:14 db1128 mysqld[1952629]: InnoDB: Failing assertion: !cursor->index->is_committed()
Sep 02 15:32:14 db1128 mysqld[1952629]: InnoDB: We intentionally generate a memory trap.
Sep 02 15:32:14 db1128 mysqld[1952629]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Sep 02 15:32:14 db1128 mysqld[1952629]: InnoDB: If you get repeated assertion failures or crashes, even
Sep 02 15:32:14 db1128 mysqld[1952629]: InnoDB: immediately after the mysqld startup, there may be
Sep 02 15:32:14 db1128 mysqld[1952629]: InnoDB: corruption in the InnoDB tablespace. Please refer to
Sep 02 15:32:14 db1128 mysqld[1952629]: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
Sep 02 15:32:14 db1128 mysqld[1952629]: InnoDB: about forcing recovery.
Sep 02 15:32:14 db1128 mysqld[1952629]: 230902 15:32:14 [ERROR] mysqld got signal 6 ;
Sep 02 15:32:14 db1128 mysqld[1952629]: This could be because you hit a bug. It is also possible that this binary
Sep 02 15:32:14 db1128 mysqld[1952629]: or one of the libraries it was linked against is corrupt, improperly built,
Sep 02 15:32:14 db1128 mysqld[1952629]: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep 02 15:32:14 db1128 mysqld[1952629]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Sep 02 15:32:14 db1128 mysqld[1952629]: We will try our best to scrape up some info that will hopefully help
Sep 02 15:32:14 db1128 mysqld[1952629]: diagnose the problem, but since we have already crashed,
Sep 02 15:32:14 db1128 mysqld[1952629]: something is definitely wrong and this may fail.
Sep 02 15:32:14 db1128 mysqld[1952629]: Server version: 10.4.31-MariaDB-log source revision: 2aea9387497cecb5668ef605b8f80886f9de812c
Sep 02 15:32:14 db1128 mysqld[1952629]: key_buffer_size=134217728
Sep 02 15:32:14 db1128 mysqld[1952629]: read_buffer_size=131072
Sep 02 15:32:14 db1128 mysqld[1952629]: max_used_connections=247
Sep 02 15:32:14 db1128 mysqld[1952629]: max_threads=2010
Sep 02 15:32:14 db1128 mysqld[1952629]: thread_count=54
Sep 02 15:32:14 db1128 mysqld[1952629]: It is possible that mysqld could use up to
Sep 02 15:32:14 db1128 mysqld[1952629]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4753830 K  bytes of memory
Sep 02 15:32:14 db1128 mysqld[1952629]: Hope that's ok; if not, decrease some variables in the equation.
Sep 02 15:32:14 db1128 mysqld[1952629]: Thread pointer: 0x7f93a8001538
Sep 02 15:32:14 db1128 mysqld[1952629]: Attempting backtrace. You can use the following information to find out
Sep 02 15:32:14 db1128 mysqld[1952629]: where mysqld died. If you see no messages after this, something went
Sep 02 15:32:14 db1128 mysqld[1952629]: terribly wrong...
Sep 02 15:32:14 db1128 mysqld[1952629]: stack_bottom = 0x7ff2ac481760 thread_stack 0x30000
Sep 02 15:32:15 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(my_print_stacktrace+0x2e)[0x556be23d470e]
Sep 02 15:32:15 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(handle_fatal_signal+0x55d)[0x556be1e56c7d]
Sep 02 15:32:16 db1128 mysqld[1952629]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7ff2ba678140]
Sep 02 15:32:17 db1128 mysqld[1952629]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7ff2ba1b0ce1]
Sep 02 15:32:17 db1128 mysqld[1952629]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7ff2ba19a537]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(+0x5c68c0)[0x556be1b348c0]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(+0x5b2f0b)[0x556be1b20f0b]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(+0xc84717)[0x556be21f2717]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(+0xc84f65)[0x556be21f2f65]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(+0xc961e5)[0x556be22041e5]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(+0xbd98ef)[0x556be21478ef]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(_ZN7handler12ha_write_rowEPKh+0x326)[0x556be1e64ea6]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(_Z12write_recordP3THDP5TABLEP12st_copy_info+0x19d)[0x556be1c12a8d]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0xb0f)[0x556be1c1d42f]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(_Z21mysql_execute_commandP3THD+0x191c)[0x556be1c49c6c]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x223)[0x556be1c50af3]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x6dd)[0x556be1f6e03d]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(+0x628a24)[0x556be1b96a24]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(handle_slave_sql+0x1672)[0x556be1ba09a2]
Sep 02 15:32:19 db1128 mysqld[1952629]: /opt/wmf-mariadb104/bin/mysqld(+0xb77c02)[0x556be20e5c02]
Sep 02 15:32:20 db1128 mysqld[1952629]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7ff2ba66cea7]
Sep 02 15:32:22 db1128 mysqld[1952629]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7ff2ba273a2f]
Sep 02 15:32:22 db1128 mysqld[1952629]: Trying to get some variables.
Sep 02 15:32:22 db1128 mysqld[1952629]: Some pointers may be invalid and cause the dump to abort.
Sep 02 15:32:22 db1128 mysqld[1952629]: Query (0x7f93a904e695): INSERT /* MediaWiki\User\UserOptionsManager::saveOptionsInternal  */ IGNORE INTO `user_properties` (up_user,up_property,up_value) VALUES (395179,'echo-subscriptions-push-ge-newcomer',''),(395179,'skin','time>
Sep 02 15:32:22 db1128 mysqld[1952629]: Connection ID (thread ID): 10
Sep 02 15:32:22 db1128 mysqld[1952629]: Status: NOT_KILLED
Sep 02 15:32:22 db1128 mysqld[1952629]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_k>
Sep 02 15:32:22 db1128 mysqld[1952629]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
Sep 02 15:32:22 db1128 mysqld[1952629]: information that should help you find out what is causing the crash.
Sep 02 15:32:22 db1128 mysqld[1952629]: Writing a core file...
Sep 02 15:32:22 db1128 mysqld[1952629]: Working directory at /srv/sqldata
Sep 02 15:32:22 db1128 mysqld[1952629]: Resource Limits:
Sep 02 15:32:22 db1128 mysqld[1952629]: Limit                     Soft Limit           Hard Limit           Units
Sep 02 15:32:22 db1128 mysqld[1952629]: Max cpu time              unlimited            unlimited            seconds
Sep 02 15:32:22 db1128 mysqld[1952629]: Max file size             unlimited            unlimited            bytes
Sep 02 15:32:22 db1128 mysqld[1952629]: Max data size             unlimited            unlimited            bytes
Sep 02 15:32:22 db1128 mysqld[1952629]: Max stack size            8388608              unlimited            bytes
Sep 02 15:32:22 db1128 mysqld[1952629]: Max core file size        0                    0                    bytes
Sep 02 15:32:22 db1128 mysqld[1952629]: Max resident set          unlimited            unlimited            bytes
Sep 02 15:32:22 db1128 mysqld[1952629]: Max processes             2058221              2058221              processes
Sep 02 15:32:22 db1128 mysqld[1952629]: Max open files            200001               200001               files
Sep 02 15:32:22 db1128 mysqld[1952629]: Max locked memory         65536                65536                bytes
Sep 02 15:32:22 db1128 mysqld[1952629]: Max address space         unlimited            unlimited            bytes
Sep 02 15:32:22 db1128 mysqld[1952629]: Max file locks            unlimited            unlimited            locks
Sep 02 15:32:22 db1128 mysqld[1952629]: Max pending signals       2058221              2058221              signals
Sep 02 15:32:22 db1128 mysqld[1952629]: Max msgqueue size         819200               819200               bytes
Sep 02 15:32:22 db1128 mysqld[1952629]: Max nice priority         0                    0
Sep 02 15:32:22 db1128 mysqld[1952629]: Max realtime priority     0                    0
Sep 02 15:32:22 db1128 mysqld[1952629]: Max realtime timeout      unlimited            unlimited            us
Sep 02 15:32:22 db1128 mysqld[1952629]: Core pattern: /var/tmp/core/core.%h.%e.%p.%t
Sep 02 15:32:22 db1128 mysqld[1952629]: Kernel version: Linux version 5.10.0-25-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.191-1 (2023-08-16)
Sep 02 15:32:24 db1128 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT

We thought it could have been a one time thing and we recloned the host. Today we saw the same crash again, with the same table

Sep 13 21:21:46 db1128 mysqld[1632625]: 2023-09-13 21:21:46 0x7ea63fea1700  InnoDB: Assertion failure in file /root/mariadb-10.4.31/storage/innobase/row/row0ins.cc line 219
Sep 13 21:21:46 db1128 mysqld[1632625]: InnoDB: Failing assertion: !cursor->index->is_committed()
Sep 13 21:21:46 db1128 mysqld[1632625]: InnoDB: We intentionally generate a memory trap.
Sep 13 21:21:46 db1128 mysqld[1632625]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Sep 13 21:21:46 db1128 mysqld[1632625]: InnoDB: If you get repeated assertion failures or crashes, even
Sep 13 21:21:46 db1128 mysqld[1632625]: InnoDB: immediately after the mysqld startup, there may be
Sep 13 21:21:46 db1128 mysqld[1632625]: InnoDB: corruption in the InnoDB tablespace. Please refer to
Sep 13 21:21:46 db1128 mysqld[1632625]: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
Sep 13 21:21:46 db1128 mysqld[1632625]: InnoDB: about forcing recovery.
Sep 13 21:21:46 db1128 mysqld[1632625]: 230913 21:21:46 [ERROR] mysqld got signal 6 ;
Sep 13 21:21:46 db1128 mysqld[1632625]: This could be because you hit a bug. It is also possible that this binary
Sep 13 21:21:46 db1128 mysqld[1632625]: or one of the libraries it was linked against is corrupt, improperly built,
Sep 13 21:21:46 db1128 mysqld[1632625]: or misconfigured. This error can also be caused by malfunctioning hardware.
Sep 13 21:21:46 db1128 mysqld[1632625]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Sep 13 21:21:46 db1128 mysqld[1632625]: We will try our best to scrape up some info that will hopefully help
Sep 13 21:21:46 db1128 mysqld[1632625]: diagnose the problem, but since we have already crashed,
Sep 13 21:21:46 db1128 mysqld[1632625]: something is definitely wrong and this may fail.
Sep 13 21:21:46 db1128 mysqld[1632625]: Server version: 10.4.31-MariaDB-log source revision: 2aea9387497cecb5668ef605b8f80886f9de812c
Sep 13 21:21:46 db1128 mysqld[1632625]: key_buffer_size=134217728
Sep 13 21:21:46 db1128 mysqld[1632625]: read_buffer_size=131072
Sep 13 21:21:46 db1128 mysqld[1632625]: max_used_connections=226
Sep 13 21:21:46 db1128 mysqld[1632625]: max_threads=2010
Sep 13 21:21:46 db1128 mysqld[1632625]: thread_count=43
Sep 13 21:21:46 db1128 mysqld[1632625]: It is possible that mysqld could use up to
Sep 13 21:21:46 db1128 mysqld[1632625]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4753830 K  bytes of memory
Sep 13 21:21:46 db1128 mysqld[1632625]: Hope that's ok; if not, decrease some variables in the equation.
Sep 13 21:21:46 db1128 mysqld[1632625]: Thread pointer: 0x7ea618000d28
Sep 13 21:21:46 db1128 mysqld[1632625]: Attempting backtrace. You can use the following information to find out
Sep 13 21:21:46 db1128 mysqld[1632625]: where mysqld died. If you see no messages after this, something went
Sep 13 21:21:46 db1128 mysqld[1632625]: terribly wrong...
Sep 13 21:21:46 db1128 mysqld[1632625]: stack_bottom = 0x7ea63fea0760 thread_stack 0x30000
Sep 13 21:21:55 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(my_print_stacktrace+0x2e)[0x5629038d670e]
Sep 13 21:21:55 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(handle_fatal_signal+0x55d)[0x562903358c7d]
Sep 13 21:22:03 db1128 mysqld[1632625]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7f06271e4140]
Sep 13 21:22:11 db1128 mysqld[1632625]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f0626d1cce1]
Sep 13 21:22:11 db1128 mysqld[1632625]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f0626d06537]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(+0x5c68c0)[0x5629030368c0]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(+0x5b2f0b)[0x562903022f0b]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(+0xc84717)[0x5629036f4717]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(+0xc84f65)[0x5629036f4f65]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(+0xc961e5)[0x5629037061e5]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(+0xbd98ef)[0x5629036498ef]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(_ZN7handler12ha_write_rowEPKh+0x326)[0x562903366ea6]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(_Z12write_recordP3THDP5TABLEP12st_copy_info+0x19d)[0x562903114a8d]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0xb0f)[0x56290311f42f]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(_Z21mysql_execute_commandP3THD+0x191c)[0x56290314bc6c]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x223)[0x562903152af3]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x6dd)[0x56290347003d]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(+0x628a24)[0x562903098a24]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(handle_slave_sql+0x1672)[0x5629030a29a2]
Sep 13 21:22:19 db1128 mysqld[1632625]: /opt/wmf-mariadb104/bin/mysqld(+0xb77c02)[0x5629035e7c02]
Sep 13 21:22:28 db1128 mysqld[1632625]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f06271d8ea7]
Sep 13 21:22:37 db1128 mysqld[1632625]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f0626ddfa2f]
Sep 13 21:22:37 db1128 mysqld[1632625]: Trying to get some variables.
Sep 13 21:22:37 db1128 mysqld[1632625]: Some pointers may be invalid and cause the dump to abort.
Sep 13 21:22:37 db1128 mysqld[1632625]: Query (0x7ea618f3de25): INSERT /* MediaWiki\User\UserOptionsManager::saveOptionsInternal  */ IGNORE INTO `user_properties` (up_user,up_property,up_value) VALUES (20542576,'advancedsearch-disable-local-exception',''),(20542576,'betafeatures-auto-enroll-local-exception',''),(20542576,'ccmeonemails-local-exception',''),(20542576,'centralnotice-display-campaign-type-advocacy',''),(20542576,'centralnotice-display-campaign-type-advocacy-local-exception',''),(20542576,'centralnotice-display-campaign-type-article-writing-local-exception',''),(20542576,'centralnotice-display-campaign-type-event-local-exception',''),(20542576,'centralnotice-display-campaign-type-fundraising',''),(20542576,'centralnotice-display-campaign-type-fundraising-local-exception',''),(20542576,'centralnotice-display-campaign-type-governance-local-exception',''),(20542576,'centralnotice-display-campaign-type-photography-local-exception',''),(20542576,'cirrussearch-pref-completion-profile-local-exception',''),(20542576,'compact-language-links-local-exception',''),(20542576,'date-local-exception',''),(20542576,'diffonly-local-exception',''),(20542576,'disablemail-local-exception',''),(20542576,'discussiontools-autotopicsub-local-exception',''),(20542576,'discussiontools-betaenable-local-exception',''),(20542576,'discussiontools-newtopictool-createpage-local-exception',''),(20542576,'discussiontools-newtopictool-local-exception',''),(20542576,'discussiontools-replytool-local-exception',''),(20542576,'discussiontools-sourcemodetoolbar-local-exception',''),(20542576,'discussiontools-topicsubscription-local-exception',''),(20542576,'discussiontools-visualenhancements-local-exception',''),(20542576,'echo-cross-wiki-notifications',''),(20542576,'echo-cross-wiki-notifications-local-exception',''),(20542576,'echo-dont-email-read-notifications-local-exception',''),(20542576,'echo-email-format-local-exception',''),(20542576,'echo-email-frequency-local-exception',''),(20542576,'echo-notifications-page-linked-title-muted-list-local-exception',''),(20542576,'echo-subscriptions-email-ge-newcomer',''),(20542576,'echo-subscriptions-email-login-fail',''),(20542576,'echo-subscriptions-email-login-success',''),(20542576,'echo-subscriptions-push-dt-subscription-archiving',''),(20542576,'echo-subscriptions-push-ge-newcomer',''),(20542576,'echo-subscriptions-web-ge-newcomer',''),(20542576,'editfont-local-exception',''),(20542576,'email-allow-new-users-local-exception',''),(20542576,'enotifminoredits-local-exception',''),(20542576,'enotifwatchlistpages-local-exception',''),(20542576,'extendwatchlist-local-exception',''),(20542576,'forceeditsummary-local-exception',''),(20542576,'gadget-ReferenceTooltips',''),(20542576,'gadget-extra-toolbar-buttons',''),(20542576,'gender-local-exception',''),(20542576,'growthexperiments-help-panel-tog-help-panel-local-exception',''),(20542576,'growthexperiments-homepage-enable-local-exception',''),(20542576,'hidecategorization',''),(20542576,'hidecategorization-local-exception',''),(20542576,'hideminor-local-exception',''),(20542576,'imagesize-local-exception',''),(20542576,'ipinfo-beta-feature-enable-local-exception',''),(20542576,'language-local-exception',''),(20542576,'math-local-exception',''),(20542576,'multimediaviewer-enable-local-exception',''),(20542576,'norollbackdiff-local-exception',''),(20542576,'previewonfirst-local-exception',''),(20542576,'previewontop',''),(20542576,'previewontop-local-exception',''),(20542576,'rcdays-local-exception',''),(20542576,'rcenhancedfilters-disable-local-exception',''),(20542576,'rclimit-local-exception',''),(20542576,'rcshowwikidata-local-exception',''),(20542576,'requireemail-local-exception',''),(20542576,'revisionslider-disable-local-exception',''),(20542576,'searchlimit-local-exception',''),(20542576,'showhiddencats-local-exception',''),(20542576,'skin-local-exception',''),(20542576,'skin-responsive-local-exception',''),(20542576,'thumbsize-local-exception',''),(20542576,'timecorrection-local-exception',''),(20542576,'twocolconflict-local-exception',''),(20542576,'underline-local-exception',''),(20542576,'usebetatoolbar-local-exception','1'),(20542576,'usecodemirror-colorblind',''),(20542576,'usecodemirror-colorblind-local-exception',''),(20542576,'useeditwarning-local-exception',''),(20542576,'uselivepreview-local-exception',''),(20542576,'usenewrc-local-exception',''),(20542576,'vector-limited-width-local-exception',''),(20542576,'visualeditor-betatempdisable-local-exception',''),(20542576,'visualeditor-newwikitext-local-exception',''),(20542576,'watchcreations-local-exception',''),(20542576,'watchdefault-local-exception',''),(20542576,'watchlistdays-local-exception',''),(20542576,'watchlisthideanons-local-exception',''),(20542576,'watchlisthideliu-local-exception',''),(20542576,'watchlisthideminor-local-exception',''),(20542576,'watchlisthideown-local-exception',''),(20542576,'watchlistunwatchlinks-local-exception',''),(20542576,'watchmoves-local-exception',''),(20542576,'wikilove-enabled-local-exception',''),(20542576,'wlenhancedfilters-disable-local-exception',''),(20542576,'wllimit-local-exception',''),(20542576,'wlshowwikibase-local-exception',''),(20542576,'usebetatoolbar',''),(20542576,'echo-notifications-page-linked-title-muted-list-local-exception-local-exception','')
Sep 13 21:22:37 db1128 mysqld[1632625]: Connection ID (thread ID): 13616
Sep 13 21:22:37 db1128 mysqld[1632625]: Status: NOT_KILLED
Sep 13 21:22:37 db1128 mysqld[1632625]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=on,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=off,condition_pushdown_from_having=on
Sep 13 21:22:37 db1128 mysqld[1632625]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
Sep 13 21:22:37 db1128 mysqld[1632625]: information that should help you find out what is causing the crash.
Sep 13 21:22:37 db1128 mysqld[1632625]: Writing a core file...
Sep 13 21:22:37 db1128 mysqld[1632625]: Working directory at /srv/sqldata
Sep 13 21:22:37 db1128 mysqld[1632625]: Resource Limits:
Sep 13 21:22:37 db1128 mysqld[1632625]: Limit                     Soft Limit           Hard Limit           Units
Sep 13 21:22:37 db1128 mysqld[1632625]: Max cpu time              unlimited            unlimited            seconds
Sep 13 21:22:37 db1128 mysqld[1632625]: Max file size             unlimited            unlimited            bytes
Sep 13 21:22:37 db1128 mysqld[1632625]: Max data size             unlimited            unlimited            bytes
Sep 13 21:22:37 db1128 mysqld[1632625]: Max stack size            8388608              unlimited            bytes
Sep 13 21:22:37 db1128 mysqld[1632625]: Max core file size        0                    0                    bytes
Sep 13 21:22:37 db1128 mysqld[1632625]: Max resident set          unlimited            unlimited            bytes
Sep 13 21:22:37 db1128 mysqld[1632625]: Max processes             2058221              2058221              processes
Sep 13 21:22:37 db1128 mysqld[1632625]: Max open files            200001               200001               files
Sep 13 21:22:37 db1128 mysqld[1632625]: Max locked memory         65536                65536                bytes
Sep 13 21:22:37 db1128 mysqld[1632625]: Max address space         unlimited            unlimited            bytes
Sep 13 21:22:37 db1128 mysqld[1632625]: Max file locks            unlimited            unlimited            locks
Sep 13 21:22:37 db1128 mysqld[1632625]: Max pending signals       2058221              2058221              signals
Sep 13 21:22:37 db1128 mysqld[1632625]: Max msgqueue size         819200               819200               bytes
Sep 13 21:22:37 db1128 mysqld[1632625]: Max nice priority         0                    0
Sep 13 21:22:37 db1128 mysqld[1632625]: Max realtime priority     0                    0
Sep 13 21:22:37 db1128 mysqld[1632625]: Max realtime timeout      unlimited            unlimited            us
Sep 13 21:22:37 db1128 mysqld[1632625]: Core pattern: /var/tmp/core/core.%h.%e.%p.%t
Sep 13 21:22:37 db1128 mysqld[1632625]: Kernel version: Linux version 5.10.0-25-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.191-1 (2023-08-16)
Sep 13 21:22:54 db1128 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT

After spending quite some time with marko today with some gdb traces, it looks like we've been hit by this bug on this table which had a DROP+CREATE index a few years ago and where we had change buffer enabled.

+------------------------+-------+----------+------------------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------------------+-------+----------+------------------------------------------------------------------------------+
| enwiki.user_properties | check | Warning | InnoDB: Index 'up_property' contains 270261695 entries, should be 270261639. |
| enwiki.user_properties | check | error | Corrupt |
+------------------------+-------+----------+------------------------------------------------------------------------------+
2 rows in set (9 min 57.704 sec)

Once that table was optimized, replication started to flow again, but the host crashed with the same exception, but this time with a different table, which also had a schema change a few years ago, and which has revealed to have corrupted indexes:

| Table | Op | Msg_type | Msg_text |
+------------------+-------+----------+-------------------------------------------------------------------------------------------+
| enwiki.pagelinks | check | Warning | InnoDB: Index 'pl_namespace' contains 1549475640 entries, should be 1549475630. |
| enwiki.pagelinks | check | Warning | InnoDB: Index 'pl_backlinks_namespace' contains 1549475640 entries, should be 1549475630. |
| enwiki.pagelinks | check | error | Corrupt |
+------------------+-------+----------+-------------------------------------------------------------------------------------------+
3 rows in set (2 hours 2 min 27.976 sec)

For now we are optimizing all the tables in this host.
Thanks Marko for your help here!

Comment by Manuel Arostegui [ 2023-09-19 ]

For the record, even after optimizing the broken tables (and they were being reported ok by a check table) the host kept crashing with the same issue (and same table).

I've downgraded the host back to its previous version 10.4.28, recloned it from the original host it was recloned when it crashed, and so far no crashes. There must be something more specific within 10.4.31 and 10.4.28 and maybe that table specific usage/data/schema that makes it keeps crashing on 10.4.31 but not on 10.4.28.

Comment by Marko Mäkelä [ 2023-09-20 ]

marostegui, there are two InnoDB global variables that will prevent OPTIMIZE TABLE from actually rebuilding the table. ALTER TABLE…FORCE is immune to those settings:

  • innodb_defragment (deprecated in MDEV-30544, removed in MDEV-30545)
  • innodb_optimize_fulltext_only: if there are FULLTEXT INDEX on the table, only an OPTIMIZE operation of those will be executed

If neither parameter applies here, I do not know what could be the problem. Any table rebuild will assign a new tablespace ID and therefore disassociate any old change buffer entries with the new data file. I think that further root cause analysis of the corruption would be needed. Crashes on corruption were fixed in MariaDB Server 10.6 in MDEV-13542 and some related changes.

Comment by Manuel Arostegui [ 2023-09-20 ]

Thanks for following up Marko. Neither of those variables are set, and we indeed do not have a FULL TEXT index on that table....

We are in process of migrating from 10.4 to 10.6 but this will take many months yet, so we do keep upgrading our 10.4 series

Thanks again for your time

Generated at Thu Feb 08 10:29:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.