[MDEV-28002] Failing assertion: btr_page_get_prev(next_page, mtr) == btr_pcur_get_block(cursor)->page.id.page_no() Created: 2022-03-04  Updated: 2022-10-18  Resolved: 2022-06-06

Status: Closed
Project: MariaDB Server
Component/s: Admin statements
Affects Version/s: 10.3.9
Fix Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2

Type: Bug Priority: Critical
Reporter: Cassilia Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: crash, innodb
Environment:

10.3.9-MariaDB-1:10.3.9+maria~bionic mariadb.org binary distribution

  1. dpkg -l | grep maria
    ii mariadb-backup 1:10.3.9+maria~bionic amd64 Backup tool for MariaDB server
    ii mariadb-client-10.3 1:10.3.9+maria~bionic amd64 MariaDB database client binaries
    ii mariadb-client-core-10.3 1:10.3.9+maria~bionic amd64 MariaDB database core client binaries
    ii mariadb-common 1:10.3.9+maria~bionic all MariaDB database common files (e.g. /etc/mysql/conf.d/mariadb.cnf)
    ii mariadb-server 1:10.3.9+maria~bionic all MariaDB database server (metapackage depending on the latest version)
    ii mariadb-server-10.3 1:10.3.9+maria~bionic amd64 MariaDB database server binaries
    ii mariadb-server-core-10.3 1:10.3.9+maria~bionic amd64 MariaDB database core server files
    ii mysql-common 1:10.3.9+maria~bionic all MariaDB database common files (e.g. /etc/mysql/my.cnf)

Issue Links:
Relates
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed

 Description   

Hello, my name is Cassil.
I am a new employee engineer.
I don't know if it's okay for me to write here.
I'm not good at English. Please excuse me.

Our company uses mariadb.
MariaDB is being used as a POD of k8s. Maria DB POD continues to be restarted with Error and CrashLoopBack occurring in our products.

I don't know what the cause is. Could you please check the 'LOG' for what error this is?

2022-02-28  7:22:06 0 [Note] mysqld (mysqld 10.3.9-MariaDB-1:10.3.9+maria~bionic) starting as process 1 ...  
2022-02-28  7:22:06 0 [Note] InnoDB: Using Linux native AIO  2022-02-28  7:22:06 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2022-02-28  7:22:06 0 [Note] InnoDB: Uses event mutexes  
2022-02-28  7:22:06 0 [Note] InnoDB: Compressed tables use zlib 1.2.11  
2022-02-28  7:22:06 0 [Note] InnoDB: Using SSE2 crc32 instructions 
2022-02-28  7:22:06 0 [Note] InnoDB: Initializing buffer pool, total size = 32M, instances = 1, chunk size = 32M  
2022-02-28  7:22:06 0 [Note] InnoDB: Completed initialization of buffer pool 
2022-02-28  7:22:06 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().  
2022-02-28  7:22:06 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=16477207  
2022-02-28  7:22:06 0 [Note] InnoDB: Ignoring data file './soda/build_docker_svc_temp_code.ibd' with space ID 187, since the redo log references ./soda/build_docker_svc_temp_code.ibd with space ID 186.
2022-02-28  7:22:07 0 [Note] InnoDB: Starting final batch to recover 30 pages from redo log.
2022-02-28  7:22:07 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2022-02-28  7:22:07 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-02-28  7:22:07 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-02-28  7:22:07 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-02-28  7:22:07 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-02-28  7:22:07 0 [Note] InnoDB: Waiting for purge to start
2022-02-28  7:22:07 0 [Note] InnoDB: 10.3.9 started; log sequence number 16506873; transaction id 130154
2022-02-28  7:22:07 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-02-28  7:22:07 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2022-02-28  7:22:07 0 [Note] Recovering after a crash using tc.log
2022-02-28  7:22:07 0 [Note] Starting crash recovery...
2022-02-28  7:22:07 0 [Note] Crash recovery finished.
2022-02-28  7:22:08 0 [Note] Server socket created on IP: '::'.
2022-02-28  7:22:08 0 [Note] Server socket created on IP: '::'.
2022-02-28  7:22:08 0 [Note] InnoDB: Buffer pool(s) load completed at 220228  7:22:08 
2022-02-28  7:22:08 0 [Warning] 'proxies_priv' entry '@% root@devai-service-mariadb-6b8cc7855d-kwcpc' ignored in --skip-name-resolve mode.
2022-02-28  7:22:08 0 [Note] Reading of all Master_info entries succeded
2022-02-28  7:22:08 0 [Note] Added new Master_info '' to hash table 
2022-02-28  7:22:08 0 [Note] mysqld: ready for connections.Version: '10.3.9-MariaDB-1:10.3.9+maria~bionic'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
2022-02-28 07:23:02 0x7fd85423d700
InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.3.9/storage/innobase/btr/btr0pcur.cc line 461
InnoDB: Failing assertion: btr_page_get_prev(next_page, mtr) == btr_pcur_get_block(cursor)->page.id.page_no()
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even 
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes
InnoDB: about forcing recovery. 
 
220228  7:23:02 [ERROR] mysqld got signal 6 ;  This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built,  or misconfigured. This error can also be caused by malfunctioning hardware.  To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully helpdiagnose the problem, but since we have already crashed,something is definitely wrong and this may fail.  Server version: 10.3.9-MariaDB-1:10.3.9+maria~bionic key_buffer_size=134217728 read_buffer_size=2097152  max_used_connections=2 max_threads=155  thread_count=8It is possible that mysqld could use up to  key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1086832 K  bytes of memory  Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7fd7f4000c08  Attempting backtrace. You can use the following information to find outwhere mysqld died. If you see no messages after this, something wentterribly wrong...stack_bottom = 0x7fd85423cdd8 thread_stack 0x49000mysqld(my_print_stacktrace+0x2e)[0x564aa452508e]  
 
mysqld(handle_fatal_signal+0x5a5)[0x564aa3fbe515] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fd85b965890]  /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7fd85ae74e97] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7fd85ae76801]
mysqld(+0x49d558)[0x564aa3d12558]
mysqld(+0xa43ec8)[0x564aa42b8ec8]
mysqld(+0x9acf0f)[0x564aa4221f0f]
mysqld(+0x8d5b47)[0x564aa414ab47]
mysqld(_ZN7handler11ha_rnd_nextEPh+0x127)[0x564aa3fc31f7]
mysqld(_Z13rr_sequentialP11READ_RECORD+0x22)[0x564aa40d2782]
mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x1b0)[0x564aa3e1ee70]
mysqld(_ZN4JOIN10exec_innerEv+0x8be)[0x564aa3e3d65e]
mysqld(_ZN4JOIN4execEv+0x33)[0x564aa3e3da23]
mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xeb)[0x564aa3e3db6b]mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x14d)[0x564aa3e3e53d]
mysqld(+0x56aaf1)[0x564aa3ddfaf1]  mysqld(_Z21mysql_execute_commandP3THD+0x6132)[0x564aa3dec542]
mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x245)[0x564aa3def055]
mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1ca7)[0x564aa3df1a17]
mysqld(_Z10do_commandP3THD+0x178)[0x564aa3df2818]
mysqld(_Z24do_handle_one_connectionP7CONNECT+0x212)[0x564aa3ec0cc2]
mysqld(handle_one_connection+0x3d)[0x564aa3ec0e9d]/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7fd85b95a6db]/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fd85af5788f]
Trying to get some variables.Some pointers may be invalid and cause the dump to abort. 
 
Query (0x7fd7f4012180): SELECT key_labels, send_dt, alert_level, total_labels,LPAD((SELECT count(*)  FROM alert_msg WHERE DATE(send_dt) = DATE(now())),3,'0') AS cnt FROM alert_msgWHERE TIMESTAMPDIFF(second, send_dt, now()) < 60 INTO OUTFILE '/var/lib/mysql/scheduled_bin/alert_msgs.csv' FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
 
Connection ID (thread ID): 48Status: NOT_KILLED Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains 
information that should help you find out what is causing the crash.  Fatal signal 11 while backtracing  

To solve this error, what should I look at first?
..Do you have any idea?

thank you



 Comments   
Comment by Marko Mäkelä [ 2022-03-04 ]

The assertion failure expression says that the previous page of the next page is not pointing to the current page.

Such corruption could be caused by some unsafe handling of data files. For example, copying a data directory while the server is running, without using proper tools, or deleting the InnoDB redo log (ib_logfile*) to ‘fix’ crash recovery problems.

MariaDB Server 10.3.9 is pretty old too.

If you are lucky, OPTIMIZE TABLE may fix this corruption by rebuilding the table.

Comment by Marko Mäkelä [ 2022-05-16 ]

There is an old bug MDEV-13542 that is about avoiding InnoDB crashes when the database is corrupted. The assertion failed in btr_pcur_move_to_next_page(). I will fix this crash as part of MDEV-13542; all UNIV_BTR_DEBUG code must be replaced with error returns.

10.3.9 is a rather old version, which is affected by a serious bug MDEV-19916. But I don’t think that the corruption could be caused by that bug. A more likely cause could be a corruption of the file system (we usually use ext4 when developing or testing on GNU/Linux), an unsafe method of copying the database while the server was running, or a bug in mariabackup. Also the Galera snapshot transfer can corrupt data; see MDEV-27437 for an example.

I think that it could make sense to invoke mysqldump to make a logical dump of the database (with crashes possibly prevented by innodb_force_recovery=3, or innodb_force_recovery=4), and then load that SQL dump into a newly created database, using the latest 10.3 release.

Comment by Marko Mäkelä [ 2022-05-16 ]

Note to myself: In MDEV-27053 I identified some other code that uses a similar assertion. All that code will have to be adjusted in the MDEV-13542 fix.

Comment by Marko Mäkelä [ 2022-06-06 ]

This was fixed as part of MDEV-13542.

Generated at Thu Feb 08 09:57:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.