[MDEV-29153] 2022-07-22 18:12:56 0x7f3319fff700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/page/page0cur.cc line 11 Created: 2022-07-22  Updated: 2023-04-04  Resolved: 2023-04-04

Status: Closed
Project: MariaDB Server
Component/s: Platform Debian, Server, Storage Engine - InnoDB
Affects Version/s: 10.4.24
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Kirill Gromov Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: recovery
Environment:

Debian 9


Issue Links:
Relates
relates to MDEV-12353 Efficient InnoDB redo log record format Closed
relates to MDEV-21724 Optimize page_cur_insert_rec_low() re... Closed

 Description   

After server soft reboot, MySQL Server 10.4.24 cant start
I tried to start with all innodb_force_recovery modes from 1 to 6, one by one. But nothing recovered. In mode 6 mysqld hangs, and mysql -p hangs after password.
Please help to start mysql with some data losses, because data isnt very critical in a part, but critical in overall. And I have no backups because database is very large 1,4Tb

22-07-22 18:12:56 0 [Note] InnoDB: Using Linux native AIO
22-07-22 18:12:56 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
22-07-22 18:12:56 0 [Note] InnoDB: Uses event mutexes
22-07-22 18:12:56 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
22-07-22 18:12:56 0 [Note] InnoDB: Number of pools: 1
22-07-22 18:12:56 0 [Note] InnoDB: Using SSE2 crc32 instructions
22-07-22 18:12:56 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M
22-07-22 18:12:56 0 [Note] InnoDB: Completed initialization of buffer pool
22-07-22 18:12:56 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority()
22-07-22 18:12:56 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=8117163271706
22-07-22 18:12:56 0 [Note] InnoDB: Starting final batch to recover 11069 pages from redo log.
22-07-22 18:12:56 0x7f3319fff700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/page/page0cur.cc line 1154
noDB: We intentionally generate a memory trap.
noDB: Submit a detailed bug report to https://jira.mariadb.org/
noDB: If you get repeated assertion failures or crashes, even
noDB: immediately after the mysqld startup, there may be
noDB: corruption in the InnoDB tablespace. Please refer to
noDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
noDB: about forcing recovery.
0722 18:12:56 [ERROR] mysqld got signal 6 ;
is could be because you hit a bug. It is also possible that this binary
one of the libraries it was linked against is corrupt, improperly built,
misconfigured. This error can also be caused by malfunctioning hardware.

report this bug, see https://mariadb.com/kb/en/reporting-bugs

will try our best to scrape up some info that will hopefully help
agnose the problem, but since we have already crashed,
mething is definitely wrong and this may fail.

rver version: 10.4.25-MariaDB-1:10.4.25+maria~stretch
y_buffer_size=536870912
ad_buffer_size=131072
x_used_connections=0
x_threads=122
read_count=0
is possible that mysqld could use up to
y_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6789305 K bytes of memory
pe that's ok; if not, decrease some variables in the equation.

read pointer: 0x0
tempting backtrace. You can use the following information to find out
ere mysqld died. If you see no messages after this, something went
rribly wrong...
ack_bottom = 0x0 thread_stack 0x80000
dr2line: 'mysqld': Нет такого файла
sqld(my_print_stacktrace+0x2e)[0x56082302ec5e]
sqld(handle_fatal_signal+0x3af)[0x560822a731af]
ib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7f3380eef0e0]
ib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf)[0x7f33802e7fff]
ib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f33802e942a]
dr2line: 'mysqld': Нет такого файла
sqld(+0x59541d)[0x56082278f41d]
sqld(+0xbd1fa3)[0x560822dcbfa3]
sqld(+0xbaf48c)[0x560822da948c]
sqld(+0xbb044f)[0x560822daa44f]
sqld(+0x591220)[0x56082278b220]
sqld(+0xce2848)[0x560822edc848]
sqld(+0xd4ccc1)[0x560822f46cc1]
sqld(+0xc5e158)[0x560822e58158]
ib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f3380ee54a4]
ib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f338039dd0f]
e manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
formation that should help you find out what is causing the crash.
iting a core file...
rking directory at /var/lib/mysql
source Limits:
tal signal 11 while backtracing



 Comments   
Comment by Marko Mäkelä [ 2022-07-22 ]

The assertion fails in page_cur_parse_insert_rec(), which attempts to apply a redo log record that inserts a record into an index page. This code was removed in MDEV-12353 and replaced with something conceptually similar in MDEV-21724.

Without having a reproducible test case, it is very hard to do anything about this. I think that there is a good chance that this type of error is not be possible in 10.5 or a later major release.

Comment by Kirill Gromov [ 2022-07-28 ]

Something changed since my last message.

1. I got luck to run MySQL in 6 recovery mode
2. But now when I tried to run mysqldump --all-databases, i got this message:

2022-07-28 11:42:46 0 [Warning] option 'innodb-purge-threads': unsigned value 0 adjusted to 1

2022-07-28 11:42:46 0 [Note] InnoDB: Started in read only mode

2022-07-28 11:42:46 0 [Note] InnoDB: Using Linux native AIO

2022-07-28 11:42:46 0 [Note] InnoDB: !!! innodb_force_recovery is set to 6 !!!

2022-07-28 11:42:46 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

2022-07-28 11:42:46 0 [Note] InnoDB: Uses event mutexes

2022-07-28 11:42:46 0 [Note] InnoDB: Compressed tables use zlib 1.2.8

2022-07-28 11:42:46 0 [Note] InnoDB: Number of pools: 1

2022-07-28 11:42:46 0 [Note] InnoDB: Using SSE2 crc32 instructions

2022-07-28 11:42:46 0 [Note] InnoDB: Disabling background log and ibuf IO write threads.

2022-07-28 11:42:47 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M

2022-07-28 11:42:47 0 [Note] InnoDB: Completed initialization of buffer pool

2022-07-28 11:42:47 0 [Note] InnoDB: innodb_force_recovery=6 skips redo log apply

2022-07-28 11:42:47 0 [Note] InnoDB: 10.4.25 started; log sequence number 0; transaction id 0

2022-07-28 11:42:47 0 [Note] Plugin 'FEEDBACK' is disabled.

2022-07-28 11:42:47 0 [Note] Server socket created on IP: '0.0.0.0'.

2022-07-28 11:42:47 0 [Warning] 'user' entry 'root@my-server' ignored in --skip-name-resolve mode.

2022-07-28 11:42:47 1 [Note] Event Scheduler: scheduler thread started with id 1

2022-07-28 11:42:47 2 [Warning] InnoDB: Allocated tablespace ID 4 for mysql/gtid_slave_pos, old maximum was 0

2022-07-28 11:42:47 0 [Note] mysqld: ready for connections.

Version: '10.4.25-MariaDB-1:10.4.25+maria~stretch' socket: '/var/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution

2022-07-28 11:42:53 6 [Warning] InnoDB: A transaction id in a record of table `mydb`.`host` is newer than the system-wide maximum.

220728 11:42:53 [ERROR] mysqld got signal 11 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.4.25-MariaDB-1:10.4.25+maria~stretch

key_buffer_size=536870912

read_buffer_size=131072

max_used_connections=3

max_threads=122

thread_count=4

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6789305 K bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f4e280009a8

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7f4e680d3cf8 thread_stack 0x80000

addr2line: 'mysqld': Нет такого файла

mysqld(my_print_stacktrace+0x2e)[0x55e45382ac5e]

mysqld(handle_fatal_signal+0x3af)[0x55e45326f1af]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7f4eab6a40e0]

/lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f4eab69cb40]

addr2line: 'mysqld': Нет такого файла

mysqld(+0xbcd726)[0x55e4535c3726]

mysqld(+0xc643a6)[0x55e45365a3a6]

mysqld(+0xc660cc)[0x55e45365c0cc]

mysqld(+0xc803a0)[0x55e4536763a0]

mysqld(+0xc51eac)[0x55e453647eac]

mysqld(+0xc418cd)[0x55e4536378cd]

mysqld(+0xb54196)[0x55e45354a196]

mysqld(_ZN7handler11ha_rnd_nextEPh+0x12f)[0x55e453273f0f]

mysqld(_Z13rr_sequentialP11READ_RECORD+0x22)[0x55e4533a03b2]

mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x177)[0x55e4530b1e37]

mysqld(_ZN4JOIN10exec_innerEv+0xc11)[0x55e4530d3f21]

mysqld(_ZN4JOIN4execEv+0x33)[0x55e4530d42b3]

mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x18e)[0x55e4530d24ae]

mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0xf6)[0x55e4530d2e26]

mysqld(+0x5827ca)[0x55e452f787ca]

mysqld(_Z21mysql_execute_commandP3THD+0x18d1)[0x55e453076411]

mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x21a)[0x55e45307cbca]

mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1309)[0x55e45307ee99]

mysqld(_Z10do_commandP3THD+0x124)[0x55e4530804c4]

mysqld(_Z24do_handle_one_connectionP7CONNECT+0x242)[0x55e453162a52]

mysqld(handle_one_connection+0x3d)[0x55e453162b0d]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f4eab69a4a4]

/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4eaab52d0f]

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0x7f4e2800fdc0): SELECT /!40001 SQL_NO_CACHE / `id`, `ctx`, `hostname`, `fqdns`, `asset`, `threshold_c`, `threshold_a`, `alert`, `persistence`, `nat`, `rrd_profile`, `descr`, `lat`, `lon`, `icon`, `country`, `external_host`, `permissions`, `av_component`, `created`, `updated` FROM `host`

Connection ID (thread ID): 6

Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains

information that should help you find out what is causing the crash.

Writing a core file...

Working directory at /var/lib/mysql

Resource Limits:

Limit Soft Limit Hard Limit Units

Max cpu time unlimited unlimited seconds

Max file size unlimited unlimited bytes

Max data size unlimited unlimited bytes

Max stack size 8388608 unlimited bytes

Max core file size 0 unlimited bytes

Max resident set unlimited unlimited bytes

Max processes 128532 128532 processes

Max open files 1000000 1000000 files

Max locked memory 65536 65536 bytes

Max address space unlimited unlimited bytes

Max file locks unlimited unlimited locks

Max pending signals 128532 128532 signals

Max msgqueue size 819200 819200 bytes

Max nice priority 0 0

Max realtime priority 0 0

Max realtime timeout unlimited unlimited us

Core pattern: core

When I run SELECT * FROM hosts in MySQL, I got a result with no errors

3. I cant Upgrade to MariaDB 10.6, because MySQL says: "You have to recover database first before Upgrade!"

Maybe if I drop all indexes i got a chance to dump all databases? What do you think?

Comment by Marko Mäkelä [ 2022-07-28 ]

Kirill Gromov, this seems to be your problem:

2022-07-28 11:42:53 6 [Warning] InnoDB: A transaction id in a record of table `mydb`.`host` is newer than the system-wide maximum.

One possible explanation is that the database was originally created and written a long time ago, and some garbage was not correctly ignored when upgrading to 10.3 or 10.4; see MDEV-27800.

Can you try to produce a full stack trace of the crash?

You may be able to make a logical dump of the database (mysqldump) if you use the READ UNCOMMITTED transaction isolation level. If the server can be shut down cleanly, then also --innodb-read-only should work; the server would not start up if recovery is needed (MDEV-11814).

Because the transaction data is corrupted, you cannot trust that the data is entirely consistent.

Comment by Marko Mäkelä [ 2022-07-28 ]

Sorry, I missed one more thing. After innodb_force_recovery=6, often all hope is lost. That option causes the redo log to be ignored completely. Depending on which unwritten changes to data pages the buffer pool contained when the server had been previously killed, you could get wildly inconsistent data, such as B-tree pages pointing to pages that no longer (or not yet) are valid pages. It could also be that some undo log pages were not yet written, but an index page contains a newer DB_TRX_ID, causing the message that I pointed out. Similar trouble can be caused if a logically nonempty log file is deleted. That is why I implemented MDEV-27199.

Comment by Kirill Gromov [ 2022-07-28 ]

Seems you're right with your guesses about garbage from upgrades. But it works near half a year after upgrade with no errors.
I can't produce a full stack trace because I have a stripped version of MariaDB, and reading of manual couldn't clarify me how to enable it.

I tried to run READ-UNCOMMITED and innodb-read-only modes, but mysqldump still doesn't work

Most of the database consists of events that are not of great value. That is most fresh data in the database with most great sequence numbers. So I'll be glad if lost or delete them.

But there are some static data that were entered manually and I really want to save them.
I also want to keep the database operational so that I don't have to reinstall software product from scratch.

So, please advise me to do with my broken database!
Is it possible to achieve my goals at all?

Comment by Kirill Gromov [ 2022-07-28 ]

Maybe there is some way to delete all data with sequence number greater than system-wide maximum?

Comment by Marko Mäkelä [ 2022-07-28 ]

At any other isolation level than READ UNCOMMITTED, I think that those records that carry a newer DB_TRX_ID than the global maximum should be omitted from the output.

If you could attach gdb to the database process before trying to access any tables from SQL, I think that you should get a good stack trace. Can you try that?

sudo gdb -p $(pgrep -x mysqld)

Then, in gdb, execute the following commands:

handle SIGUSR1 nostop noprint
handle SIGUSR2 nostop noprint
handle SIGWAITING nostop noprint
handle SIGLWP nostop noprint
handle SIGPIPE nostop
handle SIGALRM nostop
handle SIGHUP nostop
handle SIGTERM nostop noprint
handle SIG32 nostop noprint
handle SIG33 nostop noprint
continue

Once you have executed the problematic SQL, you should get a SIGSEGV or SIGABRT in GDB. At that point, execute the command backtrace, and finally quit.

The SIGSEGV may occur because of some other inconsistency than the transaction ID. It is also possible that the crash has been fixed in MDEV-13542. If you back up the database, and delete the ib_logfile* from the copy (this is equivalent to using innodb_force_recovery=6), you should be able to start up the upcoming 10.6.9 server on that, and perhaps the SELECT would not crash. Without knowing the exact crash stack trace, I can only make guesses.

Comment by Kirill Gromov [ 2022-07-29 ]

Ok, Marko! I did all that said

But I understand nothing from this:

Results of gdb

[New LWP 21425]
[New LWP 21426]
[New LWP 21427]
[New LWP 24160]
[New LWP 24161]
[New LWP 26044]
[New LWP 5740]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f285367f8bd in poll () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) handle SIGUSR1 nostop noprint
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb) handle SIGUSR2 nostop noprint
Signal Stop Print Pass to program Description
SIGUSR2 No No Yes User defined signal 2
(gdb) handle SIGWAITING nostop noprint
Signal Stop Print Pass to program Description
SIGWAITING No No Yes Process's LWPs are blocked
(gdb) handle SIGLWP nostop noprint
Signal Stop Print Pass to program Description
SIGLWP No No Yes Signal LWP
(gdb) handle SIGPIPE nostop
Signal Stop Print Pass to program Description
SIGPIPE No Yes Yes Broken pipe
(gdb) handle SIGALRM nostop
Signal Stop Print Pass to program Description
SIGALRM No No Yes Alarm clock
(gdb) handle SIGHUP nostop
Signal Stop Print Pass to program Description
SIGHUP No Yes Yes Hangup
(gdb) handle SIGTERM nostop noprint
Signal Stop Print Pass to program Description
SIGTERM No No Yes Terminated
(gdb) handle SIG32 nostop noprint
Signal Stop Print Pass to program Description
SIG32 No No Yes Real-time event 32
(gdb) handle SIG33 nostop noprint
Signal Stop Print Pass to program Description
SIG33 No No Yes Real-time event 33
(gdb) continue
Continuing.

Thread 27 "mysqld" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f27e957e700 (LWP 21427)]
0x00007f28541d2b40 in pthread_mutex_lock ()
from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) backtrace
#0 0x00007f28541d2b40 in pthread_mutex_lock ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00005650bec62726 in ?? ()
#2 0x00005650becf93a6 in ?? ()
#3 0x00005650becfb0cc in ?? ()
#4 0x00005650bed153a0 in ?? ()
#5 0x00005650bece6eac in ?? ()
#6 0x00005650becd68cd in ?? ()
#7 0x00005650bebe9196 in ?? ()
#8 0x00005650be912f0f in handler::ha_rnd_next(unsigned char*) ()
#9 0x00005650bea3f3b2 in rr_sequential(READ_RECORD*) ()
#10 0x00005650be750e37 in sub_select(JOIN*, st_join_table*, bool) ()
#11 0x00005650be772f21 in JOIN::exec_inner() ()
#12 0x00005650be7732b3 in JOIN::exec() ()
#13 0x00005650be7714ae in mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) ()
#14 0x00005650be771e26 in handle_select(THD*, LEX*, select_result*, unsigned long) ()
#15 0x00005650be6177ca in ?? ()
#16 0x00005650be715411 in mysql_execute_command(THD*) ()
#17 0x00005650be71bbca in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) ()
#18 0x00005650be71de99 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) ()
#19 0x00005650be71f4c4 in do_command(THD*) ()
#20 0x00005650be801a52 in do_handle_one_connection(CONNECT*) ()
#21 0x00005650be801b0d in handle_one_connection ()
#22 0x00007f28541d04a4 in start_thread ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#23 0x00007f2853688d0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Comment by Marko Mäkelä [ 2022-07-29 ]

Kirill Gromov, thank you, that is better, but you still need to install the debug symbol package for this version of the server, so that we can see the names of all functions, as well as the function parameter names and values and source code line numbers.

The output would seem to indicate that the segmentation violation occurred inside the thread library function pthread_mutex_lock(), possibly because it was invoked on an invalid pointer. I have no idea what could cause that, without seeing the full call stack.

Comment by Kirill Gromov [ 2022-08-01 ]

Hello, Marko! Thank you for helping with my "headache"
My brain rested enough over the weekend, so this time I installed debug symbol package on the first try

Here the results of debug session:

Thread 19 "mysqld" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fd6c8203700 (LWP 21872)]
0x00007fd6ddf26b40 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) backtrace
#0 0x00007fd6ddf26b40 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00005571b0f7d726 in OSMutex::enter (this=0x10) at ./storage/innobase/include/sync0types.h:440
#2 os_event::reset (this=0x0) at ./storage/innobase/os/os0event.cc:73
#3 os_event_reset (event=0x0) at ./storage/innobase/os/os0event.cc:467
#4 0x00005571b10143a6 in sync_array_reserve_cell (arr=arr@entry=0x5571b38ef9e0, object=object@entry=0x5571b24b5c00 <purge_sys+64>, type=type@entry=4, file=file@entry=0x5571b1454f60 "/home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/trx/trx0rec.cc", line=line@entry=2300) at ./storage/innobase/sync/sync0arr.cc:385
#5 0x00005571b10160cc in sync_array_get_and_reserve_cell (cell=0x7fd6c81fe948, line=2300, file=0x5571b1454f60 "/home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/trx/trx0rec.cc", type=4, object=0x5571b24b5c00 <purge_sys+64>) at ./storage/innobase/include/sync0arr.inl:75
#6 rw_lock_s_lock_spin (lock=lock@entry=0x5571b24b5c00 <purge_sys+64>, pass=pass@entry=0, file_name=file_name@entry=0x5571b1454f60 "/home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/trx/trx0rec.cc", line=line@entry=2300) at ./storage/innobase/sync/sync0rw.cc:333
#7 0x00005571b10303a0 in rw_lock_s_lock_func (lock=<optimized out>, pass=0, file_name=0x5571b1454f60 "/home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/trx/trx0rec.cc", line=2300) at ./storage/innobase/include/sync0rw.inl:285
#8 pfs_rw_lock_s_lock_func (lock=<optimized out>, pass=0, file_name=0x5571b1454f60 "/home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/trx/trx0rec.cc", line=2300) at ./storage/innobase/include/sync0rw.inl:639
#9 trx_undo_get_undo_rec (undo_rec=<synthetic pointer>, name=..., trx_id=<optimized out>, heap=0x7fd64cbe3260, roll_ptr=20266208632179229) at ./storage/innobase/trx/trx0rec.cc:2300
#10 trx_undo_prev_version_build (index_rec=index_rec@entry=0x7fd6926c5362 "\224\032\f\311е\214\254\323nEG\214\317\344e", index_mtr=index_mtr@entry=0x7fd6c81ff8e0, rec=rec@entry=0x7fd6926c5362 "\224\032\f\311е\214\254\323nEG\214\317\344e", index=index@entry=0x7fd6600875f8, offsets=0x7fd6c81feca0, heap=heap@entry=0x7fd64cbe3260, old_vers=0x7fd6c81fead8, v_heap=0x0, vrow=0x0, v_status=0) at ./storage/innobase/trx/trx0rec.cc:2390
#11 0x00005571b1001eac in row_vers_build_for_consistent_read (rec=rec@entry=0x7fd6926c5362 "\224\032\f\311е\214\254\323nEG\214\317\344e", mtr=mtr@entry=0x7fd6c81ff8e0, index=index@entry=0x7fd6600875f8, offsets=offsets@entry=0x7fd6c81fec28, view=view@entry=0x7fd6c9f80158, offset_heap=offset_heap@entry=0x7fd6c81fec20, in_heap=0x7fd64cb5c080, old_vers=0x7fd6c81fef00, vrow=0x0) at ./storage/innobase/row/row0vers.cc:1210
#12 0x00005571b0ff18cd in row_sel_build_prev_vers_for_mysql (mtr=0x7fd6c81ff8e0, vrow=0x0, old_vers=0x7fd6c81fef00, offset_heap=0x7fd6c81fec20, 6c81fec28, rec=0x7fd6926c5362 "\224\032\f\311е\214\254\323nEG\214\317\344e", prebuilt=0x7fd66007ed78, clust_index=0x7fd6600875f8, read_view=0x7fd6c9f80158) at ./storage/innobase/row/row0sel.cc:3236
#13 row_search_mvcc (buf=buf@entry=0x7fd66007b6b0 "\b\376\223\361)\253\235D\277ldQ{[J\022\070\222\344uw%\"\276\021\354\242\a", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7fd66007ed78, match_mode=match_mode@entry=0, direction=direction@entry=1) at ./storage/innobase/row/row0sel.cc:5265
#14 0x00005571b0f04196 in ha_innobase::general_fetch (this=0x7fd660075580, buf=0x7fd66007b6b0 "\b\376\223\361)\253\235D\277ldQ{[J\022\070\222\344uw%\"\276\021\354\242\a", direction=1, match_mode=0) at ./storage/innobase/handler/ha_innodb.cc:9658
#15 0x00005571b0c2df0f in handler::ha_rnd_next (this=0x7fd660075580, buf=0x7fd66007b6b0 "\b\376\223\361)\253\235D\277ldQ{[J\022\070\222\344uw%\"\276\021\354\242\a") at ./sql/handler.cc:2891
#16 0x00005571b0d5a3b2 in rr_sequential (info=0x7fd64c014d90) at ./sql/records.cc:485
#17 0x00005571b0a6be37 in READ_RECORD::read_record (this=0x7fd64c014d90) at ./sql/records.h:70
#18 sub_select (join=0x7fd64c013020, join_tab=0x7fd64c014cc8, end_of_records=<optimized out>) at ./sql/sql_select.cc:20656
#19 0x00005571b0a8df21 in do_select (procedure=<optimized out>, join=0x7fd64c013020) at ./sql/sql_select.cc:20173
#20 JOIN::exec_inner (this=this@entry=0x7fd64c013020) at ./sql/sql_select.cc:4545
#21 0x00005571b0a8e2b3 in JOIN::exec (this=this@entry=0x7fd64c013020) at ./sql/sql_select.cc:4327
#22 0x00005571b0a8c4ae in mysql_select (thd=thd@entry=0x7fd64c0009a8, tables=0x7fd64c012058, wild_num=0, fields=..., conds=<optimized out>, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=551905659648, result=0x7fd64c012ff8, unit=0x7fd64c004718, select_lex=0x7fd64c0100e8) at ./sql/sql_select.cc:4766
#23 0x00005571b0a8ce26 in handle_select (thd=thd@entry=0x7fd64c0009a8, lex=lex@entry=0x7fd64c004658, result=result@entry=0x7fd64c012ff8, setup_tables_done_option=setup_tables_done_option@entry=0) at ./sql/sql_select.cc:448
#24 0x00005571b09327ca in execute_sqlcom_select (thd=thd@entry=0x7fd64c0009a8, all_tables=0x7fd64c012058) at ./sql/sql_parse.cc:6449
#25 0x00005571b0a30411 in mysql_execute_command (thd=thd@entry=0x7fd64c0009a8) at ./sql/sql_parse.cc:3963
#26 0x00005571b0a36bca in mysql_parse (thd=thd@entry=0x7fd64c0009a8, rawbuf=<optimized out>, length=273, parser_state=parser_state@entry=0x7fd6c8202430, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at ./sql/sql_parse.cc:7995
#27 0x00005571b0a38e99 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fd64c0009a8, packet=packet@entry=0x7fd64c007a7d "\032\006MAR023", packet_length=packet_length@entry=273, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at ./sql/sql_parse.cc:1858
#28 0x00005571b0a3a4c4 in do_command (thd=0x7fd64c0009a8) at ./sql/sql_parse.cc:1379
#29 0x00005571b0b1ca52 in do_handle_one_connection (connect=connect@entry=0x5571b560bd38) at ./sql/sql_connect.cc:1420
#30 0x00005571b0b1cb0d in handle_one_connection (arg=0x5571b560bd38) at ./sql/sql_connect.cc:1316
#31 0x00007fd6ddf244a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#32 0x00007fd6dd3dcd0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Comment by Kirill Gromov [ 2022-08-03 ]

Hello, @Marko Mäkelä @marko

What do think about my debug log? Do I have a chance to recover my DB?

Comment by Kirill Gromov [ 2022-08-03 ]

Here the results of debug session:

Thread 19 "mysqld" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fd6c8203700 (LWP 21872)]
0x00007fd6ddf26b40 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) backtrace
#0 0x00007fd6ddf26b40 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00005571b0f7d726 in OSMutex::enter (this=0x10) at ./storage/innobase/include/sync0types.h:440
#2 os_event::reset (this=0x0) at ./storage/innobase/os/os0event.cc:73
#3 os_event_reset (event=0x0) at ./storage/innobase/os/os0event.cc:467
#4 0x00005571b10143a6 in sync_array_reserve_cell (arr=arr@entry=0x5571b38ef9e0, object=object@entry=0x5571b24b5c00 <purge_sys+64>, type=type@entry=4, file=file@entry=0x5571b1454f60 "/home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/trx/trx0rec.cc", line=line@entry=2300) at ./storage/innobase/sync/sync0arr.cc:385
#5 0x00005571b10160cc in sync_array_get_and_reserve_cell (cell=0x7fd6c81fe948, line=2300, file=0x5571b1454f60 "/home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/trx/trx0rec.cc", type=4, object=0x5571b24b5c00 <purge_sys+64>) at ./storage/innobase/include/sync0arr.inl:75
#6 rw_lock_s_lock_spin (lock=lock@entry=0x5571b24b5c00 <purge_sys+64>, pass=pass@entry=0, file_name=file_name@entry=0x5571b1454f60 "/home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/trx/trx0rec.cc", line=line@entry=2300) at ./storage/innobase/sync/sync0rw.cc:333
#7 0x00005571b10303a0 in rw_lock_s_lock_func (lock=<optimized out>, pass=0, file_name=0x5571b1454f60 "/home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/trx/trx0rec.cc", line=2300) at ./storage/innobase/include/sync0rw.inl:285
#8 pfs_rw_lock_s_lock_func (lock=<optimized out>, pass=0, file_name=0x5571b1454f60 "/home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/trx/trx0rec.cc", line=2300) at ./storage/innobase/include/sync0rw.inl:639
#9 trx_undo_get_undo_rec (undo_rec=<synthetic pointer>, name=..., trx_id=<optimized out>, heap=0x7fd64cbe3260, roll_ptr=20266208632179229) at ./storage/innobase/trx/trx0rec.cc:2300
#10 trx_undo_prev_version_build (index_rec=index_rec@entry=0x7fd6926c5362 "\224\032\f\311е\214\254\323nEG\214\317\344e", index_mtr=index_mtr@entry=0x7fd6c81ff8e0, rec=rec@entry=0x7fd6926c5362 "\224\032\f\311е\214\254\323nEG\214\317\344e", index=index@entry=0x7fd6600875f8, offsets=0x7fd6c81feca0, heap=heap@entry=0x7fd64cbe3260, old_vers=0x7fd6c81fead8, v_heap=0x0, vrow=0x0, v_status=0) at ./storage/innobase/trx/trx0rec.cc:2390
#11 0x00005571b1001eac in row_vers_build_for_consistent_read (rec=rec@entry=0x7fd6926c5362 "\224\032\f\311е\214\254\323nEG\214\317\344e", mtr=mtr@entry=0x7fd6c81ff8e0, index=index@entry=0x7fd6600875f8, offsets=offsets@entry=0x7fd6c81fec28, view=view@entry=0x7fd6c9f80158, offset_heap=offset_heap@entry=0x7fd6c81fec20, in_heap=0x7fd64cb5c080, old_vers=0x7fd6c81fef00, vrow=0x0) at ./storage/innobase/row/row0vers.cc:1210
#12 0x00005571b0ff18cd in row_sel_build_prev_vers_for_mysql (mtr=0x7fd6c81ff8e0, vrow=0x0, old_vers=0x7fd6c81fef00, offset_heap=0x7fd6c81fec20, 6c81fec28, rec=0x7fd6926c5362 "\224\032\f\311е\214\254\323nEG\214\317\344e", prebuilt=0x7fd66007ed78, clust_index=0x7fd6600875f8, read_view=0x7fd6c9f80158) at ./storage/innobase/row/row0sel.cc:3236
#13 row_search_mvcc (buf=buf@entry=0x7fd66007b6b0 "\b\376\223\361)\253\235D\277ldQ{[J\022\070\222\344uw%\"\276\021\354\242\a", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7fd66007ed78, match_mode=match_mode@entry=0, direction=direction@entry=1) at ./storage/innobase/row/row0sel.cc:5265
#14 0x00005571b0f04196 in ha_innobase::general_fetch (this=0x7fd660075580, buf=0x7fd66007b6b0 "\b\376\223\361)\253\235D\277ldQ{[J\022\070\222\344uw%\"\276\021\354\242\a", direction=1, match_mode=0) at ./storage/innobase/handler/ha_innodb.cc:9658
#15 0x00005571b0c2df0f in handler::ha_rnd_next (this=0x7fd660075580, buf=0x7fd66007b6b0 "\b\376\223\361)\253\235D\277ldQ{[J\022\070\222\344uw%\"\276\021\354\242\a") at ./sql/handler.cc:2891
#16 0x00005571b0d5a3b2 in rr_sequential (info=0x7fd64c014d90) at ./sql/records.cc:485
#17 0x00005571b0a6be37 in READ_RECORD::read_record (this=0x7fd64c014d90) at ./sql/records.h:70
#18 sub_select (join=0x7fd64c013020, join_tab=0x7fd64c014cc8, end_of_records=<optimized out>) at ./sql/sql_select.cc:20656
#19 0x00005571b0a8df21 in do_select (procedure=<optimized out>, join=0x7fd64c013020) at ./sql/sql_select.cc:20173
#20 JOIN::exec_inner (this=this@entry=0x7fd64c013020) at ./sql/sql_select.cc:4545
#21 0x00005571b0a8e2b3 in JOIN::exec (this=this@entry=0x7fd64c013020) at ./sql/sql_select.cc:4327
#22 0x00005571b0a8c4ae in mysql_select (thd=thd@entry=0x7fd64c0009a8, tables=0x7fd64c012058, wild_num=0, fields=..., conds=<optimized out>, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=551905659648, result=0x7fd64c012ff8, unit=0x7fd64c004718, select_lex=0x7fd64c0100e8) at ./sql/sql_select.cc:4766
#23 0x00005571b0a8ce26 in handle_select (thd=thd@entry=0x7fd64c0009a8, lex=lex@entry=0x7fd64c004658, result=result@entry=0x7fd64c012ff8, setup_tables_done_option=setup_tables_done_option@entry=0) at ./sql/sql_select.cc:448
#24 0x00005571b09327ca in execute_sqlcom_select (thd=thd@entry=0x7fd64c0009a8, all_tables=0x7fd64c012058) at ./sql/sql_parse.cc:6449
#25 0x00005571b0a30411 in mysql_execute_command (thd=thd@entry=0x7fd64c0009a8) at ./sql/sql_parse.cc:3963
#26 0x00005571b0a36bca in mysql_parse (thd=thd@entry=0x7fd64c0009a8, rawbuf=<optimized out>, length=273, parser_state=parser_state@entry=0x7fd6c8202430, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at ./sql/sql_parse.cc:7995
#27 0x00005571b0a38e99 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fd64c0009a8, packet=packet@entry=0x7fd64c007a7d "\032\006MAR023", packet_length=packet_length@entry=273, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at ./sql/sql_parse.cc:1858
#28 0x00005571b0a3a4c4 in do_command (thd=0x7fd64c0009a8) at ./sql/sql_parse.cc:1379
#29 0x00005571b0b1ca52 in do_handle_one_connection (connect=connect@entry=0x5571b560bd38) at ./sql/sql_connect.cc:1420
#30 0x00005571b0b1cb0d in handle_one_connection (arg=0x5571b560bd38) at ./sql/sql_connect.cc:1316
#31 0x00007fd6ddf244a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#32 0x00007fd6dd3dcd0f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Comment by Marko Mäkelä [ 2022-08-03 ]

Kirill Gromov, thank you for the full stack trace.

MariaDB Server 10.6 replaced InnoDB’s earlier homebrew synchronization primitives with plain mutex, condition variable, and a homebrew shared/update/exclusive lock (MDEV-24142).

The sync_array is something that I removed in MDEV-21452. InnoDB used to have some extra bookkeeping of all mutex or rw-lock waits, so that it could implement a watchdog against hangs (the infamous long semaphore wait messages). After MDEV-21452, there only is a watchdog for one specific mutex or rw-lock, dict_sys.mutex which was later merged to dict_sys.latch.

The crash could theoretically occur due to some bug in the sync_array. I am not keen to debug that, especially since we know that the database is corrupted in the first place.

The stack trace also tells me that this crash occurs while constructing an earlier version of a record. With the following commands, you should know the name of the table, which I suppose you may already know:

frame 13
print prebuilt.table.name
print prebuilt.trx.mysql_thd.query_string

I previously suggested that the DB_TRX_ID (identifying the last transaction that modified the record) and DB_ROLL_PTR (pointing to an undo log record of the transaction identified by DB_TRX_ID if it was not committed and purged yet) in the index page may be inconsistent with the undo log page, because one of those pages is missing some updates, for example, if crash recovery was unable to apply redo log because of some corruption of the log.

Your best bet to extract some data from the database (or this table) would be to execute the following before trying to SELECT any data:

SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

If that fails, you could try to start up the database with innodb_force_recovery=5 so that all undo logs will be ignored. The end result is the same: all transactions will be executed at the READ UNCOMMITTED isolation level.

Once you have rescued all data (and verified its consistency), you should discard the database and create a new one from the SQL dump.

Comment by Kirill Gromov [ 2022-08-03 ]

@marko Marko Mäkelä, thank you for your time.

I agree ofcourse that debugging an old version is wasting of time.

So I tried all of you wrote to do. Here the results:

(gdb) frame 13
#13 row_search_mvcc (buf=buf@entry=0x7f546459fef0 "\b\376\223\361)\253\235D\277ldQ{[J\022\070\222\344uw%\"\276\021\354\242\a", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7f54645a2df8, match_mode=match_mode@entry=0, direction=direction@entry=1) at ./storage/innobase/row/row0sel.cc:5265
5265    ./storage/innobase/row/row0sel.cc: Нет такого файла или каталога.
(gdb) print prebuilt.table.name
$1 = {m_name = 0x7f546459ab50 "mydb/host", static part_suffix = "#P#"}
(gdb) print prebuilt.trx.mysql_thd.query_string
$2 = {string = {str = 0x7f5464011ef0 "SELECT /*!40001 SQL_NO_CACHE */ `id`, `ctx`, `hostname`, `fqdns`, `asset`, `threshold_c`, `threshold_a`, `alert`, `persistence`, `nat`, `rrd_profile`, `descr`, `lat`, `lon`, `icon`, `country`, `extern"..., length = 273}, cs = 0x557f894fb220 <my_charset_utf8mb4_general_ci>}

And I can't do SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; because mysqld didn't start in any mode except in innodb_force_recovery=6

So I have a last question

Why mysqldump crashes on mydb.host with simple query (look at $2 in text before) and if I do a simple query from mysql

SELECT * FROM mydb.host

all works without crash and return all of 218 rows of a table?

Maybe something wrong with used CharSet?

Anyway, please tell me how can I dump my tables without using of mysqldump or maybe I can run it with some magic params to avoid crashing.

Comment by Marko Mäkelä [ 2022-08-03 ]

Kirill Gromov, innodb_force_recovery=6 is otherwise the same as innodb_force_recovery=5, but it will additionally ignore the ib_logfile*, which means that whatever changes had not been written back to data pages (only to the log) will be ignored. You could have most pages corresponding to some logical time before the server had been killed, but some pages could be older or newer (and thus you could get broken next-page links or such).

So, if the database starts up with innodb_force_recovery=5, I would suggest using that. Already that setting will hard-wire the isolation level to READ UNCOMMITTED.

The GDB output does not show the entire query. It might be that the query plan involves using a secondary index. In InnoDB secondary index pages, if a record is delete-marked, or the PAGE_MAX_TRX_ID is ‘too new’ for the read view, then the corresponding record in the clustered index (primary key + data) will be looked up. The clustered index page contains only the latest version of the record, and the DB_ROLL_PTR forms a singly-linked list into undo pages. The secondary index page will contain all (secondary_key,primary_key) values for every not-yet-purged version of the primary_key record. So, the secondary index access may involve finding out whether the secondary index record actually belongs to this read view.

The intended purpose of innodb_force_recovery is to allow the database to start up, so that some data can be rescued with mysqldump or similar tools.

A query like SELECT * FROM mydb.host should always be executed as a full table scan: search the path from the clustered index root page to the leftmost leaf page, and then follow the FIL_PAGE_NEXT links until the end of the leaf page level. A similar query plan should also be used if you add LIMIT or OFFSET to the query.

If SELECT * FROM mydb.host crashed after dumping some record (let me assume PRIMARY KEY(id) and id=12345), you could try to dump the rest by adding a condition like WHERE id>12345 or possibly a larger value. This could allow InnoDB to navigate to the next uncorrupted leaf page.

Based on what you wrote, it could be that the secondary index MVCC ‘disobeys’ the innodb_force_recovery=5 setting and tries to access the undo logs even though we did not discover them during startup. This could be the reason of the crash. I tried the following test case on 10.6, but it did not crash that easily:

--source include/have_innodb.inc
create table t1(a int primary key, b int unique)engine=innodb;
insert into t1 values(1,2);
# prevent the purge of history, so that the secondary index will contain 2 record versions after the UPDATE below
--let $restart_parameters=--innodb-force-recovery=2
--source include/restart_mysqld.inc
update t1 set b=3;
--let $restart_parameters=--innodb-force-recovery=5
--source include/restart_mysqld.inc
select b from t1 force index(b);

An InnoDB code fix against that crash would be to refuse to use any secondary index when innodb_force_recovery=5 or 6 has been specified and the undo logs have not been initialized. With such a fix, the force index(b) in my above test should be refused.

Comment by Kirill Gromov [ 2022-08-03 ]

Sorry, @marko Marko but I'm a little confused by your answer.

I wrote that mysqld didn`t start in mode 5 but only in mode 6
And another thing that I can't understand is: mysqldump crashes on near row 152, but when I do SELECT - everything is OK, and I got all of 218 rows

Now I started mysqldump with key --ignore-table=mydb.host and it still work, and dumping all my databases.

It seems like my adventure going to the finish

Also I have successfully done SELECT into OUTFILE FROM mydb.host and only thing that worry me is how to recreate the structure of table host in new database. Please point me to the page of documentation, where I can learn about it

After that I thing this topic can be closed!

Comment by Marko Mäkelä [ 2023-02-17 ]

It should be possible to view the schema of a .frm file. One way would be to copy the file to the data directory of a working server and then execute SHOW CREATE TABLE for that name.

The sync_array that seems to be the immediate reason of this crash was removed in MDEV-21452. MariaDB Server 10.6 or later should not crash on many forms of corruption, thanks to MDEV-13542. It would be interesting to know if a recent MariaDB 10.6 would crash on your corrupted data directory, but I understand that you may already have deleted and rebuilt it.

Generated at Thu Feb 08 10:06:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.