[MDEV-25163] Rowid filter does not process storage engine error correctly. - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.5.8, 10.6.10, 10.6.12, 10.5.21, 10.5.22
Fix Version/s: 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3
Component/s: Storage Engine - InnoDB
Labels:
None

Description

We had this crash on production database, but we cant see reason for this.
It was not possible to restart the server again and got error message (see picture below).

We managed to run it again by setting innodb_force_recovery=1. The server is running fine, but not knowing what caused this makes me worry

Mar 15 15:59:54 mysql02-server mariadbd: 2021-03-15 15:59:54 3535477 [Warning] Sort aborted, host: web06, user: dbuser, thread: 3535477,

query: SELECT * FROM `******` WHERE id = 123 ORDER BY id ASC, id ASC LIMIT 1 FOR UPDATE

Mar 15 15:59:54 mysql02-server mariadbd: 2021-03-15 15:59:54 0x7fc49e404700

 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.8/storage/innobase/trx/trx0trx.cc line 1336

Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0

Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: We intentionally generate a memory trap.

Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: If you get repeated assertion failures or crashes, even

Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: immediately after the mysqld startup, there may be

Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: corruption in the InnoDB tablespace. Please refer to

Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/

Mar 15 15:59:54 mysql02-server mariadbd: InnoDB: about forcing recovery.

Mar 15 15:59:54 mysql02-server mariadbd: 210315 15:59:54 [ERROR] mysqld got signal 6 ;

Mar 15 15:59:54 mysql02-server mariadbd: This could be because you hit a bug. It is also possible that this binary

Mar 15 15:59:54 mysql02-server mariadbd: or one of the libraries it was linked against is corrupt, improperly built,

Mar 15 15:59:54 mysql02-server mariadbd: or misconfigured. This error can also be caused by malfunctioning hardware.

Mar 15 15:59:54 mysql02-server mariadbd: To report this bug, see https://mariadb.com/kb/en/reporting-bugs

Mar 15 15:59:54 mysql02-server mariadbd: We will try our best to scrape up some info that will hopefully help

Mar 15 15:59:54 mysql02-server mariadbd: diagnose the problem, but since we have already crashed,

Mar 15 15:59:54 mysql02-server mariadbd: something is definitely wrong and this may fail.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

gdb_output.txt
135 kB
2021-06-18 06:24
image-2021-03-16-13-47-55-914.png
278 kB
2021-03-16 13:47
mariadb_log.txt
20 kB
2021-06-18 06:26
screenshot-1.png
6 kB
2021-06-17 13:48

Issue Links

blocks

MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0 causing disruption and replication failure

Closed

duplicates

MDEV-24322 Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_P REFIX/mariadb-10.5.6/storage/innobase/trx/trx0trx.cc line 1329

Closed

is duplicated by

MDEV-29478 Database Crashed with no reason

Closed

MDEV-29536 InnoDB: Assertion failure

Closed

MDEV-31780 InnoDB: Assertion failure in file D:\winx64-packages\build\src\storage\innobase\trx\trx0trx.cc line 1252

Closed

relates to

MDEV-25296 Failing assertion: UT_LIST_GET_LEN(trx->lock.trx_locks) == 0 sporadically on SELECT with certain tables

Closed

MDEV-25594 Crash in deadlock checker under high load

Closed

MDEV-27031 InnoDB: Assertion failure in file D:\winx64-packages\build\src\storage\innobase\trx\trx0trx.cc line 1288

Closed

MDEV-31941 Assertion failures upon using OQGraph table under lock

Confirmed

MDEV-32201 join_init_read_record() wrongly issues HA_ERR_OUT_OF_MEM

Confirmed

MDEV-32448 MariaDB 10.6.11 - InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/trx/trx0trx.cc line 1270

Closed

(6 relates to)

Activity

Ascending order - Click to sort in descending order

View 35 older comments

Sergei Petrunia added a comment - 2023-09-19 10:07

Looking at the patch...

This part:

+        /* It's supposed that thd error state will be set(with

+        file->print_error(res, MYF(0)) call) inside of quick->get_next() call,

+        that's why there is no need to do this here. See

+        Mrr_simple_index_reader::get_next() code for example. */

This is true... But then the code in rr_quick is [sometimes] redundant ...
Debugging a query:

select * from t1 where a between 10 and 20 for update;

  Thread 32 "mysqld" hit Breakpoint 5, my_message_sql (error=1205, str=0x7fffcaffbf70 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0)

(gdb) wher

  #0  my_message_sql (error=1205, str=0x7fffcaffbf70 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /home/psergey/dev-git2/10.4-l

  #1  0x00005555569af452 in my_error (nr=1205, MyFlags=0) at /home/psergey/dev-git2/10.4-look10/mysys/my_error.c:125

  #2  0x00005555560e9156 in handler::print_error (this=0x7fff70037a48, error=146, errflag=0) at /home/psergey/dev-git2/10.4-look10/sql/handler.cc:4093

  #3  0x0000555555f828e4 in Mrr_simple_index_reader::get_next (this=0x7fff70038030, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/sql/m

  #4  0x0000555555f85734 in DsMrr_impl::dsmrr_next (this=0x7fff70037ee0, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/sql/multi_range_

  #5  0x00005555564db79a in ha_innobase::multi_range_read_next (this=0x7fff70037a48, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/stor

  #6  0x000055555627e322 in QUICK_RANGE_SELECT::get_next (this=0x7fff70082f90) at /home/psergey/dev-git2/10.4-look10/sql/opt_range.cc:12345

  #7  0x00005555562920e0 in rr_quick (info=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:369

  #8  0x0000555555ce5eb9 in READ_RECORD::read_record (this=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.h:70

  #9  0x0000555555e1b878 in sub_select (join=0x7fff70017288, join_tab=0x7fff70019188, end_of_records=false) at /home/psergey/dev-git2/10.4-look10/sql/sql_

  #10 0x0000555555e1aace in do_select (join=0x7fff70017288, procedure=0x0) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:20425

  #11 0x0000555555deee1f in JOIN::exec_inner (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4605

  #12 0x0000555555dedf36 in JOIN::exec (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4387

Then, again:

  Thread 32 "mysqld" hit Breakpoint 5, my_message_sql (error=1205, str=0x7fffcaffc040 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0)

(gdb) wher

  #0  my_message_sql (error=1205, str=0x7fffcaffc040 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /home/psergey/dev-git2/10.4-l

  #1  0x00005555569af452 in my_error (nr=1205, MyFlags=0) at /home/psergey/dev-git2/10.4-look10/mysys/my_error.c:125

  #2  0x00005555560e9156 in handler::print_error (this=0x7fff70037a48, error=146, errflag=0) at /home/psergey/dev-git2/10.4-look10/sql/handler.cc:4093

  #3  0x000055555629209d in rr_handle_error (info=0x7fff70019250, error=146) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:356

  #4  0x00005555562920ff in rr_quick (info=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:371

  #5  0x0000555555ce5eb9 in READ_RECORD::read_record (this=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.h:70

  #6  0x0000555555e1b878 in sub_select (join=0x7fff70017288, join_tab=0x7fff70019188, end_of_records=false) at /home/psergey/dev-git2/10.4-look10/sql/sql_

  #7  0x0000555555e1aace in do_select (join=0x7fff70017288, procedure=0x0) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:20425

  #8  0x0000555555deee1f in JOIN::exec_inner (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4605

  #9  0x0000555555dedf36 in JOIN::exec (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4387

So, the error is reported twice... but this is outside of the scope of this patch...

Sergei Petrunia added a comment - 2023-09-19 10:07 Looking at the patch... This part: + /* It's supposed that thd error state will be set(with + file->print_error(res, MYF(0)) call) inside of quick->get_next() call, + that's why there is no need to do this here. See + Mrr_simple_index_reader::get_next() code for example. */ This is true... But then the code in rr_quick is [sometimes] redundant ... Debugging a query: select * from t1 where a between 10 and 20 for update; Thread 32 "mysqld" hit Breakpoint 5, my_message_sql (error=1205, str=0x7fffcaffbf70 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) (gdb) wher #0 my_message_sql (error=1205, str=0x7fffcaffbf70 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /home/psergey/dev-git2/10.4-l #1 0x00005555569af452 in my_error (nr=1205, MyFlags=0) at /home/psergey/dev-git2/10.4-look10/mysys/my_error.c:125 #2 0x00005555560e9156 in handler::print_error (this=0x7fff70037a48, error=146, errflag=0) at /home/psergey/dev-git2/10.4-look10/sql/handler.cc:4093 #3 0x0000555555f828e4 in Mrr_simple_index_reader::get_next (this=0x7fff70038030, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/sql/m #4 0x0000555555f85734 in DsMrr_impl::dsmrr_next (this=0x7fff70037ee0, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/sql/multi_range_ #5 0x00005555564db79a in ha_innobase::multi_range_read_next (this=0x7fff70037a48, range_info=0x7fffcaffd1a8) at /home/psergey/dev-git2/10.4-look10/stor #6 0x000055555627e322 in QUICK_RANGE_SELECT::get_next (this=0x7fff70082f90) at /home/psergey/dev-git2/10.4-look10/sql/opt_range.cc:12345 #7 0x00005555562920e0 in rr_quick (info=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:369 #8 0x0000555555ce5eb9 in READ_RECORD::read_record (this=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.h:70 #9 0x0000555555e1b878 in sub_select (join=0x7fff70017288, join_tab=0x7fff70019188, end_of_records=false) at /home/psergey/dev-git2/10.4-look10/sql/sql_ #10 0x0000555555e1aace in do_select (join=0x7fff70017288, procedure=0x0) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:20425 #11 0x0000555555deee1f in JOIN::exec_inner (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4605 #12 0x0000555555dedf36 in JOIN::exec (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4387 Then, again: Thread 32 "mysqld" hit Breakpoint 5, my_message_sql (error=1205, str=0x7fffcaffc040 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) (gdb) wher #0 my_message_sql (error=1205, str=0x7fffcaffc040 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /home/psergey/dev-git2/10.4-l #1 0x00005555569af452 in my_error (nr=1205, MyFlags=0) at /home/psergey/dev-git2/10.4-look10/mysys/my_error.c:125 #2 0x00005555560e9156 in handler::print_error (this=0x7fff70037a48, error=146, errflag=0) at /home/psergey/dev-git2/10.4-look10/sql/handler.cc:4093 #3 0x000055555629209d in rr_handle_error (info=0x7fff70019250, error=146) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:356 #4 0x00005555562920ff in rr_quick (info=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.cc:371 #5 0x0000555555ce5eb9 in READ_RECORD::read_record (this=0x7fff70019250) at /home/psergey/dev-git2/10.4-look10/sql/records.h:70 #6 0x0000555555e1b878 in sub_select (join=0x7fff70017288, join_tab=0x7fff70019188, end_of_records=false) at /home/psergey/dev-git2/10.4-look10/sql/sql_ #7 0x0000555555e1aace in do_select (join=0x7fff70017288, procedure=0x0) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:20425 #8 0x0000555555deee1f in JOIN::exec_inner (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4605 #9 0x0000555555dedf36 in JOIN::exec (this=0x7fff70017288) at /home/psergey/dev-git2/10.4-look10/sql/sql_select.cc:4387 So, the error is reported twice... but this is outside of the scope of this patch...

Sergei Petrunia added a comment - 2023-09-19 10:14

+        /* It's supposed that thd error state will be set(with

+        file->print_error(res, MYF(0)) call) inside of quick->get_next() call,

+        that's why there is no need to do this here. See

+        Mrr_simple_index_reader::get_next() code for example. */

It's not "will be set" it is "has been set" at this point. get_next() call is dozen lines above. Please change the wording.

Please also add

  DBUG_ASSERT(thd->is_error());

to make sure the error state has been set.

Also this gives a clue how to add test coverage for the changed code in sql_join_cache.cc :

https://gist.github.com/spetrunia/d2e78780184e09842f4299a87531d848

Sergei Petrunia added a comment - 2023-09-19 10:14 + /* It's supposed that thd error state will be set(with + file->print_error(res, MYF(0)) call) inside of quick->get_next() call, + that's why there is no need to do this here. See + Mrr_simple_index_reader::get_next() code for example. */ It's not "will be set" it is "has been set" at this point. get_next() call is dozen lines above. Please change the wording. Please also add DBUG_ASSERT(thd->is_error()); to make sure the error state has been set. Also this gives a clue how to add test coverage for the changed code in sql_join_cache.cc : https://gist.github.com/spetrunia/d2e78780184e09842f4299a87531d848

Sergei Petrunia added a comment - 2023-09-19 19:06

Also please apply this coding style fix: https://gist.github.com/spetrunia/7edad4c40927c1d804aa13ec96aecd60

Sergei Petrunia added a comment - 2023-09-19 19:06 Also please apply this coding style fix: https://gist.github.com/spetrunia/7edad4c40927c1d804aa13ec96aecd60

Sergei Petrunia added a comment - 2023-09-25 12:08

More review input provided on Slack. Ok to push when addressed.

Sergei Petrunia added a comment - 2023-09-25 12:08 More review input provided on Slack. Ok to push when addressed.

Sergei Petrunia added a comment - 2023-11-22 09:30

Summary: a query that's doing a locking read and using Rowid Filter could cause a server crash if it has encountered a Lock Wait Timeout or Deadlock opr a similar error when building the Rowid Filter.

Sergei Petrunia added a comment - 2023-11-22 09:30 Summary: a query that's doing a locking read and using Rowid Filter could cause a server crash if it has encountered a Lock Wait Timeout or Deadlock opr a similar error when building the Rowid Filter.

MariaDB Server

Rowid filter does not process storage engine error correctly.

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration