[MDEV-21098] Crash in rec_get_offsets_func() due to invalid rec_get_status() Created: 2019-11-20  Updated: 2023-04-14  Resolved: 2022-08-01

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Subquery, Storage Engine - InnoDB
Affects Version/s: 10.3.11
Fix Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2, 10.10.1

Type: Bug
Priority: Critical
Reporter: Nigel Gomm
Assignee: Marko Mäkelä
Resolution: Fixed
Votes: 1
Labels: None
Environment: linux

Attachments: crashdump.txt, mariaerror.txt
Issue Links:
Blocks
blocks MDEV-28349 Provide "crash safe" options for CHEC... Open
is blocked by MDEV-13542 Crashing on a corrupted page is unhel... Closed
Duplicate
is duplicated by MDEV-22736 Assertion failure in file rem0rec.cc ... Closed
is duplicated by MDEV-29451 server crash on complex select (Asser... Closed
is duplicated by MDEV-30316 mariadb server crash Closed

 Description   

where to begin.....

Over the last 24 hours i've had 3 (out of 5 total) mariadb production servers crash with similar errors (see attachment).

Each server has about 30 customer databases (identical structure) that customers connect to from an identical win32 desktop application using ODBC.

INNODB tables.

Not aware of any changes to the servers or the databases or the win32.exe.

On each server there has been, since yesterday, one database where if i run a couple of specific but seemingly innocuous queries the server crashes and restarts.

The error mentions an index.... so in each of the queries i (eventually) removed the indexes used by a subquery. and hey presto no crash. slow but no crash.

I must emphasise that i only removed the indexes from the one database on each server that was mentioned in the crash report. The other databases with the same structure on the same servers continue to work just fine with the same queries and the same indexes. And the databases that are causing a crash have had the same structure, running the same queries, for months and months.

the databases causing the error are no larger than any of the others (a few thousand records in the deal and invoices tables).

i removed the indexes on invoices.dealid, invoices.who, invoices.term, invoices.invoice and deal.dealid

.dealid, .invoice and .term are int(11); .who is varchar(10)

the other query that causes a crash is an update and uses invoices.invoice in a subquery

again, to emphasise, this query has been run a thousand times a day on 200 or more databases for more than a year with no problem... until yesterday on 3 databases



 Comments   
Comment by Nigel Gomm [ 2019-11-20 ]

one server also giving this
2019-11-20 10:36:45 84 [Warning] InnoDB: Table mysql/innodb_table_stats has length mismatch in the column name table_name. Please run mysql_upgrade
2019-11-20 10:36:45 84 [Warning] InnoDB: Table mysql/innodb_index_stats has length mismatch in the column name table_name. Please run mysql_upgrade

Comment by Nigel Gomm [ 2019-11-20 ]

today (but not yesterday) CHECK TABLE is reporting index corruption.
the indexes in question were dropped and recreated yesterday on at least one of the tables

Comment by Elena Stepanova [ 2019-11-25 ]

From the log:

2019-11-19 09:13:55 0x7fd739d58700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.3.11/storage/innobase/rem/rem0rec.cc line 820
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/
InnoDB: about forcing recovery.
191119  9:13:55 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.3.11-MariaDB-1:10.3.11+maria~cosmic-log
key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=60
max_threads=502
thread_count=67
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3226530 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7fd7242a8df8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fd739d57dd8 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5606afee0c7e]
/usr/sbin/mysqld(handle_fatal_signal+0x505)[0x5606afa05575]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12dd0)[0x7fd797324dd0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7fd796e4f077]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7fd796e30535]
/usr/sbin/mysqld(+0x4c0aec)[0x5606af742aec]
/usr/sbin/mysqld(+0x4aa3bb)[0x5606af72c3bb]
/usr/sbin/mysqld(+0x9677c2)[0x5606afbe97c2]
/usr/sbin/mysqld(+0xa41ea1)[0x5606afcc3ea1]
/usr/sbin/mysqld(+0x9cc5de)[0x5606afc4e5de]
/usr/sbin/mysqld(+0x9033f1)[0x5606afb853f1]
/usr/sbin/mysqld(_ZN7handler17ha_index_read_mapEPhPKhm16ha_rkey_function+0x178)[0x5606afa0a858]
/usr/sbin/mysqld(+0x5f5058)[0x5606af877058]
/usr/sbin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x18e)[0x5606af86697e]
/usr/sbin/mysqld(_ZN4JOIN10exec_innerEv+0xb02)[0x5606af888be2]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x33)[0x5606af888e23]
/usr/sbin/mysqld(_ZN30subselect_single_select_engine4execEv+0x145)[0x5606afa9ca75]
/usr/sbin/mysqld(_ZN14Item_subselect4execEv+0x4d)[0x5606afa9c35d]
/usr/sbin/mysqld(_ZN24Item_singlerow_subselect11val_decimalEP10my_decimal+0x36)[0x5606afa9da36]
/usr/sbin/mysqld(_ZN18Item_cache_decimal11cache_valueEv+0x37)[0x5606afa15227]
/usr/sbin/mysqld(_ZN18Item_cache_wrapper7val_strEP6String+0x7f)[0x5606afa2a40f]
/usr/sbin/mysqld(_ZNK12Type_handler13Item_send_strEP4ItemP8ProtocolP8st_value+0x1c)[0x5606af96a99c]
/usr/sbin/mysqld(_ZN8Protocol19send_result_set_rowEP4ListI4ItemE+0xa3)[0x5606af78ba93]
/usr/sbin/mysqld(_ZN11select_send9send_dataER4ListI4ItemE+0x5b)[0x5606af7f019b]
/usr/sbin/mysqld(+0x5f6be5)[0x5606af878be5]
/usr/sbin/mysqld(+0x5da446)[0x5606af85c446]
/usr/sbin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x1a7)[0x5606af866997]
/usr/sbin/mysqld(+0x5da446)[0x5606af85c446]
/usr/sbin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x20f)[0x5606af8669ff]
/usr/sbin/mysqld(_ZN4JOIN10exec_innerEv+0xb02)[0x5606af888be2]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x33)[0x5606af888e23]
/usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xef)[0x5606af888f6f]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x14d)[0x5606af88986d]
/usr/sbin/mysqld(+0x5a61fc)[0x5606af8281fc]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x662d)[0x5606af83524d]
/usr/sbin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x15)[0x5606af7a64b5]
/usr/sbin/mysqld(_ZN13sp_lex_keeper23reset_lex_and_exec_coreEP3THDPjbP8sp_instr+0x90)[0x5606af7ad030]
/usr/sbin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x647)[0x5606af7adae7]
/usr/sbin/mysqld(_ZN7sp_head7executeEP3THDb+0x7fc)[0x5606af7a93cc]
/usr/sbin/mysqld(_ZN7sp_head17execute_procedureEP3THDP4ListI4ItemE+0x973)[0x5606af7aa7e3]
/usr/sbin/mysqld(+0x5a5fe9)[0x5606af827fe9]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x425a)[0x5606af832e7a]
/usr/sbin/mysqld(_ZN18Prepared_statement7executeEP6Stringb+0x41e)[0x5606af84b05e]
/usr/sbin/mysqld(_ZN18Prepared_statement12execute_loopEP6StringbPhS2_+0x9a)[0x5606af84b24a]
/usr/sbin/mysqld(+0x5ca24d)[0x5606af84c24d]
/usr/sbin/mysqld(_Z19mysqld_stmt_executeP3THDPcj+0x25)[0x5606af84c2e5]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1a92)[0x5606af839c02]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x170)[0x5606af83ac80]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x242)[0x5606af90eeb2]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x5606af90f05d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8164)[0x7fd79731a164]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fd796f28def]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fd7247d9dc0): SELECT rental.propref, deal.dealnum AS deal,rental.displayaddress as address,deal.dealid,   (SELECT SUM(TERM) FROM invoices WHERE dealid = deal.dealid AND who != 'Tenant' and term > 0 ) AS paidsofar,deal.TERM,   (SELECT SUM(TERMdays) FROM invoices WHERE dealid = deal.dealid AND who != 'Tenant'  ) AS termdays,   deal.startdate AS duedate,rental.building AS building,deal.account AS invoice,deal.startdate,   deal.dealnames AS tenant,deal.dealid-deal.dealid as offerid,enddate,actualend,deal.FLWONNOTREQ   FROM deal   INNER JOIN rental ON rental.propref = deal.propref   WHERE  deal.cancelled = 0     AND deal.shortlet = 0    AND deal.commrate > 0 and account >= 0
Connection ID (thread ID): 99
Status: NOT_KILLED
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on

Comment by Nigel Gomm [ 2019-12-04 ]

problem went away when i exported each table, dropped them and imported back in.

So some sort of data corruption that corrupted the indexes?

Odd that the same 3 tables on 3 different databases (out of 300 or so identically structured databases) got the same error on the same day.

oh well... customers are now happy so i will stop worrying about it unless it reoccurs.

Comment by Marko Mäkelä [ 2020-02-25 ]

The assertion failed in rec_get_offsets_func(), because rec_get_status() returned an invalid bit combination on a record header. This report is basically duplicating MDEV-13542.
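
To illustrate what an "invalid bit combination" means here, the following is a minimal standalone sketch, not the actual InnoDB source. It assumes that the record status occupies the low three bits of one header byte and that the values 0 through 4 are the defined ones in 10.3, so 5 through 7 can only come from a damaged header. The names rec_status_model, status_bits() and status_is_valid() are hypothetical.

```cpp
#include <cassert>  // assert() for callers that want a debug-only check
#include <cstdint>

// Simplified model of the record status field. Illustrative names only;
// the numeric values are an assumption about the 10.3 record format.
enum rec_status_model : uint8_t {
  STATUS_ORDINARY = 0,      // ordinary user record
  STATUS_NODE_PTR = 1,      // node pointer record on a non-leaf page
  STATUS_INFIMUM = 2,       // page infimum pseudo-record
  STATUS_SUPREMUM = 3,      // page supremum pseudo-record
  STATUS_COLUMNS_ADDED = 4  // record written after instant ADD COLUMN
};

// Extract the 3-bit status from a header byte (assumed layout).
inline uint8_t status_bits(uint8_t header_byte) {
  return header_byte & 0x7;
}

// True if the status is one of the defined values. The remaining
// encodings (5, 6, 7) are treated as corruption.
inline bool status_is_valid(uint8_t header_byte) {
  return status_bits(header_byte) <= STATUS_COLUMNS_ADDED;
}
```

A caller could test status_is_valid() and report corruption instead of reaching the assertion; whether the real code can afford such a check in every caller is the performance question discussed in the later comments.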

For a while, I was wondering whether MDEV-19916 could explain this, but I do not think so. If you are ever going to execute ALTER TABLE…ADD COLUMN, it would be good to upgrade in any case.

Comment by Marko Mäkelä [ 2022-06-07 ]

In MDEV-13542, many crashes were fixed, but mostly at a higher level, preventing access when some page header fields are incorrect. I did not make rec_get_offsets_func() more robust against crashes yet. So, this bug deserves to remain open.

Comment by Marko Mäkelä [ 2022-06-08 ]

It is not easy to fix this. It may be necessary to implement record header validation in specific places that are (since MDEV-13542) able to return an error. If we added some error return to rec_get_offsets_func() and checks for it in each caller, the code bloat could cause a performance regression.

Comment by Marko Mäkelä [ 2022-06-09 ]

A possible fix of this could be to introduce some record header validation to the following functions (and make the functions return some special value such as nullptr on corruption, and adjust their callers):

  • btr_search_check_guess()
  • page_rec_get_next_low(), page_rec_get_next_const(), page_rec_get_next()
  • page_dir_slot_get_rec(); also the pointer should be validated to be between the infimum and the PAGE_HEAP_TOP

These measures should guarantee that rec_get_offsets_func() is never invoked on a corrupted record header. For added safety against crashes, the assertion could be changed to one that is only present in debug builds.
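
The idea of returning nullptr for an out-of-range record pointer can be sketched as follows. This is a simplified standalone model under stated assumptions, not the fix that was committed: page_model, checked_rec() and visit_rec() are hypothetical names, and the page is reduced to a byte buffer with an infimum offset and a heap-top offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified model of an index page. Illustrative only; the real page
// header has many more fields.
struct page_model {
  const uint8_t* frame;  // start of the page in memory
  size_t size;           // page size in bytes
  size_t infimum;        // offset of the infimum pseudo-record
  size_t heap_top;       // offset of the first unused byte (PAGE_HEAP_TOP)
};

// Return a pointer to the record at `offset`, or nullptr if the offset
// lies outside [infimum, heap_top) or the page fields are inconsistent.
inline const uint8_t* checked_rec(const page_model& page, size_t offset) {
  if (page.heap_top > page.size || page.infimum >= page.heap_top) {
    return nullptr;  // the page header itself looks corrupted
  }
  if (offset < page.infimum || offset >= page.heap_top) {
    return nullptr;  // the record pointer points outside the record heap
  }
  return page.frame + offset;
}

// Stand-in for a caller: it reports corruption instead of crashing, and
// keeps only a debug-build assertion as an extra safety net.
inline bool visit_rec(const page_model& page, size_t offset) {
  const uint8_t* rec = checked_rec(page, offset);
  if (rec == nullptr) {
    return false;  // the caller would return a corruption error here
  }
  assert(rec >= page.frame + page.infimum);  // compiled out with NDEBUG
  return true;
}
```

With a 256-byte buffer, infimum at 99 and heap top at 200, checked_rec() accepts offset 120 and rejects 50 and 200. The range check costs two comparisons per lookup, which is why the comment above limits it to a few functions rather than every caller of rec_get_offsets_func().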
