[MDEV-19408] Assertion `trx->state == TRX_STATE_ACTIVE || trx->state == TRX_STATE_PREPARED' failed in ReadView::copy_trx_ids, innodb.xa_debug, innodb.innodb fail in buildbot Created: 2019-05-07  Updated: 2019-05-08  Resolved: 2019-05-08

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 10.2.24
Fix Version/s: 10.2.24

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-15772 Potential list overrun during XA reco... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-asan/builds/1149

10.2 ce195987c

2019-05-03 16:15:41 140540140169472 [Note] InnoDB: Transaction contains changes to 1 rows
2019-05-03 16:15:41 140540140169472 [Note] InnoDB: 8 transactions in prepared state after recovery
2019-05-03 16:15:41 140540140169472 [Note] Found 8 prepared transaction(s) in InnoDB
mysqld: /home/buildbot/buildbot/build/mariadb-10.2.24/storage/innobase/read/read0read.cc:433: void ReadView::copy_trx_ids(const trx_ids_t&): Assertion `trx->state == TRX_STATE_ACTIVE || trx->state == TRX_STATE_PREPARED' failed.
190503 16:15:41 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.24-MariaDB-debug-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=5
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63102 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x62a00002a208
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fd1f866ebe0 thread_stack 0x5b000
/usr/lib/x86_64-linux-gnu/libasan.so.2(+0x4a077)[0x7fd20c173077]
/home/buildbot/buildbot/build/mariadb-10.2.24/sql/mysqld(my_print_stacktrace+0xab)[0x56315a4b4133]
mysys/stacktrace.c:267(my_print_stacktrace)[0x5631594af3dc]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fd20a9ed390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7fd209da7428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7fd209da902a]
/lib/x86_64-linux-gnu/libc.so.6(+0x2dbd7)[0x7fd209d9fbd7]
/lib/x86_64-linux-gnu/libc.so.6(+0x2dc82)[0x7fd209d9fc82]
read/read0read.cc:428(ReadView::copy_trx_ids(std::vector<unsigned long, ut_allocator<unsigned long, true> > const&))[0x563159df9c99]
read/read0read.cc:454(ReadView::prepare(unsigned long))[0x563159df9ed9]
read/read0read.cc:695(MVCC::clone_oldest_view(ReadView*))[0x563159df8b80]
trx/trx0purge.cc:1695(trx_purge(unsigned long, unsigned long, bool))[0x563159fa0197]
srv/srv0srv.cc:2708(srv_do_purge(unsigned long*))[0x563159f4be90]
srv/srv0srv.cc:2850(srv_purge_coordinator_thread)[0x563159f4cc19]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fd20a9e36ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fd209e7882d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): 
Connection ID (thread ID): 2
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /dev/shm/var/4/mysqld.1/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    0                    bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             23715                23715                processes 
Max open files            1024                 1024                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       23715                23715                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        



 Comments   
Comment by Elena Stepanova [ 2019-05-07 ]

Also innodb.innodb:
http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/17062

10.2 b30fb701

mysqltest_embedded: /home/buildbot/buildbot/build/mariadb-10.2.24/storage/innobase/read/read0read.cc:433: void ReadView::copy_trx_ids(const trx_ids_t&): Assertion `trx->state == TRX_STATE_ACTIVE || trx->state == TRX_STATE_PREPARED' failed. mysqltest got signal 6 read_command_buf (0x83310958): COMM conn->name (0x835a0bf0): conn->cur_query (0x837d84b0): COMMIT Attempting backtrace... stack_bottom = 0x0 thread_stack 0x49000 /mnt/buildbot/build/mariadb-10.2.24/libmysqld/examples/mysqltest_embedded(my_print_stacktrace+0x33)[0x80355816] mysys/stacktrace.c:267(my_print_stacktrace)[0x8033851d] client/mysqltest.cc:9054(dump_backtrace())[0x8033855b] addr2line: '': No such file [0xb7753c14] [0xb7753c31] /lib/i386-linux-gnu/libc.so.6(gsignal+0x39)[0xb70bfe89] /lib/i386-linux-gnu/libc.so.6(abort+0x157)[0xb70c13e7] /lib/i386-linux-gnu/libc.so.6(+0x24d07)[0xb70b8d07] /lib/i386-linux-gnu/libc.so.6(+0x24d8b)[0xb70b8d8b] /mnt/buildbot/build/mariadb-10.2.24/libmysqld/examples/mysqltest_embedded(+0xc16d98)[0x80ca3d98] read/read0read.cc:428(ReadView::copy_trx_ids(std::vector<unsigned long long, ut_allocator<unsigned long long, true> > const&))[0x80ca3e86] read/read0read.cc:454(ReadView::prepare(unsigned long long))[0x80ca3467] read/read0read.cc:693(MVCC::clone_oldest_view(ReadView*))[0x8099f2ce] trx/trx0purge.cc:1694(trx_purge(unsigned int, unsigned int, bool))[0x80bcb1c4] srv/srv0srv.cc:2688(srv_do_purge(unsigned int*))[0x80bcb6ef] /lib/i386-linux-gnu/libpthread.so.0(+0x62b5)[0xb772a2b5] /lib/i386-linux-gnu/libc.so.6(clone+0x6e)[0xb717b16e]

Comment by Marko Mäkelä [ 2019-05-08 ]

MDEV-15772 introduced the state TRX_STATE_PREPARED_RECOVERED, which the debug check fails to tolerate. There are not many tests for the recovery of XA PREPARE transactions.

This bug does not affect non-debug builds.

The innodb.innodb crash ought to be due to a slightly different reason. It is on an embedded server, so there should not be any server restarts. None of the preceding tests on the same worker instance were using XA transactions. But, the debug check was evaluating trx->state twice. Maybe the state changed exactly between those two reads? Also, while the check is executing, we are only holding trx_sys->mutex, but lock_trx_release_locks() is not holding it while changing a transaction to TRX_STATE_COMMITTED_IN_MEMORY. Maybe we should only disallow the state TRX_STATE_NOT_STARTED in the debug check.

Comment by Marko Mäkelä [ 2019-05-08 ]

The innodb.xa_debug failure affects MariaDB 10.2.24 only (and only in the case if this fix does fix does not make it into the 10.2.24 release). The corresponding check in rw_trx_hash_t::validate_element() in MariaDB 10.3 is correct, and no such check existed before 10.2.2.

The embedded failure of innodb.innodb should affect all 10.2 versions since 10.2.2 if my analysis is correct. That failure should occur in any test that involves multiple client connections and rollback.

Generated at Thu Feb 08 08:51:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.