[MDEV-23967] SEGV in my_atomic_loadlint storage/innobase/include/sync0types.h:1131 Created: 2020-10-15  Updated: 2020-12-15  Resolved: 2020-12-14

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.26
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Duplicate Votes: 0
Labels: rr-profile-analyzed

Issue Links:
Duplicate
is duplicated by MDEV-23693 Failing assertion: my_atomic_load32_e... Closed

 Description   

Thread 3 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1968947.1970639]
0x000055ed4289732f in my_atomic_loadlint (A=0x2d8b3ed5dca0) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/include/sync0types.h:1131
1131      return ulint(my_atomic_loadlong(A));
(rr) bt
#0  0x000055ed4289732f in my_atomic_loadlint (A=0x2d8b3ed5dca0) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/include/sync0types.h:1131
#1  MutexDebug<TTASEventMutex<BlockMutexPolicy> >::is_owned (this=0x2d8b3ed5dc68) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/include/sync0policy.h:200
#2  PolicyMutex<TTASEventMutex<BlockMutexPolicy> >::is_owned (this=0x2d8b3ed5dc58) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/include/ib0mutex.h:639
#3  buf_flush_or_remove_page (buf_pool=buf_pool@entry=0x61b000001580, bpage=<optimized out>, flush=<optimized out>) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/buf/buf0lru.cc:387
#4  0x000055ed4289adb2 in buf_flush_or_remove_pages (buf_pool=buf_pool@entry=0x61b000001580, id=id@entry=922, observer=observer@entry=0x6040000899d0, first=first@entry=0) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/include/buf0flu.h:355
#5  0x000055ed4289b4de in buf_flush_dirty_pages (buf_pool=0x61b000001580, id=id@entry=922, observer=observer@entry=0x6040000899d0, first=first@entry=0) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/buf/buf0lru.cc:529
#6  0x000055ed4289bcc6 in buf_LRU_flush_or_remove_pages (id=922, observer=observer@entry=0x6040000899d0, first=first@entry=0) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/include/buf0buf.ic:1097
#7  0x000055ed42879b89 in FlushObserver::flush (this=this@entry=0x6040000899d0) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/buf/buf0flu.cc:3802
#8  0x000055ed4246d0f2 in row_merge_build_indexes (trx=<optimized out>, old_table=<optimized out>, new_table=new_table@entry=0x618000104908, online=<optimized out>, indexes=0x61d000384b58, key_numbers=<optimized out>, n_indexes=<optimized out>, 
    table=<optimized out>, defaults=<optimized out>, col_map=<optimized out>, add_autoinc=<optimized out>, sequence=..., skip_pk_sort=<optimized out>, stage=<optimized out>, add_v=<optimized out>, eval_table=<optimized out>, 
    allow_not_null=<optimized out>) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/row/row0merge.cc:5044
#9  0x000055ed42161605 in ha_innobase::inplace_alter_table (this=0x61c00032b0a8, altered_table=<optimized out>, ha_alter_info=<optimized out>) at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/handler/handler0alter.cc:7154
#10 0x000055ed414455ea in handler::ha_inplace_alter_table (ha_alter_info=0x73fb0c09d920, altered_table=0x61f0003a3e88, this=<optimized out>) at /home/mleich/Server/bb-10.3-MDEV-23072/sql/handler.h:4207
#11 mysql_inplace_alter_table (thd=thd@entry=0x62a00024c208, table_list=<optimized out>, table=table@entry=0x61f0003a2288, altered_table=altered_table@entry=0x61f0003a3e88, ha_alter_info=ha_alter_info@entry=0x73fb0c09d920, 
    inplace_supported=inplace_supported@entry=HA_ALTER_INPLACE_COPY_LOCK, target_mdl_request=<optimized out>, alter_ctx=<optimized out>) at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_table.cc:7654
#12 0x000055ed41468297 in mysql_alter_table (thd=thd@entry=0x62a00024c208, new_db=<optimized out>, new_name=<optimized out>, create_info=create_info@entry=0x73fb0c09f460, table_list=table_list@entry=0x62b00013b3f8, 
    alter_info=alter_info@entry=0x73fb0c09f360, order_num=<optimized out>, order=<optimized out>, ignore=<optimized out>) at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_table.cc:9946
#13 0x000055ed415ddeee in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x62a00024c208) at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_alter.cc:512
#14 0x000055ed411f1944 in mysql_execute_command (thd=thd@entry=0x62a00024c208) at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_parse.cc:6022
#15 0x000055ed411f617f in mysql_parse (thd=thd@entry=0x62a00024c208, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x73fb0c0a1f40, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
    at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_parse.cc:7810
#16 0x000055ed411fd8bf in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62a00024c208, 
    packet=packet@entry=0x629000a8c209 " ALTER TABLE t1 ADD FULLTEXT INDEX IF NOT EXISTS `ftidx2` ( col_text ), LOCK = SHARED  /* E_R Thread2 QNO 295 CON_ID 66 */ ", packet_length=packet_length@entry=123, is_com_multi=is_com_multi@entry=false, 
    is_next_command=is_next_command@entry=false) at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_class.h:1134
#17 0x000055ed412071d6 in do_command (thd=0x62a00024c208) at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_parse.cc:1393
#18 0x000055ed415d09d0 in do_handle_one_connection (connect=connect@entry=0x608000001328) at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_connect.cc:1403
#19 0x000055ed415d10de in handle_one_connection (arg=0x608000001328) at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_connect.cc:1308
#20 0x00007fff8d9bd609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x0000000067f30103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
 
Query (0x62b00013b228): ALTER TABLE t1 ADD FULLTEXT INDEX IF NOT EXISTS `ftidx2` ( col_text ), LOCK = SHARED  /* E_R Thread2 QNO 295 CON_ID 66 */
# 2020-10-15T12:52:08 [1967283] | [rr 1968947 779414]
# 2020-10-15T12:52:08 [1967283] | Connection ID (thread ID): 66
# 2020-10-15T12:52:08 [1967283] | [rr 1968947 779416]Status: KILL_TIMEOUT[rr 1968947 764251]201015 12:43:46 [rr 1968947 764254][ERROR] mysqld got signal 11 ;
...
Query (0x62b00013b228): ALTER TABLE t1 ADD FULLTEXT INDEX IF NOT EXISTS `ftidx2` ( col_text ), LOCK = SHARED  /* E_R Thread2 QNO 295 CON_ID 66 */
Connection ID (thread ID): 66
Status: KILL_TIMEOUT    <== Maybe important
 
 
origin/bb-10.3-MDEV-23072 8ebfe9811e805a9b508199bc0224a82f7fa69eb3 2020-10-15T13:26:18+05:30  compiled with debug + ASAN + Og
 
RQG
====
git clone https://github.com/mleich1/rqg --branch experimental RQG
    origin/experimental e95247cfe20eb07f11c81bfa551e45571dd3637d 
 
perl rqg.pl \
--grammar=conf/mariadb/table_stress_innodb.yy \
--gendata=conf/mariadb/table_stress.zz \
--gendata_sql=conf/mariadb/table_stress.sql \
--mysqld=--innodb_use_native_aio=1 \
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=Deadlock1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--restart_timeout=120 \
--duration=400 \
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=400 \
--mysqld=--innodb_stats_persistent=on \
--mysqld=--loose-max-statement-time=30 \
--threads=33 \
--mysqld=--innodb_page_size=64K \
--mysqld=--innodb-buffer-pool-size=256M \
--duration=400 \
--no_mask \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--script_debug=_nix_ \
--rr=Server \
--rr_options=--chaos
 
I hit this SEGV once on > 2500 RQG tests.



 Comments   
Comment by Matthias Leich [ 2020-10-15 ]

rr:/home/mleich/RQG/storage/1602758093
001304.log -- RQG log
001304.tgz - Archive with remains of the test including rr trace
 
cd /home/mleich/RQG/storage/1602758093/tmp1/dev/shm/vardir/1602758093/96/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio
 
error pattern for RQG
[ 'TBR-703' , 'in my_atomic_loadlint.{1,300}sync0types.h.{1,300}MutexDebug<TTASEventMutex<BlockMutexPolicy> >::is_owned.{1,300}sync0policy.h.{1,300}PolicyMutex<TTASEventMutex<BlockMutexPolicy> >::is_owned.{1,300}ib0mutex.h.{1,300}buf_flush_or_remove_page' ],

Comment by Thirunarayanan Balathandayuthapani [ 2020-10-16 ]

 
[rr 1968947 764135]2020-10-15 12:43:46 0 [Note] InnoDB: buffer pool 0 : withdrawing blocks. (2047/2047)
[rr 1968947 764139]2020-10-15 12:43:46 0 [Note] InnoDB: buffer pool 0 : withdrew 953 blocks from free list. Tried to relocate 354 pages (2047/2047).
[rr 1968947 764143]2020-10-15 12:43:46 0 [Note] InnoDB: buffer pool 0 : withdrawn target 2047 blocks.
[rr 1968947 764147]2020-10-15 12:43:46 0 [Note] InnoDB: Latching whole of buffer pool.
[rr 1968947 764167]2020-10-15 12:43:46 0 [Note] InnoDB: buffer pool 0 : resizing with chunks 2 to 1.
[rr 1968947 764237]2020-10-15 12:43:46 0 [Note] InnoDB: buffer pool 0 : 1 chunks (2047 blocks) were freed.
 
t 44:
=====
#17 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (
    this=0x61b000001580, n_spins=<optimized out>, 
    n_delay=<optimized out>, name=<optimized out>, line=<optimized out>)
    at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/include/ib0mutex.h:592
#18 0x000055ed42897552 in buf_flush_or_remove_page (
    buf_pool=buf_pool@entry=0x61b000001580, bpage=<optimized out>, 
    flush=<optimized out>)
    at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/buf/buf0lru.cc:377
#19 0x000055ed4289adb2 in buf_flush_or_remove_pages (
    buf_pool=buf_pool@entry=0x61b000001580, id=id@entry=922, 
    observer=observer@entry=0x6040000899d0, first=first@entry=0)
    at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/include/buf0flu.h:355
#20 0x000055ed4289b4de in buf_flush_dirty_pages (
    buf_pool=0x61b000001580, id=id@entry=922, 
    observer=observer@entry=0x6040000899d0, first=first@entry=0)
    at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/buf/buf0lru.cc:529
#21 0x000055ed4289bcc6 in buf_LRU_flush_or_remove_pages (id=922, 
    observer=observer@entry=0x6040000899d0, first=first@entry=0)
    at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/include/buf0buf.ic:1097
#22 0x000055ed42879b89 in FlushObserver::flush (
    this=this@entry=0x6040000899d0)
    at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/buf/buf0flu.cc:3802
#23 0x000055ed4246d0f2 in row_merge_build_indexes (trx=<optimized out>, 
--Type <RET> for more, q to quit, c to continue without paging--
    old_table=<optimized out>, 
    new_table=new_table@entry=0x618000104908, online=<optimized out>, 
    indexes=0x61d000384b58, key_numbers=<optimized out>, 
    n_indexes=<optimized out>, table=<optimized out>, 
    defaults=<optimized out>, col_map=<optimized out>, 
    add_autoinc=<optimized out>, sequence=..., 
    skip_pk_sort=<optimized out>, stage=<optimized out>, 
    add_v=<optimized out>, eval_table=<optimized out>, 
    allow_not_null=<optimized out>)
    at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/row/row0merge.cc:5044
#24 0x000055ed42161605 in ha_innobase::inplace_alter_table (
    this=0x61c00032b0a8, altered_table=<optimized out>, 
    ha_alter_info=<optimized out>)
    at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/handler/handler0alter.cc:7154
#25 0x000055ed414455ea in handler::ha_inplace_alter_table (
    ha_alter_info=0x73fb0c09d920, altered_table=0x61f0003a3e88, 
    this=<optimized out>)
    at /home/mleich/Server/bb-10.3-MDEV-23072/sql/handler.h:4207
#26 mysql_inplace_alter_table (thd=thd@entry=0x62a00024c208, 
    table_list=<optimized out>, table=table@entry=0x61f0003a2288, 
    altered_table=altered_table@entry=0x61f0003a3e88, 
    ha_alter_info=ha_alter_info@entry=0x73fb0c09d920, 
    inplace_supported=inplace_supported@entry=HA_ALTER_INPLACE_COPY_LOCK, target_mdl_request=<optimized out>, alter_ctx=<optimized out>)
    at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_table.cc:7654
#27 0x000055ed41468297 in mysql_alter_table (
    thd=thd@entry=0x62a00024c208, new_db=<optimized out>, 
    new_name=<optimized out>, 
    create_info=create_info@entry=0x73fb0c09f460, 
    table_list=table_list@entry=0x62b00013b3f8, 
    alter_info=alter_info@entry=0x73fb0c09f360, 
    order_num=<optimized out>, order=<optimized out>, 
    ignore=<optimized out>)
    at /home/mleich/Server/bb-10.3-MDEV-23072/sql/sql_table.cc:9946
 
 
t 28:
======
 
(rr) where
#0  buf_pool_resize ()
    at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/buf/buf0buf.cc:3077
#1  0x000055ed4284fd13 in buf_resize_thread ()
    at /home/mleich/Server/bb-10.3-MDEV-23072/storage/innobase/buf/buf0buf.cc:3207
#2  0x00007fff8d9bd609 in start_thread (arg=<optimized out>)
    at pthread_create.c:477
#3  0x0000000067f30103 in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This is the stacktrace when failure happened.
Problem is that block_mutex could be freed while withdrawing the blocks.

  and block mutex. */
                processed = buf_flush_page(
                        buf_pool, bpage, BUF_FLUSH_SINGLE_PAGE, false);
 
                if (processed) {
                        /* Wake possible simulated aio thread to actually
                        post the writes to the operating system */
                        os_aio_simulated_wake_handler_threads();
                        buf_pool_mutex_enter(buf_pool);
                } else {
                        mutex_exit(block_mutex);
                }
        } else {
                mutex_exit(block_mutex);
        }
 
        buf_flush_list_mutex_enter(buf_pool);
 
        ut_ad(!mutex_own(block_mutex));
        ut_ad(buf_pool_mutex_own(buf_pool));

When t44 is waiting to acquire buf_pool_mutex_enter(buf_pool), InnoDB does buffer pool resizing and withdraws few blocks.
ut_ad(!mutex_own(block_mutex) block mutex could be freed.

Comment by Thirunarayanan Balathandayuthapani [ 2020-12-14 ]

It should be fixed by MDEV-23693. Can mleich you please check it ?

Comment by Matthias Leich [ 2020-12-14 ]

Hitting the issue was already rare in history. And I do not have some test which offers some significant hitrate.
The only thing I can tell is that the error pattern "TBR-703" was never hit again since end of October 2020.
And that corresponds quite good with   MDEV-23693  Resolved:   2020-10-27 14:31 
Between end of October and today (14'th December) more than 100000 RQG tests were run.

Generated at Thu Feb 08 09:26:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.