[MDEV-27616] heap-use-after-free storage/innobase/handler/handler0alter.cc:6787 in prepare_inplace_alter_table_dict Created: 2022-01-25  Updated: 2023-11-28

Status: Stalled
Project: MariaDB Server
Component/s: Partitioning
Affects Version/s: 10.8.0
Fix Version/s: 10.11

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Alexey Botchkov
Resolution: Unresolved Votes: 0
Labels: rr-profile

Attachments: File MDEV-27616.log    

 Description   

The ASAN failure was found on
origin/10.8, 10.8 5595ed9d9f20aeb463ac26e075c9cba41927e85e 2022-01-24T19:26:09+09:00
+ patch for MDEV-19555 + preliminary patch for MDEV-14481.
Per Marko, it is unlikely that the problem is caused by the patches.
 
==3771038==ERROR: AddressSanitizer: heap-use-after-free on address 0x61f0000492c4 at pc 0x55ea1c35b1c7 bp 0x7f09e985f380 sp 0x7f09e985f370
READ of size 4 at 0x61f0000492c4 thread T39
[Note] InnoDB: Online DDL : End of reading clustered index of the table and create temporary files
#0 0x55ea1c35b1c6 in prepare_inplace_alter_table_dict /data/Server/10.8-MDEV-14481/storage/innobase/handler/handler0alter.cc:6787
#1 0x55ea1c36b433 in ha_innobase::prepare_inplace_alter_table(TABLE*, Alter_inplace_info*) /data/Server/10.8-MDEV-14481/storage/innobase/handler/handler0alter.cc:8276
#2 0x55ea1ba3825f in handler::ha_prepare_inplace_alter_table(TABLE*, Alter_inplace_info*) /data/Server/10.8-MDEV-14481/sql/handler.cc:5172
#3 0x55ea1c1db64b in ha_partition::prepare_inplace_alter_table(TABLE*, Alter_inplace_info*) /data/Server/10.8-MDEV-14481/sql/ha_partition.cc:10401
#4 0x55ea1ba3825f in handler::ha_prepare_inplace_alter_table(TABLE*, Alter_inplace_info*) /data/Server/10.8-MDEV-14481/sql/handler.cc:5172
#5 0x55ea1b4ab27b in mysql_inplace_alter_table /data/Server/10.8-MDEV-14481/sql/sql_table.cc:7450
#6 0x55ea1b4bf5c8 in mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool, bool) /data/Server/10.8-MDEV-14481/sql/sql_table.cc:10347
#7 0x55ea1b64af5f in Sql_cmd_alter_table::execute(THD*) /data/Server/10.8-MDEV-14481/sql/sql_alter.cc:544
#8 0x55ea1b219b50 in mysql_execute_command(THD*, bool) /data/Server/10.8-MDEV-14481/sql/sql_parse.cc:5988
#9 0x55ea1b225f8e in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/Server/10.8-MDEV-14481/sql/sql_parse.cc:8027
#10 0x55ea1b1fe37f in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/Server/10.8-MDEV-14481/sql/sql_parse.cc:1894
#11 0x55ea1b1fb757 in do_command(THD*, bool) /data/Server/10.8-MDEV-14481/sql/sql_parse.cc:1402
...
Query (0x6290012e8238): ALTER TABLE a ENGINE = INNODB
Status: KILL_TIMEOUT
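The report contains no minimal test case, so as a generic illustration only (hypothetical code, not MariaDB source, and not necessarily the mechanism behind this particular bug): the class of error ASAN reports here is a read through a cached pointer whose object was freed by another code path, e.g. a dictionary-cache entry being evicted and replaced while an inplace ALTER still holds a reference. A minimal sketch of that pattern, and of how shared ownership keeps the late read well-defined instead of a heap-use-after-free:

```cpp
#include <memory>

// Hypothetical stand-in for a dictionary object (NOT MariaDB code).
struct DictTable {
    int n_cols;
};

// Simulates: a cache holds an entry, a DDL operation caches a handle to it,
// then the cache replaces the entry before the DDL operation reads from it.
int read_after_replace() {
    auto cache_entry = std::make_shared<DictTable>(DictTable{4});
    // Handle kept by the (conceptual) ALTER across the cache update.
    std::shared_ptr<DictTable> cached = cache_entry;

    // Another code path evicts and replaces the cache entry.
    cache_entry = std::make_shared<DictTable>(DictTable{5});

    // With a raw pointer this read would touch freed memory (the bug ASAN
    // flags); with shared ownership the old object is still alive here.
    return cached->n_cols;
}
```

The real fix in the server would of course depend on the actual lifetime rules of the InnoDB dictionary objects involved; the sketch only names the bug class.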
 
MDEV-27616.log and also pluto:/data/results/1643042626/TBR-1333/rqg.log
== RQG log which contains output written by ASAN etc.
pluto:/data/results/1643042626/TBR-1333 # Unpacked 
pluto:/data/results/1643042626/TBR-1333/dev/shm/rqg/1643042626/19/1/rr
== location of rr trace
 
RQG
===
# git clone https://github.com/mleich1/rqg --branch experimental RQG
#
# GIT_SHOW: HEAD -> experimental, origin/experimental 17c90b97a931f3b395258aec37903273627de6d7 2022-01-21T11:19:46+01:00
# rqg.pl  : Version 4.0.4 (2021-12)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/mariadb/partitions_innodb.yy \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no-mask \
# --queries=10000000 \
# --seed=random \
# --reporters=Backtrace \
# --reporters=ErrorLog \
# --reporters=Deadlock1 \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma \
# --mysqld=--plugin-load-add=provider_snappy \
# --mysqld=--plugin-load-add=provider_lz4 \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--innodb_stats_persistent=on \
# --mysqld=--innodb_adaptive_hash_index=off \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
# --mysqld=--loose-max-statement-time=30 \
# --threads=33 \
# --mysqld=--innodb-use-native-aio=0 \
# --mysqld=--loose-gdb \
# --mysqld=--loose-debug-gdb \
# --rr=Extended \
# --rr_options=--chaos --wait \
# --mysqld=--innodb_rollback_on_timeout=OFF \
# --vardir_type=fast \
# --mysqld=--innodb_page_size=16K \
# --mysqld=--innodb-buffer-pool-size=256M \
# --no_mask \
# <some local settings>
 



 Comments   
Comment by Nayuta Yanagisawa (Inactive) [ 2022-01-25 ]

mleich Could you share the entire test case to reproduce the ASAN error?

Comment by Matthias Leich [ 2022-01-25 ]

Hi Nayuta, it's a bit complicated.
The description shows how to clone my version of RQG and what the call of the test that hit the problem looked like.
But this is only half of the truth, because hitting that problem is extremely rare and definitely not deterministic.
I had one hit in roughly 10000 more or less differing RQG tests executed.
~ 273  fiddled excessively with partitioning DDL
~ 1000 fiddled a bit with partitioning DDL
Most of the time there was also a CPU load (user+system) > 97% on the testing box
(nproc=104, 384GB RAM). This was achieved by having around 220 RQG runs
in parallel, managed by my RQG batch tool.
Please try some analysis based on the rr trace first.
If it looks too complicated or hopeless, then I can try to simplify the test.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-01-25 ]

I understood the situation. OK. I will check the rr trace soon.

Comment by Matthias Leich [ 2022-01-27 ]

I tried to reproduce the problem and to simplify the test using the automatic RQG Simplifier.
There was no replay, and also no other unknown bad effect, even though I experimented with different setups.

Generated at Thu Feb 08 09:54:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.