[MDEV-33031] Assertion failure upon reading from performance schema with binlog enabled Created: 2023-12-14  Updated: 2024-01-22  Resolved: 2024-01-09

Status: Closed
Project: MariaDB Server
Component/s: Performance Schema, Replication, Storage Engine - Spider
Affects Version/s: 10.6
Fix Version/s: 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: regression

Issue Links:
Blocks
blocks MDEV-33150 double-locking of LOCK_thd_kill in pe... Closed

 Description   

--source include/have_log_bin.inc
 
--connect (con1,localhost,root,,)
SELECT VARIABLE_VALUE FROM performance_schema.status_by_thread;

10.6 768a736174d6caf09df43e84b0c1b9ec52f1a301

mariadbd: /data/bld/10.6-asan/sql/sql_class.cc:454: void* thd_get_ha_data(const THD*, const handlerton*): Assertion `thd == _current_thd() || ((&(&thd->LOCK_thd_data)->m_mutex)->count > 0 && pthread_equal(pthread_self(), (&(&thd->LOCK_thd_data)->m_mutex)->thread))' failed.
231214 19:34:03 [ERROR] mysqld got signal 6 ;
 
#9  0x00007f87e7853e32 in __GI___assert_fail (assertion=0x559ef6901ac0 "thd == _current_thd() || ((&(&thd->LOCK_thd_data)->m_mutex)->count > 0 && pthread_equal(pthread_self(), (&(&thd->LOCK_thd_data)->m_mutex)->thread))", file=0x559ef6901840 "/data/bld/10.6-asan/sql/sql_class.cc", line=454, function=0x559ef6901a60 "void* thd_get_ha_data(const THD*, const handlerton*)") at ./assert/assert.c:101
#10 0x0000559ef47ef0f2 in thd_get_ha_data (thd=0x62b00008c218, hton=0x615000001998) at /data/bld/10.6-asan/sql/sql_class.cc:454
#11 0x0000559ef54d8fa9 in MYSQL_BIN_LOG::set_status_variables (this=0x559ef8ca59c0 <mysql_bin_log>, thd=0x62b00008c218) at /data/bld/10.6-asan/sql/log.cc:11784
#12 0x0000559ef54d8d54 in show_binlog_vars (thd=0x62b00008c218, var=0x7f87dfc737c0, status_var=0x62b00008cf50) at /data/bld/10.6-asan/sql/log.cc:11717
#13 0x0000559ef5a6e4dd in PFS_status_variable_cache::manifest (this=0x61d0002396a8, thd=0x62b00008c218, show_var_array=0x623000031118, status_vars=0x62b00008cf50, prefix=0x559ef7143ce0 "", nested_array=false, strict=true) at /data/bld/10.6-asan/storage/perfschema/pfs_variable.cc:1167
#14 0x0000559ef5a6dccf in PFS_status_variable_cache::do_materialize_session (this=0x61d0002396a8, pfs_thread=0x7f87e57e5a00) at /data/bld/10.6-asan/storage/perfschema/pfs_variable.cc:1078
#15 0x0000559ef5ae2960 in PFS_variable_cache<Status_variable>::materialize_session (this=0x61d0002396a8, pfs_thread=0x7f87e57e5a00, use_mem_root=false) at /data/bld/10.6-asan/storage/perfschema/pfs_variable.h:574
#16 0x0000559ef5ae1643 in table_status_by_thread::rnd_next (this=0x61d000239680) at /data/bld/10.6-asan/storage/perfschema/table_status_by_thread.cc:140
#17 0x0000559ef59f2933 in ha_perfschema::rnd_next (this=0x61b0000405b8, buf=0x62300002f538 "\377") at /data/bld/10.6-asan/storage/perfschema/ha_perfschema.cc:358
#18 0x0000559ef51bbd3d in handler::ha_rnd_next (this=0x61b0000405b8, buf=0x62300002f538 "\377") at /data/bld/10.6-asan/sql/handler.cc:3459
#19 0x0000559ef5613a57 in rr_sequential (info=0x62b0000d6798) at /data/bld/10.6-asan/sql/records.cc:519
#20 0x0000559ef476f854 in READ_RECORD::read_record (this=0x62b0000d6798) at /data/bld/10.6-asan/sql/records.h:81
#21 0x0000559ef4a8d75b in join_init_read_record (tab=0x62b0000d66c0) at /data/bld/10.6-asan/sql/sql_select.cc:22862
#22 0x0000559ef4a865e4 in sub_select (join=0x62b0000d5370, join_tab=0x62b0000d66c0, end_of_records=false) at /data/bld/10.6-asan/sql/sql_select.cc:21826
#23 0x0000559ef4a84550 in do_select (join=0x62b0000d5370, procedure=0x0) at /data/bld/10.6-asan/sql/sql_select.cc:21352
#24 0x0000559ef4a0e3da in JOIN::exec_inner (this=0x62b0000d5370) at /data/bld/10.6-asan/sql/sql_select.cc:4882
#25 0x0000559ef4a0b8b2 in JOIN::exec (this=0x62b0000d5370) at /data/bld/10.6-asan/sql/sql_select.cc:4660
#26 0x0000559ef4a0fd11 in mysql_select (thd=0x62b0000cb218, tables=0x62b0000d29d8, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x62b0000d5340, unit=0x62b0000cf5e0, select_lex=0x62b0000d2320) at /data/bld/10.6-asan/sql/sql_select.cc:5139
#27 0x0000559ef49df0f2 in handle_select (thd=0x62b0000cb218, lex=0x62b0000cf518, result=0x62b0000d5340, setup_tables_done_option=0) at /data/bld/10.6-asan/sql/sql_select.cc:561
#28 0x0000559ef49436f7 in execute_sqlcom_select (thd=0x62b0000cb218, all_tables=0x62b0000d29d8) at /data/bld/10.6-asan/sql/sql_parse.cc:6285
#29 0x0000559ef4931e2c in mysql_execute_command (thd=0x62b0000cb218, is_called_from_prepared_stmt=false) at /data/bld/10.6-asan/sql/sql_parse.cc:3961
#30 0x0000559ef494e74d in mysql_parse (thd=0x62b0000cb218, rawbuf=0x62b0000d2238 "SELECT VARIABLE_VALUE FROM performance_schema.status_by_thread", length=62, parser_state=0x7f87dfc769f0) at /data/bld/10.6-asan/sql/sql_parse.cc:8051
#31 0x0000559ef4924372 in dispatch_command (command=COM_QUERY, thd=0x62b0000cb218, packet=0x62900026c219 "", packet_length=62, blocking=true) at /data/bld/10.6-asan/sql/sql_parse.cc:1896
#32 0x0000559ef49210a6 in do_command (thd=0x62b0000cb218, blocking=true) at /data/bld/10.6-asan/sql/sql_parse.cc:1409
#33 0x0000559ef4d9456a in do_handle_one_connection (connect=0x608000002fb8, put_in_cache=true) at /data/bld/10.6-asan/sql/sql_connect.cc:1416
#34 0x0000559ef4d93f2b in handle_one_connection (arg=0x608000002fb8) at /data/bld/10.6-asan/sql/sql_connect.cc:1318
#35 0x0000559ef59f8f16 in pfs_spawn_thread (arg=0x617000006618) at /data/bld/10.6-asan/storage/perfschema/pfs.cc:2201
#36 0x00007f87e78a8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#37 0x00007f87e792861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The failure started happening after this commit in 10.6:

commit 736a54f49c72d89fb82ef4165e96cddb506cf555
Author: Sergei Golubchik
Date:   Wed Dec 13 00:37:57 2023 +0100
 
    perfschema: use LOCK_thd_kill to "keep THD during materialization"



 Comments   
Comment by Kristian Nielsen [ 2023-12-14 ]

Looks like this just needs to add lock/unlock of the mutex LOCK_thd_data across the call to thd_get_ha_data() in TC_LOG_BINLOG::set_status_variables().
This was not necessary before Serg's patch, but now it's LOCK_thd_kill that's being held by the caller instead, so LOCK_thd_data must be taken explicitly.

(Though - how did this get through buildbot ? Are we missing coverage in the test suite for this, perhaps?)

Comment by Sergei Golubchik [ 2024-01-05 ]

please, see commits b7bcbda85394 and 3be269542f4d

Comment by Yuchen Pei [ 2024-01-09 ]

Thanks for the patches - reviewed on the mailing list:

https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/OHCITR7R76N7XOCYMKVFFG6W4D6CWKOU/
https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/ZMWRAPCP2D5JHMZN34MRHH46P3ZTNCG7/

Generated at Thu Feb 08 10:35:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.