[MDEV-15752] Possible race between DDL and accessing I_S.INNODB_TABLESPACES_ENCRYPTION Created: 2018-04-02  Updated: 2023-04-27

Status: Confirmed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.1, 10.2, 10.3
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Thirunarayanan Balathandayuthapani
Resolution: Unresolved Votes: 0
Labels: None

Attachments: HTML File threads    
Issue Links:
Relates
relates to MDEV-21452 Use condition variables and normal mu... Closed
relates to MDEV-23855 InnoDB log checkpointing causes regre... Closed

 Description   

https://api.travis-ci.org/v3/job/360919346/log.txt

10.3 19bb7fdcd6e68aca6e41a2e30ffb3c2ad1d14cb6

mysqld: /home/travis/src/storage/innobase/buf/buf0buf.cc:4269: buf_block_t* buf_page_get_gen(const page_id_t&, const page_size_t&, ulint, buf_block_t*, ulint, const char*, unsigned int, mtr_t*, dberr_t*): Assertion `found' failed.
180401 23:17:11 [ERROR] mysqld got signal 6 ;
 
Query (0x7fd7d801fcd8): REPLACE INTO transforms.insert_select_24744  SELECT * FROM (INFORMATION_SCHEMA.`INNODB_TABLESPACES_ENCRYPTION` AS table1 INNER JOIN mysql.`general_log` AS table2 ON ( table2.`user_host` = table1.`ENCRYPTION_SCHEME` ) ) WHERE table1.`ENCRYPTION_SCHEME` = table2.`user_host` LIMIT 5 /* QNO 2811 CON_ID 18 */  /* QNO 2834 CON_ID 18 */
Connection ID (thread ID): 18
Status: NOT_KILLED

#7  0x00007fd823382ca2 in __GI___assert_fail (assertion=0x559920bddd5f "found", file=0x559920bdc030 "/home/travis/src/storage/innobase/buf/buf0buf.cc", line=4269, function=0x559920be1820 <buf_page_get_gen(page_id_t const&, page_size_t const&, unsigned long, buf_block_t*, unsigned long, char const*, unsigned int, mtr_t*, dberr_t*)::__PRETTY_FUNCTION__> "buf_block_t* buf_page_get_gen(const page_id_t&, const page_size_t&, ulint, buf_block_t*, ulint, const char*, unsigned int, mtr_t*, dberr_t*)") at assert.c:101
#8  0x0000559920587497 in buf_page_get_gen (page_id=..., page_size=..., rw_latch=1, guess=0x0, mode=10, file=0x559920c15878 "/home/travis/src/storage/innobase/fil/fil0crypt.cc", line=963, mtr=0x7fd80d909a90, err=0x0) at /home/travis/src/storage/innobase/buf/buf0buf.cc:4269
#9  0x0000559920625e6f in fil_crypt_read_crypt_data (space=0x7fd7b8451ba0) at /home/travis/src/storage/innobase/fil/fil0crypt.cc:962
#10 0x00005599206293cf in fil_space_crypt_get_status (space=0x7fd7b8451ba0, status=0x7fd80d90a040) at /home/travis/src/storage/innobase/fil/fil0crypt.cc:2432
#11 0x000055992034ddb6 in i_s_dict_fill_tablespaces_encryption (thd=0x7fd7d800b360, space=0x7fd7b8451ba0, table_to_fill=0x7fd7d8168a58) at /home/travis/src/storage/innobase/handler/i_s.cc:8566
#12 0x000055992034e2f4 in i_s_tablespaces_encryption_fill_table (thd=0x7fd7d800b360, tables=0x7fd7d8020f58) at /home/travis/src/storage/innobase/handler/i_s.cc:8640
#13 0x000055991fea1fb3 in get_schema_tables_result (join=0x7fd7d80239d8, executed_place=PROCESSED_BY_JOIN_EXEC) at /home/travis/src/sql/sql_show.cc:8872
#14 0x000055991fe3bae8 in JOIN::exec_inner (this=0x7fd7d80239d8) at /home/travis/src/sql/sql_select.cc:3952
#15 0x000055991fe3b18e in JOIN::exec (this=0x7fd7d80239d8) at /home/travis/src/sql/sql_select.cc:3783
#16 0x000055991fe3c402 in mysql_select (thd=0x7fd7d800b360, tables=0x7fd7d8020f58, wild_num=1, fields=..., conds=0x7fd7d8022ab8, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=553782610688, result=0x7fd7d8023930, unit=0x7fd7d800f1a8, select_lex=0x7fd7d800f920) at /home/travis/src/sql/sql_select.cc:4188
#17 0x000055991fe2e69b in handle_select (thd=0x7fd7d800b360, lex=0x7fd7d800f0e0, result=0x7fd7d8023930, setup_tables_done_option=1073741824) at /home/travis/src/sql/sql_select.cc:382
#18 0x000055991fdf3197 in mysql_execute_command (thd=0x7fd7d800b360) at /home/travis/src/sql/sql_parse.cc:4833
#19 0x000055991fdfd1dc in mysql_parse (thd=0x7fd7d800b360, rawbuf=0x7fd7d801fcd8 "REPLACE INTO transforms.insert_select_24744  SELECT * FROM (INFORMATION_SCHEMA.`INNODB_TABLESPACES_ENCRYPTION` AS table1 INNER JOIN mysql.`general_log` AS table2 ON ( table2.`user_host` = table1.`ENCR"..., length=331, parser_state=0x7fd80d90b5e0, is_com_multi=false, is_next_command=false) at /home/travis/src/sql/sql_parse.cc:8013
#20 0x000055991fdea8d2 in dispatch_command (command=COM_QUERY, thd=0x7fd7d800b360, packet=0x7fd7d8017791 "REPLACE INTO transforms.insert_select_24744  SELECT * FROM (INFORMATION_SCHEMA.`INNODB_TABLESPACES_ENCRYPTION` AS table1 INNER JOIN mysql.`general_log` AS table2 ON ( table2.`user_host` = table1.`ENCR"..., packet_length=332, is_com_multi=false, is_next_command=false) at /home/travis/src/sql/sql_parse.cc:1842
#21 0x000055991fde931e in do_command (thd=0x7fd7d800b360) at /home/travis/src/sql/sql_parse.cc:1387
#22 0x000055991ff4e9e1 in do_handle_one_connection (connect=0x5599235f94c0) at /home/travis/src/sql/sql_connect.cc:1402
#23 0x000055991ff4e76e in handle_one_connection (arg=0x5599235f94c0) at /home/travis/src/sql/sql_connect.cc:1308
#24 0x00007fd823f44184 in start_thread (arg=0x7fd80d90c700) at pthread_create.c:312
#25 0x00007fd82345103d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

experimental 4feb78e17fbecf7fc7f2847b49b8c66b54879629

perl /home/travis/rqg/runall-new.pl --duration=350 --threads=6 --seed=1522624566 --reporters=Backtrace,ErrorLog,Deadlock --validators=TransformerNoComparator --transformers=ConvertSubqueriesToViews,ConvertTablesToDerived,Count,DisableOptimizations,Distinct,EnableOptimizations,ExecuteAsCTE,ExecuteAsDeleteReturning,ExecuteAsDerived,ExecuteAsExcept,ExecuteAsExecuteImmediate,ExecuteAsInsertSelect,ExecuteAsIntersect,ExecuteAsSelectItem,ExecuteAsUnion,ExecuteAsUpdateDelete,ExecuteAsView,ExecuteAsWhereSubquery,Having,InlineSubqueries,LimitRowsExamined,OrderBy,StraightJoin,ExecuteAsPreparedTwice,ExecuteAsTrigger,ExecuteAsSPTwice,ExecuteAsFunctionTwice --redefine=conf/mariadb/general-workarounds.yy --mysqld=--log_output=FILE --mysqld=--log_bin_trust_function_creators=1 --mysqld=--log-bin --mysqld=--max-statement-time=30 --mysqld=--loose-debug_assert_on_not_freed_memory=0 --views --redefine=conf/mariadb/versioning.yy --basedir=/home/travis/server --grammar=conf/runtime/information_schema.yy --vardir=/home/travis/logs/vardir1_1

Not reproducible so far.



 Comments   
Comment by Marko Mäkelä [ 2018-04-07 ]

MDEV-11984 simplified the access to tablespace metadata in i_s_tablespaces_encryption_fill_table(), possibly affecting reproducibility. This should be possible in 10.2 as well, by appropriately timed

SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION;

while some DDL operation is in progress (even on non-encrypted tables). In this case, a good candidate is ALTER TABLE on a partitioned table:

#11 0x00005599202fbb9b in ha_innobase::rename_table (this=0x7fd7b855e898, from=0x7fd80dbfa110 "./test/#sql-607a_f#P#p6", to=0x7fd80dbfa320 "./test/B#P#p6") at /home/travis/src/storage/innobase/handler/ha_innodb.cc:13273
#12 0x00005599200e7979 in handler::ha_rename_table (this=0x7fd7b855e898, from=0x7fd80dbfa110 "./test/#sql-607a_f#P#p6", to=0x7fd80dbfa320 "./test/B#P#p6") at /home/travis/src/sql/handler.cc:4508
#13 0x000055992083a396 in ha_partition::del_ren_table (this=0x7fd7b8520690, from=0x7fd80dbfa800 "./test/#sql-607a_f", to=0x7fd80dbfaa10 "./test/B") at /home/travis/src/sql/ha_partition.cc:2359
#14 0x000055992083576e in ha_partition::rename_table (this=0x7fd7b8520690, from=0x7fd80dbfa800 "./test/#sql-607a_f", to=0x7fd80dbfaa10 "./test/B") at /home/travis/src/sql/ha_partition.cc:609
#15 0x00005599200e7979 in handler::ha_rename_table (this=0x7fd7b8520690, from=0x7fd80dbfa800 "./test/#sql-607a_f", to=0x7fd80dbfaa10 "./test/B") at /home/travis/src/sql/handler.cc:4508
#16 0x000055991fec1ef9 in mysql_rename_table (base=0x55992354ffa0, old_db=0x7fd80dbfc040, old_name=0x7fd80dbfc070, new_db=0x7fd80dbfc040, new_name=0x7fd80dbfc060, flags=1) at /home/travis/src/sql/sql_table.cc:5562
#17 0x000055991fece2f4 in mysql_alter_table (thd=0x7fd7b8015910, new_db=0x7fd7b8019f68, new_name=0x7fd7b801a318, create_info=0x7fd80dbfcbf0, table_list=0x7fd7b802c518, alter_info=0x7fd80dbfcb30, order_num=0, order=0x0, ignore=false) at /home/travis/src/sql/sql_table.cc:10095

I do not understand why fil_space_crypt_get_status() needs to invoke buf_page_get_gen() at all. Shouldn’t this information be updated directly in crypt_data when the first page is read or updated in the first place?

In threads, we can see that the tablespace full well exists. The debug assertion fails here:

		bool			found;
		const page_size_t&	space_page_size
			= fil_space_get_page_size(page_id.space(), &found);
		ut_ad(found);
		ut_ad(page_size.equals_to(space_page_size));

The call fil_space_get_page_size() can result in !found even if the tablespace exists, because it is invoking fil_space_get_space() which contains some non-trivial logic. I believe that the code can and should be made more robust:

diff --git a/storage/innobase/fil/fil0crypt.cc b/storage/innobase/fil/fil0crypt.cc
index b4fead7335a..592acce53c9 100644
--- a/storage/innobase/fil/fil0crypt.cc
+++ b/storage/innobase/fil/fil0crypt.cc
@@ -956,6 +956,11 @@ fil_crypt_read_crypt_data(fil_space_t* space)
 		return;
 	}
 
+	if (!fil_space_get_size(space->id)) {
+		/* The file cannot be accessed. */
+		return;
+	}
+
 	const page_size_t page_size(space->flags);
 	mtr_t	mtr;
 	mtr.start();

A more proper fix would be to remove this whole function, to make the crypt_data a more integral part of fil_space_t, and to ensure that crypt_data is always updated in sync with any access to the file (be it the very first time the first page is read, or any update of the encryption status).

Comment by Marko Mäkelä [ 2018-04-07 ]

I pushed a work-around to 10.1 that should reduce the probability of the race, but not prevent it altogether. I think that we should tightly integrate crypt_data with fil_space_t and as part of that effort remove the function fil_crypt_read_crypt_data().

Comment by Marko Mäkelä [ 2018-04-24 ]

Related to this, I think that we should remove SYNC_NO_ORDER_CHECK, especially for the encryption mutexes and rw-locks. Maybe we could use fil_space_t::latch or just fil_space_acquire() for encryption, and no separate mutex in fil_space_t::crypt_data?
Also related to this, I can observe two encryption tests hanging in Valgrind. (You may need a work-around first.)

diff --git a/mysql-test/mysql-test-run.pl b/mysql-test/mysql-test-run.pl
index f8f1e0af45a..e7aff0d2f35 100755
--- a/mysql-test/mysql-test-run.pl
+++ b/mysql-test/mysql-test-run.pl
@@ -4003,8 +4003,8 @@ sub run_testcase ($$) {
       else
       {
          $proc= My::SafeProcess->wait_any_timeout($test_timeout);
-         $keep_waiting_proc{$proc} = 1;
       }
+      $keep_waiting_proc{$proc} = 1;
     }
 
     if (scalar(keys(%keep_waiting_proc)) == 0)

./mtr --valgrind --mem --parallel=auto --force --retry=0 encryption.innodb-first-page-read encryption.innodb_encryption_tables

10.2 7b5543b21d31bd1517b49c2524a05b5b83fd9691

CURRENT_TEST: encryption.innodb_encryption_tables
2018-04-24 20:08:20 67396928 [Note] /mariadb/10.2/build/sql/mysqld: ready for connections.
Version: '10.2.15-MariaDB-debug-log'  socket: '/mariadb/10.2/build/mysql-test/var/tmp/29/mysqld.1.sock'  port: 16560  Source distribution
2018-04-24 20:23:26 68327168 [Warning] InnoDB: Trying to close/delete/truncate tablespace 'test/innodb_dynamic' but there are 1 pending operations on it.
2018-04-24 20:29:49 68327168 [Warning] InnoDB: Trying to close/delete/truncate tablespace 'test/innodb_dynamic' but there are 1 pending operations on it.
CURRENT_TEST: encryption.innodb-first-page-read
2018-04-24 20:31:03 68327168 [Warning] InnoDB: Trying to close/delete/truncate tablespace 'innodb_test/innodb_pagecomp1' but there are 1 pending operations on it.

Comment by Marko Mäkelä [ 2018-04-24 ]

It may be necessary to run

./mtr --valgrind --mem --parallel=auto --force --suite=innodb,encryption

to repeat the hangs, because the hangs could depend on some purge or change buffer workload created by earlier tests.

Comment by Marko Mäkelä [ 2018-04-24 ]

The test encryption.innodb_encryption_tables seems to always fail. When running the test by itself, you just have to wait 10 minutes for the first message to appear:

./mtr --valgrind encryption.innodb_encryption_tables

2018-04-24 21:03:17 67396928 [Note] /mariadb/10.2/build/sql/mysqld: ready for connections.
Version: '10.2.15-MariaDB-debug-log'  socket: '/mariadb/10.2/build/mysql-test/var/tmp/1/mysqld.1.sock'  port: 16000  Source distribution
2018-04-24 21:13:23 68327168 [Warning] InnoDB: Trying to close/delete/truncate tablespace 'test/innodb_dynamic' but there are 1 pending operations on it.

The message will be repeated several times per second.

Comment by Marko Mäkelä [ 2021-07-01 ]

In MDEV-23855, as part of reducing contention on fil_system.mutex, the tablespace acquisition in i_s_tablespaces_encryption_fill_table() was modified. If I remember correctly, something like this bug was found during internal RQG testing back then, and the code was revised as a result.

The latching order checks were replaced in MDEV-21452.

elenst, have you encountered this bug in any version recently?

Comment by Elena Stepanova [ 2021-07-01 ]

No, I don't have any recent records of hitting it.

Comment by Marko Mäkelä [ 2021-07-01 ]

I see that the function fil_crypt_read_crypt_data() still exists. In an earlier comment, I suggested that it should be removed.

Generated at Thu Feb 08 08:23:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.