[MDEV-20169] main.partition_innodb fails in buildbot with wrong result Created: 2019-07-24  Updated: 2024-01-22  Resolved: 2023-12-04

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - InnoDB, Tests
Affects Version/s: 10.4, 10.5
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2, 11.4.1

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Kristian Nielsen
Resolution: Fixed Votes: 1
Labels: innodb-noisy-estimates, rr-profile-analyzed

Issue Links:
Relates
relates to MDEV-21136 InnoDB's records_in_range estimates c... Closed
relates to MDEV-22119 main.innodb_ext_key fails sporadically Closed
relates to MDEV-33080 main.innodb_ext_key test case explain... Open

 Description   

http://buildbot.askmonty.org/buildbot/builders/win32-debug/builds/14995

10.4 11f3e2366282eb8cf1cb0062793d1020

main.partition_innodb 'innodb'           w1 [ fail ]
        Test ended at 2019-07-24 16:28:53
 
CURRENT_TEST: main.partition_innodb
--- D:/win32-debug/build/src/mysql-test/main/partition_innodb.result	2019-07-24 16:10:28.471773300 +0000
+++ D:\win32-debug\build\src\mysql-test\main\partition_innodb.reject	2019-07-24 16:28:53.693111000 +0000
@@ -406,7 +406,7 @@
 insert into t1 values (0), (1), (2), (3);
 show table status;
 Name	Engine	Version	Row_format	Rows	Avg_row_length	Data_length	Max_data_length	Index_length	Data_free	Auto_increment	Create_time	Update_time	Check_time	Collation	Checksum	Create_options	Comment	Max_index_length	Temporary
-t1	InnoDB	10	Dynamic	4	4096	16384	0	0	#	NULL	#	NULL	NULL	latin1_swedish_ci	NULL	partitioned		0	N
+t1	InnoDB	10	Dynamic	5	3276	16384	0	0	#	NULL	#	NULL	NULL	latin1_swedish_ci	NULL	partitioned		0	N
 drop table t1;
 create table t1 (a int auto_increment primary key)
 engine = innodb
 
mysqltest: Result content mismatch



 Comments   
Comment by Marko Mäkelä [ 2020-03-09 ]

I was not able to repeat this bug. The failing test is only performing a single 4-row INSERT into an empty partitioned table. The number of rows (dict_table_t::stat_n_rows) is incremented during an insert by row_insert_for_mysql(). Because the partitions are small, the check in dict_stats_update_if_needed() should not be triggered. The counter is not being updated by any other means. I used a reduced test:

--source include/have_partition.inc
--source include/have_innodb.inc
create table t1 (a int) engine = innodb partition by key (a);
--replace_column 12 Create_time
show table status;
insert into t1 values (0), (1), (2), (3);
--replace_column 12 Create_time 13 Update_time
show table status;
drop table t1;

I ran the above test for more than 2×2600 times without a result difference (on 10.2 and 10.5).

While investigating this, I noticed a strange change in MySQL 5.5.37 that was supposed to stop reporting Rows=2 for an empty partitioned table. Yet, we are still reporting 2 rows. The reason seems to be a questionable hack in ha_innobase::info_low():

		/*
		The MySQL optimizer seems to assume in a left join that n_rows
		is an accurate estimate if it is zero. Of course, it is not,
		since we do not have any locks on the rows yet at this phase.
		Since SHOW TABLE STATUS seems to call this function with the
		HA_STATUS_TIME flag set, while the left join optimizer does not
		set that flag, we add one to a zero value if the flag is not
		set. That way SHOW TABLE STATUS will show the best estimate,
		while the optimizer never sees the table empty. */
 
		if (n_rows == 0 && !(flag & HA_STATUS_TIME)) {
			n_rows++;
		}

I believe that the removal of such hacks should be the task of our optimizer team. Also, it is worth noting that ha_partition::info() is invoking handler::info() in two loops. Could we not merge those loops?

Unfortunately, I cannot find anything in either InnoDB or ha_partition that could explain the observed result difference (5 rows occasionally being reported instead of 4).
elenst, is this reasonably well repeatable on some platform? If yes, then it would be useful to try rr record on that platform.

Comment by Marko Mäkelä [ 2020-03-10 ]

There is a very similar failure of main.innodb_ext_key:

10.5 980108ceebdca5c4f6c9e3a167e9ad40cb062ac8

CURRENT_TEST: main.innodb_ext_key
--- /mnt/buildbot/build/mariadb-10.5.2/mysql-test/main/innodb_ext_key.result	2020-03-09 16:29:55.000000000 -0400
+++ /mnt/buildbot/build/mariadb-10.5.2/mysql-test/main/innodb_ext_key.reject	2020-03-09 22:49:21.007118070 -0400
@@ -656,7 +656,7 @@
 select * from t0, part ignore index (primary)
 where p_partkey=t0.a and p_size=1;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t0	ALL	NULL	NULL	NULL	NULL	5	Using where
+1	SIMPLE	t0	ALL	NULL	NULL	NULL	NULL	4	Using where
 1	SIMPLE	part	eq_ref	i_p_size	i_p_size	9	const,dbt3_s001.t0.a	1	
 select * from t0,  part ignore index (primary)
 where p_partkey=t0.a and p_size=1;

The test case steps are as follows. While the table part was initialized earlier, the problem appears to be related to the freshly created InnoDB table t0:

SET SESSION DEFAULT_STORAGE_ENGINE='InnoDB';
create table t0 (a int);
insert into t0 values (1), (2), (3), (4), (5);
create index i_p_size on part(p_size);
 
set optimizer_switch='extended_keys=on';
 
explain
select * from t0, part ignore index (primary)
  where p_partkey=t0.a and p_size=1;
 
select * from t0,  part ignore index (primary)
  where p_partkey=t0.a and p_size=1;
 
drop table t0;
drop index i_p_size on part; 

Comment by Michael Widenius [ 2020-03-19 ]

I got the same thing in my 10.5 tree several times.
I just noticed that all of the late failures were on 32-bit, and I got this on 2 of 3 builds. Could that explain something?

Comment by Marko Mäkelä [ 2020-03-27 ]

monty, I see that you disabled the test main.partition_innodb in 10.5. If you can repeat the failure relatively easily, maybe you could produce an rr record trace for deterministically repeating the failure? With rr replay, it would be trivial to find the cause.

I compiled a -m32 -march=i686 executable, but I was not able to repeat the failure with 6,000 runs of my reduced test, which is at the start of my first comment. I tried both Debug and RelWithDebInfo.

Comment by Alice Sherepa [ 2020-09-25 ]

http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos73-ppc64le/builds/11575/steps/mtr/logs/stdio

main.partition_innodb 'innodb'           w1 [ fail ]
        Test ended at 2020-09-24 14:29:47
 
CURRENT_TEST: main.partition_innodb
--- /usr/share/mysql-test/main/partition_innodb.result	2020-09-24 09:09:48.000000000 +0000
+++ /dev/shm/var/1/log/partition_innodb.reject	2020-09-24 14:29:47.370151994 +0000
@@ -214,10 +214,10 @@
 1	SIMPLE	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	Impossible WHERE noticed after reading const tables
 EXPLAIN PARTITIONS SELECT * FROM t1 WHERE a = 89;
 id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	p90	ALL	NULL	NULL	NULL	NULL	3	Using where
+1	SIMPLE	t1	p90	ALL	NULL	NULL	NULL	NULL	2	Using where
 EXPLAIN PARTITIONS SELECT * FROM t1 WHERE a >= 89;
 id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	Extra
-1	SIMPLE	t1	p90	ALL	NULL	NULL	NULL	NULL	3	Using where
+1	SIMPLE	t1	p90	ALL	NULL	NULL	NULL	NULL	2	Using where
 EXPLAIN PARTITIONS SELECT * FROM t1 WHERE a > 89;
 id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	Extra
 1	SIMPLE	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	Impossible WHERE noticed after reading const tables
 
mysqltest: Result content mismatch

Comment by Elena Stepanova [ 2021-02-16 ]

To continue the tradition:
http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/26683

10.5 30379d906

main.stat_tables_innodb 'innodb'         w4 [ fail ]
        Test ended at 2021-01-27 11:33:21
 
CURRENT_TEST: main.stat_tables_innodb
--- /mnt/buildbot/build/mariadb-10.5.9/mysql-test/main/stat_tables_innodb.result	2021-01-27 07:20:50.000000000 -0500
+++ /mnt/buildbot/build/mariadb-10.5.9/mysql-test/main/stat_tables_innodb.reject	2021-01-27 11:33:21.245423235 -0500
@@ -883,7 +883,7 @@
 analyze
 select * from t1 where a = 1 and b=3;
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	r_rows	filtered	r_filtered	Extra
-1	SIMPLE	t1	ALL	NULL	NULL	NULL	NULL	10	10.00	100.00	10.00	Using where
+1	SIMPLE	t1	ALL	NULL	NULL	NULL	NULL	1	10.00	100.00	10.00	Using where
 #
 # with use_stat_tables= PREFERABLY_FOR_QUERIES
 # analyze table t1 will  collect statistics if we use PERSISTENT
 
mysqltest: Result length mismatch

Comment by Marko Mäkelä [ 2023-10-11 ]

Still happens:

10.4 3f1a256234138aa07e755f84df1727b9d73bee1f

main.partition_innodb 'innodb'           w5 [ fail ]
        Test ended at 2023-10-11 13:17:47
CURRENT_TEST: main.partition_innodb
--- /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/main/partition_innodb.result	2023-10-11 13:04:03.000000000 +0000
+++ /home/buildbot/amd64-ubuntu-2004-debug/build/mysql-test/main/partition_innodb.reject	2023-10-11 13:17:45.205451204 +0000
@@ -406,7 +406,7 @@
 insert into t1 values (0), (1), (2), (3);
 show table status;
 Name	Engine	Version	Row_format	Rows	Avg_row_length	Data_length	Max_data_length	Index_length	Data_free	Auto_increment	Create_time	Update_time	Check_time	Collation	Checksum	Create_options	Comment	Max_index_length	Temporary
-t1	InnoDB	10	Dynamic	4	4096	16384	0	0	0	NULL	Create_time	Update_time	NULL	latin1_swedish_ci	NULL	partitioned		0	N
+t1	InnoDB	10	Dynamic	3	5461	16384	0	0	0	NULL	Create_time	Update_time	NULL	latin1_swedish_ci	NULL	partitioned		0	N
 drop table t1;
 create table t1 (a int auto_increment primary key)
 engine = innodb

I did not find recent failures of this test on 10.5 or later. A potential explanation could be that a questionable change to disable InnoDB persistent statistics in all tests (MDEV-4750) was reverted by me in MariaDB Server 10.6.5. Unfortunately, a more likely explanation might be that the test was disabled on 10.5, as I noted 3½ years ago.

Comment by Kristian Nielsen [ 2023-11-21 ]

I was able to reproduce in --rr after ~1000 iterations:

(cd mysql-test/ && ./mtr --mem --parallel=12 main.partition_innodb{,,}{,,,} --repeat=250 --force-restart -rr)
 
main.partition_innodb 'innodb'           w8 [ 80 fail ]
        Test ended at 2023-11-21 06:17:41
 
CURRENT_TEST: main.partition_innodb
--- /kvm/src/my/sn-mdev30255/mysql-test/main/partition_innodb.result	2023-11-07 16:07:55.764626348 +0100
+++ /kvm/src/my/sn-mdev30255/mysql-test/main/partition_innodb.reject	2023-11-21 06:17:40.771991331 +0100
@@ -406,7 +406,7 @@
 insert into t1 values (0), (1), (2), (3);
 show table status;
 Name	Engine	Version	Row_format	Rows	Avg_row_length	Data_length	Max_data_length	Index_length	Data_free	Auto_increment	Create_time	Update_time	Check_time	Collation	Checksum	Create_options	Comment	Max_index_length	Temporary
-t1	InnoDB	10	Dynamic	4	4096	16384	0	0	0	NULL	Create_time	Update_time	NULL	latin1_swedish_ci	NULL	partitioned		0	N
+t1	InnoDB	10	Dynamic	5	3276	16384	0	0	0	NULL	Create_time	Update_time	NULL	latin1_swedish_ci	NULL	partitioned		0	N
 drop table t1;
 create table t1 (a int auto_increment primary key)
 engine = innodb

The "Rows" column is wrong because it is being updated in parallel by both row_insert_for_mysql() and dict_stats_update_persistent(). This comment in row_insert_for_mysql() seems relevant:

	/* Not protected by dict_sys.mutex for performance
	reasons, we would rather get garbage in stat_n_rows (which is
	just an estimate anyway) than protecting the following code
	with a latch. */
	dict_table_n_rows_inc(table);

Here are the stacktraces from when stat_n_rows is updated. There are 4 updates from row_insert_for_mysql() and one from dict_stats_update_persistent() (which was update number 3 in my run):

Thread 3 hit Hardware watchpoint 3: -location table->stat_n_rows
 
Old value = 1
New value = 2
dict_table_n_rows_inc (table=0x7f7190010e50) at /kvm/src/my/sn-mdev30255/storage/innobase/include/dict0dict.inl:339
339	}
#0  dict_table_n_rows_inc (table=0x7f7190010e50) at /kvm/src/my/sn-mdev30255/storage/innobase/include/dict0dict.inl:339
#1  0x000055d5cc5b18d8 in row_insert_for_mysql (mysql_rec=0x7f71841f94d8 "\375\001", prebuilt=0x7f71841c4c20, ins_mode=ROW_INS_NORMAL) at /kvm/src/my/sn-mdev30255/storage/innobase/row/row0mysql.cc:1489
#2  0x000055d5cc40a3b8 in ha_innobase::write_row (this=0x7f71841ec530, record=0x7f71841f94d8 "\375\001") at /kvm/src/my/sn-mdev30255/storage/innobase/handler/ha_innodb.cc:8171
#3  0x000055d5cc0d7bf0 in handler::ha_write_row (this=0x7f71841ec530, buf=0x7f71841f94d8 "\375\001") at /kvm/src/my/sn-mdev30255/sql/handler.cc:6860
#4  0x000055d5cc3d26af in ha_partition::write_row (this=0x7f71841f04d0, buf=0x7f71841f94d8 "\375\001") at /kvm/src/my/sn-mdev30255/sql/ha_partition.cc:4423
#5  0x000055d5cc0d7bf0 in handler::ha_write_row (this=0x7f71841f04d0, buf=0x7f71841f94d8 "\375\001") at /kvm/src/my/sn-mdev30255/sql/handler.cc:6860
#6  0x000055d5cbd52664 in write_record (thd=0x7f718400a798, table=0x7f71841ef6b8, info=0x7f7169bcd930) at /kvm/src/my/sn-mdev30255/sql/sql_insert.cc:2093
#7  0x000055d5cbd4f41c in mysql_insert (thd=0x7f718400a798, table_list=0x7f718401cb48, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false) at /kvm/src/my/sn-mdev30255/sql/sql_insert.cc:1086
#8  0x000055d5cbd99caf in mysql_execute_command (thd=0x7f718400a798) at /kvm/src/my/sn-mdev30255/sql/sql_parse.cc:4615
#9  0x000055d5cbda3234 in mysql_parse (thd=0x7f718400a798, rawbuf=0x7f718401ca50 "insert into t1 values (0), (1), (2), (3)", length=40, parser_state=0x7f7169bce3b0, is_com_multi=false, is_next_command=false) at /kvm/src/my/sn-mdev30255/sql/sql_parse.cc:8014
#10 0x000055d5cbd9273b in dispatch_command (command=COM_QUERY, thd=0x7f718400a798, packet=0x7f7184013669 "insert into t1 values (0), (1), (2), (3)", packet_length=40, is_com_multi=false, is_next_command=false) at /kvm/src/my/sn-mdev30255/sql/sql_parse.cc:1857
#11 0x000055d5cbd91509 in do_command (thd=0x7f718400a798) at /kvm/src/my/sn-mdev30255/sql/sql_parse.cc:1378
#12 0x000055d5cbf1ece2 in do_handle_one_connection (connect=0x55d5cfe45578) at /kvm/src/my/sn-mdev30255/sql/sql_connect.cc:1419
#13 0x000055d5cbf1ea51 in handle_one_connection (arg=0x55d5cfe45578) at /kvm/src/my/sn-mdev30255/sql/sql_connect.cc:1323
#14 0x00007f719c15fea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007f719c07da2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) 
Continuing.
[Switching to Thread 2049860.2049879]
 
Thread 15 hit Hardware watchpoint 3: -location table->stat_n_rows
 
Old value = 2
New value = 3
dict_stats_update_persistent (table=0x7f7190010e50) at /kvm/src/my/sn-mdev30255/storage/innobase/dict/dict0stats.cc:2268
2268		table->stat_clustered_index_size = index->stat_index_size;
#0  dict_stats_update_persistent (table=0x7f7190010e50) at /kvm/src/my/sn-mdev30255/storage/innobase/dict/dict0stats.cc:2268
#1  0x000055d5cc792a55 in dict_stats_update (table=0x7f7190010e50, stats_upd_option=DICT_STATS_RECALC_PERSISTENT) at /kvm/src/my/sn-mdev30255/storage/innobase/dict/dict0stats.cc:3228
#2  0x000055d5cc796f35 in dict_stats_process_entry_from_recalc_pool () at /kvm/src/my/sn-mdev30255/storage/innobase/dict/dict0stats_bg.cc:431
#3  0x000055d5cc797140 in dict_stats_thread () at /kvm/src/my/sn-mdev30255/storage/innobase/dict/dict0stats_bg.cc:515
#4  0x00007f719c15fea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007f719c07da2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

So it seems clear enough: a race between the statistics update in INSERT and the background statistics update.

It looks like we should simply put --replace_column 5 Rows in there. This part of the test seems to be checking for crashes in SHOW TABLE STATUS, not the update of optimiser statistics.
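In mysqltest syntax the masking would look roughly like this (a sketch; column 5 is the Rows column of SHOW TABLE STATUS output):

```
--replace_column 5 #
show table status;
```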

But alternatively, marko, is there a way to disable the background collection of table stats while this part of the test is running?

- Kristian.

Comment by Marko Mäkelä [ 2023-11-21 ]

knielsen, thank you for your update.

There is the global parameter innodb_stats_auto_recalc and the table attribute stats_auto_recalc. In 10.6, the change that had been applied to tests in 10.0 in MDEV-4750 was essentially reverted. Your observation might be unrelated to the test failure in 10.4 or 10.5. Related to this and MDEV-32050, I am just now working on one more fixup to ensure that all tests that use wait_all_purged.inc disable InnoDB persistent statistics, to prevent the InnoDB transaction commits issued by dict_stats_save() from interfering with the wait for the purge of committed transaction history.
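For reference, the two knobs mentioned above can be used roughly like this (a sketch; t1 is a placeholder table name):

```sql
-- Disable background statistics recalculation server-wide:
SET GLOBAL innodb_stats_auto_recalc = OFF;

-- Or only for one table, via the table attribute:
CREATE TABLE t1 (a INT) ENGINE=InnoDB STATS_AUTO_RECALC=0
PARTITION BY KEY (a);
```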

When the InnoDB persistent statistics are not in use, so-called "transient statistics" will be used instead. Their collection algorithm is less accurate on purpose, because it is invoked more often. The failures in 10.4 and 10.5 could be related to that intentional inaccuracy as well.

Comment by Marko Mäkelä [ 2023-11-21 ]

For what it is worth, my attempts at reproducing a failure of main.partition_innodb on 10.4 have failed so far, both with and without rr. knielsen, if you are able to reproduce this failure on 10.4 or 10.5 under rr, I’d love to see the analysis of it.

Theoretically, the number of rows could be off by so much that a different query plan would be chosen. If we mask the number of rows in the result, then such cases would become even harder to analyze.

Comment by Kristian Nielsen [ 2023-11-21 ]

marko, my previous comment explains how I reproduced it. I have a saved rr run on my laptop; this is 10.4 (d415f600cde322288742596e870685a03ebc2ecf). I can provide it to you if you can use it somehow (Debian bullseye). I can also do some analysis if needed.

But it really comes down to the second stacktrace, which happens in the middle of the insert. It shows the dict_stats_update_persistent() function in a separate thread updating stat_n_rows from 2 to 3. Since the insert of 4 rows increments the value 4 times, we end up with 5 instead of the 4 expected by the test.

Comment by Kristian Nielsen [ 2023-11-21 ]

Or even simpler, apply the patch below. With it I was able to reproduce more or less reliably, this time in 10.5:

./mtr main.partition_innodb --enable-disabled --rr

Note that in 10.5 there will be a result difference due to the deprecation warning on innodb_thread_concurrency, but I also got a row count difference each time I ran it:

@@ -421,7 +427,7 @@
 insert into t1 values (NULL), (NULL), (NULL), (NULL);
 show table status;
 Name	Engine	Version	Row_format	Rows	Avg_row_length	Data_length	Max_data_length	Index_length	Data_free	Auto_increment	Create_time	Update_time	Check_time	Collation	Checksum	Create_options	Comment	Max_index_length	Temporary
-t1	InnoDB	10	Dynamic	8	2048	16384	0	0	0	9	Create_time	Update_time	NULL	latin1_swedish_ci	NULL	partitioned		0	N
+t1	InnoDB	10	Dynamic	9	1820	16384	0	0	0	9	Create_time	Update_time	NULL	latin1_swedish_ci	NULL	partitioned		0	N
 drop table t1;
 create table t1 (a int)
 partition by key (a)

diff --git a/storage/innobase/dict/dict0stats.cc b/storage/innobase/dict/dict0stats.cc
index ed8f9561a90..af93f6eabea 100644
--- a/storage/innobase/dict/dict0stats.cc
+++ b/storage/innobase/dict/dict0stats.cc
@@ -3191,6 +3191,7 @@ dict_stats_update(
 {
        ut_ad(!mutex_own(&dict_sys.mutex));
 
+my_sleep(25000);
        if (!table->is_readable()) {
                return (dict_stats_report_error(table));
        } else if (srv_force_recovery > SRV_FORCE_NO_IBUF_MERGE) {
diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
index 4da48c401f3..6c360651bd8 100644
--- a/storage/innobase/row/row0mysql.cc
+++ b/storage/innobase/row/row0mysql.cc
@@ -1481,6 +1481,7 @@ row_insert_for_mysql(
                srv_stats.n_rows_inserted.inc(size_t(trx->id));
        }
 
+my_sleep(10000);
        /* Not protected by dict_sys.mutex for performance
        reasons, we would rather get garbage in stat_n_rows (which is
        just an estimate anyway) than protecting the following code

If we don't want to mask the number of rows, then disabling the background statistics calculation around the SHOW TABLE STATUS would be a good way to fix the random test failures.

Comment by Marko Mäkelä [ 2023-11-22 ]

knielsen, thank you. I tried reproducing this on a different system today, with no luck until I added your suggested sleep to row_insert_for_mysql(). I got an rr replay trace for a wrong record count of the last partition of the following:

--echo #
--echo # Bug#51830: Incorrect partition pruning on range partition (regression)
--echo #
CREATE TABLE t1 (a INT NOT NULL)
ENGINE = InnoDB
PARTITION BY RANGE(a)
(PARTITION p10 VALUES LESS THAN (10),
 PARTITION p30 VALUES LESS THAN (30),
 PARTITION p50 VALUES LESS THAN (50),
 PARTITION p70 VALUES LESS THAN (70),
 PARTITION p90 VALUES LESS THAN (90));
INSERT INTO t1 VALUES (10),(30),(50);
INSERT INTO t1 VALUES (70);
INSERT INTO t1 VALUES (80);
INSERT INTO t1 VALUES (89);

While the INSERT INTO t1 VALUES (89) is sleeping, we have the following call on the same partition test/t1#P#p90:

10.4 f5fdb9cec52e415628c2b8a353641e425d720695

Thread 15 (Thread 254542.254694 (mysqld)):
#0  dict_stats_update (table=table@entry=0x62e2141b45e0, stats_upd_option=stats_upd_option@entry=DICT_STATS_RECALC_PERSISTENT) at /mariadb/10.4/storage/innobase/dict/dict0stats.cc:3190
#1  0x0000562aa9295c45 in dict_stats_process_entry_from_recalc_pool () at /mariadb/10.4/storage/innobase/dict/dict0stats_bg.cc:431
#2  0x0000562aa9295c91 in dict_stats_thread () at /mariadb/10.4/storage/innobase/dict/dict0stats_bg.cc:515
#3  0x0000057e7ffcb3ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#4  0x0000057e8004b970 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

This thread happens to assign the correct value 3 to table->stat_n_rows (even though, due to MDEV-4750, I was expecting it to use the "transient" statistics):

10.4 f5fdb9cec52e415628c2b8a353641e425d720695

0x0000562aa929330d in dict_stats_update_persistent (table=table@entry=0x62e2141b45e0) at /mariadb/10.4/storage/innobase/dict/dict0stats.cc:2266
2266		table->stat_n_rows = index->stat_n_diff_key_vals[n_unique - 1];

This count was correctly calculated, because the index consists of a single page:

10.4 f5fdb9cec52e415628c2b8a353641e425d720695

Thread 15 hit Hardware watchpoint 5: -location *index->stat_n_diff_key_vals
 
Old value = 2
New value = 3
dict_stats_analyze_index_level (index=index@entry=0x62e2141bd7d0, level=level@entry=0, n_diff=0x62e2141f10f8, total_recs=total_recs@entry=0x430c67b5b548, total_pages=total_pages@entry=0x430c67b5b550, 
    n_diff_boundaries=n_diff_boundaries@entry=0x0, mtr=0x430c67b5b560) at /mariadb/10.4/storage/innobase/dict/dict0stats.cc:1190
1190				for (i = matched_fields; i < n_uniq; i++) {
(rr) bt
#0  dict_stats_analyze_index_level (index=index@entry=0x62e2141bd7d0, level=level@entry=0, n_diff=0x62e2141f10f8, total_recs=total_recs@entry=0x430c67b5b548, total_pages=total_pages@entry=0x430c67b5b550, 
    n_diff_boundaries=n_diff_boundaries@entry=0x0, mtr=0x430c67b5b560) at /mariadb/10.4/storage/innobase/dict/dict0stats.cc:1190
#1  0x0000562aa92928a1 in dict_stats_analyze_index (index=index@entry=0x62e2141bd7d0) at /mariadb/10.4/storage/innobase/dict/dict0stats.cc:1998
#2  0x0000562aa92931e2 in dict_stats_update_persistent (table=table@entry=0x62e2141b45e0) at /mariadb/10.4/storage/innobase/dict/dict0stats.cc:2253
#3  0x0000562aa9293f67 in dict_stats_update (table=table@entry=0x62e2141b45e0, stats_upd_option=stats_upd_option@entry=DICT_STATS_RECALC_PERSISTENT) at /mariadb/10.4/storage/innobase/dict/dict0stats.cc:3228
#4  0x0000562aa9295c45 in dict_stats_process_entry_from_recalc_pool () at /mariadb/10.4/storage/innobase/dict/dict0stats_bg.cc:431
#5  0x0000562aa9295c91 in dict_stats_thread () at /mariadb/10.4/storage/innobase/dict/dict0stats_bg.cc:515
#6  0x0000057e7ffcb3ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#7  0x0000057e8004b970 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

A little later, row_insert_for_mysql() would increase table->stat_n_rows from 3 to 4. The count would remain at that value until DROP TABLE t1.

I don’t think that there is an easy fix for this. We might want to implement a special case for a table that was created while the server process is running, and was never evicted from the dict_sys cache. In those cases, we could in theory guarantee that dict_table_t::stat_n_rows matches the READ UNCOMMITTED number of rows in the table. But there are also other statistics that would not be that feasible to keep up to date on each operation.

The most feasible fix could indeed be to just mask the number of rows in the affected tests. psergei, how much do you think the number of records could deviate in the tests main.partition_innodb, main.innodb_ext_key, or main.stat_tables before a different query plan could be chosen? If all these tests are only using INSERT (no UPDATE or DELETE) and a single index page per table or partition, I think that the worst case should be that the statistics are off by one row.

Comment by Sergei Petrunia [ 2023-11-24 ]

Sergei Petrunia, how much do you think the number of records could deviate in the tests main.partition_innodb, main.innodb_ext_key, or main.stat_tables before a different query plan could be chosen?

No idea. There are test cases there that have secondary indexes and only a few rows. This means the difference between query plan costs could be very small.

Comment by Sergei Petrunia [ 2023-11-24 ]

marko, if we just do ANALYZE TABLE before doing any SELECTs on the table, will this guarantee that the #rows after it is stable?

(IIRC the regular approach of "use #rows_in_table from EITS" is not possible here, as EITS estimates won't be used for partitioned tables. There are no per-partition EITS statistics, and partition pruning can make whole-table estimates very bad.)

Comment by Marko Mäkelä [ 2023-11-24 ]

psergei, yes, ANALYZE TABLE should produce stable results, ever since MDEV-10682 was fixed. Can you revise the affected tests accordingly?
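The revision could then amount to something like this in the affected tests (a sketch; exact placement depends on each test):

```sql
insert into t1 values (0), (1), (2), (3);
-- Recompute the persistent statistics synchronously, so that a pending
-- background recalculation can no longer change the Rows estimate:
analyze table t1;
show table status;
```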

Comment by Kristian Nielsen [ 2023-11-28 ]

Pushed to 10.4. But the test should be re-enabled in 10.5 (there's some .result update needed IIRC) before closing this bug.

Comment by Sergei Petrunia [ 2023-12-04 ]

Enabled the test in 10.5:

commit b97f4c340ead53c42ec674da19b420e57bf40ddc
Author: Sergei Petrunia <sergey@mariadb.com>
Date:   Mon Dec 4 20:34:45 2023 +0300
 
    Followup for fix for MDEV-20169: enable main.partition_innodb

Generated at Thu Feb 08 08:57:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.