[MDEV-15656] Assertion `is_last_prefix <= 0' failed in QUICK_GROUP_MIN_MAX_SELECT::get_next Created: 2018-03-25  Updated: 2023-12-11  Resolved: 2023-12-07

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Oleg Smirnov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates MDEV-29243 Assertion `is_last_prefix <= 0' faile... Closed
Relates
relates to MDEV-27601 Assertion `is_last_prefix <= 0' fail... Confirmed

 Description   

Note: the test case contains a little bit of everything – versioning, partitioning, views, InnoDB, concurrency, I got rid of whatever I could, the rest seems to play some role, even if only creating the right conditions for race.
Note: the test case is not deterministic, don't add it to the regression suite, create a better one when the reason of the problem is clear.

--source include/have_innodb.inc
--source include/have_partition.inc
 
# Restoring default values
SET @lru_depth.save= @@innodb_lru_scan_depth, @stats.save= @@innodb_stats_persistent;
SET GLOBAL innodb_lru_scan_depth= 1024, innodb_stats_persistent= ON;
 
CREATE OR REPLACE TABLE t1 (pk INT AUTO_INCREMENT, a VARCHAR(4), PRIMARY KEY (pk)) ENGINE=InnoDB 
WITH SYSTEM VERSIONING PARTITION BY key (pk) PARTITIONS 2;
 
CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1;
INSERT INTO t1 (a) VALUES  
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo');
 
--connect (con1,localhost,root,,test)
 
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
UPDATE t1 SET a = 'qux';
 
--let $run= 100
--disable_result_log
while ($run)
{
  --send
    SELECT DISTINCT pk FROM v1;
  --connection default
  INSERT INTO v1 (pk) VALUES (NULL);
  --connection con1
  --reap
  --dec $run
  --echo # $run trials left
}
--enable_result_log
 
# Cleanup
DROP VIEW v1;
DROP TABLE t1;
SET GLOBAL innodb_lru_scan_depth= @lur_depth.save, innodb_stats_persistent= @stats.save;

10.3 ad647cc84

mysqld: /data/src/10.3/sql/opt_range.cc:14009: virtual int QUICK_GROUP_MIN_MAX_SELECT::get_next(): Assertion `is_last_prefix <= 0' failed.
180327  3:34:17 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f5a793c8ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x000055a6f2c37025 in QUICK_GROUP_MIN_MAX_SELECT::get_next (this=0x7f5a24023e20) at /data/src/10.3/sql/opt_range.cc:14009
#9  0x000055a6f2c424e8 in rr_quick (info=0x7f5a2401f080) at /data/src/10.3/sql/records.cc:366
#10 0x000055a6f274499b in READ_RECORD::read_record (this=0x7f5a2401f080) at /data/src/10.3/sql/records.h:73
#11 0x000055a6f284bc43 in sub_select (join=0x7f5a24016af8, join_tab=0x7f5a2401efb8, end_of_records=false) at /data/src/10.3/sql/sql_select.cc:19251
#12 0x000055a6f284b0ff in do_select (join=0x7f5a24016af8, procedure=0x0) at /data/src/10.3/sql/sql_select.cc:18772
#13 0x000055a6f28241b0 in JOIN::exec_inner (this=0x7f5a24016af8) at /data/src/10.3/sql/sql_select.cc:3987
#14 0x000055a6f2823648 in JOIN::exec (this=0x7f5a24016af8) at /data/src/10.3/sql/sql_select.cc:3781
#15 0x000055a6f2824889 in mysql_select (thd=0x7f5a24000b00, tables=0x7f5a24013b98, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748609, result=0x7f5a24016ad8, unit=0x7f5a24004938, select_lex=0x7f5a240050b0) at /data/src/10.3/sql/sql_select.cc:4186
#16 0x000055a6f2816b67 in handle_select (thd=0x7f5a24000b00, lex=0x7f5a24004870, result=0x7f5a24016ad8, setup_tables_done_option=0) at /data/src/10.3/sql/sql_select.cc:382
#17 0x000055a6f27e1d31 in execute_sqlcom_select (thd=0x7f5a24000b00, all_tables=0x7f5a24013b98) at /data/src/10.3/sql/sql_parse.cc:6552
#18 0x000055a6f27d834b in mysql_execute_command (thd=0x7f5a24000b00) at /data/src/10.3/sql/sql_parse.cc:3763
#19 0x000055a6f27e572a in mysql_parse (thd=0x7f5a24000b00, rawbuf=0x7f5a24013988 "SELECT DISTINCT pk FROM v1", length=26, parser_state=0x7f5a6e6a05d0, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:8009
#20 0x000055a6f27d2dde in dispatch_command (command=COM_QUERY, thd=0x7f5a24000b00, packet=0x7f5a2400b251 "", packet_length=26, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1838
#21 0x000055a6f27d181d in do_command (thd=0x7f5a24000b00) at /data/src/10.3/sql/sql_parse.cc:1383
#22 0x000055a6f2935c85 in do_handle_one_connection (connect=0x55a6f52a3530) at /data/src/10.3/sql/sql_connect.cc:1402
#23 0x000055a6f2935a12 in handle_one_connection (arg=0x55a6f52a3530) at /data/src/10.3/sql/sql_connect.cc:1308
#24 0x000055a6f2dbd029 in pfs_spawn_thread (arg=0x55a6f52e5630) at /data/src/10.3/storage/perfschema/pfs.cc:1862
#25 0x00007f5a7b09f494 in start_thread (arg=0x7f5a6e6a1700) at pthread_create.c:333
#26 0x00007f5a7948593f in clone () from /lib/x86_64-linux-gnu/libc.so.6

No visible effect on non-debug build.
Couldn't check earlier versions because of versioning.



 Comments   
Comment by Eugene Kosov (Inactive) [ 2018-03-30 ]

The same crash without partitions, system versioning and views.

--source include/have_innodb.inc
 
# Restoring default values
SET @lru_depth.save= @@innodb_lru_scan_depth, @stats.save= @@innodb_stats_persistent;
SET GLOBAL innodb_lru_scan_depth= 1024, innodb_stats_persistent= ON;
 
CREATE TABLE t1 (
  pk INT AUTO_INCREMENT,
  a VARCHAR(4),
  row_start timestamp(6) default current_timestamp,
  PRIMARY KEY (pk, row_start)
) ENGINE=InnoDB;
 
INSERT INTO t1 (a) VALUES  
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo'),
('foo'),('bar'),('foo'),('bar'),('foo');
 
--connect (con1,localhost,root,,test)
 
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
 
--let $run= 100
--disable_result_log
while ($run)
{
 --send
    SELECT DISTINCT pk FROM t1;
 --connection default
  INSERT INTO t1 (pk) VALUES (NULL);
 --connection con1
 --reap
  --dec $run
  --echo # $run trials left
}
--enable_result_log
 
# Cleanup
--disconnect con1
--connection default
DROP TABLE t1;
select @lru_depth.save;
SET GLOBAL innodb_lru_scan_depth= @lru_depth.save, innodb_stats_persistent= @stats.save;

Comment by Elena Stepanova [ 2018-11-18 ]

With kevg's test case above, reproducible on all 10.x, I've updated affected and fix versions accordingly. Run with --repeat=N if it doesn't fail right away.

Comment by Ramesh Sivaraman [ 2021-03-26 ]

Server debug build is crashing with similar assertion when we set group_concat_max_len to 0 and run INSERT ... SELECT DISTINCT statement with debug_dbug sync point.

SET SESSION group_concat_max_len=0;
CREATE TABLE t0 (i INT,KEY USING BTREE (i)) ENGINE=InnoDB;
SET debug_dbug='+d,kill_join_init_read_record';
INSERT INTO t0 VALUES(0xA0C0);
INSERT INTO t0 SELECT DISTINCT i FROM t0;

Leads to:

10.6.0 8dd35a2507f8d63ca8df9335d2c6072d5c0e3b86 (Debug)

Core was generated by `/test/MD160321-mariadb-10.6.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
[Current thread is 1 (Thread 0x1521b57b1700 (LWP 2310674))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x000055dbaed17d0b in my_write_core (sig=sig@entry=6) at /test/10.6_dbg/mysys/stacktrace.c:424
#2  0x000055dbae4af313 in handle_fatal_signal (sig=6) at /test/10.6_dbg/sql/signal_handler.cc:331
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x00001521cc619859 in __GI_abort () at abort.c:79
#6  0x00001521cc619729 in __assert_fail_base (fmt=0x1521cc7af588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55dbaf04590d "is_last_prefix <= 0", file=0x55dbaf042560 "/test/10.6_dbg/sql/opt_range.cc", line=15281, function=<optimized out>) at assert.c:92
#7  0x00001521cc62af36 in __GI___assert_fail (assertion=assertion@entry=0x55dbaf04590d "is_last_prefix <= 0", file=file@entry=0x55dbaf042560 "/test/10.6_dbg/sql/opt_range.cc", line=line@entry=15281, function=function@entry=0x55dbaf044640 "virtual int QUICK_GROUP_MIN_MAX_SELECT::get_next()") at assert.c:101
#8  0x000055dbae68039c in QUICK_GROUP_MIN_MAX_SELECT::get_next (this=0x15217c070c30) at /test/10.6_dbg/sql/opt_range.cc:15281
#9  0x000055dbae68a4ae in rr_quick (info=0x15217c016e80) at /test/10.6_dbg/sql/records.cc:403
#10 0x000055dbae252e70 in READ_RECORD::read_record (this=0x15217c016e80) at /test/10.6_dbg/sql/records.h:81
#11 join_init_read_record (tab=0x15217c016db8) at /test/10.6_dbg/sql/sql_select.cc:21727
#12 0x000055dbae239962 in sub_select (join=0x15217c0159b8, join_tab=0x15217c016db8, end_of_records=<optimized out>) at /test/10.6_dbg/sql/sql_select.cc:20749
#13 0x000055dbae27208f in do_select (procedure=0x0, join=0x15217c0159b8) at /test/10.6_dbg/sql/sql_select.cc:20299
#14 JOIN::exec_inner (this=this@entry=0x15217c0159b8) at /test/10.6_dbg/sql/sql_select.cc:4477
#15 0x000055dbae272540 in JOIN::exec (this=this@entry=0x15217c0159b8) at /test/10.6_dbg/sql/sql_select.cc:4257
#16 0x000055dbae270619 in mysql_select (thd=thd@entry=0x15217c000db8, tables=0x15217c014988, fields=@0x15217c0144c0: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x15217c014940, last = 0x15217c014940, elements = 1}, <No data fields>}, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2202244745985, result=0x15217c015900, unit=0x15217c004f90, select_lex=0x15217c014370) at /test/10.6_dbg/sql/sql_select.cc:4733
#17 0x000055dbae2708df in handle_select (thd=thd@entry=0x15217c000db8, lex=lex@entry=0x15217c004ec8, result=result@entry=0x15217c015900, setup_tables_done_option=setup_tables_done_option@entry=1073741824) at /test/10.6_dbg/sql/sql_select.cc:417
#18 0x000055dbae1f415c in mysql_execute_command (thd=thd@entry=0x15217c000db8) at /test/10.6_dbg/sql/sql_parse.cc:4691
#19 0x000055dbae1de876 in mysql_parse (thd=thd@entry=0x15217c000db8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x1521b57b0410) at /test/10.6_dbg/sql/sql_parse.cc:7998
#20 0x000055dbae1ed1e7 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x15217c000db8, packet=packet@entry=0x15217c00b359 "INSERT INTO t0 SELECT DISTINCT i FROM t0", packet_length=packet_length@entry=40, blocking=blocking@entry=true) at /test/10.6_dbg/sql/sql_class.h:1318
#21 0x000055dbae1f05c1 in do_command (thd=0x15217c000db8, blocking=blocking@entry=true) at /test/10.6_dbg/sql/sql_parse.cc:1397
#22 0x000055dbae348178 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55dbb0e39218, put_in_cache=put_in_cache@entry=true) at /test/10.6_dbg/sql/sql_connect.cc:1410
#23 0x000055dbae34877d in handle_one_connection (arg=arg@entry=0x55dbb0e39218) at /test/10.6_dbg/sql/sql_connect.cc:1312
#24 0x000055dbae7f3a5b in pfs_spawn_thread (arg=0x55dbb0d5da58) at /test/10.6_dbg/storage/perfschema/pfs.cc:2201
#25 0x00001521ccb27609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#26 0x00001521cc716293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.5.10 (dbg), 10.6.0 (dbg)

Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.2.38 (dbg), 10.3.29 (dbg), 10.4.19 (dbg)

Comment by Alice Sherepa [ 2022-03-29 ]

reproducible on 10.4-10.8 as

./mtr main.ctype_utf8mb4_innodb  --mysqld=--default-storage-engine=innodb
 
At line 1844: query 'SELECT COUNT(DISTINCT a) FROM t1' failed: <Unknown> (2013): Lost connection to server during query
 
The result from queries just before the failure was:
< snip >
ALTER TABLE t1 MODIFY a VARCHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
SELECT id,HEX(a) FROM t1 ORDER BY a;
id	HEX(a)
1	61
2	C280
3	DFBF
4	E0A080
6	F0908080
7	F48FBFBF
5	EFBFBF
SELECT id,HEX(a) FROM t1 ORDER BY a DESC,id DESC;
id	HEX(a)
5	EFBFBF
7	F48FBFBF
6	F0908080
4	E0A080
3	DFBF
2	C280
1	61
SELECT COUNT(DISTINCT a) FROM t1;

Version: '10.8.3-MariaDB-debug-log' 
mariadbd:/10.8/src/sql/opt_range.cc:15436: virtual int QUICK_GROUP_MIN_MAX_SELECT::get_next(): Assertion `is_last_prefix <= 0' failed.
220329 17:27:37 [ERROR] mysqld got signal 6 ;
 
Server version: 10.8.3-MariaDB-debug-log
 
??:0(abort)[0x7f4bc0da2859]
/lib/x86_64-linux-gnu/libc.so.6(+0x22729)[0x7f4bc0da2729]
??:0(__assert_fail)[0x7f4bc0db4006]
sql/opt_range.cc:15440(QUICK_GROUP_MIN_MAX_SELECT::get_next())[0x55c36b86e6d8]
sql/records.cc:403(rr_quick(READ_RECORD*))[0x55c36b8b6b16]
sql/records.h:81(READ_RECORD::read_record())[0x55c36b882ff0]
sql/sql_select.cc:21114(sub_select(JOIN*, st_join_table*, bool))[0x55c36bd20790]
sql/sql_select.cc:20640(do_select(JOIN*, Procedure*))[0x55c36bd1e51a]
sql/sql_select.cc:4749(JOIN::exec_inner())[0x55c36bcaa844]
sql/sql_select.cc:4528(JOIN::exec())[0x55c36bca7d44]
sql/sql_select.cc:5009(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x55c36bcac2b3]
sql/sql_select.cc:543(handle_select(THD*, LEX*, select_result*, unsigned long))[0x55c36bc7c56b]
sql/sql_parse.cc:6252(execute_sqlcom_select(THD*, TABLE_LIST*))[0x55c36bba2e7e]
sql/sql_parse.cc:3943(mysql_execute_command(THD*, bool))[0x55c36bb91a47]
sql/sql_parse.cc:8027(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55c36bbae11d]
sql/sql_parse.cc:1896(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x55c36bb84348]
sql/sql_parse.cc:1402(do_command(THD*, bool))[0x55c36bb8106c]
sql/sql_connect.cc:1418(do_handle_one_connection(CONNECT*, bool))[0x55c36c028d8d]
sql/sql_connect.cc:1314(handle_one_connection)[0x55c36c028619]
perfschema/pfs.cc:2203(pfs_spawn_thread)[0x55c36ccf2244]
nptl/pthread_create.c:478(start_thread)[0x7f4bc12ce609]
??:0(clone)[0x7f4bc0e9f163]
 
Query (0x6290001092a8): SELECT COUNT(DISTINCT a) FROM t1

Comment by Elena Stepanova [ 2022-08-04 ]

A test case from MDEV-29243 may be worth checking after the fix.

Comment by Oleksandr Byelkin [ 2023-09-25 ]

I checked it after MDEV-29243 was fixed, still repeatables

Comment by Oleksandr Byelkin [ 2023-09-25 ]

It hit assert assert in range optimiser, maybe optimizer can esily to find the cause

Comment by Oleg Smirnov [ 2023-11-09 ]

1. Assertion DBUG_ASSERT(is_last_prefix <= 0) fails here during processing of "SELECT DISTINCT pk FROM t1":

QUICK_GROUP_MIN_MAX_SELECT::get_next()

    result= next_prefix();
    /*
      Check if this is the last group prefix. Notice that at this point
      this->record contains the current prefix in record format.
    */
    if (!result)
    {
      is_last_prefix= key_cmp(index_info->key_part, last_prefix,
                              group_prefix_len);
      DBUG_ASSERT(is_last_prefix <= 0);
    }

because is_last_prefix==1 which means the value of index_info->key_part > last_prefix.

2. last_value of the index is initialized at the start of the SELECT processing , at QUICK_GROUP_MIN_MAX_SELECT::reset(void) which is called from join_init_read_record().

3. After that QUICK_GROUP_MIN_MAX_SELECT::get_next() is called multiple times to iterate through all index values. The check [1] is performed on every retrieved value.

4. During the cycle [3] another thread inserts a new row into the table (INSERT INTO t1 (pk) VALUES (NULL)) with auto_increment field pk, so last_value determined at [2] becomes invalid. For example, initial last_value=42, then index scanning starts, then a new row inserted, so the actual last value of the index turns to 43. When the checkpoint [1] is reached, there is an assertion failure.

Comment by Oleg Smirnov [ 2023-11-09 ]

The assert [1] has been there from the start (see commit in the MySQL repository):

commit e2cd3dd1ce66d59eaea06e18f04a6fa5f1848684 (HEAD)
Author: timour@mysql.com <>
Date:   Fri Aug 27 16:37:13 2004 +0300
 
    WL#1724 "Min/Max Optimization for Queries with Group By Clause"
    - after-review changes
    - merged with the source tree from 204-08-27

Probably such a scenario when another thread updates the index was not expected by the author. I suggest the following fix:

diff --git a/sql/opt_range.cc b/sql/opt_range.cc
index 0049be0daf4..04e31b24e0b 100644
--- a/sql/opt_range.cc
+++ b/sql/opt_range.cc
@@ -15024,7 +15024,16 @@ int QUICK_GROUP_MIN_MAX_SELECT::get_next()
     {
       is_last_prefix= key_cmp(index_info->key_part, last_prefix,
                               group_prefix_len);
-      DBUG_ASSERT(is_last_prefix <= 0);
+      if (is_last_prefix == 1)
+      {
+         /*
+            Current key value is larger than last_value, this can happen if
+            another thread updated the index after QUICK_GROUP_MIN_MAX_SELECT
+            was initialized.
+         */
+        result= HA_ERR_END_OF_FILE;
+        break;
+      }
     }
     else 
     {

Comment by Sergei Petrunia [ 2023-11-09 ]

Agree with analysis, but I have questions to the patch.

Should we really return HA_ERR_END_OF_FILE in this case? What about assuming that the current prefix is the new last prefix and continuing processing?

(A side comment: it seems quite odd that SELECT DISTINCT pk FROM t1 is processed with loose scan... We do know that all PK values are distinct, and the statistics should tell us so. What is the point of trying to do Loose Scan's jumps in this case?)

Comment by Oleg Smirnov [ 2023-11-10 ]

I thought about continuing of the processing, in this case I think we don't need to store and check last_value at all, just loop until HA_ERR_KEY_NOT_FOUND or HA_ERR_END_OF_FILE. But this change would be rather intrusive, let me investigate this further.

About the side comment: the "pk" field name is misleading here 'cause this field is not a PRIMARY KEY actually. Actual PRIMARY KEY consists of two fields {pk, row_start}, that may be the reason for the selection of loose index scan.

Comment by Oleg Smirnov [ 2023-11-14 ]

psergei, can you please review bb-10.4-MDEV-15656?

Comment by Sergei Petrunia [ 2023-11-15 ]

oleg.smirnov, the patch looks good to me, although I'm still concerned if there was some hidden purpose in last_prefix. Let's request a testing pass for it.

Comment by Lena Startseva [ 2023-12-06 ]

Testing done. Ok to push

Comment by Oleg Smirnov [ 2023-12-07 ]

Pushed to 10.4

Generated at Thu Feb 08 08:23:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.