[MDEV-24847] Server crashes in Field_long::cmp()[field.cc:4438] when index merge is used Created: 2021-02-11  Updated: 2021-10-28  Resolved: 2021-07-02

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.5.8
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Valerii Kravchuk Assignee: Sergei Petrunia
Resolution: Cannot Reproduce Votes: 0
Labels: index_merge
Environment: Windows

 Description   

Server crashes on Windows with the following stack trace:

Thread pointer: 0x20e647bafb8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
server.dll!Field_long::cmp()[field.cc:4438]
server.dll!ha_innobase::cmp_ref()[ha_innodb.cc:16831]
server.dll!_downheap()[queues.c:307]
server.dll!merge_buffers()[filesort.cc:1982]
server.dll!merge_many_buff()[filesort.cc:1636]
server.dll!Unique::merge()[uniques.cc:749]
server.dll!Unique::get()[uniques.cc:826]
server.dll!read_keys_and_merge_scans()[opt_range.cc:11847]
server.dll!QUICK_INDEX_INTERSECT_SELECT::read_keys_and_merge()[opt_range.cc:11920]
server.dll!join_init_read_record()[sql_select.cc:21547]
server.dll!JOIN_CACHE::join_matching_records()[sql_join_cache.cc:2260]
server.dll!JOIN_CACHE::join_records()[sql_join_cache.cc:2092]
server.dll!sub_select_cache()[sql_select.cc:20414]
server.dll!evaluate_join_record()[sql_select.cc:20843]
server.dll!sub_select()[sql_select.cc:20619]
server.dll!evaluate_join_record()[sql_select.cc:20843]
server.dll!sub_select()[sql_select.cc:20619]
server.dll!evaluate_join_record()[sql_select.cc:20843]
server.dll!sub_select()[sql_select.cc:20619]
server.dll!evaluate_join_record()[sql_select.cc:20843]
server.dll!sub_select()[sql_select.cc:20658]
server.dll!do_select()[sql_select.cc:20153]
server.dll!JOIN::exec_inner()[sql_select.cc:4459]
server.dll!JOIN::exec()[sql_select.cc:4241]
server.dll!mysql_select()[sql_select.cc:4657]
server.dll!handle_select()[sql_select.cc:417]
server.dll!execute_sqlcom_select()[sql_parse.cc:6266]
server.dll!mysql_execute_command()[sql_parse.cc:3968]
server.dll!mysql_parse()[sql_parse.cc:8048]
server.dll!dispatch_command()[sql_parse.cc:1875]
server.dll!do_command()[sql_parse.cc:1353]
server.dll!threadpool_process_request()[threadpool_common.cc:363]
server.dll!tp_callback()[threadpool_common.cc:194]
ntdll.dll!RtlReleaseSRWLockExclusive()
ntdll.dll!RtlReleaseSRWLockExclusive()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x20e647c8510): SELECT DISTINCTROW  ...

The crash occurs while executing a complex SELECT query that joins many tables and uses index_merge in its plan. It seems to happen here:

   4433 int Field_long::cmp(const uchar *a_ptr, const uchar *b_ptr) const
   4434 {
   4435   int32 a,b;
   4436   a=sint4korr(a_ptr);
   4437   b=sint4korr(b_ptr);
   4438   if (unsigned_flag) -- < HERE
   4439     return ((uint32) a < (uint32) b) ? -1 : ((uint32) a > (uint32) b) ? 1 : 0;
   4440   return (a < b) ? -1 : (a > b) ? 1 : 0;
   4441 }
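
For reference, the comparison above can be sketched as a standalone function. This is only an illustration: `read_int32_le` and `cmp_long` are hypothetical names standing in for the server's `sint4korr` macro and the `Field_long::cmp` member, and `unsigned_flag` is passed as a parameter instead of being read from the field object. The only memory the function touches through its pointer arguments is the two 4-byte loads, so a bad `a_ptr` or `b_ptr` would fault there even if the debug information attributes the fault to a neighbouring line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for sint4korr(): assemble a 4-byte little-endian
// value from memory. This is where an invalid pointer would fault.
inline std::int32_t read_int32_le(const unsigned char *p) {
  const std::uint32_t u = static_cast<std::uint32_t>(p[0]) |
                          (static_cast<std::uint32_t>(p[1]) << 8) |
                          (static_cast<std::uint32_t>(p[2]) << 16) |
                          (static_cast<std::uint32_t>(p[3]) << 24);
  return static_cast<std::int32_t>(u);
}

// Hypothetical free-function version of the comparison shown above.
// Returns -1, 0 or 1, comparing as unsigned when unsigned_flag is set.
inline int cmp_long(const unsigned char *a_ptr, const unsigned char *b_ptr,
                    bool unsigned_flag) {
  const std::int32_t a = read_int32_le(a_ptr);
  const std::int32_t b = read_int32_le(b_ptr);
  if (unsigned_flag) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    return ua < ub ? -1 : (ua > ub ? 1 : 0);
  }
  return a < b ? -1 : (a > b ? 1 : 0);
}
```

With the bytes `FF FF FF FF` as the first key and `01 00 00 00` as the second, the signed comparison treats the first key as -1 and orders it before the second, while the unsigned comparison orders it after.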

and the Windows minidump reports this:

CONTEXT:  (.ecxr)
.ecxr
rax=00007ff9d90e6488 rbx=0000020e5d587ef0 rcx=0000020e5971cca0
rdx=4148214a00000000 rsi=4148214a00000000 rdi=0000020e6af3debc
rip=00007ff9d87bf6f7 rsp=00000076df8ad128 rbp=0000020e5d587f10
 r8=0000020e6af3debc  r9=0000000000192813 r10=0000020e6af3deb4
r11=0000020e6af87058 r12=0000000000000000 r13=0000000000000007
r14=0000020e5971cca0 r15=0000000000000003
iopl=0         nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
server!Field_long::cmp+0x7:
00007ff9`d87bf6f7 448b0a          mov     r9d,dword ptr [rdx] ds:4148214a`00000000=????????
.cxr
Resetting default scope
 
FAULTING_IP: 
server!Field_long::cmp+7 [D:\winx64-packages\build\src\sql\field.cc @ 4438]
00007ff9`d87bf6f7 448b0a          mov     r9d,dword ptr [rdx]
 
EXCEPTION_RECORD:  (.exr -1)
.exr -1
ExceptionAddress: 00007ff9d87bf6f7 (server!Field_long::cmp+0x0000000000000007)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: ffffffffffffffff
Attempt to read from address ffffffffffffffff

It is not clear why.
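
One possible reading of the dump: the faulting instruction loads through `rdx`, which holds `4148214a00000000`. On x86-64 an address is canonical only if its upper bits are all copies of the highest implemented address bit, and Windows reports an access violation on a non-canonical address with `ffffffffffffffff` as the faulting address. The sketch below checks this under the assumption of 48-bit virtual addressing; `is_canonical48` is a hypothetical helper name, not something taken from the server code.

```cpp
#include <cassert>
#include <cstdint>

// True if addr is canonical under 48-bit virtual addressing (an assumption):
// bits 63..47 must all be equal, i.e. all zero or all one.
constexpr bool is_canonical48(std::uint64_t addr) {
  const std::uint64_t top = addr >> 47;  // the upper 17 bits
  return top == 0 || top == 0x1FFFF;
}

// Register values copied from the .ecxr output above.
static_assert(!is_canonical48(0x4148214a00000000ULL), "rdx is non-canonical");
static_assert(is_canonical48(0x0000020e6af3debcULL), "r8 is canonical");
```

If this reading is right, the pointer passed to the comparison was garbage rather than literally `0xFFFFFFFFFFFFFFFF`, which would point towards a corrupted sort buffer or heap entry rather than a deliberately set sentinel value.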



 Comments   
Comment by Sergei Petrunia [ 2021-07-01 ]

After loading the minidump, I see this:

Unhandled exception thrown: read access violation.
a_ptr was 0xFFFFFFFFFFFFFFFF.

Unfortunately, there is not much else.

The rowids belong to the InnoDB table; they are 8-byte integer fields.

This

+		t_file2	{pos_in_file=0 end_of_file=42 359 480 read_pos=0x0000020e66810028 <Error reading characters of string.> ...}	st_io_cache

seems to hint that we are sorting about 42 MB of data, which is roughly 5M rows.
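
The row estimate follows from dividing the `end_of_file` value by the rowid size. A minimal sketch, assuming the temporary file holds nothing but 8-byte rowids (the helper name `rows_in_file` is made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Number of fixed-size rowids that fit in a file of the given length.
constexpr std::uint64_t rows_in_file(std::uint64_t file_bytes,
                                     std::uint64_t rowid_bytes) {
  return file_bytes / rowid_bytes;
}

// end_of_file from the st_io_cache dump above, with 8-byte rowids.
static_assert(rows_in_file(42359480ULL, 8ULL) == 5294935ULL,
              "about 5.3M rowids");
static_assert(42359480ULL % 8ULL == 0ULL,
              "file length is a whole number of 8-byte rowids");
```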

Comment by Sergei Petrunia [ 2021-07-01 ]

It does not seem realistic to infer anything more from this information.

Comment by Sergei Petrunia [ 2021-07-02 ]

I tried to reproduce the crash, without success.
Please re-open when there's more data available. We'll need the contents of the table.

Another thing we need to know: does the problem happen every time the query is run, or only sometimes?
