[MDEV-30118] exception in ha_maria::extra Created: 2022-11-29  Updated: 2023-01-05  Resolved: 2023-01-05

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Aria
Affects Version/s: 10.5.11, 10.10.2
Fix Version/s: 10.11.2, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Major
Reporter: Jongbeom Park Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None
Environment:

apache-tomcat-8.0.50
java - 1.8
os : windows Server 2019 Standard
cpu : Xeon(R) Gold 6240 CPU 2.60GHz
ram : 32g


Attachments: PNG File image-2022-12-02-16-47-03-935.png     PNG File image-2022-12-05-17-12-47-616.png     PNG File image-2022-12-06-09-59-25-195.png     PNG File image-2022-12-06-09-59-29-911.png     PNG File screenshot-1.png    
Issue Links:
Relates
relates to MDEV-30162 Fix occasional "Permission denied" on... Closed
relates to MDEV-30117 MariaDB Service keeps shutting down w... Closed

 Description   

Even though I used version 10.5.11 previously and reinstalled it with 10.10.2 after the issue occurred, the same error occurs.
The error usually works normally, but when select query is executed, it shuts down after 3 seconds.
Additionally, SET STATEMENT max_statement_time=5000 FOR is automatically appended to queries remaining in the err file. Why is that?
Of course, there are also unattached queries.

[ERROR] mysqld got exception 0xc0000005 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.10.2-MariaDB
key_buffer_size=805306368
read_buffer_size=25165824
max_used_connections=37
max_threads=65537
thread_count=23
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5906501846 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x21f27b97ef8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
server.dll!ha_maria::extra()[ha_maria.cc:2682]
server.dll!JOIN::cleanup()[sql_select.cc:14901]
server.dll!JOIN::join_free()[sql_select.cc:14773]
server.dll!do_select()[sql_select.cc:21408]
server.dll!JOIN::exec_inner()[sql_select.cc:4823]
server.dll!mysql_select()[sql_select.cc:5081]
server.dll!handle_select()[sql_select.cc:581]
server.dll!execute_sqlcom_select()[sql_parse.cc:6261]
server.dll!mysql_execute_command()[sql_parse.cc:3945]
server.dll!mysql_parse()[sql_parse.cc:8025]
server.dll!dispatch_command()[sql_parse.cc:1896]
server.dll!do_command()[sql_parse.cc:1407]
server.dll!tp_callback()[threadpool_common.cc:245]
KERNEL32.DLL!LCMapStringEx()
ntdll.dll!RtlAddRefActivationContext()
ntdll.dll!RtlAcquireSRWLockExclusive()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x193f57092d0): SET STATEMENT max_statement_time=5000 FOR 
        SELECT ATTACH_GRP_NO    
             , ATTACH
             , MODULE
             , trim(CD_FILE_TYPE) AS CD_FILE_TYPE
             , NAME
             , CONCAT(LEFT(NAME, INSTR(NAME, SUBSTRING_INDEX(NAME, '_', -1))-2), '.', FORMAT) AS ORIGIN_NAME
             , PATH
             , FORMAT
             , FILE_SIZE
               , @ROWNUM := @ROWNUM +1 AS SEQ_DSP
             , MEMO
             , CNT_DOWNLOAD
             , DATE_C
             , DATE_E
             , THUMB_YN
          FROM SYS_ATTACH , (SELECT @ROWNUM := 0)RN
         WHERE CD_DEL = '1'
          AND ATTACH_GRP_NO = '1234'
        ORDER BY SEQ_DSP,DATE_C DESC
Connection ID (thread ID): 4362
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file at C:\Program Files\MariaDB 10.5\data\

When the query is executed, it works without problems.



 Comments   
Comment by Daniel Black [ 2022-11-29 ]

Thanks for the bug report. jbpark can you include SHOW CREATE TABLE SYS_ATTACH output here?

Comment by Jongbeom Park [ 2022-11-29 ]

SHOW CREATE TABLE SYS_ATTACH output :
CREATE TABLE IF NOT EXISTS `sys_attach` (
`ATTACH_GRP_NO` bigint(20) NOT NULL,
`ATTACH` bigint(20) NOT NULL,
`COMPANY_ID` varchar(45) COLLATE utf8_bin DEFAULT NULL,
`MODULE` varchar(50) COLLATE utf8_bin DEFAULT NULL,
`CD_FILE_TYPE` char(2) COLLATE utf8_bin DEFAULT NULL,
`NAME` varchar(1000) COLLATE utf8_bin DEFAULT NULL,
`PATH` varchar(1000) COLLATE utf8_bin DEFAULT NULL,
`FORMAT` varchar(1000) COLLATE utf8_bin DEFAULT NULL,
`FILE_SIZE` bigint(20) DEFAULT NULL,
`SEQ_DSP` bigint(20) DEFAULT NULL,
`MEMO` varchar(2000) COLLATE utf8_bin DEFAULT NULL,
`CNT_DOWNLOAD` bigint(20) DEFAULT NULL,
`DATE_C` datetime DEFAULT NULL,
`DATE_E` datetime DEFAULT NULL,
`CD_DEL` char(2) COLLATE utf8_bin DEFAULT NULL,
`ORIGIN_NAME` varchar(1000) COLLATE utf8_bin DEFAULT NULL,
`THUMB_YN` char(1) COLLATE utf8_bin DEFAULT '0',
PRIMARY KEY (`ATTACH_GRP_NO`,`ATTACH`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

I've had the same error with other select queries besides that table.
Additionally SHOW PROCESSLIST; Why is SET STATEMENT max_statement_time=5000 FOR automatically prepended to some queries in ?

Comment by Daniel Black [ 2022-11-29 ]

Something in your application stack is prepending SET STATEMENT max_statement_time... to your queries. Perhaps something in your java connector.

Can you obtain the debug symbols zip from downloads and try to get a stack trace with:

cdb -lines -z   C:\Program Files\MariaDB 10.5\data\mariadbd.dmp -i  C:\Program Files\MariaDB 10.5\data\bin -y   C:\Program Files\MariaDB 10.5\data\bin -logo out.txt -c "!sym noisy;!sym prompts;.reload;lml;!uniqstack -p;q"

Adjust for the 10.10.2 version if that's easier too.

Comment by Marko Mäkelä [ 2022-11-29 ]

The stack trace points to the Aria storage engine, not InnoDB:

int ha_maria::extra(enum ha_extra_function operation)
{
  int tmp;
  TRN *old_trn= file->trn;
  if ((specialflag & SPECIAL_SAFE_MODE) && operation == HA_EXTRA_KEYREAD)
    return 0;

The crash is attributed to the line of the if statement, but that line is not dereferencing any pointers. specialflag is a global variable. I suspect that handler::file is an invalid pointer here, and the exception actually occurs while initializing the variable old_trn.

Comment by Vladislav Vaintroub [ 2022-11-29 ]

"SET STATEMENT max_statement_time=5000" is probably because you're using JDBC, and restrict query time. Or there is another client, like .NET, that has the ability to restricts the statement time, and was instructed to restrict that execution time to approx 1.5 hours

Comment by Daniel Black [ 2022-11-30 ]

Further to marko's analysis, ha_maria::file is set to 0 when ha_maria::close is called and ha_maria::drop_table.

ha_maria::drop_table is called by free_tmp_table. With the ordering in query that crashed its quite possible that an Aria tmp table is used to generate the result.

As a test I added a few assignments to null of freed handlers/tables in bb-10.10-danielblack-MDEV-30118-pkgtest-aria-extra-segv. A package built form this change (and other stable 10.10 fixes so far) is in https://ci.mariadb.org/30437/amd64-windows-packages/.

jbpark if you are willing to test the package that would be much appreciated to see if the same exception occurs that would be appreciated.

Other questions that might help produce a test case:

  • Was there a MariaDB version that worked correctly with this query?
  • What does EXPLAIN FORMAT=JSON $the_query show?
Comment by Jongbeom Park [ 2022-11-30 ]

I have been using the 10.5.11 version for about 10 months and there was no problem until the error occurred.

{
  "query_block": {
    "select_id": 1,
    "filesort": {
      "sort_key": "@ROWNUM:=@`ROWNUM` + 1, sys_attach.DATE_C desc",
      "temporary_table": {
        "nested_loop": [
          {
            "table": {
              "table_name": "<derived2>",
              "access_type": "system",
              "rows": 1,
              "filtered": 100,
              "materialized": {
                "query_block": {
                  "select_id": 2,
                  "table": {
                    "message": "No tables used"
                  }
                }
              }
            }
          },
          {
            "table": {
              "table_name": "SYS_ATTACH",
              "access_type": "ref",
              "possible_keys": ["PRIMARY"],
              "key": "PRIMARY",
              "key_length": "8",
              "used_key_parts": ["ATTACH_GRP_NO"],
              "ref": ["const"],
              "rows": 1,
              "filtered": 100,
              "attached_condition": "sys_attach.CD_DEL = '1'"
            }
          }
        ]
      }
    }
  }
}

Shutdown case of the same error for reference :

{
  "query_block": {
    "select_id": 1,
    "window_functions_computation": {
      "sorts": [
        {
          "filesort": {
            "sort_key": "@NEWROW:=@NEWROW + 1"
          }
        }
      ],
      "temporary_table": {
        "nested_loop": [
          {
            "table": {
              "table_name": "<derived8>",
              "access_type": "system",
              "rows": 1,
              "filtered": 100,
              "materialized": {
                "query_block": {
                  "select_id": 8,
                  "table": {
                    "message": "No tables used"
                  }
                }
              }
            }
          },
          {
            "table": {
              "table_name": "<derived2>",
              "access_type": "ALL",
              "rows": 213,
              "filtered": 100,
              "materialized": {
                "query_block": {
                  "select_id": 2,
                  "nested_loop": [
                    {
                      "table": {
                        "table_name": "<derived3>",
                        "access_type": "ALL",
                        "rows": 213,
                        "filtered": 100,
                        "materialized": {
                          "query_block": {
                            "select_id": 3,
                            "const_condition": "1",
                            "nested_loop": [
                              {
                                "table": {
                                  "table_name": "<derived4>",
                                  "access_type": "ALL",
                                  "rows": 213,
                                  "filtered": 100,
                                  "materialized": {
                                    "query_block": {
                                      "select_id": 4,
                                      "nested_loop": [
                                        {
                                          "table": {
                                            "table_name": "<derived5>",
                                            "access_type": "system",
                                            "rows": 1,
                                            "filtered": 100,
                                            "materialized": {
                                              "query_block": {
                                                "select_id": 5,
                                                "table": {
                                                  "message": "No tables used"
                                                }
                                              }
                                            }
                                          }
                                        },
                                        {
                                          "table": {
                                            "table_name": "SYS_DIVISION",
                                            "access_type": "ref",
                                            "possible_keys": [
                                              "sys_division_COMPANY_ID_IDX"
                                            ],
                                            "key": "sys_division_COMPANY_ID_IDX",
                                            "key_length": "159",
                                            "used_key_parts": [
                                              "COMPANY_ID",
                                              "CD_DEL"
                                            ],
                                            "ref": ["const", "const"],
                                            "rows": 213,
                                            "filtered": 100,
                                            "attached_condition": "sys_division.COMPANY_ID = '1100' and sys_division.CD_DEL = '1' and @`id` is not null",
                                            "using_index": true
                                          }
                                        }
                                      ]
                                    }
                                  }
                                }
                              },
                              {
                                "table": {
                                  "table_name": "SYS_DIVISION",
                                  "access_type": "eq_ref",
                                  "possible_keys": [
                                    "PRIMARY",
                                    "sys_division_COMPANY_ID_IDX"
                                  ],
                                  "key": "PRIMARY",
                                  "key_length": "304",
                                  "used_key_parts": ["DIVISION", "COMPANY_ID"],
                                  "ref": ["FNC.DIVISION", "const"],
                                  "rows": 1,
                                  "filtered": 62.09912491,
                                  "attached_condition": "trigcond(sys_division.COMPANY_ID = '1100' and sys_division.CD_DEL = '1' and fnc.DIVISION = sys_division.DIVISION and trigcond(fnc.DIVISION is not null))"
                                }
                              }
                            ]
                          }
                        }
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}

Comment by Daniel Black [ 2022-11-30 ]

I don't see a query with the SYS_DIVISION table. So is the query different from the one in the error log? Was it causing troubles as well?

How does the test of 10.10.3 package in https://ci.mariadb.org/30437/amd64-windows-packages/ perform?

Comment by Jongbeom Park [ 2022-11-30 ]

There are several queries that give the same error.
I am currently testing on the local server, but it is difficult to answer because I do not know exactly under what circumstances it occurs.
In addition, a Permission denied message appears in my WAS Log as shown below. Is this related to the issue?

java.sql.SQLException: (conn=6036) File 'C:\Windows\SERVIC~1\MARIAD~1\AppData\Local\Temp\#sql-temptable-1794-1794-50.MAD' not found (Errcode: 13 "Permission denied")

Comment by Daniel Black [ 2022-11-30 ]

> Is this related to the issue?

Maybe. Permission denied is the more correct error. Do logs of it correspond to the server exception?

Comment by Jongbeom Park [ 2022-11-30 ]

The corresponding log can be checked in the WAS log, but not in the DB Server error log.
This log is generated intermittently during normal operation, but it is not shut down.

Comment by Daniel Black [ 2022-12-02 ]

If you want to attach your minidump here or on the sftp server please do so (and identify the version running) that can be done too.

Comment by Jongbeom Park [ 2022-12-02 ]

i proceeded with the file object access check through the Windows local policy.
Can you tell if mysqld deletes the tmp file and processes it normally, but throws an error sometimes because it can't find the file?

java.sql.SQLTransientConnectionException: (conn=9320) File 'C:\Windows\SERVIC~1\MariaDB\AppData\Local\Temp#sql-temptable-3558-2468-4f0.MAD' not found (Errcode: 13 "Permission denied")

Comment by Daniel Black [ 2022-12-02 ]

Can you translate the event view message particularly the error code/message. I assume its a delete operation?

Is this on the original 10.10.2 package or the 10.10.3 package I built?

Is it still shutting down with the "mysqld got exception" error in the log?

Can you use procmon in a test environment and show the sequence of tempory file events around when the error occurs?

Comment by Jongbeom Park [ 2022-12-05 ]

First, I'm using the 10.10.3 package and I traced that exception file using procmon. It is not shutdown, but the following process of java does not work with the exception.
Enter exactly what the capture you are talking about is what it is.
java.sql.SQLTransientConnectionException: (conn=12630) Could not find file 'C:\Windows\SERVIC~1\MariaDB\AppData\Local\Temp#sql-temptable-36ec-3156-183.MAD' (Errcode: 13 "Permission denied")

In addition, In addition, when the permission log comes out, the system's ASDSvc.exe (antivirus) behaves as shown in the image below. Can that action affect the occurrence of errors?

Comment by Daniel Black [ 2022-12-06 ]

Thanks. As you can see with MDEV-30162, wlad has written a work around for MariaDB to be more tolerant of the intrusive practices like what your AV exhibits. In the mean time you can exclude MariaDB's data directory from the AV scanning and do let us know if that eliminates the problem for you.

Can you also report the problem to Avast? As you can see with your ASDVsvc.exe QueryOpen, the sharing permissions aren't sufficient to allow MariaDB to delete the file.

Comment by Jongbeom Park [ 2022-12-07 ]

Since setting it as an exception folder for scanning of Antivirus, permission issues have not occurred until now.
The cause of the issue seems to be an application conflict between antivirus and mariadb.
Monitoring is still ongoing.
Thank you for your efforts in resolving the issue.

Comment by Daniel Black [ 2022-12-07 ]

Thanks for the feedback jbpark.

wlad's workaround only keeps trying 50 times on a 10ms between retries, so 0.5 seconds. Its enough to avoid small transitory interruptions, but per MDEV-30162 we're still blaming the AV for not enough sharing on what aren't its files.

Comment by Michael Widenius [ 2022-12-13 ]

First part of the patch, setting table->file=NULL is not needed as we set it a few lines later and there is nothing in between that can use it.
The second part of the patch is ok (will probably not help but not harm either). I will push this to 10.5 as the original problem originated from there

Comment by Michael Widenius [ 2022-12-13 ]

I may have found the issue.

The problem is probably that when we close a temporary file, we don't set created to 0, which causes wrong cleanup code to be executed later.
I will push a fix for this in 10.5 shortly. Hopefully this makes this problem go away

Comment by Michael Widenius [ 2023-01-05 ]

Pushed fix that 'should' fix the issue.
Cannot test as there is no reproducible test case.

Generated at Thu Feb 08 10:13:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.