Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33845

MariaDB Crashes when specific queries are run but not others that are very similar. It appear related to where greater then and less than on large intergers.

    XMLWordPrintable

Details

    Description

      Hello thanks for taking the time to review this. I am happy to add additional information as needed.
      Explanation.
      The most basic way to explain this issue is if we run a query with MsgIds from 2.3 million to 4.3 million with an MsgId greater then 237007468 million but less then 436312266 million and include in the output the message content and that the message includes a specific MsgSource. The server Crashes and restarts.

      Queries
      – Crashes
      WHERE MsgId >= 237007468 AND MsgId <= 436312266
      WHERE MsgId > 237007467 AND MsgId < 436312266
      WHERE MsgId > 237007466

      – Work
      WHERE MsgId >= 237007467 AND MsgId <= 436312266
      WHERE MsgId BETWEEN 237007467 AND 436312266
      This only happens if the return has a blob and we pull back the MsgContent field with these numbers.

      The full query that was crashing the DB is number 1.
      Further testing showed that number 2 and 3 also crashes the DB
      Yet number 4 does not

      1. EXPLAIN SELECT MsgId as lastid, MsgId, MsgSource, CONVERT(MsgContent, CHAR) AS EDImessage FROM MessageStore
      WHERE MsgId > 237007466 AND MsgId < 436312267
      AND MsgSource IN("MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571")
      ORDER BY MsgId asc

      2. SELECT MsgId as lastid, MsgId, MsgSource, MsgContent FROM MessageStore
      WHERE MsgId > 237007466 AND MsgId < 436312267
      AND MsgSource IN("MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571")
      ORDER BY MsgId asc

      3. SELECT MsgId AS lastid, MsgId, MsgSource, CONVERT( MsgContent , CHAR) AS EDImessage  FROM MessageStore
       WHERE MsgId > 237007466 AND MsgId < 436312267
         AND MsgSource IN ('MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571')
      ORDER BY MsgId ASC LIMIT 1;

      4. SELECT MsgId AS lastid, MsgId, MsgSource, CONVERT( MsgContent , CHAR) AS EDImessage FROM warehouse.MessageStore
      WHERE MsgId > 237007466 AND MsgId < 436312267 AND MsgSource IN ('MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571')
      ORDER BY MsgId ASC
      LIMIT 10,10;

      We ran a lot of queries to pull all records one of those gave us the MsgID's that match the IN("MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571") statement then wrote a script to go through and select * from MessageStore where MsgID = (All the Ids returned). This also works.

      We ran a mysqlcheck -all-databases and everything came back with an OK

      When it fails the entire server locks and reboots the error before that and a bunch of what appears to be base64 code is sent to the logs

      The errors we receive are below.

       
      2024-04-03 15:39:26 867 [Note] InnoDB: Uncompressed page, stored checksum in field1 5595288, calculated checksums for field1: crc32 3970976477, innodb 5595288,  page type 10 == BLOB.none 3735928559, stored checksum in field2 3970976477, calculated checksums for field2: crc32 3970976477, innodb 3970976477, none 3735928559,  page LSN 11534 1034519743, low 4 bytes of LSN at page end 1034519743, page number (if stored to page already) 319240129, space id (if created with >= MySQL-4.1.1 and stored already) 61
      InnoDB: Page may be a BLOB page
      2024-04-03 15:39:26 867 [Note] InnoDB:  You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      2024-04-03 15:39:26 867 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './warehouse/MessageStore.ibd' page [page id: space=61, page number=319240129]. You may have to recover from a backup.
      2024-04-03 15:39:26 867 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
      len 16384; hex 00556 ETC ETC but the bottom of this bump is a ton of zeros.
      The first part of the dump gives what looks like data followed by a bunch of zeros. Then asc and what appears to be base64 encoded data and then end of page DUMP.
      2024-04-03 16:28:50 139 [Note] InnoDB: Uncompressed page, stored checksum in field1 5595288, calculated checksums for field1: crc32 3970976477, innodb 5595288,  page type 10 == BLOB.none 3735928559, stored checksum in field2 3970976477, calculated checksums for field2: crc32 3970976477, innodb 3970976477, none 3735928559,  page LSN 11534 1034519743, low 4 bytes of LSN at page end 1034519743, page number (if stored to page already) 319240129, space id (if created with >= MySQL-4.1.1 and stored already) 61
      InnoDB: Page may be a BLOB page
      2024-04-03 16:28:50 139 [Note] InnoDB:  You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      240403 16:28:50 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.5.9-MariaDB-log
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=36
      max_threads=153
      thread_count=37
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467864 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7e96900009b8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7e991f0afc90 thread_stack 0x49000
       
      2024-04-03 15:39:19 867 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './warehouse/MessageStore.ibd' page [page id
      : space=61, page number=319240129]. You may have to recover from a backup.
      

      EXPLAINS

      Query 1.
       
      MariaDB [warehouse]> EXPLAIN SELECT MsgId as lastid , MsgId , MsgSource , CONVERT(MsgContent, CHAR) AS EDImessage FROM MessageStore WHERE MsgId > 237007466 AND MsgId < 436312267 AND MsgSource IN("MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571") ORDER BY MsgId asc\G
      *************************** 1. row ***************************
              id: 1
        select_type: SIMPLE
              table: MessageStore
              type: ref
      possible_keys: PRIMARY,MsgSourceStatusReceivedFacility,MsgSourceStatusStatusDateFacility
              key: MsgSourceStatusReceivedFacility
            key_len: 259
              ref: const
              rows: 16024
              Extra: Using index condition; Using where; Using filesort
      1 row in set (0.001 sec)
      Query 2.
      MariaDB [warehouse]> EXPLAIN SELECT MsgId as lastid, MsgId, MsgSource, MsgContent FROM MessageStore WHERE MsgId > 237007466 AND MsgId < 436312267 AND MsgSource IN("MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571") ORDER BY MsgId asc\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: MessageStore
               type: ref
      possible_keys: PRIMARY,MsgSourceStatusReceivedFacility,MsgSourceStatusStatusDateFacility
                key: MsgSourceStatusReceivedFacility
            key_len: 259
                ref: const
               rows: 16248
              Extra: Using index condition; Using where; Using filesort
      1 row in set (0.001 sec)
       
      Query 3.
       
       
      Query 4.
      EXPLAIN SELECT MsgId AS lastid, MsgId, MsgSource, CONVERT( MsgContent , CHAR) AS EDImessage FROM warehouse.MessageStore WHERE MsgId > 237007466 AND MsgId < 436312267 AND MsgSource IN ('MIRTH_bf7b3a56-80a5-417e-a32a-005124f45571') ORDER BY MsgId ASC LIMIT 10,10\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: MessageStore
               type: ref
      possible_keys: PRIMARY,MsgSourceStatusReceivedFacility,MsgSourceStatusStatusDateFacility
                key: MsgSourceStatusReceivedFacility
            key_len: 259
                ref: const
               rows: 16248
              Extra: Using index condition; Using where; Using filesort
      1 row in set (0.001 sec)
      

      MsgID information.
      HEX of ID (237007466): E20726A
      HEX of ID (436312267): 1A0198CB
      This is only interesting because the hex for 1 is 7 digits and the hex for the higher ID is 8. If the programming only reads 7 digits when running the query.

      Table Configuration

      CREATE TABLE `MessageStore` (
        `MsgId` bigint(20) NOT NULL AUTO_INCREMENT,
        `MsgContent` longblob DEFAULT NULL,
        `MsgReceived` timestamp /* mariadb-5.3 */ NOT NULL DEFAULT current_timestamp(),
        `MsgSource` varchar(256) DEFAULT NULL,
        `MsgFacility` varchar(256) DEFAULT NULL,
        `MsgType1` varchar(256) DEFAULT NULL,
        `MsgType2` varchar(256) DEFAULT NULL,
        `MsgAltId1` varchar(256) DEFAULT NULL,
        `MsgAltId2` varchar(256) DEFAULT NULL,
        `MsgMisc1` varchar(256) DEFAULT NULL,
        `MsgMisc2` varchar(256) DEFAULT NULL,
        `MsgMisc3` varchar(256) DEFAULT NULL,
        `MsgStatus` char(3) DEFAULT NULL,
        `MsgStatusDate` timestamp /* mariadb-5.3 */ NULL DEFAULT NULL,
        `MsgStatusInfo` text DEFAULT NULL,
        PRIMARY KEY (`MsgId`),
        KEY `MsgAltId2` (`MsgAltId2`),
        KEY `MsgType1` (`MsgType1`),
        KEY `MsgMisc2` (`MsgMisc2`),
        KEY `MsgSourceStatusReceivedFacility` (`MsgSource`,`MsgStatus`,`MsgReceived`,`MsgFacility`),
        KEY `MsgFacility` (`MsgFacility`),
        KEY `MsgStatus` (`MsgStatus`),
        KEY `MsgStatusDate` (`MsgStatusDate`),
        KEY `MsgSourceStatusStatusDateFacility` (`MsgSource`,`MsgStatus`,`MsgStatusDate`,`MsgFacility`)
      ) ENGINE=InnoDB AUTO_INCREMENT=660698066 DEFAULT CHARSET=latin1
      

      Additional Thoughts
      Value limits on the size of the MsgId or another column may be making the calculation fail. Seems like a value on the ID is "wrapping" if you have a 32 bit number and go over the max value and go up one it will wrap over to 0 like an odometer. This may not be the real issue but the evidence looks like this behavior.

      Things done to attempt to resolve.
      We ran a mysqlcheck -all-databases Everything reported OK
      Moved to a replica and got the same error. I guess corrupted data could have been replicated? I am not certain that can happen.
      We are discussing upgrading to the latest 10.5.* but wanted to wait till we here back just in case you need us to test something.
      Configuration

      [mysqld]
      datadir=/var/lib/mysql
      socket=/var/lib/mysql/mysql.sock
      max_allowed_packet=1024M
      innodb_max_dirty_pages_pct=0
      innodb_fast_shutdown=0
      innodb_buffer_pool_size=768G
      innodb_log_file_size=64G
      log-bin
      log-slave-updates
      log_error=error.log
      slow_query_log = 1
      slow_query_log_file = /var/log/mysql/slow-query.log
      long_query_time = 15
      innodb_print_all_deadlocks = 1
      open_files_limit=200000
      sync_binlog=1
      skip-slave-start=ON
      slave-skip-errors=1062
      server_id=70148
      innodb_open_files=6000
      table_open_cache=4000
      join_buffer_size=1048576
      key_buffer_size=134217728
      read_buffer_size=131072
      table_definition_cache=4000
      log-basename=master1
      log_error=/var/log/mariadb/error.log
      binlog-format=mixed
      expire_logs_days=90
      log_bin_compress=ON
      # Disabling symbolic-links is recommended to prevent assorted security risks
      symbolic-links=0
      log_queries_not_using_indexes = ON
      log-error = /var/log/mariadb/mariadb.log
      # Settings user and group are ignored when systemd is used.
      # If you need to run mysqld under a different user or group,
      # customize your systemd unit file for mariadb according to the
      # instructions in http://fedoraproject.org/wiki/Systemd
       
      [mysqld_safe]
      pid-file=/var/run/mariadb/mariadb.pid
       
      #
      # include all files from the config directory
      #
      !includedir /etc/my.cnf.d
      [mariabackup]
      open_files_limit=65535
      
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              JoshuaGordon Joshua Gordon
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.