Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27031

InnoDB: Assertion failure in file D:\winx64-packages\build\src\storage\innobase\trx\trx0trx.cc line 1288

Details

    Description

      Several times a day the MariaDB windows services crashes. Every time the same error in the log is shown. Please advise which steps to take to fix it. I am using version:

      Server version: 10.6.1-MariaDB - mariadb.org binary distribution

      Error log:

      2021-11-12 08:27:15 0x37d8 InnoDB: Assertion failure in file D:\winx64-packages\build\src\storage\innobase\trx\trx0trx.cc line 1288
      InnoDB: Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      InnoDB: about forcing recovery.
      211112 8:27:15 [ERROR] mysqld got exception 0x80000003 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.6.1-MariaDB
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=141
      max_threads=65537
      thread_count=120
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 139782 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x27a7abc4038
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      server.dll!my_parameter_handler()[my_init.c:278]
      ucrtbase.dll!raise()
      ucrtbase.dll!abort()
      server.dll!ut_dbg_assertion_failed()[ut0dbg.cc:60]
      server.dll!trx_t::commit_in_memory()[trx0trx.cc:1288]
      server.dll!trx_t::commit()[trx0trx.cc:1511]
      server.dll!trx_commit_for_mysql()[trx0trx.cc:1625]
      server.dll!innobase_commit()[ha_innodb.cc:4417]
      server.dll!commit_one_phase_2()[handler.cc:1999]
      server.dll!ha_commit_one_phase()[handler.cc:1980]
      server.dll!ha_commit_trans()[handler.cc:1772]
      server.dll!trans_commit_stmt()[transaction.cc:473]
      server.dll!mysql_execute_command()[sql_parse.cc:6047]
      server.dll!mysql_parse()[sql_parse.cc:8023]
      server.dll!dispatch_command()[sql_parse.cc:1899]
      server.dll!do_command()[sql_parse.cc:1406]
      server.dll!threadpool_process_request()[threadpool_common.cc:395]
      server.dll!tp_callback()[threadpool_common.cc:203]
      KERNEL32.DLL!LCMapStringEx()
      ntdll.dll!RtlAddRefActivationContext()
      ntdll.dll!RtlAcquireSRWLockExclusive()
      KERNEL32.DLL!BaseThreadInitThunk()
      ntdll.dll!RtlUserThreadStart()

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x27a7a987720): SELECT

      i.ips_id,

      i.event_type_id,

      i.event_type,

      i.category,

      i.ts,

      i.log_comment,

      i.log_decision,

      i.log_topwind_user,

      i.log_observations,

      i.log_actions,

      i.log_pictures,

      md.meteo_datetime > DATE_SUB(i.ts, INTERVAL 5 MINUTE) as ShowSensor,

      md_one_hour_ago.meteo_datetime > DATE_SUB(md.meteo_datetime, INTERVAL 2 HOUR) as ShowSensorDiff,

      round(md.temperature,1) as Temp,

      sign(round(1/0.2*

      (md.temperature-md_one_hour_ago.temperature)

      /(TIMESTAMPDIFF(minute,md_one_hour_ago.meteo_datetime,md.meteo_datetime)/60)

      )) as Temp_Change_Sign,

      round(

      (md.temperature-md_one_hour_ago.temperature)

      /(TIMESTAMPDIFF(minute,md_one_hour_ago.meteo_datetime,md.meteo_datetime)/60),

      1

      ) as Temp_Change,

      round(md.relative_humidity) as Humidity,

      sign(round(1/3*

      (md.relative_humidity-md_one_hour_ago.relative_humidity)

      /(TIMESTAMPDIFF(minute,md_one_hour_ago.meteo_datetime,md.meteo_datetime)/60)

      )) as Humidity_Change_Sign,

      round(

      (md.relative_humidity-md_one_hour_ago.relative_humidity)

      /(TIMESTAMPDIFF(minute,md_one_hour_ago.meteo_datetime,md.meteo_datetime)/60),

      0

      ) as Humidity_Change,

      md.precipitation as Precip,

      sign(

      (md.precipitation-md_one_hour_ago.precipitation)

      ) as Precip_Change_Sign,

      round(md.risk_number,0) as Risk,

      sign(round(1/5*

      (md.risk_number-md_one_hour_ago.risk_number)

      /(TIMESTAMPDIFF(minute,md_one_hour_ago.meteo_datetime,md.meteo_datetime)/60)

      )) as Risk_Change_Sign,

      round(

      (md.risk_number-md_one_hour_ago.risk_number)

      /(TIMESTAMPDIFF(minute,md_one_hour_ago.meteo_datetime,md.meteo_datetime)/60)

      ) as Risk_Change

      FROM

      (

      SELECT

      e.ips_id,

      event_type.event_type_id,

      event_type.event_type,

      'EventStart' as category,

      e.start_datetime as ts,

      NULL as log_comment,

      NULL as log_decision,

      NULL as log_topwind_user,

      NULL as log_observations,

      NULL as log_actions,

      NULL as log_pictures

      FROM

      event_recent e

      LEFT OUTER JOIN event_type ON e.event_type_id=event_type.event_type_id

      LEFT OUTER JOIN ips ON e.ips_id = ips.ips_id

      WHERE

      ips.ips_cluster_id=26 AND

      e.start_datetime > DATE_SUB(UTC_TIMESTAMP(), INTERVAL 1 DAY)

      UNION ALL

      SELECT

      e2.ips_id,

      event_type.event_type_id,

      event_type.event_type,

      'EventEnd',

      e2.end_datetime,

      NULL,NULL,NULL,NULL,NULL,NULL

      FROM

      event_recent e2

      LEFT OUTER JOIN event_type ON e2.event_type_id=event_type.event_type_id

      LEFT OUTER JOIN ips ON e2.ips_id = ips.ips_id

      WHERE

      e2.end_datetime is NOT NULL AND

      ips.ips_cluster_id=26 AND

      e2.end_datetime > DATE_SUB(UTC_TIMESTAMP(), INTERVAL 1 DAY)

      UNION ALL

      SELECT

      -1 as ips_id,

      NULL,

      NULL,

      'Log',

      log.datetime,

      comment as log_comment,

      decision.decision_nl as log_decision,

      topwind_user.username as log_topwind_user,

      IFNULL(GROUP_CONCAT(DISTINCT observation.observation_nl),'') as log_observations,

      IFNULL(GROUP_CONCAT(DISTINCT action.action_nl),'') as log_actions,

      IFNULL(GROUP_CONCAT(DISTINCT CONCAT(

      picture.turbine_id,';',

      picture.picture_datetime,';',

      picture.filename

      )),'') as log_pictures

      FROM

      log

      LEFT OUTER JOIN decision ON log.decision_id = decision.decision_id

      LEFT OUTER JOIN topwind_user ON log.topwind_user_id = topwind_user.topwind_user_id

      LEFT OUTER JOIN log_observation ON log.log_id = log_observation.log_id

      LEFT OUTER JOIN observation ON log_observation.observation_id = observation.observation_id

      LEFT OUTER JOIN log_action ON log.log_id = log_action.log_id

      LEFT OUTER JOIN action ON log_action.action_id = action.action_id

      LEFT OUTER JOIN picture ON log.log_id = picture.log_id

      WHERE log.ips_cluster_id=26 AND log.datetime>DATE_SUB(UTC_TIMESTAMP(), INTERVAL 1 DAY)

      GROUP BY log.log_id

      ) i

      LEFT OUTER JOIN `meteo_data_last_day` md ON md.ips_id = i.ips_id

      LEFT OUTER JOIN `meteo_data_last_day` md_one_hour_ago ON md_one_hour_ago.ips_id = i.ips_id

      WHERE

      (

      md.meteo_datetime = (

      SELECT MAX(`meteo_data_last_day`.`meteo_datetime`)

      FROM `meteo_data_last_day`

      WHERE ips_id = i.ips_id AND `meteo_data_last_day`.`meteo_datetime`<i.ts

      ) OR

      md.ips_id IS NULL

      ) AND

      (

      md_one_hour_ago.meteo_datetime = (

      SELECT MAX(`meteo_data_last_day`.`meteo_datetime`)

      FROM `meteo_data_last_day`

      WHERE ips_id = i.ips_id AND

      `meteo_data_last_day`.`meteo_datetime` < DATE_SUB(md.meteo_datetime, INTERVAL 1 HOUR)

      ) OR

      md_one_hour_ago.ips_id IS NULL

      )

      ORDER BY i.ts
      Connection ID (thread ID): 19138
      Status: NOT_KILLED

      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off

      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file at D:\MariaDB\data\

      Attachments

        1. 2021.11.22-tail-error-log.txt
          18 kB
        2. 2021-11-15 13_05_52-.png
          2021-11-15 13_05_52-.png
          60 kB
        3. all_threads_stack.txt
          161 kB
        4. my.ini
          1 kB
        5. mysqld.dmp
          1.04 MB
        6. TW-EDE-SRV01.tail.err
          9 kB
        7. windbg_unique_stacks.txt
          65 kB

        Issue Links

          Activity

            Sorry, setting "rr-profile" and forgetting the location was a mistake.
            The cases which I found were crashes during shutdown like

            1. 2022-07-26T09:15:22 [256926] | [rr 260977 1461454]2022-07-26 9:14:32 0 [Note] InnoDB: Starting shutdown...
            2. 2022-07-26T09:15:22 [256926] | [rr 260977 1461529]2022-07-26 9:14:32 0 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/rqg/1658847117/108/1/data/ib_buffer_pool
            3. 2022-07-26T09:15:22 [256926] | [rr 260977 1461564]2022-07-26 9:14:32 0 [Note] InnoDB: Instance 0, restricted to 94 pages due to innodb_buf_pool_dump_pct=25
            4. 2022-07-26T09:15:22 [256926] | [rr 260977 1461572]2022-07-26 9:14:32 0 [Note] InnoDB: Buffer pool(s) dump completed at 220726 9:14:32
            5. 2022-07-26T09:15:22 [256926] | [rr 260977 1461683]2022-07-26 09:14:33 0x7f21f6013700[rr 260977 1461686] InnoDB: Assertion failure in file /data/Server/10.4/storage/innobase/buf/buf0flu.cc line 3321
            6. 2022-07-26T09:15:22 [256926] | [rr 260977 1461688]InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0

            origin/10.4 9a897335eb4387980aed7698b832a893dbaa3d81 2022-07-26T16:45:10+03:00
            sdp:/data/results/1658847117/TBR-1163-MDEV-27031_1/dev/shm/rqg/1658847117/108/1/rr
            But some "rr replay" does not replay the assert observed from whatever reason.

            mleich Matthias Leich added a comment - Sorry, setting "rr-profile" and forgetting the location was a mistake. The cases which I found were crashes during shutdown like 2022-07-26T09:15:22 [256926] | [rr 260977 1461454] 2022-07-26 9:14:32 0 [Note] InnoDB: Starting shutdown... 2022-07-26T09:15:22 [256926] | [rr 260977 1461529] 2022-07-26 9:14:32 0 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/rqg/1658847117/108/1/data/ib_buffer_pool 2022-07-26T09:15:22 [256926] | [rr 260977 1461564] 2022-07-26 9:14:32 0 [Note] InnoDB: Instance 0, restricted to 94 pages due to innodb_buf_pool_dump_pct=25 2022-07-26T09:15:22 [256926] | [rr 260977 1461572] 2022-07-26 9:14:32 0 [Note] InnoDB: Buffer pool(s) dump completed at 220726 9:14:32 2022-07-26T09:15:22 [256926] | [rr 260977 1461683] 2022-07-26 09:14:33 0x7f21f6013700 [rr 260977 1461686] InnoDB: Assertion failure in file /data/Server/10.4/storage/innobase/buf/buf0flu.cc line 3321 2022-07-26T09:15:22 [256926] | [rr 260977 1461688] InnoDB: Failing assertion: UT_LIST_GET_LEN(buf_pool->flush_list) == 0 origin/10.4 9a897335eb4387980aed7698b832a893dbaa3d81 2022-07-26T16:45:10+03:00 sdp:/data/results/1658847117/TBR-1163- MDEV-27031 _1/dev/shm/rqg/1658847117/108/1/rr But some "rr replay" does not replay the assert observed from whatever reason.

            mleich, that assertion expression is different from the bug description. There are many data structures that use the InnoDB home-brew doubly linked list.

            That said, in MDEV-29440 you posted an rr replay trace that might explain this bug.

            marko Marko Mäkelä added a comment - mleich , that assertion expression is different from the bug description. There are many data structures that use the InnoDB home-brew doubly linked list. That said, in MDEV-29440 you posted an rr replay trace that might explain this bug.

            Copying and adapting from MDEV-29440:

            ssh pluto
            rr replay /data/results/1662464247/TBR-1442-TBR-1443/1/rr/latest-trace
            

            That trace involves innodb_rollback_on_timeout=ON.

            marko Marko Mäkelä added a comment - Copying and adapting from MDEV-29440 : ssh pluto rr replay /data/results/1662464247/TBR-1442-TBR-1443/1/rr/latest-trace That trace involves innodb_rollback_on_timeout=ON .

            In MDEV-29576 there is an rr replay trace where a being-rolled-back transaction will be flagged as a deadlock victim. Like the original report of this, it involves innodb_rollback_on_timeout=ON.

            marko Marko Mäkelä added a comment - In MDEV-29576 there is an rr replay trace where a being-rolled-back transaction will be flagged as a deadlock victim. Like the original report of this, it involves innodb_rollback_on_timeout=ON .
            vlad.lesin Vladislav Lesin added a comment - - edited

            The scenario is the following:
            1. trx1 is resumed in lock_wait() by timeout, enters in lock_sys_t::cancel(lock=L1), but it has not yet acquired lock_sys.latch,
            2. trx2 executes lock_move_reorganize_page() during purging. It acquires lock_sys.latch, moves L1 to L2, resetting waiting flag for L1, sets trx1->lock.wait_lock=L2 and releases lock_sys.latch.
            3. trx1 acquires lock_sys.latch, and checks lock->is_waiting(). "lock" is equal to L1, as this is argument of lock_sys_t::cancel(lock=L1) . As L1's waiting flag was reset by trx2, trx1 does not invoke lock_cancel_waiting_and_release(), and trx->lock.wait_thr stays unreset, what causes assertion failure on trx1 rollback.

            This is one more variant of MDEV-27701, which is already fixed.

            vlad.lesin Vladislav Lesin added a comment - - edited The scenario is the following: 1. trx1 is resumed in lock_wait() by timeout, enters in lock_sys_t::cancel(lock=L1), but it has not yet acquired lock_sys.latch, 2. trx2 executes lock_move_reorganize_page() during purging. It acquires lock_sys.latch, moves L1 to L2, resetting waiting flag for L1, sets trx1->lock.wait_lock=L2 and releases lock_sys.latch. 3. trx1 acquires lock_sys.latch, and checks lock->is_waiting(). "lock" is equal to L1, as this is argument of lock_sys_t::cancel(lock=L1) . As L1's waiting flag was reset by trx2, trx1 does not invoke lock_cancel_waiting_and_release(), and trx->lock.wait_thr stays unreset, what causes assertion failure on trx1 rollback. This is one more variant of MDEV-27701 , which is already fixed.

            People

              vlad.lesin Vladislav Lesin
              ruut Ruut
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.