MariaDB Server / MDEV-40035

InnoDB purge subsystem assertion failure under cached UPDATE KEY/NO KEY workloads


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 11.8.8, 12.3.2
    • None
    • None

    Description

      Benchmarking TidesDB against InnoDB repeatedly triggered an InnoDB purge assertion under cached UPDATE_KEY/UPDATE_NO_KEY micro-workloads. The issue was first observed during short 240-second development runs and later confirmed during full production sweeps (512 threads, 1800 seconds) on both the official MariaDB builds and the TidesDB builds. The failure occurs consistently once UPDATE pressure saturates the buffer pool and the purge subsystem starts processing large volumes of undo records.

      Affects:

      • MariaDB 11.8.6 (TidesDB install.sh build, TidesDB and InnoDB)
      • MariaDB 11.8.8 (TidesDB install.sh build, TidesDB and InnoDB)
      • MariaDB 11.8.8 (official)
      • MariaDB 12.3.2 (official)

      Component: InnoDB → Purge subsystem
      Severity: Critical (server abort, corrupted undo pointers, invalid history list ordering)
      Reproducible: Always (under high‑concurrency UPDATE KEY/NO KEY workloads)

      ================ 12.3.2 release bin ==================

      MariaDB 12.3.2 (official bintar), InnoDB, UK workload (512 threads)

      The server aborted just after the 6-second mark of the first iteration.

      [jeb@hz-bench-jeb archive]$ cat Error_hz-bench-jeb_sysbench-lua_init-start-db-run-tests_2026_6_14_17_39_49/hz-bench-jeb_sysbench-lua_UPDATE_KEY_6336_1_512/run-result.out
      sysbench 1.1.0-2e6b7d5 (using bundled LuaJIT 2.1.0-beta3)

      Running the test with following options:
      Number of threads: 512
      Report intermediate results every 2 second(s)
      Initializing random number generator from current time

      Forcing shutdown in 1800 seconds

      Initializing worker threads...

      Threads started!

      [ 2s ] thds: 512 tps: 37242.69 qps: 37242.69 (r/w/o: 0.00/37242.69/0.00) lat (ms,95%): 17.01 err/s: 0.00 reconn/s: 0.00
      [ 4s ] thds: 512 tps: 39461.40 qps: 39461.40 (r/w/o: 0.00/39461.40/0.00) lat (ms,95%): 15.55 err/s: 0.00 reconn/s: 0.00
      [ 6s ] thds: 512 tps: 35346.22 qps: 35346.22 (r/w/o: 0.00/35346.22/0.00) lat (ms,95%): 21.89 err/s: 0.00 reconn/s: 0.00
      FATAL: mysql_stmt_execute() returned error 2013 (Lost connection to server during query) for query 'UPDATE sbtest7 SET k=k+1 WHERE id=?'
      FATAL: mysql_stmt_execute() returned error 2013 (Lost connection to server during query) for query 'UPDATE sbtest2 SET k=k+1 WHERE id=?'

      Server version: 12.3.2-MariaDB source revision: 9f98f82b14a9b939834281672b6d0cf965db69a3

      The information page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/
      contains instructions to obtain a better version of the backtrace below.
      Following these instructions will help MariaDB developers provide a fix quicker.

      Attempting backtrace. Include this in the bug report.
      (note: Retrieving this information may fail)

      Thread pointer: 0x562486a1a570
      stack_bottom = 0x77e04e569000 thread_stack 0x49000
      mysys/stacktrace.c:216(my_print_stacktrace)[0x562484c1db5e]
      sql/signal_handler.cc:230(handle_fatal_signal)[0x5624846afb9d]
      /lib64/libc.so.6(+0x3fc30)[0x7fe09723fc30]
      /lib64/libc.so.6(+0x8d02c)[0x7fe09728d02c]
      /lib64/libc.so.6(raise+0x16)[0x7fe09723fb86]
      /lib64/libc.so.6(abort+0xd3)[0x7fe097229873]
      ut/ut0rbt.cc:471(rbt_eject_node(ib_rbt_node_t*, ib_rbt_node_t*) [clone .cold.31])[0x5624841e38de]
      trx/trx0purge.cc:884(purge_sys_t::choose_next_log(trx_t*))[0x5624841ddf94]
      trx/trx0purge.cc:848(purge_sys_t::rseg_get_next_history_log(trx_t*))[0x562484ab53c1]
      trx/trx0purge.cc:1006(purge_sys_t::get_next_rec(trx_t*, unsigned long))[0x562484ab65db]
      srv/srv0srv.cc:1421(purge_coordinator_state::do_purge(trx_t*))[0x562484aa3cc8]
      tpool/task_group.cc:74(tpool::task_group::execute(tpool::task*))[0x562484bc1389]
      tpool/tpool_generic.cc:529(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x562484bbf1bf]
      /lib64/libstdc++.so.6(+0xdbae4)[0x7fe0976dbae4]
      /lib64/libc.so.6(+0x8b2ea)[0x7fe09728b2ea]
      /lib64/libc.so.6(+0x1103d0)[0x7fe0973103d0]

      Connection ID (thread ID): 0
      Status: NOT_KILLED
      Query (0x0): (null)
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,duplicateweedout=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off,hash_join_cardinality=on,cset_narrowing=on,sargable_casefold=on,reorder_outer_joins=off

      Writing a core file...
      Working directory at /data/data
      Resource Limits (excludes unlimited resources):
      Limit                     Soft Limit           Hard Limit           Units
      Max stack size            8388608              unlimited            bytes
      Max processes             254911               254911               processes
      Max open files            65535                65535                files
      Max locked memory         8388608              8388608              bytes
      Max pending signals       254911               254911               signals
      Max msgqueue size         819200               819200               bytes
      Max nice priority         0                    0
      Max realtime priority     0                    0
      Core pattern: core.%p

      Kernel version: Linux version 5.14.0-611.35.1.el9_7.x86_64 (mockbuild@x64-builder02.almalinux.org) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-11), GNU ld version 2.35.2-67.el9_7.1) #1 SMP PREEMPT_DYNAMIC Wed Feb 25 03:46:09 EST 2026

      Thread 1 (LWP 3950466):

      #0 0x00007fe09728d02c in __pthread_kill_implementation () from /lib64/libc.so.6
      #1 0x00007fe09723fb86 in raise () from /lib64/libc.so.6
      #2 0x00007fe097229905 in abort () from /lib64/libc.so.6
      #3 0x00005624841e38de in ut_dbg_assertion_failed(
      expr="tail.trx_no <= last_trx_no",
      file="/home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/trx/trx0purge.cc",
      line=885)
      at storage/innobase/ut/ut0dbg.cc:60
      #4 0x00005624841ddf94 in purge_sys_t::choose_next_log(
      this=0x562486330540 <purge_sys>, trx=0x7fe0942deb80)
      at storage/innobase/trx/trx0purge.cc:885
      #5 0x0000562484ab53c1 in purge_sys_t::rseg_get_next_history_log(
      this=0x562486330540 <purge_sys>, trx=0x7fe0942deb80)
      at storage/innobase/trx/trx0purge.cc:847
      #6 0x0000562484ab65db in purge_sys_t::get_next_rec(
      this=0x562486330540 <purge_sys>, trx=0x7fe0942deb80,
      roll_ptr=5629499552115161)
      at storage/innobase/trx/trx0purge.cc:1003
      #7 purge_sys_t::fetch_next_rec(
      this=0x562486330540 <purge_sys>, trx=0x7fe0942deb80)
      at storage/innobase/trx/trx0purge.cc:1043
      #8 trx_purge_attach_undo_recs(
      trx=0x7fe0942deb80, n_work_items=...)
      at storage/innobase/trx/trx0purge.cc:1325
      purge_rec = {undo_rec = 0x350375000934ff80 (invalid), roll_ptr = 290696591361180169}
      head = {trx_no = 378954, undo_no = 0}

      #9 trx_purge(trx=0x7fe0942deb80, n_tasks=4, history_size=...)
      at storage/innobase/trx/trx0purge.cc:1457
      #10 purge_coordinator_state::do_purge(trx=0x7fe0942deb80)
      at storage/innobase/srv/srv0srv.cc:1421

      ======== official 11.8.8 ===============
      #0 0x00007feeec28d02c in __pthread_kill_implementation () from /lib64/libc.so.6
      No symbol table info available.
      #1 0x00007feeec23fb86 in raise () from /lib64/libc.so.6
      No symbol table info available.
      #2 0x00007feeec229905 in abort () from /lib64/libc.so.6
      No symbol table info available.
      #3 0x0000557b3ab3a422 in ut_dbg_assertion_failed (expr=expr@entry=0x557b3b85d2b7 "tail.trx_no <= last_trx_no",
      file=file@entry=0x557b3b85d040 "/home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/trx/trx0purge.cc", line=line@entry=885)
      at /home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/ut/ut0dbg.cc:60
      No locals.
      #4 0x0000557b3ab34b18 in purge_sys_t::choose_next_log (this=this@entry=0x557b3cac8500 <purge_sys>, trx=trx@entry=0x7feee891bb80)
      at /home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/trx/trx0purge.cc:885
      last_trx_no = <optimized out>
      #5 0x0000557b3b2e08e1 in purge_sys_t::rseg_get_next_history_log (this=this@entry=0x557b3cac8500 <purge_sys>, trx=trx@entry=0x7feee891bb80)
      at /home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/trx/trx0purge.cc:847
      prev_log_addr = <optimized out>
      #6 0x0000557b3b2e1afb in purge_sys_t::get_next_rec (roll_ptr=35747322072663421, trx=0x7feee891bb80, this=0x557b3cac8500 <purge_sys>)
      at /home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/trx/trx0purge.cc:1003
      rec2 = <optimized out>
      page_id = <optimized out>
      locked = true
      b = 0x77ef3f0028f8
      rec2_page = 0x77ef3f0028f8
      got_no_rec = <optimized out>
      got_rec = <optimized out>
      page_id = <optimized out>
      locked = <optimized out>
      b = <optimized out>
      rec2_page = <optimized out>
      rec2 = <optimized out>
      next = <optimized out>
      next_page = <optimized out>
      #7 purge_sys_t::fetch_next_rec (trx=0x7feee891bb80, this=0x557b3cac8500 <purge_sys>) at /home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/trx/trx0purge.cc:1043
      roll_ptr = 35747322072663421
      got_nothing = <optimized out>
      roll_ptr = <optimized out>
      locked = <optimized out>
      #8 trx_purge_attach_undo_recs (n_work_items=<synthetic pointer>, trx=0x7feee891bb80) at /home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/trx/trx0purge.cc:1319
      purge_rec = {undo_rec = 0x908fb6ff0402c3ff <error: Cannot access memory at address 0x908fb6ff0402c3ff>, roll_ptr = 5189000293286286341}

      table_id = <optimized out>
      table_node = <optimized out>
      size = <optimized out>
      thd = <optimized out>
      thr = 0x0
      head = {trx_no = 85241034, undo_no = 0}

      table_id_map = std::unordered_map with 0 elements
      enqueue = <optimized out>
      thr = <optimized out>
      head = <optimized out>
      table_id_map = <optimized out>
      thd = <optimized out>
      purge_rec = <optimized out>
      table_id = <optimized out>
      table_node = <optimized out>
      size = <optimized out>
      pt = <optimized out>
      #9 trx_purge (trx=trx@entry=0x7feee891bb80, n_tasks=n_tasks@entry=4, history_size=<optimized out>) at /home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/trx/trx0purge.cc:1457
      no_throttle = <optimized out>
      thd = <optimized out>
      n_work = 0
      head = <optimized out>
      n_pages = <optimized out>
      thr = <optimized out>
      #10 0x0000557b3b2cf318 in purge_coordinator_state::do_purge (trx=0x7feee891bb80, this=0x557b3cac78a0 <purge_state>) at /home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/srv/srv0srv.cc:1420
      n_pages_handled = <optimized out>
      first_loop = <optimized out>
      n_threads = 4
      no_history = <optimized out>
      first_loop = <optimized out>
      n_threads = <optimized out>
      lk = <optimized out>
      n_pages_handled = <optimized out>
      lk = <optimized out>
      #11 purge_coordinator_callback () at /home/buildbot/amd64-almalinux-8-bintar/build/storage/innobase/srv/srv0srv.cc:1513
      ctx = 0x77ed1c008850
      thd = <optimized out>
      trx = 0x7feee891bb80
      #12 0x0000557b3b3e8c09 in tpool::task_group::execute (this=0x557b3cac7700 <purge_coordinator_task_group>, t=0x557b3cac7660 <purge_coordinator_task>)
      at /home/buildbot/amd64-almalinux-8-bintar/build/tpool/task_group.cc:73
      lk = {_M_device = <optimized out>, _M_owns = false}
      #13 0x0000557b3b3e6a3f in tpool::thread_pool_generic::worker_main (this=0x557b3e4ff540, thread_var=0x557b3e84d8b0) at /home/buildbot/amd64-almalinux-8-bintar/build/tpool/tpool_generic.cc:531
      task = 0x557b3cac7660 <purge_coordinator_task>
      #14 0x00007feeec6dbae4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
      No symbol table info available.
      #15 0x00007feeec28b2ea in start_thread () from /lib64/libc.so.6
      No symbol table info available.
      #16 0x00007feeec3103d0 in clone3 () from /lib64/libc.so.6
      No symbol table info available.

      Assertion: tail.trx_no <= last_trx_no
      File: trx0purge.cc
      Line: 885

      ==== MariaDB Config =======

      [mysqld]
      performance_schema=OFF
      disable-log-bin
      sync_binlog=0
      general_log=OFF
      slow_query_log=OFF
      default_storage_engine=InnoDB
      default_authentication_plugin=mysql_native_password
      character_set_server=utf8mb4
      collation_server=utf8mb4_general_ci
      tls_version = TLSv1.2,TLSv1.3
      max_connections=600
      max_prepared_stmt_count=1000000
      open_files_limit=65535
      table_open_cache=4000
      thread_cache_size=128

      transaction_isolation=READ-COMMITTED

      # InnoDB
      innodb_buffer_pool_size=32G
      innodb_doublewrite=1
      innodb_flush_method=O_DIRECT
      innodb_flush_neighbors=0
      innodb_log_file_size=8G
      innodb_undo_tablespaces=2
      innodb_undo_log_truncate=ON
      innodb_io_capacity=2000
      innodb_io_capacity_max=4000
      innodb_read_io_threads=8
      innodb_write_io_threads=8
      innodb_file_per_table=1
      innodb_default_row_format=dynamic
      innodb_compression_algorithm=none
      innodb_compression_level=0
      # sync
      sync_master_info=0
      sync_relay_log=0
      sync_relay_log_info=0

      aria_pagecache_buffer_size = 128M
      aria_sort_buffer_size = 128M

      ==== test case ===

      The TAF user properties file and the MariaDB config file are attached to this MDEV.

      1. Load 8 tables × 1M rows (sysbench prepare)
      2. Run UPDATE_KEY or UPDATE_NO_KEY with 512 threads
      3. InnoDB aborts within 2–10 seconds with the purge assertion tail.trx_no <= last_trx_no (trx0purge.cc:885)

      Attachments

        • innodb_compare.cnf (1.0 kB, Jonathan Jeb Miller)
        • sysbench_tidesdb_compare_innodb_write.properties (5 kB, Jonathan Jeb Miller)

            People

              Marko Mäkelä (marko)
              Jonathan Jeb Miller (jeb)