Details

    Description

      Original finding here. There is an InnoDB history list length regression in 10.6.

      Testcase setup:

      sudo apt install python3-pymysql
      

      Then start a fresh server instance with the following options (the buffer pool size of 17179869184 bytes equals 16 GiB):

      --innodb_flush_method=O_DIRECT --innodb_change_buffering=none --innodb_buffer_pool_size=17179869184
      

      Start a client and add this table:

      CREATE TABLE `catalog` (
        `uid` BIGINT UNSIGNED NOT NULL,
        `security_uid` INT UNSIGNED,
        `owner` varchar(255) binary NOT NULL default '',
        `viewable_owner` varchar(255) binary NOT NULL default '',
        `path` varchar(255) NOT NULL default '',
        `relative_url` varchar(255) NOT NULL default '',
        `parent_uid` BIGINT UNSIGNED default '0',
        `id` varchar(255) default '',
        `description` text,
        `title` varchar(255) default '',
        `meta_type` varchar(255) default '',
        `portal_type` varchar(255) default '',
        `opportunity_state` varchar(255) default '',
        `corporate_registration_code` varchar(255),
        `ean13_code` varchar(255),
        `validation_state` varchar(255) default '',
        `simulation_state` varchar(255) default '',
        `causality_state` varchar(255) default '',
        `invoice_state` varchar(255) default '',
        `payment_state` varchar(255) default '',
        `event_state` varchar(255) default '',
        `immobilisation_state` varchar(255) default '',
        `reference` varchar(255) binary default '',
        `grouping_reference` varchar(255) default '',
        `grouping_date` datetime,
        `source_reference` varchar(255) default '',
        `destination_reference` varchar(255) default '',
        `string_index` varchar(255),
        `int_index` INT,
        `float_index` real,
        `has_cell_content` bool,
        `creation_date` datetime,
        `modification_date` datetime,
        `indexation_timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
        PRIMARY KEY  (`uid`),
        KEY `security_uid` (`security_uid`),
        KEY `owner` (`owner`),
        KEY `viewable_owner` (`viewable_owner`),
        KEY `Parent` (`parent_uid`),
        KEY `Path` (`path`),
        KEY `Title` (`title`),
        KEY `Reference` (`reference`),
        KEY `relative_url` (`relative_url`),
        KEY `Portal Type` (`portal_type`, `reference`),
        KEY `opportunity_state` (`opportunity_state`),
        KEY `validation_state_portal_type` (`validation_state`, `portal_type`),
        KEY `simulation_state_portal_type` (`simulation_state`, `portal_type`),
        KEY `causality_state_portal_type` (`causality_state`, `portal_type`),
        KEY `invoice_state` (`invoice_state`),
        KEY `payment_state` (`payment_state`),
        KEY `event_state` (`event_state`)
      ) ENGINE=InnoDB;
      

      Next, find the port of the server (ps -ef | grep $PWD | grep port, or similar) and update the port = variable at the top of test.py, which otherwise reads as follows:

      import threading
      import time
      import pymysql
      import random
      host = "127.0.0.1"
      port = 10238
      user = "root"
      password = ""
      database = "test"
      num_threads = 512
      num_items = 2
      query_list = [
        [
          "BEGIN",
          "SELECT uid, path from catalog WHERE path IN (" + ", ".join("'%s'" for _ in range(num_items)) + ")",
          "SELECT uid, path from catalog WHERE uid IN (" + ", ".join("%s" for _ in range(num_items)) + ")",
          "REPLACE INTO catalog (`uid`, `path`) VALUES " + ", ".join("(%s, '%s')" for _ in range(num_items)),
          "COMMIT",
        ],
        [
          "BEGIN",
          "SELECT uid, path from catalog WHERE path IN (" + ", ".join("'%s'" for _ in range(num_items)) + ")",
          "SELECT uid, path from catalog WHERE uid IN (" + ", ".join("%s" for _ in range(num_items)) + ")",
          "COMMIT",
        ],
      ]
      connection_list = [
        pymysql.connect(host=host, port=port, user=user, password=password, database=database, cursorclass=pymysql.cursors.DictCursor)
        for _ in range(num_threads + 1)
      ]
      def worker(thread_id):
        """
        Worker function for each thread.
        """
        conn = connection_list[thread_id]
        cursor = conn.cursor()
        try:
          for _ in range(100):
            uid_list = [random.randint(1, 10**12) for _ in range(num_items)]
            if thread_id % 2 == 0:
              # Even threads run the read-write transaction (4 args per uid).
              query = query_list[0]
              args = sum(([e, e, e, e] for e in uid_list), [])
            else:
              # Odd threads run the read-only transaction (2 args per uid).
              query = query_list[1]
              args = sum(([e, e] for e in uid_list), [])
            for q in query:
              # Each statement consumes as many arguments as it has %s placeholders.
              num_args = q.count('%s')
              args_ = args[:num_args]
              cursor.execute(q, args_)
              args = args[num_args:]
        except Exception as e:
          print(f"Thread {thread_id}: Error: {e}")
          raise
        finally:
          cursor.close()
      if __name__ == "__main__":
        conn = connection_list[-1]
        cursor = conn.cursor()
        for i in range(100):
          threads = []
          start_time = time.time()
          for thread_id in range(num_threads):
              thread = threading.Thread(target=worker, args=(thread_id,))
              threads.append(thread)
              thread.start()
          for thread in threads:
              thread.join()
          end_time = time.time()
          print(f"Iteration {i}: executed in {end_time - start_time:.4f} seconds", flush=True)
          cursor.execute("SHOW TABLE STATUS LIKE 'catalog'")
          print(cursor.fetchall(), flush=True)
          cursor.execute("SHOW ENGINE INNODB STATUS")
          print([e for e in cursor.fetchone()['Status'].splitlines() if 'History' in e][0], flush=True)
      
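      For reference, the statement/argument interleaving in worker() relies on counting %s placeholders and slicing a flat argument list. A standalone sketch of that mechanism (no server needed; the uid values are made up):

      import random

      # Statements as built in test.py, with num_items = 2.
      num_items = 2
      queries = [
        "BEGIN",
        "SELECT uid, path from catalog WHERE path IN (" + ", ".join("'%s'" for _ in range(num_items)) + ")",
        "SELECT uid, path from catalog WHERE uid IN (" + ", ".join("%s" for _ in range(num_items)) + ")",
        "REPLACE INTO catalog (`uid`, `path`) VALUES " + ", ".join("(%s, '%s')" for _ in range(num_items)),
        "COMMIT",
      ]

      # Flat argument list: four copies per uid, as in the even-thread branch.
      uid_list = [101, 202]
      args = sum(([e, e, e, e] for e in uid_list), [])

      # Each statement consumes as many arguments as it has %s placeholders:
      # BEGIN/COMMIT take none, the two SELECTs take 2 each, the REPLACE takes 4,
      # so all eight arguments are consumed exactly once.
      for q in queries:
        n = q.count('%s')
        q_args, args = args[:n], args[n:]
        print(n, q_args)
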

      Note that the script was updated from the original source version. Run it as follows:

      python3 test.py | tee log.txt
      

      You will see per-iteration output. To extract the relevant metrics from the log, you can use:

      cat log.txt | grep Histor | grep -o '[0-9]\+' | head -n23  # History length
      cat log.txt | grep exec | sed 's|.*in||' | grep -o '[\.0-9]\+' | head -n23  # Execution time
      cat log.txt | grep -o 'Rows.: [0-9]\+' | grep -o '[\.0-9]\+' | head -n23  # Number of rows as per SHOW TABLE STATUS
      
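      Alternatively, the same metrics can be pulled out with a short Python sketch; the regular expressions below assume the output format produced by test.py's print statements, and the sample values are fabricated for illustration:

      import re

      # Minimal parser for the per-iteration lines emitted by test.py:
      # execution time, InnoDB history list length, and SHOW TABLE STATUS rows.
      def parse_log(text):
        times = [float(m) for m in re.findall(r"executed in ([\d.]+) seconds", text)]
        history = [int(m) for m in re.findall(r"History list length (\d+)", text)]
        rows = [int(m) for m in re.findall(r"'Rows': (\d+)", text)]
        return times, history, rows

      # Example with fabricated values in the expected format:
      sample = (
        "Iteration 0: executed in 2.1702 seconds\n"
        "[{'Name': 'catalog', 'Rows': 51200}]\n"
        "History list length 1234\n"
      )
      print(parse_log(sample))  # ([2.1702], [1234], [51200])
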

      Attached (MDEV-36472_test_results_2.png) is an overview of the first 23 iterations, which shows a significant increase in history length in 10.6.

      Versions used

      CS 10.5.29 c43d0a015f974c5a0142e6779332089a7a979853 (Optimized) Build 15/02/2025
      CS 10.6.22 f1d7e0c17e33f77278e6226dd94aeb30fc856bf0 (Optimized) Build 15/02/2025
      

      The increase in history length appears to correlate with a longer per-iteration duration and fewer rows processed. Based on results during test tuning, it is expected that additional tuning of the workload can further increase the history length and widen the other deltas.

People

  marko Marko Mäkelä (Assignee)
  Roel Roel Van de Paar (Reporter)