[MDEV-32750] Performance degradation in MariaDB 10.11 Created: 2023-11-09  Updated: 2023-11-09

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.11.4
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Keshan Nageswaran Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: performance, regression
Environment:

Amazon AL2 I3EN.6XL
24 vCPUs, 192GB mem, 15000 GB NVMe SSD


Attachments: Text File 10_11_tpcc_stats.txt     Text File 10_4_tpcc_stats.txt     PNG File innodb_buffer_pool_read_requests.png     PNG File innodb_buffer_pool_reads.png     PNG File innodb_buffer_pool_write_requests.png     Text File mycnf.txt     PNG File slow_queries_10_11.png     PNG File tps_stats.png    

 Description   

Hi, we are trying to migrate MariaDB from 10.4.10 to 10.11.4. We addressed most of the breaking changes specified in official documentation.

Below are cnf changes made to 10.11.4.
innodb_checksum_algorithm = CRC32 (previously INNODB)
innodb_adaptive_hash_index = 1 (previously defaulted to 1)
innodb_buffer_pool_instances = 1 (previously logical CPU count ~24)

Removed all deprecated variables innodb_autoextend_increment, innodb_thread_concurrency, innodb_thread_sleep_delay, innodb_log_compressed_pages, innodb_sync_array_size.

Regarding innodb_buffer_pool_instances and innodb_buffer_pool_size, innodb_buffer_pool_size value gets calculated based on the following logic.

Calculation logic:

  • pool_size_bytes = available_physical_memory_bytes * 0.7
  • chunk_size_bytes = (pool_size_bytes / 1024 no.of. chunks ) / 1024 / 1024/ 1024
  • allocation_units = pool_size_bytes/ (chunck_size_bytes * pool_instances)
  • buffer_pool_size_bytes = allocation_units * chunck_size_bytes * pool_instances
  • buffer_pool_plus_buffer_size_bytes = buffer_pool_size_bytes * 1.1
  • huge_page_count = buffer_pool_plus_buffer_size_bytes / huge page size bytes (2*1024*1024)

pool_instances reflects innodb_buffer_pool_instances in above calculation logic.

In 10.11, innodb_buffer_pool_instances variable is removed (MDEV-15058) we are defaulting pool_instance to 1. Please advise anything to be changed to above logic as our performance test results shows more slow queries and innodb_buffer_pool not efficiently performing.

I have also attached the cnf config for reference as well. mycnf.txt

Below are details on performance and high disk I/O observed during testing:

Executed TPCC like workload provided by Percona-labs on both host (10.4.10 and 10.11.4). Intention is to find any regression with the current cnf config used in 10.11.

Setup
Hardware configuration: I3EN.6XL, 24 vCPUs, 192GB mem, 15000 GB NVMe SSD
Threads: 1024 (thread execution ran for 1hr) with unlimited tps
Dataset: 100 tables with 1 data warehouse

Observation:

  • We did see slow queries recorded in 10.11.4. slow_query config (long_query_time: 1 second)
    attached screenshot - slow_queries_10_11.png
  • We did see there were more request in 10.4 to be read from innodb_buffer_pool compared to 10.11. attached screenshot - innodb_buffer_pool_read_requests.png
  • Same with write request. attached screenshot - innodb_buffer_pool_write_requests.png
  • Overall tps stats seems low for 10.11 compared to 10.4. screenshot - tps_stats.png
  • I have also attached tpcc statistic summary from sysbench for both versions of mariadb (10_11_tpcc_stats.txt and 10_4_tpcc_stats.txt)

Appreciate if I get any suggestions to fine tune the performance in 10_11 and if any other metrics to be checked please guide on that.


Generated at Thu Feb 08 10:33:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.