[MDEV-27702] Innodb Purge is very slow on 64-bit Arm architecture Created: 2022-02-01  Updated: 2022-02-02

Status: Open
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.4.21, 10.4.22, 10.6.5
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Niranjan Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: performance
Environment:

64-bit Arm architecture


Attachments: PNG File arm.read.write.png     PNG File history length for loading the same amount of the data on arm and x86.png     PNG File x86.read.write.png    

 Description   
  • The purge process is very slow on 64-bit Arm architecture.
    == My test machine has 2 CPU cores and 4 GB of RAM
  • Running the sysbench command below against the MariaDB server drives the history list length (HLL) to a very high value, and it takes about 15 minutes to drop back to a low value.
    As the workload increases, the HLL climbs even higher and can take days to drop to a low value.

sysbench ./oltp_read_write.lua --db-driver=mysql --mysql-db=test --mysql-user=root --mysql-password=***** --mysql-host=172.31.35.75 --table_size=16600000 --tables=10 prepare --threads=5

  • The same command run against a MariaDB server set up on an "x86_64" machine does "NOT" show the issue.
    == Same number of CPU cores and same amount of memory as the "ARM64" machine
    == Same engine configuration (.cnf file)
  • Purge configuration:

tbd-mar-hll-10421> show global variables like '%purge%';
+--------------------------------------+------------+
| Variable_name                        | Value      |
+--------------------------------------+------------+
| aria_log_purge_type                  | immediate  |
| innodb_max_purge_lag                 | 0          |
| innodb_max_purge_lag_delay           | 0          |
| innodb_max_purge_lag_wait            | 4294967295 |
| innodb_purge_batch_size              | 300        |
| innodb_purge_rseg_truncate_frequency | 128        |
| innodb_purge_threads                 | 4          |
| relay_log_purge                      | ON         |
+--------------------------------------+------------+
8 rows in set (0.00 sec)
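
For anyone trying to reproduce the measurement, the history list length can be read from the "History list length" line in the TRANSACTIONS section of SHOW ENGINE INNODB STATUS output. The following is a minimal Python sketch of extracting that number from the status text. The function name and the sample excerpt (including its value) are made up for illustration and are not taken from the reporter's server.

```python
import re


def history_list_length(status_text):
    """Return the number on the 'History list length' line, or None if absent."""
    match = re.search(
        r"^History list length\s+(\d+)\s*$", status_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


# Invented excerpt for illustration only; not output from the reporter's server.
sample = """\
------------
TRANSACTIONS
------------
History list length 1830245
"""

print(history_list_length(sample))
```

Polling this at a fixed interval during and after the sysbench run gives the kind of HLL-over-time series shown in the attached graphs.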



 Comments   
Comment by Niranjan [ 2022-02-01 ]

Changing purge settings such as "innodb_purge_threads" and/or adding more CPU cores does make the purge faster.

Having said that, the main concern is the performance difference between "x86_64" and "arm64" machines.
With exactly the same engine configuration and workload, the purge operation takes much longer to complete on "arm64".

In the sysbench test above:
== on "arm64", purge took ~15 minutes to complete after the sysbench test finished
== on "x86_64", purge completed within a minute after the sysbench test finished
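
To make the "time for purge to complete" comparison reproducible, one option is to sample the history list length at intervals and compute how long after the end of the workload it first falls to a chosen threshold. The sketch below assumes such samples already exist as (seconds, HLL) pairs. The function name, the threshold and the sample values are hypothetical, not measurements from either machine.

```python
def drain_seconds(samples, workload_end, threshold):
    """Seconds from workload_end until HLL first drops to threshold or below.

    samples: (t_seconds, hll) pairs sorted by time.
    Returns None if the HLL never reaches the threshold in the sampled window.
    """
    for t, hll in samples:
        if t >= workload_end and hll <= threshold:
            return t - workload_end
    return None


# Invented samples for illustration: the workload ends at t=120 s.
samples = [(0, 0), (60, 500000), (120, 900000), (180, 400000), (240, 900)]

print(drain_seconds(samples, workload_end=120, threshold=1000))  # prints 120
```

Running the same computation on samples from both machines would give a single comparable number per platform instead of an eyeballed estimate.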

Comment by Krunal Bauskar [ 2022-02-02 ]

1. The reported issue is that purge is slow on ARM (when compared to x86).

2. So I tried running an extended workload to help trace how the history length shrinks over a period of time.

<NOTE: My ARM and x86 VMs are not comparable, so don't compare the tps directly, but they have the same number of cores and the same kind of cloud disk.>

See the attached graph. I see ARM purging as quickly as x86.

I don't have a VM that small, so I used a VM with 24 cores, of which 20 are allotted to the server and 4 to the sysbench workload.
Also, if you need to check the exact configuration, you can find it here: https://github.com/mysqlonarm/benchmark-suites/blob/master/mysql-sbench/conf/mdb.cnf/100tx3m_106_cpubound.cnf
server-version: 10.6.5

Quick note about workload: 5 rounds of 300 sec each, with a 20 sec gap after each round, and then continued monitoring of stats (like history length, etc.) for another 100 sec.
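
The workload timeline described above can be laid out as a simple schedule. This is only a sketch: it assumes a gap follows every round, including the last one, before the final monitoring window, and the function and phase names are invented for illustration.

```python
def build_schedule(rounds=5, round_sec=300, gap_sec=20, tail_sec=100):
    """Return (phase, start_sec, end_sec) tuples for the whole run."""
    schedule = []
    clock = 0
    for i in range(1, rounds + 1):
        # Load phase: sysbench is running.
        schedule.append((f"round {i}", clock, clock + round_sec))
        clock += round_sec
        # Idle gap after the round (assumed to follow the last round too).
        schedule.append((f"gap {i}", clock, clock + gap_sec))
        clock += gap_sec
    # Final window: no load, only stats such as history length are monitored.
    schedule.append(("monitor", clock, clock + tail_sec))
    return schedule


schedule = build_schedule()
total_sec = schedule[-1][2]
print(total_sec)  # prints 1700
```

Under that assumption the whole run spans 1700 seconds, which is the time axis to expect on a history-length graph of this test.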

Comment by Krunal Bauskar [ 2022-02-02 ]

I am not sure why someone would want to measure history length while loading the data (given that it hardly grows to a significant level), but assuming there is a use case, I measured that too, as suggested in the original text.
Again, ARM loaded the data faster than x86 and was also ahead in draining the history length. See the attached graph.

Comment by Marko Mäkelä [ 2022-02-02 ]

Thank you for checking this, krunalbauskar.

The difference could very well be due to something other than the instruction set architecture or its implementation in a CPU core. If the ARM system has only 2 ARMv8 cores, then I would think that it should be compared to a similar AMD64 implementation. I see that all Intel Atom products that were launched in 2021 feature 2 or 4 cores. A dual-core Atom might be more comparable to a 2-core ARMv8 system. Also, the I/O and memory bus architecture could play a significant role.

When it comes to sysbench prepare, it could be sped up in MariaDB 10.6 and later by applying a change like this to enable the use of bulk insert. In 10.6, that will disable row-level undo logging when inserting into an empty table. MDEV-24621 in 10.7 improved the index tree creation too. Unfortunately, due to current limitations of the storage engine interface, MDEV-24621 only applies to the first statement that inserts into an empty table or partition.

Generated at Thu Feb 08 09:54:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.