Hi Sergey:

Here's what I found:

The cacheline used for my_pthread_fastmutex_t is the hottest in all of MariaDB, getting 10X more shared data cacheline accesses than any other. The accesses that don't miss in the last level cache aren't interesting. But this particular line also got the majority of LLC misses, where the data resided in a modified cache on another node.

Of the loads that missed due to the data being in a remote modified cache 31% of them were to the "spins" field, and 41% of them were to the "rng_state" field. The rest were in the mutex.

Of the stores to that line that missed (couldn't get ownership of the line because a remote cpu was currently modifying it), 14% of those were to the "rng_state" field, and 83% were to fields in the mutex.

All accesses to the rng_state and spins fields came from the mysqld binary, while all accesses to the mutex came from libpthread.so (except for a small amount coming from the kernel to access the lock).

All the above accesses came from cpus on all four nodes in the system - which understandably leads to wasted cycles tugging and synchronizing the cacheline. Many of the loads to addresses in those structures had "average" times of 1000 to 4000+ cycles.

When I made the changes I suggested, the hotness of that cacheline (where hotness is determined by how many LLC misses needed to be satisfied by modified cachelines on remote nodes), dropped considerably. And both the spins and rng_state fields dropped off the radar entirely. The number of shared accesses to that cacheline dropped by 2/3. The mutex is still very hot and dominates the performance, as I'm sure you know.

I ran it on prototype hardware (4-socket, 128GB) that we're qualifying our Linux bits on. The disk was an SSD located on a PCIe3 bus. Now that I think of it, if you're using spinning media for the database, it may suppress some of the gains. I've seen this with other performance work.

I built MariaDB using rpmbuild, which means it was built the same way our RHEL OS builds build it.

Here is the script I use for every run. I was focusing on the transaction count per 100 seconds.

I don't have regular access to that 4-node system, so retesting soon might be difficult.

I am not a regular MariaDB user, so I welcome any comments you might have.

Joe

# Script to build database and run sysbench

# First stop db in case it's running.
mysqladmin shutdown
sleep 5

# Start with a fresh database.
# rm -rf /home/perf1/data/mysql
mkdir /home/perf1/data/mysql
rm -rf /home/perf2/LOG/mysql_log
mkdir /home/perf2/LOG/mysql_log

# Initial parameters--default-table-type=INNODB
mysqld_safe --user=root --basedir=/usr --skip-grant-tables \
  --innodb_data_home_dir=/home/perf1/data/mysql \
  --innodb_buffer_pool_size=20480M \
  --innodb_log_group_home_dir=/home/perf2/LOG/mysql_log --innodb_log_buffer_size=64M \
  --innodb_additional_mem_pool_size=32M --innodb_flush_log_at_trx_commit=0 \
  --innodb_log_file_size=1G --innodb_thread_concurrency=1000 --max_connections=1000 \
  --table_cache=4096 --innodb_flush_method=O_DIRECT &
sleep 20

# Drop and create the database
# drop database sbtest (unneeded?)
cat <<'EOF' > `pwd`/a.a7
drop database sbtest;
EOF
mysql < `pwd`/a.a7

cat <<'EOF' > `pwd`/a.a7
create database sbtest;
EOF
mysql < `pwd`/a.a7
sleep 5

# Prepare the database and load data
sysbench prepare --test=oltp --mysql-table-engine=innodb --oltp-table-size=10000000
sleep 2

# Run sysbench at various thread levels.
# sysbench --test=oltp --num-threads=12 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=12 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=12 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=20 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=20 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=20 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=30 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=30 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=40 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=40 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=50 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=50 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=60 --max-requests=1000000 --max-time=100 run
sysbench --test=oltp --num-threads=60 --max-requests=1000000 --max-time=100 run

mysqladmin shutdown
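
For anyone rerunning this, the repeated sysbench lines could also be written as a loop. The following is only a sketch under assumptions, not part of the script that produced the numbers above: the names SYSBENCH, RUNS and run_matrix are made up for illustration. It reads the first 12-thread line as commented out, which gives two runs at 12 threads, three at 20, and two each at 30 through 60.

```shell
#!/bin/sh
# Sketch only: loop form of the repeated sysbench runs.
# SYSBENCH, RUNS and run_matrix are illustrative names (assumptions),
# not taken from the original script.

# Command to invoke; override with SYSBENCH=echo for a dry run.
SYSBENCH=${SYSBENCH:-sysbench}

# "threads:repeats" pairs mirroring the active runs in the script.
RUNS="12:2 20:3 30:2 40:2 50:2 60:2"

run_matrix() {
    for pair in $RUNS; do
        threads=${pair%%:*}     # text before the colon
        repeats=${pair##*:}     # text after the colon
        i=0
        while [ "$i" -lt "$repeats" ]; do
            $SYSBENCH --test=oltp --num-threads="$threads" \
                --max-requests=1000000 --max-time=100 run
            i=$((i + 1))
        done
    done
}
```

Calling run_matrix in place of the individual sysbench lines should issue the same commands in the same order. Setting SYSBENCH=echo first prints the commands instead of running them, which is a cheap way to check the schedule before committing the machine to a full run.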