Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 11.8.6, 12.3.1
- None
- Related to performance
Description
Steve Shaw ran a HammerDB TPROC-C test against 12.3.1 on an AMD EPYC 9335 (32 cores/64 threads). The cpuinfo entry for the last hardware thread:
processor : 63
vendor_id : AuthenticAMD
cpu family : 26
model : 2
model name : AMD EPYC 9335 32-Core Processor
stepping : 1
As the CPUs/threads got busy, the TPM figure dropped from ~3.5M to ~2.5M about 1 minute into the test, as shown in the attached picture.
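For scale, the drop quoted above is a little under 30% of the initial throughput. A minimal sketch of the arithmetic, using the approximate TPM figures from this report (the function and variable names are illustrative only):

```python
# Approximate TPM figures quoted in this report (not exact measurements).
tpm_before = 3.5e6
tpm_after = 2.5e6

def relative_drop(before, after):
    """Return the throughput loss as a fraction of the starting value."""
    return (before - after) / before

drop = relative_drop(tpm_before, tpm_after)
print(f"TPM drop: {drop:.1%}")
```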
Steve Shaw's hypothesis was that this behavior had been fixed by https://jira.mariadb.org/browse/MDEV-16168 in 12.1.1. However, before that fix the throughput would either dip and recover or dip and drop much lower; now it starts well, dips, and stays at the lower level.
The config file is here: https://www.hammerdb.com/ci-config/maria.cnf. This configuration has typically given the best performance.
System CPU time rises, driven by native_queued_spin_lock_slowpath, which points to futex contention as the cause.
An identical test on an 8-core/16-thread system shows no dip, so the contention appears to be exposed by the CPU architecture/NUMA layout.
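Since the difference is attributed to CPU architecture/NUMA, it may help to record the NUMA layout of each test machine alongside the results. The following is a minimal sketch, assuming a Linux host that exposes the standard sysfs tree under /sys/devices/system/node; the function name numa_topology is hypothetical and not part of any tool used in this report.

```python
import os
import re

def numa_topology(base="/sys/devices/system/node"):
    """Return {node_id: cpulist string} read from the Linux sysfs NUMA tree.

    Returns an empty dict if the directory is missing (for example on a
    non-Linux host or a kernel without NUMA support).
    """
    topology = {}
    if not os.path.isdir(base):
        return topology
    for entry in os.listdir(base):
        m = re.fullmatch(r"node(\d+)", entry)
        if not m:
            continue  # skip files such as "online" or "possible"
        cpulist_path = os.path.join(base, entry, "cpulist")
        try:
            with open(cpulist_path) as f:
                topology[int(m.group(1))] = f.read().strip()
        except OSError:
            continue  # node directory without a readable cpulist
    return topology

if __name__ == "__main__":
    for node, cpus in sorted(numa_topology().items()):
        print(f"node{node}: CPUs {cpus}")
```

This only reports NUMA nodes and their CPU lists. It does not capture the cache or core-complex layout, which may also differ between the processors compared here.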
The perf top output from before the dip:
Samples: 1M of event 'cycles:P', 4000 Hz, Event count (approx.): 1210334225952 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
7.67% mariadbd [.] ssux_lock_impl<true>::rd_lock_spin()
2.00% mariadbd [.] cmp_dtuple_rec_bytes(unsigned char const*, dict_index_t const&, dtuple_t const&, int*, unsigned long)
1.58% mariadbd [.] int page_cur_dtuple_cmp<false>(dtuple_t const&, unsigned char const*, dict_index_t const&, unsigned short
1.56% mariadbd [.] buf_page_get_gen(page_id_t, unsigned long, rw_lock_type_t, buf_block_t*, unsigned long, mtr_t*, dberr_t*)
1.50% mariadbd [.] check_table_access(THD*, privilege_t, TABLE_LIST*, bool, unsigned int, bool)
1.50% mariadbd [.] page_cur_search_with_match(dtuple_t const*, page_cur_mode_t, unsigned short*, unsigned short*, page_cur_t
1.28% mariadbd [.] mtr_t::rollback_to_savepoint(unsigned long, unsigned long)
1.25% mariadbd [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned short*, unsigned long, unsigned
1.23% mariadbd [.] cmp_data(unsigned long, unsigned long, bool, unsigned char const*, unsigned long, unsigned char const*, u
1.03% mariadbd [.] btr_cur_t::search_leaf(dtuple_t const*, page_cur_mode_t, btr_latch_mode, mtr_t*)
1.03% mariadbd [.] build_template_field(row_prebuilt_t*, dict_index_t*, dict_index_t*, TABLE*, Field const*, unsigned long,
0.80% hammerdbcli [.] TEBCresume
0.80% mariadbd [.] MYSQLparse(THD*)
0.78% mariadbd [.] alloc_root
0.71% libc.so.6 [.] pthread_mutex_unlock
0.69% mariadbd [.] srw_mutex_impl<true>::wait_and_lock()
After dip:
Samples: 2M of event 'cycles:P', 4000 Hz, Event count (approx.): 2052979194279 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
8.69% [kernel] [k] native_queued_spin_lock_slowpath
7.76% mariadbd [.] ssux_lock_impl<true>::rd_lock_spin()
4.26% libc.so.6 [.] pthread_mutex_lock
1.63% mariadbd [.] cmp_dtuple_rec_bytes(unsigned char const*, dict_index_t const&, dtuple_t const&, int*, unsigned long)
1.24% mariadbd [.] int page_cur_dtuple_cmp<false>(dtuple_t const&, unsigned char const*, dict_index_t const&, unsigned short
1.18% mariadbd [.] page_cur_search_with_match(dtuple_t const*, page_cur_mode_t, unsigned short*, unsigned short*, page_cur_t
1.17% mariadbd [.] buf_page_get_gen(page_id_t, unsigned long, rw_lock_type_t, buf_block_t*, unsigned long, mtr_t*, dberr_t*)
0.99% mariadbd [.] check_table_access(THD*, privilege_t, TABLE_LIST*, bool, unsigned int, bool)
0.98% mariadbd [.] mtr_t::rollback_to_savepoint(unsigned long, unsigned long)
0.97% mariadbd [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned short*, unsigned long, unsigned
0.94% libc.so.6 [.] pthread_mutex_unlock
0.94% mariadbd [.] cmp_data(unsigned long, unsigned long, bool, unsigned char const*, unsigned long, unsigned char const*, u
0.82% mariadbd [.] build_template_field(row_prebuilt_t*, dict_index_t*, dict_index_t*, TABLE*, Field const*, unsigned long,
0.80% mariadbd [.] btr_cur_t::search_leaf(dtuple_t const*, page_cur_mode_t, btr_latch_mode, mtr_t*)
0.75% mariadbd [.] srw_mutex_impl<true>::wait_and_lock()
0.64% mariadbd [.] alloc_root
0.63% mariadbd [.] MYSQLparse(THD*)
0.63% hammerdbcli [.] TEBCresume
Flamegraphs are not available for this run unfortunately.
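In the absence of flamegraphs, the two flat perf top listings can still be compared symbol by symbol. The following is a minimal sketch of such a diff, assuming the listings are available as plain text in the "overhead, shared object, [type], symbol" layout shown in this report. The helper names (parse_perf_top, diff_profiles) are hypothetical, and the embedded data is only an excerpt of the listings.

```python
import re

# Matches lines such as "7.67% mariadbd [.] ssux_lock_impl<true>::rd_lock_spin()".
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+\[(.)\]\s+(.+?)\s*$")

def parse_perf_top(text):
    """Map symbol -> overhead percentage for each recognisable line."""
    profile = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            overhead, _shared_object, _kind, symbol = m.groups()
            profile[symbol] = float(overhead)
    return profile

def diff_profiles(before, after):
    """Return (symbol, before, after, delta) rows, largest increase first.

    A symbol missing from one listing is treated as 0.0, although it may
    simply have been below the displayed cut-off.
    """
    rows = []
    for symbol in set(before) | set(after):
        b = before.get(symbol, 0.0)
        a = after.get(symbol, 0.0)
        rows.append((symbol, b, a, a - b))
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Excerpts of the two listings in this report (not the full output).
BEFORE = """\
7.67% mariadbd [.] ssux_lock_impl<true>::rd_lock_spin()
0.71% libc.so.6 [.] pthread_mutex_unlock
0.69% mariadbd [.] srw_mutex_impl<true>::wait_and_lock()
"""
AFTER = """\
8.69% [kernel] [k] native_queued_spin_lock_slowpath
7.76% mariadbd [.] ssux_lock_impl<true>::rd_lock_spin()
4.26% libc.so.6 [.] pthread_mutex_lock
0.94% libc.so.6 [.] pthread_mutex_unlock
0.75% mariadbd [.] srw_mutex_impl<true>::wait_and_lock()
"""

rows = diff_profiles(parse_perf_top(BEFORE), parse_perf_top(AFTER))
for symbol, b, a, delta in rows:
    print(f"{delta:+6.2f}  {b:5.2f} -> {a:5.2f}  {symbol}")
```

On this excerpt the largest increases are native_queued_spin_lock_slowpath and pthread_mutex_lock, neither of which appears in the listing from before the dip. Because absent symbols are counted as zero, their deltas are upper bounds rather than exact values.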
Increasing the thread_pool settings changed neither the performance nor the profile.
It has been confirmed that the AMD EPYC 9454P 48-Core Processor does not reproduce the issue; the only configuration difference there was the thread_pool setting, which has been ruled out as the cause.
It can be speculated that this is related to MDEV-38814 (or MDEV-39272), but there is no hard evidence.
So far this has only been reproduced on the AMD EPYC 9335 32-Core Processor.
EDIT: Another test has been performed on the previous system, now equipped with an AMD EPYC 9255 24-Core Processor; the behavior also reproduces there, in both 12.3.1 and 11.8.6.
Attachments
Issue Links
- relates to
  - MDEV-38814 High rate of index_lock_upgrades due to btr_cur_need_opposite_intention() mostly returning true (In Progress)
  - MDEV-16168 Performance regression on sysbench write benchmarks from 10.2 to 10.3 (Closed)