[MDEV-7943] pthread_getspecific() takes 0.76% in OLTP RO Created: 2015-04-09 Updated: 2015-06-19 Resolved: 2015-06-19 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | OTHER |
| Affects Version/s: | 10.1 |
| Fix Version/s: | 10.1.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Sergey Vojtovich | Assignee: | Sergey Vojtovich |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Epic Link: | Performance: micro optimizations |
| Sprint: | 10.1.6-1 |
| Description |
|
Data comes from a Sandy Bridge system running sysbench OLTP RO with 1 thread against 1 table. Call graphs:
The most frequent caller is trx_is_interrupted()/thd_kill_level(), which calls current_thd unconditionally. |
| Comments |
| Comment by Sergei Golubchik [ 2015-04-09 ] | |
|
One option would be to use thread-local variables in gcc. They might be faster (this needs to be tested), and with macros one can easily hide the underlying implementation (getspecific or TLS) from the caller. | |
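A minimal sketch of the macro idea from the comment above, assuming a hypothetical `struct thd` stand-in and hypothetical accessor names `get_current_thd()` / `set_current_thd()` (these are not the server's actual definitions). A compile-time switch, here called `USE_COMPILER_TLS` for illustration, selects either the gcc `__thread` extension or `pthread_getspecific()`, and callers see the same interface either way:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-connection THD object. */
struct thd { int killed; };

#ifdef USE_COMPILER_TLS
/* Compiler-level TLS: __thread is a GCC/Clang extension. */
static __thread struct thd *tls_current_thd;
#define get_current_thd()   (tls_current_thd)
#define set_current_thd(t)  ((void) (tls_current_thd = (t)))
#else
/* POSIX fallback: the key is created once, on first use. */
static pthread_key_t thd_key;
static pthread_once_t thd_key_once = PTHREAD_ONCE_INIT;

static void thd_key_init(void)
{
  int rc = pthread_key_create(&thd_key, NULL);
  assert(rc == 0);
  (void) rc;  /* silence unused-variable warning when NDEBUG is set */
}

static struct thd *get_current_thd(void)
{
  pthread_once(&thd_key_once, thd_key_init);
  return pthread_getspecific(thd_key);
}

static void set_current_thd(struct thd *t)
{
  pthread_once(&thd_key_once, thd_key_init);
  pthread_setspecific(thd_key, t);
}
#endif
```

Because both variants expose the same two names, the two implementations could be benchmarked against each other by rebuilding with or without the switch, without touching call sites.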
| Comment by Sergey Vojtovich [ 2015-04-28 ] | |
|
serg, please review 3 patches for this task. | |
| Comment by Sergey Vojtovich [ 2015-05-13 ] | |
|
serg, please also review the 3rd patch for this task. | |
| Comment by Alexey Kopytov [ 2015-05-20 ] | |
|
Out of curiosity, what happened to the thread-local variables idea? Has it proved to be not fast enough to replace pthread_getspecific() calls? | |
| Comment by Sergey Vojtovich [ 2015-05-20 ] | |
|
alexeykopytov, according to my study (with no good benchmarks though), TLS should be faster than pthread_getspecific(), but still slower than passing function arguments. So far we have reduced the number of pthread_getspecific() calls from ~1100 to ~300 per OLTP RO transaction. Alas, there are other workloads that won't benefit from this. The plan is: pass THD through wherever possible, otherwise fall back to TLS if there are worthy cases. | |
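The plan above (pass THD through, fall back to a thread-local lookup only when the caller has none) could look roughly like the following. This is an illustrative sketch only: `struct thd`, `kill_level()`, `lookup_current_thd()`, and the lookup counter are hypothetical names made up for the example, not the functions changed by the actual patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-connection THD object. */
struct thd { int killed; };

/* Thread-local slot and a counter of how often it is consulted
   (__thread is a GCC/Clang extension). */
static __thread struct thd *current_thd_slot;
static __thread unsigned long lookup_count;

/* The "expensive" path: a thread-local lookup. */
static struct thd *lookup_current_thd(void)
{
  lookup_count++;
  return current_thd_slot;
}

/* Uses the caller's thd when one is passed; looks it up otherwise. */
static int kill_level(const struct thd *thd)
{
  if (thd == NULL)
    thd = lookup_current_thd();
  return thd ? thd->killed : 0;
}

/* Performs n kill checks and returns how many reported "killed". */
static int count_killed_checks(const struct thd *thd, int n)
{
  int hits = 0;
  assert(n >= 0);
  for (int i = 0; i < n; i++)
    hits += kill_level(thd) != 0;
  return hits;
}
```

Calling `count_killed_checks(&t, 100)` leaves `lookup_count` untouched, while `count_killed_checks(NULL, 100)` performs 100 lookups; the savings described in the comment come from converting call sites of the second kind into the first.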
| Comment by Alexey Kopytov [ 2015-05-20 ] | |
|
I see, thanks. I was asking, because I was considering the same idea for Percona Server a few years ago. Leveraging thread-local storage looked like a low-hanging fruit to optimize all those pthread_getspecific() call sites without introducing invasive code changes, but I never got around to evaluating it. | |
| Comment by Sergey Vojtovich [ 2015-06-18 ] | |
|
serg, please review another patch for this bug:
| |
| Comment by Sergey Vojtovich [ 2015-06-19 ] | |
|
The number of pthread_getspecific() calls per OLTP RO transaction was reduced from ~1100 to 290. Further improvements (if any) will be done separately. |