[MDEV-6675] Use PGO in builds to help reduce icache miss overhead Created: 2014-09-01  Updated: 2015-04-24

Status: Stalled
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Kristian Nielsen Assignee: Kristian Nielsen
Resolution: Unresolved Votes: 0
Labels: performance

Attachments: File mdev6675.patch    

 Description   

I wrote about this in January:

https://lists.launchpad.net/maria-developers/msg06693.html
http://kristiannielsen.livejournal.com/17676.html
http://kristiannielsen.livejournal.com/18168.html

Even for simple queries, profiling shows that icache misses are a major
performance bottleneck. The total amount of code executed is larger than the
icache, and prefetching is not effective enough, so the CPU spends most of its
time waiting for new instructions to be fetched and decoded.
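The icache claim can be checked on a given machine with Linux perf. This is an illustrative sketch, not the method behind the profiling mentioned above. It assumes perf is installed and that the CPU exposes the generic L1-icache-load-misses event (not all do). icache_mpki is a hypothetical helper that turns the CSV written by `perf stat -x,` into misses per thousand instructions (MPKI).

```shell
#!/bin/sh
# Hypothetical helper: icache_mpki STATS_FILE
# Prints L1 icache misses per 1000 instructions from `perf stat -x,` output.
# In that CSV format, field 1 is the counter value and field 3 the event name
# (which may carry a modifier suffix such as ":u").
icache_mpki() {
  awk -F, '
    $1 ~ /^[0-9.]+$/ && $3 ~ /^instructions/          { insns = $1 }
    $1 ~ /^[0-9.]+$/ && $3 ~ /^L1-icache-load-misses/ { misses = $1; seen = 1 }
    END {
      # Fail if either counter is missing, e.g. <not supported> on this CPU.
      if (insns > 0 && seen) printf "%.2f\n", misses * 1000 / insns; else exit 1
    }' "$1"
}

# Usage (assumes a running mysqld and a perf that knows this event):
#   perf stat -x, -o stats.csv -e instructions,L1-icache-load-misses \
#     -p "$(pidof mysqld)" -- sleep 30
#   icache_mpki stats.csv
```

Comparing this number for a normal build and a PGO build under the same load is a more direct check of the icache effect than overall query throughput alone.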

A partial but easy-to-implement fix is to use GCC profile-guided
optimisation (PGO). Tests have shown this to significantly reduce icache
misses and to give other small improvements, for a nice overall speedup in
single-threaded performance.

I already have a script that generates a suitable test load, and the commands
needed to build using PGO:

https://github.com/knielsen/gen_profile_load

  mkdir bld
  cd bld
  cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 --coverage" -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 --coverage" ..
  make

  tests/gen_profile_load

  cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 -fprofile-use -fprofile-correction" -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-Wno-maybe-uninitialized -g -O3 -fprofile-use -fprofile-correction" ..
  make
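With -fprofile-use, GCC treats a missing profile as a warning (-Wmissing-profile in newer GCC releases) rather than an error, so if the training run left no .gcda files in the build tree, the second build proceeds without profile feedback for those files. The counters are written when the instrumented processes exit normally, so a server that is killed rather than shut down cleanly leaves no profile behind. A hypothetical helper (check_profile_data is not part of the MariaDB tree or of gen_profile_load) that could run between the training load and the second cmake:

```shell
#!/bin/sh
# Hypothetical helper: check_profile_data [BUILD_DIR]
# Succeeds only if the training run left at least one .gcda file behind.
check_profile_data() {
  dir=${1:-.}
  # tr strips the leading blanks some wc implementations print.
  count=$(find "$dir" -name '*.gcda' | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "no .gcda files under $dir; -fprofile-use would find no profile" >&2
    return 1
  fi
  echo "found $count .gcda files under $dir"
}

# Usage, from bld/ after tests/gen_profile_load and before the second cmake:
#   check_profile_data . || exit 1
```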

It just needs to be integrated into the .deb build scripts (both native Debian
and the MariaDB 3rd-party repos) and into the bintar scripts.
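For the .deb and bintar scripts, the three stages could be wrapped in one shell function. This is a sketch under assumptions: pgo_build and its argument order are hypothetical, SRC_DIR must hold a MariaDB source tree, and the training command (for example the gen_profile_load script) is passed in and run from inside the build directory, as in the recipe in the description. The cmake flags are copied from that recipe.

```shell
#!/bin/sh
# Hypothetical wrapper: pgo_build SRC_DIR BUILD_DIR LOAD_CMD [ARGS...]
# The body is a subshell, so the cd does not leak into the caller.
pgo_build() (
  src=$(cd "$1" && pwd) || exit 1   # absolute path, still valid after the cd
  bld=$2
  shift 2                           # "$@" is now the training command
  base="-Wno-maybe-uninitialized -g -O3"

  mkdir -p "$bld" && cd "$bld" || exit 1

  # Stage 1: instrumented build.
  cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCMAKE_C_FLAGS_RELWITHDEBINFO="$base --coverage" \
    -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="$base --coverage" \
    "$src" || exit 1
  make || exit 1

  # Stage 2: training run; leaves .gcda files in the build tree.
  "$@" || exit 1

  # Stage 3: reconfigure the same tree to use the profile, then rebuild.
  cmake -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DCMAKE_C_FLAGS_RELWITHDEBINFO="$base -fprofile-use -fprofile-correction" \
    -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="$base -fprofile-use -fprofile-correction" \
    "$src" || exit 1
  make
)

# Usage (paths are placeholders):
#   pgo_build "$SRC" "$SRC/bld" tests/gen_profile_load
```

Reusing one build directory for both passes keeps the object paths identical, which is what lets the second compile find each object's .gcda file.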



 Comments   
Comment by Kristian Nielsen [ 2014-09-05 ]

Hm, I have a patch for using PGO when building .debs.

But I did a quick test with a bunch of simple queries, and the PGO binaries were not seen to be faster. In fact, they were seen to be a few percent slower.

So I need to analyse this before proceeding and find the explanation, to see whether the PGO idea is viable at all.

Comment by Kristian Nielsen [ 2014-11-26 ]

I made a patch to use PGO in the debian package builds.

But then a quick benchmark showed that the resulting binaries were slower, not faster, than the original. This probably needs to be understood before going further with this task.

Comment by Kristian Nielsen [ 2015-04-24 ]

I attached my patch to this issue.

It mainly extends debian/rules to build with profiling, run the profile load, and then build again using PGO. It uses the load generator from here:

https://github.com/knielsen/gen_profile_load

It also includes the simple test script that showed poorer performance of the PGO binaries.

The patch should be complete (it is based on an older 10.0 tree). However, the fact that even simple performance tests become slower with PGO probably needs to be investigated before this is used.

Generated at Thu Feb 08 07:13:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.