[MCOL-4117] Don't disable ccache in ColumnStore CMake Created: 2020-06-26  Updated: 2020-09-21  Resolved: 2020-07-06

Status: Closed
Project: MariaDB ColumnStore
Component/s: Build
Affects Version/s: None
Fix Version/s: 5.5.1

Type: Bug Priority: Major
Reporter: Otto Kekäläinen Assignee: Otto Kekäläinen
Resolution: Fixed Votes: 0
Labels: None

Attachments: PNG File screenshot-1.png     PNG File screenshot-2.png     PNG File screenshot-3.png     PNG File screenshot-4.png    
Issue Links:
Relates
relates to MDEV-23031 Make MariaDB Server build reproducibly Stalled
relates to MCOL-4057 Package ColumnStore 5.x with 10.5 ser... Closed

 Description   

While doing test builds, I noticed that builds including ColumnStore (CS) are very slow. On Launchpad, the basic amd64 build went from 45 minutes before the CS merge into 10.5 to more than 4 hours: https://launchpad.net/~mysql-ubuntu/+archive/ubuntu/mariadb-10.5/+builds?build_text=&build_state=all

My Salsa-CI builds at https://salsa.debian.org/mariadb-team/mariadb-server/-/pipelines now time out after 3 hours every time. This hurts the MCOL-4057 Debian packaging effort, since I currently don't have full QA systems at my disposal.

Related to this slowness, danblack noticed that storage/columnstore/columnstore/CMakeLists.txt explicitly disables ccache unless USE_CCACHE is set (it defaults to FALSE):

OPTION(USE_CCACHE "reduce compile time with ccache." FALSE)
if(NOT USE_CCACHE)
    set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE "")
    set_property(GLOBAL PROPERTY RULE_LAUNCH_LINK "")
else()
  find_program(CCACHE_FOUND ccache)
  if(CCACHE_FOUND)
      set_property(GLOBAL PROPERTY RULE_LAUNCH_COMPILE ccache)
      set_property(GLOBAL PROPERTY RULE_LAUNCH_LINK ccache)
  endif(CCACHE_FOUND)
endif()

This seems to have been added by mott.david.j@gmail.com in https://github.com/mariadb-corporation/mariadb-columnstore-engine/commit/4b9d046c6e338d6fe63a8a960d9df62b6efb68f7, but neither inline comments nor the git commit message explain why it was done this way.

Could we please remove this setting and let ccache run automatically if it is installed and active in the current PATH? That is the logic the rest of MariaDB Server follows, as does most other software. If a user has ccache on their system and gcc is wrapped with ccache, it seems odd for one MariaDB Server component to specifically circumvent this.

Otherwise, would I need to add -DUSE_CCACHE=YES to debian/rules to activate it?



 Comments   
Comment by Otto Kekäläinen [ 2020-06-26 ]

I tested adding -DUSE_CCACHE=YES to debian/rules, but it did not seem to have any effect. On Zulip, drrtuy said "ccache proved to be useless". I wonder what the "proof" was. When I ran two consecutive builds in an identical way on the identical commit, the second build had a ccache hit rate of 1% after 20 minutes of running, and the build progress counter had reached only 30%. So something seems to be completely busting the ccache: enabling ccache is not enough, and the underlying bug must also be fixed for ccache to have any effect.

For reference, one build line:

[ 30%] Building CXX object storage/columnstore/columnstore/dbcon/execplan/CMakeFiles/execplan.dir/arithmeticcolumn.cpp.o
cd /tmp/build/source/builddir/storage/columnstore/columnstore/dbcon/execplan && ccache /usr/lib/ccache/x86_64-linux-gnu-g++  -DDBUG_TRACE -DHAVE_CONFIG_H -D_FILE_OFFSET_BITS=64 -Dexecplan_EXPORTS -I/tmp/build/source/wsrep-lib/include -I/tmp/build/source/wsrep-lib/wsrep-API/v26 -I/tmp/build/source/builddir/include -I/tmp/build/source/builddir/storage/columnstore/columnstore -I/tmp/build/source/storage/columnstore/columnstore/dbcon/execplan/. -I/tmp/build/source/storage/columnstore/columnstore/dbcon/execplan/.. -I/tmp/build/source/storage/columnstore/columnstore/dbcon/execplan/../.. -I/tmp/build/source/include -I/tmp/build/source/storage/columnstore/columnstore/utils/messageqcpp -I/tmp/build/source/storage/columnstore/columnstore/writeengine/shared -I/tmp/build/source/storage/columnstore/columnstore/utils/idbdatafile -I/tmp/build/source/storage/columnstore/columnstore/utils/loggingcpp -I/tmp/build/source/builddir/storage/columnstore/columnstore/utils/loggingcpp -I/tmp/build/source/storage/columnstore/columnstore/utils/configcpp -I/tmp/build/source/storage/columnstore/columnstore/utils/compress -I/tmp/build/source/storage/columnstore/columnstore/versioning/BRM -I/tmp/build/source/storage/columnstore/columnstore/utils/rowgroup -I/tmp/build/source/storage/columnstore/columnstore/utils/common -I/tmp/build/source/storage/columnstore/columnstore/utils/dataconvert -I/tmp/build/source/storage/columnstore/columnstore/utils/rwlock -I/tmp/build/source/storage/columnstore/columnstore/utils/funcexp -I/tmp/build/source/storage/columnstore/columnstore/oamapps/alarmmanager -I/tmp/build/source/storage/columnstore/columnstore/utils -I/tmp/build/source/storage/columnstore/columnstore/oam/oamcpp -I/tmp/build/source/storage/columnstore/columnstore/dbcon/ddlpackageproc -I/tmp/build/source/storage/columnstore/columnstore/dbcon/ddlpackage -I/tmp/build/source/storage/columnstore/columnstore/dbcon/execplan -I/tmp/build/source/storage/columnstore/columnstore/utils/startup 
-I/tmp/build/source/storage/columnstore/columnstore/dbcon/joblist -I/tmp/build/source/storage/columnstore/columnstore/writeengine/wrapper -I/tmp/build/source/storage/columnstore/columnstore/writeengine/server -I/tmp/build/source/storage/columnstore/columnstore/dbcon/dmlpackage -I/tmp/build/source/storage/columnstore/columnstore/writeengine/client -I/tmp/build/source/storage/columnstore/columnstore/dbcon/dmlpackageproc -I/tmp/build/source/storage/columnstore/columnstore/utils/cacheutils -I/tmp/build/source/storage/columnstore/columnstore/utils/mysqlcl_idb -I/tmp/build/source/storage/columnstore/columnstore/utils/querytele -I/tmp/build/source/storage/columnstore/columnstore/utils/thrift -I/tmp/build/source/storage/columnstore/columnstore/utils/joiner -I/tmp/build/source/storage/columnstore/columnstore/utils/threadpool -I/tmp/build/source/storage/columnstore/columnstore/utils/batchloader -I/tmp/build/source/storage/columnstore/columnstore/utils/ddlcleanup -I/tmp/build/source/storage/columnstore/columnstore/utils/querystats -I/tmp/build/source/storage/columnstore/columnstore/writeengine/xml -I/tmp/build/source/sql -I/tmp/build/source/include/../pcre -I/tmp/build/source/storage/columnstore/columnstore/utils/udfsdk -I/tmp/build/source/storage/columnstore/columnstore/utils/libmysql_client -I/usr/include/readline -I/usr/include/libxml2  -g -O2 -fdebug-prefix-map=/tmp/build/source=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -pie -fPIC -Wl,-z,relro,-z,now -fstack-protector --param=ssp-buffer-size=4 -DCOLUMNSTORE_MATURITY=MariaDB_PLUGIN_MATURITY_BETA -O3 -g -static-libgcc -fno-omit-frame-pointer -fno-strict-aliasing -Wno-uninitialized -fno-omit-frame-pointer -D_FORTIFY_SOURCE=2 -DDBUG_OFF -Wall -Wextra -Wformat-security -Wno-format-truncation -Wno-init-self -Wno-nonnull-compare -Wno-unused-parameter -Woverloaded-virtual -Wnon-virtual-dtor -Wvla -Wwrite-strings -fPIC   -Wdate-time -D_FORTIFY_SOURCE=2 -std=c++11 -o CMakeFiles/execplan.dir/arithmeticcolumn.cpp.o -c /tmp/build/source/storage/columnstore/columnstore/dbcon/execplan/arithmeticcolumn.cpp

Comment by Otto Kekäläinen [ 2020-06-26 ]

Comparing two builds, I can see that at least INFO_BIN and INFO_SRC contain the build time and hostname, which breaks build reproducibility (https://reproducible-builds.org/), but that should not affect ccache.

In CMakeOutput.log there is a temporary path name that changes from run to run. Could that affect ccache?

Comment by Otto Kekäläinen [ 2020-06-26 ]

I removed the whole USE_CCACHE block, and the only difference in the build was that the compiler command lines no longer had one extra 'ccache' prefix. This seems like a sensible thing to do.

Note that this USE_CCACHE setting does not affect only ColumnStore. Because it sets GLOBAL properties, it spills over into the entire server build when combined with the server sources.

Unfortunately, ccache -s still shows a low hit rate for the entire MariaDB Server build (between 2% and 70% for consecutive runs of the same build, for reasons I don't understand), so this apparently did not solve the problem. Something is still busting the ccache.

Comment by Otto Kekäläinen [ 2020-06-26 ]

The latest build took 59 minutes, and ccache -s showed:

cache hit (direct)                   937
cache hit (preprocessed)              34
cache miss                          2637
cache hit rate                     26.91 %
called for link                      808
called for preprocessing              22
compile failed                        26
preprocessor error                    33
no input file                          4
cleanups performed                    76
files in cache                      5173
cache size                           4.3 GB
max cache size                       5.0 GB
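For reference, the reported hit rate is consistent with counting only direct and preprocessed hits against misses: (937 + 34) / (937 + 34 + 2637) is approximately 26.91 %, so the 808 link calls and other non-cacheable invocations do not count. The sketch below assumes that formula and the label/value layout of the ccache -s output quoted here; it is not ccache's own code, and the helper names parse_ccache_stats and hit_rate are hypothetical.

```python
# Hypothetical helpers: recompute the "cache hit rate" line from `ccache -s` counters.
# Assumption: hit rate = (direct hits + preprocessed hits) / (all hits + misses);
# link, preprocessing-only and failed calls are not counted as cacheable compilations.
from __future__ import annotations

import re


def parse_ccache_stats(text: str) -> dict[str, float]:
    """Map each 'label   value' line of `ccache -s` output to its numeric value."""
    stats = {}
    for line in text.splitlines():
        # Label: everything before the first run of 2+ spaces; value: the number after it.
        match = re.match(r"^(\S.*?)\s{2,}([\d.]+)", line)
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


def hit_rate(stats: dict[str, float]) -> float:
    """Percentage of cacheable compilations that were served from the cache."""
    hits = stats["cache hit (direct)"] + stats["cache hit (preprocessed)"]
    total = hits + stats["cache miss"]
    return 100.0 * hits / total if total else 0.0


SAMPLE = """\
cache hit (direct)                   937
cache hit (preprocessed)              34
cache miss                          2637
cache hit rate                     26.91 %
called for link                      808
"""

stats = parse_ccache_stats(SAMPLE)
print(f"{hit_rate(stats):.2f} %")  # 26.91 %
```

The same formula also reproduces the 99.97 % figures of the later runs, e.g. (2909 + 17) hits against 1 miss.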

Two builds of the same sources should produce exactly the same results and binaries. For MariaDB Server this does not happen, apparently because timestamps are injected during the build. This may also be why ccache has so many misses. The CMake dependency files also differ between two builds:

diff -r -U 0 server.build1/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/CXX.includecache server.build2/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/con
--- server.build1/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/CXX.includecache    2020-06-26 16:05:03.561817052 +0300
+++ server.build2/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/CXX.includecache    2020-06-26 20:55:22.776550329 +0300
@@ -445,2 +444,0 @@
-storage/columnstore/columnstore/utils/loggingcpp/messageids.h
-
diff -r -U 0 server.build1/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/depend.internal server.build2/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/conf
--- server.build1/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/depend.internal     2020-06-26 16:05:03.561817052 +0300
+++ server.build2/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/depend.internal     2020-06-26 20:55:22.772550275 +0300
@@ -40,2 +39,0 @@
- storage/columnstore/columnstore/utils/loggingcpp/errorids.h
- storage/columnstore/columnstore/utils/loggingcpp/messageids.h
@@ -78 +75,0 @@
- storage/columnstore/columnstore/utils/loggingcpp/messageids.h
diff -r -U 0 server.build1/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/depend.make server.build2/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcp
--- server.build1/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/depend.make 2020-06-26 16:05:03.561817052 +0300
+++ server.build2/builddir/storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/depend.make 2020-06-26 20:55:22.772550275 +0300
@@ -39,2 +38,0 @@
-storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/configcpp.cpp.o: storage/columnstore/columnstore/utils/loggingcpp/errorids.h
-storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/configcpp.cpp.o: storage/columnstore/columnstore/utils/loggingcpp/messageids.h
@@ -77 +74,0 @@
-storage/columnstore/columnstore/utils/configcpp/CMakeFiles/configcpp.dir/configstream.cpp.o: storage/columnstore/columnstore/utils/loggingcpp/messageids.h

Comment by Otto Kekäläinen [ 2020-06-26 ]

I filed PR https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1314, which helps with this but does not yet increase the ccache hit rate.

Comment by Otto Kekäläinen [ 2020-06-27 ]

Skipping the CS build makes the ccache work again for the whole server:

--- a/debian/autobake-deb.sh
+++ b/debian/autobake-deb.sh
@@ -43,6 +43,11 @@ then
+sed 's|-DPLUGIN_COLUMNSTORE=YES|-DPLUGIN_COLUMNSTORE=NO|' -i debian/rules
+sed "/Package: mariadb-plugin-columnstore/,/^$/d" -i debian/control

ccache -s
 
stats zero time                     Sat Jun 27 17:42:16 2020
cache hit (direct)                  2909
cache hit (preprocessed)              17
cache miss                             1
cache hit rate                     99.97 %
called for link                      659
called for preprocessing              22
compile failed                        21
preprocessor error                    32
no input file                          4
cleanups performed                     0
files in cache                      9584
cache size                           4.6 GB
max cache size                       5.0 GB

The total run time of my personal Debian build pipeline, including uploads to Launchpad, is 11-33 minutes. ccache hit rates vary from 50% to 99% from commit to commit, and rebuilding the same commit is naturally fastest, with a 99% hit rate.

I urgently need to get this working with ColumnStore as well; otherwise my workflow is disrupted and my QA efforts and test builds are hindered.

Comment by Otto Kekäläinen [ 2020-06-27 ]

On Salsa-CI, the builds without ColumnStore take about 1.5 hours: https://salsa.debian.org/mariadb-team/mariadb-server/-/pipelines/150040/builds
With ColumnStore included, builds time out after 3 hours. This needs to be fixed; otherwise I can no longer use Salsa-CI for QA on MariaDB Server.

On Launchpad, the build times are similar: https://launchpad.net/~mysql-ubuntu/+archive/ubuntu/mariadb-10.5/+builds?build_text=&build_state=all
Enabling ColumnStore more than doubles the amd64 build time (CS is not built on other platforms), from 1.5 hours to 4 hours.

Comment by Otto Kekäläinen [ 2020-07-05 ]

My test builds at https://salsa.debian.org/mariadb-team/mariadb-server/-/pipelines/152618 cannot finish, since they take over 3 hours, which is the maximum time limit. Even though MCOL-4030 has been addressed, the extreme build time is still an issue: MariaDB Server with ColumnStore takes more than twice as long to build.

Should I file a new issue about that? Fixing ccache alone will not make the build itself more efficient.

Comment by Otto Kekäläinen [ 2020-07-06 ]

Closed via https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1314

Comment by Otto Kekäläinen [ 2020-07-28 ]

As a follow-up: today I ran a new build on 10.5, and ccache now seems to cover the entire build, including ColumnStore:

stats zero time                     Tue Jul 28 11:24:01 2020
cache hit (direct)                  3588
cache hit (preprocessed)              18
cache miss                             1
cache hit rate                     99.97 %
called for link                      807
called for preprocessing              22
compile failed                        27
preprocessor error                    33
no input file                          4
cleanups performed                     0
files in cache                      9906
cache size                           8.7 GB
max cache size                      10.0 GB

Generated at Thu Feb 08 02:47:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.