MariaDB ColumnStore / MCOL-2089

High CPU usage and slow performance when loading data with remote mcsimport

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.2
    • Fix Version/s: 1.2.4
    • Component/s: None
    • Labels: None
    • Environment: mcsimport tool run remotely against an MCS single server
    • Sprint: 2019-02, 2019-03

    Description

      High CPU usage and slow performance appear when loading data with remote mcsimport.

      Ran the autopilot cpimportLineitem test case group with the mcsimport option. All tests passed,
      but high CPU usage was observed and the tests finished slowly, even compared to
      MariaDB mysqlimport, which uses the SQL statement LOAD DATA LOCAL INFILE on MCS.

      How to repeat:
      Run the autopilot cpimportLineitem test case group remotely with the mcsimport option.
      Run the autopilot cpimportLineitem test case group remotely with the mysqlimport option.
      ./autopilot.sh features cpimportLineitem

      Remote Load Method    Elapsed Time [s]
      MCSIMPORT             6918
      MYSQLIMPORT           2180

      High CPU usage was observed for the entire duration of the mcsimport data load:

      # top
      top - 14:04:09 up 53 days,  2:36,  4 users,  load average: 0.83, 0.82, 0.62
      Tasks: 180 total,   3 running, 167 sleeping,   8 stopped,   2 zombie
      %Cpu(s): 10.3 us,  0.2 sy,  0.0 ni, 89.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
      KiB Mem : 65975072 total,   454704 free, 13455012 used, 52065356 buff/cache
      KiB Swap:  1048572 total,   745468 free,   303104 used. 49717568 avail Mem
       
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
      21814 root      20   0  500040 170864   2524 R  83.7  0.3   9:40.07 mcsimport
      10284 root      20   0  162004   2320   1584 R   0.3  0.0   0:12.78 top
      17218 mysql     20   0 4911300   1.0g  17944 S   0.3  1.6  76:38.07 mysqld
          1 root      20   0  191548   2920   1924 S   0.0  0.0   0:17.49 systemd
          2 root      20   0       0      0      0 S   0.0  0.0   0:00.60 kthreadd
          3 root      20   0       0      0      0 S   0.0  0.0   0:03.22 ksoftirqd/0
          5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
          7 root      rt   0       0      0      0 S   0.0  0.0   0:22.88 migration/0
          8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
      
      

      BF Passed rowCnt=1024 actRowCnt=1024
      BC Passed rowCnt=1025 actRowCnt=1025
      TF Passed rowCnt=253952 actRowCnt=253952
      TC Passed rowCnt=253953 actRowCnt=253953
      CF Passed rowCnt=516096 actRowCnt=516096
      CC Passed rowCnt=516097 actRowCnt=516097
      EF Passed rowCnt=8380416 actRowCnt=8380416
      EC Passed rowCnt=8380417 actRowCnt=8380417
      SF Passed rowCnt=33546240 actRowCnt=33546240
      SW Passed rowCnt=33546241 actRowCnt=33546241
      PF Passed rowCnt=67100672 actRowCnt=67100672
      PC Passed rowCnt=67100673 actRowCnt=67100673
      [root@cps tests]#
      
      

      Stack trace captured while loading the EC test:

      # gdb -batch -ex 'thr a a bt' -p=$(pgrep mcsimport)
      [New LWP 21818]
      [New LWP 21817]
      [New LWP 21816]
      [New LWP 21815]
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib64/libthread_db.so.1".
      0x00007fd4e3962cc9 in ____strtod_l_internal () from /lib64/libc.so.6
       
      Thread 5 (Thread 0x7fd4e250a700 (LWP 21815)):
      #0  0x00007fd4e2730995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00007fd4e31a4fc9 in uv_cond_wait () from /lib64/libuv.so.1
      #2  0x00007fd4e3194136 in worker () from /lib64/libuv.so.1
      #3  0x00007fd4e272ce25 in start_thread () from /lib64/libpthread.so.0
      #4  0x00007fd4e3a22bad in clone () from /lib64/libc.so.6
       
      Thread 4 (Thread 0x7fd4e1d09700 (LWP 21816)):
      #0  0x00007fd4e2730995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00007fd4e31a4fc9 in uv_cond_wait () from /lib64/libuv.so.1
      #2  0x00007fd4e3194136 in worker () from /lib64/libuv.so.1
      #3  0x00007fd4e272ce25 in start_thread () from /lib64/libpthread.so.0
      #4  0x00007fd4e3a22bad in clone () from /lib64/libc.so.6
       
      Thread 3 (Thread 0x7fd4e1508700 (LWP 21817)):
      #0  0x00007fd4e2730995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00007fd4e31a4fc9 in uv_cond_wait () from /lib64/libuv.so.1
      #2  0x00007fd4e3194136 in worker () from /lib64/libuv.so.1
      #3  0x00007fd4e272ce25 in start_thread () from /lib64/libpthread.so.0
      #4  0x00007fd4e3a22bad in clone () from /lib64/libc.so.6
       
      Thread 2 (Thread 0x7fd4e0d07700 (LWP 21818)):
      #0  0x00007fd4e2730995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00007fd4e31a4fc9 in uv_cond_wait () from /lib64/libuv.so.1
      #2  0x00007fd4e3194136 in worker () from /lib64/libuv.so.1
      #3  0x00007fd4e272ce25 in start_thread () from /lib64/libpthread.so.0
      #4  0x00007fd4e3a22bad in clone () from /lib64/libc.so.6
       
      Thread 1 (Thread 0x7fd4e4967740 (LWP 21814)):
      #0  0x00007fd4e3962cc9 in ____strtod_l_internal () from /lib64/libc.so.6
      #1  0x00007fd4e453d0bb in mcsapi::ColumnStoreDataConvert::convert (toMeta=toMeta@entry=0x7ffd3f545630, cont=0x34db038, fromValue=...) at /data/buildbot/bb-worker/centos7/mariadb-columnstore-api/src/util_dataconvert.cpp:1151
      #2  0x00007fd4e45384bc in mcsapi::ColumnStoreBulkInsertImpl::setCharColumn (this=0xa8f1e0, columnNumber=6, value=..., status=0x7ffd3f545704) at /data/buildbot/bb-worker/centos7/mariadb-columnstore-api/src/mcsapi_bulk.cpp:481
      #3  0x00007fd4e45387a8 in mcsapi::ColumnStoreBulkInsert::setColumn (this=0xa905e0, columnNumber=<optimized out>, value=..., status=<optimized out>) at /data/buildbot/bb-worker/centos7/mariadb-columnstore-api/src/mcsapi_bulk.cpp:75
      #4  0x0000000000431ef2 in MCSRemoteImport::import() ()
      #5  0x000000000042ceab in main ()
      
      


          Activity

            Please update the "Affected Version" field in the Jira item, winstone.

            dshjoshi Dipti Joshi (Inactive) added a comment

            Made mcsimport multi-threaded.
            One thread reads the CSV file, one thread parses it into CSV fields, and one thread writes the CSV fields to CS.
            They communicate through two FIFO queues implemented with ring buffers.
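
The three-stage design described above can be sketched in a few lines. This is only an illustration in Python, not the mcsimport code (which is C++ on top of mcsapi): the function names, the sentinel, and the list used as a sink are hypothetical, and `queue.Queue` with a `maxsize` stands in for the ring-buffer FIFOs.

```python
import csv
import io
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def reader(source, raw_q):
    """Stage 1: read raw lines from the CSV source."""
    for line in source:
        raw_q.put(line)          # blocks while the bounded queue is full
    raw_q.put(_DONE)


def parser(raw_q, row_q):
    """Stage 2: split each raw line into CSV fields."""
    while True:
        line = raw_q.get()
        if line is _DONE:
            row_q.put(_DONE)
            return
        row_q.put(next(csv.reader([line])))


def writer(row_q, sink):
    """Stage 3: hand parsed rows to the sink (stand-in for the write to CS)."""
    while True:
        row = row_q.get()
        if row is _DONE:
            return
        sink.append(row)


def run_pipeline(source, capacity=1024):
    """Run the three stages on their own threads and return the written rows."""
    raw_q = queue.Queue(maxsize=capacity)  # bounded FIFO: reader -> parser
    row_q = queue.Queue(maxsize=capacity)  # bounded FIFO: parser -> writer
    sink = []
    threads = [
        threading.Thread(target=reader, args=(source, raw_q)),
        threading.Thread(target=parser, args=(raw_q, row_q)),
        threading.Thread(target=writer, args=(row_q, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink


if __name__ == "__main__":
    demo = io.StringIO("1,2,3\n4,5,6\n")
    print(run_pipeline(demo, capacity=2))
```

Because each stage runs on exactly one thread and both queues are FIFO, row order is preserved. The bounded capacity is what limits memory use when one stage is slower than the others.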

            The performance gain is around 25% compared to the single-threaded 1.2.2 implementation of mcsimport.
            The test suite's load_test_2 (1.2GiB CSV file) was used as reference.
            On the downside, because of the additional threads and buffers, the implementation now consumes around 10 times more RAM and 1.5 times more CPU cycles.

            Test suite successfully executed on Windows 10 against a remote CS 1.2.2-1 instance on CentOS 7.

            jens.rowekamp Jens Röwekamp (Inactive) added a comment

            For QA:

            • execute test suite (or verify buildbot's execution)
            • as some major changes have been introduced please test it more extensively
            • also verify on bigger datasets if there is a performance gain compared to the old 1.2.2 implementation
            jens.rowekamp Jens Röwekamp (Inactive) added a comment

            I've extended my tests / profiling to also examine the performance impact of multi-threaded mcsimport on Linux operating systems. The results differ from those for Windows.

            First test case with CentOS 7 and Ubuntu 18.04 in a VirtualBox environment
            A single-server installation of ColumnStore 1.2.2-1 from the package repo was performed. mcsimport is executed on the same machine.
            1.2.3 labels the single-threaded mcsimport from develop-1.2 (the baseline), MCOL-2089 the new multi-threaded implementation, and -O3 the optimizer flag used during compilation. The test executed was load_test_2 from mcsimport's regression test suite, which imports a single 1.28GB CSV file with three columns of integers.

            Installed kernels:
            Linux centos7 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
            Linux ubuntu18 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

            VirtualBox tests against ColumnStore 1.2.2-1 on VMs with 8GiB of memory, 4 cores and 8 threads [host maximum].
            In the gcc-7 case of CentOS 7, mcsapi was also compiled with gcc-7. (load_test_2)

                                                  without -O3           with -O3
                                                  1.2.3 | MCOL-2089     1.2.3 | MCOL-2089
            CentOS 7      gcc       4.8.5-16      444s  | 403s           334s | 320s
                                                  429s  | 405s           326s | 337s
                                                        |                332s | 337s
                                               [436.5s] | [-7.4%]    [-24.2%] | [-24.1%]
                          gcc-7     7.3.1-5       432s  | 380s           328s | 333s
                                                  441s  | 371s           331s | 340s
                                                   [0%] | [-14%]     [-24.5%] | [-22.9%]
            Ubuntu 18.04  gcc-7     7.3.0-27      325s  | 236s           209s | 184s
                                                  338s  | 239s           212s | 169s
                                               [331.5s] | [-28.4%]   [-36.5%] | [-46.8%]
            

            In an over-threaded setup, the single-threaded mcsimport outperforms the multi-threaded one, except on Ubuntu 18.04, which seems able to cope with over-threaded setups and shows performance similar to the optimal case with 2 cores and 4 threads. It shows the same behaviour in the over-threaded buildbot sample. The CentOS 7 compiler difference is marginal.
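
For readers checking the bracketed figures in the table above: they appear to be the change of each column's mean run time relative to the mean of the plain 1.2.3 runs for the same OS. The sketch below reproduces the CentOS 7 / gcc 4.8.5-16 row under that assumption; the helper name `relative_change` is made up for illustration.

```python
from statistics import mean


def relative_change(baseline_runs, candidate_runs):
    """Percentage change of the mean candidate time versus the mean baseline time."""
    baseline = mean(baseline_runs)
    return (mean(candidate_runs) - baseline) / baseline * 100.0


# CentOS 7, gcc 4.8.5-16, 4 cores / 8 threads (run times in seconds, taken from the table)
baseline = [444, 429]  # 1.2.3 without -O3

print(round(mean(baseline), 1))                               # baseline mean
print(round(relative_change(baseline, [403, 405]), 1))        # MCOL-2089
print(round(relative_change(baseline, [334, 326, 332]), 1))   # 1.2.3 with -O3
print(round(relative_change(baseline, [320, 337, 337]), 1))   # MCOL-2089 with -O3
```

A negative value means the candidate finished faster than the baseline.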

            VirtualBox tests against ColumnStore 1.2.2-1 on VMs with 8GiB of memory, 2 cores and 4 threads.
            In the gcc-7 case of CentOS 7, mcsapi was also compiled with gcc-7. (load_test_2)

                                                  without -O3           with -O3
                                                  1.2.3 | MCOL-2089     1.2.3 | MCOL-2089
            CentOS 7      gcc       4.8.5-16      440s  | 361s           348s | 271s
                                                  437s  | 348s           345s | 276s
                                               [438.5s] | [-19.2%]     [-21%] | [-37.6%]
                          gcc-7     7.3.1-5       451s  | 339s           345s | 256s
                                                  446s  | 345s           339s | 252s
                                                [+2.3%] | [-22%]       [-22%] | [-42.1%]
            Ubuntu 18.04  gcc-7     7.3.0-27      335s  | 295s           229s | 189s
                                                  361s  | 303s           230s | 192s
                                                 [348s] | [-14.1%]   [-34.1%] | [-45.3%]
            

            This seems to be the optimal test case setup for the multi-threaded implementation. There is one thread for CS and three threads for mcsimport.
            Here the multi-threaded mcsimport outperforms the single-threaded one. The CentOS 7 compiler difference only takes effect in the optimized multi-threaded use case.

            VirtualBox tests against ColumnStore 1.2.2-1 on VMs with 8GiB of memory, 1 core and 2 threads.
            In the gcc-7 case of CentOS 7, mcsapi was also compiled with gcc-7. (load_test_2)

                                                  without -O3           with -O3
                                                  1.2.3 | MCOL-2089     1.2.3 | MCOL-2089
            CentOS 7      gcc       4.8.5-16      424s  | 545s           344s | 434s
                                                  421s  | 558s           338s | 429s
                                               [422.5s] | [+30.5%]   [-19.3%] | [+2.1%]
                          gcc-7     7.3.1-5       434s  | 562s           327s | 400s
                                                  426s  | 568s           328s | 407s
                                                [+1.8%] | [+33.7%]   [-22.5%] | [-4.5%]
            Ubuntu 18.04  gcc-7     7.3.0-27      357s  | 505s           219s | 293s
                                                  359s  | 504s           220s | 276s
                                                 [358s] | [+40.9%]   [-38.7%] | [-20.5%]
            

            Not surprisingly, on an under-threaded machine the single-threaded mcsimport outperforms the multi-threaded one.
            CentOS's gcc-7 compiler performs better in an under-threaded environment than the default version.

            Second test case - buildbot execution times of load_test_2
            Similar test as above, but using buildbot for the execution. The EC2 instances used by buildbot are c4.2xlarge, which have 8 vCPUs and 15GiB of memory, making this an over-threaded environment.

                                                  without -O3          with -O3
                                                  1.2.3 | MCOL-2089    1.2.3 | MCOL-2089
            CentOS 7      gcc       4.8.5-16      207s  | 259s          173s | 259s
                                                        | [+25.1%]  [-16.4%] | [+25.1%]
            Debian 8      gcc-4.9   4.9.2-2       199s  | 261s          164s | 272s
                                                        | [+31.2%]  [-17.6%] | [+36.7%]
            Ubuntu 16.04  gcc-5     5.3.1-3       164s  | 261s          117s | 204s
                                                        | [+59.1%]  [-28.7%] | [+24.4%]
            Debian 9      gcc-6     6.3.0-9       165s  | 223s          124s | 218s
                                                        | [+35.2%]  [-24.8%] | [+32.1%]
            Ubuntu 18.04  gcc-7     7.3.0-27      158s  | 121s          115s | 105s
                                                        | [-23.4%]  [-27.2%] | [-33.5%]
            

            This shows us that the single-threaded mcsimport outperforms the multi-threaded mcsimport on every OS except Ubuntu 18.04 during the buildbot test execution. It further shows a performance gain of around 23% for the single-threaded mcsimport when using the optimization flag -O3. This directly contradicts the findings from my VirtualBox setup, as I expected a difference of up to 10% between the multi-threaded and single-threaded execution, not more than 50%.

            Third test case - mcsimport injection from Windows 10
            mcsimport injection from Windows 10 (4 cores) into ColumnStore 1.2.2-1 running on CentOS 7 and Ubuntu 18.04 (VirtualBox VMs), comparison (load_test_2)

                                1.2.3 | MCOL-2089
            CentOS 7 (CS)       167s  | 145s
                                164s  | 146s
                             [165.5s] | [-12.1%]
            Ubuntu 18.04 (CS)   129s  | 112s
                                127s  | 111s
                               [128s] | [-12.9%]
            

            This shows us that there is a performance difference of around 23% based solely on the operating system used for ColumnStore.
            This is probably due, among other things, to the different C++ compiler versions used to build the ColumnStore packages. This also shows that the multi-threaded implementation of mcsimport performs around 12.5% better than the single-threaded one on a Windows 10 machine with 4 cores. As the Windows build is optimized by default, there is no -O3 variant.

            Fourth test case - mcsapi compiler / optimizer impact
            CentOS 7 API 1.2.3 Million Row tests

                        4.8.5-16    -O3        7.3.1-5       -O3
            cpp         19.59s      n/a        19.69s        19.57s
                        19.55s      n/a        19.25s        20.43s
                        20.60s      n/a        19.23s        19.62s
                       [19.91s]                [-2.6%]       [-0.2%]
            python2     22.82s      n/a        20.72s        21.08s
                                               [-9.2%]       [-7.6%]
            python3     54.19s      n/a        51.67s        52.66s
                                               [-4.7%]       [-2.8%]
            java        17.55s      n/a        18.46s        18.12s
                                               [+5.2%]       [+3.25%]
            

            This shows us that the choice of C++ compiler and optimizer option can have an effect of around 5% on performance.
            But more data points should be collected to verify this hypothesis.

            My conclusion:

            • Using the -O3 optimizer flag gives us an around 24.5% performance enhancement for the single-threaded mcsimport. [median of all tests]
            • The multi-threaded optimized mcsimport can give performance enhancements between 37% and 46% under optimal conditions.
            • In under-threaded environments the optimized multi-threaded mcsimport performs between 20% and 30% worse than the optimized single threaded mcsimport. We could change the program to use either the multi or single threaded mcsimport depending on the cores it detected during execution.
            • In over-threaded environments the performance of the optimized multi-threaded mcsimport degrades on every Linux operating system except Ubuntu 18.04. The degradation is between 15% and more than 50% depending on the execution setup (local VM vs. buildbot), so here it is less efficient than the optimized single-threaded mcsimport. I don't have any explanation for this behaviour, but based on the CentOS 7 tests we can rule out that it is related to gcc-7 or the compiler. Some advice on how to investigate further would be great.
            • On Windows switching from single-threaded to multi-threaded mcsimport has a positive impact of around 12.5% on multi-core systems. The performance in under-threaded environments hasn't been evaluated yet.
            • Switching the CentOS compiler to gcc-7 has an especially good impact on the multi-threaded performance, but not so much on the single-threaded performance.
            • We might want to evaluate why there is a server-side performance difference of around 23% during the injection (third test case) depending on the OS used to host ColumnStore, and the effect of using the -O3 flag for mcsapi as well.
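
The idea from the conclusions of choosing the single-threaded or multi-threaded path based on the detected cores could look roughly like the sketch below. It is not part of mcsimport; the function name and the threshold of four logical CPUs are assumptions, picked to match the 2-core / 4-thread setup above (three pipeline threads plus one for a locally running ColumnStore). It does not address the unexplained slowdown in over-threaded Linux environments.

```python
import os


def choose_import_mode(logical_cpus=None, min_logical_cpus=4):
    """Return "multi" if enough logical CPUs are available, otherwise "single".

    min_logical_cpus is a hypothetical threshold, not a value taken from mcsimport.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return "multi" if logical_cpus >= min_logical_cpus else "single"


if __name__ == "__main__":
    print(choose_import_mode())
```

With this threshold, the 1-core / 2-thread VM would fall back to the single-threaded path, while the 2-core / 4-thread VM would use the pipeline.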

            TL;DR: We can get a 24.5% improvement right away by enabling -O3 for the single-threaded mcsimport. We could squeeze out 20% more performance if we use pipelining and figure out why the performance degrades when executing on over-threaded Linux operating systems (except Ubuntu 18.04). We also have to find a way to minimize the performance degradation when executing in under-threaded environments.

            My suggestion: Merge PR 34 and close PR 33 with the note that over-threaded and under-threaded environments need to be handled better. Then move MCOL-2089 to testing and create a new ticket to address the changes for the multi-threaded implementation.

            jens.rowekamp Jens Röwekamp (Inactive) added a comment - edited

            Attached logs verify that the multi threaded implementation of mcsimport has potential, but currently is still slower than the single threaded implementation on some operating systems.

            Therefore, as indicated above the single threaded optimizations will be patched into 1.2.3 and the multi threaded implementation will be postponed to 1.2.4. It will be documented in MCOL-2226.

            jens.rowekamp Jens Röwekamp (Inactive) added a comment

            1.2.2

            Remote Load Method    Elapsed Time [s]
            MCSIMPORT             6918
            MYSQLIMPORT           2180

            1.2.3

            Remote Load Method    Elapsed Time [s]
            MCSIMPORT             6303
            MYSQLIMPORT           1914
            CPIMPORT (local)      924

            BF Passed rowCnt=1024 actRowCnt=1024
            BC Passed rowCnt=1025 actRowCnt=1025
            TF Passed rowCnt=253952 actRowCnt=253952
            TC Passed rowCnt=253953 actRowCnt=253953
            CF Passed rowCnt=516096 actRowCnt=516096
            CC Passed rowCnt=516097 actRowCnt=516097
            EF Passed rowCnt=8380416 actRowCnt=8380416
            EC Passed rowCnt=8380417 actRowCnt=8380417
            SF Passed rowCnt=33546240 actRowCnt=33546240
            SW Passed rowCnt=33546241 actRowCnt=33546241
            PF Passed rowCnt=67100672 actRowCnt=67100672
            PC Passed rowCnt=67100673 actRowCnt=67100673
            

            winstone Zdravelina Sokolovska (Inactive) added a comment - edited

            The issue is reopened because the test results on 1.2.3 show little improvement in mcsimport performance: under 10% compared to the 1.2.2 value.

            winstone Zdravelina Sokolovska (Inactive) added a comment - edited

            Those are all the performance improvements we are going to get out of this ticket. The rest is being tracked in other tickets.

            LinuxJedi Andrew Hutchings (Inactive) added a comment

            People

              jens.rowekamp Jens Röwekamp (Inactive)
              winstone Zdravelina Sokolovska (Inactive)