MariaDB ColumnStore / MCOL-4450

cpimport on a 3-node system takes 10X as long as on a single-node system


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.4.3
    • Fix Version/s: 5.4.3
    • Component/s: installation
    • Labels: None
    • Environment:
      Customer test setup

      1 node single server system

      3 node system with maxscale

    Description

      Reported by customer:

      I am seeing some major performance difference now that we have 3 nodes using glusterFS. When we had a single node environment, everything was much quicker. Take a look at these cpimport logs:

      2020-11-30 03:10:05 (3531772) INFO : For table 569_cdr.cisco: 350 rows processed and 350 rows inserted.
      2020-11-30 03:10:05 (3531772) INFO : Bulk load completed, total run time : 1.50017 seconds
      2020-12-10 14:46:29 (3394845) INFO : For table 522_cdr.cisco: 133 rows processed and 133 rows inserted.
      2020-12-10 14:46:29 (3394845) INFO : Bulk load completed, total run time : 29.2482

      The entry from 11/30 was a cpimport with a single node. The one today is with the 3 node environment.

      I would maybe expect it to be 3 times slower, but it is much more than that. Before multi-node, it loaded 350 rows in 1.5 seconds; in the 3-node environment it took about 30 seconds to load 133 rows, roughly a third of the data.

      The odd thing is, I was taking a look at the logs and I did a bulk data load when I set up the cluster. I was able to import 16 million records in 457 seconds. I ran the same import today to a test database I created and it took 1400 seconds.
      Here is the output from a load right after setup:
      2020-12-02 11:12:27 (16829) INFO : Reading input from STDIN to import into table 203_cdr.avaya
      2020-12-02 11:12:27 (16829) INFO : Running distributed import (mode 1) on all PMs...
      2020-12-02 11:20:04 (16829) INFO : For table 203_cdr.avaya: 62130952 rows processed and 62130952 rows inserted.
      2020-12-02 11:20:04 (16829) INFO : Bulk load completed, total run time : 457.022 seconds
      Here is the test load I did yesterday:
      2020-12-10 16:27:12 (3442862) INFO : Reading input from STDIN to import into table test_load.avaya
      2020-12-10 16:27:12 (3442862) INFO : Running distributed import (mode 1) on all PMs...
      2020-12-10 16:50:22 (3442862) INFO : For table test_load.avaya: 62314457 rows processed and 62314457 rows inserted.
      2020-12-10 16:50:22 (3442862) INFO : Bulk load completed, total run time : 1390.21 seconds
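For what it's worth, the slowdown between the two large runs above can be quantified directly from the quoted log lines (simple arithmetic, nothing ColumnStore-specific):

```python
# Rows-per-second for the two bulk loads quoted above
# (row counts and run times copied from the cpimport log lines).
single_node = 62_130_952 / 457.022   # load right after cluster setup
three_node = 62_314_457 / 1390.21    # same import on the 3-node cluster

print(f"after setup: {single_node:,.0f} rows/s")
print(f"3-node:      {three_node:,.0f} rows/s")
print(f"slowdown:    {single_node / three_node:.1f}x")
```

So the large load degraded by roughly 3x, while the small loads quoted earlier degraded far more, which suggests a fixed per-load overhead rather than a raw throughput problem.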

      I actually did a lot of testing on the glusterFS volumes using gluster top to check the read/write speeds and saw no performance issues. I also benchmarked the network connections between all 3 servers and confirmed we are getting between 9-10 GBps. It is odd because the cpimport speeds vary widely even though the table schemas are identical wherever the table names are the same.

      2020-12-14 15:13:40 (82928) INFO : For table 203_cdr.avaya: 51 rows processed and 51 rows inserted.
      2020-12-14 15:13:40 (82928) INFO : Bulk load completed, total run time : 40.6569 seconds
      2020-12-14 15:13:43 (83493) INFO : Running distributed import (mode 1) on all PMs...
      2020-12-14 15:14:12 (83493) INFO : For table 203_cdr.avaya: 79 rows processed and 79 rows inserted.
      2020-12-14 15:14:12 (83493) INFO : Bulk load completed, total run time : 29.4481 seconds

      Notice that one import of 51 rows took 40 seconds, while the next one loaded 79 rows in less time. There are also cases where it takes over 100 seconds to insert just 1 row.
      2020-12-14 14:46:31 (62145) INFO : For table 644_cdr.cisco: 1 rows processed and 1 rows inserted.
      2020-12-14 14:46:31 (62145) INFO : Bulk load completed, total run time : 185.843 seconds
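To see how widely the per-load rate swings, the log lines can be scanned mechanically. The helper below is a hypothetical sketch, not a ColumnStore tool; the log format is assumed from the excerpts in this report:

```python
import re

# Pair up the "rows processed" and "total run time" lines that
# cpimport writes for each bulk load (format taken from the
# excerpts above) and compute a rows/sec rate per load.
ROWS_RE = re.compile(r"For table (\S+): (\d+) rows processed")
TIME_RE = re.compile(r"total run time : ([\d.]+)")

def load_rates(log_text):
    rates = []
    table, rows = None, None
    for line in log_text.splitlines():
        m = ROWS_RE.search(line)
        if m:
            table, rows = m.group(1), int(m.group(2))
            continue
        m = TIME_RE.search(line)
        if m and table is not None:
            rates.append((table, rows, rows / float(m.group(1))))
            table = None
    return rates

sample = """\
2020-12-14 14:46:31 (62145) INFO : For table 644_cdr.cisco: 1 rows processed and 1 rows inserted.
2020-12-14 14:46:31 (62145) INFO : Bulk load completed, total run time : 185.843 seconds
"""
for table, rows, rate in load_rates(sample):
    print(f"{table}: {rows} rows at {rate:.4f} rows/s")
```

Run against a full cpimport log, this would make outliers like the 1-row, 185-second load above easy to spot.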

      At first, we were only doing 1 cpimport process at a time, and that was fine because we used to see very fast results. Now we have to run multiple at a time so the database can keep up with the stream.


            People

            Assignee:
            David Hill (Inactive)
            Reporter:
            David Hill (Inactive)
            Votes:
            0
            Watchers:
            1
