Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32216

mysqldump: use --tab (or another option) to increase performance and parallelism significantly

Details

    Description

      Add a new option --use-threads, it will apply only to the mariadb-dump -T mode and will make it to dump tables in that many threads.

      If --single-transaction is specified too, should use a consistent snapshot over all threads.

      Attachments

        Issue Links

          Activity

            lstartseva, could you please also see how it works with big multi-GB databases?

            serg Sergei Golubchik added a comment - lstartseva , could you please also see how it works with big multi-GB databases?
            lstartseva Lena Startseva added a comment - - edited

            I checked on my laptop with the next environments and characteristics:
            Environment:
            Processor: Core i7-11850H
            RAM: 64 Gb
            Hard disk: 512GB SSD
            OS: Windows 11/ Ubuntu

            Data:
            Dump size: 62,6 GB
            Full database size on disk: 194 GB

            Results:

            The maximum value of the parameter --parallel is 5, because further increasing the number of threads did not lead to decrease the execution time. The minimum time was limited to the dump time of the largest table.

              without parameter "--parallel" --parallel=2 --parallel=3 --parallel=4 --parallel=5
            Windows 11 15 min 19 sec 10 min 37 sec 7 min 59 sec 7 min 55 sec 7 min 42 sec
            Ubuntu 14 min 24 sec 8 min 24 sec 7 min 25 sec 7 min 4 sec 6 min 43 sec

            Database:

            Table Size (MB) Dump file size (MB)
            call_center 0.05 0.00 (4 KB)
            catalog_page 3.00 1.55
            catalog_returns 5793.05 1488.78
            catalog_sales 56017.23 20316.78
            customer 367.50 140.84
            customer_address 106.16 59.61
            customer_demographics 210.39 75.09
            date_dim 10.52 9.77
            dbgen_version 0.02 0.00 (1 KB)
            household_demographics 0.55 0.14
            income_band 0.02 0.00 (1 KB)
            inventory 8824.94 1 688.33 MB
            item 61.70 23.19
            my_tpcds_result 0.02 0.00 (0 KB)
            promotion 0.19 0.08
            reason 0.02 0.00 (2 KB)
            ship_mode 0.02 0.00 (2 KB)
            store 0.11 0.05
            store_returns 7013.73 2299.64
            store_sales 85959.81 27205.20
            time_dim 12.52 4.79
            warehouse 0.02 0.00 (1 KB)
            web_page 0.31 0.12
            web_returns 2058.86 698.80
            web_sales 27318.72 10144.64
            web_site 0.05 0.01
            lstartseva Lena Startseva added a comment - - edited I checked on my laptop with the next environments and characteristics: Environment: Processor: Core i7-11850H RAM: 64 Gb Hard disk: 512GB SSD OS: Windows 11/ Ubuntu Data: Dump size: 62,6 GB Full database size on disk: 194 GB Results: The maximum value of the parameter --parallel is 5, because further increasing the number of threads did not lead to decrease the execution time. The minimum time was limited to the dump time of the largest table.   without parameter "--parallel" --parallel=2 --parallel=3 --parallel=4 --parallel=5 Windows 11 15 min 19 sec 10 min 37 sec 7 min 59 sec 7 min 55 sec 7 min 42 sec Ubuntu 14 min 24 sec 8 min 24 sec 7 min 25 sec 7 min 4 sec 6 min 43 sec Database: Table Size (MB) Dump file size (MB) call_center 0.05 0.00 (4 KB) catalog_page 3.00 1.55 catalog_returns 5793.05 1488.78 catalog_sales 56017.23 20316.78 customer 367.50 140.84 customer_address 106.16 59.61 customer_demographics 210.39 75.09 date_dim 10.52 9.77 dbgen_version 0.02 0.00 (1 KB) household_demographics 0.55 0.14 income_band 0.02 0.00 (1 KB) inventory 8824.94 1 688.33 MB item 61.70 23.19 my_tpcds_result 0.02 0.00 (0 KB) promotion 0.19 0.08 reason 0.02 0.00 (2 KB) ship_mode 0.02 0.00 (2 KB) store 0.11 0.05 store_returns 7013.73 2299.64 store_sales 85959.81 27205.20 time_dim 12.52 4.79 warehouse 0.02 0.00 (1 KB) web_page 0.31 0.12 web_returns 2058.86 698.80 web_sales 27318.72 10144.64 web_site 0.05 0.01

            After checking on big databases also Ok to push

            lstartseva Lena Startseva added a comment - After checking on big databases also Ok to push
            hiller1 hiller1 added a comment -

            Can you provide a test case for multithreaded data export and an example of using mariadb-dump?

            hiller1 hiller1 added a comment - Can you provide a test case for multithreaded data export and an example of using mariadb-dump?
            serg Sergei Golubchik added a comment - - edited

            For example,

            # dump
            mariadb-dump --tab=dumpdir/ --parallel=5 mydb1
             
            # restore table definitions
            cat dumpdir/*.sql | mariadb mydb1
            # restore data
            mariadb-import --parallel=3 mydb1 dumpdir/*.txt
            

            disclaimer: this is based on the test case in the commit, I didn't run this sequence of commands myself

            serg Sergei Golubchik added a comment - - edited For example, # dump mariadb-dump --tab=dumpdir/ --parallel=5 mydb1   # restore table definitions cat dumpdir/*.sql | mariadb mydb1 # restore data mariadb- import --parallel=3 mydb1 dumpdir/*.txt disclaimer: this is based on the test case in the commit, I didn't run this sequence of commands myself

            People

              wlad Vladislav Vaintroub
              YK Yakov Kushnirsky
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.