[MDEV-32216] mysqldump: use --tab (or another option) to increase performance and parallelism significantly Created: 2023-08-29  Updated: 2024-01-30  Resolved: 2024-01-30

Status: Closed
Project: MariaDB Server
Component/s: Scripts & Clients
Fix Version/s: 11.4.1

Type: New Feature Priority: Critical
Reporter: Yakov Kushnirsky Assignee: Vladislav Vaintroub
Resolution: Fixed Votes: 0
Labels: Preview_11.4

Issue Links:
Duplicate
is duplicated by MDEV-32440 parallel mariadb-dump -T Closed
PartOf
Relates
relates to MDEV-12808 Include mysqlpump parallel backup tool Open

 Description   

Add a new option --use-threads. It will apply only to the mariadb-dump -T mode and will make it dump tables in that many threads.

If --single-transaction is specified too, it should use a consistent snapshot across all threads.
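For illustration, an invocation of the feature might look like the sketch below (the output directory and database name are placeholders; note the option eventually shipped as --parallel, per the test results in the comments). It is shown as a dry run, since a real run needs a server and the 11.4 client:

```shell
# Hypothetical invocation (placeholders: /tmp/dump, mydb).
# Dry run only: echo the command instead of executing it.
cmd="mariadb-dump --tab=/tmp/dump --parallel=4 --single-transaction mydb"
echo "$cmd"
```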



 Comments   
Comment by Vladislav Vaintroub [ 2023-11-14 ]

That might need some clarification - currently

mysqldump --tab 

does not handle more than a single database.

Therefore, "How to export 100+ databases and tables on a busy, active transactional system" will require 100+ mysqldump --tab invocations.
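The per-database workaround described above can be sketched as a shell loop, one --tab invocation per database (database names and the output root are illustrative assumptions; the mariadb-dump line is echoed as a dry run):

```shell
# One output subdirectory and one mariadb-dump --tab run per database.
# db1..db3 and /tmp/dumpdirs are illustrative placeholders.
outdir=/tmp/dumpdirs
for db in db1 db2 db3; do
  mkdir -p "$outdir/$db"
  # Dry run; a real export would execute this command instead:
  echo "mariadb-dump --tab=$outdir/$db $db"
done
```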

Comment by Max Mether [ 2023-11-14 ]

Any chance to get --tab to work with multiple databases? Or should that be a different ticket?

Comment by Vladislav Vaintroub [ 2023-11-14 ]

@Max, not without breaking compatibility. Currently, --tab creates, for each table, <table_name>.sql with "CREATE TABLE" and <table_name>.txt with the data in the output directory. There is no SQL for "CREATE DATABASE" or the like, there are no subdirectories, and file names do not reflect the database name, but they are compatible with mysqlimport.

To handle databases it would need an incompatible extension - e.g. a create_databases.sql in the output directory, and maybe subdirectories for each DB with <table_name>.sql and <table_name>.txt in them. It might also need a corresponding option in mysqlimport (i.e. import a directory, not a bunch of .txt files).

Comment by Max Mether [ 2023-11-14 ]

Yes, definitely sounds out of scope for this task then.

Comment by Oleksandr Byelkin [ 2023-12-06 ]

OK to send for testing after fixing the small things I have mentioned.

Comment by Lena Startseva [ 2024-01-03 ]

Testing done. Ok to push

Comment by Sergei Golubchik [ 2024-01-03 ]

lstartseva, could you please also see how it works with big multi-GB databases?

Comment by Lena Startseva [ 2024-01-23 ]

I checked on my laptop with the following environment and data characteristics:
Environment:
Processor: Core i7-11850H
RAM: 64 Gb
Hard disk: 512GB SSD
OS: Windows 11/ Ubuntu

Data:
Dump size: 62.6 GB
Full database size on disk: 194 GB

Results:

The maximum useful value of the --parallel parameter is 5, because increasing the number of threads further did not decrease the execution time. The minimum time is limited by the dump time of the largest table.

             no --parallel    --parallel=2    --parallel=3   --parallel=4   --parallel=5
Windows 11   15 min 19 sec    10 min 37 sec   7 min 59 sec   7 min 55 sec   7 min 42 sec
Ubuntu       14 min 24 sec    8 min 24 sec    7 min 25 sec   7 min 4 sec    6 min 43 sec
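From the Ubuntu timings above, converted to seconds, the speedup over the single-threaded baseline can be computed; it plateaus around 2x, consistent with the largest table dominating the total time:

```shell
# Ubuntu timings in seconds: baseline 14 min 24 sec, then --parallel=2..5.
baseline=864
speedups=$(for t in 504 445 424 403; do
  awk -v b="$baseline" -v t="$t" 'BEGIN { printf "%.2fx\n", b / t }'
done)
echo "$speedups"
```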

Database:

Table                   Size (MB)   Dump file size (MB)
call_center             0.05        0.00 (4 KB)
catalog_page            3.00        1.55
catalog_returns         5793.05     1488.78
catalog_sales           56017.23    20316.78
customer                367.50      140.84
customer_address        106.16      59.61
customer_demographics   210.39      75.09
date_dim                10.52       9.77
dbgen_version           0.02        0.00 (1 KB)
household_demographics  0.55        0.14
income_band             0.02        0.00 (1 KB)
inventory               8824.94     1688.33
item                    61.70       23.19
my_tpcds_result         0.02        0.00 (0 KB)
promotion               0.19        0.08
reason                  0.02        0.00 (2 KB)
ship_mode               0.02        0.00 (2 KB)
store                   0.11        0.05
store_returns           7013.73     2299.64
store_sales             85959.81    27205.20
time_dim                12.52       4.79
warehouse               0.02        0.00 (1 KB)
web_page                0.31        0.12
web_returns             2058.86     698.80
web_sales               27318.72    10144.64
web_site                0.05        0.01
Comment by Lena Startseva [ 2024-01-23 ]

After checking on big databases, also OK to push.

Generated at Thu Feb 08 10:29:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.