Details
-
New Feature
-
Status: Stalled (View Workflow)
-
Critical
-
Resolution: Unresolved
-
None
-
2024-2
Description
Customers need an easy way to export data to disk or local to the client in a performant way without accessing the command line. See simplified reproduction at the bottom
Customer Summary
Customer case runs on 40 billion records table
3 seconds just the select,
19 minutes with OUTFILE
3 seconds with mariadb -q > file.tbl
all yielding 1128 results
Query runs fast alone
multi-unmanaged-db-p09 [(none)]> select year_id, location_id, count(*) from gbd.output_injury_single_year_v14778_incidence_dd_4401 where measure_id=6 and year_id in (2018) group by year_id, location_id order by year_id, location_id; |
1128 rows in set (3.001 sec) |
Runs unacceptably slow with OUTFILE
MariaDB [(none)]> select year_id, location_id, count(*) into outfile '/tmp/test1.csv' from gbd.output_injury_single_year_v14778_incidence_dd_4401 where measure_id=6 and year_id in (2018) group by year_id, location_id order by year_id, location_id; |
Query OK, 1128 rows affected (19 min 22.171 sec) |
Work around
mariadb-dba@multi-unmanaged-db-p09:/var/lib/mysql$ time mysql -q -e "select year_id, location_id, count(*) from gbd.output_injury_single_year_v14778_incidence_dd_4401 where measure_id=6 and year_id in (2018) group by year_id, location_id order by year_id, location_id" > /tmp/test6.csv |
|
real 0m3.030s
|
Reproduction
DB="test" |
TABLE="delete_test" |
RECORDS=12500000 |
MAX_VALUE=100 |
mariadb -e "create database if not exists $DB;" |
mariadb $DB -e "DROP TABLE IF EXISTS $TABLE"; |
mariadb $DB -e "CREATE TABLE $TABLE( number1 int, group_name CHAR(2), number2 DECIMAL(12, 2), var varchar(1), column3 int, column4 int, column5 int, column6 int, column7 int, column8 int, column9 int, column10 int, column11 int, column12 int, column13 int, column14 int, column15 int, column16 int, column17 int, column18 int, column19 int, column20 int, column21 int, column22 int, column23 int, column24 int, column25 int, column26 int, column27 int, column28 int, column29 int, column30 int, column31 int, column32 int, column33 int, column34 int, column35 int, column36 int, column37 int, column38 int, column39 int, column40 int, column41 int, column42 int, column43 int, column44 int, column45 int, column46 int, column47 int, column48 int, column49 int, column50 int) ENGINE=COLUMNSTORE;" |
time mariadb $DB -e "INSERT INTO $TABLE SELECT ROUND(RAND() * $MAX_VALUE, 0), CAST(ROUND(RAND() * 10, 0) AS CHAR), ROUND(RAND() * 1000000, 2), substring(MD5(RAND()),1,1), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0), ROUND(RAND() * $MAX_VALUE, 0) FROM seq_1_to_$RECORDS;" |
|
|
time mariadb $DB -e "select number1, count(*) from $TABLE where number2>=50 group by number1;" | wc -l |
rm -rf /tmp/test1.csv;
|
time mariadb $DB -e "select number1, count(*) into outfile '/tmp/test1.csv' from $TABLE where number2>=50 group by number1;"; |
wc -l /tmp/test1.csv; rm -rf /tmp/test1.csv;
|
time mariadb $DB -e "select number1, count(*) from $TABLE where number2>=50 group by number1;" > /tmp/test1.csv; |
wc -l /tmp/test1.csv;
|
Reproduction results
- note 102 vs 101 because of headers included
[root@ip-172-31-21-130 centos]# time mariadb $DB -e "select number1, count(*) from $TABLE where number2>=50 group by number1;" | wc -l
102
real 0m0.494s * expected
user 0m0.001s
sys 0m0.009s
[root@ip-172-31-21-130 centos]# rm -rf /tmp/test1.csv;
[root@ip-172-31-21-130 centos]# time mariadb $DB -e "select number1, count(*) into outfile '/tmp/test1.csv' from $TABLE where number2>=50 group by number1;";
real 0m22.484s * slow ( in this repo its ~40x slower, customer case is ~385x slower )
user 0m0.004s
sys 0m0.004s
[root@ip-172-31-21-130 centos]# wc -l /tmp/test1.csv; rm -rf /tmp/test1.csv;
101 /tmp/test1.csv
[root@ip-172-31-21-130 centos]# time mariadb $DB -e "select number1, count(*) from $TABLE where number2>=50 group by number1;" > /tmp/test1.csv;
real 0m0.296s * expected
user 0m0.010s
sys 0m0.000s
[root@ip-172-31-21-130 centos]# wc -l /tmp/test1.csv;
102 /tmp/test1.csv