[MCOL-3270] Improve cpimport ingest speed into Dictionary columns Created: 2019-04-18 Updated: 2020-08-25 Resolved: 2019-04-19 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | cpimport |
| Affects Version/s: | 1.2.3 |
| Fix Version/s: | 1.2.4 |
| Type: | New Feature | Priority: | Major |
| Reporter: | Roman | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Sprint: | 2019-04 |
| Description |
|
Given a data set of 800,000,000 records with a couple of Dictionary columns containing many equal-length strings, it took 4,167 seconds to ingest the data set into ColumnStore. There were two main sources of latency:
|
| Comments |
| Comment by Roman [ 2019-04-18 ] |
|
For QA: you can test the patch by comparing the ingestion speed of text or varchar columns with and without the patch, using at least 10,000,000 records. There should be a clear difference in timings. |
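A minimal sketch of generating data for such a before/after timing run. The file path, row count, and column layout here are assumptions for illustration, not from the ticket; the string column is zero-padded to 32 characters so it is long enough to land in a Dictionary column:

```shell
#!/bin/sh
# Generate a pipe-delimited data set: an integer id column and an
# equal-length 32-character string column (stored as a Dictionary column).
# ROWS is kept small here; the QA note asks for at least 10,000,000 rows
# for a meaningful before/after comparison.
ROWS=100000
OUT=/tmp/dict_ingest_test.tbl

awk -v n="$ROWS" 'BEGIN {
    for (i = 1; i <= n; i++)
        printf "%d|%032d\n", i, i   # 32-char zero-padded string
}' > "$OUT"

echo "generated $(wc -l < "$OUT") rows"
```

On each build under test, load the file with something like `time cpimport test t1 /tmp/dict_ingest_test.tbl` (assuming a matching table `test.t1` was created beforehand) and compare the `real` timings between the patched and unpatched builds.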
| Comment by Daniel Lee (Inactive) [ 2019-04-19 ] |
|
Build verified: 1.2.4-1 nightly

[dlee@master centos7]$ cat gitversionInfo.txt

Dataset tested: 10 GB dbt3 orders table, 15,000,000 rows.

Performed a cpimport timing test on both 1.2.2-1 and 1.2.4-1; 1.2.4-1 is about 2.5 times faster. Disk space utilization remained the same.

Also, with 1.2.4-1, loaded two 1 GB dbt3 databases: a ColumnStore database loaded using cpimport and an InnoDB database loaded using LOAD DATA INFILE. Verified that all varchar columns in the orders table are identical between the two databases using a cross-engine join.

1.2.2-1:

[root@localhost columnstore]# time /data/qa/autopilot/databases/dbt3/sh/buildDatabase.sh tpch10 columnstore 10g
real 8m53.788s
[root@localhost columnstore]# du -sh data1

1.2.4-1:

[root@localhost columnstore]# du -sh data1
[root@localhost ~]# time /data/qa/autopilot/databases/dbt3/sh/buildDatabase.sh tpch10 columnstore 10g
real 3m27.598s
[root@localhost columnstore]# du -sh data1 |