[MCOL-1181] Total Disk Usage is too high with replication Created: 2018-01-26 Updated: 2018-01-29 Resolved: 2018-01-29 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | cpimport |
| Affects Version/s: | 1.1.2 |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Major |
| Reporter: | BOBY PETER | Assignee: | Andrew Hutchings (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Red Hat Linux |
||
| Attachments: |
|
| Description |
|
getsoftwareinfo Fri Jan 26 12:03:05 2018 Name : mariadb-columnstore-platform
Table "A" has ~85K rows and 134 columns (mostly float) and, with replication of 2 (glusterfs), takes ~9 GB on disk. I have around a hundred tables similar to this one. TOTAL_DATA_SIZE : 17.17 GB
df -h also shows very high disk usage. The same table and data on a lower version (software details below) took only 200 MB on disk.
Name : mariadb-columnstore-platform |
| Comments |
| Comment by David Thompson (Inactive) [ 2018-01-27 ] |
|
Can you run du -sh in /usr/local/mariadb/columnstore? This might help indicate where the disk space is being consumed. If that looks fine, there may be a bug in the space calculation procedure. |
| Comment by Andrew Hutchings (Inactive) [ 2018-01-27 ] |
|
In addition to David's request, can you please attach a text file with the output of information_schema.columnstore_files? The TOTAL_DISK_USAGE is calculated as the sum of the file_size column in this table, so this should show us why it is so high. We also know that the compression ratio is incorrect; this is fixed in 1.1.3. |
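As a rough illustration of the calculation described above, the sketch below sums a file_size column over rows exported from information_schema.columnstore_files and lists the largest files. The sample rows, the file names, and the assumption that file_size is expressed in bytes are hypothetical and are not taken from this ticket.

```python
# Hypothetical rows standing in for an export of
# information_schema.columnstore_files. All values are made up.
sample_rows = [
    {"filename": "segment_a", "file_size": 2_105_344},
    {"filename": "segment_b", "file_size": 2_105_344},
    {"filename": "segment_c", "file_size": 8_192},
]


def total_disk_usage(rows):
    """Sum the file_size column (assumed to be bytes) over all rows."""
    return sum(int(row["file_size"]) for row in rows)


def largest_files(rows, n=2):
    """Return the n rows with the largest file_size, biggest first."""
    return sorted(rows, key=lambda r: int(r["file_size"]), reverse=True)[:n]


if __name__ == "__main__":
    print("total bytes:", total_disk_usage(sample_rows))
    for row in largest_files(sample_rows):
        print(row["filename"], row["file_size"])
```

Sorting by file_size is only one way to spot where the space goes; grouping by table or by PM would need extra columns that this sketch does not assume.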
| Comment by BOBY PETER [ 2018-01-27 ] |
|
To keep MariaDB from "restarting" because of excessive space consumption, I'm moving data around on a weekly basis: move 3 days of data from A to B; truncate A; move the data from B back to A; truncate B. PM1 /usr/local/mariadb/columnstore PM2 /usr/local/mariadb/columnstore PM3 @Andrew Hutchings |
| Comment by Andrew Hutchings (Inactive) [ 2018-01-29 ] |
|
Many thanks for the attachment. Running the numbers, you now have ~1.5 TB of files, of which only ~19 GB is used, inside ~66,000 segment files. This implies you have a lot of tables that are nearly empty. An extent holds up to 8M rows and a segment file holds two extents. So if a table has fewer than 16M rows (from your initial post there appear to be ~85K rows per table), a significant amount of the allocated space is wasted. With three PMs you would actually need 48M rows per table to fill the segments on all three. The pre-allocation is done to reduce possible fragmentation of the extent. MariaDB ColumnStore is designed to deal with many millions of rows, at the cost of disk space when you are handling smaller amounts of data. Unfortunately this is expected behaviour. |
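The arithmetic in the comment above can be sketched as follows. This is a simplified model, not ColumnStore's actual allocation logic: it assumes one segment file per PM, reads "8M" as 8 * 1024 * 1024 rows, and uses hypothetical function names.

```python
# Figures from the comment: an extent holds up to 8M rows and a
# segment file holds two extents. Reading "8M" as 8 * 1024 * 1024
# is an assumption; swap in 8_000_000 if decimal units are meant.
ROWS_PER_EXTENT = 8 * 1024 * 1024
EXTENTS_PER_SEGMENT_FILE = 2


def rows_to_fill_segments(pm_count=1):
    """Rows needed to fill one segment file on each of pm_count PMs."""
    return ROWS_PER_EXTENT * EXTENTS_PER_SEGMENT_FILE * pm_count


def fill_fraction(table_rows, pm_count=1):
    """Fraction of the pre-allocated row capacity a table actually uses."""
    return table_rows / rows_to_fill_segments(pm_count)


def used_fraction(used_gb, allocated_gb):
    """Fraction of allocated file space that holds data."""
    return used_gb / allocated_gb


if __name__ == "__main__":
    print("rows to fill one PM:", rows_to_fill_segments(1))
    print("rows to fill three PMs:", rows_to_fill_segments(3))
    print("fill for 85K rows, 3 PMs: {:.4%}".format(fill_fraction(85_000, 3)))
    # ~19 GB used of ~1.5 TB allocated, taking 1 TB as 1000 GB
    print("space used: {:.2%}".format(used_fraction(19, 1500)))
```

Under these assumptions a table of ~85K rows fills well under 1% of the row capacity pre-allocated for it, which is in line with only ~19 GB of the ~1.5 TB being in use.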