[MCOL-1181] Total Disk Usage is too high with replication Created: 2018-01-26  Updated: 2018-01-29  Resolved: 2018-01-29

Status: Closed
Project: MariaDB ColumnStore
Component/s: cpimport
Affects Version/s: 1.1.2
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: BOBY PETER Assignee: Andrew Hutchings (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Red Hat Linux


Attachments: Text File mariadb_cs_files.txt    

 Description   

getsoftwareinfo Fri Jan 26 12:03:05 2018

Name : mariadb-columnstore-platform
Version : 1.1.2
Release : 1
Architecture: x86_64
Install Date: Wed 29 Nov 2017 11:36:25 AM EST
Group : Applications/Databases
Size : 113942794

Table "A", with ~85K rows and 134 columns (mostly float) and a replication factor of 2 (glusterfs), takes ~9 GB of disk space. I have around a hundred tables similar to this.

TOTAL_DATA_SIZE : 17.17 GB
TOTAL_DISK_USAGE : 778.97 GB (This cannot be true)
Compression Ratio : 4428.4900%

df -h shows excessive disk usage as well.
Size: 493G
Used: 325G
Avail: 143G
Use%: 70%

The same table and data on an older version (software details below) took only ~200 MB of disk space.
getsoftwareinfo Fri Jan 26 12:22:14 2018

Name : mariadb-columnstore-platform
Version : 1.0.7
Release : 1
Architecture: x86_64
Install Date: Tue 14 Mar 2017 05:59:14 PM EDT
Group : Applications/Databases
Size : 10013744



 Comments   
Comment by David Thompson (Inactive) [ 2018-01-27 ]

Can you run du -sh in /usr/local/mariadb/columnstore? This might help indicate where the disk space is being consumed. If that looks fine, there may be a bug in the space calculation procedure.

Comment by Andrew Hutchings (Inactive) [ 2018-01-27 ]

In addition to David's request, could you please attach a text file with the output of information_schema.columnstore_files?

TOTAL_DISK_USAGE is calculated as the sum of the file_size column in this table, so this should show us why it is so high.
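As a sketch of that calculation (the rows below are made-up placeholders; only the file_size column and the summation mirror what is described here):

```python
# TOTAL_DISK_USAGE is simply the sum of file_size over every segment file
# listed in information_schema.columnstore_files. The sample rows are
# hypothetical, not values from this report.

segment_files = [
    {"object_id": 3001, "file_size": 33_554_432},
    {"object_id": 3002, "file_size": 33_554_432},
    {"object_id": 3003, "file_size": 67_108_864},
]

total_disk_usage = sum(f["file_size"] for f in segment_files)
print(f"TOTAL_DISK_USAGE: {total_disk_usage} bytes")
```

Because file_size reflects the allocated (pre-extended) size of each segment file, this total can be far larger than the data actually stored.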

In addition, we know the reported compression ratio is incorrect; this is fixed in 1.1.3.

Comment by BOBY PETER [ 2018-01-27 ]

To keep MariaDB from "restarting" due to excessive space consumption, I shuffle data around on a weekly basis: move 3 days of data from A to B; truncate A; move the data from B back to A; truncate B.

PM1
================
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 48G 22G 27G 45% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 14M 7.8G 1% /dev/shm
tmpfs 7.8G 817M 7.0G 11% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sdb1 493G 384G 84G 83% /usr/local/mariadb/columnstore/gluster/brick1
/dev/sdc1 493G 87G 381G 19% /usr/local/mariadb/columnstore/gluster/brick2

/dev/mapper/rhel-home 24G 37M 24G 1% /home
/dev/sda1 497M 270M 228M 55% /boot
tmpfs 1.6G 44K 1.6G 1% /run/user/0
<<PM1>>:/dbroot1 493G 384G 84G 83% /usr/local/mariadb/columnstore/data1
<<IP>>:/data01/nfsshare 493G 14G 454G 3% /data01/nfsshare

/usr/local/mariadb/columnstore
du -sh --> 864G

PM2
===================
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 48G 19G 30G 39% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 12M 7.8G 1% /dev/shm
tmpfs 7.8G 922M 6.9G 12% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sdb1 493G 384G 84G 83% /usr/local/mariadb/columnstore/gluster/brick1
/dev/sdc1 493G 314G 154G 68% /usr/local/mariadb/columnstore/gluster/brick2

/dev/sda1 497M 270M 228M 55% /boot
/dev/mapper/rhel-home 24G 37M 24G 1% /home
tmpfs 1.6G 12K 1.6G 1% /run/user/42
tmpfs 1.6G 0 1.6G 0% /run/user/0
<<PM2>>:/dbroot2 493G 314G 154G 68% /usr/local/mariadb/columnstore/data2
<<IP>>:/data01/nfsshare 493G 14G 454G 3% /data01/nfsshare

/usr/local/mariadb/columnstore
du -sh --> 1020G

PM3
===================
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 48G 16G 32G 34% /
devtmpfs 7.8G 0 7.8G 0% /dev
tmpfs 7.8G 12M 7.8G 1% /dev/shm
tmpfs 7.8G 898M 6.9G 12% /run
tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup
/dev/sdb1 493G 314G 154G 68% /usr/local/mariadb/columnstore/gluster/brick1
/dev/sdc1 493G 87G 381G 19% /usr/local/mariadb/columnstore/gluster/brick2

/dev/mapper/rhel-home 24G 37M 24G 1% /home
/dev/sda1 497M 270M 228M 55% /boot
tmpfs 1.6G 12K 1.6G 1% /run/user/42
tmpfs 1.6G 0 1.6G 0% /run/user/0
<<PM3>>:/dbroot3 493G 87G 381G 19% /usr/local/mariadb/columnstore/data3
<<IP>>:/data01/nfsshare 493G 14G 454G 3% /data01/nfsshare
/usr/local/mariadb/columnstore
du -sh --> 494G

@Andrew Hutchings
Also "information_schema.columnstore_files" is attached.

Comment by Andrew Hutchings (Inactive) [ 2018-01-29 ]

Many thanks for the attachment. Running the numbers, you now have ~1.5 TB of files, of which only ~19 GB is actually used, spread across ~66,000 segment files. This implies you have a lot of tables that are mostly empty. An extent holds up to 8M rows and a segment file holds two extents, so any table with fewer than 16M rows (and from your initial post there appear to be ~85K rows per table) will have a significant amount of allocated but unused space. With three PMs you would actually need 48M rows per table to fill the segment files on all three.

The pre-allocation is done to reduce possible fragmentation of the extents. MariaDB ColumnStore is designed to handle many millions of rows, at the cost of extra disk space when you are working with smaller amounts of data. Unfortunately, this is expected behaviour.
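The arithmetic above can be sketched as a back-of-the-envelope estimate. The constants are assumptions for illustration (4-byte FLOAT columns, one fully pre-allocated extent per column, compression ignored), not measurements from this system:

```python
# Rough estimate of pre-allocated space for the ~85K-row, 134-column table
# described in the report. All constants are illustrative assumptions.

EXTENT_ROWS = 8_000_000   # an extent holds up to 8M rows (per the comment)
FLOAT_BYTES = 4           # assuming plain 4-byte FLOAT columns
COLUMNS = 134
REPLICAS = 2              # glusterfs replication factor from the report

# Even with only ~85K rows, at least one extent is allocated per column,
# so the minimum footprint is independent of the actual row count.
per_column = EXTENT_ROWS * FLOAT_BYTES    # 32 MB per column
table_total = per_column * COLUMNS        # ~4.3 GB for one copy
replicated = table_total * REPLICAS       # ~8.6 GB with replication of 2

print(f"per column : {per_column / 10**6:.0f} MB")
print(f"one copy   : {table_total / 10**9:.1f} GB")
print(f"replicated : {replicated / 10**9:.1f} GB")
```

Under these assumptions a single mostly-empty table lands near the ~9 GB observed in the original report, and a hundred such tables account for the bulk of the disk usage.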

Generated at Thu Feb 08 02:26:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.