[MCOL-2052] IS.columnstore_files maximum contains incorrect number of records for any relation. Created: 2018-12-26  Updated: 2020-08-25  Resolved: 2019-01-23

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.1.6, 1.2.2
Fix Version/s: 1.1.7, 1.2.3

Type: Bug Priority: Minor
Reporter: Roman Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Sprint: 2019-01

 Description   

Both columnstore_info.table_usage() and information_schema.columnstore_files report incorrect disk space usage for a table. Consider:

root@c3ca474b665d:/git/cs-docker-tools/generator# du -sh /usr/local/mariadb/columnstore/data1/
5,9G	/usr/local/mariadb/columnstore/data1/
 
MariaDB [test]> call columnstore_info.table_usage('test', 'cs2');
+--------------+------------+-----------------+-----------------+-------------+
| TABLE_SCHEMA | TABLE_NAME | DATA_DISK_USAGE | DICT_DATA_USAGE | TOTAL_USAGE |
+--------------+------------+-----------------+-----------------+-------------+
| test         | cs2        | 396.02 MB       | 0.00 Bytes      | 396.02 MB   |
+--------------+------------+-----------------+-----------------+-------------+
1 row in set (0.03 sec)
 
Query OK, 0 rows affected (0.03 sec)
 
MariaDB [test]> 
 
 
MariaDB [test]> select * from information_schema.columnstore_files;
+-----------+------------+--------------+------------------------------------------------------------------------------------------+-----------+----------------------+
| OBJECT_ID | SEGMENT_ID | PARTITION_ID | FILENAME                                                                                 | FILE_SIZE | COMPRESSED_DATA_SIZE |
+-----------+------------+--------------+------------------------------------------------------------------------------------------+-----------+----------------------+
|      3019 |          0 |            0 | /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/203.dir/000.dir/FILE000.cdf | 103809024 |            103809024 |
|      3019 |          1 |            0 | /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/203.dir/000.dir/FILE001.cdf | 103817216 |            103817216 |
|      3019 |          2 |            0 | /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/203.dir/000.dir/FILE002.cdf | 103817216 |            103817216 |
|      3019 |          3 |            0 | /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/203.dir/000.dir/FILE003.cdf | 103817216 |            103817216 |
+-----------+------------+--------------+------------------------------------------------------------------------------------------+-----------+----------------------+
4 rows in set (0.03 sec)



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-12-28 ]

This looks correct to me; the rest of the 5.9 GB is probably things like the version buffer. Or is there more data in that table?

Comment by Roman [ 2018-12-28 ]

I omitted the other segment files that weren't displayed by columnstore_files or counted by table_usage.

Comment by Andrew Hutchings (Inactive) [ 2018-12-28 ]

I still don't understand the problem here. 103809024 + (3 x 103817216) bytes is 396 MiB. Without a directory listing it would be hard to see where the rest of the usage comes from, but the table usage appears to be correct.
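As a quick check of the arithmetic in this comment, the four FILE_SIZE values from the columnstore_files output in the description can be summed and converted to mebibytes. This is a minimal Python sketch; the values are copied from that output, and the conversion assumes 1 MiB = 1,048,576 bytes.

```python
# FILE_SIZE values (bytes) copied from the information_schema.columnstore_files
# output for OBJECT_ID 3019 in the description.
file_sizes = [103809024, 103817216, 103817216, 103817216]

total_bytes = sum(file_sizes)
total_mib = total_bytes / (1024 * 1024)  # 1 MiB = 1,048,576 bytes

print(f"{total_bytes} bytes = {total_mib:.2f} MiB")
```

It prints `415260672 bytes = 396.02 MiB`, which matches the 396.02 MB reported by table_usage and is consistent with that procedure reporting binary megabytes.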

Comment by Roman [ 2019-01-15 ]

Please review the change.
I would appreciate any suggestions on how to test the change.

Comment by Andrew Hutchings (Inactive) [ 2019-01-15 ]

If you are asking about the regression suite, maybe join I_S.COLUMNSTORE_FILES with I_S.COLUMNSTORE_COLUMNS and return only SEGMENT_ID and PARTITION_ID, with an ORDER BY. That should remove any variance due to other tables the system may have. I'm not sure we have a table that is large enough, though; I think you would need at least 128M rows.
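The regression check suggested here could compare the (PARTITION_ID, SEGMENT_ID) pairs returned for one large table. The Python sketch below is hypothetical: the function name `partitions_are_contiguous` and the query in the comment are assumptions (in particular, that COLUMNSTORE_COLUMNS exposes TABLE_SCHEMA, TABLE_NAME and OBJECT_ID for the join), not part of the existing test suite. With the bug, a table past its first partition would only list PARTITION_ID 0, so the check would fail.

```python
# Hypothetical regression check, not part of the ColumnStore test suite.
# `rows` stands in for (PARTITION_ID, SEGMENT_ID) pairs returned by a query
# along these lines (column names in COLUMNSTORE_COLUMNS are assumed):
#
#   SELECT f.PARTITION_ID, f.SEGMENT_ID
#   FROM information_schema.columnstore_files f
#   JOIN information_schema.columnstore_columns c ON f.OBJECT_ID = c.OBJECT_ID
#   WHERE c.TABLE_SCHEMA = 'tpch10' AND c.TABLE_NAME = 'lineitem'
#   ORDER BY f.PARTITION_ID, f.SEGMENT_ID;


def partitions_are_contiguous(rows, min_partitions=2):
    """Return True if the listing reaches at least `min_partitions` partitions
    and covers every PARTITION_ID from 0 to the highest one, with no gaps."""
    partitions = sorted({partition for partition, _segment in rows})
    if len(partitions) < min_partitions:
        return False  # e.g. the scan stopped after partition 0
    return partitions == list(range(partitions[-1] + 1))


# A listing that stops at the first partition fails; one that rolls over passes.
truncated = [(0, 0), (0, 1), (0, 2), (0, 3)]
complete = truncated + [(1, 0), (1, 1), (2, 0)]
print(partitions_are_contiguous(truncated), partitions_are_contiguous(complete))
```

Requiring at least two partitions keeps the check meaningful only for a table large enough to roll over, which matches the 128M-row caveat in the comment.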

Comment by Andrew Hutchings (Inactive) [ 2019-01-15 ]

Also, thanks for fixing my dumb mistake.

For QA: when a table rolls over to the next partition_id (more than 128M rows), I_S.COLUMNSTORE_FILES was aborting early and returning no information about the remaining files.
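The failure mode described for QA can be modeled with a toy scan over (partition_id, segment_id) pairs. This Python sketch is illustrative only: `list_files`, its `fixed` flag, and the loop structure are assumptions, not the actual ColumnStore code. The buggy variant ends the scan at the first missing segment, so it never reaches the next partition; the fixed variant rolls over to segment 0 of the next partition and stops only when that is missing too.

```python
# Illustrative model only; not taken from the ColumnStore source.
# A table's files are identified here by (partition_id, segment_id) pairs.


def list_files(existing, fixed=True):
    """Walk segment files in (partition_id, segment_id) order.

    fixed=False mimics the reported bug: the first missing segment ends the
    scan, so nothing after partition 0 is returned.
    """
    found = []
    partition, segment = 0, 0
    while True:
        if (partition, segment) in existing:
            found.append((partition, segment))
            segment += 1
        elif fixed and segment > 0:
            # Past the last segment of this partition: roll over to the next.
            partition, segment = partition + 1, 0
        else:
            # Buggy: any miss stops here. Fixed: a miss at segment 0 means
            # there are no further partitions.
            return found


# Two partitions with four segment files each (an assumed layout).
existing = {(p, s) for p in range(2) for s in range(4)}
print(len(list_files(existing, fixed=False)), len(list_files(existing)))
```

Here the buggy scan returns only the four partition-0 files, while the fixed scan returns all eight.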

Comment by Daniel Lee (Inactive) [ 2019-01-21 ]

Build verified:

1.1.7-1
server commit: b5a7a22
engine commit: d87b9a6

1.2.3-1
server commit: 61f32f2
engine commit: 83b2d4c

The issue has been fixed in 1.1.7-1, but still exists in 1.2.3-1.

1.1.7-1 results:

select * from information_schema.columnstore_files;
.
.
.

3116 1 2 /usr/local/mariadb/columnstore/data1/000.dir/000.dir/012.dir/044.dir/002.dir/FILE001.cdf 469770240 183910400
3116 3 2 /usr/local/mariadb/columnstore/data1/000.dir/000.dir/012.dir/044.dir/002.dir/FILE003.cdf 268443648 106635264
3116 2 2 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/012.dir/044.dir/002.dir/FILE002.cdf 268443648 106610688

------------------------------------------------------------------------------------------------------------------------------------------------------+
228 rows in set (1.92 sec)

MariaDB [(none)]> call columnstore_info.table_usage('tpch10', 'lineitem');
+--------------+------------+-----------------+-----------------+-------------+
| TABLE_SCHEMA | TABLE_NAME | DATA_DISK_USAGE | DICT_DATA_USAGE | TOTAL_USAGE |
+--------------+------------+-----------------+-----------------+-------------+
| tpch10       | lineitem   | 7.47 GB         | 3.42 GB         | 10.89 GB    |
+--------------+------------+-----------------+-----------------+-------------+
1 row in set (0.79 sec)

Query OK, 0 rows affected (0.79 sec)

data1 5.6g
data2 5.6g

[root@localhost data1]# find ./ -name 'FILE***.cdf' | wc -l
155
[root@localhost data1]#

[root@localhost data2]# find ./ -name 'FILE***.cdf' | wc -l
114

Comment by Roman [ 2019-01-22 ]

This one will be fixed in 1.2 with the next upmerge.

Comment by Daniel Lee (Inactive) [ 2019-01-23 ]

Build verified: 1.2.3-1

server commit: 61f32f2
engine commit: 02a86a3

Generated at Thu Feb 08 02:33:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.