CS now has a very rudimentary query optimization capabilities and we want to improve the situtation. We consider to use Server's optimizer for the purpose but the Server needs statistics namely values distribution histograms and Number of Distinct Values distribution histograms.
There are different levels of complexity for the task:
- implement standalone segment files reader that in the end populates both mysql.column_stats and mysql.table_stats using out of band mariadb client connection
- implement ANALYZE TABLE functionality for Columnstore engine
- implement ANALYZE TABLE and Histograms with equal-width bins for values distribution histograms(similar to MDEV-12313) together with NDV histograms to decrease I/O
We expect to have both unit and regression tests but this is optional.