Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
Description
Selecting random rows from a table is a historicity hard problem. This is because the distribution of row values isn't evident.
"Workarounds" have existed for a while - http://jan.kneschke.de/projects/mysql/order-by-rand/ - fairly intrusive however.
Using engine independent statistics these are available in mariadb https://mariadb.com/kb/en/mariadb/engine-independent-table-statistics/.
To solve exposing a fake table it should be able to get a clean SQL to get random rows.
Like https://mariadb.com/kb/en/mariadb/sequence/ this should be a table discover https://mariadb.com/kb/en/mariadb/table-discovery/.
Mode of operations could be:
ANALYZE TABLE xxx PERSISTENT FOR ALL; |
INSTALL SONAME "ha_rand"; |
SELECT * FROM rand_xxx_col1_5; |
The select would return 5 random rows from the table xxx based on the distribution of col1.
Its use a a full member should be returned:
SELECT r.id, other.id FROM rand_xxx_col1_5 r JOIN other on r.id=other.id; |
Goals in order:
- Initial table discovery and returning tables
- Participation in optimizer negotiation
- Ensuring a unique result - probably desirable from an expected use however altering the table syntax to have a flag as part of the name for this
- Make the table modifiable
I'm willing to mentor this, co-mentors with a good knowledge of the storage API and table discovery would be much appreciated