[MDEV-19556] Support native storage engine sampling of rows - Jira

Details

Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Optimizer, Storage Engine - Aria, Storage Engine - InnoDB, Storage Engine - MyISAM
Labels:
- contribution
- foundation

Description

The storage engine API should be extended so that the server can use a storage engine's native sampling capabilities when they are available.

This feature can also support fast approximation functions, such as a fast count-distinct with an estimator attached (e.g. the Smoothed Jackknife Estimator). In addition, native sampling would make it affordable to perform various optimizations in the background, such as statistics collection, because the performance impact would be much smaller. Another use case for native sampling is SELECT FROM <table-sample> (compare TABLESAMPLE in PostgreSQL 9.5):

https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/
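To illustrate how a count-distinct approximation could sit on top of a row sample, here is a minimal sketch. It uses the simpler unsmoothed first-order jackknife form, not the Smoothed Jackknife Estimator named above, and the function name and parameters are hypothetical rather than part of any MariaDB API.

```python
from collections import Counter


def estimate_distinct_uj1(sample, table_rows):
    """Estimate the number of distinct values in a table from a row sample.

    Unsmoothed first-order jackknife form: d / (1 - (1 - q) * f1 / n),
    where n is the sample size, d the distinct values seen in the sample,
    f1 the values seen exactly once, and q = n / table_rows.
    Assumption: the sample was drawn uniformly at random from the table.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    if table_rows < n:
        raise ValueError("table_rows must be at least the sample size")

    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    q = n / table_rows

    # The denominator is at least q, so it is positive for a non-empty sample.
    return d / (1.0 - (1.0 - q) * f1 / n)
```

When the sample covers the whole table (q = 1) the estimate reduces to the exact distinct count; the more values appear only once in a small sample, the further the estimate is scaled above the observed count.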

The implementation will cover two storage engines, Aria and InnoDB. The algorithm will use a weighted index dive, to compensate for unbalanced index pages.
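One possible reading of a weighted index dive is sketched below: at each level of the tree, the child to descend into is chosen with probability proportional to the number of rows beneath it, so rows under sparsely filled pages are not over-represented. The `Node` class, the exact per-subtree row counts, and `sample_row` are assumptions made for illustration; a real engine would likely work with page-level estimates, and this is not the planned Aria or InnoDB implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy index node: a leaf holds rows, an internal node holds children."""
    rows: list = field(default_factory=list)
    children: list = field(default_factory=list)
    count: int = field(init=False, default=0)

    def __post_init__(self):
        # Number of rows stored in this subtree (assumed to be known exactly).
        self.count = len(self.rows) + sum(c.count for c in self.children)


def sample_row(root, rng):
    """Return one row, chosen uniformly at random from a non-empty tree."""
    node = root
    while node.children:
        # Weight each child by its subtree row count so that an unbalanced
        # tree still yields every row with equal probability.
        weights = [child.count for child in node.children]
        node = rng.choices(node.children, weights=weights, k=1)[0]
    return rng.choice(node.rows)


if __name__ == "__main__":
    # Unbalanced example: one leaf with a single row, one leaf with nine.
    tree = Node(children=[Node(rows=["a"]),
                          Node(rows=[f"b{i}" for i in range(9)])])
    print(sample_row(tree, random.Random()))
```

An unweighted dive would pick each leaf of the example tree with probability 1/2, returning row "a" about half the time; with the weights it is returned about one time in ten, matching its share of the table.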

Attachments

Issue Links

relates to

MDEV-15020 Server hangs due to InnoDB persistent statistics or innodb_stats_auto_recalc (Closed)

MDEV-28637 Add basic TABLESAMPLE SYSTEM support (Stalled)

Activity

Vicențiu Ciorbaru created issue - 2019-05-22 15:51

Vicențiu Ciorbaru made changes - 2019-05-22 15:51

Field	Original Value	New Value
Fix Version/s		10.5 [ 23123 ]

Vicențiu Ciorbaru made changes - 2019-05-22 15:51

Component/s		Storage Engine - Aria [ 10126 ]
Component/s		Storage Engine - InnoDB [ 10129 ]
Component/s		Storage Engine - MyISAM [ 10600 ]

Vicențiu Ciorbaru made changes - 2019-05-22 15:52

Summary	Support native storage engine Sampling	Support native storage engine sampling of rows

Vicențiu Ciorbaru made changes - 2019-05-22 15:59

Description

Histogram collection was augmented in 10.4 with the ability to sample a percentage of rows, implemented via Bernoulli sampling. The drawback is that sampling still requires a full table scan. The technique has substantially reduced the histogram collection bottleneck, but it can still be improved.
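The cost of the scan-based approach can be seen in a minimal sketch of Bernoulli sampling. The function name and the `rows` iterable are hypothetical stand-ins for a table scan, not server code.

```python
import random


def bernoulli_sample(rows, fraction, rng):
    """Yield each row independently with probability `fraction`.

    Every row is visited, which is why this approach still costs a full
    scan even when only a small percentage of rows is kept.
    """
    for row in rows:
        if rng.random() < fraction:
            yield row


if __name__ == "__main__":
    kept = list(bernoulli_sample(range(1000), 0.05, random.Random()))
    print(len(kept), "rows kept out of 1000")
```

The sample size is itself random: on average it is `fraction` times the row count, but it varies from run to run.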

The storage engine API should be extended so that the server can use a storage engine's native sampling capabilities when they are available.

This feature can also support fast approximation functions, such as a fast count-distinct with an estimator attached (e.g. the Smoothed Jackknife Estimator). In addition, native sampling would make it affordable to perform various optimizations in the background, such as statistics collection, because the performance impact would be much smaller. Another use case for native sampling is SELECT FROM <table-sample> (compare TABLESAMPLE in PostgreSQL 9.5):

https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9-5-2/

The implementation will cover two storage engines, Aria and InnoDB. The algorithm will use a weighted index dive, to compensate for unbalanced index pages.

Sergei Golubchik made changes - 2019-08-09 07:35

Priority	Major [ 3 ]	Critical [ 2 ]

Ralf Gebhardt made changes - 2020-04-28 13:18

Fix Version/s		10.6 [ 24028 ]
Fix Version/s	10.5 [ 23123 ]

Sergei Golubchik made changes - 2020-08-17 16:54

Link

This issue blocks ~~MDEV-15020~~ [ ~~MDEV-15020~~ ]

Sergei Golubchik made changes - 2021-03-17 13:57

Priority	Critical [ 2 ]	Major [ 3 ]

Julien Fritsch made changes - 2021-03-18 15:05

Link

This issue relates to ~~MDEV-15020~~ [ ~~MDEV-15020~~ ]

Julien Fritsch made changes - 2021-03-18 15:06

Link

This issue blocks ~~MDEV-15020~~ [ ~~MDEV-15020~~ ]

Sergei Golubchik made changes - 2021-06-22 17:31

Fix Version/s		10.7 [ 24805 ]
Fix Version/s	10.6 [ 24028 ]

Ralf Gebhardt made changes - 2021-11-11 10:04

Fix Version/s		10.8 [ 26121 ]
Fix Version/s	10.7 [ 24805 ]

Sergei Golubchik made changes - 2021-12-06 21:21

Workflow	MariaDB v3 [ 96891 ]	MariaDB v4 [ 131090 ]

Sergei Golubchik made changes - 2022-02-01 11:33

Fix Version/s		10.9 [ 26905 ]
Fix Version/s	10.8 [ 26121 ]

Sergei Petrunia made changes - 2022-05-20 17:38

Link

This issue relates to MDEV-28637 [ MDEV-28637 ]

Sergei Petrunia added a comment - 2022-05-20 17:56

See MDEV-28637 for a patch that makes it possible to test the sampling.

Sergei Golubchik made changes - 2022-05-24 12:58

Fix Version/s	10.9 [ 26905 ]

Sergey Vojtovich made changes - 2025-01-29 16:28

Labels		contribution

Vlad Radu made changes - 2025-02-06 07:19

Labels	contribution	contribution foundation

People

Assignee: Vicențiu Ciorbaru
Reporter: Vicențiu Ciorbaru
Votes: 1
Watchers: 7

Dates

Created: 2019-05-22 15:51
Updated: 2025-02-06 07:19
