[MDEV-28637] Add basic TABLESAMPLE SYSTEM support Created: 2022-05-20  Updated: 2023-12-07

Status: Stalled
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: 11.2

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: optimizer-feature

Attachments: File bernoulli_add.diff    
Issue Links:
Relates
relates to MDEV-19556 Support native storage engine samplin... Open

 Description   

MDEV-19556 adds engine-native record sampling. It also adds code that uses sampling for its primary purpose, that is, to collect histograms.

However, it is fairly difficult to do testing. Looking at the collected histogram, it is hard to tell whether the sampling code has provided each record the same chance of being in the output, etc.

This MDEV is about adding the basic TABLESAMPLE syntax support which allows one to examine the sampled rows directly.

The SQL syntax for TABLESAMPLE feature:

<table factor> ::=
<table primary> [ <sample clause> ]
 
<sample clause> ::=
TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
[ <repeatable clause> ]
 
 
<sample method> ::=
BERNOULLI
| SYSTEM
 
<repeatable clause> ::=
REPEATABLE <left paren> <repeat argument> <right paren>
 
<sample percentage> ::=
<numeric value expression>
 
<repeat argument> ::=
<numeric value expression>

TABLESAMPLE is a reserved word.

We allow only this syntax:

TABLESAMPLE SYSTEM ( number )

We allow it only for SELECTs (not for DML).

The optimizer will not use any indexes for the table that uses TABLESAMPLE syntax.

It will also ignore the fact that sampling the table produces fewer rows.



 Comments   
Comment by Sergei Petrunia [ 2022-05-20 ]

The first patch https://gist.github.com/spetrunia/79fcd31c8fe114479a77e682df211c4c

Comment by Oleksandr Byelkin [ 2022-08-24 ]

I added patch to support Bernoulli sampling.

Generated at Thu Feb 08 10:02:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.