Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28637

Add basic TABLESAMPLE SYSTEM support

Details

    Description

      MDEV-19556 adds engine-native record sampling. It also adds code that uses sampling for its primary purpose, that is, to collect histograms.

      However, it is fairly difficult to do testing. Looking at the collected histogram, it is hard to tell whether the sampling code has provided each record the same chance of being in the output, etc.

      This MDEV is about adding the basic TABLESAMPLE syntax support which allows one to examine the sampled rows directly.

      The SQL syntax for TABLESAMPLE feature:

      <table factor> ::=
      <table primary> [ <sample clause> ]
       
      <sample clause> ::=
      TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
      [ <repeatable clause> ]
       
       
      <sample method> ::=
      BERNOULLI
      | SYSTEM
       
      <repeatable clause> ::=
      REPEATABLE <left paren> <repeat argument> <right paren>
       
      <sample percentage> ::=
      <numeric value expression>
       
      <repeat argument> ::=
      <numeric value expression>
      

      TABLESAMPLE is a reserved word.

      We allow only this syntax:

      TABLESAMPLE SYSTEM ( number )
      

      We allow it only for SELECTs (not for DML).

      The optimizer will not use any indexes for the table that uses TABLESAMPLE syntax.

      It will also ignore the fact that sampling the table produces fewer rows.

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            Field Original Value New Value
            psergei Sergei Petrunia added a comment - - edited The first patch https://gist.github.com/spetrunia/79fcd31c8fe114479a77e682df211c4c
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ]
            psergei Sergei Petrunia made changes -
            Component/s Optimizer [ 10200 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.10.0 [ 27912 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.10.0 [ 27912 ]
            psergei Sergei Petrunia made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 10.10 [ 27530 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.12 [ 28320 ]
            Fix Version/s 10.11 [ 27614 ]
            sanja Oleksandr Byelkin made changes -
            Attachment bernoulli_add.diff [ 65049 ]

            I added patch to support Bernoulli sampling.

            sanja Oleksandr Byelkin added a comment - I added patch to support Bernoulli sampling.
            sanja Oleksandr Byelkin made changes -
            Description MDEV-19556 adds engine-native record sampling. It also adds code that uses sampling for its primary purpose, that is, to collect histograms.

            However, it is fairly difficult to do testing. Looking at the collected histogram, it is hard to tell whether the sampling code has provided each record the same chance of being in the output, etc.

            This MDEV is about adding the basic {{TABLESAMPLE}} syntax support which allows one to examine the sampled rows directly.

            The SQL syntax for TABLESAMPLE feature:

            {code}
            <table factor> ::=
            <table primary> [ <sample clause> ]

            <sample clause> ::=
            TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
            [ <repeatable clause> ]


            <sample method> ::=
            BERNOULLI
            | SYSTEM

            <repeatable clause> ::=
            REPEATABLE <left paren> <repeat argument> <right paren>

            <sample percentage> ::=
            <numeric value expression>

            <repeat argument> ::=
            <numeric value expression>
            {code}

            TABLESAMPLE is a reserved word.

            We allow only this syntax:
            {code}
            TABLESAMPLE SYSTEM ( number )
            {code}

            We allow it only for SELECTs (not for DML).

            The optimizer will not use any indexes for the table that uses TABLESAMPLE syntax.

            It will also ignore the fact that sampling the table produces fewer rows.
            MDEV-19556 adds engine-native record sampling. It also adds code that uses sampling for its primary purpose, that is, to collect histograms.

            However, it is fairly difficult to do testing. Looking at the collected histogram, it is hard to tell whether the sampling code has provided each record the same chance of being in the output, etc.

            This MDEV is about adding the basic {{TABLESAMPLE}} syntax support which allows one to examine the sampled rows directly.

            The SQL syntax for TABLESAMPLE feature:

            {code}
            <table factor> ::=
            <table primary> [ <sample clause> ]

            <sample clause> ::=
            TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
            [ <repeatable clause> ]


            <sample method> ::=
            BERNOULLI
            | SYSTEM

            <repeatable clause> ::=
            REPEATABLE <left paren> <repeat argument> <right paren>

            <sample percentage> ::=
            <numeric value expression>

            <repeat argument> ::=
            <numeric value expression>
            {code}

            TABLESAMPLE is a reserved word.

            -We allow only this syntax:
            {code}
            TABLESAMPLE SYSTEM ( number )
            {code}-

            We allow it only for SELECTs (not for DML).

            The optimizer will not use any indexes for the table that uses TABLESAMPLE syntax.

            It will also ignore the fact that sampling the table produces fewer rows.
            sanja Oleksandr Byelkin made changes -
            Description MDEV-19556 adds engine-native record sampling. It also adds code that uses sampling for its primary purpose, that is, to collect histograms.

            However, it is fairly difficult to do testing. Looking at the collected histogram, it is hard to tell whether the sampling code has provided each record the same chance of being in the output, etc.

            This MDEV is about adding the basic {{TABLESAMPLE}} syntax support which allows one to examine the sampled rows directly.

            The SQL syntax for TABLESAMPLE feature:

            {code}
            <table factor> ::=
            <table primary> [ <sample clause> ]

            <sample clause> ::=
            TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
            [ <repeatable clause> ]


            <sample method> ::=
            BERNOULLI
            | SYSTEM

            <repeatable clause> ::=
            REPEATABLE <left paren> <repeat argument> <right paren>

            <sample percentage> ::=
            <numeric value expression>

            <repeat argument> ::=
            <numeric value expression>
            {code}

            TABLESAMPLE is a reserved word.

            -We allow only this syntax:
            {code}
            TABLESAMPLE SYSTEM ( number )
            {code}-

            We allow it only for SELECTs (not for DML).

            The optimizer will not use any indexes for the table that uses TABLESAMPLE syntax.

            It will also ignore the fact that sampling the table produces fewer rows.
            MDEV-19556 adds engine-native record sampling. It also adds code that uses sampling for its primary purpose, that is, to collect histograms.

            However, it is fairly difficult to do testing. Looking at the collected histogram, it is hard to tell whether the sampling code has provided each record the same chance of being in the output, etc.

            This MDEV is about adding the basic {{TABLESAMPLE}} syntax support which allows one to examine the sampled rows directly.

            The SQL syntax for TABLESAMPLE feature:

            {code}
            <table factor> ::=
            <table primary> [ <sample clause> ]

            <sample clause> ::=
            TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
            [ <repeatable clause> ]


            <sample method> ::=
            BERNOULLI
            | SYSTEM

            <repeatable clause> ::=
            REPEATABLE <left paren> <repeat argument> <right paren>

            <sample percentage> ::=
            <numeric value expression>

            <repeat argument> ::=
            <numeric value expression>
            {code}

            TABLESAMPLE is a reserved word.

            -We allow only this syntax:-
            {code}
            TABLESAMPLE SYSTEM ( number )
            {code}

            We allow it only for SELECTs (not for DML).

            The optimizer will not use any indexes for the table that uses TABLESAMPLE syntax.

            It will also ignore the fact that sampling the table produces fewer rows.
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.0 [ 28320 ]
            psergei Sergei Petrunia made changes -
            Labels optimizer-feature
            julien.fritsch Julien Fritsch made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.