Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15140

Implement Partial / Filtered Indexes

Details

    Description

      It would be nice, if MariaDB has some way to index only subset of values / rows.

      For example:
      We have 1% of rows value set to "active" and 99% of rows set to "inactive". There is no need to index "inactive" rows. It would be similar to table scan. But indexing "active" rows would work well.

      This feature is called Partial Index or Filtered Index in other databases.
      PostgreSQL:
      https://www.postgresql.org/docs/current/static/indexes-partial.html
      Microsoft SQL Server:
      https://docs.microsoft.com/en-us/sql/relational-databases/indexes/create-filtered-indexes
      Oracle (it can be created by Function-based index and fact that Oracle doesn't indexing NULL values):
      http://dba-presents.com/index.php/databases/oracle/41-filtered-index-equivalent-in-oracle

      There is no way to create this index in MySQL / Mariadb. Closest thing would be using another table and trigger. This complicate a lot of things and isn't much useful.

      Feature request in MySQL: https://bugs.mysql.com/bug.php?id=76631

      Attachments

        Issue Links

          Activity

            I wouldn't say that NULL is the one case. It wouldn't cover (most of the times) the most important use case I can think of: data warehousing.

            Fact tables may have "optional dimensions". Of course they could be represented with NULL, but that would makes dwh queries even longer than they usually are. Instead they correctly add a nully row into dimensions:

            INSERT INTO dim_city (city_id, city_name, country_id) VALUES (1, '', NULL);

            In this case, in fact tables I'd want to exclude rows WHERE city_id = 1 OR device_id = 1 OR ...

            f_razzoli Federico Razzoli added a comment - I wouldn't say that NULL is the one case. It wouldn't cover (most of the times) the most important use case I can think of: data warehousing. Fact tables may have "optional dimensions". Of course they could be represented with NULL, but that would makes dwh queries even longer than they usually are. Instead they correctly add a nully row into dimensions: INSERT INTO dim_city (city_id, city_name, country_id) VALUES (1, '', NULL); In this case, in fact tables I'd want to exclude rows WHERE city_id = 1 OR device_id = 1 OR ...
            nunop Nuno added a comment - - edited

            Yep - totally agree. 90% of a table might sometimes have a "city_id = 1" (using Frederico's example), but I want to index the other 10% of that table that has other city_id values.

            I've just read a comment I wrote back in 2019, saying:
            "But yes, VIRTUAL/PERSISTENT is perfect for the condition I was mentioning of "1 active entry per group", etc..."

            The problem with VIRTUAL/PERSISTENT is that it totally breaks ALTER ONLINE for that table, which is a big problem, so I'm actually avoiding using this in the future, and migrating away from using it completely.

            nunop Nuno added a comment - - edited Yep - totally agree. 90% of a table might sometimes have a "city_id = 1" (using Frederico's example), but I want to index the other 10% of that table that has other city_id values. – I've just read a comment I wrote back in 2019, saying: "But yes, VIRTUAL/PERSISTENT is perfect for the condition I was mentioning of "1 active entry per group", etc..." The problem with VIRTUAL/PERSISTENT is that it totally breaks ALTER ONLINE for that table, which is a big problem, so I'm actually avoiding using this in the future, and migrating away from using it completely.

            Here you can see that, in data wahouses, having a special row instead of NULL is a recommended practice by Kimball Group:
            https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/fact-table-null/

            f_razzoli Federico Razzoli added a comment - Here you can see that, in data wahouses, having a special row instead of NULL is a recommended practice by Kimball Group: https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/fact-table-null/
            Dean T Dean Trower added a comment -

            In reference to Michael Widenius's comment above, if you allow the ability to omit (just) NULL values from an index, then you effectively have fully-featured partial indexes: You just index a virtual column or expression that translates any value you don't want indexed to NULL.

            Nuno has said this breaks ALTER ONLINE, but maybe that can be fixed? ...Or perhaps just isn't too big a problem for most use cases?

            Dean T Dean Trower added a comment - In reference to Michael Widenius's comment above, if you allow the ability to omit (just) NULL values from an index, then you effectively have fully-featured partial indexes: You just index a virtual column or expression that translates any value you don't want indexed to NULL. Nuno has said this breaks ALTER ONLINE, but maybe that can be fixed? ...Or perhaps just isn't too big a problem for most use cases?
            nunop Nuno added a comment -

            What I said is that VIRTUAL/STORED columns break ALTER ONLINE.

            Partial indexes aren't implemented yet, so I don't know if it will also disallow ALTER ONLINE or not. I don't see why it would, anyway.

            nunop Nuno added a comment - What I said is that VIRTUAL/STORED columns break ALTER ONLINE. Partial indexes aren't implemented yet, so I don't know if it will also disallow ALTER ONLINE or not. I don't see why it would, anyway.

            People

              Unassigned Unassigned
              PavelCibulka Pavel Cibulka
              Votes:
              36 Vote for this issue
              Watchers:
              34 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.