[MDEV-15140] Implement Partial / Filtered Indexes Created: 2018-01-31 Updated: 2024-01-18 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table, Data Definition - Create Table, Optimizer, Storage Engine - InnoDB |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major |
| Reporter: | Pavel Cibulka | Assignee: | Ralf Gebhardt |
| Resolution: | Unresolved | Votes: | 32 |
| Labels: | index | ||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
It would be nice, if MariaDB has some way to index only subset of values / rows. For example: This feature is called Partial Index or Filtered Index in other databases. There is no way to create this index in MySQL / Mariadb. Closest thing would be using another table and trigger. This complicate a lot of things and isn't much useful. Feature request in MySQL: https://bugs.mysql.com/bug.php?id=76631 |
| Comments |
| Comment by Nuno [ 2019-06-18 ] |
|
Was just looking to see if this existed in MariaDB. Not only that, SQL Server (at least) also has a very useful thing with this, where you can create a UNIQUE index with a WHERE clause, so that the uniqueness only applies for that criteria. This is great if we want, for example, only 1 active entry per group, while all other entries for those groups are disabled. Hopefully this will be considered. Thank you! |
| Comment by Federico Razzoli [ 2019-08-17 ] |
|
Please note how this feature makes life easier with temporal tables. An index could be on:
All these make sense to optimise certain analytics queries and would potentially avoid a relevant waste of disk space. |
| Comment by Dean Trower [ 2019-09-01 ] |
|
Nuno, for your 2nd case (UNIQUE index with a WHERE clause), you can already achieve that by creating a generated column (either VIRTUAL or STORED is ok) that evaluates to a unique index key for those rows matching your desired WHERE clause, and evaluates to NULL for all other rows. Then create a UNIQUE index on that column: The uniqueness constraint doesn't apply to the NULL rows, but does apply to all the others. It's still a waste of space, though, as the index still does reference every row. (BTW in MySQL since v8.0.13 you can now index directly on functions/expressions, so you don't even need to create the virtual column in that DB). |
| Comment by Nuno [ 2019-09-01 ] |
|
Thank Dean. However, > It's still a waste of space, though, as the index still does reference every row. This is exactly what I want to avoid But yes, VIRTUAL/PERSISTENT is perfect for the condition I was mentioning of "1 active entry per group", etc... |
| Comment by David Taylor [ 2020-04-08 ] |
|
Curious if any progress has been made on this. I have utilized this multiple times with MSSQL. It is very useful when you need some type of work queue, so for example if you have a nullable processed_time (or similar) field that indicates it still needs to be processed if the value is NULL. You can create a filtered index on this property to only pick up the NULL rows so that your query will be very performant when querying for new tasks. |
| Comment by Marko Mäkelä [ 2020-09-16 ] |
|
I would like to ‘dumb down’ InnoDB if possible and rather reduce, not increase, the number of special index types. But, this feature seems to be simply make a secondary index cover a subset of the rows in the table. It sounds fairly simple, especially once we have per-record transaction identifiers (MDEV-17598) and can simplify the purge of history accordingly. I have been toying with a similar idea in the past, for omitting NULL values from an index. Because I would like to eliminate the internal InnoDB data dictionary in some form (MDEV-11655) at some point of the future, I would rather not extend the InnoDB data dictionary with this. The metadata would have to be added to the .frm file. InnoDB cannot really trust .frm files for internal operations (such as rollback and purge) until I am pretty sure that also the query executor and planner would have to be aware of this feature. |
| Comment by Marko Mäkelä [ 2022-02-24 ] |
|
Even though |
| Comment by Johannes B [ 2022-02-25 ] |
|
Hi @marko , what are the implications of: > The metadata on indexes needs to be written in transactional storage, and I would not want to extend the current InnoDB `SYS_INDEXES` or `SYS_FIELDS` tables to duplicate any new type of metadata. Does this mean that this issue is not to be implemented any more and rather becomes a WONTFIX? |
| Comment by Tom Conlon [ 2022-03-13 ] |
|
Interbase/Firebird was first DB to support blobs and other early items (domains, cursors, etc) and usually closely folllows ansi standards. CREATE [UNIQUE] [ASC[ENDING] | DESC[ENDING]] INDEX <index name> ON <table name> COMPUTED BY ( <value expression> ) |
| Comment by VAROQUI Stephane [ 2022-03-14 ] |
|
The more i think on it the more i tend to think that some tables should force ROW FORMAT login. A new CDC plugin binlog synchronus or assynchronus consumer would update an extra materialized table in case of partial index , the parser rewrite queries by injecting extra subquery given by the plugin in relation to a specific table primary key. This could work for materialized view, partial index and external multi table index like sphinx where the consumer would update rt_index using such plugin. Table definition would store a list of CDC plugin to call with row based images of the records trigger could be a specific implementation of such plugins |
| Comment by Michael Widenius [ 2022-07-18 ] |
|
Benefit of a partial index:
Disadvantages:
The one case where partial indexes makes sense is to remove NULL values. I suggest that we start by adding support for: |
| Comment by Federico Razzoli [ 2022-07-18 ] |
|
I wouldn't say that NULL is the one case. It wouldn't cover (most of the times) the most important use case I can think of: data warehousing. Fact tables may have "optional dimensions". Of course they could be represented with NULL, but that would makes dwh queries even longer than they usually are. Instead they correctly add a nully row into dimensions: INSERT INTO dim_city (city_id, city_name, country_id) VALUES (1, '', NULL); In this case, in fact tables I'd want to exclude rows WHERE city_id = 1 OR device_id = 1 OR ... |
| Comment by Nuno [ 2022-07-18 ] |
|
Yep - totally agree. 90% of a table might sometimes have a "city_id = 1" (using Frederico's example), but I want to index the other 10% of that table that has other city_id values. – I've just read a comment I wrote back in 2019, saying: The problem with VIRTUAL/PERSISTENT is that is totally breaks ALTER ONLINE for that table, which is a big problem, so I'm actually avoiding using this in the future, and migrating away from using it completely. |