Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12163

Support for Materialized Views

    XMLWordPrintable

Details

    • Q1/2026 Server Development

    Description

      Problem

      A number of RDBMS system support Materialized Views, which wikipedia defines thus

      In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

      The process of setting up a materialized view is sometimes called materialization. This is a form of caching the results of a query, similar to memoization of the value of a function in functional languages, and it is sometimes described as a form of precomputation. As with other forms of precomputation, database users typically use materialized views for performance reasons, i.e. as a form of optimization.

      Materialized views which store data based on remote tables are also known as snapshots.[citation needed] (C. J. Date regards the phrase "materialized view" as a deprecated term for a "snapshot".)

      In any database management system following the relational model, a view is a virtual table representing the result of a database query. Whenever a query or an update addresses an ordinary view's virtual table, the DBMS converts these into queries or updates against the underlying base tables. A materialized view takes a different approach: the query result is cached as a concrete ("materialized") table (rather than a view as such) that may be updated from the original base tables from time to time. This enables much more efficient access, at the cost of extra storage and of some data being potentially out-of-date. Materialized views find use especially in data warehousing scenarios, where frequent queries of the actual base tables can be expensive.

      In a materialized view, indexes can be built on any column. In contrast, in a normal view, it's typically only possible to exploit indexes on columns that come directly from (or have a mapping to) indexed columns in the base tables; often this functionality is not offered at all.

      This ticket is for tracking the potential design & implementation for MariaDB.

      Background

      Various dialects are support by existing vendors, e.g.

      PostgreSQL
      https://www.postgresql.org/docs/current/rules-materializedviews.html

      CREATE MATERIALIZED VIEW table_name
          [ (column_name [, ...] ) ]
          [ WITH ( storage_parameter [= value] [, ... ] ) ]
          [ TABLESPACE tablespace_name ]
          AS query
          [ WITH [ NO ] DATA ]
      

      Oracle
      https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/CREATE-MATERIALIZED-VIEW.html

      SQL Server (Materialized and Indexed Views)
      https://learn.microsoft.com/en-us/fabric/real-time-intelligence/materialized-view
      https://msdn.microsoft.com/en-us/library/ms191432.aspx

      Suggested implementation:
      CREATE MATERIALIZED VIEW view_name as query ".." should create a table named 'view_name' and fill it with the data.
      The table should identify itself as a view:
      DROP VIEW should drop the table and the view definition.
      SHOW CREATE VIEW should show the view definition.
      Users should have access rights to the view to use it.

      Requirements

      REFRESH MATERIALIZED VIEW should update the content of the view (PostgreSQL syntax).
      This should be done atomic, non blocking and versioned. This means:

      • There should be no wait on locks when accessing a materialized view.
      • When a user first access the view in a transaction, he should get access to the latest version of the view.
      • The user will continue to use this version of the view until the end of transaction.
      • The given version of the view will be deleted after the last transaction that used it is committed/rolled back.

      The above is done to ensure that if a user are running a long transaction (hours) using a view, this should
      not block other users from accessing the latest version of the view instantly.

      General approach

      • We need a variable that says how many version or total size of stored view are allowed.
      • THD should contain a list of all views that a user has used so that they can be released at transaction end.
      • The view data should be stored in an Aria non transactional table (it will be readonly for the end user).
      • The view definition, in the .frm file, could contain an internal prefix for the name of the view. The temporary table name to store the view would be prefix#{version} where {version} is incremented for each version of the view. Prefix should be a unique, composed by the view name+an unique monotonic number (for example short_uuid in hex, which would use 16 bytes)
      • We need a "materialized view object", stored in a hash, which contains a list of all versions of a view and reference counters for them.
      • On shutdown any view materialization in progress should be stopped and deleted.
      • On recovery any view materialization that was in progress should be deleted. (The code for CREATE ... SELECT can be used for this)

      Caveats

      Problem: accessing table by table_name#{version} is tricky. As we know only table_name, we'd need to search directory by globe table_name#*. Not too many OSes/file systems can do that efficiently.
      Solution: Store the latest version of frm as table_name.frm, without suffixes. Rename it when it's not last anymore.

      Problem: We need to drop version when it comes out of use
      Solution: When transaction finishes, it releases the tables it used, decreasing share->tdc->refcount. When refcount reaches 0, the actor (connection) that observes this, drops the table.

      Problem: We need to manage the versions and know which version of the table to access
      Solution: Always store the latest version in Table Definition Cache (tdc). When the materialized view table of certain version is taken in use by the transaction, it stores a pointer to TABLE_SHARE it uses in THD (in a list). When a transaction decides to re-access the table, it first checks that list.

      Problem: We need to delete old versions during recovery, if the server is crashed
      Solution 1: Log every delete. On recovery, read the ddl log and complete it. This should also complete the unfinished renames.
      Solution 2: We can skip extra logging for starters and make the updates based just on the file system: for each materialized view found, delete its outdated versions.

      Problem: When executing DROP, we need to remove all versions of materialized view
      Solution: take MDL_EXCLUSIVE lock on the table. Once all transactions that used it will finish, they will drop the old versions, and the lock will be acquired with only the last version left.

      Suggested implementation

      REFRESH VIEW side

      We keep in mind that we do not allow concurrent REFRESH runs
      We need a registry (or use TABLE_SHARE as a registry) for holding last version number
      of the materialized view. Note that version number could also be a timestamp.

      1. Lock MDL_SHARED_NO_WRITE
      2. Fetch registry entry for my_table. Create if not found (no version of the table exists)
      3. version++ or version=timestamp; // only in memory!
      4. DDL_LOG prepare_drop
      5. create table table_name#version
      6. copy data
      7. remove entry for prepare_drop
      8. Remove old version(s) that are not in use (on disk and in registry)
      9. Unlock

      Notes:

      • MDL_SHARED_NO_WRITE will not block the reads (and also MDL_SHARED_UPGRADABLE would be enough)

      DML

      Bold actions are the new actions
      We need in THD a list of materialized views (LOMV) used by the current transaction.

      Table open:

      1. Find view in table definition cache (as normal)
      2. If table is a materialized view, check if the table is in LOMV. If yes, open this table as a table-part-of-a-view. If not found open the latest materialized view and add a record/increment reference counter in the version registry
      3. The tables are opened as any other table that is part of a view.

      Table open under LOCK TABLES:

      1. All tables are opened once during lock tables. The above procedure should work for LOCK TABLES too.
      2. The materialize view ref counter is decremented at UNLOCK TABLES and dropped if needed.

      Transaction end:
      For each table in LOMV:

      1. Unlink table from list
      2. Decrement ref counter
      3. If the ref counter == 0 and there is newer version of the table drop the table and remove the record from version registry

      Recovery

      The following will depict Solution 2, which is easier, since it eliminates the need for logging the deletions of the outdated versions.

      1. files = (enumerate the directory)
      2. for file in files:
      • if file has the materialized view marker (like {{#]}) and there is a later version of the same file, rm_table_no_locks(file)

      Future opportunities

      • For first version, lets just create view as such. We could detect the usage of a primary keys that are guaranteed to be unique and create an index for these. We could also create a primary key for any GROUP BY expression (as we would need to have the key anyway to generate the result).
      • For next version, allow the user create the view with indexes, with same syntax as in CREATE TABLE.

      Attachments

        Issue Links

          Activity

            People

              nikitamalyavin Nikita Malyavin
              alvinr Alvin Richards (Inactive)
              Votes:
              19 Vote for this issue
              Watchers:
              25 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.