[MDEV-11658] Simpler, faster IMPORT of InnoDB tables Created: 2016-12-23  Updated: 2023-11-30

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: None

Type: New Feature Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 20
Labels: innodb, portability

Issue Links:
Blocks
blocks MDEV-11633 Make the InnoDB system tablespace opt... Open
blocks MDEV-26141 Crash in instant ALTER TABLE after IM... Confirmed
is blocked by MDEV-26137 ALTER TABLE IMPORT enhancement Closed
is blocked by MDEV-29694 Remove the InnoDB change buffer Closed
Duplicate
is duplicated by MDEV-20974 Don't require .cfg files to import In... Closed
PartOf
includes MDEV-15225 Can't import .ibd file with temporal ... Closed
includes MDEV-20973 Allow write operations to an InnoDB t... Open
Relates
relates to MDEV-11659 Move the InnoDB doublewrite buffer to... Open
relates to MDEV-12288 Reset DB_TRX_ID when the history is r... Closed
relates to MDEV-15020 Server hangs due to InnoDB persistent... Closed
relates to MDEV-15049 Importing InnoDB table from 10.1 to 1... Closed
relates to MDEV-17598 InnoDB index option for per-record tr... Open
relates to MDEV-18543 IMPORT TABLESPACE fails after instant... Closed
relates to MDEV-25265 ALTER TABLE…IMPORT TABLESPACE fails a... Closed
relates to MDEV-11657 Cross-engine transaction metadata Open
relates to MDEV-12437 Import tablespace causes replication ... Open
relates to MDEV-15228 Document how to upgrade old temporal ... Closed
relates to MDEV-18329 Backup/Restore Single Database as Dif... Open
relates to MDEV-18762 Support easy restore of partial backup Closed
relates to MDEV-20930 Document that SHOW CREATE TABLE on In... Closed
relates to MDEV-27274 DROP TABLE after failed IMPORT TABLES... Closed
relates to MDEV-27735 Deprecate the parameter innodb_change... Closed
relates to MDEV-29144 ER_TABLE_SCHEMA_MISMATCH or InnoDB: F... Closed

 Description   

Streaming, Faster Exporting and Instant Importing of Tables

Currently, tables are exported like this in MySQL 5.6 and 5.7:

SHOW CREATE TABLE t;
FLUSH TABLE t FOR EXPORT;
/* access the file system to copy t.ibd */
UNLOCK TABLES;

The export syntax is somewhat awkward, and it assumes that the user has access to the file system. We might support an alternative that streams an IMPORT statement with the metadata and encoded file contents to the client connection.

The import procedure is even more awkward:

CREATE TABLE t…; /* using the output from SHOW CREATE TABLE */
ALTER TABLE t DISCARD TABLESPACE; /* effectively corrupts the database! */
/* copy the file t.ibd to the file system */
ALTER TABLE t IMPORT TABLESPACE;

As an alternative to this, we can simply support the following:

SHOW CREATE TABLE t; /* finds user-copied *.frm and *.ibd files */

To allow importing of files in a streaming-friendly way that does not encourage direct manipulation of the data directory, we could also support the following:

CREATE TABLE t…IMPORT DATA [base64 or base85 dump];

Such statements could be the output of a new streaming-friendly EXPORT statement.
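To make the idea concrete, here is a minimal Python sketch of how such a statement could carry the file contents as an ASCII payload. The statement syntax and helper names are hypothetical (nothing like this is implemented); base64 is used because its alphabet contains neither quotes nor semicolons, which keeps the framing trivial, whereas the denser base85 alphabet includes `;` and would need escaping.

```python
import base64

def encode_import_statement(table_ddl: str, ibd_bytes: bytes) -> str:
    """Wrap raw .ibd file contents in the hypothetical
    CREATE TABLE ... IMPORT DATA statement proposed above."""
    payload = base64.b64encode(ibd_bytes).decode("ascii")
    return f"{table_ddl} IMPORT DATA '{payload}';"

def decode_import_payload(statement: str) -> bytes:
    """Extract and decode the payload back into file bytes.
    The base64 alphabet cannot contain ' or ;, so the framing
    characters are unambiguous."""
    payload = statement.rsplit("IMPORT DATA '", 1)[1][:-2]
    return base64.b64decode(payload)

# Round trip over some fake file bytes:
fake_ibd = bytes(range(256)) * 4
stmt = encode_import_statement("CREATE TABLE t (a INT)", fake_ibd)
assert decode_import_payload(stmt) == fake_ibd
```

A real implementation would of course stream the payload in chunks rather than build one giant statement string, but the framing problem is the same.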

What currently necessitates a rewrite of the data file at IMPORT?

The InnoDB data files contain some identifiers that bind the file to the containing instance:

  • Redo log sequence number (LSN) on each page (needed for crash recovery)
  • The DB_TRX_ID transaction identifier in each clustered index record
  • The PAGE_MAX_TRX_ID transaction identifier in each secondary index leaf page
  • The 32-bit tablespace ID (space_id) on every page and in every BLOB pointer

At IMPORT we must ensure that all these identifiers are compatible with the importing InnoDB instance.
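For illustration, two of the per-page fields listed above can be read with a few lines of Python. The byte offsets (FIL_PAGE_LSN at 16, the per-page space ID at 34) follow the InnoDB FIL page header layout; the helper itself is only a sketch, not server code.

```python
import struct

FIL_PAGE_LSN = 16        # 8-byte LSN of the last modification of the page
FIL_PAGE_SPACE_ID = 34   # 4-byte tablespace ID, repeated on every page

def page_identifiers(page: bytes) -> tuple[int, int]:
    """Read two of the instance-binding fields from a raw page header.
    All InnoDB page-header integers are big-endian."""
    (lsn,) = struct.unpack_from(">Q", page, FIL_PAGE_LSN)
    (space_id,) = struct.unpack_from(">I", page, FIL_PAGE_SPACE_ID)
    return lsn, space_id

# Build a synthetic 16 KiB page with known values and read them back:
page = bytearray(16384)
struct.pack_into(">Q", page, FIL_PAGE_LSN, 0xABCDEF)
struct.pack_into(">I", page, FIL_PAGE_SPACE_ID, 15)
assert page_identifiers(bytes(page)) == (0xABCDEF, 15)
```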

Instant IMPORT

Ever since MySQL 5.6 implemented WL#5522, IMPORT reads and writes every page in the to-be-imported tablespace. Before MySQL 5.7 introduced WL#7142 and the MLOG_FILE_NAME record, the IMPORT procedure was also not crash-safe, because crash recovery could mistake a not-yet-imported file for a real data file and even attempt to apply mismatching redo log records to it.

In MariaDB we could relatively easily allow an instant variant of import that would not adjust anything in the data file. Only if the space_id is already in use by the importing instance would we refuse a quick import.

The trick is that each *.ibd file would keep track of the maximum contained TRX_ID and LSN as dynamic persistent metadata, persisted in a header page on EXPORT or shutdown. If the current mysql.transactions.start_id (MDEV-11657, trx_sys->max_trx_id) or the current redo log sequence number (log_sys->lsn) in the importing server instance is less than the values found near the start of the .ibd file, we would simply move the server’s settings to a larger value. No need to adjust anything in the file! (Some caution is needed when MAX_LSN or MAX_TRX_ID of the being-imported file is within an ‘active window’ of the importing server.)
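The comparison described above can be sketched in a few lines of Python. Here `server` and the counter names are illustrative stand-ins for log_sys->lsn and trx_sys->max_trx_id, not actual server code, and the 'active window' caution mentioned above is ignored.

```python
from types import SimpleNamespace

def try_instant_import(server, file_max_lsn: int, file_max_trx_id: int,
                       file_space_id: int, used_space_ids: set) -> bool:
    """Sketch of the proposed instant-import check: never touch the
    data file, only move the server's counters forward when the file
    is ahead of the importing instance."""
    if file_space_id in used_space_ids:
        return False  # space_id collision: refuse the quick path
    # The file may come from an instance whose counters were ahead of ours.
    server.lsn = max(server.lsn, file_max_lsn)
    server.max_trx_id = max(server.max_trx_id, file_max_trx_id)
    return True

srv = SimpleNamespace(lsn=1000, max_trx_id=50)
assert try_instant_import(srv, 5000, 70, 42, {1, 2, 3})
assert (srv.lsn, srv.max_trx_id) == (5000, 70)   # counters jumped forward
assert not try_instant_import(srv, 1, 1, 2, {1, 2, 3})  # collision refused
```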

With some more effort, we can also remove the requirement to avoid space_id collision on import. This would require that space_id be replaced with fil_space_t* in various parts of InnoDB code and all space_id lookups be removed. Then, the space_id in the data files would essentially become "don’t care" fields, or fil_space_t::id could become a non-unique ‘nonce’ that could be used in consistency checks. We might also continue to require the space_id to be unique, but only write the unique space_id to the first page of the tablespace file, not to each page header.

Thanks to MDEV-12288, the DB_TRX_ID could be reset when the history is no longer needed. After a proper EXPORT, all DB_TRX_ID columns would be 0, and would not pose any problem. The PAGE_MAX_TRX_ID is trickier, because it cannot easily be reset due to the complexity involved (before resetting, we would have to look up every clustered index record that the secondary index page points to).

Maybe we should replace the PAGE_MAX_TRX_ID with per-record DB_TRX_ID in secondary indexes and let purge reset those fields as well. This would also properly fix MySQL Bug#67044/Oracle Bug#14704286 (Secondary index updates make consistent reads do O(N^2) undo page lookups), which Oracle ‘fixed’ by merely making the lengthy operations interruptible.

No instant EXPORT for InnoDB

Normal FLUSH TABLES will not ensure that no pending transactions exist on a table, and it will also not prevent the InnoDB purge operation (or XA ROLLBACK) from modifying the *.ibd file while it is being copied.

We definitely want an EXPORT that ensures that there are no active or XA PREPARE transactions on the table, and that any delete-marked garbage has been purged. This (apart from server shutdown) is the only way to guarantee that InnoDB cannot modify the *.ibd file while the user is copying it.

To allow quick import after a normal server shutdown (innodb_fast_shutdown=1), InnoDB could keep track of the number of uncommitted and unpurged records as persistent dynamic metadata. If both counts are zero, an instant import is possible. If the ‘uncommitted’ count is not zero, we cannot safely import anything (the data would essentially be READ UNCOMMITTED, and secondary indexes could be out of sync with the clustered index). If the ‘unpurged’ count is nonzero, import will have to delete all delete-marked records.
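The decision from the two persistent counters proposed above is a simple three-way classification. A sketch (the names are illustrative, not InnoDB identifiers):

```python
def classify_import(uncommitted: int, unpurged: int) -> str:
    """Classify the import path based on the persistent counts of
    uncommitted and unpurged records in the exported file."""
    if uncommitted:
        # Secondary indexes could be out of sync with the clustered
        # index; the data would effectively be READ UNCOMMITTED.
        return "refuse"
    if unpurged:
        # Import must first delete all delete-marked records.
        return "purge-then-import"
    return "instant"

assert classify_import(0, 0) == "instant"
assert classify_import(0, 3) == "purge-then-import"
assert classify_import(5, 3) == "refuse"
```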

Optional offline clean-up step after EXPORT

If it is not desirable to move the LSN or TRX_ID forward in the importing server instance, an external utility can be developed to reset the fields in the data file.
This external utility could also clear (or punch_hole) unallocated pages to reduce the physical size of the file.
This utility could set a bit in FSP_SPACE_FLAGS to indicate whether the file was completely cleaned.

Maybe we should write the server GUID somewhere in the first 3 pages of an InnoDB file. When a file is being imported from the same server instance, it can be assumed that no adjustments will be necessary.



 Comments   
Comment by Marko Mäkelä [ 2017-02-10 ]

The doublewrite buffer depends on the space_id for identifying pages. Thus, we cannot make the space_id in data files entirely “don’t care”. It seems that unless MDEV-11659 adds some extra metadata to the doublewrite buffer, before flushing we should write the current space_id to the files. For example, if a file with space_id=10 gets imported into an InnoDB instance as space_id=15, all pages would initially carry the space_id 10 in each page header at byte offset 34. Every time a page is flushed to the data file, the new space_id 15 would be written. The true space_id would always be present in the tablespace header (page 0).
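A sketch of this per-flush stamping in Python; the byte offset 34 is the per-page space ID field of the InnoDB page header, while the function and its hook point are hypothetical.

```python
import struct

FIL_PAGE_SPACE_ID = 34  # byte offset of the per-page space ID field

def stamp_space_id_before_flush(page: bytearray, current_space_id: int) -> None:
    """Overwrite the per-page space ID with the importing instance's
    ID just before the page is flushed (and written to the doublewrite
    buffer). The true space ID would still live in page 0 only."""
    struct.pack_into(">I", page, FIL_PAGE_SPACE_ID, current_space_id)

# A page imported with the exporting instance's old space ID 7
# gets stamped with the importing instance's ID 12 at flush time:
page = bytearray(16384)
struct.pack_into(">I", page, FIL_PAGE_SPACE_ID, 7)
stamp_space_id_before_flush(page, 12)
assert struct.unpack_from(">I", page, FIL_PAGE_SPACE_ID)[0] == 12
```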

Comment by Andreas Kirbach [ 2017-06-21 ]

Being able to quickly import InnoDB tables would be a great feature, as this could enable almost instant cloning of whole databases on CoW filesystems.

Comment by Marko Mäkelä [ 2017-10-06 ]

In MDEV-11657 I suggested the idea of splitting the InnoDB undo log into tablename.ibu files. If we go down that route, EXPORT would merely have to merge the change buffer and prevent any purge activity while the .ibd and .ibu files are copied. Then, IMPORT could apply the undo logs. It would be possible to EXPORT while active transactions exist on the table. Those transactions would be rolled back by the IMPORT.

Comment by Marko Mäkelä [ 2018-02-08 ]

If we can fix MDEV-13603, then thanks to MDEV-12288 we should always have DB_TRX_ID=0 when importing tables. This would mean that we should only have problems with FIL_PAGE_SPACE_ID or FIL_PAGE_LSN when importing. As far as I understand, the PAGE_MAX_TRX_ID in secondary index leaf pages only matters for performance. If some PAGE_MAX_TRX_ID is in the future, then when accessing records on that page, we will always have to look up the corresponding record in the primary key index.

The FIL_PAGE_SPACE_ID we could fix by removing most reads of that field from our code, and passing fil_space_t* in all calls and data structures. Then, we would have the "real" tablespace ID only in the first page, at byte offset 38 (right after the FIL_PAGE_SPACE_ID). The FIL_PAGE_LSN we can avoid adjusting if we keep track of the maximum LSN of the file somewhere. When importing, we would only let log_sys->lsn jump forward to that number.

Comment by Marko Mäkelä [ 2018-04-23 ]

After MDEV-12266 removed many fil_space_t lookups, it should be feasible to remove references to the tablespace ID on the file pages.

In the exporting instance, it is not strictly necessary to keep track of the maximum LSN and transaction ID of each file. We can simply record the maximum LSN and TRX_ID of the exporting server, and on import, ensure that the importing server is not behind those numbers.

Comment by Laxman Prabhu [ 2019-08-10 ]

Interesting feature. ALTER TABLE…IMPORT TABLESPACE also rewrites index IDs, at least in MySQL 5.7. Index IDs can also differ between the importing and exporting server.

Comment by Marko Mäkelä [ 2019-09-26 ]

laxmanprabhu, you are right, the PAGE_INDEX_ID in each index page header would be adjusted as well. But, I think that we could relax the checks around those somehow. One idea would be that we would require each page of an index to contain the same PAGE_INDEX_ID as the root page does. It would not necessarily have to match the dict_index_t::id. As far as I can tell, the main use of the identifier is the adaptive hash index, which I would remove in MDEV-20487.

In MDEV-4259 I wrote down an idea of how to store the index metadata inside the .ibd files, to eventually allow the InnoDB data dictionary tables to be removed. The directory inside the .ibd files would store the root page numbers for each secondary index. (For the clustered index, it is always page number 3.)

With all this in place, when creating an index, we could just assign a random PAGE_INDEX_ID value, to allow us to catch corruption where an index page is pointing to a page in a different index.

Comment by Geoff Montee (Inactive) [ 2019-11-01 ]

marko,

With the changes that you're envisioning, would the .cfg file be completely unnecessary, or would it still be needed for some reason? (i.e. see MDEV-20974)

Comment by Marko Mäkelä [ 2019-11-05 ]

GeoffMontee, yes, the .cfg files would no longer be read or written. Currently, the .cfg files store information about secondary index root page numbers, plus some InnoDB data dictionary information that is derived from the original .frm file.

Some existing code tries to guess the root page numbers when the .cfg file is not available. Since the aim of this task is instantaneous operation, we can only read at most a small number of pages of the data file, and hence we should store the information somewhere in the .ibd file.

MDEV-15225 would allow the original .frm file to be used when importing tables, thus making most of the .cfg file redundant. Until we extend the .ibd file format to store information about the secondary index root page numbers (which I would do as part of this task), the .cfg files would still be needed.

Comment by Geoff Montee (Inactive) [ 2020-03-30 ]

MDEV-20974 was closed as a duplicate of this task, since this task will remove the need for .cfg files.

Comment by Marko Mäkelä [ 2021-03-26 ]

Making export wait for purge would also fix MDEV-25265.

Comment by Marko Mäkelä [ 2021-10-01 ]

While testing MDEV-18543, mleich got the server to crash during a rename operation in the following scenario that involves any ALTER TABLE that uses ALGORITHM=COPY:

  1. ALTER TABLE…ALGORITHM=COPY is almost finished, and renames the old copy of the table to a different name, for the purpose of dropping it.
  2. The test environment copies a file by that name into the data directory.
  3. ALTER TABLE…ALGORITHM=COPY attempts to rename the intermediate copy of the table to the final name, but finds that a file by that name has appeared in the meantime.
  4. (ALTER TABLE…ALGORITHM=COPY would have dropped the original table that had been renamed.)

To prevent this scenario, we could introduce a separate PREPARE IMPORT step that would acquire an exclusive metadata lock on the table name and ensure that no table by that name exists. So, the workflow would be as follows:

  1. PREPARE IMPORT tablename;
  2. copy the tablename.* files into the data directory
  3. IMPORT tablename;

Comment by Marko Mäkelä [ 2022-10-12 ]

After MDEV-27735 removes the code that writes change buffer records, the change buffer bitmap pages will be unused. I think that we can extend the MDEV-19534 innodb_checksum_algorithm=full_crc32 format as follows:

  1. A bit in FSP_SPACE_FLAGS would indicate the new format.
  2. FIL_PAGE_SPACE_ID in each page header and BTR_EXTERN_SPACE_ID in each BLOB pointer will be ignored, or they may be validated against the contents of that field in page 0.
  3. Only FSP_SPACE_ID in page 0 will contain the tablespace identifier. This field would be updated when importing a tablespace.
  4. The change buffer bitmap pages (pages 1+n*innodb_page_size) will be initialized with NUL bytes.
  5. In page 1 we would store a transactional (undo-logged) ROW_FORMAT=REDUNDANT record, comprising:
    • the root page numbers of all secondary indexes (could be a VARBINARY column, with a length that is a multiple of 4 bytes)
    • server ID, table dictionary version (see MDEV-4259)
    • possibly, something that identifies the MariaDB Server version that last modified this record

We should also store the maximum FIL_PAGE_LSN and transaction ID somewhere, along with the purge status. These fields would be cleared by the special slow export step. If we do not know the maximum LSN or transaction ID, we will still be able to import the tablespace without writing much to it; we’d merely have to read many pages.

This should make the .cfg files redundant; we will only need .frm and .ibd files for exporting and importing tablespaces.
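The root-page directory in item 5 could be as simple as a fixed-width array of big-endian 32-bit page numbers, which keeps the blob length a multiple of 4 bytes. A purely illustrative Python sketch (the on-disk record format has not been designed):

```python
import struct

def pack_root_pages(root_page_numbers: list[int]) -> bytes:
    """Serialize secondary index root page numbers as big-endian
    4-byte integers; the blob length is a multiple of 4."""
    return b"".join(struct.pack(">I", n) for n in root_page_numbers)

def unpack_root_pages(blob: bytes) -> list[int]:
    assert len(blob) % 4 == 0
    return [struct.unpack_from(">I", blob, i)[0]
            for i in range(0, len(blob), 4)]

# The clustered index root is always page 3; secondary index
# roots typically follow it in newly created tablespaces.
roots = [4, 5, 7]
blob = pack_root_pages(roots)
assert len(blob) == 12
assert unpack_root_pages(blob) == roots
```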

Comment by Marko Mäkelä [ 2023-02-01 ]

MDEV-26137 has been filed for an SQL level change that would allow InnoDB tables to be implicitly imported or discovered. One would just issue a statement like SELECT * FROM tablename or CHECK TABLE tablename after copying some files to the data directory:

  • tablename.frm (the table definition, with the correct timestamp format (MDEV-15225) and whatnot)
  • tablename.ibd (the InnoDB data copied between FLUSH TABLES tablename FOR EXPORT and UNLOCK TABLES)
  • tablename.cfg (additional InnoDB metadata generated by FLUSH TABLES tablename FOR EXPORT)

The scope of this ticket would be a change of the .ibd file format so that:

  • no .cfg file will be needed (the index root page numbers will be explicitly stored in the .ibd file), and
  • the file can be imported instantly, without rewriting all pages.

I propose that any other changes, such as introducing a "streaming" variant of EXPORT or IMPORT that would transfer the contents of .frm and .ibd files over a socket connection, would be outside the scope of this ticket.

Comment by Rick James [ 2023-04-11 ]

Will this mechanism apply to a PARTITIONed table? And/or a single PARTITION?

Comment by BJ Quinn [ 2023-08-10 ]

With MDEV-26137 merged, do we think that this enhancement might make 11.3 now? This is a very exciting feature for lots of use cases for us.

Comment by BJ Quinn [ 2023-08-10 ]

And one question, would this feature only work for NEW tables created with a version of MariaDB with this feature, since it's a format change to innodb? In other words, for tables created prior to this enhancement, you'd still need a .cfg file? That's still great, just thinking through how to use this properly.

Comment by Marko Mäkelä [ 2023-08-17 ]

rjasdfiii, the design of partitioned tables is somewhat complicated, and it could become even more so after MDEV-22168 is implemented.

bjquinn, it should be possible to import tables without a .cfg file already now. Under some circumstances, secondary indexes in the imported file would be ignored. The clustered index root page number in .ibd files is always 3 in MariaDB or MySQL up to 5.7.

The required .ibd file format changes are somewhat challenging to implement, even though MDEV-29694 freed up some space by removing the change buffer and thus making the change buffer bitmap pages (page number 1+n*innodb_page_size) completely unused.

I think that the changes to dictionary information (secondary index root page numbers and index identifiers) will need to be covered by an undo log record. One conceivable solution would be a pseudo ‘table’ starting at page 1, complete with DB_TRX_ID and DB_ROLL_PTR to allow rollback.

Comment by BJ Quinn [ 2023-08-18 ]

marko, thank you. What I had in mind was using this in conjunction with MDEV-26137 to simply rsync an .ibd + .frm file in alone, without a .cfg file, and just expect them to show up. If I'm reading MDEV-26137 right, the way to get the table to show up is to try to open it. Is there any other way to force a directory scan to pick up all new tables?

Anyway, the limitations I understand right now on just copying in .ibd/.frm's are the following. Please clarify if I'm missing something:

  • Need 11.2.1 RC, which is not yet released
  • Source table needs to be created in 11.2.1 or later? Or at least needs to have at one point been running on an 11.2.1 server, or perhaps needed an OPTIMIZE TABLE while running on 11.2.1+ to get the latest table format?
  • Need the .ibd and .frm, but not .cfg. Just rsync over the .ibd and .frm, no need to FLUSH TABLE ... FOR EXPORT on the source server (though obviously preferred if possible, otherwise it's like you're getting a copy of the table with READ UNCOMMITTED), also no need to ALTER TABLE DISCARD/IMPORT TABLESPACE.
  • Partitioned table will not work
  • Will lose all indexes other than the primary key under certain circumstances (which indexes, which circumstances?)
  • Is there a way to get a .cfg file w/o FLUSH TABLE ... FOR EXPORT?
  • If any of the required criteria is not met, you will still not need to DISCARD/IMPORT tablespace, but you will need the .cfg (i.e. FLUSH TABLE ... FOR EXPORT was run and you still have the .cfg file from that)

Comment by Marko Mäkelä [ 2023-08-18 ]

bjquinn, at some point in MDEV-26137 there was a thought that user-copied data files would be imported automatically, just like they will be for the MyISAM storage engine. For now, that plan was abandoned, and a statement like ALTER TABLE t2 IMPORT TABLESPACE will be necessary. Unlike MyISAM, InnoDB still needs to update its internal data dictionary tables to be able to use the table in transactions.

The last substantial changes to InnoDB data file formats were made in MariaDB Server 10.4, by MDEV-12026 and MDEV-15562. MDEV-29694 removed the InnoDB change buffer in MariaDB Server 11.0, but that did not really change the .ibd file format or affect importing tablespaces. MariaDB Server 11.2 should be able to import any old data files (from MySQL up to 5.7 or any MariaDB version).

The "instant IMPORT" that I want to do in this ticket would require a file format change. Importing older-format files would still involve reading and rewriting each page in the .ibd file.

Comment by BJ Quinn [ 2023-08-18 ]

Yeah, I had followed a few of these tickets and thought I'd seen some stuff that I couldn't find anymore, maybe that's because MDEV-26137 changed.

So if I understand correctly, MDEV-26137 allows you to just copy the .ibd/.frm/.cfg in place and do an ALTER TABLE t2 IMPORT TABLESPACE, even though t2 doesn't exist in the internal InnoDB data dictionary? So no need for CREATE TABLE. But SELECT * FROM or CHECK TABLE would not be sufficient alone (as was initially planned).

Is there a ticket I should follow for the auto-import feature dropped from MDEV-26137?

What are the current limitations around importing a table without a .cfg file? Does this ticket still intend to resolve those situations?

Thanks!

Comment by BJ Quinn [ 2023-09-06 ]

Hey @marko just seeing if you had an update on my questions above. Just trying to see if I can properly understand the implications of this feature and what the limitations are, including what the limitations are in 11.2 before this ticket, MDEV-11658, is resolved. Thanks!

Comment by BJ Quinn [ 2023-10-16 ]

We've been on MariaDB 10.5 or earlier at multiple sites for a while now, and we were looking into upgrading to later versions, in particular to start benefiting from these improvements in copying around InnoDB data.

One of our major use cases is a large development database that needs to be refreshed frequently and quickly, either from the production data or from zfs snapshots taken over time of historical data. Besides the fact that we don't always want current data, the data diverges at this point while doing development. We have multiple dev databases on the same server. All of these tickets about streamlining this process are very interesting to us. The holy grail would be the original intent of this ticket (instant imports by just rsync'ing an .ibd and .frm file into a database folder), though anything that streamlines the current process would be welcomed.

However, we started testing the latest 11.x and noticed that we've actually regressed in functionality: since 10.5.13, I believe, the server refuses to import the tablespace for tables with secondary indexes. That worked fine for us before, so this new restriction is a regression for us. Any thoughts?

Comment by Marko Mäkelä [ 2023-10-16 ]

I think that an auto-import on DML, without involving any DDL statement, would be problematic for locking and logging.

The fix of the crashing bug MDEV-26131 was to refuse to import secondary indexes if a .cfg file is missing. The .cfg file would be created by FLUSH TABLES…FOR EXPORT.

Comment by BJ Quinn [ 2023-10-16 ]

Yes, but for our use case, creating a .cfg file is impractical. We can't lock the tables on the entire production system in order to make a copy of it or to snapshot it regularly in order to have selective copies we can make later on by time/date. This mechanism worked perfectly well for us until the fix in question broke our process.

This is also why I was asking previously about the elimination of the .cfg requirement, which was mentioned as one of the outcomes of the resolution of this particular ticket.

I can live without auto import, but I can't require locking tables in order to make the copies I need, that's actually a regression from our existing process.

Comment by Rick James [ 2023-10-16 ]

@BJQuinn - Instead of speeding up dump/restore... Have you tried LVM? Put the database in a disk subsystem, then "snapshot" it whenever you want to capture it. The snapshot is virtually instantaneous. You could either keep multiple copies on the same server or leisurely rsync them to some other server (and then destroy the snapshot). Since LVM employs "copy-on-write", each snapshot grows only as the two copies diverge.

Comment by BJ Quinn [ 2023-10-16 ]

I'm not using dump/restore. I'm copying the raw .ibd/.frm files (without .cfg files, since I can't afford to lock the production database regularly) and regularly creating a .sql file of the structure of all the tables. Then, on demand, I run a script that creates a new database, creates all the empty tables from their definitions, discards the tablespaces on all tables in the database, copies the .ibd/.frm files in from either the production database or a snapshot via zfs, and then imports the tablespace for all the tables. This is orders of magnitude faster than dump/restore.

This still works on < 10.5.13, and I guess it would still work on >= 10.5.13 as long as the tables had only a primary index and no others. In 10.5.13 a limitation was added that now prevents tables with secondary indexes from being imported this way, unless you got the .cfg file using FLUSH TABLE ... FOR EXPORT, which locks the table.

This JIRA issue at one point intended to eliminate .cfg files altogether, which I would assume would mean a .cfg file could not be required to import the tablespace for tables. Alternatively, I could accept an option where I could get a .cfg or equivalent w/o locking the source table.

Comment by Rick James [ 2023-10-16 ]

@BJQuinn – LVM should be faster and less invasive than that. But it does snapshot the entire filesystem, so it is better to have that entire database isolated on its own mount. COW is the secret to its speed – and that depends on whether the OS and filesystem support COW.

Comment by BJ Quinn [ 2023-10-16 ]

Thanks for offering some ideas! For various other purposes, I prefer zfs over LVM. But the problem isn't how to get the snapshotted data, I can get that. It's how to import it into a different database on the same mariadb instance. If I snapshot the entire volume (which is what I'm doing with zfs currently as well), then I either have to selectively copy files off the volume to a new database in a running server, at which point I run into the problem I'm describing. Or I go ahead and take the entire volume, or the entire /var/lib/mysql folder at least, ibdata file and all, and copy it over. But then that wouldn't allow me to load data into an existing instance of mariadb, it would require wiping out the entire server.

Comment by Rick James [ 2023-10-16 ]

@BJQuinn - Sorry, I forgot to mention that it would have to be a separate instance of MariaDB, optionally even a different version (if not too different).

Comment by BJ Quinn [ 2023-10-17 ]

Thanks, yes, what I'm trying to do is create on demand, quick clones of databases on the same database instance. This worked pretty well before 10.5.13, and this JIRA issue seemed to be poised to make it work even better, at least at one point.

Comment by BJ Quinn [ 2023-10-26 ]

Any updates on workarounds for this? This is getting to be more and more difficult to work with as we upgrade more servers past 10.5.13. I would be happy with any of the following solutions:

  • an option that doesn't require .cfg files for tables with secondary indexes
  • an override option for the "fix" in MDEV-26131
  • a way to get a .cfg file that doesn't require locking the table in question
  • any other ideas anyone has

We have some databases that are many terabytes. mysqldump and similar aren't workable for that size. We often need just certain tables to be copied, or certain databases, and we often need them copied from one database to another on the same server. There's really no alternative to copying .ibd files around at the filesystem level for this use case, and I can't lock these tables to get a .cfg file the standard way. This all worked before 10.5.13 and was not undocumented behavior. I gained interest in this ticket and MDEV-11658 which held promise to streamlining this purpose even further, but I'm now in a situation where my existing process doesn't work at all.

Thanks!

Comment by Marko Mäkelä [ 2023-11-03 ]

Sorry, there has been no update on this. Designing and implementing a file format change is challenging. When it comes to partitioned tables, MDEV-10568 is an omission compared to MySQL 5.7.

The only alternative that we currently have is a faster load of SQL dumps, which was implemented in MDEV-515 and MDEV-24621. MDEV-16281 could do more in that area.

Comment by BJ Quinn [ 2023-11-03 ]

Understood, thank you for the feedback!

In the meantime, could we get a way to override MDEV-26131? I understand getting a new improved way to do this is a lot of work. But optionally providing a way to override MDEV-26131 should be simpler, and should also be considered a feature regression. I know there were some reasons to implement MDEV-26131, but we have copied countless tables with secondary indexes using this mechanism and never ran into the issue that caused that fix to be put in place, so it's pure regression for us.

Comment by Sergei Golubchik [ 2023-11-04 ]

I'd expect that internally InnoDB wouldn't need a table lock to create a .cfg file. But a lock might be needed so that you wouldn't copy a table which is being actively modified at the same moment.

Also, perhaps, it might be possible to recreate secondary indexes without a cfg file — that is, InnoDB cannot use them, but perhaps it can drop and create them again.

Comment by BJ Quinn [ 2023-11-04 ]

Generally we're pulling .ibd files from filesystem snapshots, so the file is frozen in time and not actively changing during the copy. Obviously a frozen-in-time copy might have stuff that's not flushed to disk, etc., but that's fine for our purposes.

If we could get a .cfg file somehow without the lock, that would at least get us back to where we were. We would still be very excited for further improvements in JIRA tickets like this, but at least we'd be back to the functionality we had before.

Comment by BJ Quinn [ 2023-11-30 ]

I'm also now running into the issue that the ".cfg file or no indexes" rule from MDEV-26131 doesn't play nicely with MDEV-26137: since all I have is an .ibd file, I can't drop the indexes it wants me to drop, so I can't import the file at all.

Is there any way we can get an optional workaround for MDEV-26131? I'm running into this problem constantly now.

Comment by Marko Mäkelä [ 2023-11-30 ]

Would the following be an acceptable solution?

  1. CREATE TABLE without any indexes.
  2. ALTER TABLE…DISCARD TABLESPACE;
  3. Copy the .ibd file to the data directory.
  4. ALTER TABLE…IMPORT TABLESPACE;
  5. ALTER TABLE…FORCE, ADD UNIQUE INDEX(u), ADD INDEX(h), ADD INDEX(j);

The last step should copy the data of the old file and create the secondary indexes. If you do not need the indexes, you could omit the last step, but then the space that is occupied by the indexes would not be reclaimed.

Comment by BJ Quinn [ 2023-11-30 ]

Thanks for the suggestion! I have in fact tried that successfully in some cases. The problem lies with larger tables where re-adding the indexes becomes very time consuming, on the order of hours or longer. One of our primary goals for shuttling around .ibd files like this is to make fast copies of large datasets. Rewriting all their indexes takes the "fast" part out of it.

Generated at Thu Feb 08 07:51:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.