MDEV-14992 BACKUP SERVER to mounted file system #4817
This introduces a basic driver Sql_cmd_backup, storage engine interfaces, and basic copying of InnoDB, Aria, and MyISAM files.

For streaming, we aim to generate streams that are compatible with GNU tar --format=oldgnu, which is also supported by the built-in BSD tar on FreeBSD and macOS as well as on Microsoft Windows. That is, to extract files from a backup stream, you can use the standard tar utility of the operating system, instead of anything nonstandard like xbstream or mbstream.

TODO: Support partial backup and restore. This should be done by implementing some configuration parameters as well as a predicate that checks if a file name pattern should be included.

TODO: Back up ENGINE=RocksDB.

TODO: Refactor the crude first implementation of Aria_backup that is based on the work of Andrzej Jarzabek, to copy as many files concurrently as possible while holding the minimum amount of locks.

---

backup_target: A structured data type to represent a target directory. On Microsoft Windows, we must use directory paths because there is no variant of CopyFileEx() that would work on file handles.

backup_sink: Wraps a per-thread output stream as well as storage engine specific context.

handlerton::backup_start(), handlerton::backup_end(): Invoked at the start or end of a backup phase, in the thread that executes a BACKUP SERVER statement.

handlerton::backup_step(): A backup step that can be invoked from multiple threads concurrently, between the execution of the corresponding handlerton::backup_start() and handlerton::backup_end() of the same phase.

copy_entire_file(): A file copying service for POSIX systems.

copy_file(): A sparse file-copying service for all systems.

backup_stream_append_async(): A variant of backup_stream_append() where the source file region is guaranteed to be immutable after the call returns. We must not use Linux sendfile(2) for copying data files that may be modified in place, because it could introduce a race condition between a page write and a child process that is concurrently reading the data from the pipe.

InnoDB_backup::context: Backup context, attached to backup_sink so that the context can continue to exist between the time a BACKUP SERVER releases all locks and another BACKUP SERVER starts executing, with innodb_backup pointing to the new backup, while the old backup is still being finished.

fil_space_t::write_or_backup: Keep track of in-flight page writes and pending backup operations. We must not allow them concurrently, because that could lead to torn pages in the backup.

fil_space_t::backup_end: The first page number that is not being backed up (by default 0, to indicate that no backup is in progress).

fil_space_t::BACKUP_BATCH_SIZE: The number of preceding pages that will be covered by fil_space_t::backup_end. This is the unit of "page range locking" during InnoDB backup.

log_sys.backup: Whether BACKUP SERVER is in progress. The purpose of this is to make BACKUP SERVER prevent the concurrent execution of SET GLOBAL innodb_log_archive=OFF or SET GLOBAL innodb_log_file_size when innodb_log_archive=OFF.

log_sys.archived_checkpoint: Keep track of the earliest available checkpoint, corresponding to log_sys.archived_lsn. This reflects SET GLOBAL innodb_log_recovery_start (which is settable now), for incremental backup.

buf_flush_list_space(): Check for concurrent backup before writing each page. This is inefficient, but this function may be invoked from multiple threads concurrently, and it cannot be changed easily, especially for fil_crypt_thread().
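To illustrate what compatibility with tar --format=oldgnu means at the byte level, here is a minimal sketch of building one 512-byte member header. It is not code from this pull request: tar_header, put_octal(), header_checksum() and make_oldgnu_header() are hypothetical names, the mode, uid and gid values are placeholders, and names longer than 99 bytes or sizes that do not fit in 11 octal digits are not handled.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// One tar header block. Field offsets follow the classic tar header layout.
using tar_header = std::array<unsigned char, 512>;

// Store value as zero-padded octal digits followed by a NUL terminator,
// filling a field of `width` bytes.
static void put_octal(unsigned char *field, size_t width, uint64_t value)
{
  std::snprintf(reinterpret_cast<char*>(field), width, "%0*llo",
                int(width - 1), static_cast<unsigned long long>(value));
}

// Sum of all header bytes, with the 8-byte checksum field (offset 148)
// counted as if it were filled with spaces.
static unsigned header_checksum(const tar_header &h)
{
  unsigned sum = 0;
  for (size_t i = 0; i < h.size(); i++)
    sum += (i >= 148 && i < 156) ? unsigned(' ') : h[i];
  return sum;
}

// Hypothetical helper: header for a regular file in the old GNU format.
tar_header make_oldgnu_header(const std::string &name, uint64_t size,
                              uint64_t mtime)
{
  tar_header h{};                       // all fields start zero-filled
  std::memcpy(h.data(), name.data(),    // name[100], truncated in this sketch
              std::min<size_t>(name.size(), 99));
  put_octal(&h[100], 8, 0644);          // mode (placeholder)
  put_octal(&h[108], 8, 0);             // uid (placeholder)
  put_octal(&h[116], 8, 0);             // gid (placeholder)
  put_octal(&h[124], 12, size);         // size in bytes
  put_octal(&h[136], 12, mtime);        // modification time
  h[156] = '0';                         // typeflag: regular file
  std::memcpy(&h[257], "ustar  ", 8);   // old GNU magic: "ustar", 2 spaces, NUL
  // Checksum is written last: 6 octal digits, NUL, space.
  std::snprintf(reinterpret_cast<char*>(&h[148]), 7, "%06o",
                header_checksum(h));
  h[155] = ' ';
  return h;
}
```

In a complete stream, each header is followed by the file contents padded to a multiple of 512 bytes, and the archive ends with two zero-filled 512-byte blocks.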
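The "page range locking" described for fil_space_t::backup_end and fil_space_t::BACKUP_BATCH_SIZE can be sketched as a simple range check. This is only an illustration under stated assumptions: the batch size of 64 pages is a made-up value, page_in_backup_batch() is a hypothetical free function rather than anything in this pull request, and clamping the start of the range at page 0 is a guess about how a short batch would be handled.

```cpp
#include <cstdint>

// Hypothetical value for illustration; the actual
// fil_space_t::BACKUP_BATCH_SIZE is not stated in the description.
constexpr uint32_t BACKUP_BATCH_SIZE = 64;

// Returns true if page_no falls inside the batch that is currently being
// copied, that is, among the BACKUP_BATCH_SIZE pages that precede
// backup_end. backup_end == 0 means that no backup is in progress.
constexpr bool page_in_backup_batch(uint32_t page_no, uint32_t backup_end)
{
  if (backup_end == 0)
    return false;
  // Assumption: a range that would start below page 0 is clamped to 0.
  const uint32_t start = backup_end > BACKUP_BATCH_SIZE
    ? backup_end - BACKUP_BATCH_SIZE : 0;
  return page_no >= start && page_no < backup_end;
}
```

Under this reading, a page write would have to wait only while page_in_backup_batch() holds for its page number, so writes to the rest of the tablespace can proceed while one batch is being copied.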
  /* Collect all tablespaces that have been created before our
  start checkpoint. Newer tablespaces will be recovered by the
  innodb_log_archive=ON recovery.

  If a tablespace is deleted before step() is invoked, the file
  will not be copied, and a FILE_DELETE record in the log will
  ensure correct recovery.

  If a tablespace is renamed between this and end(), the recovery
  of a FILE_RENAME record will ensure the correct file name,
  no matter which name was used by step(). */
  mysql_mutex_lock(&fil_system.mutex);
  for (fil_space_t &space : fil_system.space_list)
This assumes that fil_system.space_list actually contains the metadata of all files. This is not guaranteed ever since eb1f8b2 was implemented. It looks like we will need a flag that indicates if fil_system.space_list may be incomplete, or we must revise that fix so that we will create file system metadata but not actually open any files. Thanks to @Thirunarayanan for pointing this out.
a7570c4 addresses this. It is not an elegant solution, but unfortunately the fil_node_t::name must contain the full file name, and it can only be determined by potentially opening and reading .isl files.
Sorry, https://buildbot.mariadb.org/#/builders/949/builds/11498/steps/6/logs/stdio is a deadlock in backup.backup_stream due to this change. I think that we may need a preparatory step that will be executed before any backup lock is acquired.
I plan to rebase this once #5070 has been merged up to the The ultimate merge target is While rebasing, I will write a description based on the commit message of 4769a43, but mentioning actual MDEVs for the outstanding work. Soon after the rebase, we can include #5140 so that this can be tested more conveniently.