In the examples below you will see the backup destination specified as Disk('backups', '1.zip').
To use the Disk backup engine it is necessary to first add a file specifying
the backup destination at the path below:
/etc/clickhouse-server/config.d/backup_disk.xml
For example, the configuration below defines a disk named backups and then adds that disk to
the backups > allowed_disk list:
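A minimal backup_disk.xml might look like the following sketch; the /backups/ path is illustrative and should be adjusted to your environment:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- local disk named "backups" -->
            <backups>
                <type>local</type>
                <path>/backups/</path>
            </backups>
        </disks>
    </storage_configuration>
    <backups>
        <!-- allow the BACKUP/RESTORE commands to use this disk and path -->
        <allowed_disk>backups</allowed_disk>
        <allowed_path>/backups/</allowed_path>
    </backups>
</clickhouse>
```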
It is also possible to BACKUP/RESTORE to S3 by configuring an S3 disk in the
ClickHouse storage configuration. Configure the disk by adding a file to
/etc/clickhouse-server/config.d, as was done above for the local disk.
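As a sketch, an S3 disk configuration could look like the following; the disk name s3_plain matches the examples below, while the endpoint and credentials are placeholders you must fill in:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- S3 disk used only as a backup destination -->
            <s3_plain>
                <type>s3_plain</type>
                <endpoint>https://s3.amazonaws.com/my-bucket/backups/</endpoint>
                <access_key_id>REPLACE_WITH_KEY_ID</access_key_id>
                <secret_access_key>REPLACE_WITH_SECRET</secret_access_key>
            </s3_plain>
        </disks>
    </storage_configuration>
    <backups>
        <allowed_disk>s3_plain</allowed_disk>
    </backups>
</clickhouse>
```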
BACKUP/RESTORE for S3 disk is done in the same way as for local disk:
BACKUP TABLE data TO Disk('s3_plain', 'cloud_backup');
RESTORE TABLE data AS data_restored FROM Disk('s3_plain', 'cloud_backup');
This disk should not be used for MergeTree data itself, only for BACKUP/RESTORE.
If your tables are backed by S3 storage and the source and destination disks are of different
types, ClickHouse does not use CopyObject calls to copy parts to the destination bucket;
instead it downloads and re-uploads them, which is very inefficient. In that case, prefer
the BACKUP ... TO S3(<endpoint>) syntax.
The above RESTORE would fail if the table test_db.test_table contains data.
The setting allow_non_empty_tables=true allows RESTORE TABLE to insert data
into non-empty tables. This will mix earlier data in the table with the data extracted from the backup.
This setting can therefore cause data duplication in the table, and should be used with caution.
To restore the table with data already in it, run:
RESTORE TABLE test_db.test_table FROM Disk('backups', '1.zip')
SETTINGS allow_non_empty_tables = true
Tables can be restored, or backed up, with new names:
RESTORE TABLE test_db.test_table AS test_db.test_table_renamed FROM Disk('backups', '1.zip')
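Renaming works in the BACKUP direction as well; as a sketch (reusing the archive name from the examples above):

```sql
BACKUP TABLE test_db.test_table AS test_db.test_table_renamed TO Disk('backups', '1.zip')
```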
The backup archive for this backup has the following structure:
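The exact layout depends on the ClickHouse version; broadly, the archive contains a metadata file describing the backup plus per-table metadata and data, roughly like this sketch:

```
1.zip
├── .backup        -- XML file describing the backup's contents
├── metadata/      -- CREATE queries for the backed-up tables
└── data/          -- the table data parts
```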
A base backup in ClickHouse is the initial, full backup from which subsequent
incremental backups are created. Incremental backups store only the changes
made since the base backup, so the base backup must remain available in order to
restore from any incremental backup. The base backup destination is set with the
base_backup setting.
To make an incremental backup of a table, first make a base backup:
BACKUP TABLE test_db.test_table TO Disk('backups', 'd.zip')
Then take an incremental backup, specifying the base backup with the base_backup setting:
BACKUP TABLE test_db.test_table TO Disk('backups', 'incremental-a.zip')
SETTINGS base_backup = Disk('backups', 'd.zip')
All data from the incremental backup and the base backup can be restored into a
new table test_db.test_table2 with command:
RESTORE TABLE test_db.test_table AS test_db.test_table2
FROM Disk('backups', 'incremental-a.zip');
Backups written to disk can be password-protected.
The password is specified with the password setting.
Password protection is only supported for ZIP archives (.zip, .zipx).
The backup path must end with .zip or .zipx for the password to be accepted.
Using a password with any other format - including tar archives and non-archive paths - will
result in a BAD_ARGUMENTS error: Password is not applicable, backup cannot be encrypted.
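A password-protected backup might look like the following sketch; the archive name and the password value are placeholders:

```sql
BACKUP TABLE test_db.test_table TO Disk('backups', 'protected.zip')
SETTINGS password = 'qwerty'
```

The same password must then be supplied via SETTINGS password = '...' when running the corresponding RESTORE.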
Backups can be stored not only as zip archives, but also as tar archives.
The functionality is the same as for zip, except that password protection is not
supported for tar archives. Additionally, tar archives support a variety of
compression methods. To make a backup of a table as a tar archive:
BACKUP TABLE test_db.test_table TO Disk('backups', '1.tar')
To restore from a tar archive:
RESTORE TABLE test_db.test_table FROM Disk('backups', '1.tar')
To change the compression method, the correct file suffix should be appended to
the backup name. For example, to compress the tar archive using gzip run:
BACKUP TABLE test_db.test_table TO Disk('backups', '1.tar.gz')
If only specific partitions of a table need to be backed up or restored, these can be specified.
Let's create a simple table partitioned into four parts, insert some data into it, and then
take a backup of only the first and fourth partitions:
Setup
CREATE DATABASE IF NOT EXISTS test_db;

-- Create a partitioned table
CREATE TABLE test_db.partitioned
(
    id UInt32,
    data String,
    partition_key UInt8
)
ENGINE = MergeTree()
PARTITION BY partition_key
ORDER BY id;

INSERT INTO test_db.partitioned VALUES
    (1, 'data1', 1),
    (2, 'data2', 2),
    (3, 'data3', 3),
    (4, 'data4', 4);

SELECT count() FROM test_db.partitioned;

SELECT partition_key, count()
FROM test_db.partitioned
GROUP BY partition_key
ORDER BY partition_key;
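With the table in place, the backup of the first and fourth partitions can be taken with the PARTITIONS clause; the archive name partitioned.zip is illustrative:

```sql
BACKUP TABLE test_db.partitioned PARTITIONS '1', '4'
    TO Disk('backups', 'partitioned.zip')
```

Restoring accepts the same clause, so individual partitions can also be restored from this archive without touching the others.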