跳到主要内容

Move Data from MySQL to ClickHouse in Minutes

· 阅读需 4 分钟
John Li
John Li

Overview

This article explains how to move data from relational databases to ClickHouse with BladePipe. By default, it uses ReplacingMergeTree as the ClickHouse table engine. The key features of the connection include:

  • Add _version and _sign fields to ensure accurate merging in ClickHouse.
  • All DML statements are written as INSERT statements, ensuring good synchronization performance.
  • Support for DDL synchronization.

Highlights

Schema Migration

When performing schema migration with ClickHouse as the target database, the default table engine selected is ReplacingMergeTree. If replication is involved, ReplicatedReplacingMergeTree is automatically chosen.

The sort key for ClickHouse tables defaults to the primary key fields of the source table. If the source table has no primary key, tuple() is used as sort key.

Additional fields _version and _sign are added as merge fields. During synchronization, BladePipe automatically fill in these fields based on the DML statements to ensure data consistency between the source and target.

# e.g.,
CREATE TABLE console.worker_stats
(
`id` Int64,
`gmt_create` DateTime,
`worker_id` Int64,
`cpu_stat` String,
`mem_stat` String,
`disk_stat` String,
`_sign` UInt8 DEFAULT 0,
`_version` UInt64 DEFAULT 0,
INDEX `_version_minmax_idx` `_version` TYPE minmax GRANULARITY 1
)
ENGINE = ReplacingMergeTree(`_version`,`_sign`)
ORDER BY id
SETTINGS index_granularity = 8192

Data Writing

In both Full Data migration and Incremental data synchronization, all DML statements are converted into INSERTs, which are written in standard batches.

  • The _version field values increment according to the order of data changes.
  • The _sign field values are set to 0 for Insert and Update statements, and 1 for Delete statements.

The two additional fields comply with the ClickHouse ReplacingMergeTree definition.

Procedure

Step 1: Install BladePipe

Follow the instructions in Install Worker (Docker) or Install Worker (Binary) to download and install a BladePipe Worker.

Step 2: Add DataSources

  1. Log in to the BladePipe Cloud.
  2. Click DataSource > Add DataSource, and add 2 DataSources.

Step 3: Create a DataJob

  1. Click DataJob > Create DataJob.

  2. Select the source and target DataSources, and click Test Connection to ensure the connection to the source and target DataSources are both successful.

  3. In the Advanced configuration of the target DataSource, choose the table engine as ReplacingMergeTree (or ReplicatedReplacingMergeTree).

  4. Select Incremental for DataJob Type, together with the Full Data option.

    信息

    In the Specification settings, make sure that you select a specification of at least 1 GB.

    Allocating too little memory may result in Out of Memory (OOM) errors during DataJob execution.

  5. Select the tables to be replicated.

  6. Select the columns to be replicated.

  7. Confirm the DataJob creation.

  8. Wait for the DataJob to automatically run.

    信息

    Once the DataJob is created and started, BladePipe will automatically run the following DataTasks:

    • Schema Migration: The schemas of the source tables will be migrated to ClickHouse.
    • Full Data Migration: All existing data of the source tables will be fully migrated to ClickHouse.
    • Incremental Synchronization: Ongoing data changes will be continuously synchronized to the target database.

Step 4: Verify the Data

  1. Stop data write in the source database and wait for ClickHouse to merge data.

    信息

    Due to the unpredictable timing of ClickHouse's automatic merging, you can manually trigger a merging by running the OPTIMIZE TABLE xxx FINAL; command. Note that there is a chance that this manual merging may not always succeed.

    Alternatively, you can run the CREATE VIEW xxx_v AS SELECT * FROM xxx FINAL; command to create a view and perform queries on the view to ensure the data is fully merged.

  2. Create a Verification DataJob. Once the Verification DataJob is completed, review the results to confirm that the data in ClickHouse are the same as the data in MySQL.

Latest blog posts

Back to blogarrow-right
10 Best Data Integration Tools in 2025
Data insights

10 Best Data Integration Tools in 2025

Discover the top 10 data integration tools in 2025.

Barry
Barry
Nov 20, 2025
Be & Cheery Drives Retail Analytics in Real Time with BladePipe
User stories

Be & Cheery Drives Retail Analytics in Real Time with BladePipe

Discover how Be & Cheery builds a unified data-integration platform using BladePipe to support data-driven decision-making.

Zoe
Zoe
Nov 17, 2025
Choosing Your Data Lake Format in 2025:Iceberg vs Delta Lake vs Paimon
Data insights

Choosing Your Data Lake Format in 2025:Iceberg vs Delta Lake vs Paimon

A deep dive into how these open lake formats differ, and how to build a real-time data lake that actually works.

Barry
Barry
Oct 22, 2025