Full Data & Incremental
BladePipe allows you to create a Full Data & Incremental DataJob in just a few minutes. This pipeline handles schema migration, initial full data migration, and continuous incremental synchronization. Once you start the DataJob, BladePipe orchestrates these DataTasks sequentially and automatically.
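The sequential phase ordering described above can be sketched as follows. This is a minimal illustration of the orchestration concept only; the function names are hypothetical placeholders, not BladePipe's API:

```python
# Sketch of the phase ordering a Full Data & Incremental DataJob follows.
# Each phase must complete before the next one starts.

def run_datajob(phases):
    """Run each DataTask phase in order and record completion."""
    completed = []
    for name, task in phases:
        task()                    # block until this phase finishes
        completed.append(name)
    return completed

order = run_datajob([
    ("schema_migration", lambda: None),  # build missing target schemas
    ("full_data", lambda: None),         # initial full data migration
    ("incremental", lambda: None),       # continuous change synchronization
])
# order == ["schema_migration", "full_data", "incremental"]
```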
Step 1: Select DataSources
- Log in to BladePipe.
- In the top navigation bar, click DataJob.
- Click Create DataJob.
- Select a Cluster to execute the DataJob.
If the cluster contains multiple Workers, BladePipe automatically schedules DataTasks across them, providing dual-level disaster recovery for the DataJob. If the cluster has only one Worker, the DataJob runs with single-level disaster recovery.
- Select the source and target data sources, then click Test Connection to ensure connectivity.
- Select the specific databases or schemas for both the source and target. You can select multiple schemas simultaneously.
- Click Next.
Step 2: Configure DataJob
- For the DataJob Type, select Incremental, and make sure that the Initial Load option is checked.
- Select the Specification size.
Larger specifications provide better performance and stability. Balance your specification sizes against your available Worker memory and the aggregate number of running DataJobs.
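As a back-of-envelope check of this trade-off, you can verify that the aggregate specification size of all running DataJobs stays within a Worker's available memory. The memory figures below are hypothetical examples, not BladePipe defaults:

```python
# Hypothetical sizes in MB; check whether a new DataJob fits on a Worker.
worker_memory_mb = 8192                # assumed Worker memory
running_jobs_mb = [1024, 2048, 1024]   # assumed spec sizes of running DataJobs
new_job_mb = 2048                      # spec size of the DataJob being created

total = sum(running_jobs_mb) + new_job_mb
fits = total <= worker_memory_mb
print(fits, total)  # True 6144
```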
- Configure the DataJob settings:
| Function | Description |
|---|---|
| Sync DDL | Whether DDL statements captured on the source are applied to the target during incremental synchronization. |
| Verification | Whether data verification (and optional correction) runs after the full data migration completes. |
In Advanced settings:
| Function | Description |
|---|---|
| Migrate partition | If enabled, BladePipe will migrate the source partitions to the target. |
| Clear Target Data Before Full Data | If enabled, the target data will be cleared before the full data migration starts. |
| Rebuild Target Schema | If enabled, BladePipe will automatically rebuild the target schema in the target database. |
| Start Automatically | If enabled, the DataJob starts automatically upon creation. |
| Use param template | If enabled, you can select a parameter template to apply to this DataJob. |
- Click Next.
Step 3: Select Tables
- Select the tables you want to synchronize.
- Exact match: Enter exact table names separated by semicolons without spaces.
- Fuzzy match: Enter partial characters to instantly find matching table names.
You can bulk-select all tables on the current page or across all pages using the master checkboxes.
To have all tables selected by default when creating future DataJobs, navigate to Settings > Preference > BladePipe and enable jobTableDefaultSelectAll.
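The two matching modes can be illustrated with a small sketch of their semantics. The table names here are examples only:

```python
tables = ["orders", "order_items", "users", "audit_log"]

def exact_match(spec, names):
    """Exact match: semicolon-separated exact table names, no spaces."""
    wanted = spec.split(";")
    return [n for n in names if n in wanted]

def fuzzy_match(fragment, names):
    """Fuzzy match: any table whose name contains the fragment."""
    return [n for n in names if fragment in n]

exact_match("orders;users", tables)   # ['orders', 'users']
fuzzy_match("order", tables)          # ['orders', 'order_items']
```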
- Configure the target table names.
- By default, target table names are generated automatically.
- You can manually type a custom name into the Target Table column and press Enter.
- You can click Batch Operation > Modify Target Name to apply prefixes or suffixes in bulk.
- To filter actions, click Open Action Blacklist.
You can set the actions not to be synchronized for each table individually, or in batches via Batch Operation > Action Blacklist.
- Optional: Click Mapping Rules to adjust the automatic target-naming conventions.
- Optional: Click Advanced > View Duplicate Subscriptions to check whether the selected tables are subscribed in other existing DataJobs.
- Click Next.
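The batch renaming available under Modify Target Name amounts to applying a prefix or suffix to every selected table name, roughly like this (an illustrative sketch, not BladePipe code):

```python
def batch_rename(names, prefix="", suffix=""):
    """Map each source table name to its target name with a prefix/suffix."""
    return {n: f"{prefix}{n}{suffix}" for n in names}

batch_rename(["orders", "users"], prefix="stg_")
# {'orders': 'stg_orders', 'users': 'stg_users'}
```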
Step 4: Map and Process Columns
- Optional: If the target mapping rules need to be modified, click Mapping Rules and make the changes accordingly.
- View your selected tables on the left pane. Use the search bar to rapidly locate specific tables.
- Select the columns to be synchronized and apply transformations.
- Configure Individually: Click Operation to set explicit filter conditions, designate a custom primary key logic, etc.
- Configure in Batches: Click Batch Operation to apply update conditions, primary keys across multiple tables, etc.
| Feature | Description |
|---|---|
| Set Virtual Column | Define virtual columns (name, type, value) injected directly into the target table during synchronization. See Add Virtual Columns. |
| Data Transform | Transform data based on built-in scripts. See Data Transform. |
| Target Primary Key | Assign a primary key for the target. If the source lacks a primary key but holds a unique key, BladePipe assigns the unique key by default. See Set Target Primary Key. |
| Data Filtering | Use where-like queries to sync only specific rows. See Data Filtering. |
| Set Update Condition | Configure precise rules indicating when the target rows should be overwritten. |
| Wide Table | Create a wide table in a UI-driven manner. See Wide Table. |
| Batch Filter Columns | Exclude or include columns systematically across multiple tables. |
- Optional: To inject complex Java logic, click Upload Custom Code. See Custom Code.
- Click Next.
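Conceptually, row-level Data Filtering and virtual-column injection combine during synchronization as sketched below. The predicate and column names are invented for illustration and are not BladePipe's implementation:

```python
rows = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "deleted"},
]

def synchronize(rows, predicate, virtual_cols):
    """Keep rows matching a where-like predicate, then inject virtual columns."""
    out = []
    for row in rows:
        if predicate(row):                 # Data Filtering: where-like condition
            out.append({**row, **virtual_cols})  # Set Virtual Column: injected values
    return out

synced = synchronize(
    rows,
    predicate=lambda r: r["status"] == "active",
    virtual_cols={"source_region": "eu-west"},
)
# [{'id': 1, 'status': 'active', 'source_region': 'eu-west'}]
```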
Step 5: Confirm Creation
- Verify the full configuration summary.
- Click Create DataJob to deploy your pipeline.
If the selected schemas or tables do not yet exist in the target database, BladePipe automatically runs the schema migration phase to create them before transferring data.
Step 6: Monitor DataJob
- Navigate to the main DataJob list to track your pipeline's progress in real-time.
- Click Details to view monitoring metrics, specific logs, and the state of each DataTask.