Full Data & Incremental
BladePipe allows you to create a Full Data & Incremental DataJob in just a few minutes. This pipeline handles schema migration, initial full data migration, and continuous incremental synchronization. Once you start the DataJob, BladePipe orchestrates these DataTasks sequentially and automatically.
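The sequential phase ordering described above can be sketched as follows. This is a minimal illustration of the orchestration concept only; the function names are hypothetical placeholders, not BladePipe's API:

```python
# Sketch of the phase ordering a Full Data & Incremental DataJob follows.
# Each phase must complete before the next one starts.

def run_datajob(phases):
    """Run each DataTask phase in order and record completion."""
    completed = []
    for name, task in phases:
        task()                    # block until this phase finishes
        completed.append(name)
    return completed

order = run_datajob([
    ("schema_migration", lambda: None),  # build missing target schemas
    ("full_data", lambda: None),         # initial full data migration
    ("incremental", lambda: None),       # continuous change synchronization
])
# order == ["schema_migration", "full_data", "incremental"]
```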
Step 1: Select DataSources
- Log in to BladePipe.
- In the top navigation bar, click DataJob.
- Click Create DataJob.
- Select a Cluster to execute the DataJob.
If the cluster contains multiple Workers, BladePipe automatically schedules DataTasks across them, providing dual-level disaster recovery for the DataJob. If the cluster has only one Worker, the DataJob runs with single-level disaster recovery.
- Select the source and target data sources, then click Test Connection to ensure connectivity.
- Select the specific databases or schemas for both the source and target. You can select multiple schemas simultaneously.
- Click Next.
Step 2: Configure DataJob
- For the DataJob Type, select Incremental, and make sure that the Initial Load option is checked.
- Select the Specification size.
Larger specifications provide better performance and stability. Balance your specification sizes against your available Worker memory and the aggregate number of running DataJobs.
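As a back-of-envelope check of this trade-off, you can verify that the aggregate specification size of all running DataJobs stays within a Worker's available memory. The memory figures below are hypothetical examples, not BladePipe defaults:

```python
# Hypothetical sizes in MB; check whether a new DataJob fits on a Worker.
worker_memory_mb = 8192                # assumed Worker memory
running_jobs_mb = [1024, 2048, 1024]   # assumed spec sizes of running DataJobs
new_job_mb = 2048                      # spec size of the DataJob being created

total = sum(running_jobs_mb) + new_job_mb
fits = total <= worker_memory_mb
print(fits, total)  # True 6144
```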
- Configure the DataJob settings:
| Function | Description |
|---|---|
| Sync DDL | Whether DDL statements captured on the source are applied to the target during incremental synchronization. |
| Verification | Whether data verification (and optional correction) runs after the full data migration completes. |
In Advanced settings:
| Function | Description |
|---|---|
| Migrate partition | If enabled, BladePipe will migrate the source partitions to the target. |
| Clear Target Data Before Full Data | If enabled, the target data will be cleared before the full data migration starts. |
| Rebuild Target Schema | If enabled, BladePipe will automatically rebuild the target schema in the target database. |
| Start Automatically | If enabled, the DataJob starts automatically upon creation. |
| Use param template | If enabled, you can select a parameter template to apply to this DataJob. |
- Click Next.
Step 3: Select Tables
- Select the tables you want to synchronize.
- Exact match: Enter exact table names separated by semicolons without spaces.
- Fuzzy match: Enter partial characters to instantly find matching table names.
You can bulk-select all tables on the current page or across all pages using the master checkboxes.
To have all tables selected by default when creating future DataJobs, navigate to Settings > Preference > BladePipe and enable jobTableDefaultSelectAll.
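The two matching modes can be illustrated with a small sketch of their semantics. The table names here are examples only:

```python
tables = ["orders", "order_items", "users", "audit_log"]

def exact_match(spec, names):
    """Exact match: semicolon-separated exact table names, no spaces."""
    wanted = spec.split(";")
    return [n for n in names if n in wanted]

def fuzzy_match(fragment, names):
    """Fuzzy match: any table whose name contains the fragment."""
    return [n for n in names if fragment in n]

exact_match("orders;users", tables)   # ['orders', 'users']
fuzzy_match("order", tables)          # ['orders', 'order_items']
```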
- Configure the target table names.
- By default, target table names are generated automatically.
- You can manually type a custom name into the Target Table column and press Enter.
- You can click Batch Operation > Modify Target Name to apply prefixes or suffixes in bulk.
- To filter actions, click Open Action Blacklist.
You can set the actions not to be synchronized for each table individually, or in batches via Batch Operation > Action Blacklist.
- Optional: Click Mapping Rules to adjust the automatic target-naming conventions.
- Optional: Click Advanced > View Duplicate Subscriptions to check whether the selected tables are subscribed in other existing DataJobs.
- Click Next.
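The batch renaming available under Modify Target Name amounts to applying a prefix or suffix to every selected table name, roughly like this (an illustrative sketch, not BladePipe code):

```python
def batch_rename(names, prefix="", suffix=""):
    """Map each source table name to its target name with a prefix/suffix."""
    return {n: f"{prefix}{n}{suffix}" for n in names}

batch_rename(["orders", "users"], prefix="stg_")
# {'orders': 'stg_orders', 'users': 'stg_users'}
```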
Step 4: Map and Process Columns
- Optional: If the target mapping rules need to be modified, click Mapping Rules and make the changes accordingly.
- View your selected tables on the left pane. Use the search bar to rapidly locate specific tables.
- Select the columns to be synchronized and apply transformations.
- Configure Individually: Click Operation to set explicit filter conditions, designate a custom primary key logic, etc.
- Configure in Batches: Click Batch Operation to apply update conditions, primary keys across multiple tables, etc.
| Feature | Description |
|---|---|
| Set Virtual Column | Define virtual columns (name, type, value) injected directly into the target table during synchronization. See Add Virtual Columns. |
| Data Transform | Transform data based on built-in scripts. See Data Transform. |
| Target Primary Key | Assign a primary key for the target. If the source lacks a primary key but holds a unique key, BladePipe assigns the unique key by default. See Set Target Primary Key. |
| Data Filtering | Use where-like queries to sync only specific rows. See Data Filtering. |
| Set Update Condition | Configure precise rules indicating when the target rows should be overwritten. |
| Wide Table | Create a wide table in a UI-driven manner. See Wide Table. |
| Batch Filter Columns | Exclude or include columns systematically across multiple tables. |
- Optional: To inject complex Java logic, click Upload Custom Code. See Custom Code.
- Click Next.
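Conceptually, row-level Data Filtering and virtual-column injection combine during synchronization as sketched below. The predicate and column names are invented for illustration and are not BladePipe's implementation:

```python
rows = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "deleted"},
]

def synchronize(rows, predicate, virtual_cols):
    """Keep rows matching a where-like predicate, then inject virtual columns."""
    out = []
    for row in rows:
        if predicate(row):                 # Data Filtering: where-like condition
            out.append({**row, **virtual_cols})  # Set Virtual Column: injected values
    return out

synced = synchronize(
    rows,
    predicate=lambda r: r["status"] == "active",
    virtual_cols={"source_region": "eu-west"},
)
# [{'id': 1, 'status': 'active', 'source_region': 'eu-west'}]
```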
Step 5: Confirm Creation
- Verify the full configuration summary.
- Click Create DataJob to deploy your pipeline.
If the selected schemas or tables do not yet exist in the target database, BladePipe automatically runs the schema migration phase to create them before transferring data.
Step 6: Monitor DataJob
- Navigate to the main DataJob list to track your pipeline's progress in real-time.
- Click Details to view monitoring metrics, specific logs, and the state of each DataTask.