
Full Data & Incremental

BladePipe allows you to create a Full Data & Incremental DataJob in just a few minutes. This pipeline handles schema migration, initial full data migration, and continuous incremental synchronization. Once you start the DataJob, BladePipe orchestrates these DataTasks sequentially and automatically.

Step 1: Select DataSources

  1. Log in to BladePipe.
  2. In the top navigation bar, click DataJob.
  3. Click Create DataJob.
  4. Select a Cluster to execute the DataJob.
info

If the cluster contains multiple Workers, BladePipe automatically schedules DataTasks across them, giving the DataJob dual-level disaster recovery. If the cluster has only one Worker, BladePipe provides single-level disaster recovery.

  5. Select the source and target DataSources, then click Test Connection to ensure connectivity.
  6. Select the specific databases or schemas for both the source and target. You can select multiple schemas at once.
  7. Click Next.
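If Test Connection fails, it helps to rule out basic network problems before looking at credentials or permissions. A minimal sketch in Python that checks TCP reachability of a database endpoint (the host and port below are placeholders, not BladePipe defaults):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder endpoint -- substitute your actual source/target host and port.
print(can_reach("127.0.0.1", 3306))
```

A `False` result points to a network, firewall, or listener issue rather than a BladePipe configuration problem.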

Step 2: Configure DataJob

  1. For the DataJob Type, select Incremental, and make sure that the Initial Load option is checked.
  2. Select the Specification size.
info

Larger specifications provide better performance and stability. Balance your specification sizes against your available Worker memory and the aggregate number of running DataJobs.

  3. Configure the DataJob settings:
    • Sync DDL
      • Yes: Synchronize DDL.
      • No: Do not synchronize DDL.
    • Verification
      • No: BladePipe doesn't verify data.
      • One-time: BladePipe verifies data once.
      • Scheduled: BladePipe verifies data regularly according to the set cycle.

In Advanced settings:

    • Migrate partition: If enabled, BladePipe will migrate the source partitions to the target.
    • Clear Target Data Before Full Data: If enabled, the target data will be cleared before full data initialization.
    • Rebuild Target Schema: If enabled, BladePipe will automatically rebuild the target schema in the target database.
    • Start Automatically: If enabled, the DataJob starts automatically upon creation.
    • Use param template: If enabled, you can select a parameter template for this DataJob.
  4. Click Next.

Step 3: Select Tables

  1. Select the tables you want to synchronize.
    • Exact match: Enter exact table names separated by semicolons without spaces.
    • Fuzzy match: Enter partial characters to instantly find matching table names.
tip

You can bulk-select all tables on the current page or across all pages using the master checkboxes.
To auto-select all tables by default when creating future DataJobs, navigate to Settings > Preference > BladePipe and enable jobTableDefaultSelectAll.

  2. Configure the target table names.
    • By default, target table names are generated automatically.
    • You can manually type a custom name into the Target Table column and press Enter.
    • You can click Batch Operation > Modify Target Name to apply prefixes or suffixes in bulk.
  3. To filter actions, click Open Action Blacklist.
    You can set the actions not to be synchronized for each table individually, or in batches via Batch Operation > Action Blacklist.
  4. Optional: Click Mapping Rules to adjust the automatic target-naming conventions.
  5. Optional: Click Advanced > View Duplicate Subscriptions to check whether the selected tables are subscribed in other existing DataJobs.
  6. Click Next.

Step 4: Map and Process Columns

  1. Optional: If the target mapping rules need to be modified, click Mapping Rules and make the changes accordingly.
  2. View your selected tables on the left pane. Use the search bar to rapidly locate specific tables.
  3. Select the columns to be synchronized and apply transformations.
    • Configure Individually: Click Operation to set explicit filter conditions, designate a custom primary key logic, etc.
    • Configure in Batches: Click Batch Operation to apply update conditions, primary keys across multiple tables, etc.
    • Set Virtual Column: Define virtual columns (name, type, value) injected directly into the target table during synchronization. See Add Virtual Columns.
    • Data Transform: Transform data based on built-in scripts. See Data Transform.
    • Target Primary Key: Assign a primary key for the target. If the source lacks a primary key but has a unique key, BladePipe uses the unique key by default. See Set Target Primary Key.
    • Data Filtering: Use where-like conditions to sync only specific rows. See Data Filtering.
    • Set Update Condition: Configure the rules that determine when target rows are overwritten.
    • Wide Table: Create a wide table in a UI-driven manner. See Wide Table.
    • Batch Filter Columns: Exclude or include columns across multiple tables in one operation.
  4. Optional: To inject complex Java logic, click Upload Custom Code. See Custom Code.
  5. Click Next.
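Taken together, the column options above describe a per-row pipeline: filter rows, keep only the selected columns, and inject virtual columns. A conceptual sketch in Python of that flow (illustrative only; none of these names are BladePipe code or its API):

```python
# Conceptual illustration only -- not BladePipe code.
# Rows flow through: where-like filter (Data Filtering) -> column
# selection (Batch Filter Columns) -> injection (Set Virtual Column).

rows = [
    {"id": 1, "amount": 250, "note": "a"},
    {"id": 2, "amount": 80,  "note": "b"},
]

keep_columns = {"id", "amount"}          # columns selected for sync
virtual_columns = {"region": "eu-west"}  # injected name -> constant value

def transform(row):
    if row["amount"] <= 100:             # where-like filter: amount > 100
        return None                      # row is filtered out
    out = {k: v for k, v in row.items() if k in keep_columns}
    out.update(virtual_columns)          # add virtual columns to the target row
    return out

result = [r for r in (transform(row) for row in rows) if r is not None]
print(result)  # [{'id': 1, 'amount': 250, 'region': 'eu-west'}]
```

The order matters: filtering on a source column must happen before that column is dropped, which is why BladePipe exposes these as separate per-table settings rather than a single option.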

Step 5: Confirm Creation

  1. Verify the full configuration summary.
  2. Click Create DataJob to deploy your pipeline.
info

If you configured schema structures that do not exist in the target database, BladePipe automatically initiates the schema migration phase to build them before transferring data.

Step 6: Monitor DataJob

  1. Navigate to the main DataJob list to track your pipeline's progress in real-time.
  2. Click Details to dive deep into monitoring metrics, specific logs, and individual DataTask states.