SQL Server to Iceberg
BladePipe supports data replication from SQL Server to Iceberg. View supported migration, sync, verification, and connector capabilities.
| Function | Description |
|---|---|
| Schema Migration | If the target schema does not exist, BladePipe automatically generates and executes CREATE statements based on the source metadata and the mapping rule. |
| Full Data Migration | Migrate data by sequentially scanning the tables and writing the rows to the target database in batches. |
| Incremental Data Sync | Common DML operations (INSERT, UPDATE, DELETE) are synced. |
| Subscription Modification | Add, delete, or modify the subscribed tables, with support for historical data migration. For more information, see Modify Subscription. |
| Table Name Mapping | Supported mapping rules: keep the name the same as in the Source, convert it to lowercase, convert it to uppercase, or truncate the "_digit" suffix from the name. |
| DDL Sync | Supports ALTER TABLE ADD COLUMN and DROP COLUMN. |
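
The table name mapping rules listed above can be illustrated with a minimal sketch. This is not BladePipe code: the function name `map_table_name` and the rule labels are hypothetical, and the sketch assumes that a "_digit" suffix means a trailing underscore followed by one or more digits.

```python
import re


def map_table_name(source_name: str, rule: str) -> str:
    """Apply one illustrative mapping rule to a source table name.

    The rule labels below are hypothetical, not BladePipe option names.
    """
    if rule == "same":
        return source_name
    if rule == "lowercase":
        return source_name.lower()
    if rule == "uppercase":
        return source_name.upper()
    if rule == "trim_digit_suffix":
        # Assumption: "_digit" is a trailing underscore followed by digits.
        return re.sub(r"_\d+$", "", source_name)
    raise ValueError(f"unknown mapping rule: {rule}")


if __name__ == "__main__":
    for rule in ("same", "lowercase", "uppercase", "trim_digit_suffix"):
        print(rule, "->", map_table_name("Orders_01", rule))
```

Under this assumption, a suffix rule of this kind would map tables such as `Orders_01` and `Orders_02` to the same target name.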
Advanced Functions
| Function | Description |
|---|---|
| Write Conflict Resolution Rule | For source tables with primary keys, data is overwritten in the Target. For source tables without primary keys, data is appended to the Target. |
| Custom Table Properties | Configure format-version and other table properties. |
| Setting Data Partitions | When creating a DataJob, specify partition definitions at the table level (static or dynamic). These partition definitions are added automatically during schema migration. |
| Custom Code | For more information, see Custom Code Processing, Debug Custom Code and Logging in Custom Code. |
| Setting Target Primary Key | Change the primary key to another field to facilitate data aggregation and other operations. |
| Data Filtering Conditions | Filter data using WHERE conditions written in SQL-92. For more information, see Data Filtering. |
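
The write conflict resolution rule in the table above (overwrite when a primary key exists, append otherwise) can be sketched with an in-memory list standing in for the target table. The function `apply_rows` and the row layout are hypothetical and only illustrate the rule; they do not describe how BladePipe writes to Iceberg.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def apply_rows(
    target: List[Row],
    incoming: Sequence[Row],
    primary_key: Optional[Sequence[str]],
) -> List[Row]:
    """Overwrite rows by primary key, or append when there is no key."""
    if not primary_key:
        # No primary key: every incoming row is appended.
        target.extend(dict(row) for row in incoming)
        return target

    # Map each existing key value to its position in the target.
    index = {tuple(row[k] for k in primary_key): i for i, row in enumerate(target)}
    for row in incoming:
        key = tuple(row[k] for k in primary_key)
        if key in index:
            target[index[key]] = dict(row)  # overwrite the existing row
        else:
            index[key] = len(target)
            target.append(dict(row))
    return target


if __name__ == "__main__":
    rows = [{"id": 1, "qty": 5}]
    print(apply_rows(rows, [{"id": 1, "qty": 9}, {"id": 2, "qty": 3}], ["id"]))
```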
Prerequisites
| Prerequisite | Description |
|---|---|
| Permissions for Account | |
| Enable SQL Server CDC | Run: exec [your_database].sys.sp_cdc_enable_db |
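
As a small illustration of the CDC prerequisite, the sketch below builds the enable statement from the table above, plus a query against the `is_cdc_enabled` column of `sys.databases` to confirm the result. The helper names are hypothetical, and the sketch only produces T-SQL strings; running them against SQL Server with a suitably privileged account is left to the client of your choice.

```python
def quote_identifier(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def enable_cdc_statement(database: str) -> str:
    """Build the statement that enables CDC at the database level."""
    return f"EXEC {quote_identifier(database)}.sys.sp_cdc_enable_db;"


def cdc_status_query(database: str) -> str:
    """Build a query that reports whether CDC is enabled for the database."""
    escaped = database.replace("'", "''")  # escape quotes in the string literal
    return (
        "SELECT name, is_cdc_enabled FROM sys.databases "
        f"WHERE name = N'{escaped}';"
    )


if __name__ == "__main__":
    print(enable_cdc_statement("your_database"))
    print(cdc_status_query("your_database"))
```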
Parameters
| Parameter | Description |
|---|---|
| maxTxsPerIteration | Maximum number of transactions scanned per iteration during incremental data sync. |
| scanParallel | Number of tables scanned in parallel during the full data stage. |
| eventStoreSize | Size of the cache for parsed incremental events. |
Tips: To modify the general parameters, see General Parameters and Functions.
Prerequisites
| Prerequisite | Description |
|---|---|
| Port Preparation | Allow the migration and sync node (Worker) to connect to the catalogs and FileIO. |
| Nessie Catalog Configuration Template | |
| Glue Data Catalog Configuration Template | |
| REST Catalog Configuration Template | |
Parameters
| Parameter | Description |
|---|---|
| fileFormat | Format of the data files to write (parquet/orc/...). |
| writeTargetFileSizeMb | Target size of each data file to write (MB). |
| writeProps | Data write parameters, in JSON format. |
| commitBranch | Branch to commit to. |
| totalDataInMemMb | Maximum data size allowed in memory when writing in batches. If the data size exceeds this limit, or the wait time exceeds asyncFlushIntervalSec, data is flushed to the write queue. |
| asyncFlushIntervalSec | Interval to wait before flushing when writing in batches. If the wait time exceeds this interval, or the data size exceeds totalDataInMemMb, data is flushed to the write queue. |
| flushBatchMb | Maximum batch size per table. If the batch size exceeds this limit, data is flushed to the write queue. |
| realFlushPauseSec | Wait time before flushing data to Iceberg using Bulk Write. 0 means no wait. |
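
The interaction between totalDataInMemMb, asyncFlushIntervalSec, and flushBatchMb described above can be read as a single flush condition. The sketch below is one interpretation of the table, not BladePipe's implementation: the `FlushConfig` class, the `should_flush` function, and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlushConfig:
    total_data_in_mem_mb: float      # mirrors totalDataInMemMb
    async_flush_interval_sec: float  # mirrors asyncFlushIntervalSec
    flush_batch_mb: float            # mirrors flushBatchMb


def should_flush(
    cfg: FlushConfig,
    total_buffered_mb: float,
    table_batch_mb: float,
    waited_sec: float,
) -> bool:
    """Return True when any of the documented thresholds is exceeded."""
    return (
        total_buffered_mb > cfg.total_data_in_mem_mb
        or waited_sec > cfg.async_flush_interval_sec
        or table_batch_mb > cfg.flush_batch_mb
    )


if __name__ == "__main__":
    cfg = FlushConfig(
        total_data_in_mem_mb=512,
        async_flush_interval_sec=30,
        flush_batch_mb=64,
    )
    print(should_flush(cfg, total_buffered_mb=100, table_batch_mb=10, waited_sec=5))
    print(should_flush(cfg, total_buffered_mb=100, table_batch_mb=10, waited_sec=45))
```

In this reading, a quiet table is still flushed once the interval elapses, while a busy table is flushed as soon as its own batch or the overall buffer grows past its limit.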
Tips: To modify the general parameters, see General Parameters and Functions.