Aurora for MySQL to Hive
BladePipe supports data replication from Aurora for MySQL to Hive. View supported migration, sync, verification, and connector capabilities.
| Function | Description |
|---|---|
| Schema Migration | If the target schema does not exist, BladePipe automatically generates and executes CREATE statements based on the source metadata and the mapping rule. |
| Full Data Migration | Migrate data by sequentially scanning the tables and writing the rows in batches to the target database. |
| Incremental Data Sync | Sync common DML statements such as INSERT, UPDATE, and DELETE (for tables with primary keys). |
| Position Resetting | Reset the position by Binlog file position or by timestamp, so that incremental data logs can be re-consumed from a past period or from a specific Binlog file and position. |
| Table Name Mapping | Support the following mapping rules: keeping the name the same as in the Source, converting the name to lowercase, converting the name to uppercase, and truncating the "_digit" suffix from the name. |
| Metadata Retrieval | Retrieve the target metadata with filtering conditions or target primary keys set from the source table. |
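
The four table name mapping rules can be illustrated with a minimal Python sketch. The rule names (`keep`, `lower`, `upper`, `strip_digit_suffix`) and the function `map_table_name` are hypothetical and are not BladePipe identifiers. The sketch also assumes that truncating by the "_digit" suffix means removing one trailing underscore followed by digits, as in a sharded table name.

```python
import re


def map_table_name(name: str, rule: str) -> str:
    """Apply one illustrative mapping rule to a source table name.

    The rule names are hypothetical labels for the four rules listed above.
    """
    if rule == "keep":
        # Keep the name the same as in the Source.
        return name
    if rule == "lower":
        return name.lower()
    if rule == "upper":
        return name.upper()
    if rule == "strip_digit_suffix":
        # Assumption: remove a single trailing "_<digits>" group,
        # e.g. "orders_01" becomes "orders".
        return re.sub(r"_\d+$", "", name)
    raise ValueError(f"unknown mapping rule: {rule}")
```

Under this assumption, sharded source tables such as `orders_01` and `orders_02` would both map to the same target name `orders`.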
Advanced Functions
| Function | Description |
|---|---|
| Removal of Target Data before Full Data Migration | Remove the existing data in the Target before running the Full Data Migration. Applicable to DataJob reruns and scheduled Full Data migrations. |
| Recreating Target Table | Recreate the target tables before running the Full Data Migration. Applicable to DataJob reruns and scheduled Full Data migrations. |
| Scheduled Full Data Migration | For more information, see Create Scheduled Full Data DataJob. |
| Custom Code | For more information, see Custom Code Processing, Debug Custom Code, and Logging in Custom Code. |
| Setting Target Primary Key | Change the primary key to another field to facilitate data aggregation and other operations. |
| Data Filtering Conditions | Support data filtering using WHERE conditions written in SQL-92. For more information, see Data Filtering. |
Limits
| Limit | Description |
|---|---|
| MySQL Storage Engine | Support InnoDB, MyISAM, AWS XEngine. Other storage engines have not been tested yet. |
| MySQL Character Set | Support utf8, utf8mb4, latin1. Other encodings have not been tested yet. |
FAQ
What to do when access to schema in MySQL Source is denied?
Tip: MySQL source-related FAQ also applies to MySQL-based DataSources.
Prerequisites (Source: Aurora for MySQL)
| Prerequisite | Description |
|---|---|
| Permissions for Account | |
| Enabling Binlog | [mysqld] |
Parameters (Source: Aurora for MySQL)
| Parameter | Description |
|---|---|
| parseBinlogParallel | Number of threads for parallel parsing of Binlog in Incremental DataJobs. |
| parseBinlogBufferSize | Size of the circular buffer for parsing Binlog in Incremental DataJobs. |
| maxTransactionSize | Maximum number of data rows per transaction. If exceeded, the transaction is split and flushed in parts. |
| limitThroughputMb | Limit the throughput of incremental Binlogs. |
| extraDDL | Support synchronization of additional DDL, including PT, GHOST, ALI_DMS, and PT_GHOST. |
| needJsonEscape | Escape special characters in JSON to be written to the target database. |
| fullDataSqlConditionEnabled | Add filtering conditions to the SQL used for scanning source data. It only takes effect in Full Data migration. |
| srcTimeZone | Source time zone, e.g., +08:00, Asia/Shanghai, America/New_York. |
Tips: To modify the general parameters, see General Parameters and Functions.
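
As a rough illustration of how `maxTransactionSize` behaves, the following Python sketch splits the rows of one oversized transaction into parts of at most that many rows. The function `split_transaction` and its arguments are hypothetical; this shows only the general chunking idea, not BladePipe's actual implementation.

```python
from typing import Iterator, List, Sequence


def split_transaction(rows: Sequence[dict], max_transaction_size: int) -> Iterator[List[dict]]:
    """Yield consecutive parts of a transaction, each with at most
    max_transaction_size rows, preserving the original row order."""
    if max_transaction_size <= 0:
        raise ValueError("max_transaction_size must be positive")
    for start in range(0, len(rows), max_transaction_size):
        yield list(rows[start:start + max_transaction_size])
```

For example, a transaction of 5 rows with a limit of 2 would be flushed as three parts containing 2, 2, and 1 rows.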
Prerequisites (Target: Hive)
| Prerequisite | Description |
|---|---|
| Port Preparation | Allow the migration and sync node (Worker) to connect to the HDFS/Hive port. |
Parameters (Target: Hive)
| Parameter | Description |
|---|---|
| asyncFlushIntervalSec | Interval to wait before flushing when writing in batches. If the wait time exceeds asyncFlushIntervalSec, or the data size exceeds totalDataInMemMb, the data is flushed to the write queue. |
| totalDataInMemMb | Maximum data size allowed in memory when writing in batches. If the data size exceeds this limit, or the wait time exceeds asyncFlushIntervalSec, the data is flushed to the write queue. |
| realFlushPauseSec | Wait time before flushing data to HDFS. 0 means no wait is needed. |
| hdfsBlockSize | HDFS block size in Hive. |
| incrTempSchemaName | Temporary schema for incremental data in Hive. |
| incrTempTableIntervalCharacter | Concatenation operator of temporary tables for incremental data in Hive. |
| incrTempTableDistConnect | Connection specifiers of temporary tables for incremental data in Hive. There must be two specifiers, separated by a semicolon (;). |
| incrMergePollingPauseSec | Polling interval of the checking threads that merge temporary tables for incremental data (in seconds). |
| incrMergeTimePauseMin | Interval between merges of temporary tables for incremental data (in minutes). |
Tips: To modify the general parameters, see General Parameters and Functions.
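
The interplay between `asyncFlushIntervalSec` and `totalDataInMemMb` can be sketched in Python as a buffer that flushes when either threshold is exceeded. The class `BatchBuffer`, its attributes, and the in-memory `write_queue` list are hypothetical; the sketch assumes the interval is in seconds and the size limit is in megabytes, as the parameter names suggest, and it is not BladePipe's actual writer.

```python
import time
from typing import Callable, List


class BatchBuffer:
    """Buffer rows and hand them to a write queue when either the wait
    time or the in-memory size limit is exceeded (illustrative only)."""

    def __init__(self, async_flush_interval_sec: float, total_data_in_mem_mb: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = async_flush_interval_sec
        self.limit_bytes = total_data_in_mem_mb * 1024 * 1024
        self.clock = clock
        self.rows: List[bytes] = []
        self.size_bytes = 0
        self.last_flush = clock()
        self.write_queue: List[List[bytes]] = []  # stand-in for the real write queue

    def add(self, row: bytes) -> bool:
        """Buffer one row; return True if this call triggered a flush."""
        self.rows.append(row)
        self.size_bytes += len(row)
        return self.flush_if_due()

    def flush_if_due(self) -> bool:
        waited_too_long = self.clock() - self.last_flush > self.interval
        too_large = self.size_bytes > self.limit_bytes
        if not self.rows or not (waited_too_long or too_large):
            return False
        # Either condition alone is enough to move the batch to the queue.
        self.write_queue.append(self.rows)
        self.rows = []
        self.size_bytes = 0
        self.last_flush = self.clock()
        return True
```

In this sketch a small `asyncFlushIntervalSec` favors low latency, while a large `totalDataInMemMb` favors bigger batches at the cost of more memory.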