TiDB to Elasticsearch
BladePipe supports data replication from TiDB to Elasticsearch. View supported migration, sync, verification, and connector capabilities.
| Function | Description |
|---|---|
| Schema Migration | If the target index does not exist, BladePipe creates the index mapping in the target based on the source metadata and mapping rules. |
| Full Data Migration | Migrate data by sequentially scanning the source tables and writing the data to the target in batches. |
| Incremental Data Sync | Common DML statements such as INSERT, UPDATE, and DELETE are synced. |
| Data Verification and Correction | Verify all existing data. Optionally, correct inconsistent data based on the verification results. Scheduled DataTasks are supported. |
| Subscription Modification | Add, delete, or modify the subscribed tables with support for historical data migration. For more information, see Modify Subscription. |
| Position Resetting | Reset the position by timestamp to re-consume incremental data from a past period, provided that TiKV has not yet garbage-collected it. |
| Index Name Mapping | Supported mapping rules: concatenating with underscores (dataJobName_DB_SCHEMA_table), keeping the same name as in the Source, converting the name to lowercase, converting the name to uppercase, and truncating the "_digit" suffix from the name. |
| DDL Sync | |
| Metadata Retrieval | Retrieve the target metadata with filtering conditions or target primary keys set from the source table. |
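
The index name mapping rules listed above can be illustrated with a minimal Python sketch. This is not BladePipe's implementation: the function `map_index_name` and the rule identifiers (`concat`, `same`, `lower`, `upper`, `strip_digit_suffix`) are hypothetical names chosen for illustration, and skipping empty name parts in the concatenation rule is an assumption.

```python
import re


def map_index_name(table, rule, job_name="", db="", schema=""):
    """Hypothetical helper that mimics the index name mapping rules.

    The rule names are illustrative only; they are not BladePipe identifiers.
    """
    if rule == "concat":
        # dataJobName_DB_SCHEMA_table; empty parts are skipped (assumption).
        parts = [job_name, db, schema, table]
        return "_".join(p for p in parts if p)
    if rule == "same":
        return table
    if rule == "lower":
        return table.lower()
    if rule == "upper":
        return table.upper()
    if rule == "strip_digit_suffix":
        # Drop one trailing "_<digits>" suffix, e.g. orders_01 becomes orders.
        return re.sub(r"_\d+$", "", table)
    raise ValueError(f"unknown mapping rule: {rule}")


print(map_index_name("orders_01", "strip_digit_suffix"))
print(map_index_name("orders", "concat", job_name="job1", db="shop"))
```

Truncating the "_digit" suffix is typically useful when several sharded source tables such as `orders_01` and `orders_02` should land in a single target index.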
Advanced Functions
| Function | Description |
|---|---|
| Removal of Target Data before Full Data Migration | Remove the existing data in the Target before running the Full Data Migration. Applicable to DataJob reruns and scheduled Full Data Migrations. |
| Recreating Target Table | Recreate the target tables before running the Full Data Migration. Applicable to DataJob reruns and scheduled Full Data Migrations. |
| Format of Time Written to ES | Time is written to Elasticsearch in the format of the first time record of the field, or in yyyy-MM-dd'T'HH:mm:ss if no time format is set. |
| Setting ES Time Zone | The time zone set on the page is written to Elasticsearch only when the time zone format is ZZZZZ. |
| Optional Fields in Indexing | By default, all fields are indexed. Specific fields can be excluded from indexing. |
| Field-level Analyzers | Select analyzers for indexed string fields. STANDARD (default), SIMPLE, and other common analyzers are supported, and custom analyzers can be specified. |
| Setting Index _id Field | By default, the _id field is a concatenation of the source primary key values. It can be changed to other field values. |
| Scheduled Full Data Migration | For more information, see Create Scheduled Full Data DataJob. |
| Custom Code | For more information, see Custom Code Processing, Debug Custom Code and Logging in Custom Code. |
| Data Filtering Conditions | Filter data using WHERE conditions written in SQL-92. For more information, see Data Filtering. |
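
To make the default `_id` and time format behavior more concrete, here is a minimal Python sketch of how a source row might be turned into an Elasticsearch document. It is an illustration under assumptions, not BladePipe's code: `build_doc_id` and `to_document` are hypothetical helpers, the sample row is made up, and the `"_"` separator is only an example value for what the `pkSeparator` parameter controls.

```python
from datetime import datetime

# Python equivalent of the fallback pattern yyyy-MM-dd'T'HH:mm:ss.
DEFAULT_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def build_doc_id(row, pk_columns, separator):
    """Concatenate the primary key values, in order, into a single _id."""
    return separator.join(str(row[col]) for col in pk_columns)


def to_document(row):
    """Format datetime values with the fallback pattern; keep other values."""
    return {
        key: value.strftime(DEFAULT_TIME_FORMAT) if isinstance(value, datetime) else value
        for key, value in row.items()
    }


# Made-up source row with a composite primary key (tenant_id, order_id).
row = {
    "tenant_id": 7,
    "order_id": 42,
    "status": "paid",
    "created_at": datetime(2024, 1, 2, 3, 4, 5),
}

doc_id = build_doc_id(row, ["tenant_id", "order_id"], "_")
document = to_document(row)
print(doc_id, document["created_at"])
```

With a single-column primary key, the separator is not used and the `_id` is simply that column's value as a string.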
Prerequisites
| Prerequisite | Description |
|---|---|
| Permissions for Account | |
| Connection to PD Nodes | Make sure that BladePipe Workers can communicate with PD nodes. |
| TiKV GC Frequency | Set the GC cycle to 24 hours or more in TiDB Server. |
| TiKV Historical Data Caching | Adjust the size based on task needs. |
Parameters
| Parameter | Description |
|---|---|
| printDetailLog | Print the received incremental data. Use it to determine whether the source database is producing incremental data. |
| pdHost | PD node address for DataJob requests. Format: [PD_IP]:[PD_PORT]. Separate multiple PD nodes with commas (,). |
| cdcGrpcTimeout | Timeout for the gRPC channel between the PD nodes and the DataJob, in ms. |
| cdcStubTimeout | Timeout for each stub in the gRPC channel, in ms. The stub is automatically resubscribed on timeout. |
| fastFailKeywords | A comma-separated list of strings. When an exception message contains any of these keywords, the task skips reconnection attempts and restarts directly. For example, with DEADLINE_EXCEEDED set, the task restarts directly instead of reconnecting when a gRPC timeout exception occurs. |
Tips: To modify the general parameters, see General Parameters and Functions.
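
The `pdHost` format and the `fastFailKeywords` matching described above can be sketched in Python as follows. This is a hedged illustration rather than BladePipe's logic: `parse_pd_hosts` and `should_fast_fail` are hypothetical helpers, the addresses are placeholders, and case-sensitive substring matching is an assumption.

```python
def parse_pd_hosts(pd_host):
    """Split a pdHost value such as "ip1:port1,ip2:port2" into (host, port) pairs."""
    endpoints = []
    for item in pd_host.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected [PD_IP]:[PD_PORT], got: {item!r}")
        endpoints.append((host, int(port)))
    return endpoints


def should_fast_fail(exception_message, fast_fail_keywords):
    """Return True if the message contains any keyword from the comma-separated list.

    Matching is a case-sensitive substring check (assumption).
    """
    keywords = [k.strip() for k in fast_fail_keywords.split(",") if k.strip()]
    return any(keyword in exception_message for keyword in keywords)


# Placeholder addresses; 2379 is used here only as an example port.
print(parse_pd_hosts("10.0.0.1:2379, 10.0.0.2:2379"))
print(should_fast_fail("gRPC call failed: DEADLINE_EXCEEDED", "DEADLINE_EXCEEDED"))
```

In this sketch, a `True` result corresponds to restarting the task directly, while `False` corresponds to the normal reconnection path.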
Prerequisites
| Prerequisite | Description |
|---|---|
| Permissions for Account | create, delete, create_index, delete_index, read, write permissions for indexes. |
| Port Preparation | Allow the migration and sync node (Worker) to connect to the Elasticsearch port. |
Parameters
| Parameter | Description |
|---|---|
| maxBulkSizeMb | Maximum batch size per table. If the batch size exceeds this limit, the data is flushed to the write queue. |
| totalDataInMemMb | Maximum data size allowed in memory when writing in batches. If the data size exceeds this limit, or the wait time exceeds asyncFlushIntervalSec, the data is flushed to the write queue. |
| asyncFlushIntervalSec | Interval to wait before flushing when writing in batches. If the wait time exceeds asyncFlushIntervalSec, or the data size exceeds totalDataInMemMb, the data is flushed to the write queue. |
| realFlushPauseSec | Wait time before flushing data to Elasticsearch using Bulk Write. 0 means no wait. |
| pkSeparator | Separator used when concatenating multiple fields into _id (number of fields > 1). |
| enableBulkSizeThreshold | Enable batch write mode (enabled by default). |
Tips: To modify the general parameters, see General Parameters and Functions.
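
The interaction between `maxBulkSizeMb`, `totalDataInMemMb`, and `asyncFlushIntervalSec` can be read as a single flush decision. The Python sketch below models that reading; it is an interpretation of the parameter descriptions, not BladePipe's implementation. `FlushConfig` and `should_flush` are hypothetical names, and the numeric values are made up rather than product defaults.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class FlushConfig:
    # Field names mirror the parameters above; values are supplied by the caller.
    max_bulk_size_mb: float
    total_data_in_mem_mb: float
    async_flush_interval_sec: float


def should_flush(cfg, table_batch_bytes, total_bytes_in_mem, seconds_waited):
    """Return True when any of the three documented thresholds is exceeded."""
    if table_batch_bytes > cfg.max_bulk_size_mb * MB:
        return True  # per-table batch is too large
    if total_bytes_in_mem > cfg.total_data_in_mem_mb * MB:
        return True  # buffered data across tables exceeds the memory limit
    return seconds_waited > cfg.async_flush_interval_sec  # waited too long


# Illustrative values only, not BladePipe defaults.
cfg = FlushConfig(max_bulk_size_mb=10, total_data_in_mem_mb=512, async_flush_interval_sec=5)
print(should_flush(cfg, table_batch_bytes=11 * MB, total_bytes_in_mem=20 * MB, seconds_waited=1))
print(should_flush(cfg, table_batch_bytes=1 * MB, total_bytes_in_mem=20 * MB, seconds_waited=1))
```

Under this reading, lowering `asyncFlushIntervalSec` reduces latency for low-traffic tables, while the two size limits bound batch size and memory use under heavy load.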