BladePipe 1.7.0: Stronger alerts, Broader DB support, Faster KingbaseES scanning.
Skip to main content

MongoDB to RagApi

BladePipe supports data replication from MongoDB to RagApi. View supported migration, sync, verification, and connector capabilities.

Target DataSource:

Connection

Basic Functions

FunctionDescription
Full Data Migration

Migrate data by sequentially scanning data in tables and writing it in batches to the target database. Supported _id types: ObjectId, Long, Integer.

Incremental Data Sync

Sync of INSERT, UPDATE, DELETE is supported.

Subscription Modification

Add, delete, or modify the subscribed tables with support for historical data migration. For more information, see Modify Subscription.

Position Resetting

Reset positions by timestamp to consume the oplog in a past period again.

Supported Deployment

Support master-slave, replica set, sharded cluster.

Advanced Functions

FunctionDescription
Knowledge Selection(KNOWLEDGE_SELECT)

According to the user query, the most relevant knowledge is automatically filtered from the retrieval results to improve the accuracy of the generated answers.

Query Compression(QUERY_COMPRESS)

Semantically compress the original query to remove redundancy and keep the core content, optimizing the retrieval performance.

Query Extension(QUERY_EXTEND)

Automatically expand user query and introduce potentially relevant information or synonymous expressions to expand semantic coverage.

MCP Tool Call

Call tool chains (such as GitHub queries, Shell commands, etc.) configured on the MCP platform to automatically call external systems to complete tasks or offer more information in Q&A.

Custom Code

For more information, see Custom Code Processing, Debug Custom Code and Logging in Custom Code.

Adding Virtual Columns

Support adding custom virtual columns with fixed values, such as region, ID, etc.

Limits

LimitDescription
Oplog Size and Retention Settings

By default, the value of replication.oplogSizeMB or storage.oplogMinRetentionHours in MongoDB is too small. If data synchronization latency is significant, unconsumed oplogs may be removed. In this case, it is necessary to increase these parameters.

Parameter Configuration for MongoDB Master-Slave Architecture

For MongoDB master-slave architecture, set the Source parameter oplogCollection to oplog.$main.

ChangeStream Mode

MongoDB 3.6 and above support changeStream for capturing incremental data changes. Set the Source parameter captureMode to CHANGE_STREAM. For sharded clusters, use the MongoDB connection string for synchronization.

Oplog Mode

When using oplog mode for data synchronization from a MongoDB instance, ensure the access to the local database.

Network Connectivity

Ensure that the migration and sync node (Worker) can connect to the knowledge bases and LLMs.


Source

Prerequisites

PrerequisiteDescription
Permissions for Account

See Permissions Required for MongoDB.

Parameters

ParameterDescription
captureMode

Configure the MongoDB incremental data sync mode, supporting OP_LOG and CHANGE_STREAM modes.

changeStreamBatchSize

Set the maximum number of change events per batch for MongoDB Change Stream.

oplogCollection

Specify the collection name for MongoDB oplog. The default name is oplog.rs.

timezone

Source time zone (the default time zone is UTC).

Tips: To modify the general parameters, see General Parameters and Functions.


Target

Prerequisites

PrerequisiteDescription
Port Preparation

Allow the migration and sync node (Worker) to connect to the LLM and the vector database.

Parameters

ParameterDescription
uriPrefix

The URI prefix of RagApi chat service to receive queries and send back the response generated by the model. Default value: /v1/chat/completions

contentUriPrefix

The URI prefix of RagApi information retrieval service to retrive relevant vectors. Default value: /v1/content/retrieve

retrieveMaxResults

Configure the maximum number of results returned by the retriever to limit the number of most relevant entries in the vector search

retrieveMinScore

Configure the minimum score threshold for the retriever to return results. The higher the score is, the more relevant the result is. Only content with a score above this threshold will be considered.

contentPrompt

Configure the prompt template, defining how to combine the user question with the retrieved content into a complete prompt. You can use {{context}} and {{query}} as variable placeholders.

enabledPromptFunctions

Prompt functions that can be enabled (seperated by comma).
Optional functions:

  • KNOWLEDGE_SELECT(Select the most relevant content automatically. Supports automatic routing of multiple knowledge bases.)
  • QUERY_COMPRESS(Compress queries)
  • QUERY_EXTEND(Extend queries)

Example: KNOWLEDGE_SELECT,QUERY_COMPRESS

compressContentPrompt

Prompt for query compression, telling the system how to simplify the input text. You can use {{chatMemory}} and {{query}} as variable placeholders. This will only work when QUERY_COMPRESS is enabled in enabledPromptFunctions.

extendContentPrompt

Prompt for query extention, telling the system how to enrich the original query by adding relevant context. You can use {{query}} as a variable placeholder, which only works when QUERY_EXTEND is enabled in enabledPromptFunctions.

extendContentCount

Set the number of entries generated when content is extended. This only works when QUERY_EXTEND is enabled in enabledPromptFunctions.

mcpServers

JSON for configuring available MCP servers, defining how to call the tools (e.g. command line or HTTP).
eg: { "mcpServers": { "github": { "command": "docker", "args": [ "run", "-i", "--rm", "-e", "GITHUB_PERSONAL_ACCESS_TOKEN", "mcp/github" ], "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<YOUR_TOKEN>" } } } }

maxChatMemory

The maximum number of messages retained in the chat context, defining how many rounds of conversation can be seen during model reasoning.

toolMaxInvokeCount

The maximum number of tool calls allowed in a single session, used to limit the call depth and prevent tool execution from falling into an infinite loop.

Tips: To modify the general parameters, see General Parameters and Functions.