MongoDB to RagApi
BladePipe supports data replication from MongoDB to RagApi. View supported migration, sync, verification, and connector capabilities.
| Function | Description |
|---|---|
| Full Data Migration | Migrate data by sequentially scanning data in tables and writing it in batches to the target database. Supported _id types: ObjectId, Long, Integer. |
| Incremental Data Sync | Sync of INSERT, UPDATE, and DELETE operations is supported. |
| Subscription Modification | Add, delete, or modify the subscribed tables with support for historical data migration. For more information, see Modify Subscription. |
| Position Resetting | Reset the position by timestamp to consume the oplog from a past period again. |
| Supported Deployment | Master-slave, replica set, and sharded cluster deployments are supported. |
Advanced Functions
| Function | Description |
|---|---|
| Knowledge Selection (KNOWLEDGE_SELECT) | Automatically select the knowledge most relevant to the user query from the retrieval results to improve the accuracy of the generated answers. |
| Query Compression (QUERY_COMPRESS) | Semantically compress the original query to remove redundancy and keep the core content, optimizing retrieval performance. |
| Query Extension (QUERY_EXTEND) | Automatically expand the user query with potentially relevant information or synonymous expressions to broaden semantic coverage. |
| MCP Tool Call | Call tool chains (such as GitHub queries and shell commands) configured on the MCP platform, so that external systems can complete tasks or supply additional information during Q&A. |
| Custom Code | For more information, see Custom Code Processing, Debug Custom Code and Logging in Custom Code. |
| Adding Virtual Columns | Support adding custom virtual columns with fixed values, such as region or ID. |
Limits
| Limit | Description |
|---|---|
| Oplog Size and Retention Settings | The default values of replication.oplogSizeMB and storage.oplogMinRetentionHours in MongoDB may be too small. If data synchronization latency is significant, oplog entries may be removed before they are consumed. In this case, increase these parameters. |
| Parameter Configuration for MongoDB Master-Slave Architecture | For MongoDB master-slave architecture, set the Source parameter oplogCollection to oplog.$main. |
| ChangeStream Mode | MongoDB 3.6 and above support changeStream for capturing incremental data changes. To use it, set the Source parameter captureMode to CHANGE_STREAM. For sharded clusters, use the MongoDB connection string for synchronization. |
| Oplog Mode | When using oplog mode for data synchronization from a MongoDB instance, ensure that the local database is accessible. |
| Network Connectivity | Ensure that the migration and sync node (Worker) can connect to the knowledge bases and LLMs. |
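
The oplog limit above comes down to comparing the time span the oplog currently covers with the sync latency. The following is a minimal sketch of that comparison, not part of BladePipe: the function names and the `safety_factor` margin are hypothetical, and the two timestamps are assumed to be the oldest and newest oplog entries as reported by your own MongoDB tooling.

```python
from datetime import datetime, timedelta, timezone


def oplog_window(first_entry: datetime, last_entry: datetime) -> timedelta:
    """Time span covered by the entries currently held in the oplog."""
    return last_entry - first_entry


def is_lag_safe(window: timedelta, sync_lag: timedelta,
                safety_factor: float = 2.0) -> bool:
    """True when the oplog window covers the sync lag with some headroom.

    safety_factor is a hypothetical margin chosen for illustration.
    """
    return window >= sync_lag * safety_factor


# Illustrative timestamps: the oplog spans six hours.
first = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
last = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
window = oplog_window(first, last)

print(is_lag_safe(window, timedelta(hours=2)))  # True: 6h covers 2h x 2
print(is_lag_safe(window, timedelta(hours=4)))  # False: 6h is less than 4h x 2
```

If the check fails for the latency you observe, that is the situation in which the oplog size or retention settings should be increased.
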
Parameters
| Parameter | Description |
|---|---|
| captureMode | Configure the MongoDB incremental data sync mode, supporting OP_LOG and CHANGE_STREAM modes. |
| changeStreamBatchSize | Set the maximum number of change events per batch for MongoDB Change Stream. |
| oplogCollection | Specify the collection name for MongoDB oplog. The default name is oplog.rs. |
| timezone | Source time zone (the default time zone is UTC). |
Tips: To modify the general parameters, see General Parameters and Functions.
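
As a rough illustration of how the Source parameters and the limits above fit together, the sketch below chooses values for captureMode and oplogCollection. The helper `build_source_params` and the deployment labels it accepts are hypothetical; only the parameter names, the allowed modes, the 3.6 version requirement, and the two oplog collection names come from this page.

```python
CAPTURE_MODES = {"OP_LOG", "CHANGE_STREAM"}


def build_source_params(deployment: str, mongo_version: tuple,
                        capture_mode: str = "OP_LOG") -> dict:
    """Return Source parameter values consistent with the limits on this page.

    deployment is a hypothetical label such as "master-slave" or "replica-set".
    mongo_version is a (major, minor) tuple, for example (4, 4).
    """
    if capture_mode not in CAPTURE_MODES:
        raise ValueError(f"unsupported captureMode: {capture_mode}")
    if capture_mode == "CHANGE_STREAM" and mongo_version < (3, 6):
        raise ValueError("CHANGE_STREAM requires MongoDB 3.6 or above")

    params = {"captureMode": capture_mode}
    if capture_mode == "OP_LOG":
        # Master-slave deployments keep the oplog in oplog.$main;
        # otherwise the default collection name oplog.rs applies.
        params["oplogCollection"] = (
            "oplog.$main" if deployment == "master-slave" else "oplog.rs"
        )
    return params


print(build_source_params("master-slave", (3, 4)))
print(build_source_params("replica-set", (4, 4), "CHANGE_STREAM"))
```

The returned dictionary is only a convenient way to show the values; the parameters themselves are set on the Source when configuring the task.
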
Prerequisites
| Prerequisite | Description |
|---|---|
| Port Preparation | Allow the migration and sync node (Worker) to connect to the LLM and the vector database. |
Parameters
| Parameter | Description |
|---|---|
| uriPrefix | The URI prefix of the RagApi chat service, which receives queries and sends back the response generated by the model. Default value: |
| contentUriPrefix | The URI prefix of the RagApi information retrieval service, which retrieves relevant vectors. Default value: |
| retrieveMaxResults | Configure the maximum number of results returned by the retriever, limiting the vector search to the most relevant entries. |
| retrieveMinScore | Configure the minimum score threshold for the retriever to return results. The higher the score is, the more relevant the result is. Only content with a score above this threshold will be considered. |
| contentPrompt | Configure the prompt template, defining how to combine the user question with the retrieved content into a complete prompt. You can use {{context}} and {{query}} as variable placeholders. |
| enabledPromptFunctions | Prompt functions that can be enabled (separated by commas). Example: |
| compressContentPrompt | Prompt for query compression, telling the system how to simplify the input text. You can use {{chatMemory}} and {{query}} as variable placeholders. This only works when QUERY_COMPRESS is enabled in enabledPromptFunctions. |
| extendContentPrompt | Prompt for query extension, telling the system how to enrich the original query by adding relevant context. You can use {{query}} as a variable placeholder. This only works when QUERY_EXTEND is enabled in enabledPromptFunctions. |
| extendContentCount | Set the number of entries generated when content is extended. This only works when QUERY_EXTEND is enabled in enabledPromptFunctions. |
| mcpServers | JSON for configuring available MCP servers, defining how to call the tools (e.g. command line or HTTP). |
| maxChatMemory | The maximum number of messages retained in the chat context, defining how many rounds of conversation can be seen during model reasoning. |
| toolMaxInvokeCount | The maximum number of tool calls allowed in a single session, used to limit the call depth and prevent tool execution from falling into an infinite loop. |
Tips: To modify the general parameters, see General Parameters and Functions.
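
To make the interaction between retrieveMaxResults, retrieveMinScore, and contentPrompt more concrete, here is a minimal sketch of the idea. It is not RagApi's implementation: `select_results` and `render_prompt` are hypothetical helpers, the scores and template text are invented, and whether a score exactly equal to the threshold is kept is an assumption made here.

```python
import re


def select_results(scored: list, max_results: int, min_score: float) -> list:
    """Keep entries scoring at or above min_score, best first, capped at max_results.

    Keeping entries exactly at the threshold is an assumption of this sketch.
    """
    kept = [(text, score) for text, score in scored if score >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [text for text, _ in kept[:max_results]]


def render_prompt(template: str, values: dict) -> str:
    """Fill {{name}} placeholders in one pass; unknown names are left untouched."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda match: values.get(match.group(1), match.group(0)),
        template,
    )


# Illustrative retrieval output: (text, similarity score) pairs.
scored = [
    ("Oplog mode needs access to the local database.", 0.91),
    ("Unrelated release note.", 0.42),
    ("Change streams need MongoDB 3.6 or above.", 0.78),
    ("Master-slave uses oplog.$main.", 0.80),
]
context = "\n".join(select_results(scored, max_results=2, min_score=0.6))

template = "Context:\n{{context}}\n\nQuestion: {{query}}"
print(render_prompt(template, {"context": context, "query": "Which oplog collection?"}))
```

Substituting in a single pass means that retrieved text which happens to contain `{{query}}` is not expanded a second time. The same helper would fill `{{chatMemory}}` for a compression prompt.
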