PostgreSQL to RagApi

BladePipe supports data replication from PostgreSQL to RagApi. View supported migration, sync, verification, and connector capabilities.

Target DataSource:

RagApi

Connection

Basic Functions

Function	Description
Schema Migration	If there is no knowledge base in the target instance, BladePipe will automatically create a virtual one based on the source metadata and the mapping rule.
Knowledge Base Name Mapping	Supports the following mapping rules: generate a name in the format DataJobName_DB_SCHEMA_Table; keep the name the same as in the source; convert the name to lowercase; convert the name to uppercase; truncate the name at the "_digit" suffix.
Chat	Integrates popular LLMs (such as OpenAI, DashScope, and Ollama) to provide core RAG capabilities such as embedding generation, semantic retrieval, and response generation. Supports extended functions such as streaming responses, multi-semantic retrieval, and MCP tool calls.
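
The name mapping rules above can be sketched as a small Python function. This is an illustration, not BladePipe's code: the MappingRule labels are hypothetical, and the "_digit" truncation rule is assumed to mean stripping one trailing underscore-plus-digits group (for example, a sharded table orders_01 mapping to orders).

```python
"""Sketch of the knowledge base name mapping rules (illustrative only).

Assumptions: the MappingRule names are hypothetical labels, and
"truncating by _digit suffix" is read as stripping one trailing "_<digits>".
"""
import re
from enum import Enum


class MappingRule(Enum):
    JOB_DB_SCHEMA_TABLE = "job_db_schema_table"  # DataJobName_DB_SCHEMA_Table
    SAME_AS_SOURCE = "same_as_source"
    LOWERCASE = "lowercase"
    UPPERCASE = "uppercase"
    STRIP_DIGIT_SUFFIX = "strip_digit_suffix"


# A trailing underscore followed by one or more digits, e.g. "_01".
_DIGIT_SUFFIX = re.compile(r"_\d+$")


def map_kb_name(rule: MappingRule, job: str, db: str, schema: str, table: str) -> str:
    """Return the target knowledge base name for one source table."""
    if rule is MappingRule.JOB_DB_SCHEMA_TABLE:
        return f"{job}_{db}_{schema}_{table}"
    if rule is MappingRule.SAME_AS_SOURCE:
        return table
    if rule is MappingRule.LOWERCASE:
        return table.lower()
    if rule is MappingRule.UPPERCASE:
        return table.upper()
    if rule is MappingRule.STRIP_DIGIT_SUFFIX:
        return _DIGIT_SUFFIX.sub("", table)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical job and table names, applied under every rule.
for rule in MappingRule:
    print(rule.value, "->", map_kb_name(rule, "job1", "sync_db", "public", "Orders_01"))
```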

Advanced Functions

Function	Description
Knowledge Selection (KNOWLEDGE_SELECT)	Automatically selects the knowledge most relevant to the user query from the retrieval results, improving the accuracy of generated answers.
Query Compression (QUERY_COMPRESS)	Semantically compresses the original query to remove redundancy and keep the core content, improving retrieval performance.
Query Extension (QUERY_EXTEND)	Automatically expands the user query with potentially relevant information or synonymous expressions to broaden semantic coverage.
MCP Tool Call	Calls tool chains (such as GitHub queries and shell commands) configured on the MCP platform, so that external systems are invoked automatically to complete tasks or provide additional information during Q&A.

Limits

Limit	Description
Network Connectivity	Ensure that the migration and sync node (Worker) can connect to the knowledge bases and LLMs.
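
To verify the connectivity requirement from the Worker host, a plain TCP check is often enough. The sketch below uses only Python's standard library; the endpoints are hypothetical (a local Ollama server on its default port 11434 and a PostgreSQL/PGVector instance on 5432) and should be replaced with your own. A successful connection only shows the port is reachable, not that credentials or API paths are correct.

```python
"""Sketch: check that the Worker host can open TCP connections to the
endpoints it needs. The endpoints below are hypothetical examples.
"""
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or host name not resolved
        return False


# Hypothetical endpoints: a local Ollama server (default port 11434) and a
# PostgreSQL instance with PGVector (default port 5432).
ENDPOINTS = [("localhost", 11434), ("localhost", 5432)]

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} {status}")
```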

Examples

Title	Details
Create and Store Vectors in PGVector	See Create and Store Vectors in PGVector
Create RAG API with PGVector	See Create RAG API with PGVector

Source

Prerequisites

Prerequisite	Description
Permissions for Account	Required permissions (using a self-managed database as an example): 1) GRANT ALL PRIVILEGES ON DATABASE sync_db TO sync_user (or, alternatively, SELECT permission on all views in the sync_db information_schema plus SELECT permission on the tables, indexes, and constraints to be synchronized); 2) ALTER USER sync_user REPLICATION
Incremental Data Sync Preparation	Prepare as follows: 1) modify postgresql.conf to set wal_level=logical and wal_log_hints=on; 2) modify pg_hba.conf to add the entries host replication sync_user CIDR netmask md5, host sync_db sync_user CIDR netmask md5, and host postgres sync_user CIDR netmask md5; 3) restart PostgreSQL.
Port Preparation	Allow the migration and sync node (Worker) to connect to the PostgreSQL port (e.g., port 5432).
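
Before restarting PostgreSQL, it can help to check the edited files. The sketch below, assuming the postgresql.conf contents are available as text, reports which of the two required settings are missing and renders the three pg_hba.conf lines from the table. The subnet 10.0.0.0/24 is a hypothetical Worker address range standing in for the CIDR netmask placeholder. The parser is deliberately simple and handles only name = value lines.

```python
"""Sketch: check postgresql.conf text for the settings incremental sync needs
and render the pg_hba.conf entries from the prerequisites table.

Assumptions (hypothetical): the config is available as a string, the account
is sync_user on sync_db, and 10.0.0.0/24 stands in for the Worker's CIDR.
"""

# Settings required by the Incremental Data Sync Preparation step.
REQUIRED_SETTINGS = {"wal_level": "logical", "wal_log_hints": "on"}


def parse_conf(text: str) -> dict[str, str]:
    """Parse `name = value` lines; later lines override earlier ones.

    Simplifications: include files, postgresql.auto.conf, '#' inside quoted
    values, and boolean synonyms such as 'true' are not handled.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, incl. commented-out lines
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip().lower()] = value.strip().strip("'").lower()
    return settings


def missing_settings(text: str) -> list[str]:
    """Return required settings that are absent or set to a different value."""
    settings = parse_conf(text)
    return [
        f"{name}={expected}"
        for name, expected in REQUIRED_SETTINGS.items()
        if settings.get(name) != expected
    ]


def hba_entries(user: str, database: str, cidr: str) -> list[str]:
    """Render the three pg_hba.conf lines listed in the prerequisites."""
    return [
        f"host replication {user} {cidr} md5",
        f"host {database} {user} {cidr} md5",
        f"host postgres {user} {cidr} md5",
    ]


conf = """
wal_level = logical        # required for logical decoding
#wal_log_hints = off
"""
print(missing_settings(conf))  # ['wal_log_hints=on']
for entry in hba_entries("sync_user", "sync_db", "10.0.0.0/24"):
    print(entry)
```

After the restart, running SHOW wal_level; in psql confirms the value the server is actually using.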

Parameters

Parameter	Description
fullFetchSize	Fetch size used when scanning data during full data migration.
eventStoreSize	Cache size for parsed incremental events.
ignoreGisSRID	Whether to ignore SRID when parsing GIS data types.
defaultGisSRID	Set the SRID for GIS data types.
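
A fetch size typically bounds how many rows are pulled per batch while data is scanned, trading memory per batch against the number of round trips. The idea behind fullFetchSize can be illustrated with Python's built-in sqlite3 module and Cursor.fetchmany. This is only an analogy (BladePipe reads PostgreSQL through its own driver, and its internals are not described here), and the value 1000 is hypothetical rather than a documented default.

```python
"""Analogy for fullFetchSize: read a large result set in fixed-size batches.

Uses sqlite3 only to stay self-contained; the batch size is hypothetical.
"""
import sqlite3
from typing import Iterator

FULL_FETCH_SIZE = 1000  # hypothetical batch size


def scan_in_batches(conn: sqlite3.Connection, sql: str, fetch_size: int) -> Iterator[list[tuple]]:
    """Yield the rows of `sql` in lists of at most `fetch_size` rows."""
    cur = conn.execute(sql)
    while True:
        batch = cur.fetchmany(fetch_size)
        if not batch:  # result set exhausted
            break
        yield batch


# In-memory demo table with 2500 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(i * 1.5,) for i in range(2500)])

sizes = [len(batch) for batch in scan_in_batches(conn, "SELECT id, amount FROM orders", FULL_FETCH_SIZE)]
print(sizes)  # [1000, 1000, 500]
```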

Tips: To modify the general parameters, see General Parameters and Functions.

Target

Prerequisites

Prerequisite	Description
Port Preparation	Allow the migration and sync node (Worker) to connect to the LLM and the vector database.

Parameters

Parameter	Description
uriPrefix	URI prefix of the RagApi chat service, which receives queries and returns the responses generated by the model. Default value: `/v1/chat/completions`
contentUriPrefix	URI prefix of the RagApi information retrieval service, which retrieves relevant vectors. Default value: `/v1/content/retrieve`
retrieveMaxResults	Maximum number of results returned by the retriever, which limits how many of the most relevant entries from the vector search are used.
retrieveMinScore	Minimum score a result needs for the retriever to return it. A higher score means a more relevant result; only content scoring above this threshold is considered.
contentPrompt	Prompt template that defines how the user question and the retrieved content are combined into a complete prompt. You can use {{context}} and {{query}} as variable placeholders.
enabledPromptFunctions	Prompt functions to enable, separated by commas. Available functions: KNOWLEDGE_SELECT (automatically selects the most relevant content; supports automatic routing across multiple knowledge bases), QUERY_COMPRESS (compresses queries), and QUERY_EXTEND (extends queries). Example: `KNOWLEDGE_SELECT,QUERY_COMPRESS`
compressContentPrompt	Prompt for query compression, telling the system how to simplify the input text. You can use {{chatMemory}} and {{query}} as variable placeholders. Takes effect only when QUERY_COMPRESS is enabled in enabledPromptFunctions.
extendContentPrompt	Prompt for query extension, telling the system how to enrich the original query with relevant context. You can use {{query}} as a variable placeholder. Takes effect only when QUERY_EXTEND is enabled in enabledPromptFunctions.
extendContentCount	Number of entries generated when content is extended. Takes effect only when QUERY_EXTEND is enabled in enabledPromptFunctions.
mcpServers	JSON that configures the available MCP servers and defines how their tools are called (e.g., via command line or HTTP). Example: { "mcpServers": { "github": { "command": "docker", "args": [ "run", "-i", "--rm", "-e", "GITHUB_PERSONAL_ACCESS_TOKEN", "mcp/github" ], "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<YOUR_TOKEN>" } } } }
maxChatMemory	Maximum number of messages retained in the chat context, which determines how many rounds of conversation the model can see during reasoning.
toolMaxInvokeCount	Maximum number of tool calls allowed in a single session, used to limit call depth and prevent tool execution from falling into an infinite loop.
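
The retrieval and prompt parameters above can be read as a small pipeline: parse enabledPromptFunctions, drop hits below retrieveMinScore, keep the top retrieveMaxResults, and fill the contentPrompt placeholders. The sketch below is illustrative only; the Hit type, the inclusive score comparison (the description says "above", so the boundary case is an assumption), and the newline-joined context are not taken from BladePipe's implementation.

```python
"""Illustrative sketch of the retrieval and prompt parameters (not BladePipe code).

Assumptions: retrieved items are Hit(text, score) pairs, the min-score check
is inclusive, and retrieved texts are joined with newlines for {{context}}.
"""
import re
from dataclasses import dataclass

# Functions listed for enabledPromptFunctions.
KNOWN_FUNCTIONS = {"KNOWLEDGE_SELECT", "QUERY_COMPRESS", "QUERY_EXTEND"}
# Matches the two contentPrompt placeholders.
_PLACEHOLDER = re.compile(r"\{\{(context|query)\}\}")


@dataclass
class Hit:
    text: str
    score: float  # higher means more relevant


def parse_prompt_functions(value: str) -> list[str]:
    """Split a comma-separated value such as 'KNOWLEDGE_SELECT,QUERY_COMPRESS'."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in KNOWN_FUNCTIONS]
    if unknown:
        raise ValueError(f"unknown prompt functions: {unknown}")
    return names


def select_hits(hits: list[Hit], max_results: int, min_score: float) -> list[Hit]:
    """Drop hits below min_score (retrieveMinScore), then keep the
    max_results (retrieveMaxResults) highest-scoring ones."""
    kept = [hit for hit in hits if hit.score >= min_score]
    kept.sort(key=lambda hit: hit.score, reverse=True)
    return kept[:max_results]


def render_prompt(template: str, hits: list[Hit], query: str) -> str:
    """Fill {{context}} and {{query}} in one pass, so placeholder-like text
    inside retrieved content or the query is not expanded again."""
    values = {"context": "\n".join(hit.text for hit in hits), "query": query}
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


hits = [
    Hit("PGVector stores embeddings inside PostgreSQL.", 0.82),
    Hit("BladePipe moves data between data sources.", 0.55),
    Hit("An unrelated paragraph.", 0.20),
]
template = "Answer using only this context:\n{{context}}\n\nQuestion: {{query}}"

print(parse_prompt_functions("KNOWLEDGE_SELECT,QUERY_COMPRESS"))
top = select_hits(hits, max_results=2, min_score=0.5)
print(render_prompt(template, top, "Where are the embeddings stored?"))
```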

Tips: To modify the general parameters, see General Parameters and Functions.