BladePipe 1.7.0: Stronger alerts, Broader DB support, Faster KingbaseES scanning.
Skip to main content

PostgreSQL to RagApi

BladePipe supports data replication from PostgreSQL to RagApi. View supported migration, sync, verification, and connector capabilities.

Target DataSource:

Connection

Basic Functions

FunctionDescription
Schema Migration

If there is no knowledge base in the target instance, BladePipe will automatically create a virtual one based on the source metadata and the mapping rule.

Knowledge Base Name Mapping

Support the mapping rules, namely, generating a name in the format DataJobName_DB_SCHEMA_Table, keeping the name the same as that in Source, converting the text to lowercase, converting the text to uppercase, truncating the name by "_digit" suffix.

Chat

Integrate popular LLMs (such as OpenAI, DashScope, Ollama, etc.) to implement core RAG capabilities such as embedding generation, semantic retrieval, and response generation. Support extended functions such as streaming response, multi-semantic retrieval, and MCP tool calls

Advanced Functions

FunctionDescription
Knowledge Selection(KNOWLEDGE_SELECT)

According to the user query, the most relevant knowledge is automatically filtered from the retrieval results to improve the accuracy of the generated answers.

Query Compression(QUERY_COMPRESS)

Semantically compress the original query to remove redundancy and keep the core content, optimizing the retrieval performance.

Query Extension(QUERY_EXTEND)

Automatically expand user query and introduce potentially relevant information or synonymous expressions to expand semantic coverage.

MCP Tool Call

Call tool chains (such as GitHub queries, Shell commands, etc.) configured on the MCP platform to automatically call external systems to complete tasks or offer more information in Q&A.

Limits

LimitDescription
Network Connectivity

Ensure that the migration and sync node (Worker) can connect to the knowledge bases and LLMs.

Examples

TitleDetails
Create and Store Vectors in PGVector

See Create and Store Vectors in PGVector

Create RAG API with PGVector

See Create RAG API with PGVector


Source

Prerequisites

PrerequisiteDescription
Permissions for Account

Required permissions (taking a self-managed database as an example):

  • GRANT ALL PRIVILEGES ON DATABASE sync_db TO sync_user (or SELECT permission on all views in the sync_db information_schema, and SELECT permission on tables, indexes, constraints to be synchronized)
  • ALTER USER sync_user REPLICATION
Incremental Data Sync Preparation

Prepare as follows:

  • Modify postgresql.conf, set wal_level=logical and wal_log_hints=on
  • Modify pg_hba.conf, set host replication sync_user CIDR netmask md5, host sync_db sync_user CIDR netmask md5, host postgres sync_user CIDR netmask md5
  • Restart PostgreSQL
Port Preparation

Allow the migration and sync node (Worker) to connect to the PostgreSQL port (e.g., port 5432).

Parameters

ParameterDescription
fullFetchSize

Fetch size for scaning full data.

eventStoreSize

Cache size for parsed incremental events.

ignoreGisSRID

Whether to ignore SRID when parsing GIS data types.

defaultGisSRID

Set the SRID for GIS data types.

Tips: To modify the general parameters, see General Parameters and Functions.


Target

Prerequisites

PrerequisiteDescription
Port Preparation

Allow the migration and sync node (Worker) to connect to the LLM and the vector database.

Parameters

ParameterDescription
uriPrefix

The URI prefix of RagApi chat service to receive queries and send back the response generated by the model. Default value: /v1/chat/completions

contentUriPrefix

The URI prefix of RagApi information retrieval service to retrive relevant vectors. Default value: /v1/content/retrieve

retrieveMaxResults

Configure the maximum number of results returned by the retriever to limit the number of most relevant entries in the vector search

retrieveMinScore

Configure the minimum score threshold for the retriever to return results. The higher the score is, the more relevant the result is. Only content with a score above this threshold will be considered.

contentPrompt

Configure the prompt template, defining how to combine the user question with the retrieved content into a complete prompt. You can use {{context}} and {{query}} as variable placeholders.

enabledPromptFunctions

Prompt functions that can be enabled (seperated by comma).
Optional functions:

  • KNOWLEDGE_SELECT(Select the most relevant content automatically. Supports automatic routing of multiple knowledge bases.)
  • QUERY_COMPRESS(Compress queries)
  • QUERY_EXTEND(Extend queries)

Example: KNOWLEDGE_SELECT,QUERY_COMPRESS

compressContentPrompt

Prompt for query compression, telling the system how to simplify the input text. You can use {{chatMemory}} and {{query}} as variable placeholders. This will only work when QUERY_COMPRESS is enabled in enabledPromptFunctions.

extendContentPrompt

Prompt for query extention, telling the system how to enrich the original query by adding relevant context. You can use {{query}} as a variable placeholder, which only works when QUERY_EXTEND is enabled in enabledPromptFunctions.

extendContentCount

Set the number of entries generated when content is extended. This only works when QUERY_EXTEND is enabled in enabledPromptFunctions.

mcpServers

JSON for configuring available MCP servers, defining how to call the tools (e.g. command line or HTTP).
eg: { "mcpServers": { "github": { "command": "docker", "args": [ "run", "-i", "--rm", "-e", "GITHUB_PERSONAL_ACCESS_TOKEN", "mcp/github" ], "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<YOUR_TOKEN>" } } } }

maxChatMemory

The maximum number of messages retained in the chat context, defining how many rounds of conversation can be seen during model reasoning.

toolMaxInvokeCount

The maximum number of tool calls allowed in a single session, used to limit the call depth and prevent tool execution from falling into an infinite loop.

Tips: To modify the general parameters, see General Parameters and Functions.