Knowledge Base - NetVoice Telecom

RAG ingestion and query pipeline

KNOWLEDGE SOURCES

📄

PDF / Word

🌐

Website

🔌

API / DB

📊

Spreadsheets

↓

⚙️ Vector processing

Smart Chunking

Embeddings 1536-dim

Metadata tagging

Automatic OCR

↓

🗄️ Indexed vector database

1.2M vectors

↓

💬 Semantic query + generation

"What is the return policy for digital products?"

✅ Answer generated with verified sources · <500ms

Retrieval-Augmented Generation

The agent that always knows the right answer

Unlike conventional LLMs, which generate answers based solely on their training data, NetVoice uses RAG to query your specific knowledge base in each conversation. The result: accurate, up-to-date answers traceable to the exact source.

🎯

95%+ accuracy guaranteed

The system responds only with verified information from your documents. When it is not certain, it says so explicitly and escalates the query instead of inventing an answer. Goodbye to model hallucinations.

⚡

Instant updates

Upload a new document and it is available to the agent in less than 60 seconds. You do not need to retrain the model or wait for update cycles. Change a policy today and the agent knows about it today.

🔍

Semantic search, not just by keywords

The vector engine understands the meaning of a question, not just its exact words. "When does my warranty expire?" and "product coverage end date" look for the same thing and get the same correct answer.

Capabilities

A knowledge brain that is always up to date

Six capabilities that transform your documents and information into highly precise conversational intelligence.

📄

PDF/Word Document Ingestion

Upload manuals, contracts, policies, catalogs and procedures in PDF, Word, Excel or PowerPoint. The system extracts text, OCRs images and tables, divides the content into semantic fragments and automatically indexes it for instant search.

🕷️

Automatic Web Crawling

Configure the URLs of your website, blog, help center or technical documentation and the system crawls them automatically. Define the crawl depth, the update frequency and the sections to include or exclude. The web content is always kept up to date.
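The crawl settings described above (depth, update frequency, included and excluded sections) can be pictured as a small configuration object plus a URL filter. This is a minimal sketch: `CrawlConfig`, its field names and `should_crawl` are hypothetical and are not NetVoice's actual API.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class CrawlConfig:
    """Hypothetical crawl settings; field names are illustrative only."""
    start_url: str
    max_depth: int = 2          # how many links away from the start URL to follow
    refresh_hours: int = 24     # how often the site is crawled again
    include_prefixes: list[str] = field(default_factory=list)
    exclude_prefixes: list[str] = field(default_factory=list)


def should_crawl(config: CrawlConfig, url: str, depth: int) -> bool:
    """True if the URL is on the start host, within depth, and allowed by the path rules."""
    if depth > config.max_depth:
        return False
    target, start = urlparse(url), urlparse(config.start_url)
    if target.netloc != start.netloc:
        return False
    path = target.path or "/"
    # Exclusions win over inclusions.
    if any(path.startswith(prefix) for prefix in config.exclude_prefixes):
        return False
    # An empty include list means "everything that is not excluded".
    if not config.include_prefixes:
        return True
    return any(path.startswith(prefix) for prefix in config.include_prefixes)


config = CrawlConfig(
    start_url="https://example.com/",
    max_depth=2,
    include_prefixes=["/help", "/blog"],
    exclude_prefixes=["/blog/drafts"],
)
print(should_crawl(config, "https://example.com/help/returns", depth=1))
```

Checking exclusions before inclusions keeps a rule such as "crawl the blog but never the drafts" easy to express.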

🔍

Vector Semantic Search

Search engine based on high-dimensional embeddings that understands the meaning and context of the questions. Retrieves the most relevant pieces of information in less than 500 ms, even in knowledge bases with millions of documents.
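To make the idea of searching by meaning concrete, here is a minimal sketch of similarity search over embeddings. The three-dimensional vectors and chunk identifiers are invented for illustration; real embeddings have far more dimensions, and production vector databases typically use approximate nearest-neighbor indexes rather than this brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Brute-force search: score every stored vector and keep the k best."""
    scored = [(cosine_similarity(query_vector, vector), chunk_id)
              for chunk_id, vector in index.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (illustrative values only).
index = {
    "warranty-policy#2": [0.9, 0.1, 0.0],
    "return-policy#1": [0.1, 0.9, 0.0],
    "opening-hours#1": [0.0, 0.1, 0.9],
}
results = top_k([0.8, 0.2, 0.0], index, k=2)
for score, chunk_id in results:
    print(f"{chunk_id}: {score:.3f}")
```

Two questions phrased differently land near each other in vector space, so both retrieve the same fragment even when they share no keywords.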

⚡

Real-Time Updates

Each new or updated document is available to the agent in less than 60 seconds. No retraining, no deployment cycles, no maintenance windows. Change a policy, upload the PDF and the agent knows it immediately.
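The reason no retraining is needed is that knowledge lives in the index rather than in the model's weights, so an update is a write to the index. The sketch below shows that idea with a toy in-memory store; `InMemoryIndex` and its methods are hypothetical stand-ins for a real vector database, and the policy text is invented.

```python
class InMemoryIndex:
    """Toy index keyed by (doc_id, chunk_number); a stand-in for a real vector database."""

    def __init__(self):
        self._entries = {}

    def upsert_document(self, doc_id, chunks):
        """Replace every chunk of doc_id so stale text cannot be retrieved afterwards."""
        self._entries = {key: text for key, text in self._entries.items()
                         if key[0] != doc_id}
        for number, text in enumerate(chunks):
            self._entries[(doc_id, number)] = text

    def chunks_for(self, doc_id):
        """Return the current chunks of a document in order."""
        return [text for (stored_id, _), text in sorted(self._entries.items())
                if stored_id == doc_id]


index = InMemoryIndex()
index.upsert_document("return-policy", ["Returns accepted within 14 days.",
                                        "Refunds take 5 days."])
# The policy changes: uploading the new file replaces the old chunks.
index.upsert_document("return-policy", ["Returns accepted within 30 days."])
print(index.chunks_for("return-policy"))
```

Deleting the old chunks before writing the new ones matters: otherwise a shorter new version would leave outdated fragments retrievable.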

🌍

Multilingual

Indexes documents in Spanish, English, Portuguese, French and more than 50 additional languages. The agent responds in the customer's language by querying the correct knowledge base, or searches all bases simultaneously when content is not available in a specific language.
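One way to picture the language routing described above is a lookup with a fallback. In this sketch, `retrieve`, the base names and the `search` callable are all assumptions made for illustration; it also queries the fallback bases one after another, whereas the text describes them being searched simultaneously.

```python
def retrieve(question, language, bases, search):
    """Search the customer's language first, then fall back to every other base.

    bases:  dict mapping a language code to a knowledge base name
    search: callable (base, question) -> list of hits (hypothetical retrieval function)
    """
    preferred = bases.get(language)
    if preferred is not None:
        hits = search(preferred, question)
        if hits:
            return hits
    # No base for this language, or it had nothing relevant: query the rest.
    fallback_hits = []
    for other_language, base in bases.items():
        if other_language != language:
            fallback_hits.extend(search(base, question))
    return fallback_hits


# Toy data and a keyword lookup standing in for real semantic search.
bases = {"es": "kb-es", "en": "kb-en"}
content = {"kb-es": {"garantía": ["es-doc-1"]}, "kb-en": {"warranty": ["en-doc-7"]}}


def keyword_search(base, question):
    return content.get(base, {}).get(question, [])


print(retrieve("garantía", "es", bases, keyword_search))
print(retrieve("warranty", "es", bases, keyword_search))
```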

📋

Content Version Control

Each version of a document is recorded with a timestamp and an author. You can check what information the agent had available on a specific date, roll back to previous versions, and schedule document activation for future dates (ideal for price or policy changes).
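The questions "what did the agent know on a given date?" and "when does a scheduled version take effect?" both reduce to picking the latest version whose activation time has passed. The sketch below assumes a hypothetical `DocumentVersion` record with an `active_from` field; the names, authors and dates are invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DocumentVersion:
    version: int
    author: str
    created_at: datetime    # when the version was uploaded
    active_from: datetime   # when the agent may start using it (can be in the future)
    text: str


def version_in_effect(versions, at):
    """Return the version the agent would use at time `at`, or None if none was active yet."""
    active = [v for v in versions if v.active_from <= at]
    return max(active, key=lambda v: (v.active_from, v.version), default=None)


history = [
    DocumentVersion(1, "ana", datetime(2024, 1, 1), datetime(2024, 1, 1), "Price: 10"),
    # Uploaded in March but scheduled to take effect in April.
    DocumentVersion(2, "luis", datetime(2024, 3, 1), datetime(2024, 4, 1), "Price: 12"),
]
print(version_in_effect(history, datetime(2024, 3, 15)).text)
print(version_in_effect(history, datetime(2024, 4, 2)).text)
```

Under this model a rollback could be recorded as a new version that repeats an older text, which keeps the audit trail intact.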

Process

From your documents to precise answers

Four stages of the RAG pipeline that convert any document into high-fidelity conversational knowledge.

1

Document Upload

Upload your documents via the web interface, the API or a direct integration with Google Drive, SharePoint or Confluence. The system accepts PDF, Word, Excel, PowerPoint, HTML, Markdown and plain text. You can also configure website URLs for automatic crawling and connectors to external databases or APIs.

2

Vector Processing

The processing engine applies OCR to extract text from images and tables, divides the content into coherent semantic fragments (chunks), generates high-dimensional embeddings for each fragment using state-of-the-art language models, and enriches each vector with origin, date and version metadata.
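The chunking and metadata steps can be sketched as follows. This is a simplification under stated assumptions: it uses fixed-size word windows with overlap rather than true semantic chunking, it leaves out OCR and the embedding call, and the function names, record fields and window sizes are hypothetical.

```python
def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word windows so context at a boundary is shared by neighbors."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_records(doc_id, version, uploaded_on, text, **chunk_options):
    """Attach origin, date and version metadata to every chunk."""
    return [
        {"id": f"{doc_id}#{number}", "text": chunk, "source": doc_id,
         "version": version, "date": uploaded_on}
        for number, chunk in enumerate(chunk_text(text, **chunk_options))
    ]


sample = " ".join(f"w{i}" for i in range(100))
records = build_records("manual.pdf", 3, "2024-05-01", sample, max_words=40, overlap=10)
print(len(records), records[0]["id"])
```

Each record would then be passed to an embedding model, and the resulting vector stored alongside the metadata so that answers can cite the source and version.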

3

Semantic Indexing

Vectors are stored in an optimized vector database (Pinecone, Weaviate or Qdrant, depending on size). Semantic indexes are built so that the most relevant fragments for any query can be retrieved in less than 500 ms, even with millions of indexed documents. The indexes are updated in real time.

4

Accurate Answers

When the customer asks a question, the system converts the question into a query vector, retrieves the most relevant fragments from the knowledge base, combines them with the context of the conversation, and generates a natural language response that cites the specific sources. If confidence is low, it escalates automatically.
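The step that combines retrieved fragments with the conversation can be pictured as prompt assembly. The sketch below is an assumption about how such a prompt might look, not NetVoice's actual template; `build_prompt`, the fragment fields and the sample text are hypothetical.

```python
def build_prompt(question, history, fragments):
    """Assemble a grounded prompt: numbered sources, conversation context, then the question."""
    sources = "\n".join(
        f"[{number}] ({fragment['source']}) {fragment['text']}"
        for number, fragment in enumerate(fragments, start=1)
    )
    conversation = "\n".join(f"{speaker}: {line}" for speaker, line in history)
    return (
        "Answer using only the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Conversation so far:\n{conversation}\n\n"
        f"Customer question: {question}"
    )


prompt = build_prompt(
    question="What is the return policy for digital products?",
    history=[("Customer", "Hi, I bought an e-book yesterday.")],
    fragments=[{"source": "return-policy.pdf", "text": "Illustrative policy text."}],
)
print(prompt)
```

Numbering the sources lets the generated answer point back to a specific fragment, which is what makes the citation checkable.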

Use cases

Precise knowledge for each industry

How different organizations leverage the RAG knowledge base to deliver perfect answers at scale.

🖥️ Technical Support

A software manufacturer indexes all of its technical documentation: user manuals, release notes, troubleshooting guides, and knowledge base articles. The AI agent answers 85% of technical support tickets without escalating to humans, citing the exact article from the documentation. Complex issues are escalated with the preliminary diagnosis already completed by the AI.

✓ 85% ticket deflection · -60% resolution cost

🛒 Customer Service

A retailer with 50,000 SKUs indexes product catalogs, return policies, terms and conditions, frequently asked questions and warranty manuals. The agent answers questions about specifications, availability, prices, delivery times and return processes with always updated information from the catalog and the ERP.

✓ +92% customer satisfaction · 24/7 without human agents

🏛️ Government Procedures

A government entity indexes regulatory frameworks, paperwork requirements, administrative forms and procedures. Citizens ask in natural language, for example "What documents do I need to renew my passport?", and get accurate answers with exact steps, updated costs and processing times, without having to read complex legal documents.

✓ -75% in-person consultations · +89% satisfied citizens

🎓 Employee Onboarding

Companies index HR policies, employee manuals, internal procedures, benefits information and onboarding guides. New employees ask the agent about vacations, benefits, internal processes and company policies. The AI responds precisely, citing the exact policy document, and frees the HR team from answering repetitive questions.

✓ -70% HR time on repetitive employee queries

Advanced Features

Enterprise knowledge management

Beyond basic indexing, NetVoice offers advanced tools to manage large volumes of knowledge with precision, security and total control.

🔒

Role-Based Access Control (RBAC)

Define which agents can access which knowledge bases. The sales agent sees only the catalog; the HR agent sees only the internal policies. Granular segmentation by department, region or customer type.
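The access rule above amounts to filtering the knowledge bases an agent may query before any search runs. A minimal sketch, assuming a hypothetical `ROLE_PERMISSIONS` table and role names taken from the example in the text:

```python
# Hypothetical permission table: role -> knowledge bases that role may read.
ROLE_PERMISSIONS = {
    "sales_agent": {"catalog"},
    "hr_agent": {"internal_policies"},
}


def allowed_bases(role, requested):
    """Keep only the knowledge bases this role may read; unknown roles get nothing."""
    granted = ROLE_PERMISSIONS.get(role, set())
    return [base for base in requested if base in granted]


print(allowed_bases("sales_agent", ["catalog", "internal_policies"]))
```

Denying by default (an unknown role gets an empty list) is the safer choice, because a misconfigured agent then sees nothing rather than everything.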

📊

Query Analytics

Complete dashboard with the most frequently asked questions, the most consulted documents, unanswered questions (knowledge gaps) and precision by category. Identify what information is missing from your knowledge base.
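Detecting knowledge gaps can be as simple as counting the questions the agent could not answer. The sketch below assumes a hypothetical query log in which each entry has a `question` string and an `answered` flag; the log format and sample questions are illustrative.

```python
from collections import Counter


def knowledge_gaps(query_log, min_count=1):
    """Count questions the agent could not answer, most frequent first."""
    unanswered = Counter(
        entry["question"].strip().lower()
        for entry in query_log
        if not entry["answered"]
    )
    return [(question, count) for question, count in unanswered.most_common()
            if count >= min_count]


log = [
    {"question": "Extended warranty exchange policy", "answered": False},
    {"question": "Return policy", "answered": True},
    {"question": "extended warranty exchange policy ", "answered": False},
    {"question": "Roaming rates", "answered": False},
]
print(knowledge_gaps(log))
```

Exact-match grouping after lowercasing is a simplification; grouping by semantic similarity would catch paraphrases of the same question.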

✍️

AI Knowledge Editor

Integrated tool to edit, approve and publish knowledge content. AI suggests improvements to writing to optimize retrieval, detects inconsistencies between documents and alerts about outdated information.

📊 Knowledge Base Analytics

Most frequent questions

Return Policy

82%

Opening hours

67%

Product warranty

54%

97.3%

Average precision

342ms

P50 Latency

⚠️ Knowledge gaps detected

3 unanswered questions this week

→ "Extended warranty exchange policy"

Frequently Asked Questions

All about the RAG knowledge base

What is RAG and why is it better than an LLM trained on my documents?

RAG (Retrieval-Augmented Generation) combines two components: an information retrieval engine that searches your knowledge base, and a generative model that converts that information into natural language responses. The advantage over fine-tuning (training the model with your data) is fundamental: with RAG, updating the information takes seconds, not weeks of retraining. Additionally, RAG cites exact sources, reducing "hallucinations" (made-up responses) to less than 5%. The LLM never tries to respond with information that is not in your documents: if it doesn't know, it says so and escalates the query.

How many documents can I index and what is the size limit?

There is no limit on the number of documents. Each document can be up to 50 MB in PDF or Word format. We have indexed knowledge bases with more than 500,000 documents without performance degradation. The practical limit is set by search time: for very large knowledge bases (>1M documents), we recommend partitioning by department or product to keep latency below 500 ms. The Enterprise plan includes unlimited vector storage with a guaranteed search SLA of <500 ms at P95.
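A P95 latency target means that 95% of searches finish within the limit. The sketch below uses the nearest-rank percentile method on invented latency samples; other percentile definitions interpolate between samples and can give slightly different values.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative search latencies in milliseconds (not measured data).
latencies_ms = [210, 250, 300, 320, 342, 360, 400, 420, 480, 650]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"P50={p50} ms, P95={p95} ms, within 500 ms target: {p95 < 500}")
```

In this sample the median looks healthy while a single slow search pushes P95 over the target, which is why an SLA is usually stated at a high percentile rather than as an average.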

How do you prevent the agent from "hallucinating" or inventing information?

We implement four layers of protection against hallucinations: (1) Strict grounding: the model is instructed to respond only with information present in the retrieved fragments; (2) Confidence score: each response has a semantic similarity score; if it is below the configured threshold, the agent says "I don't have exact information about this" and escalates; (3) Source citation: each answer includes the document and section the information comes from; (4) Post-generation validation: a verifier model compares the generated response with the source fragments and detects inconsistencies before the response is sent to the customer.
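Layers (2) and (3) above can be sketched as a small gate around a drafted answer. This is an illustration under assumptions: `guarded_answer`, the fragment fields and the 0.75 threshold are hypothetical, and layers (1) and (4) depend on the language models involved and are not shown.

```python
ESCALATION_MESSAGE = "I don't have exact information about this."


def guarded_answer(draft, fragments, threshold=0.75):
    """Apply a confidence threshold and attach citations to a drafted answer.

    fragments: list of dicts with 'source', 'section' and 'score' (similarity in [0, 1]).
    The 0.75 default is an arbitrary illustrative value, not a recommended setting.
    """
    supporting = [f for f in fragments if f["score"] >= threshold]
    if not supporting:
        # Nothing retrieved is similar enough: refuse and hand off to a human.
        return {"text": ESCALATION_MESSAGE, "citations": [], "escalate": True}
    citations = [f"{f['source']}, {f['section']}" for f in supporting]
    return {"text": draft, "citations": citations, "escalate": False}


fragments = [
    {"source": "returns.pdf", "section": "2.1", "score": 0.91},
    {"source": "faq.html", "section": "Shipping", "score": 0.40},
]
print(guarded_answer("Illustrative drafted answer.", fragments))
```

Only fragments that clear the threshold are cited, so a weakly related document never appears as a source for the answer.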

What happens with confidential documents? Is the information secure?

Knowledge base security is a top priority. All documents are encrypted with AES-256 at rest and with TLS 1.3 in transit. Vectors (mathematical representations of the content) do not allow the original text to be reconstructed, adding an additional layer of protection. Access to the knowledge base is role-based (RBAC): you define exactly which agents can consult which documents. Each client's data is stored in completely isolated namespaces. We are GDPR, HIPAA (for health), and SOC 2 Type II compliant. Documents are never shared between clients or used to train global models.

Knowledge Base with RAG in Real Time