This detailed tutorial walks through building a production-ready Retrieval-Augmented Generation (RAG) system using OpenSearch's vector database and Deepseek's advanced language model. The implementation focuses on creating an end-to-end solution for accurate information retrieval combined with AI-powered contextual responses, all running locally without cloud dependencies.

System Architecture Overview
The RAG system combines three core components:
- OpenSearch Vector Database - Stores document embeddings and handles similarity searches
- Deepseek Language Model - Generates contextual responses using retrieved information
- Embedding Engine - Converts text to numerical representations (Sentence Transformers)
The workflow follows four key stages:
- Document embedding and indexing
- Query processing and vector search
- Context augmentation
- AI response generation
1. Environment Setup
Software Dependencies
```
# Install system-level requirements
sudo apt-get install -y python3-dev build-essential docker.io
```
Python Package Installation
```
pip install opensearch-py==2.4.0 \
    transformers==4.37.0 \
    torch==2.1.2 \
    sentence-transformers==2.3.1 \
    numpy==1.26.4
```
2. OpenSearch Configuration
Extended Docker Deployment
```
# The k-NN plugin ships with the standard OpenSearch image
sudo docker run -d --name opensearch-rag \
  -p 9200:9200 -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=YourSecurePassword" \
  -e "OPENSEARCH_JAVA_OPTS=-Xms4g -Xmx4g" \
  -e "plugins.security.disabled=true" \
  -v /path/to/opensearch/data:/usr/share/opensearch/data \
  opensearchproject/opensearch:2.14.0
```
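Once the container is up, a quick connectivity check from Python confirms the cluster is reachable. This is a minimal sketch, assuming the security plugin stays disabled as configured above; the host and port match the Docker flags:
```
from opensearchpy import OpenSearch

# Connect to the local single-node cluster (security plugin disabled above)
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}], use_ssl=False)

print(client.info())             # Cluster name and version details
print(client.cluster.health())   # Expect "green" or "yellow" for a single node
```
This `client` instance is reused by the indexing and search code in the sections that follow.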
3. Vector Index Configuration
Enhanced Index Settings
```
index_settings = {
    "settings": {
        "index": {
            "knn": True,
            "knn.algo_param.ef_search": 512,
            "number_of_shards": 3,
            "number_of_replicas": 1
        }
    },
    "mappings": {
        "properties": {
            "text": {"type": "text", "analyzer": "english"},
            "metadata": {"type": "object"},
            "embedding": {
                "type": "knn_vector",
                # Dimension must match the embedding model (384 for all-MiniLM-L6-v2)
                "dimension": 384,
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "parameters": {
                        "ef_construction": 256,
                        "m": 32
                    }
                }
            }
        }
    }
}
```
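With the settings defined, creating the index is a single call through the client initialized earlier; `documents` is the index name the search code later in this tutorial expects:
```
# Create the vector index if it does not already exist
if not client.indices.exists(index="documents"):
    client.indices.create(index="documents", body=index_settings)
```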
4. Advanced Document Processing
Text Preparation Pipeline
```
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time downloads for tokenizer and stopword data
nltk.download('punkt')
nltk.download('stopwords')

def preprocess_text(text):
    # Normalize case and collapse newlines
    text = text.lower().replace('\n', ' ')
    # Tokenize, drop stopwords, then stem the remaining tokens
    # (filtering before stemming so stopwords still match the stop list)
    stemmer = PorterStemmer()
    stop_words = set(stopwords.words('english'))
    tokens = word_tokenize(text)
    filtered_tokens = [stemmer.stem(word) for word in tokens if word not in stop_words]
    return ' '.join(filtered_tokens)
```
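The processed documents then flow into the embedding engine and the index. The sketch below indexes a small illustrative batch with Sentence Transformers; `all-MiniLM-L6-v2` is an assumed model choice that matches the 384-dimension mapping above:
```
from sentence_transformers import SentenceTransformer

# 384-dimensional model, matching the knn_vector mapping (assumed choice)
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

documents = [
    {"text": "OpenSearch supports approximate k-NN search via HNSW.", "metadata": {"source": "docs"}},
    {"text": "Deepseek models can be quantized to 4-bit for local inference.", "metadata": {"source": "notes"}},
]

for i, doc in enumerate(documents):
    # Embed the raw text; sentence-transformer models are trained on natural
    # language, so stemming tends to hurt embedding quality
    vector = embedding_model.encode(doc["text"])
    client.index(
        index="documents",
        id=i,
        body={"text": doc["text"], "metadata": doc["metadata"], "embedding": vector.tolist()},
    )

# Make the new documents visible to search immediately
client.indices.refresh(index="documents")
```
Note that `preprocess_text` is better suited to preparing text for the keyword side of the hybrid search than for the embedding side, for the reason noted in the comment above.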
5. Deepseek Model Customization
Model Initialization with Quantization
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization keeps the 6.7B model within a single consumer GPU's memory
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-base",
    quantization_config=quant_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base")
```
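With the model loaded, the remaining RAG stages, context augmentation and response generation, reduce to prompt assembly plus a standard `generate` call. This is a minimal sketch; the prompt template and generation parameters are illustrative choices, not fixed by the tutorial:
```
def generate_answer(query, retrieved_texts, max_new_tokens=256):
    # Context augmentation: stuff retrieved passages into the prompt
    context = "\n\n".join(retrieved_texts)
    prompt = (
        f"Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Return only the newly generated tokens, not the echoed prompt
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```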
6. Enhanced RAG Pipeline
Hybrid Search Implementation
```
def hybrid_search(query, k=5, alpha=0.7):
    # Vector search over the embedding field
    vector_results = client.search(
        index="documents",
        body={
            "size": k,
            "query": {"knn": {"embedding": {"vector": embedding_model.encode(query).tolist(), "k": k}}}
        }
    )
    # Keyword (BM25) search over the raw text
    keyword_results = client.search(
        index="documents",
        body={"size": k, "query": {"match": {"text": query}}}
    )
    # Combine results with reciprocal rank fusion, weighting vector hits by alpha
    # (helper defined below)
    combined = reciprocal_rank_fusion(vector_results, keyword_results, alpha)
    return combined[:k]
```
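The fusion helper referenced above is not defined in the snippet, so here is one plausible implementation. The constant `rrf_k=60` is the conventional RRF smoothing term; weighting the two result lists by `alpha` is an assumption about how that parameter was intended to be used:
```
def reciprocal_rank_fusion(vector_results, keyword_results, alpha=0.7, rrf_k=60):
    # Score each document by its (weighted) reciprocal rank in each result list
    scores = {}
    for weight, results in ((alpha, vector_results), (1 - alpha, keyword_results)):
        for rank, hit in enumerate(results["hits"]["hits"]):
            doc_id = hit["_id"]
            scores.setdefault(doc_id, {"hit": hit, "score": 0.0})
            scores[doc_id]["score"] += weight / (rrf_k + rank + 1)
    ranked = sorted(scores.values(), key=lambda e: e["score"], reverse=True)
    return [entry["hit"] for entry in ranked]
```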
Production Deployment Considerations
Performance Monitoring
```
class PerformanceMonitor:
    def __init__(self):
        self.metrics = {
            'search_latency': [],
            'model_inference_time': [],
            'cache_hit_rate': 0.0
        }

    def log_search_time(self, duration):
        self.metrics['search_latency'].append(duration)

    def log_inference_time(self, duration):
        self.metrics['model_inference_time'].append(duration)

    def update_cache_stats(self, hits, total):
        # Guard against division by zero before any requests are recorded
        self.metrics['cache_hit_rate'] = hits / total if total else 0.0
```
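Wiring the monitor into the pipeline is a matter of timing each call. A minimal end-to-end usage sketch, combining the retrieval and generation functions defined above:
```
import time

monitor = PerformanceMonitor()
question = "What engine does the knn_vector mapping use?"

start = time.perf_counter()
hits = hybrid_search(question)
monitor.log_search_time(time.perf_counter() - start)

start = time.perf_counter()
answer = generate_answer(question, [h["_source"]["text"] for h in hits])
monitor.log_inference_time(time.perf_counter() - start)

print(answer)
print(monitor.metrics)
```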
Future Enhancement Roadmap
- Implement cross-lingual search capabilities
- Add visual search through multimodal embeddings
- Integrate real-time data streaming
- Develop domain-specific fine-tuning pipelines
- Create automated evaluation framework
This comprehensive implementation provides a robust foundation for building enterprise-grade RAG systems. By combining OpenSearch's powerful search capabilities with Deepseek's advanced language understanding, developers can create sophisticated AI applications that deliver accurate, context-aware responses while maintaining full control over data privacy and system architecture.