A comprehensive example demonstrating how to use ElasticPress.io as a vector knowledge store for both traditional search and AI-powered applications, built with the Nobel Prize dataset.
Data Source: Nobel Prize API v2.1 by Nobel Prize Outreach.
Search
- Full-text keyword search with faceting, filters, and fuzzy matching
- Semantic (vector/kNN) search using dense embeddings
- Hybrid search combining BM25 and vector similarity
- Real-time autosuggest via ElasticPress.io search templates
AI / RAG
- Ask AI: natural language questions answered from the database with source citations
- Text-to-Elasticsearch-DSL: the LLM reads the index schema and composes a query (aggregations, kNN, sorts); the system executes it and the LLM synthesizes the answer
- Compatible with any OpenAI-compatible API (OpenAI, Ollama, LM Studio, Groq, Azure)
MCP Server
- Exposes search and Ask AI as tools for AI models via the Model Context Protocol
- Works with Claude Desktop, Claude Code, and any MCP-compatible client
- Demonstrates using Elasticsearch as a local vector knowledge store for AI agents
ElasticPress.io specifics
- Correct bulk indexing format (no `_index` in action metadata)
- Index naming conventions
- Search template API for unauthenticated frontend access
- PHP 8.1+
- Composer
- An ElasticPress.io account
- An OpenAI-compatible API key (for semantic search, Ask AI, and MCP tools)
# Install dependencies
composer install
# Configure credentials
cp .env.example .env
# Edit .env — at minimum fill in ELASTICPRESS_* and OPENAI_API_KEY
# Create the index and import data
php bin/setup.php
php bin/index.php
# Generate vector embeddings (requires OPENAI_API_KEY)
php bin/generate-embeddings.php
# Set up the autosuggest search template
php bin/setup-template.php
# Start the web server
php -S localhost:8000 -t public

Open http://localhost:8000 and use the Search tab for keyword, semantic, or hybrid search, and the Ask AI tab for natural language questions.
Copy .env.example to .env and fill in your credentials:
# ElasticPress.io
ELASTICPRESS_HOST=https://your-endpoint.elasticpress.io
ELASTICPRESS_SUBSCRIPTION_ID=your-subscription-id
ELASTICPRESS_SUBSCRIPTION_TOKEN=your-subscription-token
# OpenAI-compatible API (OpenAI, Ollama, LM Studio, Groq, Azure, …)
OPENAI_API_BASE_URL=https://api.openai.com/v1
OPENAI_API_KEY=sk-your-api-key
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_EMBEDDING_DIMS=1536
OPENAI_CHAT_MODEL=gpt-4o-mini

Using a local model? Point `OPENAI_API_BASE_URL` at your provider:
| Provider | Base URL |
|---|---|
| OpenAI | https://api.openai.com/v1 |
| Ollama | http://localhost:11434/v1 |
| LM Studio | http://localhost:1234/v1 |
| Groq | https://api.groq.com/openai/v1 |
php bin/setup.php

Creates the `{subscription-id}-laureates` index with full-text and vector field mappings.
If you already have the index and only need to add the vector field:
php bin/update-mapping.php

php bin/index.php

Fetches ~1,000 laureate documents from the Nobel Prize API and bulk-indexes them.
ElasticPress.io note: Bulk action metadata must not include `_index`. The index is specified in the URL instead. See `src/Index/BulkIndexer.php`.
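
The bulk format in the note above can be sketched as follows. This is a minimal illustration and not the code in `src/Index/BulkIndexer.php`; the function name `buildBulkBody`, the document fields, and the ID `laureate-6` are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build an NDJSON bulk body whose action lines carry only `_id`.
 * The index name is deliberately absent; it belongs in the request URL.
 *
 * @param array<string|int, array<string, mixed>> $docs documents keyed by ID
 */
function buildBulkBody(array $docs): string
{
    $lines = [];
    foreach ($docs as $id => $doc) {
        // Action metadata: note that there is no `_index` key here.
        $lines[] = json_encode(['index' => ['_id' => (string) $id]], JSON_THROW_ON_ERROR);
        $lines[] = json_encode($doc, JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE);
    }
    // The bulk API expects a newline after the final line as well.
    return implode("\n", $lines) . "\n";
}

// Hypothetical document shape, for illustration only.
echo buildBulkBody([
    'laureate-6' => ['name' => 'Marie Curie', 'category' => 'physics', 'year' => 1903],
]);
```

The resulting body would be sent to `{host}/{index-name}/_bulk` with `Content-Type: application/x-ndjson`, so the index is named once in the URL and never in the per-document metadata.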
php bin/generate-embeddings.php
# Options
php bin/generate-embeddings.php --dry-run # preview without API calls
php bin/generate-embeddings.php --batch-size=50 # documents per API call
php bin/generate-embeddings.php --force # re-generate existing embeddings

Each laureate document gets a `motivation_embedding` field (1536-dim dense vector) built from: `{category} {name} ({year}): {motivation}. Affiliated with: {institutions}.`
Approximate cost with `text-embedding-3-small`: < $0.01 for the full dataset.
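
As an illustration of the text template described above, the composition step might look like the sketch below. The function name `composeEmbeddingText` and the array keys are assumptions, as is the choice to omit the affiliation clause when a laureate has no institutions; the actual logic lives in `src/Embeddings/EmbeddingService.php` and may differ.

```php
<?php
declare(strict_types=1);

/**
 * Compose the text that is sent to the embedding model.
 * Array keys are hypothetical; adapt them to the real document shape.
 */
function composeEmbeddingText(array $doc): string
{
    $text = sprintf(
        '%s %s (%s): %s.',
        $doc['category'],
        $doc['name'],
        $doc['year'],
        rtrim($doc['motivation'], '.') // avoid a doubled full stop
    );

    $institutions = $doc['institutions'] ?? [];
    if ($institutions !== []) {
        // Assumption: skip the clause entirely when there is no affiliation.
        $text .= ' Affiliated with: ' . implode(', ', $institutions) . '.';
    }

    return $text;
}

echo composeEmbeddingText([
    'category' => 'physics',
    'name' => 'Albert Einstein',
    'year' => 1921,
    'motivation' => 'for his services to Theoretical Physics, and especially for his discovery of the law of the photoelectric effect',
    'institutions' => ['Kaiser-Wilhelm-Institut'],
]), "\n";
```

Putting category, name, and year into the embedded text lets a semantic query such as "physics prizes in the 1920s" match on more than the motivation sentence alone.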
php bin/setup-template.php

Creates an ElasticPress.io search template enabling unauthenticated autosuggest directly from the browser.
php -S localhost:8000 -t public

| Mode | Description |
|---|---|
| Keyword | BM25 full-text search with fuzzy matching across names, motivations, affiliations |
| Semantic | kNN vector search — finds results by meaning, not keyword match |
| Hybrid | Combines BM25 and kNN scores for best-of-both retrieval |
Results include relevance score. Filters (category, gender, year) apply to all modes.
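
One way to express hybrid mode is a single request that carries both a BM25 `query` and a `knn` section, which Elasticsearch 8.x scores by summing the two. The sketch below assumes that approach; `src/Search/SearchService.php` may combine scores differently. Only `motivation_embedding` comes from this README; the text field names and the function `buildHybridBody` are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build a search body that runs BM25 and kNN in one request.
 *
 * @param float[]               $queryVector embedding of the user's query
 * @param array<string, string> $filters     exact-match filters, e.g. ['category' => 'physics']
 */
function buildHybridBody(string $text, array $queryVector, array $filters = [], int $k = 10): array
{
    $filterClauses = [];
    foreach ($filters as $field => $value) {
        $filterClauses[] = ['term' => [$field => $value]];
    }

    $knn = [
        'field' => 'motivation_embedding',
        'query_vector' => $queryVector,
        'k' => $k,
        'num_candidates' => $k * 10, // wider candidate pool than k
    ];
    if ($filterClauses !== []) {
        // Apply the same filters to the vector side so both modes agree.
        $knn['filter'] = $filterClauses;
    }

    return [
        'size' => $k,
        'query' => [
            'bool' => [
                'must' => [[
                    'multi_match' => [
                        'query' => $text,
                        'fields' => ['name', 'motivation', 'affiliations'], // hypothetical field names
                        'fuzziness' => 'AUTO',
                    ],
                ]],
                'filter' => $filterClauses,
            ],
        ],
        'knn' => $knn,
    ];
}

// A real query vector would have 1536 entries; three are shown for brevity.
echo json_encode(
    buildHybridBody('nuclear structure', [0.12, -0.03, 0.48], ['category' => 'physics']),
    JSON_PRETTY_PRINT
), "\n";
```

Repeating the filters inside `knn` matters because a filter placed only on the `query` side would not restrict the nearest-neighbour candidates.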
Ask natural language questions. The system uses a text-to-Elasticsearch-DSL approach that adapts to any index schema without code changes:
- Reads the index mapping to discover available fields and their types
- The LLM writes the optimal ES query DSL (aggregations, filters, kNN, sorts)
- Executes the query against Elasticsearch with auto-retry on errors
- The LLM synthesizes a grounded answer from the raw hits and aggregations
Example questions:
- Which women won the physics prize?
- What breakthroughs in cancer research won Nobel Prizes?
- Which person won the most Nobel prizes?
- Which country produced the most chemistry laureates?
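
The Ask AI flow described above (read the mapping, let the LLM write a query, execute with retry, synthesize an answer) can be sketched as a small loop. This is an illustration under stated assumptions, not the code in `src/RAG/RagService.php`: the function `askWithRetry`, the prompt wording, and the two callables standing in for the chat provider and the Elasticsearch client are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Text-to-DSL loop: ask the LLM for a query, run it, and feed any
 * error back to the LLM for another attempt.
 *
 * @param callable(string): string $llm    prompt in, completion text out (hypothetical)
 * @param callable(array): array   $search query body in, response out; throws on errors (hypothetical)
 */
function askWithRetry(string $question, array $mapping, callable $llm, callable $search, int $maxAttempts = 3): string
{
    $schema = json_encode($mapping, JSON_THROW_ON_ERROR);
    $lastError = '';

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $prompt = "Index mapping: {$schema}\nQuestion: {$question}\n"
            . 'Reply with one Elasticsearch query body as JSON.'
            . ($lastError !== '' ? "\nThe previous query failed with: {$lastError}" : '');

        try {
            $body = json_decode($llm($prompt), true, 512, JSON_THROW_ON_ERROR);
            $response = $search($body);
        } catch (Throwable $e) {
            $lastError = $e->getMessage(); // hand the error to the model and retry
            continue;
        }

        // Ground the answer in the raw hits and aggregations only.
        $results = json_encode($response, JSON_THROW_ON_ERROR);
        return $llm("Answer using only these results: {$results}\nQuestion: {$question}");
    }

    throw new RuntimeException("No valid query after {$maxAttempts} attempts: {$lastError}");
}

// Stubbed usage: the first "LLM" reply is invalid JSON, so the loop retries once.
$replies = ['not json', '{"query":{"match_all":{}}}', 'Stubbed answer'];
$llm = function (string $prompt) use (&$replies): string {
    return array_shift($replies);
};
$search = fn (array $body): array => ['hits' => ['total' => ['value' => 0], 'hits' => []]];

echo askWithRetry('Which women won the physics prize?', ['properties' => []], $llm, $search), "\n";
```

Because the prompt is built from the live mapping instead of hard-coded field names, the same loop can be pointed at a different index without code changes, which is the property the section describes.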
# Keyword search
php bin/search.php "einstein"
php bin/search.php --category=physics --gender=female
php bin/search.php "quantum" --year-from=2000
# Semantic and hybrid search (requires embeddings)
php bin/semantic-search.php "quantum entanglement"
php bin/semantic-search.php "nuclear structure" --mode=hybrid --k=10
# Ask AI (RAG)
php bin/ask.php "What contributions did women make to physics?"
php bin/ask.php "Which German physicists won prizes after 1950?" --verbose
# Manage templates
php bin/manage-templates.php list
php bin/manage-templates.php view {index-name}

The project includes an MCP (Model Context Protocol) server that exposes search and Ask AI as tools for AI models.
| Tool | Description |
|---|---|
| `search` | Keyword search with filters (category, gender, country, year) |
| `semantic_search` | Vector kNN search by concept similarity |
| `hybrid_search` | Combined BM25 + kNN search |
| `ask` | Answer a question using RAG; the AI decides what to search for |
| `get_document` | Fetch a specific laureate document by ID |
claude mcp add nobel-prize \
-e OPENAI_API_KEY="your-key" \
-e OPENAI_API_BASE_URL="https://api.openai.com/v1" \
-e OPENAI_CHAT_MODEL="gpt-4o-mini" \
-e OPENAI_EMBEDDING_MODEL="text-embedding-3-small" \
-- php /absolute/path/to/bin/mcp-server.php

The ElasticPress.io credentials are read from `.env` automatically. OpenAI credentials must be passed explicitly since the MCP process runs in an isolated environment.
Verify the server is running:
claude mcp list
# nobel-prize: php ... - ✓ Connected

Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):
{
  "mcpServers": {
    "nobel-prize": {
      "command": "php",
      "args": ["/absolute/path/to/bin/mcp-server.php"],
      "env": {
        "OPENAI_API_KEY": "your-key",
        "OPENAI_API_BASE_URL": "https://api.openai.com/v1",
        "OPENAI_CHAT_MODEL": "gpt-4o-mini",
        "OPENAI_EMBEDDING_MODEL": "text-embedding-3-small"
      }
    }
  }
}

Restart Claude Desktop after saving. The Nobel Prize tools will appear in the tools menu.
Once connected, you can ask Claude to use the tools directly:
- Use the `ask` tool: which women won the physics prize?
- Use the `search` tool to find peace prize winners from Japan
- Use the `hybrid_search` tool: breakthroughs in genetics
GET /api.php?q=einstein&mode=keyword
GET /api.php?q=quantum+mechanics&mode=semantic
GET /api.php?q=einstein&mode=hybrid
Parameters: q, mode (keyword/semantic/hybrid), category, gender, birth_country, prize_country, year_from, year_to, page, per_page
GET /api.php?ask=Which+women+won+the+physics+prize
Returns: { answer, sources[], question, mode: "rag" }
GET /api.php?id={document-id}
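
For callers that build these requests in code, a small helper along the following lines may be convenient. It is a sketch only: `buildSearchUrl` is a hypothetical name, and the parameter whitelist simply mirrors the list documented above.

```php
<?php
declare(strict_types=1);

/**
 * Build a search URL for api.php from the documented parameters.
 * Unknown keys and empty values are dropped.
 */
function buildSearchUrl(string $baseUrl, array $params): string
{
    $allowed = ['q', 'mode', 'category', 'gender', 'birth_country', 'prize_country',
                'year_from', 'year_to', 'page', 'per_page'];

    $query = array_filter(
        array_intersect_key($params, array_flip($allowed)),
        static fn ($v): bool => $v !== null && $v !== ''
    );

    // http_build_query percent-encodes values and writes spaces as "+".
    return rtrim($baseUrl, '/') . '/api.php?' . http_build_query($query);
}

echo buildSearchUrl('http://localhost:8000', [
    'q' => 'quantum mechanics',
    'mode' => 'semantic',
    'category' => 'physics',
    'year_from' => 2000,
]), "\n";
// http://localhost:8000/api.php?q=quantum+mechanics&mode=semantic&category=physics&year_from=2000
```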
.
├── bin/
│   ├── setup.php                     # Create index
│   ├── index.php                     # Import Nobel Prize data
│   ├── update-mapping.php            # Add vector field to existing index
│   ├── generate-embeddings.php       # Generate and store vector embeddings
│   ├── search.php                    # CLI keyword search
│   ├── semantic-search.php           # CLI semantic/hybrid search
│   ├── ask.php                       # CLI Ask AI (RAG)
│   ├── mcp-server.php                # MCP server entry point
│   ├── setup-template.php            # Create autosuggest template
│   └── manage-templates.php          # Manage templates
├── public/
│   ├── index.php                     # Web interface (Search + Ask AI tabs)
│   ├── api.php                       # REST API
│   └── search-api-template.php       # Serves autosuggest template to frontend
└── src/
    ├── Client/ElasticsearchClient.php
    ├── Config/Config.php
    ├── Data/
    │   ├── NobelDataFetcher.php
    │   └── NobelDataTransformer.php
    ├── Embeddings/
    │   ├── EmbeddingProviderInterface.php
    │   ├── OpenAIEmbeddingProvider.php   # Any OpenAI-compatible endpoint
    │   ├── EmbeddingService.php          # Text composition + embedding
    │   └── EmbeddingUpdater.php          # Scroll + bulk-update embeddings
    ├── Index/
    │   ├── IndexManager.php
    │   └── BulkIndexer.php
    ├── Mapping/NobelPrizeMapping.php
    ├── MCP/
    │   ├── McpServer.php                 # JSON-RPC 2.0 over stdio
    │   └── ToolRegistry.php
    ├── RAG/
    │   ├── ChatProviderInterface.php
    │   ├── OpenAIChatProvider.php        # Any OpenAI-compatible endpoint
    │   └── RagService.php                # 3-phase RAG pipeline
    ├── Search/SearchService.php          # Keyword, semantic, hybrid search
    ├── Search/SearchTemplateManager.php
    └── Shared/
        ├── HttpClientFactory.php         # Shared Guzzle client setup
        └── ServiceFactory.php            # Assembles AI services from config
explicit index in bulk is not allowed
ElasticPress.io requires the _index field to be omitted from bulk metadata. The index is specified in the URL. See src/Index/BulkIndexer.php.
Semantic search returns 0 results
Embeddings have not been generated yet. Run php bin/generate-embeddings.php.
Ask AI returns no results
- Confirm `OPENAI_API_KEY` is set in `.env`
- Confirm embeddings are generated (`php bin/generate-embeddings.php`)
- Run `php bin/ask.php "your question" --verbose` to see what filters and strategies were used
MCP server fails to connect
- Ensure PHP is in your `PATH` or use the absolute path in the `command` field
- Check ElasticPress.io credentials are in `.env` (MCP reads them from there)
- Pass OpenAI credentials explicitly via `env` in the MCP config (they are not inherited from the shell)
- Test the server directly:
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}' | php bin/mcp-server.php
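
To make sense of what that command should print, the sketch below shows how a JSON-RPC 2.0 `initialize` request is typically answered. It is not the code in `src/MCP/McpServer.php`; the function `handleMessage` and the `serverInfo` values are hypothetical, and a real server would call it in a loop over lines read from stdin.

```php
<?php
declare(strict_types=1);

/**
 * Handle one JSON-RPC 2.0 message and return the response line,
 * or null for notifications (messages without an "id").
 */
function handleMessage(string $line): ?string
{
    $msg = json_decode($line, true);
    if (!is_array($msg)) {
        return json_encode(
            ['jsonrpc' => '2.0', 'id' => null, 'error' => ['code' => -32700, 'message' => 'Parse error']],
            JSON_THROW_ON_ERROR
        );
    }
    if (!array_key_exists('id', $msg)) {
        return null; // notification: no reply expected
    }

    if (($msg['method'] ?? '') === 'initialize') {
        $result = [
            'protocolVersion' => '2024-11-05',
            'capabilities' => ['tools' => new stdClass()], // encodes as {}
            'serverInfo' => ['name' => 'example-server', 'version' => '0.1.0'], // hypothetical values
        ];
        return json_encode(['jsonrpc' => '2.0', 'id' => $msg['id'], 'result' => $result], JSON_THROW_ON_ERROR);
    }

    return json_encode(
        ['jsonrpc' => '2.0', 'id' => $msg['id'], 'error' => ['code' => -32601, 'message' => 'Method not found']],
        JSON_THROW_ON_ERROR
    );
}

echo handleMessage(
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}'
), "\n";
```

A healthy server replies with a single JSON line whose `id` matches the request and whose `result` contains `protocolVersion`, `capabilities`, and `serverInfo`. A PHP warning or stack trace on stdout instead of JSON usually points at a missing dependency or unreadable `.env`.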
Changing the embedding model
If you change OPENAI_EMBEDDING_MODEL or OPENAI_EMBEDDING_DIMS, you must recreate the index and regenerate all embeddings — the vector dimensions are fixed in the mapping and cannot be changed in place.
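
The constraint comes from the `dense_vector` mapping, where `dims` is set when the field is created. The sketch below shows what such a mapping fragment might look like, plus a guard that fails early on a mismatch. The function names and the `cosine` similarity setting are assumptions; see `src/Mapping/NobelPrizeMapping.php` for the real mapping.

```php
<?php
declare(strict_types=1);

/** Mapping fragment for the vector field; `dims` is fixed once the field exists. */
function vectorFieldMapping(int $dims): array
{
    return [
        'motivation_embedding' => [
            'type' => 'dense_vector',
            'dims' => $dims,
            'index' => true,
            'similarity' => 'cosine', // assumption; the project may use another metric
        ],
    ];
}

/** Fail early when an embedding does not match the mapped dimension. */
function assertDims(array $vector, int $dims): void
{
    if (count($vector) !== $dims) {
        throw new InvalidArgumentException(
            sprintf('Embedding has %d dimensions, mapping expects %d', count($vector), $dims)
        );
    }
}

// Falls back to 1536 when the variable is not set in the environment.
$dims = (int) (getenv('OPENAI_EMBEDDING_DIMS') ?: 1536);
echo json_encode(vectorFieldMapping($dims), JSON_PRETTY_PRINT), "\n";
```

Checking the vector length before indexing turns a model/mapping mismatch into a clear local error instead of a rejected bulk request.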
- ElasticPress.io — Managed Elasticsearch service
- ElasticPress.io Developer Documentation
- ElasticPress.io Post Search API
- Model Context Protocol
- Nobel Prize API v2.1
MIT
Provided as-is. This is a sample project with no support commitment. For engineering consulting enquiries visit ElasticPress.io Consulting.
