Searching for Birds with Pinecone Full-Text Search

Semantic search excels at meaning: it finds documents that mean roughly what the user asked, even when the exact words don't match. That's useful until you need a specific term, a verbatim phrase, or an explicit exclusion, such as an error code or a legal clause. Some queries demand exactness, and when search returns “close enough” instead of exact, users lose trust and developers spend time debugging.

Full-text search is built for exactly those cases. Pinecone’s implementation runs BM25 scoring against string fields in your index, supports Lucene query syntax for boolean and phrase queries, and can be combined with dense or sparse vector ranking when you need both lexical precision and semantic similarity.

By the end of this post, you'll have a working reference for every query pattern and a new appreciation for North American birds. The examples use a flock of 200 North American bird articles indexed with three searchable text fields: bird_name, intro, and body. The body field has stemming enabled; bird_name and intro don't. The index also includes an image_embedding field, a 768-dimensional dense vector from Gemini Embedding 2, which we'll use when combining vector search with text filtering.

Want to follow along in code? The full notebook is here.

Each section below adds one tool to your query-building vocabulary, starting with a single-token match and ending with dense vector ranking combined with a text filter.

Simple queries: single-term and multi-field

The simplest full-text query is a type: "text" clause targeting a single field. Here it searches the body field for documents containing the token "migration".

A note on tokens: in full-text search, a token is a unit produced by splitting text on whitespace and punctuation, lowercasing, and optionally stemming. It is not the same as a token in an LLM or embedding model. "Black-throated" becomes two tokens (black, throated); "migrating" with stemming enabled becomes migrat.
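
The analysis steps in this note can be sketched in a few lines of Python. This is a hypothetical illustration, not Pinecone's analyzer: the rule that tokens are runs of ASCII letters and digits is an assumption, and toy_stem is a deliberately crude suffix stripper standing in for a real stemming algorithm such as Porter or Snowball. It reproduces both examples from the note, and, assuming query text is analyzed the same way as the field it targets (typical for full-text engines), it shows why "migration" can match "migrating" in the stemmed body field but not in the unstemmed intro field.

```python
import re

# Hypothetical toy analyzer, NOT Pinecone's implementation.
# Assumption: tokens are runs of ASCII letters/digits; anything else separates them.
_TOKEN_RE = re.compile(r"[a-z0-9]+")

# Crude suffix list standing in for a real stemmer (e.g. Porter or Snowball).
_SUFFIXES = ("ing", "ion", "ed", "es", "s")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in _SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text: str, stem: bool = False) -> list[str]:
    """Lowercase, split on whitespace/punctuation, and optionally stem."""
    tokens = _TOKEN_RE.findall(text.lower())
    return [toy_stem(t) for t in tokens] if stem else tokens


# Hypothetical field settings mirroring the post: only body is stemmed.
FIELD_STEMMING = {"bird_name": False, "intro": False, "body": True}


def field_matches(field: str, text: str, query: str) -> bool:
    """True if every query token, analyzed the way the field is, occurs in the text.

    This is a bag-of-tokens check, not a phrase check: token order is ignored.
    """
    stem = FIELD_STEMMING[field]
    field_tokens = set(tokenize(text, stem=stem))
    return all(tok in field_tokens for tok in tokenize(query, stem=stem))


if __name__ == "__main__":
    print(tokenize("Black-throated"))        # ['black', 'throated']
    print(tokenize("migrating", stem=True))  # ['migrat']
    sentence = "Flocks begin migrating south in early fall."
    print(field_matches("body", sentence, "migration"))   # True
    print(field_matches("intro", sentence, "migration"))  # False
```

Because stemming happens at analysis time on both sides, the same query term can behave differently depending on which field it targets.
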
Dense and sparse vector encoders use their own internal tokenizers, entirely separate from this pipeline.

response = idx.documents.search(
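
Pinecone ranks the documents that match a single-term query like the body/"migration" search described above using BM25. For intuition only, here is a textbook Okapi BM25 scorer over pre-tokenized documents; it is not Pinecone's code. The k1 = 1.2 and b = 0.75 defaults and the IDF form log(1 + (N - n + 0.5) / (n + 0.5)) are assumptions borrowed from common Lucene settings, and the three toy body documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    docs: list of token lists, already analyzed (lowercased, optionally stemmed).
    IDF uses the Lucene-style form log(1 + (N - n + 0.5) / (n + 0.5)), which is never negative.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: documents longer than average are penalized (controlled by b).
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue  # a term absent from the document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation: repeats help, with diminishing returns (controlled by k1).
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        "the warbler begins its migration in early fall".split(),
        "migration routes follow the coast and migration peaks in may".split(),
        "this resident species stays all winter".split(),
    ]
    scores = bm25_scores(["migration"], docs)
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    print(ranking)  # [1, 0, 2]
```

The second document mentions the term twice and outranks the first despite being longer than average, while the third, which never mentions it, scores zero.
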

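The post's final pattern pairs dense vector ranking with a text filter: lexical matching decides which documents are eligible, and vector similarity decides their order. The sketch below shows that idea locally with plain Python lists. It is a hypothetical illustration, not Pinecone's query path: the documents, the dictionary keys, the 3-dimensional vectors (standing in for the 768-dimensional image_embedding values), and the token-set filter are all made up, and cosine similarity is assumed as the metric.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_dense_search(query_vec, docs, required_token, top_k=3):
    """Keep docs whose body contains required_token, then rank them by cosine similarity.

    docs: list of dicts with hypothetical keys "id", "body_tokens", "embedding".
    """
    # Step 1: the text filter is a hard requirement; non-matching docs are dropped.
    eligible = [d for d in docs if required_token in d["body_tokens"]]
    # Step 2: only the surviving docs are ordered by vector similarity.
    ranked = sorted(eligible, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return [(d["id"], round(cosine(query_vec, d["embedding"]), 3)) for d in ranked[:top_k]]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; the real image_embedding field has 768 dimensions.
    docs = [
        {"id": "snow-goose", "body_tokens": {"flock", "migrat", "arctic"}, "embedding": [0.9, 0.1, 0.0]},
        {"id": "sandhill-crane", "body_tokens": {"migrat", "wetland"}, "embedding": [0.6, 0.8, 0.0]},
        {"id": "northern-cardinal", "body_tokens": {"resident", "feeder"}, "embedding": [1.0, 0.0, 0.0]},
    ]
    print(filtered_dense_search([1.0, 0.0, 0.0], docs, "migrat"))
    # [('snow-goose', 0.994), ('sandhill-crane', 0.6)]
```

Filtering first means the cardinal, the closest vector to the query, never appears, which is exactly the behavior you want when the term is a hard requirement rather than a ranking signal.
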