For the technical deep dive into how FTS is built, see Full Text Search: Architecture and Design.

When semantic search hit production scale, the default move for retrieval was to embed text and search by meaning. The surface area of what a query could match expanded: the same corpus, searched semantically, contained more retrievable signal than it had before.

But expanded coverage cuts both ways. The same property that lets a vague query find a relevant document also makes it harder to pin down an exact one. Precision on specifics (e.g., a product SKU, a legal citation, a person's name, an error code) doesn't live in embedding space. So as retrieval systems matured, keyword matching came back into view, not as a replacement for semantic search but as its natural complement.

Full text search is now available in Pinecone, in Public Preview. BM25 scoring across multiple text fields per index, Lucene query syntax, and multi-language tokenization are all built in.

One index, text and vectors together

A single index now holds text fields, dense vectors, sparse vectors, and filterable metadata, defined together in a schema set at index creation. Each text field takes a language setting that controls tokenization, stemming, and optional stop word removal. Stemming reduces words to their root form, so "running" and "runs" both match a query for "run." Eighteen languages are supported.

Multiple text fields can be configured per index, which removes the modeling workaround of routing every searchable string through a single field. Title, body, and tags can each be independent text fields, scored or filtered on their own terms.

Keyword search and vector search run in the same query against that schema. There is no separate keyword index to maintain and no results from two systems to reconcile.

# Query: keyword search across title/body using Lucene query syntax (BM25 ranking)
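
To make the idea of BM25 scoring across several text fields concrete, here is a minimal sketch in plain Python. It is not Pinecone's API or implementation: the function names (`tokenize`, `bm25_field_scores`, `search`), the sample documents, the `k1`/`b` values, and the choice to sum per-field scores are all assumptions made for illustration. It also skips stemming and Lucene query parsing entirely.

```python
import math
import re
from collections import Counter

# Common BM25 defaults; assumed for this sketch, not Pinecone's settings.
K1, B = 1.2, 0.75


def tokenize(text):
    """Lowercase and split on non-alphanumerics (no stemming or stop words)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def bm25_field_scores(query, docs, field):
    """Return one BM25 score per document for a single text field."""
    tokens = [tokenize(d.get(field, "")) for d in docs]
    n_docs = len(docs)
    avg_len = sum(len(t) for t in tokens) / n_docs
    if avg_len == 0:
        avg_len = 1.0  # avoid dividing by zero when the field is empty everywhere
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for t in tokens for term in set(t))
    query_terms = set(tokenize(query))

    scores = []
    for toks in tokens:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Non-negative IDF variant, as used by Lucene-style BM25.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = K1 * (1 - B + B * len(toks) / avg_len)
            score += idf * tf[term] * (K1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def search(query, docs, fields=("title", "body")):
    """Sum per-field BM25 scores (an assumed combination rule) and rank documents."""
    totals = [0.0] * len(docs)
    for field in fields:
        for i, s in enumerate(bm25_field_scores(query, docs, field)):
            totals[i] += s
    ranked = sorted(range(len(docs)), key=lambda i: totals[i], reverse=True)
    return [(docs[i]["id"], totals[i]) for i in ranked if totals[i] > 0]


# Hypothetical sample documents with independent title and body fields.
DOCS = [
    {"id": "a", "title": "Resetting error code E4012",
     "body": "Steps to clear E4012 on the controller."},
    {"id": "b", "title": "Controller maintenance guide",
     "body": "General upkeep and cleaning steps."},
    {"id": "c", "title": "Error handling overview",
     "body": "How error codes are grouped and reported."},
]

if __name__ == "__main__":
    for doc_id, score in search("error E4012", DOCS):
        print(doc_id, round(score, 3))
```

In this toy corpus the exact token `e4012` is rare, so it carries a high inverse document frequency and document `a` ranks first; document `b` contains neither query term and is dropped. That is the kind of precision on specifics, such as an error code, that the article attributes to keyword matching.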
Full Text Search in Pinecone, Now in Public Preview | Pinecone
Full text search in Pinecone, built for agents and RAG. Lucene queries, BM25, 17-language tokenization, and text-match filters in a single query alongside vectors.






