I benchmarked my own semantic cache against RedisVL and Upstash for a week. Here is what actually held up

Most semantic cache benchmarks are a vendor showing you the one dataset where they win, on a model they fine-tuned, against a competitor they configured badly. You read it, you nod, you learn nothing.

I built and maintain a semantic cache library (@betterdb/semantic-cache on npm, betterdb-semantic-cache on PyPI, MIT, Valkey-native). So I had two choices. Write that post about my own library, or run the comparison straight and publish it even where I only tie. I did the second one. Four public datasets, two peers (RedisVL and Upstash), one self-tuning loop, and a fair amount of being wrong before being right.

There was no honest cross-library comparison of semantic caches anywhere I could find. So I made one. This is the short version. Links to the full tables and methodology are at the bottom.

1. Quality is a tie. That is the result you want.

Fix the embedding model and every honest semantic cache is doing the same thing: embed the prompt, measure cosine distance against the stored prompts, and return a hit when the nearest one falls below a threshold. So peak F1 converges. There is no secret sauce in the lookup.
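To make that concrete, here is a minimal sketch of the lookup in plain Python. It is not the code of my library, RedisVL, or Upstash. `ToySemanticCache`, `toy_embed`, and the 0.1 threshold are all made up for illustration, and the letter-frequency "embedding" stands in for a real embedding model only so the example runs without dependencies.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a real embedding model: a-z letter counts."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: an empty vector never matches anything
    return 1.0 - dot / (norm_a * norm_b)


class ToySemanticCache:
    """Illustrative in-memory cache: linear scan, nearest neighbour, threshold."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.1):
        self.embed = embed
        self.threshold = threshold  # arbitrary value, tune per embedding model
        self.entries: list[tuple[Vector, str]] = []

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))

    def lookup(self, prompt: str) -> Optional[str]:
        if not self.entries:
            return None
        query = self.embed(prompt)
        # Find the stored prompt closest to the query.
        distance, response = min(
            ((cosine_distance(query, vec), resp) for vec, resp in self.entries),
            key=lambda pair: pair[0],
        )
        # Hit only if the nearest stored prompt is below the threshold.
        return response if distance < self.threshold else None


cache = ToySemanticCache(embed=toy_embed, threshold=0.1)
cache.store("How do I reset my password?", "Use the reset link.")
print(cache.lookup("how do i reset my password"))      # hit: distance ~0
print(cache.lookup("What is the capital of France?"))  # miss: distance ~0.44
```

Swap the embedding function and the threshold and the control flow stays the same, which is why peak F1 ends up tracking the embedding model rather than the cache wrapped around it.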
