Why SQLite FTS5's default tokenizer drops your Japanese substrings (and the one-line fix)

FTS5's unicode61 tokenizer silently fails on CJK substring queries. Switching to trigram tokenization fixes it — here's the whole change, plus the two-layer Git + SQLite design I use to index ~800 Claude Code conversations.

Tuesday, June 9, 2026


If you're building any kind of personal-memory layer on top of SQLite — Claude Code conversation history, notes app, indexed knowledge base — there's a sharp edge in FTS5 that takes most people by surprise the first time they hit it.

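The default tokenizer (unicode61) silently misses most Japanese substring queries. It splits text on whitespace and punctuation, so an unspaced run of Japanese is indexed as a single token, and a query for a fragment in the middle of that run returns no rows and no error. The fix is one line of SQL. But the failure mode is invisible enough that you can ship a personal search tool, use it for weeks, and never realize half your content is unreachable.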
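A minimal sketch of the failure and the fix, assuming a Python build whose bundled SQLite has FTS5 enabled and is version 3.34 or newer (the trigram tokenizer does not exist before that). The table name `docs`, the column `body`, the helper `count_hits`, and the sample sentence are all hypothetical, chosen only for this illustration; this is not the schema of the actual indexer.

```python
import sqlite3


def count_hits(tokenizer: str, text: str, query: str) -> int:
    """Index `text` in a throwaway FTS5 table and count rows matching `query`."""
    con = sqlite3.connect(":memory:")
    # `docs` and `body` are hypothetical names used only for this demo.
    # The tokenizer is the only thing that differs between the two runs.
    con.execute(
        f"CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='{tokenizer}')"
    )
    con.execute("INSERT INTO docs(body) VALUES (?)", (text,))
    # Wrapping the query in double quotes makes it a single FTS5 string,
    # so no character in it is parsed as query syntax.
    (hits,) = con.execute(
        "SELECT count(*) FROM docs WHERE docs MATCH ?", (f'"{query}"',)
    ).fetchone()
    con.close()
    return hits


# One unspaced Japanese sentence and a three-character fragment from its middle.
text = "東京都に住んでいます"
query = "住んで"

unicode61_hits = count_hits("unicode61", text, query)
trigram_hits = count_hits("trigram", text, query)

print(f"SQLite {sqlite3.sqlite_version}")
print(f"unicode61 hits: {unicode61_hits}")
print(f"trigram hits:   {trigram_hits}")
```

On a build that meets those assumptions, the unicode61 table should report no hits and the trigram table one, with no error in either case. The query here is deliberately three characters long: per the SQLite documentation, a trigram full-text query shorter than three characters matches no rows, so two-character Japanese queries need separate handling.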

This post walks through:

The failure, reproducible in 20 lines of Python

The one-line fix (tokenize='trigram') and what it actually does under the hood

Related reading

9 silent-row-loss fixes in 7 days across 7 OSS databases

SQLite Ecosystem: RTree/JSON Bugs, ON CONFLICT DO SELECT & PG Query Planning

SQLite `ON CONFLICT DO SELECT` Proposal, PostgreSQL 19 Features & SQLite…

SQLite Internals, PostgreSQL Performance & Multi-Tenancy Patterns

SQLite Journaling on SMB, TypeGraph for SQL Graphs, Cross-Engine Migrations

DuckDB 1.4.5 LTS, pgEdge ColdFront Beta, and SQLite's FCNTL_PDB Internals
