The tokens-per-byte trap: character-level 'compression' adds tokens
Deleting 25% of the characters from your LLM context makes the file 22% smaller and the prompt 23% to 66% larger in tokens. Here is what the tokenizer is doing, with the literature trail.