Last night I ran an experiment. The result surprised me.
I was testing RWKV — an architecture that works differently from standard transformers. Instead of stuffing the entire conversation history into a context window, it maintains a fixed-size "state matrix." Every token processed updates that matrix. Old information isn't deleted — it gets overwritten by new information.
My question: how long can this state actually hold something?
I designed a simple test. At the start of a conversation, I told the model: "My name is Xiao Ming, and I'm a chef." Then I chatted about other things. At the end, I asked: "Do you remember my profession?"
8 rounds of conversation: 100% recall.












