I’ve been building a personal AI assistant for the past few months. You know the kind: you chat with it, it remembers what you said, and it helps with tasks like summarizing emails, answering questions about your notes, or just being a sounding board.
It started as a weekend project. A few Python scripts, an OpenAI-compatible API endpoint, and a simple loop in the terminal. I was smug. "Look, I built an AI!" But then things got ugly.
The moment I started having longer conversations, the bot became useless. It would forget what I said three messages ago, contradict itself, or start repeating the same advice. I was throwing more and more tokens at the API, and my wallet was crying. Something had to change.
The naive approach (and why it failed)
My first attempt was trivial: just append every new message to a list and send the whole history as the messages array to the API. That worked… for about 10 exchanges. Then token limits kicked in. The API started truncating the oldest messages, breaking the conversation flow.






