Definitions
Let's define standard reinforcement learning terms with an LLM setup in mind.
State $s_t$: The current context, which is the original user prompt plus all tokens generated so far
Example: Prompt: "The sky is..." → State: ["The", "sky", "is"] in token space
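To make the state definition concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `toy_tokenize` is a hypothetical word-level stand-in for a real subword tokenizer, and the `State` class, its `step` method, and the generated token "blue" are names and values invented for this example rather than taken from any library.

```python
import re
from dataclasses import dataclass, field
from typing import List


def toy_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for a real tokenizer: keeps runs of word
    # characters and drops punctuation such as the trailing "...".
    return re.findall(r"\w+", text)


@dataclass(frozen=True)
class State:
    # s_t = prompt tokens + all tokens generated so far.
    prompt_tokens: List[str]
    generated_tokens: List[str] = field(default_factory=list)

    @property
    def tokens(self) -> List[str]:
        # The full context the model conditions on at this step.
        return self.prompt_tokens + self.generated_tokens

    def step(self, token: str) -> "State":
        # Appending one generated token yields the next state;
        # the current state is left unchanged.
        return State(self.prompt_tokens, self.generated_tokens + [token])


s0 = State(toy_tokenize("The sky is..."))
s1 = s0.step("blue")  # "blue" is an assumed next token, for illustration

print(s0.tokens)  # ['The', 'sky', 'is']
print(s1.tokens)  # ['The', 'sky', 'is', 'blue']
```

The point of the sketch is that the state grows by exactly one token per generation step, so the state at any step is fully determined by the prompt and the tokens emitted so far.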






