Why pruning OpenAI reasoning models is so hard
Unlike the Chat Completions API, the newer Responses API is stateful: you don't need to resend the full message payload on every request. OpenAI stores the payload server-side and gives you item references to pass back and forth.
While the Responses API is advertised as a performance improvement, it's practically a necessity for reasoning models in agentic workflows, because OpenAI doesn't expose the raw reasoning tokens the way other providers do. The raw reasoning is stored by OpenAI, and you reference it with an ID starting with rs_.
There's an important caveat: reasoning items and everything that follows (function calls, messages, or both) form a group. Each item in the group has an id that ties it back to the preceding reasoning item.
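As an illustration of that grouping, here is what a tool-calling turn's history might look like as plain data. The IDs and field values are made up for the example, but the shape mirrors the item types described above: the raw chain of thought stays server-side, and only the rs_ reference appears in the payload.

```python
# Illustrative conversation history for one tool-calling turn.
# All ids are fabricated; only the item shapes matter here.
history = [
    {"type": "message", "role": "user", "content": "What's the weather in Oslo?"},
    # The raw reasoning is stored by OpenAI; clients only ever see the id.
    {"type": "reasoning", "id": "rs_abc123", "summary": []},
    # This function_call belongs to the reasoning item's group via its id.
    {"type": "function_call", "id": "fc_def456", "call_id": "call_1",
     "name": "get_weather", "arguments": '{"city": "Oslo"}'},
]

# Only rs_ ids travel over the wire, never the reasoning tokens themselves.
reasoning_ids = [item["id"] for item in history if item.get("type") == "reasoning"]
print(reasoning_ids)  # -> ['rs_abc123']
```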
If you keep the reasoning item but remove other items from its group (a function_call or a message), you'll get errors like:
400 Item 'fc_...' of type 'function_call' was provided without its required 'reasoning' item: 'rs_...'
If you try to prune by removing just the reasoning item, you'll get:
400 Item 'rs_...' of type 'reasoning' was provided without its required following item.
You cannot selectively alter this group: removing a single item while keeping the others' IDs breaks the pairing. The only safe way to prune old turns is to strip both the reasoning item AND the id fields from every item that followed it. The practical consequence is that you send the full message payload again instead of referencing stored items.
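That stripping step can be sketched as a small helper over the history list. This is a hypothetical function, not part of any SDK; it assumes history items are dicts shaped like the API's output items, drops every reasoning item, and removes the id field from the rest so they are resent as plain payload rather than server-side references.

```python
def prune_group(items):
    """Drop reasoning items and strip id fields so the remaining items
    can be resent as plain payload (no stored-item references).
    Hypothetical helper; item shapes assumed, not from any SDK."""
    pruned = []
    for item in items:
        if item.get("type") == "reasoning":
            continue  # drop the rs_ item entirely
        # Copy without the id so the API treats it as fresh content.
        pruned.append({k: v for k, v in item.items() if k != "id"})
    return pruned

# Fabricated example group: reasoning item plus the items that followed it.
history = [
    {"type": "reasoning", "id": "rs_abc", "summary": []},
    {"type": "function_call", "id": "fc_def", "call_id": "call_1",
     "name": "get_weather", "arguments": "{}"},
    {"type": "message", "id": "msg_ghi", "role": "assistant",
     "content": "It is sunny."},
]

pruned = prune_group(history)
print(len(pruned))  # -> 2
```

Resending the pruned list costs you the payload bytes again, but it avoids both 400 errors above, since no orphaned rs_ or id-bearing item remains.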
The good news is that the model ignores reasoning from past turns anyway; it only matters within a turn, during active tool loops. Pruning old reasoning therefore costs you nothing except some cache efficiency.
Links: