I'm so confused. I just found a small body of literature applying LLMs to reinforcement-learning-style tasks, exploring the use of LLMs for "autonomous decision making."
I guess people are building more and more LLM agent systems, and we ought to understand them and what makes them better or worse at what they do.
But I still feel like LLMs are fundamentally unsuited to decision-making tasks. They don't weigh options and decide. At best, you could say they interpolate what a reasonable choice might look like, based on the examples of people making choices in their training data.
That's... really not the same thing! Like, not at all. It's impressive that this sometimes works, but it seems silly to me when we could be using actual RL systems that genuinely learn from experience, with mathematical rigor behind their estimates of the quality of each choice.
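To make the contrast concrete, here's a minimal sketch of the kind of RL system I mean: tabular Q-learning on a tiny, made-up chain environment (the environment, state count, and hyperparameters are all illustrative, not from any of the papers). The point is that the agent's Q-values are explicit estimates of the expected return of each choice, learned from its own experience via the Bellman update, rather than interpolated from examples of other people choosing.

```python
import random

# Toy illustration: tabular Q-learning on a 5-state chain.
# States 0..4; action 0 = left, 1 = right; reward 1.0 only on reaching state 4.
# Q(s, a) is the agent's estimate of the expected discounted return of
# taking action a in state s -- the "quality of the choice," learned from
# experience, not pattern-matched from text.

N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor
ALPHA = 0.5       # learning rate
EPSILON = 0.1     # exploration rate

def step(state, action):
    """Deterministic chain dynamics: right moves toward the goal at state 4."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit current value estimates,
            # occasionally explore
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # one-step temporal-difference update toward the Bellman target
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q

q = train()
# After training, "right" is valued above "left" in every non-terminal state,
# and the values decay geometrically with distance from the goal.
```

That geometric decay (Q(3, right) near 1.0, Q(2, right) near 0.9, and so on) is the kind of principled, inspectable quantity I mean: the system can tell you *how much better* one choice is than another, under its own learned model of the task.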