The idea is way older than both leetcode and reinforcement learning and is used everywhere (for example, when planning SQL queries). If reinforcement learning invented a new way to use the term, then that is all their fault, because leetcode is true to the original meaning.
Both meanings derive from Bellman. Reinforcement learning did not invent a new usage of the term, and the field of reinforcement learning predates leetcode by a large margin.
So what is the confusion then? Leetcode got it from real computer science and engineering, and the original meaning was retained at both steps without any changes. Every bigger piece of software you use probably uses dynamic programming somewhere. Or do you think that e.g. PostgreSQL is leetcode?
There is nothing derived here. It is exactly the same meaning.
Perhaps we misunderstand each other. I'm saying that the mathematical optimization method differs from the algorithmic paradigm, as explained on Wikipedia, for example: https://en.wikipedia.org/wiki/Dynamic_programming
That is to say, they serve different purposes and are distinct concepts. They do share the same foundation laid out by Bellman, but the applications have diverged quite a bit.
Do you mean to say that these two concepts also have exactly the same meaning?