The universal approximation theorem is not as powerful as it sounds. Polynomials essentially satisfy it as well [0], the only hiccup being that the Universal Approximation Theorem is explicitly about neural networks.
The UAT is an existence proof; it says nothing about any particular method being capable of constructing such a network. In contrast, with polynomials we have several construction methods that are proven to converge to the desired function.
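For a concrete example of such a constructive method: Bernstein polynomials, which underlie one standard proof of the Weierstrass theorem, are built directly from samples of the target function and converge uniformly for any continuous function on [0, 1]. A quick pure-Python sketch (the `bernstein` helper is my own illustration, not a library API):

```python
import math

def bernstein(f, n, x):
    # Degree-n Bernstein polynomial of f evaluated at x in [0, 1].
    # B_n(f, x) = sum_k f(k/n) * C(n, k) * x^k * (1 - x)^(n - k).
    # For continuous f this converges uniformly to f as n grows.
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )

f = lambda x: abs(x - 0.5)  # continuous but not smooth
xs = [i / 100 for i in range(101)]
for n in (10, 40, 160):
    err = max(abs(bernstein(f, n, x) - f(x)) for x in xs)
    print(f"degree {n:3d}: max error {err:.4f}")
```

The error shrinks monotonically with the degree, which is exactly the guarantee the UAT lacks: a recipe, not just an existence statement.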
Indeed, polynomials have been widely used as universal approximators for centuries now, and are often amazingly successful. However, polynomials in this context are only good at low degrees, where they are inherently limited in how well they can approximate [1]. Beyond a certain point, increasing your degrees of freedom with polynomial approximators simply does not help and is generally counterproductive, even though a higher-degree polynomial is strictly more powerful than a lower-degree one.
Looking at the current generative AI breakthrough, the UAT would say that today's transformer-based architecture is no more powerful than a standard neural net. However, it produces vastly superior results that could simply not be achieved by throwing more compute at the problem.
Sure, if you have an infinite dataset and infinite compute, you might have AGI. But at that point, you have basically just replicated the Chinese room thought experiment.
[0] See the Stone–Weierstrass theorem
[1] They are also used as arbitrary-precision approximators, but that is when you compute them analytically instead of interpolating them from data.