Bayesian statistics: the parameters you want to infer are modeled as random variables with a non-empirical prior, and Bayes' rule is used to draw inferences.
Frequentist statistics: you construct estimators for the parameters you care about, subject to appropriate loss/risk criteria, but without any explicit "prior knowledge".
Frequentist statistics with Bayes' theorem: you use available empirical data, plus some exponential-family distribution, to construct an informed prior, then use Bayes' rule to update the prior on evidence. You use this Bayesian approach only for unobservable hypotheses, rather than for parameters which can be estimated.
Machine learning: you stack lots and lots of polynomial regressors onto each other and train them with a loss function until they predict well on the validation set.
A more charitable take on machine learning: you decide that your criterion is predictive accuracy, and you evaluate it on a holdout set (or you cross-validate).
The idea of evaluation on a holdout set is actually frequentist: it's equivalent to "I really want my model to work well on the true distribution, but that's unknown, so I shall approximate it by the empirical distribution of the data." The empirical distribution is the maximum likelihood fit to the data, if you allow yourself the entire space of distributions.
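To make that concrete, here is a minimal sketch of holdout evaluation as a frequentist risk estimate, with the empirical distribution of the held-out points standing in for the true (unknown) distribution. The data, the model, and the split ratio are all invented for illustration:

```python
import random

# Toy data from y = 2x + noise; seed fixed so the run is reproducible.
random.seed(0)
xs = [i / 10 for i in range(100)]
data = [(x, 2 * x + random.gauss(0, 0.5)) for x in xs]
random.shuffle(data)
train, holdout = data[:80], data[80:]

# Fit a through-origin least-squares line on the training split only.
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

# Holdout MSE: an average over the empirical distribution of the held-out
# points, used as a proxy for risk under the true distribution.
mse = sum((y - slope * x) ** 2 for x, y in holdout) / len(holdout)
print(f"slope ~ {slope:.2f}, holdout MSE ~ {mse:.3f}")
```

The point is only that the last line is a frequentist estimator of risk; nothing Bayesian happens anywhere.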
Compare to how Bayesians do model selection... I've seen several versions:
-- "I have a prior on the set of models, and I compute the model evidence using Bayesian principles, and thereby update my beliefs about the set of models." (This is a clean principled approach. Shame no one does it!)
-- "I compute model evidence using Bayesian principles. The model with the largest evidence is my favoured model." (This is nonsense.)
-- "I compute model evidence. I then use gradient descent to find the hyperparameter values that maximize evidence." This is what is done by all sorts of "Bayesian" frameworks, such as the Gaussian Process models in sklearn. (This is classic frequentism, but for some strange reason Bayesians claim it as their own.)
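For what it's worth, that third recipe (type-II maximum likelihood, a.k.a. empirical Bayes) is easy to demonstrate outside of GPs. Here is a sketch with a conjugate beta-binomial model, where the evidence is available in closed form; the data and the hyperparameter grid are made up, and I use grid search instead of gradient descent purely for simplicity, but it is the same move sklearn's GPs make over kernel hyperparameters:

```python
from math import lgamma, log

def log_evidence(k, n, a, b):
    """Log marginal likelihood ("evidence") of k successes in n Bernoulli
    trials under a Beta(a, b) prior; the binomial coefficient is dropped
    because it is constant in (a, b)."""
    log_beta = lambda p, q: lgamma(p) + lgamma(q) - lgamma(p + q)
    return log_beta(k + a, n - k + b) - log_beta(a, b)

k, n = 7, 10  # made-up data: 7 successes in 10 trials

# "Type-II maximum likelihood": pick the prior hyperparameter that
# maximizes the evidence -- an optimization over priors, which is exactly
# the frequentist move described above.
grid = [c / 10 for c in range(1, 101)]
best = max(grid, key=lambda c: log_evidence(k, n, c, c))
print(f"Beta({best}, {best}) maximizes the evidence")
```

Note that the prior is being *fit to the data*, which is precisely why calling the result a prior (or the procedure Bayesian) is a stretch.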
I certainly wouldn't argue that "predictive accuracy" is the be-all and end-all of modelling -- but it is a nice clean principled approach to model selection. I have honestly never seen a Bayesian who takes a principled approach to model selection.
> A more charitable take on machine learning: you decide that your criterion is predictive accuracy, and you evaluate it on a holdout set (or you cross-validate).
I'm doing a PhD in machine learning, so I'm quite aware. But it's Bayesian machine learning!
Bayesian statistics is sometimes called subjectivist statistics. Probability in Bayesian statistics reflects your degree of belief in some potential outcome.
If you conduct an experiment, you use Bayes’ theorem to update your degree of belief, which is now conditional on the outcome of your experiment.
By quantifying your degree of belief in a prior, you give yourself some starting point (rather than just assuming 0 probability), even if that prior is only an educated guess and not some well researched position. This can be good because you might not have done the research yet.
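That update is easiest to see with a conjugate pair. A minimal sketch, where the coin, the prior, and the observed counts are all invented for illustration:

```python
# Prior degree of belief about a coin's heads probability: Beta(2, 2),
# a mild educated guess centred on fairness, not a well-researched position.
prior_a, prior_b = 2, 2

# Outcome of the experiment: 10 flips.
heads, tails = 7, 3

# With a Beta prior and a binomial likelihood, Bayes' theorem reduces to
# adding the observed counts to the prior pseudo-counts.
post_a = prior_a + heads
post_b = prior_b + tails

posterior_mean = post_a / (post_a + post_b)
print(f"posterior: Beta({post_a}, {post_b}), mean = {posterior_mean:.3f}")
```

Your degree of belief, now conditional on the experiment, has shifted from 0.5 toward the observed frequency, with the prior acting as the starting point.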