It’s a bit foolish though, because “regression” will always identify a pattern t...

pjmorris · on Oct 12, 2018

I guess what I'm trying to say is that algorithms can't tell whether they're fooling themselves. Someone has to apply the 101 techniques for testing fit, etc. Humans at least have the opportunity, though, as you point out, they don't always take it.
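As a rough illustration of an algorithm "fooling itself", here is a minimal sketch in which both the target and all 200 candidate predictors are pure noise. The function names, sample sizes, and seed are assumptions made for the example, not anything from the thread. Picking the predictor that correlates best in a small sample always turns up something that looks like a pattern; a held-out check, one of the many fit tests mentioned above, shows there is nothing there.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_noise_predictor(n_train=30, n_holdout=2000, n_predictors=200, seed=0):
    """Select the noise predictor that best 'explains' a noise target in-sample.

    Returns (in-sample r, held-out r) for the selected predictor.
    All sizes and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = n_train + n_holdout
    target = [rng.gauss(0, 1) for _ in range(n)]
    best_train_r, best_holdout_r = 0.0, 0.0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n)]
        train_r = pearson_r(predictor[:n_train], target[:n_train])
        if abs(train_r) > abs(best_train_r):
            best_train_r = train_r
            # Score the same predictor on data it was not selected on.
            best_holdout_r = pearson_r(predictor[n_train:], target[n_train:])
    return best_train_r, best_holdout_r


train_r, holdout_r = best_noise_predictor()
print(f"in-sample r = {train_r:.2f}, held-out r = {holdout_r:.2f}")
```

The selection step itself has no way of knowing the in-sample correlation is spurious; only the extra check, which someone has to decide to run, exposes it.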

candiodari · on Oct 13, 2018

Maybe it's just me, but to me it seems like 99%+ of humans don't check whether they're fooling themselves, even when using statistics.

Have you ever known anyone to check whether the central limit theorem applies before taking an average? I mean, we did it once when learning what it was and why you might want to check, but ...
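For a sense of what skipping that check can cost, here is a small sketch comparing sample means from a uniform distribution (finite variance, so the central limit theorem applies) with sample means from a standard Cauchy distribution (no finite mean or variance, so it does not). The helper names and simulation sizes are assumptions for illustration. The spread of the averages is measured with the interquartile range.

```python
import math
import random


def draw_uniform(rng):
    return rng.random()


def draw_cauchy(rng):
    # Inverse-CDF sampling of a standard Cauchy variate.
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_sample_means(draw, sample_size, repetitions=2000, seed=0):
    """Interquartile range of the mean of `sample_size` draws, over many repetitions."""
    rng = random.Random(seed)
    means = sorted(
        sum(draw(rng) for _ in range(sample_size)) / sample_size
        for _ in range(repetitions)
    )
    return means[(3 * repetitions) // 4] - means[repetitions // 4]


for name, draw in (("uniform", draw_uniform), ("cauchy", draw_cauchy)):
    spread_1 = iqr_of_sample_means(draw, sample_size=1)
    spread_100 = iqr_of_sample_means(draw, sample_size=100)
    print(f"{name}: IQR of means, n=1: {spread_1:.3f}, n=100: {spread_100:.3f}")
```

Averaging 100 uniform draws shrinks the spread by roughly a factor of ten, as expected. For the Cauchy draws the average of 100 values is about as spread out as a single value, so taking the mean buys nothing.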

The problem with statistics is that, in theory, they don't work in the real world. For instance, say you measure a statistical variable. Great. Now you fix something in the real world and recheck your variable. BZZT, wrong! You can't measure a variable after you've tried to influence it, because obviously you're not measuring the same thing anymore. So there is (potentially) no relationship whatsoever between the measurement after the change and the measurement before. So ... statistics CANNOT correctly be used to improve things in the real world.

But ... have you ever known anyone to use statistics any other way? Also: we don't actually have anything better.

The thing is ... it mostly works in practice, though you can come up with examples where it doesn't.

And of course you can do things very wrong, as you're just adding, multiplying and so on. That works on any set of numbers.

The thing about machine learning is that a well-designed machine learning algorithm contains far fewer details about the problem than a statistical model. So people far less versed in the problem being analyzed can improve things more using machine learning than by using statistics. But the potential maximum improvement you could ever hope to make is going to be higher with statistics. Compare a second-degree regression to an LSTM for a time series. ASSUMING the statistical model works at all, it'll beat the crap out of the LSTM. But the LSTM will sort-of succeed in nearly all cases. So if the variable fits the information you stuck into your statistical model, there's no beating that model (in this case, that the data is generated by a second-degree process with a not-too-close-to-zero determinant).
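The trade-off can be sketched without training an actual LSTM. In the example below, a k-nearest-neighbour average stands in for the "knows nothing about the problem" model. That is a deliberate simplification and not an LSTM. The quadratic fit plays the role of the statistical model. The function names, the two generating processes, the noise level, and the seeds are all assumptions for illustration. Error is measured against the known noise-free signal at points between the training samples.

```python
import math
import random


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the 3x3 normal equations."""
    # Augmented matrix [X^T X | X^T y] for the basis (1, x, x^2).
    rows = [[0.0] * 4 for _ in range(3)]
    for x, y in zip(xs, ys):
        basis = (1.0, x, x * x)
        for i in range(3):
            for j in range(3):
                rows[i][j] += basis[i] * basis[j]
            rows[i][3] += basis[i] * y
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    a, b, c = (rows[i][3] / rows[i][i] for i in range(3))
    return lambda x: a + b * x + c * x * x


def fit_knn(xs, ys, k=4):
    """Generic stand-in model: average the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def compare(true_fn, seed, noise_sd=0.3):
    """Mean squared error of each model against the noise-free signal."""
    rng = random.Random(seed)
    train_x = [i * 0.05 for i in range(201)]
    test_x = [i * 0.05 + 0.025 for i in range(200)]
    train_y = [true_fn(x) + rng.gauss(0, noise_sd) for x in train_x]
    errors = {}
    for name, fit in (("quadratic", fit_quadratic), ("knn", fit_knn)):
        model = fit(train_x, train_y)
        errors[name] = sum((model(x) - true_fn(x)) ** 2 for x in test_x) / len(test_x)
    return errors


quadratic_case = compare(lambda x: 0.5 * x * x - 2 * x + 1, seed=1)
sine_case = compare(lambda x: math.sin(2 * x), seed=2)
print("second-degree process:", quadratic_case)
print("sine process:", sine_case)
```

When the data really come from a second-degree process, the quadratic fit recovers the signal far more accurately than the generic model. When they do not, the quadratic fit fails badly while the generic model still does tolerably, which is the pattern described above.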

Issue for the future is that all interesting problems are beyond the comprehension of any human, so ... machine learning will win. This means humans can't make statistical models for them either. It'll win not because it is always the best solution, but because for so many problems (you might as well say for all problems) we will never find anything remotely optimal, or understand enough to even figure out how to apply statistics to them.

pjmorris · on Oct 14, 2018

> Issue for the future is that all interesting problems are beyond the comprehension of any human, so ... machine learning will win.

I agree that there's a set of problems that are both beyond human comprehension and interesting to humans. Specifying them, measuring the results of algorithms to solve them, and paying for the results will probably have to remain within human analytical capability, or you wind up with Skynet (unlikely), or some analog to the 'gray goo' problem, where machines are optimizing with unintended consequences.