There was this very interesting paper out of Stanford this last September about pretraining under the unlimited compute but limited data paradigm[0]. Pretty much exactly the same thing but with ~200M training tokens instead.
I see you already mention diffusion - iirc there was a result not too long ago that diffusion models keep improving with more epochs for longer than AR models do.
diffusion is promising, but it's still an open question how much more data-efficient it is than AR. in practice, you can also train AR forever with high enough regularization, so let's see.
Not understanding the difference between this and something like cargo audit[0]. I suppose it has something to do with "static analysis of vulnerabilities" but I don't see any of that from a quick google search of govulncheck.
govulncheck analyzes symbol usage and only warns if your code reaches the affected symbol(s).
I’m not sure about cargo audit specifically, but most other security advisories are package scoped and will warn if your code transitively references the package, regardless of which symbols your code uses.
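To make the distinction concrete, here is a toy sketch of the two matching strategies. It is not how govulncheck is implemented: the package names, symbols, advisory, and call graph are all made up for illustration, and a real tool would build the call graph through static analysis of the source or binary.

```python
# Toy model only: every package, symbol, and advisory here is hypothetical.
ADVISORY = {"package": "example.com/yamlparse", "symbols": {"Decode"}}

# Packages the program depends on, directly or transitively.
DEPENDENCIES = {"example.com/yamlparse", "example.com/jsonutil"}

# Call graph of the program: caller -> set of callees.
CALL_GRAPH = {
    "main.main": {"main.load", "example.com/yamlparse.Encode"},
    "main.load": {"example.com/jsonutil.Parse"},
}


def reachable(graph, root):
    """Return every symbol reachable from root via a depth-first walk."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


def package_scoped_hit(dependencies, advisory):
    """Warn whenever the affected package is in the dependency set."""
    return advisory["package"] in dependencies


def symbol_scoped_hit(graph, root, advisory):
    """Warn only if an affected symbol is reachable from the entry point."""
    targets = {f'{advisory["package"]}.{name}' for name in advisory["symbols"]}
    return bool(reachable(graph, root) & targets)


if __name__ == "__main__":
    print("package-scoped:", package_scoped_hit(DEPENDENCIES, ADVISORY))
    print("symbol-scoped:", symbol_scoped_hit(CALL_GRAPH, "main.main", ADVISORY))
```

In this made-up program the package-scoped check fires because `yamlparse` is a dependency, while the symbol-scoped check stays quiet because only `Encode` is ever called and the advisory covers `Decode`.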
Thought this was AWS ECS – turns out it was much more interesting! I love entity component systems conceptually, glad to see data oriented design being discussed (even if it's not necessarily positive)
I like seeing this analysis on new model releases, any chance you can aggregate your opinions in one place (instead of the hackernews comment sections for these model releases)?