This is commendable, but there's room for improvement. Up until now, SOTA-level "open-source" LLMs (LLaMA, Mistral, etc.) have usually only made their inference code and model architecture public. While these elements are not insignificant, they are somewhat trivial compared to the training code and training datasets, as those two factors largely determine the model's performance. That is not open at all. It goes without saying that sharing the training datasets and process with other AI researchers is crucial. This transparency would not only help improve the model (since others could contribute to it) but also benefit the whole community, as these releases usually advertise. Otherwise, it will be difficult for these efforts to truly advance the development of LLMs.
Maybe it’s just that my monitors are too dark, but when I try this everything just looks too washed out and I can’t really tell what I’m looking at. Switching to light mode was more a side effect of the increased lighting than anything.
I'm curious about what makes this project special, since there are a lot of similar implementations of diffusion models based on PyTorch/TF. Is it because it uses the CPU itself to run the diffusion process?
Yeah. For something like this, you'd ideally want a powerful GPU with 12-24 GB of VRAM. If you have something like an RTX 2070 at the bare minimum, you probably don't need this and could do a lot more steps a lot faster on a GPU, but it's great for those who don't have that option.
Yep, 8 GB works fine. The 2070 is where I started. I wouldn't consider it ideal, though. There will be cases where you'll wish you could increase the resolution a little more, or do just a few more images per batch, but you'll be hitting CUDA out-of-memory errors instead.
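Not from the thread above, but the usual workaround for those OOM errors is to retry with a smaller batch. A minimal Python sketch of that pattern, where `generate_batch` is a hypothetical stand-in for whatever function actually runs the model (in PyTorch you'd catch `torch.cuda.OutOfMemoryError`, modeled here with a plain `RuntimeError` so the sketch stays self-contained):

```python
def generate_with_fallback(generate_batch, batch_size):
    """Try the requested batch size, halving it until generation succeeds."""
    while batch_size >= 1:
        try:
            return generate_batch(batch_size)
        except RuntimeError:  # stands in for a CUDA out-of-memory error
            batch_size //= 2  # halve the batch and retry with less memory pressure
    raise RuntimeError("out of memory even at batch size 1")
```

Once you're already at batch size 1, the next levers are lowering the resolution or enabling library-specific tricks like fp16 weights or attention slicing, but those depend on the framework you're using.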