
Another datapoint, a first-hand report from a random internet person: the 7B model took 2.5-3 hours on an 8xA100 80GB setup. If training time scales linearly with parameter count, that works out to roughly 27 hours for the 65B model. Depending on the host and whether it's a preemptible instance, that could be about $12-30 per hour.
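For a rough sense of the total cost, here's the back-of-envelope math as a sketch (assuming linear scaling with parameter count and the hourly rates quoted above; the 3-hour figure and rates are just the numbers from this thread, not measurements of mine):

    # Rough cost estimate, assuming training time scales linearly
    # with parameter count and using the 8xA100 rates quoted above.
    hours_7b = 3.0                    # reported: ~2.5-3 hrs for 7B
    hours_65b = hours_7b * 65 / 7     # ~28 hrs if scaling is linear
    for rate in (12, 30):             # $/hr, preemptible vs. on-demand (rough)
        print(f"65B at ${rate}/hr: ~${hours_65b * rate:.0f}")
    # -> roughly $330-840 for a single 65B finetuning run
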


Have a link? I haven't seen any finetuning scripts in the wild that train a PEFT model on a multi-GPU setup yet and would love to play around with one.


The original Alpaca repo has the training script. The README has the torchrun command and arguments used for train.py. https://github.com/tatsu-lab/stanford_alpaca/blob/main/train...


Awesome, thank you!



