Another datapoint: a firsthand report from a random internet person that 7B took 2.5-3 hrs on an 8xA100 80GB setup. If training time scales roughly linearly with parameter count, that works out to ~27 hours for the 65B model. Depending on the host and whether it's a preemptible instance or not, that could be about $12-30 per hour.
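Rough back-of-envelope, assuming the linear scaling above holds (it's a big assumption, since batch size, sequence length, and memory pressure all change at 65B):

```python
# Back-of-envelope extrapolation: assumes training time scales linearly with parameter count.
hours_7b = (2.5, 3.0)          # reported range for the 7B run on 8xA100 80GB
scale = 65 / 7                 # ~9.3x more parameters in the 65B model
hours_65b = tuple(h * scale for h in hours_7b)
cost_per_hour = (12, 30)       # depends on host and preemptible vs. on-demand pricing

print(hours_65b)                                    # roughly (23.2, 27.9) hours
print(hours_65b[0] * cost_per_hour[0],              # ~$280 at the cheap end
      hours_65b[1] * cost_per_hour[1])              # ~$840 at the expensive end
```

So somewhere in the ~$300-850 range for a single 65B run, if nothing else blows up.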
Have a link? I haven't seen any fine-tuning scripts in the wild that train a PEFT model on a multi-GPU setup yet and would love to play around with one.
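For what it's worth, I'd expect something along these lines to be a starting point: a minimal sketch using the Hugging Face peft + transformers Trainer stack, launched with torchrun so the Trainer handles DDP across GPUs. The checkpoint name, dataset file, and hyperparameters below are all placeholders, not from the report above.

```python
# Hypothetical sketch of multi-GPU LoRA fine-tuning with HF Trainer + peft.
# Launch with: torchrun --nproc_per_node=8 train_lora.py
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "path/to/llama-7b-hf"  # placeholder: local or hub checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizer has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the base model with LoRA adapters so only a small set of weights trains.
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, typical targets for LLaMA
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Toy text dataset + tokenization; swap in real instruction-tuning data.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Trainer wraps the model in DDP automatically when launched with torchrun.
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lora-out",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

No idea if that matches whatever script produced the 2.5-3 hr number, so a link to the actual setup would still be great.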