The PITA part of condor/grid before containers was software management. Sure, everyone was running at least RHEL4/5/6 (or SL4/5/6), in many cases AFS worked, and the more advanced operators were adding VM execution, but it was (and still is) annoying to deal with. Most annoying right now is that nobody can agree on a container runtime - there’s Docker/Shifter, Singularity, Charliecloud. It should all just be podman by now, but everybody is still holding on.
(I have worked with Slurm, Condor, Torque/PBS, Grid Engine, DIRAC, and LSF)
Plugging a different scheduler into k8s might be an interesting way of solving this - there seemed to be a lot of work on scheduler plugins when I last looked. Some of the issues are similar in cloud too, e.g. coscheduling by latency.
There’s at least some incentive to not solve this - I remember k8s on Mesos being popular and of course we know how that played out.