
Are you able to share what functions or situations result in speedups? In my experience, vectorized numpy is already fast, so I'm very curious.


The largest speedup I have seen was for a quantum mechanics simulation where I needed to repeatedly calculate all eigenvalues of Hermitian matrices (but not necessarily their eigenvectors).

This was basically the code needed:

    import scipy.linalg as la

    if cuda:  # flag selecting the GPU path
        import cupy as cp
        import cupy.linalg as cla

        # Round-trip: copy H to the GPU, diagonalize there, copy the result back
        ε = cp.asnumpy(cla.eigvalsh(cp.asarray(H)))
    else:
        ε = la.eigvalsh(H)
I was using IntelPython, which already has fast, parallelized MKL-backed methods for this, but CuPy blew it out of the water.
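For anyone who wants to reproduce this kind of comparison, here is a minimal benchmark sketch. The matrix size and the random Hermitian test matrix are my assumptions, not the original workload, and the GPU path only runs if CuPy is installed:

    import time
    import numpy as np
    import scipy.linalg as la

    # Random Hermitian matrix as a stand-in for a real Hamiltonian
    n = 512
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (A + A.conj().T) / 2

    t0 = time.perf_counter()
    eps_cpu = la.eigvalsh(H)  # eigenvalues only, ascending order
    cpu_time = time.perf_counter() - t0

    try:
        import cupy as cp
        import cupy.linalg as cla
        t0 = time.perf_counter()
        eps_gpu = cp.asnumpy(cla.eigvalsh(cp.asarray(H)))
        print(f"CPU: {cpu_time:.3f}s, GPU: {time.perf_counter() - t0:.3f}s")
    except ImportError:
        print(f"CPU: {cpu_time:.3f}s (CuPy not installed, skipping GPU path)")

Note that a fair GPU timing should exclude the one-time CUDA context/kernel warm-up, so in practice you'd run the GPU branch once before timing it.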


What did you need the eigenvalues for? I would've guessed exact diagonalization, but in that case you would need the eigenvectors.


In quantum mechanics, you use a “Hamiltonian matrix” H to encode, in some sense, “everything that a particle in your system is allowed to do” along with some energy associated with that. For instance, an electron in a metallic crystal is allowed to “hop” over to a neighboring atom and that is associated with some kinetic energy. Or it is in some cases allowed to stay on the same atom as another electron, and that is associated with a potential energy (Coulomb repulsion).

The eigenvalues of this matrix are the answer to “what are the energies of each stable electron state in this system”. If you know how many electrons you have (they tend to fill the lowest energy states they can at zero temperature), and you know what temperature you have (which gives you the probability of each “excited” state being occupied), then you can say a lot about the system. For instance, you can say what physical state lowers the “free energy” of the electrons at a given temperature (which can be used to predict phase transitions and spin configurations), or what the “density of states” is (which can be used to predict electronic resistance). You can also obtain the system’s entropy from the eigenvalues alone.
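To make the "eigenvalues alone are enough" point concrete, here is a sketch of how occupations and entropy follow from the spectrum via the Fermi-Dirac distribution. The toy spectrum, chemical potential, and temperature values are illustrative assumptions:

    import numpy as np

    def fermi_dirac(eps, mu, kT):
        """Occupation probability of a state with energy eps."""
        return 1.0 / (np.exp((eps - mu) / kT) + 1.0)

    def electronic_entropy(eps, mu, kT):
        """Entropy (in units of k_B) from the eigenvalues alone."""
        f = fermi_dirac(eps, mu, kT)
        f = np.clip(f, 1e-15, 1 - 1e-15)  # guard against log(0)
        return -np.sum(f * np.log(f) + (1 - f) * np.log(1 - f))

    eps = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # toy spectrum
    occ = fermi_dirac(eps, mu=0.0, kT=0.1)       # states below mu are ~filled
    S = electronic_entropy(eps, mu=0.0, kT=0.1)

A state exactly at the chemical potential is half-occupied, and as kT → 0 the occupations sharpen to a step function and the entropy goes to zero.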

There are however many cases where you might need eigenvectors too, since they usually provide all the spatial information about “where in your system is this stuff happening”. When I need the eigenvectors, CuPy is still hundreds of times faster on my hardware, but the gap is just not as extreme as it was for pure eigenvalue calculation in my benchmarks.
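As a small illustration of what the eigenvectors add: in a toy tight-binding chain (my example, not the parent's actual model), the squared eigenvector components |ψ[site, k]|² tell you where each energy eigenstate lives in space:

    import numpy as np

    # Toy 1D tight-binding chain: nearest-neighbour hopping, amplitude t = 1
    n = 6
    H = np.zeros((n, n))
    for i in range(n - 1):
        H[i, i + 1] = H[i + 1, i] = -1.0

    eps, psi = np.linalg.eigh(H)  # eigenvalues AND eigenvectors
    # |psi[site, k]|^2 answers "where in the chain does state k live?"
    weight = np.abs(psi) ** 2

The GPU analogue is cupy.linalg.eigh, which has the same interface; computing the eigenvectors as well is what makes this path slower than the eigenvalues-only one.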


Not OP, but think about stuff like FFTs or matmuls. It's not even a competition: GPUs win whenever the algorithm is reasonably suitable and you're dealing with FP32 or lower precision.
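A nice property of CuPy here is that it mirrors the NumPy API, so the same matmul/FFT code can run on either backend. A minimal sketch (the CPU fallback is mine, so it still runs without a GPU):

    import numpy as np

    try:
        import cupy as xp  # GPU path if CuPy is installed
    except ImportError:
        xp = np            # CPU fallback, identical API

    # FP32 dense matmul: the canonical GPU-friendly workload
    a = xp.ones((256, 256), dtype=xp.float32)
    b = xp.eye(256, dtype=xp.float32)
    c = a @ b

    # FFT of a constant signal: all energy lands in the zero-frequency bin
    spec = xp.fft.fft(xp.ones(8))

At FP32 and below the gap widens further because consumer GPUs typically have far more FP32 than FP64 throughput.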



