Abstract: General matrix-matrix multiplication (GEMM) on graphics processing units (GPUs) has long been the target of both tuning to find the fastest kernel for a given GPU and hand-written assembly to achieve ...