High-Performance CUDA Kernel Execution on FPGAs