Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster