Architecture and Compiler Optimization for Data Bandwidth Improvement in Configurable Processors