Automating High Level Synthesis via Graph-Centric Deep Learning

Project status: current

Domain-specific accelerators (DSAs) have been shown to offer significant performance and energy-efficiency gains over general-purpose CPUs to meet ever-increasing performance needs. However, it is well known that DSAs in field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs) are hard to design and require deep hardware knowledge to achieve high performance. Although recent advances in high-level synthesis (HLS) tools have made it possible to compile behavioral-level C/C++ programs to FPGA or ASIC designs, one still needs extensive experience in microarchitecture optimization, applying pragmas and code transformations to the input program, which presents a significant barrier for a typical application domain expert or software developer designing a DSA. Even worse, evaluating each HLS design candidate is time-consuming, which makes manual design iteration or automated exploration very difficult. The proposed project addresses these problems by developing a fully automated framework for evaluating and optimizing the microarchitecture of a DSA design without invoking the time-consuming HLS tools. It represents the input C/C++ program as one or more graphs with the proper data-flow and control-flow information, including auto-inserted optimization directives (pragmas), and then draws on the latest advances in graph-based machine learning (ML) and ML-driven optimization to quickly evaluate each solution candidate and guide the optimization process. The goal of this project is to enable a typical software programmer to design highly efficient hardware DSAs with quality comparable to those designed by experienced circuit designers.
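To make the program-as-graph idea concrete, the sketch below encodes a small pragma-annotated loop nest as a graph with data-flow and control-flow edges. The node/edge schema (the `op`, `pragma_unroll`, `"data"`/`"control"` labels) is invented for illustration and does not reflect the project's actual representation.

```python
# Hypothetical sketch: a pragma-annotated C kernel encoded as a graph.
# The node/edge schema here is illustrative only, not the project's format.

# A simple HLS kernel (C source) with an auto-inserted optimization pragma:
kernel_src = """
void vadd(int a[N], int b[N], int c[N]) {
    #pragma HLS unroll factor=4
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}
"""

# Nodes carry operation and pragma features; edges carry data-flow and
# control-flow relations. A GNN surrogate would consume (nodes, edges)
# instead of invoking the HLS tool on kernel_src.
nodes = {
    0: {"op": "loop",  "pragma_unroll": 4},
    1: {"op": "load",  "array": "a"},
    2: {"op": "load",  "array": "b"},
    3: {"op": "add"},
    4: {"op": "store", "array": "c"},
}
edges = [
    (0, 1, "control"),  # the loop body contains the loads
    (0, 2, "control"),
    (1, 3, "data"),     # a[i] and b[i] flow into the add
    (2, 3, "data"),
    (3, 4, "data"),     # the sum flows into c[i]
]

print(len(nodes), "nodes,", len(edges), "edges")
```

Changing a pragma (say, `factor=4` to `factor=8`) changes only a node feature, so each candidate in the design space maps to a slightly different graph that the model can score without rerunning synthesis.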

 

The team, led by Professors Jason Cong and Yizhou Sun of the CS Department, was recently awarded $1.2M from the National Science Foundation (NSF) for the project entitled “High Level Synthesis via Graph-Centric Deep Learning”.

 

Below are the summaries of the papers under this project:

 

1. Automated Accelerator Optimization Aided by Graph Neural Networks 

High-level synthesis (HLS) has freed computer architects from developing their designs in a very low-level language and having to specify exactly how data should be transferred at the register level. With HLS, hardware designers need only describe the high-level behavioral flow of the design. Despite this, it can still take weeks to develop a high-performance architecture, mainly because there are many design choices at the higher level that take time to explore, and because it takes minutes to hours to get feedback from the HLS tool on the quality of each design candidate. In this paper, we propose to solve this problem by modeling the HLS tool with a graph neural network (GNN) that is trained to generalize across a wide range of applications. The experimental results demonstrate that the GNN-based model can estimate the quality of a design in milliseconds with high accuracy, which enables a very fast search through the solution space.
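The surrogate idea in this paper can be sketched in a few lines: propagate node features along graph edges for a few rounds, pool the resulting embeddings, and read out a scalar quality score. The plain-Python mean-aggregation scheme and the linear readout below are simplified stand-ins for the paper's actual architecture, and the toy features and weights are made up for illustration.

```python
# Minimal message-passing sketch of a GNN surrogate for HLS quality
# prediction (illustrative only; the real model is more elaborate).

def message_pass(features, adj, rounds=2):
    """Average each node's feature vector with its neighbors', `rounds` times."""
    n = len(features)
    h = [list(f) for f in features]
    for _ in range(rounds):
        new_h = []
        for i in range(n):
            neigh = [h[j] for j in range(n) if adj[i][j]] + [h[i]]
            new_h.append([sum(vals) / len(neigh) for vals in zip(*neigh)])
        h = new_h
    return h

def predict_quality(features, adj, weights):
    """Readout: mean-pool node embeddings, then take a weighted sum."""
    h = message_pass(features, adj)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    return sum(w * v for w, v in zip(weights, pooled))

# Toy design graph: 3 nodes with 2-dim features (e.g. op type, unroll factor).
features = [[1.0, 4.0], [0.0, 1.0], [0.0, 1.0]]
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
score = predict_quality(features, adj, weights=[0.5, 0.1])
print(round(score, 3))  # prints 0.441
```

Because one forward pass is just a handful of arithmetic operations, scoring a candidate takes milliseconds, versus the minutes-to-hours feedback loop of the HLS tool itself.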

 

2. Improving GNN-Based Accelerator Design Automation with Meta Learning 

Recently, there has been growing interest in developing learning-based models as surrogates for high-level synthesis (HLS) tools, where the key objective is rapid prediction of the quality of a candidate HLS design for automated design space exploration (DSE). Training is usually conducted on a given set of computation kernels (or kernels in short) needed for hardware acceleration. However, the model must also perform well on new kernels. The discrepancy between the training set and new kernels, called domain shift, frequently leads to a drop in model accuracy, which in turn negatively impacts DSE performance. In this paper, we investigate the possibility of adapting an existing meta-learning approach, named MAML, to the task of design quality prediction. Experiments show that the MAML-enhanced model outperforms a simple fine-tuning baseline in both offline evaluation on held-out test sets and online evaluation of DSE speedup.
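The core MAML mechanic referenced above, adapting a shared initialization with a few gradient steps per kernel and then updating that initialization through the adapted losses, can be sketched on a toy problem. Everything below (the scalar parameter, the per-task quadratic losses, the first-order approximation of the meta-gradient) is a deliberately simplified stand-in, not the paper's actual model or training setup.

```python
# Toy first-order MAML sketch: learn an initialization `theta` that adapts
# quickly to each "kernel". Each task t has loss (theta - t)^2, a stand-in
# for a per-kernel design-quality prediction loss.

def grad(theta, target):
    """d/dtheta of the task loss (theta - target)^2."""
    return 2.0 * (theta - target)

def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.05, inner_steps=3):
    """One meta-update: fine-tune on each task (inner loop), then move theta
    using the gradient evaluated at the adapted parameters (outer loop)."""
    meta_grad = 0.0
    for target in tasks:
        adapted = theta
        for _ in range(inner_steps):        # inner loop: per-kernel adaptation
            adapted -= inner_lr * grad(adapted, target)
        meta_grad += grad(adapted, target)  # first-order outer gradient
    return theta - outer_lr * meta_grad / len(tasks)

theta = 0.0
tasks = [1.0, 2.0, 3.0]  # three "training kernels"
for _ in range(200):
    theta = maml_step(theta, tasks)
print(round(theta, 2))  # prints 2.0
```

For these symmetric quadratic tasks the learned initialization settles at the task mean, the point from which a few gradient steps reach any training task quickly; the same intuition motivates meta-learning an HLS surrogate that fine-tunes rapidly on an unseen kernel.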

Faculty: Jason Cong and Yizhou Sun
