Reference for CFAR Phase 2 Proposal: Accelerator-Rich Datacenters

http://www.nytimes.com/2011/09/09/technology/google-details-and-defends-its-use-of-electricity.html?_r=1
Jason Cong, Mohammad Ali Ghodrat, Michael Gill, Beayna Grigorian, and Glenn Reinman. 2012. CHARM: a composable heterogeneous accelerator-rich microprocessor. In Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design (ISLPED '12). ACM, New York, NY, USA, 379-384.
Jason Cong, Mohammad Ali Ghodrat, Michael Gill, Beayna Grigorian, Hui Huang, and Glenn Reinman. 2013. Composable accelerator-rich microprocessor enhanced for adaptivity and longevity. In Proceedings of the 2013 International Symposium on Low Power Electronics and Design (ISLPED '13). IEEE Press, Piscataway, NJ, USA, 305-310.
Lyons, M.J.; Gu-Yeon Wei; Brooks, D., "Multi-accelerator system development with the ShrinkFit acceleration framework," Computer Design (ICCD), 2014 32nd IEEE International Conference on, pp. 75-82, 19-22 Oct. 2014.
A. Putnam, A. Caulfield, E. Chung, D. Chiou, K. Constantinides, J. Demme, H. Esmaeilzadeh, J. Fowers, G. P. Gopal, J. Gray, M. Haselman, S. Hauck, S. Heil, A. Hormati, J.-Y. Kim, S. Lanka, J. Larus, E. Peterson, S. Pope, A. Smith, J. Thong, P. Y. Xiao, and D. Burger, “A reconfigurable fabric for accelerating large-scale datacenter services,” in Proc. Int. Symp. on Computer Architecture (ISCA), 2014.
Nvidia GPU. http://www.nvidia.com/object/what-is-gpu-computing.html
Intel Xeon Phi. http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html
Alpha Data FPGA card. http://www.alpha-data.com/dcp/products.php?product=adm-pcie-7v3
QuickAssist: http://www.intel.com/content/www/us/en/io/quickassist-technology/quickassist-technology-developer.html
Duncan G Elliott, Michael Stumm, W Martin Snelgrove, Christian Cojocaru, and Robert McKenzie, “Computational ram: Implementing processors in memory,” Design & Test of Computers, IEEE, vol. 16, no. 1, pp. 32–41, 1999.
David Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christoforos Kozyrakis, Randi Thomas, and Katherine Yelick, “A case for intelligent ram,” MICRO’97, vol. 17, no. 2, pp. 34–44, 1997.
PIM At Notre Dame, http://www3.nd.edu/~pim/projects.html.
Mary Hall, Peter Kogge, Jeff Koller, Pedro Diniz, Jacqueline Chame, Jeff Draper, Jeff LaCoss, John Granacki, Jay Brockman, Apoorv Srivastava et al., “Mapping irregular applications to diva, a pim-based data-intensive architecture,” in SC’99. ACM, 1999, p. 57.
J. Ahn et al., “Scatter-Add in Data Parallel Architectures,” in Intl. Symp. on High-Performance Computer Architecture, 2005, pp. 132–142.
R. B. T. Mingliang Wei, Marc Snir, Josep Torrellas, “A near-memory processor for vector, streaming and bit manipulation workloads,” in UIUC Tech. Report, 2005.
Z. Fang et al., “Active memory controller,” J. Supercomput., vol. 62, no. 1, pp. 510–549, Jan. 2012.
J. T. Pawlowski, “Hybrid memory cube (HMC),” in Hot Chips 23, 2011.
G. H. Loh, “3D-Stacked Memory Architectures for Multi-core Processors,” in Intl. Symp. on Computer Architecture, 2008, pp. 453–464.
S. Pugsley et al., “Comparing Different Implementations of Near Data Computing with In-Memory MapReduce Workloads,” IEEE Micro, vol. 34, no. 4, pp. 44–52, 2014.
HDFS. http://hortonworks.com/hadoop/hdfs/
Jian Ouyang, Shiding Lin, Zhenyu Hou, Peng Wang, Yong Wang, and Guangyu Sun. 2013. Active SSD design for energy-efficiency improvement of web-scale data analysis. In Proceedings of the 2013 International Symposium on Low Power Electronics and Design (ISLPED '13). pp. 286-291.
Peng Wang, Guangyu Sun, Song Jiang, Jian Ouyang, Shiding Lin, Chen Zhang, and Jason Cong. 2014. An efficient design and implementation of LSM-tree based key-value store on open-channel SSD. In Proceedings of the Ninth European Conference on Computer Systems (EuroSys '14), Article 16, 14 pages.
John Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Diego Ongaro, Guru Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, Ryan Stutsman. The Case for RAMCloud. Communications of the ACM, Vol. 54 No. 7, Pages 121-130.
Z. Wei, Y. Liang, K. Rupnow, P. Li, D. Chen and J. Cong. Improving High Level Synthesis Optimization Opportunity through Polyhedral Transformations. Proceedings of the 21st ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2013), Monterey, California, pp. 9-18, February 2013.
Yuxin Wang, Peng Li, Jason Cong. Theory and Algorithm for Generalized Memory Partitioning in High-Level Synthesis. International Symposium on Field-Programmable Gate Arrays, FPGA 2014.
Jason Cong, Muhuan Huang, Peng Zhang. Combining computation and communication optimizations in system synthesis for streaming applications. International Symposium on Field-Programmable Gate Arrays, FPGA 2014.
W. Zuo, P. Li, D. Chen, L-N. Pouchet, S. Zhong and J. Cong. Improving Polyhedral Code Generation for High-Level Synthesis. Proceedings of the International Conference on Hardware/Software Co-design and System Synthesis (CODES+ISSS 2013), pp. 1-10, September-October 2013 (Best Paper Award).
Chandan Reddy, Uday Bondhugula. Effective automatic computation placement and data allocation for parallelization of regular programs. ICS '14 Proceedings of the 28th ACM international conference on Supercomputing, Jun 2014.
Roshan Dathathri, Chandan Reddy, Thejas Ramashekar, Uday Bondhugula. Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory. International conference on Parallel Architectures and Compilation Techniques (PACT 2013), Sep 2013, Edinburgh, UK.
M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster Computing with Working Sets,” in USENIX Conference on Hot Topics in Cloud Computing, 2010.
J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Commun. ACM, vol. 51, no. 1, p. 107, Jan. 2008.
Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O'Malley, Sanjay Radia, Benjamin Reed, and Eric Baldeschwieler. 2013. Apache Hadoop YARN: yet another resource negotiator. In Proceedings of the 4th annual Symposium on Cloud Computing (SOCC '13). ACM, New York, NY, USA, Article 5, 16 pages.
Farhad Hormozdiari, Emrah Kostem, Eun Yong Kang, Bogdan Pasaniuc and Eleazar Eskin. Identifying Causal Variants at Loci with Multiple Signals of Association. Genetics, 44, 725–731 (2014).
Heng Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv:1303.3997v2 [q-bio.GN], 2013.
Heng Li and Richard Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14):1754-1760, July 2009.
Ben Langmead and Steven Salzberg. Fast gapped-read alignment with bowtie 2. Nature Methods, pages 357-359, 2012.
Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, and Richard Durbin. The sequence alignment/map format and samtools. Bioinformatics, 25(16):2078-2079, August 2009.
Picard. http://picard.sourceforge.net/
Aaron McKenna, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran Garimella, David Altshuler, Stacey Gabriel, Mark Daly, and Mark A. DePristo. The genome analysis toolkit: a mapreduce framework for analyzing next-generation dna sequencing data. Genome Res, 20 (9):1297-303, 9 2010.
Taking great ideas from the lab to the fab. NSF Press Release 14-086. http://www.nsf.gov/news/news_summ.jsp?cntn_id=132053&org=CISE&from=news
Jian Ouyang, Shiding Lin, Wei Qi, Yong Wang, Bo Yu, Song Jiang. SDA: Software-Defined Accelerator for Large-Scale DNN Systems. In Proceedings of Hot Chips 26, Cupertino, CA, August 2014.
Spark Machine Learning Library. http://spark.apache.org/docs/1.1.0/mllib-guide.html
