Workshop on Architectures for Intelligent Machines

With the explosion of data created and uploaded across the Internet of Things, handheld devices and PCs, and cloud and enterprise systems, there is a tremendous opportunity to apply machine learning and deep learning techniques to these terabytes of data and deliver breakthroughs in many domains. Deep learning in computer vision, speech recognition, and video processing has accelerated advances in applications spanning manufacturing, robotics, business intelligence, autonomous driving, precision medicine, and digital surveillance, to name a few. Traditional machine learning algorithms such as Support Vector Machines, Principal Component Analysis, Alternating Least Squares, k-Means, and Decision Trees remain ubiquitous in product recommendation, fraud detection, and financial services.

There is a race to design parallel architectures that cover end-to-end workflows: low time to train while reaching state-of-the-art or better accuracy without overfitting, low-latency inference, and all of this with good total cost of ownership (TCO), performance per watt, and compute and memory efficiency. Architectural innovations in CPUs, GPUs, FPGAs, ASICs, memories, and on-chip interconnects are urgently needed for these neural networks and mathematical algorithms to meet their latency and accuracy requirements. Mixed- and low-precision arithmetic, high-bandwidth stacked DRAMs, systolic-array processing, vector extensions in multi-core and many-core processors, specialized neural-network instructions, and sparse and dense data structures are some of the ways in which GEMM operations, Winograd convolutions, ReLUs, fully connected layers, and other kernels are run efficiently to meet accuracy, training, and inference requirements.
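
As one illustration of the mixed-precision techniques above, the following short NumPy sketch (a hypothetical example, not drawn from any particular system or submission) runs a GEMM with float16 inputs, which halves storage and memory traffic, while accumulating in float32 in the style of mixed-precision matrix units, then applies a ReLU and compares the result against a full-precision reference:

    import numpy as np

    rng = np.random.default_rng(0)
    A32 = rng.standard_normal((128, 256)).astype(np.float32)  # activations
    W32 = rng.standard_normal((256, 64)).astype(np.float32)   # weights

    # Quantize the inputs to float16 to halve storage and memory traffic ...
    A16, W16 = A32.astype(np.float16), W32.astype(np.float16)

    # ... but accumulate the GEMM in float32, as mixed-precision
    # matrix units do, then apply the ReLU nonlinearity.
    C_mixed = A16.astype(np.float32) @ W16.astype(np.float32)
    out = np.maximum(C_mixed, 0.0)  # ReLU

    # Full float32 reference to gauge the error introduced by quantization.
    C_ref = A32 @ W32
    print("max abs error:", np.max(np.abs(C_mixed - C_ref)))

The printed error is small relative to the magnitudes of the outputs, which is why training and inference pipelines can often tolerate reduced-precision storage as long as accumulation stays in a wider format.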