About This Book

Energy-Efficient Neural Network Processors Fusing Digital Circuits and Computing-in-Memory (English Edition) presents four main research contributions from two perspectives: energy-efficient neural network (NN) processors built from purely digital circuits, and NN processors that fuse digital circuits with computing-in-memory (CIM).

At the digital-circuit level, the book first addresses the insufficient data-reuse optimization of conventional architectures by proposing KOP3, a convolutional NN processor optimized for specific convolutional kernels. It then targets the significant power and area overhead introduced by irregular sparse-compression techniques: adopting the structured frequency-domain compression algorithm CirCNN, it proposes a globally parallel, bit-serial FFT circuit, a low-power block-wise transpose RAM (TRAM), and a frequency-domain 2D data-reuse array, compressing both storage and computation in a regular manner (a minimal sketch of the frequency-domain idea follows this summary). The fabricated and measured STICKER-T chip achieves improved area efficiency and energy efficiency.

At the level of fusing digital circuits with CIM, the work combines the flexibility of digital circuits with the high energy efficiency of CIM macros to raise energy efficiency further. First, by combining block-wise structured weight sparsity and dynamic activation sparsity, efficient intra-/inter-macro data reuse and network-mapping strategies, and a CIM macro supporting dynamic ADC power-off, the CIM system chip STICKER-IM was designed and fabricated, realizing sparse compression within a CIM chip (see the block-sparsity sketch below). Second, targeting the gap between existing work and the practical deployment of large NN models, the book identifies problems such as the performance loss caused by large-model weight updates and the under-utilization of sparsity, and proposes set-associative block-wise sparsity circuits, ping-pong CIM circuits (illustrated below), and an ADC with adjustable sampling precision. The fabricated and measured STICKER-IM2 chip accounts for the weight-update cost of CIM and demonstrates high energy efficiency with competitive accuracy on the ImageNet dataset.
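The frequency-domain compression mentioned above relies on the core identity of the published CirCNN algorithm: weight matrices are composed of circulant blocks, so a k-by-k block stores only k parameters, and its matrix-vector product reduces to an element-wise multiply between FFTs. The NumPy sketch below illustrates that algorithmic identity only; it is a toy, not code from the book or from the STICKER-T design.

    # Toy illustration (not the book's code) of CirCNN's block-circulant idea:
    # a k x k circulant block is defined by k values, and multiplying it with
    # a vector is a circular convolution, computable via FFT in O(k log k).
    import numpy as np

    def circulant(first_col):
        """Build a k x k circulant matrix whose first column is first_col."""
        k = len(first_col)
        return np.stack([np.roll(first_col, i) for i in range(k)], axis=1)

    k = 8
    rng = np.random.default_rng(0)
    w = rng.standard_normal(k)   # k stored weights instead of k * k
    x = rng.standard_normal(k)

    # Dense product: k * k multiplies over k * k stored weights.
    y_dense = circulant(w) @ x

    # Frequency-domain product: element-wise multiply between FFTs.
    y_fft = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))

    assert np.allclose(y_dense, y_fft)
    print(f"per-block weight compression: {k}x")

The regularity is the point: every block compresses by the same factor k, so no per-element index bookkeeping is needed, in contrast to the irregular sparsity whose overhead the blurb criticizes.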
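STICKER-IM's block-wise structured weight sparsity keeps pruning hardware-friendly: weights are zeroed in whole blocks, so one flag per block lets the datapath skip the corresponding multiply-accumulates. The following sketch mimics that zero-skipping in software; the function name, block size, and all-zero test are illustrative assumptions, not the chip's circuit behavior.

    # Software mimic (an assumption, not STICKER-IM's RTL) of block-wise
    # sparse zero-skipping: whole column blocks of W are pruned, so a cheap
    # per-block flag decides whether the block's MACs run at all.
    import numpy as np

    def block_sparse_matvec(W, x, block=4):
        """Matrix-vector product that skips all-zero column blocks of W."""
        rows, cols = W.shape
        y = np.zeros(rows)
        for c0 in range(0, cols, block):
            blk = W[:, c0:c0 + block]
            if not blk.any():          # per-block zero flag: skip the MACs
                continue
            y += blk @ x[c0:c0 + block]
        return y

    rng = np.random.default_rng(1)
    W = rng.standard_normal((8, 16))
    W[:, 4:12] = 0.0                   # prune two whole 4-column blocks
    x = rng.standard_normal(16)
    assert np.allclose(block_sparse_matvec(W, x), W @ x)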
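STICKER-IM2's ping-pong CIM hides the weight-update cost that makes large models expensive on CIM: while one group of CIM macros computes with the current layer's weights, the other group is being loaded with the next layer's weights. The toy schedule below only shows this alternation; the two-group granularity and layer names are assumptions made for the example.

    # Toy schedule (my illustration, not the STICKER-IM2 microarchitecture)
    # of ping-pong weight updating: group t % 2 computes layer t while the
    # other group is reloaded, so update latency overlaps with computation.
    def ping_pong_schedule(layers):
        """Yield (layer, computing group, updating group) per time step."""
        for t, layer in enumerate(layers):
            yield layer, t % 2, (t + 1) % 2

    for layer, comp, upd in ping_pong_schedule(["conv1", "conv2", "fc1", "fc2"]):
        print(f"{layer}: compute on group {comp}, load next weights into group {upd}")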
Contents

1 Introduction  1
  1.1 Research Background and Significance  1
    1.1.1 Development Trends of Neural Networks  1
    1.1.2 Requirements of NN Processor  2
    1.1.3 Energy-Efficient NN Processors  4
  1.2 Summary of the Research Work  6
    1.2.1 Overall Framework of the Research Work  6
    1.2.2 Main Contributions of This Book  7
  1.3 Overall Structure of This Book  8
  References  9
2 Basics and Research Status of Neural Network Processors  13
  2.1 Basics of Neural Network Algorithms  13
  2.2 Basics of Neural Network Processors  16
  2.3 Research Status of Digital-Circuits-Based NN Processors  18
    2.3.1 Data Reuse  18
    2.3.2 Low-Bit Quantization  20
    2.3.3 NN Model Compression and Sparsity  21
    2.3.4 Summary of Digital-Circuits-Based NN Processors  23
  2.4 Research Status of CIM NN Processors  23
    2.4.1 CIM Principle  24
    2.4.2 CIM Devices  25
    2.4.3 CIM Circuits  26
    2.4.4 CIM Macro  27
    2.4.5 Summary of CIM NN Processors  28
  2.5 Summary of This Chapter  28
  References  29
3 Energy-Efficient NN Processor by Optimizing Data Reuse for Specific Convolutional Kernels  33
  3.1 Introduction  33
  3.2 Previous Data Reuse Methods and the Constraints  33
  3.3 The KOP3 Processor Optimized for Specific Convolutional Kernels  35
  3.4 Processing Array Optimized for Specific Convolutional Kernels  36
  3.5 Local Memory Cyclic Access Architecture and Scheduling Strategy  39
  3.6 Module-Level Parallel Instruction Set and the Control Circuits  40
  3.7 Experimental Results  41
  3.8 Conclusion  44
  References  45
4 Optimized Neural Network Processor Based on Frequency-Domain Compression Algorithm  47
  4.1 Introduction  47
  4.2 The Limitations of Irregular Sparse Optimization and CirCNN Frequency-Domain Compression Algorithm  47
  4.3 Frequency-Domain NN Processor STICKER-T  50
  4.4 Global-Parallel Bit-Serial FFT Circuits  52
  4.5 Frequency-Domain 2D Data-Reuse MAC Array  55
  4.6 Small-Area Low-Power Block-Wise TRAM  59
  4.7 Chip Measurement Results and Comparison  62
  4.8 Summary of This Chapter  69
  References  69
5 Digital Circuits and CIM Integrated NN Processor  71
  5.1 Introduction  71
  5.2 The Advantage of CIM Over Pure Digital Circuits  71
  5.3 Design Challenges for System-Level CIM Chips  74
  5.4 Sparse CIM Processor STICKER-IM  78
  5.5 Structural Block-Wise Weight Sparsity and Dynamic Activation Sparsity  79
  5.6 Flexible Mapping and Scheduling and Intra/Inter-Macro Data Reuse  81
  5.7 Energy-Efficient CIM Macro with Dynamic ADC Power-Off  85
  5.8 Chip Measurement Results and Comparison  88
  5.9 Summary of This Chapter  92
  References  93
6 A “Digital+CIM” Processor Supporting Large-Scale NN Models  95
  6.1 Introduction  95
  6.2 The Challenges of System-Level CIM Chips to Support Large-Scale NN Models  95
  6.3 “Digital+CIM” NN Processor STICKER-IM2  97
  6.4 Set-Associative Block-Wise Sparse Zero-Skipping Circuits  98
  6.5 Ping-Pong CIM and Weight Update Architecture  100
  6.6 Ping-Pong CIM Macro with Dynamic ADC Precision  103
  6.7 Chip Measurement Results and Comparison  104
  6.8 Summary of This Chapter  112
  References  112
7 Summary and Prospect  115
  7.1 Summary of This Book  115
  7.2 Prospect of This Book  117