
CLC number: TP389.1
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2023-03-02
Qiankun WANG, Xingchen LI, Bingzhe WU, Ke YANG, Wei HU, Guangyu SUN, Yuchao YANG. COPPER: a combinatorial optimization problem solver with processing-in-memory architecture[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2200463
COPPER: a combinatorial optimization problem solver with processing-in-memory architecture

1 School of Software and Microelectronics, Peking University, Beijing 100871, China
2 School of Computer Science, Peking University, Beijing 100871, China
3 School of Integrated Circuits, Peking University, Beijing 100871, China
4 Tencent AI Lab, Shenzhen 518057, China
5 College of Physics and Information Engineering, Fuzhou University, Fuzhou 350116, China
6 Beijing Advanced Innovation Center for Integrated Circuits, Beijing 100871, China
7 Beijing Academy of Artificial Intelligence, Beijing 100080, China

Abstract: Combinatorial optimization problems (COPs) are a class of mathematical problems that seek optimal solutions in discrete spaces and have a wide range of applications. However, many COPs are NP-complete, and the time needed to solve them grows dramatically with problem size. This has driven researchers to look for faster methods that do not necessarily return optimal solutions, such as approximation algorithms, heuristics, and machine learning approaches. Building on the Hopfield neural network, some earlier works proposed chaotic simulated annealing (CSA) and achieved good performance. However, the computation pattern of CSA is unfriendly to current general-purpose processors, and no dedicated hardware exists for it. To execute CSA efficiently, we propose a software-hardware co-design. On the software side, we quantize the weights and outputs with appropriate bit-widths and modify the computation patterns that are ill-suited to hardware implementation. On the hardware side, we design COPPER, a dedicated memristor-based processing-in-memory architecture. COPPER runs the modified, quantized CSA algorithm efficiently and supports pipelining for further speedup. Results show that COPPER achieves excellent speed and energy efficiency when executing CSA.

Key words:
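To make the algorithmic idea concrete, below is a minimal sketch of transiently chaotic simulated annealing in the style of Chen and Aihara's formulation, with a simple uniform weight quantization step standing in for the bit-width reduction described in the abstract. The hyperparameter values, the quantize helper, and the update schedule are illustrative assumptions, not the settings used by COPPER.

```python
import numpy as np

def quantize(x, bits, x_max):
    # Uniform symmetric quantization to the given bit-width (illustrative only;
    # the paper's actual quantization scheme may differ).
    levels = 2 ** (bits - 1) - 1
    return np.round(np.clip(x, -x_max, x_max) / x_max * levels) / levels * x_max

def csa_solve(W, I, steps=2000, k=0.9, alpha=0.015, beta=0.001,
              z0=0.08, I0=0.65, eps=0.004, w_bits=8):
    # Transiently chaotic simulated annealing on a Hopfield-style energy
    # E = -1/2 x^T W x - I^T x (Chen and Aihara, 1995). Hyperparameters are
    # typical literature values, not those used by COPPER.
    n = W.shape[0]
    Wq = quantize(W, w_bits, np.abs(W).max())   # quantized weights, as stored on a crossbar
    y = 0.01 * np.random.uniform(-1.0, 1.0, n)  # internal neuron states
    x = 1.0 / (1.0 + np.exp(-y / eps))          # neuron outputs in (0, 1)
    z = z0                                      # self-feedback (chaos) strength
    for _ in range(steps):
        # One matrix-vector product per iteration: the part of the update
        # that maps naturally onto a memristive crossbar.
        y = k * y + alpha * (Wq @ x + I) - z * (x - I0)
        x = 1.0 / (1.0 + np.exp(-y / eps))
        z *= 1.0 - beta                         # anneal the chaotic term away
    return (x > 0.5).astype(int)                # binarized candidate solution
```

A problem instance is used by first writing its objective as a Hopfield/Ising energy, whose quadratic coefficients become W and whose linear coefficients become I; the returned binary vector is then decoded back into a candidate solution of the original COP.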

