Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2023 Vol.24 No.5 P.731-741

http://doi.org/10.1631/FITEE.2200463

COPPER: a combinatorial optimization problem solver with processing-in-memory architecture

Author(s): Qiankun WANG, Xingchen LI, Bingzhe WU, Ke YANG, Wei HU, Guangyu SUN, Yuchao YANG
Affiliation(s): 1. School of Software & Microelectronics, Peking University, Beijing 100871, China more
Corresponding email(s): gsun@pku.edu.cn
Key Words: Combinatorial optimization, Chaotic simulated annealing, Processing-in-memory

Share this article to： More <<< Previous Article \|Next Article >>>

Qiankun WANG, Xingchen LI, Bingzhe WU, Ke YANG, Wei HU, Guangyu SUN, Yuchao YANG. COPPER: a combinatorial optimization problem solver with processing-in-memory architecture[J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(5): 731-741.

@article{title="COPPER: a combinatorial optimization problem solver with processing-in-memory architecture",
author="Qiankun WANG, Xingchen LI, Bingzhe WU, Ke YANG, Wei HU, Guangyu SUN, Yuchao YANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="24",
number="5",
pages="731-741",
year="2023",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2200463"
}

%0 Journal Article
%T COPPER: a combinatorial optimization problem solver with processing-in-memory architecture
%A Qiankun WANG
%A Xingchen LI
%A Bingzhe WU
%A Ke YANG
%A Wei HU
%A Guangyu SUN
%A Yuchao YANG
%J Frontiers of Information Technology & Electronic Engineering
%V 24
%N 5
%P 731-741
%@ 2095-9184
%D 2023
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2200463

TY - JOUR
T1 - COPPER: a combinatorial optimization problem solver with processing-in-memory architecture
A1 - Qiankun WANG
A1 - Xingchen LI
A1 - Bingzhe WU
A1 - Ke YANG
A1 - Wei HU
A1 - Guangyu SUN
A1 - Yuchao YANG
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 24
IS - 5
SP - 731
EP - 741
%@ 2095-9184
Y1 - 2023
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.2200463

Abstract
Chinese Summary
Academic Network
Reviewer Comment

Abstract: The combinatorial optimization problem (COP), which aims to find the optimal solution in discrete space, is fundamental in various fields. Unfortunately, many COPs are NP-complete, and require much more time to solve as the problem scale increases. Troubled by this, researchers may prefer fast methods even if they are not exact, so approximation algorithms, heuristic algorithms, and machine learning have been proposed. Some works proposed chaotic simulated annealing (CSA) based on the Hopfield neural network and did a good job. However, CSA is not something that current general-purpose processors can handle easily, and there is no special hardware for it. To efficiently perform CSA, we propose a software and hardware co-design. In software, we quantize the weight and output using appropriate bit widths, and then modify the calculations that are not suitable for hardware implementation. In hardware, we design a specialized processing-in-memory hardware architecture named COPPER based on the memristor. COPPER is capable of efficiently running the modified quantized CSA algorithm and supporting the pipeline further acceleration. The results show that COPPER can perform CSA remarkably well in both speed and energy.

COPPER：具有存内计算架构的组合优化问题求解器

汪乾坤¹，李星辰^2,3，吴秉哲⁴，杨可³，胡炜⁵，孙广宇^3,6,7，杨玉超³
¹北京大学软件与微电子学院，中国北京市，100871
²北京大学计算机学院，中国北京市，100871
³北京大学集成电路学院，中国北京市，100871
⁴腾讯人工智能实验室，中国深圳市，518057
⁵福州大学物理与信息工程学院，中国福州市，350116
⁶北京集成电路高精尖创新中心，中国北京市，100871
⁷北京智源人工智能研究院，中国北京市，100080
摘要：组合优化问题（combinatorial optimization problem，COP）是一类在离散空间中寻找最优解的数学问题，具有广泛的应用。然而，许多组合优化问题是NP完全的，随着问题规模的增加，解决问题所需的时间急剧增加，这促使研究人员寻求更快速的解决方法，即使解不一定是最优的，如近似算法、启发式算法和机器学习算法等。一些先前的工作基于 Hopfield神经网络提出了混沌模拟退火（chaotic simulated annealing，CSA），并取得了良好的表现。然而，CSA的计算模式对当前的通用处理器并不友好，且没有专用的计算硬件。为了高效地执行CSA，我们提出一种软硬件联合的设计方案。在软件方面，我们使用适当的位宽对权重和输出进行量化，并修改那些不适合硬件实现的计算模式。在硬件方面，我们设计了一种基于忆阻器的专用存内计算硬件架构COPPER。COPPER能够高效地运行修改后的量化CSA算法，并支持流水线以获得进一步加速。结果表明，COPPER在执行CSA算法时，速度和能耗方面都十分出色。

关键词：组合优化问题；混沌模拟退火；存内计算

Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article

Reference

[1]Chen JR, Wu HQ, Gao B, et al., 2020. A parallel multibit programing scheme with high precision for RRAM-based neuromorphic systems. IEEE Trans Electron Dev, 67(5):2213-2217.

[2]Chen LN, Aihara K, 1995. Chaotic simulated annealing by a neural network model with transient chaos. Neur Netw, 8(6):915-930.

[3]Chi P, Li SC, Xu C, et al., 2016. PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. ACM SIGARCH Comput Archit News, 44(3):27-39.

[4]Cipra BA, 1987. An introduction to the Ising model. Am Math Mon, 94(10):937-959.

[5]Hopfield JJ, 1982. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA, 79(8):2554-2558.

[6]Hopfield JJ, Tank DW, 1985. “Neural” computation of decisions in optimization problems. Biol Cybern, 52(3):141-152.

[7]Hung JM, Huang YH, Huang SP, et al., 2022. An 8-Mb DC-current-free binary-to-8b precision ReRAM nonvolatile computing-in-memory macro using time-space-readout with 1286.4-21.6TOPS/W for edge-AI devices. IEEE Int Solid-State Circuits Conf, p.1-3.

[8]Johnson MW, Amin MHS, Gildert S, et al., 2011. Quantum annealing with manufactured spins. Nature, 473(7346):194-198.

[9]Karp RM, 1972. Reducibility among combinatorial problems. In: Miller RE, Thatcher JW, Bohlinger JD (Eds.), Complexity of Computer Computations. Springer, New York, USA, p.85-103.

[10]Li XC, Yuan ZH, Sun GY, et al., 2022. Tailor: removing redundant operations in memristive analog neural network accelerators. Proc 59^th ACM/IEEE Design Automation Conf, p.1009-1014.

[11]Lucas A, 2014. Ising formulations of many NP problems. Front Phys, 2:5.

[12]Mirhoseini A, Goldie A, Yazgan M, et al., 2020. Chip placement with deep reinforcement learning. https://arxiv.org/abs/2004.10746

[13]Shafiee A, Nag A, Muralimanohar N, et al., 2016. ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ACM SIGARCH Comput Archit News, 44(3):14-26.

[14]Shin SW, Smith G, Smolin JA, et al., 2014. How “quantum” is the D-wave machine? https://arxiv.org/abs/1401.7087

[15]Song LH, Qian XH, Li H, et al., 2017. PipeLayer: a pipelined ReRAM-based accelerator for deep learning. IEEE Int Symp on High Performance Computer Architecture, p.541-552.

[16]Takemoto T, Hayashi M, Yoshimura C, et al., 2019. 2.6 A 2×30k-spin multichip scalable annealing processor based on a processing-in-memory approach for solving large-scale combinatorial optimization problems. IEEE Int Solid-State Circuits Conf, p.52-54.

[17]Takemoto T, Yamamoto K, Yoshimura C, et al., 2021. 4.6 A 144Kb annealing system composed of 9×16Kb annealing processor chips with scalable chip-to-chip connections for large-scale combinatorial optimization problems. IEEE Int Solid-State Circuits Conf, p.64-66.

[18]Vinyals O, Fortunato M, Jaitly N, 2015. Pointer networks. Proc 28^th Int Conf on Neural Information Processing Systems, p.2692-2700.

[19]Yamamoto K, Ando K, Mertig N, et al., 2020. 7.3 STATICA: a 512-spin 0.25M-weight full-digital annealing processor with a near-memory all-spin-updates-at-once architecture for combinatorial optimization with complete spin-spin interactions. IEEE Int Solid-State Circuits Conf, p.138-140.

[20]Yamaoka M, Yoshimura C, Hayashi M, et al., 2016. A 20k-spin Ising chip to solve combinatorial optimization problems with CMOS annealing. IEEE J Sol-State Circ, 51(1):303-309.

[21]Yang K, Duan QX, Wang YH, et al., 2020. Transiently chaotic simulated annealing based on intrinsic nonlinearity of memristors for efficient solution of optimization problems. Sci Adv, 6(33):eaba9901.

[22]Zhu ZH, Sun HB, Lin YJ, et al., 2019. A configurable multi-precision CNN computing framework based on single bit RRAM. Proc 56^th Annual Design Automation Conf, Article 56.

Open peer comments: Debate/Discuss/Question/Opinion

<1>