On-line Access: 2024-06-04
Received: 2024-02-09
Revision Accepted: 2024-06-04
Crosschecked: 2024-03-17
Xiaoyun WANG, Xiaodong DUAN, Kehan YAO, Tao SUN, Peng LIU, Hongwei YANG, Zhiqiang LI. Computing-aware network (CAN): a systematic design of computing and network convergence[J]. Frontiers of Information Technology & Electronic Engineering, 2024, 25(5): 633-644.
@article{FITEE.2400098,
title="Computing-aware network (CAN): a systematic design of computing and network convergence",
author="Xiaoyun WANG and Xiaodong DUAN and Kehan YAO and Tao SUN and Peng LIU and Hongwei YANG and Zhiqiang LI",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="25",
number="5",
pages="633--644",
year="2024",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2400098"
}
%0 Journal Article
%T Computing-aware network (CAN): a systematic design of computing and network convergence
%A Xiaoyun WANG
%A Xiaodong DUAN
%A Kehan YAO
%A Tao SUN
%A Peng LIU
%A Hongwei YANG
%A Zhiqiang LI
%J Frontiers of Information Technology & Electronic Engineering
%V 25
%N 5
%P 633-644
%@ 2095-9184
%D 2024
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2400098
TY - JOUR
T1 - Computing-aware network (CAN): a systematic design of computing and network convergence
A1 - Xiaoyun WANG
A1 - Xiaodong DUAN
A1 - Kehan YAO
A1 - Tao SUN
A1 - Peng LIU
A1 - Hongwei YANG
A1 - Zhiqiang LI
JO - Frontiers of Information Technology & Electronic Engineering
VL - 25
IS - 5
SP - 633
EP - 644
SN - 2095-9184
Y1 - 2024
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2400098
ER -
Abstract: Network coverage is increasingly extensive, and computing resources have likewise become fundamental infrastructure that provides ubiquitous computing services. However, in wide area networks (WANs), the underlying network and computing resources are neither closely studied together nor co-designed, which leads to slow computing service scheduling, inflexible data distribution, and inefficient data transmission. This paper proposes the architectural design of a computing-aware network (CAN), whose core contribution is an awareness plane that collects, manages, and synthesizes computing and network information. The awareness plane, control plane, and data plane together form a closed-loop control system that improves the overall system's awareness capability, decision-making capability, and data forwarding functionality. Three key technologies are proposed to enable the CAN architecture: computing-aware traffic steering (CATS), elastic broadcast, and wide-area high-throughput transmission. The paper uses artificial intelligence (AI) model training, inference, and offline parameter transmission as examples to show the applicability of CAN, and identifies some future research directions.
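The closed loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class and function names (`AwarenessPlane`, `SiteMetrics`, `select_site`, `steer`), the site labels, the delay figures, and the additive cost of network delay plus compute delay are hypothetical and are not taken from the paper or from the CATS drafts.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SiteMetrics:
    """Hypothetical per-site record held by the awareness plane."""
    network_delay_ms: float  # measured path delay to the site
    compute_delay_ms: float  # estimated processing delay at the site


class AwarenessPlane:
    """Collects and synthesizes computing and network information."""

    def __init__(self) -> None:
        self._metrics: Dict[str, SiteMetrics] = {}

    def report(self, site: str, metrics: SiteMetrics) -> None:
        # Store the latest measurement for a service site.
        self._metrics[site] = metrics

    def snapshot(self) -> Dict[str, SiteMetrics]:
        # Hand the control plane a copy of the current view.
        return dict(self._metrics)


def select_site(snapshot: Dict[str, SiteMetrics]) -> str:
    """Control-plane decision: pick the site with the lowest combined delay.

    The additive cost is an illustrative assumption, not the paper's metric.
    """
    if not snapshot:
        raise ValueError("no sites reported")
    return min(
        snapshot,
        key=lambda s: snapshot[s].network_delay_ms + snapshot[s].compute_delay_ms,
    )


def steer(awareness: AwarenessPlane) -> str:
    """One pass of the loop: read awareness data, decide, return the target."""
    return select_site(awareness.snapshot())


awareness = AwarenessPlane()
awareness.report("edge-a", SiteMetrics(network_delay_ms=5.0, compute_delay_ms=40.0))
awareness.report("cloud-b", SiteMetrics(network_delay_ms=20.0, compute_delay_ms=10.0))
first_choice = steer(awareness)   # cloud-b: 30 ms total versus 45 ms

# Fresh measurements close the loop: edge-a is now lightly loaded.
awareness.report("edge-a", SiteMetrics(network_delay_ms=5.0, compute_delay_ms=8.0))
second_choice = steer(awareness)  # edge-a: 13 ms total versus 30 ms
```

The sketch shows only the decision step. In a real deployment the data plane would forward traffic to the selected site, and the metric would likely weigh more factors than two delay terms.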