CLC number: TP302
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-04-01
Yun TENG, Zhiyue LI, Jing HUANG, Guangyan ZHANG. ShortTail: taming tail latency for erasure-code-based in-memory systems[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(11): 1646-1657.
@article{Teng2022ShortTail,
title="ShortTail: taming tail latency for erasure-code-based in-memory systems",
author="Yun TENG and Zhiyue LI and Jing HUANG and Guangyan ZHANG",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="23",
number="11",
pages="1646-1657",
year="2022",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2100566"
}
Abstract: In-memory systems with erasure coding (EC) enabled are widely used to achieve high performance and data availability. However, as clusters grow in scale, server-level fail-slow problems are becoming increasingly frequent and can cause long tail latency. The impact of long tail latency is further amplified in EC-based systems because the multiple sub-operations of an EC request must complete synchronously. In this paper, we propose an EC-enabled in-memory storage system called ShortTail, which achieves consistent performance and low latency for both reads and writes. First, ShortTail uses a lightweight request monitor to track the performance of each memory node and identify any fail-slow node. Second, ShortTail selectively performs degraded reads and redirected writes to avoid accessing fail-slow nodes. Finally, ShortTail adopts an adaptive write strategy to reduce the write amplification of small writes. We implement ShortTail on top of Memcached and compare it with two baseline systems. The experimental results show that ShortTail reduces the P99 tail latency by up to 63.77%; it also significantly improves the median and average latency.
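The first two steps in the abstract (monitoring per-node performance, then steering reads away from fail-slow nodes) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `RequestMonitor` and `plan_read`, the sliding-window size, and the rule "a node is fail-slow if its median latency exceeds a fixed multiple of the cluster median" are all assumptions made for illustration. The sketch relies only on the standard property of an MDS code such as Reed-Solomon that any k of the n chunks in a stripe suffice to recover the data, so a read can skip a slow node and decode from parity instead.

```python
from collections import defaultdict, deque
from statistics import median


class RequestMonitor:
    """Hypothetical per-node latency tracker (not ShortTail's actual monitor)."""

    def __init__(self, window=32, slow_factor=3.0):
        # Both parameters are assumed values chosen for illustration.
        self.slow_factor = slow_factor
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, node, latency_us):
        """Store the latency of one completed request to `node`."""
        self.samples[node].append(latency_us)

    def fail_slow_nodes(self):
        """Return nodes whose median latency exceeds slow_factor x cluster median."""
        per_node = {n: median(s) for n, s in self.samples.items() if s}
        if not per_node:
            return set()
        cluster = median(per_node.values())
        return {n for n, lat in per_node.items() if lat > self.slow_factor * cluster}


def plan_read(stripe_nodes, k, slow):
    """Choose k nodes of a stripe to read from, avoiding fail-slow nodes if possible.

    `stripe_nodes` lists the nodes holding data chunks first, then parity chunks.
    Picking a parity node in place of a slow data node is a degraded read:
    the missing data chunk is reconstructed by decoding.
    """
    chosen = [n for n in stripe_nodes if n not in slow][:k]
    if len(chosen) < k:
        # Too few healthy nodes: fall back to slow nodes to reach k chunks.
        chosen += [n for n in stripe_nodes if n in slow][: k - len(chosen)]
    return chosen
```

With a (4, 2) stripe spread over six nodes, a node flagged by `fail_slow_nodes()` is skipped by `plan_read`, and one parity node is read in its place. Redirected writes and the adaptive write strategy mentioned in the abstract are not covered by this sketch.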