CLC number: TP391.4
On-line Access: 2025-07-02
Received: 2024-02-06
Revision Accepted: 2025-07-02
Crosschecked: 2024-07-23
Cited: 0
Clicked: 981
Adnan OZSOY, Mengu NAZLI, Onur CANKUR, Cagri SAHIN. CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(6): 877-895.
@article{title="CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators",
author="Adnan OZSOY, Mengu NAZLI, Onur CANKUR, Cagri SAHIN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="26",
number="6",
pages="877-895",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2400091"
}
%0 Journal Article
%T CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators
%A Adnan OZSOY
%A Mengu NAZLI
%A Onur CANKUR
%A Cagri SAHIN
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 6
%P 877-895
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2400091
TY - JOUR
T1 - CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators
A1 - Adnan OZSOY
A1 - Mengu NAZLI
A1 - Onur CANKUR
A1 - Cagri SAHIN
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 6
SP - 877
EP - 895
%@ 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.2400091
Abstract: This study presents a parallel version of the string matching algorithms research tool (SMART) library, implemented on NVIDIA’s compute unified device architecture (CUDA) platform, and uses general-purpose computing on graphics processing unit (GPGPU) programming concepts to enhance performance and gain insight into the parallel versions of these algorithms. We have developed the CUDA-enhanced SMART (CUSMART) library, which incorporates parallelized iterations of 64 string matching algorithms, leveraging the CUDA application programming interface. The performance of these algorithms has been assessed across various scenarios to ensure a comprehensive and impartial comparison, allowing for the identification of their strengths and weaknesses in specific application contexts. We have explored and established optimization techniques to gauge their influence on the performance of these algorithms. The results of this study highlight the potential of GPGPU computing in string matching applications through the scalability of algorithms, suggesting significant performance improvements. Furthermore, we have identified the best and worst performing algorithms in various scenarios.
[1]Adams MD, Celniker SE, Holt RA, et al., 2000. The genome sequence of Drosophila melanogaster. Science, 287(5461):2185-2195.
[2]Adey SP, 2013. GPU Accelerated Pattern Matching Algorithm for DNA Sequences to Detect Cancer Using CUDA. MS Thesis, College of Engineering, Pune, India.
[3]Ashkiani S, Amenta N, Owens JD, 2016. Parallel approaches to the string matching problem on the GPU. Proc 28th ACM Symp on Parallelism in Algorithms and Architectures, p.275-285.
[4]Baeza-Yates R, Gonnet GH, 1992. A new approach to text searching. Commun ACM, 35(10):74-82.
[5]Bellekens X, Atkinson RC, Renfrew C, et al., 2013. Investigation of GPU-based pattern matching. Proc 14th Post Graduate Symp on the Convergence of Telecommunications, Networking and Broadcasting, Article 5.
[6]Blelloch GE, 1990. Vector Models for Data-Parallel Computing. MIT Press, Cambridge, USA.
[7]Ceruzzi PE, 2003. A History of Modern Computing. MIT Press, Cambridge, USA.
[8]Cormen TH, Leiserson CE, Rivest RL, et al., 2009. Introduction to Algorithms (3rd Ed.). MIT Press, Cambridge, USA.
[9]Crochemore M, Rytter W, 1994. Text Algorithms. Oxford University Press, Inc., New York, USA.
[10]Faro S, Lecroq T, 2010. The exact string matching problem: a comprehensive experimental evaluation. http://arxiv.org/abs/1012.2547
[11]Faro S, Lecroq T, 2013. The exact online string matching problem. ACM Comput Surv, 45(2):13.
[12]Faro S, Lecroq T, Borzì S, et al., 2016. The string matching algorithms research tool. Proc Prague Stringology Conf, p.99-111.
[13]Feng WC, Cameron K, 2007. The Green500 list: encouraging sustainable supercomputing. Computer, 40(12):50-55.
[14]Girkar M, Polychronopoulos CD, 1995. Extracting task-level parallelism. ACM Trans Program Lang Syst, 17(4):600-634.
[15]Hains D, Cashero Z, Ottenberg M, et al., 2011. Improving CUDASW++, a parallelization of Smith–Waterman for CUDA enabled devices. IEEE Int Symp on Parallel and Distributed Processing Workshops and PhD Forum, p.490-501.
[16]Han TS, Ko SK, Kang J, 2007. Efficient subsequence matching using the longest common subsequence with a dual match index. 5th Int Conf on Machine Learning and Data Mining in Pattern Recognition, p.585-600.
[17]Harris M, 2012. How to Overlap Data Transfers in CUDA C/C++. https://developer.nvidia.com/blog/how-overlap-data-transfers-cuda-cc/ [Accessed on Feb. 3, 2024].
[18]He LT, Fang BX, Sui J, 2005. The wide window string matching algorithm. Theor Comput Sci, 332(1-3):391-404.
[19]Horspool RN, 1980. Practical fast searching in strings. Softw Pract Exp, 10(6):501-506.
[20]Jaiswal M, 2014. Accelerating enhanced Boyer–Moore string matching algorithm on multicore GPU for network security. Int J Comput Appl, 97(1):30-35
[21]Kadhim HA, AbdulRashidx N, 2014. Maximum-shift string matching algorithms. Int Conf on Computer and Information Sciences, p.1-6.
[22]Kouzinopoulos CS, Michailidis PD, Margaritis KG, 2015. Multiple string matching on a GPU using CUDAs. Scalable Comput, 16(2):121-137.
[23]Lee CL, Lin YS, Chen YC, 2015. A hybrid CPU/GPU pattern-matching algorithm for deep packet inspection. PLoS ONE, 10(10): e0139301.
[24]Ligowski L, Rudnicki W, 2009. An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases. IEEE Int Symp on Parallel & Distributed Processing, p.1-8.
[25]Lin CY, Och FJ, 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. Proc 42nd Annual Meeting of the Association for Computational Linguistics, p.605-612.
[26]Lin DL, Huang TW, 2021. Efficient GPU computation using task graph parallelism. 27th Int Conf on Parallel and Distributed Computing on Euro-Par: Parallel Processing, p.435-450.
[27]Lu SY, Fu KS, 1978. A sentence-to-sentence clustering procedure for pattern analysis. IEEE Trans Syst Man Cybern, 8(5):381-389.
[28]Mitani Y, Ino F, Hagihara K, 2017. Parallelizing exact and approximate string matching via inclusive scan on a GPU. IEEE Trans Parall Distrib Syst, 28(7):1989-2002.
[29]Nagaveni V, Raju GT, 2014. Various string matching algorithms for DNA sequences to detect breast cancer using CUDA processors. Int J Eng Technol, 14(3):42-47.
[30]Navarro G, Raffinot M, 1998. A bit-parallel approach to suffix automata: fast extended string matching. 9th Annual Symp on Combinatorial Pattern Matching, p.14-33.
[31]NVIDIA, 2019. Record 136 NVIDIA GPU-Accelerated Supercomputers Feature in TOP500 Ranking. https://blogs.nvidia.com/blog/record-gpu-accelerated-supercomputers-top500/ [Accessed on Feb. 3, 2024].
[32]Peng JF, Chen H, 2010. CUgrep: a GPU-based high performance multi-string matching system. Proc 2nd Int Conf on Future Computer and Communication, p.77-81.
[33]Petrakis EGM, 1993. Image Representation, Indexing and Retrieval Based on Spatial Relationships and Properties of Objects. PhD Thesis, University of Crete, Crete, Greece.
[34]Pungila C, Negru V, 2012. A highly-efficient memory-compression approach for GPU-accelerated virus signature matching. 15th Int Conf on Information Security, p.354-369.
[35]Quinn MJ, 2004. Parallel Programming in C with MPI and OpenMP. McGraw-Hill Higher Education, Boston, USA.
[36]Ramos-Frías R, Vargas-Lombardo M, 2017. A review of string matching algorithms and recent implementations using GPU. Int J Secur Appl, 11(6):69-76.
[37]Rasool A, Khare N, 2012. Parallelization of KMP string matching algorithm on different SIMD architectures: multi-core and GPGPUs. Int J Comput Appl, 49(11):26-28.
[38]Sellis TK, 1988. Multiple-query optimization. ACM Trans Database Syst, 13(1):23-52.
[39]Sharma J, Singh M, 2015. CUDA based Rabin-Karp pattern matching for deep packet inspection on a multicore GPU. Int J Comput Netw Inform Secur, 7(10):70-77.
[40]Subhlok J, Stichnoth JM, O’Hallaron DR, et al., 1993. Exploiting task and data parallelism on a multicomputer. Proc 4th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming, p.13-22.
[41]Tian XX, Song YL, Wang XL, et al., 2012. Shortest path based potential common friend recommendation in social networks. 2nd Int Conf on Cloud and Green Computing, p.541-548.
[42]Tran NP, Lee M, 2013. High performance string matching for security applications. Int Conf on ICT for Smart Society, p.1-5.
[43]Tran NP, Lee M, Hong S, et al., 2012. Memory efficient parallelization for Aho–Corasick algorithm on a GPU. IEEE 14th Int Conf on High Performance Computing and Communication & IEEE 9th Int Conf on Embedded Software and Systems, p.432-438.
[44]Weiner P, 1973. Linear pattern matching algorithms. 14th Annual Symp on Switching and Automata Theory, p.1-11.
[45]Xu KF, Cui WK, Hu Y, et al., 2013. Bit-parallel multiple approximate string matching based on GPU. Proc Comput Sci, 17:523-529.
[46]Yao ACC, 1979. The complexity of pattern matching for a random string. SIAM J Comput, 8(3):368-387.
[47]Yong KK, Karuppiah EK, 2013. Hash match on GPU. IEEE Conf on Open Systems, p.150-155.
[48]Zha XY, Sahni S, 2013. GPU-to-GPU and host-to-host multipattern string matching on a GPU. IEEE Trans Comput, 62(6):1156-1169.
[49]Ziv J, Lempel A, 1977. A universal algorithm for sequential data compression. IEEE Trans Inform Theory, 23(3):337-343.
Open peer comments: Debate/Discuss/Question/Opinion
<1>