Full Text:   <792>

Summary:  <15>

CLC number: TP391.4

On-line Access: 2025-07-02

Received: 2024-02-06

Revision Accepted: 2025-07-02

Crosschecked: 2024-07-23

Cited: 0

Clicked: 981

Citations:  Bibtex RefMan EndNote GB/T7714

 ORCID:

Adnan OZSOY

https://orcid.org/0000-0002-0302-3721

-   Go to

Article info.
Open peer comments

Frontiers of Information Technology & Electronic Engineering  2025 Vol.26 No.6 P.877-895

http://doi.org/10.1631/FITEE.2400091


CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators


Author(s):  Adnan OZSOY, Mengu NAZLI, Onur CANKUR, Cagri SAHIN

Affiliation(s):  Department of Computer Engineering, Hacettepe University, Ankara 06800, Türkiye; more

Corresponding email(s):   adnan.ozsoy@hacettepe.edu.tr, mengu@hacettepe.edu.tr, ocankur@umd.edu, cagrisahin@gazi.edu.tr

Key Words:  String matching, Parallel programming, Graphics processing unit (GPU) programming, General-purpose computing on GPU (GPGPU), NVIDIA, Compute unified device architecture (CUDA), String matching algorithms research tool (SMART)


Adnan OZSOY, Mengu NAZLI, Onur CANKUR, Cagri SAHIN. CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(6): 877-895.

@article{title="CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators",
author="Adnan OZSOY, Mengu NAZLI, Onur CANKUR, Cagri SAHIN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="26",
number="6",
pages="877-895",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2400091"
}

%0 Journal Article
%T CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators
%A Adnan OZSOY
%A Mengu NAZLI
%A Onur CANKUR
%A Cagri SAHIN
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 6
%P 877-895
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2400091

TY - JOUR
T1 - CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators
A1 - Adnan OZSOY
A1 - Mengu NAZLI
A1 - Onur CANKUR
A1 - Cagri SAHIN
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 6
SP - 877
EP - 895
%@ 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.2400091


Abstract: 
This study presents a parallel version of the string matching algorithms research tool (SMART) library, implemented on NVIDIA’s compute unified device architecture (CUDA) platform, and uses general-purpose computing on graphics processing unit (GPGPU) programming concepts to enhance performance and gain insight into the parallel versions of these algorithms. We have developed the CUDA-enhanced SMART (CUSMART) library, which incorporates parallelized iterations of 64 string matching algorithms, leveraging the CUDA application programming interface. The performance of these algorithms has been assessed across various scenarios to ensure a comprehensive and impartial comparison, allowing for the identification of their strengths and weaknesses in specific application contexts. We have explored and established optimization techniques to gauge their influence on the performance of these algorithms. The results of this study highlight the potential of GPGPU computing in string matching applications through the scalability of algorithms, suggesting significant performance improvements. Furthermore, we have identified the best and worst performing algorithms in various scenarios.

CUSMART:利用GPGPU加速器有效并行化字符串匹配算法

Adnan OZSOY1, Mengu NAZLI1, Onur CANKUR2, Cagri SAHIN3
1哈西德佩大学计算机工程系,土耳其安卡拉省,06800
2马里兰大学帕克分校计算机科学系,美国马里兰州,20742
3加齐大学计算机工程系,土耳其安卡拉省,06570
摘要:提出一种字符串匹配算法研究工具(SMART)库的并行版本,该版本在NVDIA的统一计算设备架构(CUDA)平台上实现,采用通用图形处理器(GPGPU)编程概念以提升性能及深入了解这些字符匹配算法的并行版本。利用CUDA应用程序编程接口(API)开发了CUDA增强的SMART(CUSMART)库,该库集成了64种字符串匹配算法的并行迭代。为确保全面且公正的比较,在各种场景下评估这些算法的性能,进而识别它们在特定应用场景中的优势和劣势。探索并建立了优化技术,以评估它们对算法性能的影响。该研究的结果通过算法的可扩展性突出了GPGPU计算在字符串匹配应用中的潜力,表明性能有显著提高。此外,确定了不同场景下表现最佳和最差的算法。

关键词:字符串匹配;并行编程;图形处理器(GPU)编程;通用图形处理器(GPGPU);英伟达(NVDIA);统一计算设备架构(CUDA);字符串匹配算法研究工具(SMART)

Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article

Reference

[1]Adams MD, Celniker SE, Holt RA, et al., 2000. The genome sequence of Drosophila melanogaster. Science, 287(5461):2185-2195.

[2]Adey SP, 2013. GPU Accelerated Pattern Matching Algorithm for DNA Sequences to Detect Cancer Using CUDA. MS Thesis, College of Engineering, Pune, India.

[3]Ashkiani S, Amenta N, Owens JD, 2016. Parallel approaches to the string matching problem on the GPU. Proc 28th ACM Symp on Parallelism in Algorithms and Architectures, p.275-285.

[4]Baeza-Yates R, Gonnet GH, 1992. A new approach to text searching. Commun ACM, 35(10):74-82.

[5]Bellekens X, Atkinson RC, Renfrew C, et al., 2013. Investigation of GPU-based pattern matching. Proc 14th Post Graduate Symp on the Convergence of Telecommunications, Networking and Broadcasting, Article 5.

[6]Blelloch GE, 1990. Vector Models for Data-Parallel Computing. MIT Press, Cambridge, USA.

[7]Ceruzzi PE, 2003. A History of Modern Computing. MIT Press, Cambridge, USA.

[8]Cormen TH, Leiserson CE, Rivest RL, et al., 2009. Introduction to Algorithms (3rd Ed.). MIT Press, Cambridge, USA.

[9]Crochemore M, Rytter W, 1994. Text Algorithms. Oxford University Press, Inc., New York, USA.

[10]Faro S, Lecroq T, 2010. The exact string matching problem: a comprehensive experimental evaluation. http://arxiv.org/abs/1012.2547

[11]Faro S, Lecroq T, 2013. The exact online string matching problem. ACM Comput Surv, 45(2):13.

[12]Faro S, Lecroq T, Borzì S, et al., 2016. The string matching algorithms research tool. Proc Prague Stringology Conf, p.99-111.

[13]Feng WC, Cameron K, 2007. The Green500 list: encouraging sustainable supercomputing. Computer, 40(12):50-55.

[14]Girkar M, Polychronopoulos CD, 1995. Extracting task-level parallelism. ACM Trans Program Lang Syst, 17(4):600-634.

[15]Hains D, Cashero Z, Ottenberg M, et al., 2011. Improving CUDASW++, a parallelization of Smith–Waterman for CUDA enabled devices. IEEE Int Symp on Parallel and Distributed Processing Workshops and PhD Forum, p.490-501.

[16]Han TS, Ko SK, Kang J, 2007. Efficient subsequence matching using the longest common subsequence with a dual match index. 5th Int Conf on Machine Learning and Data Mining in Pattern Recognition, p.585-600.

[17]Harris M, 2012. How to Overlap Data Transfers in CUDA C/C++. https://developer.nvidia.com/blog/how-overlap-data-transfers-cuda-cc/ [Accessed on Feb. 3, 2024].

[18]He LT, Fang BX, Sui J, 2005. The wide window string matching algorithm. Theor Comput Sci, 332(1-3):391-404.

[19]Horspool RN, 1980. Practical fast searching in strings. Softw Pract Exp, 10(6):501-506.

[20]Jaiswal M, 2014. Accelerating enhanced Boyer–Moore string matching algorithm on multicore GPU for network security. Int J Comput Appl, 97(1):30-35

[21]Kadhim HA, AbdulRashidx N, 2014. Maximum-shift string matching algorithms. Int Conf on Computer and Information Sciences, p.1-6.

[22]Kouzinopoulos CS, Michailidis PD, Margaritis KG, 2015. Multiple string matching on a GPU using CUDAs. Scalable Comput, 16(2):121-137.

[23]Lee CL, Lin YS, Chen YC, 2015. A hybrid CPU/GPU pattern-matching algorithm for deep packet inspection. PLoS ONE, 10(10): e0139301.

[24]Ligowski L, Rudnicki W, 2009. An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases. IEEE Int Symp on Parallel & Distributed Processing, p.1-8.

[25]Lin CY, Och FJ, 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. Proc 42nd Annual Meeting of the Association for Computational Linguistics, p.605-612.

[26]Lin DL, Huang TW, 2021. Efficient GPU computation using task graph parallelism. 27th Int Conf on Parallel and Distributed Computing on Euro-Par: Parallel Processing, p.435-450.

[27]Lu SY, Fu KS, 1978. A sentence-to-sentence clustering procedure for pattern analysis. IEEE Trans Syst Man Cybern, 8(5):381-389.

[28]Mitani Y, Ino F, Hagihara K, 2017. Parallelizing exact and approximate string matching via inclusive scan on a GPU. IEEE Trans Parall Distrib Syst, 28(7):1989-2002.

[29]Nagaveni V, Raju GT, 2014. Various string matching algorithms for DNA sequences to detect breast cancer using CUDA processors. Int J Eng Technol, 14(3):42-47.

[30]Navarro G, Raffinot M, 1998. A bit-parallel approach to suffix automata: fast extended string matching. 9th Annual Symp on Combinatorial Pattern Matching, p.14-33.

[31]NVIDIA, 2019. Record 136 NVIDIA GPU-Accelerated Supercomputers Feature in TOP500 Ranking. https://blogs.nvidia.com/blog/record-gpu-accelerated-supercomputers-top500/ [Accessed on Feb. 3, 2024].

[32]Peng JF, Chen H, 2010. CUgrep: a GPU-based high performance multi-string matching system. Proc 2nd Int Conf on Future Computer and Communication, p.77-81.

[33]Petrakis EGM, 1993. Image Representation, Indexing and Retrieval Based on Spatial Relationships and Properties of Objects. PhD Thesis, University of Crete, Crete, Greece.

[34]Pungila C, Negru V, 2012. A highly-efficient memory-compression approach for GPU-accelerated virus signature matching. 15th Int Conf on Information Security, p.354-369.

[35]Quinn MJ, 2004. Parallel Programming in C with MPI and OpenMP. McGraw-Hill Higher Education, Boston, USA.

[36]Ramos-Frías R, Vargas-Lombardo M, 2017. A review of string matching algorithms and recent implementations using GPU. Int J Secur Appl, 11(6):69-76.

[37]Rasool A, Khare N, 2012. Parallelization of KMP string matching algorithm on different SIMD architectures: multi-core and GPGPUs. Int J Comput Appl, 49(11):26-28.

[38]Sellis TK, 1988. Multiple-query optimization. ACM Trans Database Syst, 13(1):23-52.

[39]Sharma J, Singh M, 2015. CUDA based Rabin-Karp pattern matching for deep packet inspection on a multicore GPU. Int J Comput Netw Inform Secur, 7(10):70-77.

[40]Subhlok J, Stichnoth JM, O’Hallaron DR, et al., 1993. Exploiting task and data parallelism on a multicomputer. Proc 4th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming, p.13-22.

[41]Tian XX, Song YL, Wang XL, et al., 2012. Shortest path based potential common friend recommendation in social networks. 2nd Int Conf on Cloud and Green Computing, p.541-548.

[42]Tran NP, Lee M, 2013. High performance string matching for security applications. Int Conf on ICT for Smart Society, p.1-5.

[43]Tran NP, Lee M, Hong S, et al., 2012. Memory efficient parallelization for Aho–Corasick algorithm on a GPU. IEEE 14th Int Conf on High Performance Computing and Communication & IEEE 9th Int Conf on Embedded Software and Systems, p.432-438.

[44]Weiner P, 1973. Linear pattern matching algorithms. 14th Annual Symp on Switching and Automata Theory, p.1-11.

[45]Xu KF, Cui WK, Hu Y, et al., 2013. Bit-parallel multiple approximate string matching based on GPU. Proc Comput Sci, 17:523-529.

[46]Yao ACC, 1979. The complexity of pattern matching for a random string. SIAM J Comput, 8(3):368-387.

[47]Yong KK, Karuppiah EK, 2013. Hash match on GPU. IEEE Conf on Open Systems, p.150-155.

[48]Zha XY, Sahni S, 2013. GPU-to-GPU and host-to-host multipattern string matching on a GPU. IEEE Trans Comput, 62(6):1156-1169.

[49]Ziv J, Lempel A, 1977. A universal algorithm for sequential data compression. IEEE Trans Inform Theory, 23(3):337-343.

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Please provide your name, email address and a comment





Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE