CLC number: TP309.5
On-line Access: 2018-08-06
Received: 2016-08-22
Revision Accepted: 2017-03-15
Crosschecked: 2018-06-08
Ahmad Firdaus, Nor Badrul Anuar, Ahmad Karim, Mohd Faizal Ab Razak. Discovering optimal features using static analysis and a genetic search based method for Android malware detection[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(6): 712-736.
@article{Firdaus2018,
title="Discovering optimal features using static analysis and a genetic search based method for Android malware detection",
author="Ahmad Firdaus and Nor Badrul Anuar and Ahmad Karim and Mohd Faizal Ab Razak",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="19",
number="6",
pages="712-736",
year="2018",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1601491"
}
%0 Journal Article
%T Discovering optimal features using static analysis and a genetic search based method for Android malware detection
%A Ahmad Firdaus
%A Nor Badrul Anuar
%A Ahmad Karim
%A Mohd Faizal Ab Razak
%J Frontiers of Information Technology & Electronic Engineering
%V 19
%N 6
%P 712-736
%@ 2095-9184
%D 2018
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1601491
TY - JOUR
T1 - Discovering optimal features using static analysis and a genetic search based method for Android malware detection
A1 - Ahmad Firdaus
A1 - Nor Badrul Anuar
A1 - Ahmad Karim
A1 - Mohd Faizal Ab Razak
JO - Frontiers of Information Technology & Electronic Engineering
VL - 19
IS - 6
SP - 712
EP - 736
SN - 2095-9184
Y1 - 2018
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1601491
ER -
Abstract: Mobile device manufacturers are rapidly releasing a wide range of Android versions worldwide. At the same time, cyber criminals carry out malicious actions such as tracking user activities, stealing personal data, and committing bank fraud. These criminals profit because so many people rely on Android for their daily routines, including important communications. In response, security practitioners have used static and dynamic analyses to identify malware. This study used static analysis because of its overall code coverage, low resource consumption, and rapid processing. However, static analysis needs a minimal set of features to classify malware efficiently. Therefore, we used genetic search (GS), a search method based on a genetic algorithm (GA), to select features from among 106 strings. To evaluate the best features determined by GS, we used five machine learning classifiers: Naïve Bayes (NB), functional trees (FT), J48, random forest (RF), and multilayer perceptron (MLP). Among these classifiers, FT gave the highest accuracy (95%) and true positive rate (TPR) (96.7%) using only six features.
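The idea of a genetic search over feature subsets can be illustrated with a minimal Python sketch. It is not the study's implementation: the fitness function (a toy "flag as malware if any selected feature is present" rule with a per-feature penalty), the parameter values, the synthetic data, and the names `fitness` and `genetic_search` are all assumptions made for illustration. Each candidate subset is a bit mask, and selection, one-point crossover, and bit-flip mutation evolve the population toward small, accurate subsets.

```python
import random


def fitness(mask, X, y, penalty=0.01):
    """Toy fitness (an assumption, not the paper's evaluator): accuracy of
    an 'any selected feature present => malware' rule, minus a small
    penalty per selected feature to favour compact subsets."""
    if not any(mask):
        return 0.0
    correct = 0
    for row, label in zip(X, y):
        pred = 1 if any(m and v for m, v in zip(mask, row)) else 0
        correct += pred == label
    return correct / len(y) - penalty * sum(mask)


def genetic_search(X, y, pop_size=20, generations=30,
                   p_cross=0.6, p_mut=0.033, seed=1):
    """Evolve bit masks over the feature columns of X.
    Returns (best_mask, best_fitness_per_generation)."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, X, y))
    history = [fitness(best, X, y)]

    def pick(population):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a, X, y) >= fitness(b, X, y) else b

    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the current best forward unchanged
        while len(nxt) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            if rng.random() < p_cross:
                cut = rng.randrange(1, n)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=lambda m: fitness(m, X, y))
        history.append(fitness(best, X, y))
    return best, history


# Synthetic data (illustrative only): 8 binary features per app,
# label 1 ("malware") when feature 0 or feature 3 is present.
data_rng = random.Random(0)
X = [[data_rng.randint(0, 1) for _ in range(8)] for _ in range(200)]
y = [1 if row[0] or row[3] else 0 for row in X]

best_mask, history = genetic_search(X, y)
selected = [i for i, bit in enumerate(best_mask) if bit]
print("selected feature indices:", selected)
print("best fitness, first and last generation:", history[0], history[-1])
```

In practice the toy rule would be replaced by a real subset evaluator, for example the cross-validated accuracy of one of the classifiers named in the abstract, and the mask length would match the 106 candidate strings.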