International Journal of Information Engineering and Electronic Business (IJIEEB)

ISSN: 2074-9023 (Print), ISSN: 2074-9031 (Online)

Published By: MECS Press

IJIEEB Vol.15, No.2, Apr. 2023

Assessing Similarity between Software Requirements: A Semantic Approach

Full Text (PDF, 885KB), pp. 38-53



Farooq Ahmad, Mohammad Faisal

Index Terms

Measurement; Requirements; Similarity; Semantic; Reusability; Framework


According to research, the majority of projects fail to achieve their intended objectives. This can happen for a number of reasons, such as poorly managed requirements, excessive documentation of the code, or the difficulty of delivering software that includes all the requested features on time. Such failure rates could be reduced by establishing proper requirements management and adopting the concept of reusability. The correct requirements can be identified by checking the similarity between the requirements received from the various stakeholders. Reusing a software component can save substantial time and money, but deciding which components to reuse can be challenging. Comparing the requirements of a new project with those of previous projects, either before the project starts or at a later stage of development, helps identify reusable components. This paper proposes a framework (ReSim) for identifying similarities between software requirements, in an attempt to improve reusability and identify the correct requirements. A crucial component of ReSim is measuring the similarity between software requirements. Researchers have used different well-known similarity measurement techniques to evaluate the similarity between software requirements, including the Dice, Jaccard, and cosine coefficients. In this paper, we instead use a recently developed hybrid method that considers not only semantic information, including lexical databases, word embeddings, and corpus statistics, but also implied word order information, and that has produced significant improvements in measuring semantic similarity between words and sentences. The experiments use the PURE dataset to demonstrate the efficacy of the proposed framework. The results show that the recently developed hybrid method measures requirements similarity more accurately than Dice, Jaccard, and Cosine, while Cosine is a better choice than Dice, and Jaccard is more accurate than Dice. Thus, ReSim outperforms existing approaches when tested on the PURE dataset, providing the most accurate results for both functional and non-functional requirements.
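
For readers unfamiliar with the baseline measures named above, the following is a minimal sketch of the Dice, Jaccard, and cosine coefficients applied to two requirement sentences. It is an illustration only and is not the hybrid method or the ReSim implementation described in the paper. The tokenizer, the function names, and the two example requirements are assumptions introduced here; Dice and Jaccard are computed on sets of word tokens, and cosine on raw term-frequency vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def dice(a, b):
    """Dice coefficient on token sets: 2|A & B| / (|A| + |B|)."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def jaccard(a, b):
    """Jaccard coefficient on token sets: |A & B| / |A | B|."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


def cosine(a, b):
    """Cosine similarity between raw term-frequency vectors of the two texts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical requirement statements, not taken from the PURE dataset.
req_a = "The system shall allow the user to reset the password"
req_b = "The user shall be able to reset the password"

if __name__ == "__main__":
    for name, measure in (("Dice", dice), ("Jaccard", jaccard), ("Cosine", cosine)):
        print(f"{name}: {measure(req_a, req_b):.3f}")
```

All three measures rely purely on shared surface tokens, so two requirements that express the same need with different vocabulary would score low. This is the limitation that a semantic method using lexical databases, word embeddings, and word order information is intended to address.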

Cite This Paper

Farooq Ahmad, Mohammad Faisal, "Assessing Similarity between Software Requirements: A Semantic Approach", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol.15, No.2, pp. 38-53, 2023. DOI:10.5815/ijieeb.2023.02.05

