Journal of Cybersecurity and Information Management

Journal DOI

https://doi.org/10.54216/JCIM

ISSN (Online): 2690-6775 | ISSN (Print): 2769-7851

Volume 5, Issue 1, PP: 43-61, 2021 | Full Length Article

Plagiarism Detection Algorithm Model Based on NLP Technology

Ahmed A. Elngar 1 * , Mohamed Gamal 2 , Amar Fathy 3 , Basma Moustafa 4 , Omar Mahmoud 5 , Mohamed Shaban 6

  • 1 Faculty of Computers and Artificial Intelligence, Beni-Suef University, Beni Suef City, 62511, Egypt - (elngar_7@yahoo.co.uk)
  • 2 Department of Computer Science, Scientific Innovation Research Group (SIRG) member, Beni Suef University of Computers and Artificial Intelligence - (peseesabim@gmail.com)
  • 3 Department of Computer Science, Scientific Innovation Research Group (SIRG) member, Beni Suef University of Computers and Artificial Intelligence - (fathyamar78@gmail.com)
  • 4 Department of Computer Science, Scientific Innovation Research Group (SIRG) member, Beni Suef University of Computers and Artificial Intelligence - (basmamoustafa31@gmail.com)
  • 5 Department of Computer Science, Scientific Innovation Research Group (SIRG) member, Beni Suef University of Computers and Artificial Intelligence - (mora50017@gmail.com)
  • 6 Department of Computer Science, Scientific Innovation Research Group (SIRG) member, Beni Suef University of Computers and Artificial Intelligence - (muahmmedshaban153@gmail.com)
  • DOI: https://doi.org/10.54216/JCIM.050104

    (Received: June 23, 2020; Revised: August 22, 2020; Accepted: October 16, 2020)
    Abstract

     

    We should bear in mind that any of us may have plagiarized a text without realizing it. Plagiarism can occur in articles, papers, research, literature, music, software, scientific work, newspapers, websites, master's and PhD theses, and many other fields, so it has become a serious problem for teachers, researchers, and publishers. Opinions diverge on how to define plagiarism and on what makes it serious. Because detecting plagiarism is so important, in this survey we explain the concept of "plagiarism", give an overview of the software and tools available to address it, and discuss the plagiarism process, its types, and detection methodologies. In brief, plagiarism can be defined as "someone using someone else's intellectual product (such as their texts, ideas, or private work)". We suggest that what makes plagiarism so reprehensible is that it distorts scientific credit; in addition, intentional plagiarism indicates dishonesty, and plagiarism can have a number of other negative consequences. We therefore build a framework for external plagiarism detection in which several NLP processes are applied to a set of suspicious and original documents. We classify plagiarism detection techniques into lexical, semantic, syntactic, and grammar-analysis algorithms, all of which are preceded by NLP preprocessing.
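    As a rough illustration of the lexical stage of the framework described above, the Python sketch below applies simple NLP preprocessing (lowercasing, tokenizing, removing stop words) to a suspicious document and a set of original documents, then compares them with word-trigram Jaccard similarity. This is one common lexical technique, not necessarily the authors' implementation. The stop-word list, the trigram size, the 0.2 threshold, and all function names are hypothetical choices for this sketch, and the semantic, syntactic, and grammar-based stages are not shown.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a fuller system might load NLTK's stopwords corpus instead.
STOP_WORDS = {
    "a", "an", "the", "of", "in", "on", "and", "or", "is", "are",
    "was", "were", "to", "it", "that", "this", "by", "for", "with",
}

Shingle = Tuple[str, ...]


def preprocess(text: str) -> List[str]:
    """NLP preprocessing: lowercase, split into ASCII word tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens: List[str], n: int = 3) -> Set[Shingle]:
    """Set of contiguous word n-grams ("shingles"); empty if fewer than n tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_similarity(suspicious: str, original: str, n: int = 3) -> float:
    """Lexical score between one suspicious and one original document."""
    return jaccard(ngrams(preprocess(suspicious), n), ngrams(preprocess(original), n))


def flag_candidates(
    suspicious: str, originals: Dict[str, str], threshold: float = 0.2
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) for originals scoring >= threshold, highest first."""
    scored = [(doc_id, lexical_similarity(suspicious, text)) for doc_id, text in originals.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    originals = {
        "orig-1": "Plagiarism distorts scientific credit and indicates dishonesty.",
        "orig-2": "Context-free grammars are used in compiler design and linguistics.",
    }
    suspicious = "Clearly, plagiarism distorts scientific credit and often indicates dishonesty."
    print(flag_candidates(suspicious, originals))  # [('orig-1', 0.25)]
```

    Word n-gram shingling catches verbatim and lightly edited copying, but paraphrased text typically shares few exact n-grams with its source, which is why the semantic and syntactic analyses named in the abstract are needed alongside it.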

    Keywords:

    Text, plagiarism, NLP, detection methodologies, Lexical Analysis, Semantic Analysis, NLTK, LSA, PLSA, LDA

    Cite This Article As:
    Elngar, Ahmed A., Mohamed Gamal, Amar Fathy, Basma Moustafa, Omar Mahmoud, and Mohamed Shaban. "Plagiarism Detection Algorithm Model Based on NLP Technology." Journal of Cybersecurity and Information Management, vol. 5, no. 1, 2021, pp. 43-61. DOI: https://doi.org/10.54216/JCIM.050104