Full Length Article
Volume 5, Issue 1, PP: 43-61, 2021

Title

Plagiarism Detection Algorithm Model Based on NLP Technology

Authors Names : Ahmed A. Elngar 1 *, Mohamed Gamal 2, Amar Fathy 3, Basma Moustafa 4, Omar Mahmoud 5, Mohamed Shaban 6

1  Affiliation :  Faculty of Computers and Artificial Intelligence, Beni-Suef University, Beni Suef City, 62511, Egypt

    Email :  elngar_7@yahoo.co.uk


2  Affiliation :  Department of Computer Science, Scientific Innovation Research Group (SIRG) member, Beni Suef University of Computers and Artificial Intelligence

    Email :  peseesabim@gmail.com


3  Affiliation :  Department of Computer Science, Scientific Innovation Research Group (SIRG) member, Beni Suef University of Computers and Artificial Intelligence

    Email :  fathyamar78@gmail.com


4  Affiliation :  Department of Computer Science, Scientific Innovation Research Group (SIRG) member, Beni Suef University of Computers and Artificial Intelligence

    Email :  basmamoustafa31@gmail.com


5  Affiliation :  Department of Computer Science, Scientific Innovation Research Group (SIRG) member, Beni Suef University of Computers and Artificial Intelligence

    Email :  mora50017@gmail.com


6  Affiliation :  Department of Computer Science, Scientific Innovation Research Group (SIRG) member, Beni Suef University of Computers and Artificial Intelligence

    Email :   muahmmedshaban153@gmail.com



DOI : 10.5281/zenodo.3909971

( Received: June 23, 2020 , Revised: August 22, 2020 , Accepted: October 16, 2020)

Abstract :

 

We should bear in mind that each of us may have plagiarized a text without realizing that it was plagiarism. Plagiarism can occur in articles, papers, research, literature, music, software, scientific work, newspapers, websites, Master's and PhD theses, and many other fields, so it has become a serious problem for teachers, researchers, and publishers. There are divergent opinions about how to define plagiarism and what makes it serious.

 

Detecting plagiarism is therefore very important. In this survey we explain the concept of "plagiarism", give an overview of different plagiarism detection software and tools, and discuss the plagiarism process, its types, and detection methodologies. We define plagiarism briefly as follows: "someone used someone else's mental product (such as their texts, ideas, or privacy)". We suggest that what makes plagiarism so reprehensible is that it distorts scientific credit. In addition, intentional plagiarism indicates dishonesty. Moreover, plagiarism has a number of possible negative consequences. We therefore create a framework for external plagiarism detection in which several NLP processes are applied to a set of suspicious and original documents. We classify the different plagiarism detection techniques into lexical, semantic, syntactic, and grammar analysis algorithms, all of which are preceded by NLP processing.
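As a rough illustration of the external detection idea described above (preprocess both documents, then compare them), the following is a minimal sketch of a purely lexical comparison. It is not the authors' framework: the stop-word list, the function names, and the choice of Jaccard overlap on word trigrams are all assumptions made for this example. Semantic and syntactic analysis would need further components.

```python
import re

# Hypothetical, deliberately tiny stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "are", "that", "it"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n=3):
    """Return the set of word n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(suspicious, original, n=3):
    """Lexical overlap between two documents, from 0.0 (none) to 1.0 (identical n-gram sets)."""
    a = ngrams(preprocess(suspicious), n)
    b = ngrams(preprocess(original), n)
    if not a or not b:
        # At least one document is too short to form a single n-gram.
        return 0.0
    return len(a & b) / len(a | b)
```

A score near 1.0 suggests that the suspicious document reuses most word sequences of the original, while a score near 0.0 suggests little literal overlap. Paraphrased plagiarism would typically score low here, which is one reason lexical methods are combined with semantic and syntactic analysis.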

 

Keywords :

 

Plagiarism, NLP, detection methodologies, Lexical Analysis, Semantic Analysis, NLTK, LSA, PLSA, LDA.

 

References :

 

[1] Indurkhya, Nitin, and Frederick J. Damerau. Handbook of Natural Language Processing. Chapman & Hall/CRC, 2010.
[2] "Natural Language Toolkit." Natural Language Toolkit - NLTK 3.5 Documentation, www.nltk.org/.
[3] Angry Ronald Adam & Suharjito, Plagiarism Detection Algorithm using natural language processing based on grammar analyzing.
[4] Thomas Hofmann, Probabilistic Latent Semantic Analysis.
[5] Yan, Tingxu & Maxwell, Tamsin & Song, Dawei & Hou, Yuexian & Zhang, Peng. (2010). Event-Based Hyperspace Analogue to Language for Query Expansion.
[6] Azzopardi, Leif & Girolami, Mark & Crowe, Malcolm. (2005). Probabilistic hyperspace analogue to language.
[7] (n.d.). Retrieved from https://www.cs.rochester.edu/~nelson/courses/csc_173/grammars/cfg.html
[8] Context Free Grammars. (n.d.). Retrieved from https://brilliant.org/wiki/context-free-grammars/
[9] Libretexts. (2020, May 18). 4.1: Context-free Grammars. Retrieved from https://eng.libretexts.org/Bookshelves/Computer_Science/Book:_Foundations_of_Computation_(Critchlow_and_Eck)/04:_Grammars/4.01:_Context-free_Grammars
[10] Context-Free Grammar Introduction. (n.d.). Retrieved from https://www.tutorialspoint.com/automata_theory/context_free_grammar_introduction.htm
[11] CFG Simplification. (n.d.). Retrieved from https://www.tutorialspoint.com/automata_theory/cfg_simplification.htm
[12] Parsing English with a Link Grammar - arXiv. (n.d.). Retrieved from https://arxiv.org/pdf/cmp-lg/9508004v1.pdf
[13] Guest. (n.d.). A Robust Parsing Algorithm for Link Grammars. Retrieved from https://mafiadoc.com/a-robust-parsing-algorithm-for-link-grammars_5b722d8b097c47f2548b457c.html
[14] Dependency Grammar and Dependency Parsing (Joakim Nivre). Retrieved from https://cl.lingfil.uu.se/~nivre/docs/05133.pdf
[15] Dependency Treebanks: Methods, Annotation Schemes and Tools (Tuomo Kakkonen). Retrieved from https://www.researchgate.net/publication/1960118_Dependency_Treebanks_Methods_Annotation_Schemes_and_Tools
[16] Dependency parser (Hays 1962). Retrieved from https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1162/handouts/SLoSP-2014-4-dependencies.pdf