Journal of Cybersecurity and Information Management JCIM 2690-6775 2769-7851 10.54216/JCIM https://www.americaspg.com/journals/show/3971 2019 2019
Identify and Remove Duplicated Records Using Q-gram and Statistical Techniques from the Data Warehouse
Sura Sura, University of Anbar, College of Computer Sciences and Information Technology, Anbar, Ramadi, 31001, Iraq
Rihab Hazim, University of Anbar, College of Computer Sciences and Information Technology, Anbar, Ramadi, 31001, Iraq
Yaqeen Saad, University of Anbar, College of Computer Sciences and Information Technology, Anbar, Ramadi, 31001, Iraq
Nadia Mohammed, University of Anbar, College of Islamic Sciences, Anbar, Ramadi, 31001, Iraq
Duplicate detection, also known as record linkage, has many real-world uses. It arises across a broad range of tasks, including recognizing similar data, linking online documents on the web, detecting plagiarism, and serving the several applications that access such data, and it helps these systems make better judgments. Routing is crucial to improving the financial return and applicability of logistics projects. The problem addressed in this study is that duplicate records are hard to tell apart from other edited records belonging to the same customer: duplicate receipts are subject to the same data restrictions and limitations, and the changes between them are minor. The purpose of this study is to find the best method for detecting and removing duplicate records using Q-gram similarity and statistical techniques. To that end, we set two goals: reduce the size of the data warehouse (DW) by making it free of duplicates, and decrease the time spent searching the DW and improve the decision support system (DSS). The approach has two stages: first, identify similar records using Q-gram similarity; second, determine whether the classification of records can be improved by statistical methods.
A similarity threshold of 0.68 was determined. If a record's key similarity ratio exceeds this threshold, the record goes through a statistical process that decides whether it is a duplicate. The proposed approach achieves an accuracy of 79%.
2026 2026 01 09 10.54216/JCIM.170101 https://www.americaspg.com/articleinfo/2/show/3971
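The first stage described in the abstract, flagging record pairs whose Q-gram similarity exceeds 0.68, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which Q-gram measure, which value of q, or which padding scheme is used, so the sketch assumes Jaccard similarity over padded bigrams (q = 2). The function names `qgrams`, `qgram_similarity`, and `candidate_duplicates` are hypothetical, and the second-stage statistical process is not specified in the abstract, so it is not modeled here.

```python
from itertools import combinations

THRESHOLD = 0.68  # similarity threshold reported in the abstract


def qgrams(text, q=2):
    """Return the set of q-grams of a lowercased string padded with '#'.

    Assumption: padding and lowercasing are illustrative choices,
    not taken from the paper.
    """
    padded = "#" * (q - 1) + text.lower().strip() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a, b, q=2):
    """Jaccard similarity of the q-gram sets of two strings.

    Assumption: the paper may use a different Q-gram measure.
    """
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0  # two empty q-gram sets are treated as identical
    return len(ga & gb) / len(ga | gb)


def candidate_duplicates(records, threshold=THRESHOLD, q=2):
    """Return (i, j, score) for every record pair scoring above the threshold.

    These pairs are only candidates; in the described approach they would
    then be passed to the statistical stage for the final decision.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = qgram_similarity(a, b, q)
        if score > threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    keys = ["John Smith", "Jon Smith", "Maria Garcia"]
    for i, j, score in candidate_duplicates(keys):
        print(f"{keys[i]!r} ~ {keys[j]!r}: {score:.2f}")
```

Comparing every pair costs quadratic time in the number of records, which is fine for a small illustration; a real data warehouse would likely need some form of blocking or indexing before the pairwise comparison.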