Journal of Intelligent Systems and Internet of Things

Journal DOI: https://doi.org/10.54216/JISIoT


ISSN (Online): 2690-6791 | ISSN (Print): 2769-786X

Volume 18, Issue 2, PP: 315-326, 2026 | Full Length Article

DNA Sequence Identification via Biologically Guided Feature Engineering and Hybrid ML–LSTM Networks

Marwa Mawfaq Mohamedsheet Al-Hatab 1 * , Maysaloon Abed Qasim 2 , Sinan S. Mohammed Sheet 3

  • 1 Technical Engineering College, Northern Technical University, Mosul, Iraq - (marwa.alhatab@ntu.edu.iq)
  • 2 Technical Engineering College for Computer and Artificial Intelligence, Northern Technical University, Mosul, Iraq - (maysloon.alhashim@ntu.edu.iq)
  • 3 Technical Engineering College, Northern Technical University, Mosul, Iraq - (sinan_sm76@ntu.edu.iq)
  • Doi: https://doi.org/10.54216/JISIoT.180222

    Received: March 14, 2025 | Revised: June 02, 2025 | Accepted: July 10, 2025
    Abstract

    The promoter is the region of DNA responsible for initiating transcription of a gene by RNA polymerase; it is located upstream of the transcription start site. Research indicates that promoters play a major role in many human diseases, such as cancer, diabetes, and Huntington’s disease, which makes promoter detection a crucial task. In this study, a hybrid detection system is proposed that integrates biologically guided feature extraction with traditional machine learning (ML) algorithms and a Long Short-Term Memory (LSTM) network as a deep learning approach. The dataset used comprises 106 nucleotide sequences. With Naive Bayes as the classifier, the system achieved perfect performance across all metrics (accuracy, sensitivity, specificity, precision, and F1-score of 100%) and an AUC of 1. Confusion-matrix and ROC-curve analyses of the LSTM model show 100% training accuracy and 84.38% test accuracy. The architecture and performance of the proposed model make it applicable to IoT-based intelligent genomic and healthcare systems, enabling real-time and remote promoter detection.
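The abstract names the pipeline stages and metrics but not their exact definitions. The sketch below illustrates them under stated assumptions: k-mer frequency features stand in for the "biologically guided" features (an assumption, not the paper's confirmed feature set), a per-position one-hot encoding shows the kind of (length, 4) input typically fed to an LSTM, and the reported metrics are computed from a 2x2 confusion matrix with "promoter" as the positive class, plus a rank-based AUC. All function names are hypothetical.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 2) -> list[float]:
    """Relative frequency of each of the 4**k k-mers over A, C, G, T (lexicographic order).

    Assumed feature type for illustration; not confirmed as the paper's method.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in vocab]


def one_hot(seq: str) -> list[list[int]]:
    """Per-position one-hot rows (A, C, G, T), i.e. a (length, 4) input for a recurrent model."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:  # unknown bases stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Metrics from a 2x2 confusion matrix, with 'promoter' as the positive class."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # true-positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true-negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count 1/2; requires at least one positive and one negative label.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(kmer_frequencies("TACTAG", k=2)[12])  # frequency of "TA": 2 of 5 windows
    print(binary_metrics(tp=8, fp=2, tn=6, fn=4))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```

Under these definitions, 100% on all five metrics corresponds to a confusion matrix with fp = fn = 0, and AUC = 1 means every promoter sequence is scored above every non-promoter sequence.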

    Keywords:

    Promoter detection, Machine learning, LSTM


    Cite This Article As:
    Marwa Mawfaq Mohamedsheet Al-Hatab, Maysaloon Abed Qasim, and Sinan S. Mohammed Sheet. "DNA Sequence Identification via Biologically Guided Feature Engineering and Hybrid ML–LSTM Networks." Journal of Intelligent Systems and Internet of Things, vol. 18, no. 2, 2026, pp. 315-326. DOI: https://doi.org/10.54216/JISIoT.180222