Volume 3, Issue 1, PP: 36-41, 2022 | Full Length Article
Shanthalakshmi M 1*, Susmita Mishra 2, LincyJemina S 3, Raashmi P 4, Mannuru Shalin 5, Jananeee.v 6
Doi: https://doi.org/10.54216/JCHCI.030105
This paper presents a solution for converting speech directly to shorthand. Shorthand is used for writing quick transcripts but is understood by few people, so a product is developed that converts speech to its corresponding Gregg shorthand. A website serves as the front end and uses a speech-to-text API to record the speech in real time. The converted text is then fed into a text-to-image retrieval model that retrieves the Gregg shorthand corresponding to the text. The resulting shorthand is then displayed to the user in real time. In this way, the model reduces the need to depend on stenographers for transcribing scripts. The resulting model achieves good results.
Devising Stenography, Cross Modal Attention, speech shorthand, speech conversion
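The text-to-image retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the keywords point to cross-modal attention, whereas the sketch below substitutes a much simpler character-bigram cosine similarity over word labels, purely to show the shape of the lookup. The index contents, file names, and function names are all hypothetical, and the transcript is assumed to have already been produced by the speech-to-text stage.

```python
from collections import Counter
from math import sqrt

# Hypothetical index: each Gregg shorthand image is stored under the English
# word it represents. The file names are placeholders, not from the paper.
SHORTHAND_INDEX = {
    "speech": "gregg/speech.png",
    "text": "gregg/text.png",
    "model": "gregg/model.png",
}


def char_bigrams(word: str) -> Counter:
    """Bag of character bigrams, with boundary markers, for one word."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_image(word: str, index: dict[str, str]) -> str:
    """Return the image whose label is most similar to the query word."""
    query = char_bigrams(word)
    best_label = max(index, key=lambda label: cosine(query, char_bigrams(label)))
    return index[best_label]


def transcript_to_shorthand(transcript: str, index: dict[str, str]) -> list[str]:
    """Map each word of a transcript (the speech-to-text output) to an image."""
    return [retrieve_image(word, index) for word in transcript.split()]
```

Calling `transcript_to_shorthand("speech model", SHORTHAND_INDEX)` yields one image path per spoken word, which a front end could then render in sequence. The sketch always returns the closest label, so a real system would likely need a similarity threshold and punctuation handling for words that have no stored shorthand form.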