Apparel Recommendation Engine Using Inverse Document Frequency and Weighted Average Word2vec

Parvesh K¹, Tharun C² and Prakash M³

¹Postgraduate, School of Computer, University of Exeter, England, UK

²Computer Science and Engineering, Panimalar Engineering College, Chennai 600123, India

tharunc17@gmail.com

³Professor, Department of Computer Science and Engineering, Karpagam College of Engineering, Coimbatore 641032, India

prakashmohan@kce.ac.in

* Corresponding Author: tharunc17@gmail.com

Abstract

The rapid growth of e-commerce marketplaces calls for recommendation engines built on fast, precise, and efficient algorithms so that companies' business models remain profitable. Computer vision software enables a computer to learn a great deal from digital images or videos. Machine learning methods are used in computer vision, and several machine learning techniques have been developed specifically for this purpose. Information retrieval is the process of extracting useful information from a dataset, and computer vision is the most commonly used tool for this purpose today. This project consists of a series of modules that run sequentially to retrieve information from a marked area on a receipt. A receipt image is the input to the model. The model first applies image processing algorithms to clean the data; the pre-processed data is then passed to machine learning algorithms, and the result is a string of numerical digits including the decimal point. The program's accuracy is determined primarily by image quality or pixel density, so the input receipt must be undamaged and its content must not be blurred.

Keywords: Term Frequency (TF), Collaborative filtering

1. Introduction

1.1 Recommendation engines

A recommendation engine uses various algorithms to discover, detect, predict, analyse, and filter data before recommending it to users. It first records a customer's previous behaviour and then suggests items that the user may want to buy. The most important question is: what does a computer perceive in a picture? Pictures are nothing more than layouts of pixels, and in a colour image each pixel has a depth of three. Machine learning algorithms are tasked with analysing pixel patterns and extracting information based on training. This is essentially a research area within artificial intelligence and machine learning, which can benefit from specialised techniques and employ advanced learning algorithms. Since the subject is an intellectual frontier, scientists have proposed numerous alternative definitions, such as computer vision. Like all frontiers, it is fascinating and chaotic, and there is sometimes no reliable authority to appeal to. Many useful ideas lack a theoretical foundation, while some theories are in fact insignificant or useless; the developed areas are widely dispersed, and one frequently appears to be inaccessible from another. Before delving into this topic, we look at how products can be recommended to users:

1. We may recommend to a client or user goods and services that are most popular among all users.

2. We may divide users into several groups according to their preferences and recommend items based on the group each user belongs to.

Currently, computer vision is widely used for information retrieval (IR). Humans, even very young children, solve such retrieval tasks trivially, but simple models have not proven robust or trustworthy, so various researchers have proposed answers to this problem; the underlying idea is to link vision to information retrieval.

1.2. Approaches for designing recommendation engines

1.2.1. Collaborative filtering

Collaborative filtering is one method for building highly useful recommendation systems. It is based on the assumption that people who agreed in the past will agree in the future. Collecting and analysing optical data in order to support, or possibly replace, human eyesight is a difficult task. Computer vision techniques currently rely on smart algorithms and systems. In fact, pixel computing, also known as visual computing, has emerged as one of the most important applications of this technology. However, computer vision is more than applied machine learning. It also covers three-dimensional scene modelling, multi-view camera geometry, structure from motion, stereo correspondence, point cloud processing, and motion prediction, in which ML was not a major component. The goal of these engines is to extract interesting or significant information from pixels, for instance whether or not a particular item is present in a given scene. Computer vision is not limited to pixels and can be far more sophisticated than image processing. These sophisticated processes can be summed up as feature detectors, which provide extensive information about the video or image material. Both technologies can be put to good use in a variety of ways, ideally where the advantages of both are combined. Examples include the k-nearest neighbour (k-NN) approach and the Pearson correlation, as first implemented by Allen.

1.2.2. Content-based filtering

Content-based filtering, also known as cognitive filtering, makes recommendations by comparing item content with the user profile. Keyword- or text-based retrieval techniques for digital multimedia appear to fall short of what people need in order to retrieve multimedia material precisely. Several issues must be addressed when developing a content-based filtering system. First, terms can be assigned either manually or automatically. Manual assignment imposes an excessive workload, and uncertainty and subjectivity persist. The essence of this technique is the extraction of picture features; how good the retrieval is depends on the efficiency and accuracy of the extracted features.

1.3. Mathematical Algorithms

1.3.1. Bag-of-words

The bag-of-words model is a simplified representation used in natural language processing and information retrieval. Computing the similarity between previously collected feature data with a matching function, and defining new primitive features, allows the total score to be recomputed quickly for a new set of weights. When different images are compared, a similarity score is calculated for each primitive in the current query combination using that primitive's distance function. Bag-of-words is a technique for extracting textual features for use in machine learning algorithms.
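As an illustration of the textual side of this model, the following minimal sketch builds count vectors for a handful of product titles. The function name `bag_of_words`, the whitespace tokenisation, and the sample titles are assumptions made for this example; they are not taken from the implementation described in this paper.

```python
from collections import Counter


def bag_of_words(titles):
    """Return (vocabulary, count vectors) for a list of title strings.

    Tokenisation is a plain lowercase whitespace split (an assumption).
    """
    tokenized = [title.lower().split() for title in titles]
    # Fixed, sorted vocabulary so every vector uses the same column order.
    vocab = sorted({word for words in tokenized for word in words})
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        # Counter returns 0 for words that do not occur in this title.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


# Hypothetical example titles.
titles = ["pink tiger stripes top", "tiger tiger zebra stripes"]
vocab, vectors = bag_of_words(titles)
```

Word order is discarded: each title is reduced to how many times each vocabulary word occurs in it, which is what makes the representation a "bag".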

1.3.2. TF or term frequency

Term frequency (TF) is used in information retrieval and indicates how often a term (a word or expression) appears in a text. The term frequency reflects the importance of a specific term within the overall text. It is usually discussed together with the inverse document frequency (IDF). Image retrieval initially relied on text in the form of manual annotations, since retrieval by conventional keywords was already available. Collecting and analysing optical data so as to support, or perhaps replace, human eyesight is a difficult job. Computer vision techniques currently use smart algorithms and systems. To a certain degree, pixel computing or visual computing has become one of the major areas for the effective use of this technology. However, computer vision is more than applied machine learning. It includes tasks such as three-dimensional scene modelling, multi-view camera geometry, structure from motion, stereo correspondence, point cloud processing, and motion prediction. Further study that adapted the visual properties of images followed and became a key line of work.
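A minimal sketch of term frequency follows. It assumes one common normalisation, the raw count of a term divided by the number of words in the text; other variants exist, and the paper does not state which one was used. The function name and the sample title are hypothetical.

```python
from collections import Counter


def term_frequency(text):
    """Map each word to (occurrences of the word) / (total words in text)."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    # For an empty text the comprehension is empty, so nothing is divided by zero.
    return {word: count / total for word, count in counts.items()}


# Hypothetical example title: "tiger" occurs twice among five words.
tf = term_frequency("pink tiger tiger zebra stripes")
```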

1.3.3. IDF or inverse document frequency

This definition is mathematical, but there is another: a picture can be described as a two-dimensional array arranged in rows and columns. Digital photographs are made up of a finite number of elements, each of which has a particular value at a particular location. These elements are known as image elements, picture elements, or pixels. Pixel is the term most commonly used for the elements of a digital picture. The quality of an image is determined by its pixels: the more pixels per unit area, the better the image.

Several term-weighting schemes have been derived from tf–idf. One of them is TF–PDF (Term Frequency * Proportional Document Frequency), which was introduced in 2001.
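For reference, the standard textbook definitions of inverse document frequency and of the tf–idf weight can be written as follows. The symbols are chosen here for illustration: $W_j$ is a word, $D$ the corpus of titles, $N$ the number of titles in $D$, and $n_j$ the number of titles in $D$ that contain $W_j$. The base of the logarithm is a convention and is not specified in this paper.

```latex
\mathrm{IDF}(W_j, D) = \log\frac{N}{n_j},
\qquad
\mathrm{tfidf}(W_j, T_i, D) = \mathrm{tf}(W_j, T_i)\cdot\mathrm{IDF}(W_j, D)
```

A word that occurs in every title has $n_j = N$ and therefore an IDF of zero, so it contributes nothing to the weight, whereas a word found in only a few titles receives a large weight.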

1.3.4. Word2Vec

Word2vec is a collection of word embedding models. Before analysing a picture, we first need to understand what a picture is. An image can be defined on a two-dimensional grid as a function F(x, y), where x and y are the location coordinates within the grid. The amplitude of F at any pair of coordinates is the intensity of the image at that point. If x, y, and the amplitude values are all finite, we call the image a digital image. In other words, an image can be described as a two-dimensional array arranged in rows and columns. Digital photographs are made up of a finite number of elements, each of which has a particular value at a particular location.

These are the fundamentals of capturing and viewing a photograph. Taking a photograph is simple; processing it is more difficult. Many high-level languages, such as C++ and Python, can handle images. We take photos to document wonderful and pleasant events in our lives or in history, so that the memories are saved and can be "opened" at any time in the future. This chapter covers the fundamentals of image processing and modification in order to provide the necessary background. Deeplearning4j offers a distributed form of Word2vec for Java and Scala that works on Spark with GPUs. Text extraction is a term frequently used for tasks that involve document processing. Also known as text mining or text analytics, it is a set of processes for deriving high-quality information from a wide range of documents in which trivial words make up a large share of the text compared with the words of interest or context words. Obtaining high-quality information depends on the program's pattern recognition capability or on some pattern learning technique. Text mining typically entails structuring the input text (usually parsing, adding some derived linguistic features, removing others, and storing the results in a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. The most common application is scanning a collection of texts written in a natural language such as English or Spanish.

2. Literature Review

We examined twenty research papers, four by each team member, and some of them were found to be extremely useful for this project.

2.1. Review Process Adopted

Priyanka Meel and Agniva Goswami observe that the rapid development of e-commerce marketplaces has made effective recommendation engines and algorithms necessary for businesses' models to generate substantial profits. Their article describes a hybrid method that benefits from both semantic and frequency-based search.

Chantimalimaksomul describes the Smart Closet application, which lets users keep track of their closet directly on a smartphone. Users can add their own clothing and accessories to the Smart Closet system, and the programme allows them to mix and match clothing and accessories from head to toe. The Smart Closet app is distinguished by its ability to save data on frequently worn clothing and accessories and to suggest dressing styles based on what it has learned from previous statistics. In addition, the programme can recommend suitable clothing for the current weather and for particular events.

Abdullah Ammar investigates the use of the Word2Vec model for classifying labelled English and Turkish Twitter feeds, as well as the effect of reducing words to their roots before the feeds are passed to the Word2Vec model. The study uses two data sets, one English and one Turkish. BOW and Word2Vec models were applied to each data set, both with and without root extraction. The study, implemented in Python, compares success rates using the scikit-learn classifiers Linear SVM and Logistic Regression.

Philip Berger notes that Pinterest, as a new social bookmarking tool, makes it possible to obtain more context for a picture than was previously possible. A much richer context is built from the board header, the pin descriptions, and the actual content of the bookmarked sites. Images for blog posts are proposed as a useful example to demonstrate the viability of the Pinterest environment. Tag-based retrieval models are used to provide matching images for articles. This enables blog writers to obtain images for their posts, allowing them to produce more appealing articles more quickly.

Vipin Deep Kaur applied an unsupervised method (Semantic Orientation – Pointwise Mutual Information – Information Retrieval) and supervised machine learning methods (Support Vector Machine and Naive Bayes) to the publicly available GoodReads and Amazon Book Review datasets. The supervised methods outperform the unsupervised method on the Amazon datasets, with Naive Bayes providing a maximum accuracy of 73.72 percent and 74.73 percent in the 5-fold and 10-fold cases, respectively.

3. Theoretical Aspects

3.1 Machine Learning and Recommendation Engines

In today's society, every consumer has numerous options. Until recently, people typically purchased goods recommended by friends or others they trusted. Because invoices do not share a single layout, extracting information from them is difficult. Most people still do this work by hand. Large organisations attempt to build template-based software and struggle to handle the many corner cases. Some, as previously described, train state-of-the-art neural networks to extract data from previously unseen bills or receipts. Now is the time to democratise this capability and place it in the hands of interested individuals and ordinary programmers. So far we have provided a solution for the online applications of large businesses, such as WageWorks. When we hear the word "text," we immediately think of three basic words. Optical Character Recognition (OCR) is defined as the electronic or mechanical conversion of images of typed, handwritten, or printed text, whether from a scanned document, a photograph of a scene or document, or subtitle text overlaid on an image. It is now widely used as a form of data entry from printed paper records such as invoices, passport documents, bank statements, computerised receipts, business cards, mail, printouts of static data, or other suitable documentation. Products can be recommended to users in two basic ways:

· We may recommend to a user the items that are most popular among all users

· We may split consumers into several groups according to their preferences (user features) and suggest goods depending on their segments

Natural scene text recognition, on the other hand, is much more difficult, and close attention must be paid to image quality. Because virtually every smartphone now has a camera of at least 5 MP, pixel density is not a major issue. The challenges in natural scene recognition are sensor noise, blurring, lighting conditions, resolution, non-paper subjects, non-planar objects, unknown layouts, and viewing angles. Before moving on to the k-nearest neighbours (KNN) machine learning algorithm, we look at some of the methods that failed. The first was Tesseract, Google's Optical Character Recognition software, which gave very low digit-recognition accuracy even when its options for treating the picture as a single text line and recognising only digits were used. It should be noted that images with background noise were removed entirely before Tesseract was used. The second method for text recognition was to create a template image for each of the ten digits and then compare regions of the input image against each of the 0 to 9 templates using OpenCV's matchTemplate function.

With the exception of training and prediction, all other stages have been covered in previous chapters. Now is the time to discuss another prediction or machine learning method that is commonly used to achieve these goals; the presentation stage is not too important to explain. The final stage was to display the results of the Machine Learning model in an excellent file, as seen below. We attempted to provide the content of the expected number strings below the actual likelihood for digits that were not predicted with such pleasing precision. This presentation modification was unsatisfactory for the user's time to repair waste by 80 percent. Furthermore, this action is not considered clever because it is unwilling to use the human brain or mental effort. In a few minutes, a customer can easily provide the data and graphically predict the actual outcome. Because many of the forecasts were incorrect, the user did not have to make many corrections.[2]

3.2 Multi-criteria recommender systems

Multi-criteria recommendation systems (MCRSs) are recommendation systems that incorporate multi-criteria preference information.

3.3 Risk-aware recommender systems

Machine learning algorithms are widely used in information retrieval today, and k-Nearest Neighbor is one of them, used primarily for text recognition. Let us look at how k-Nearest Neighbor works. It falls under the supervised learning domain and has numerous applications, such as intrusion detection, data mining, and pattern recognition. It is readily applicable to real-life scenarios because it is non-parametric. One example of a risk-aware recommender is DRARS, a system that models context-aware recommendation as a bandit problem.

3.4 Mobile recommender systems

Mobile recommender systems use smartphones to offer personalised, context-sensitive recommendations. The main advantage of k-Nearest Neighbor is that it makes no assumptions about the distribution of the data, unlike Gaussian mixture models (GMMs), which assume that the data follow a Gaussian distribution and work accurately only when they do. k-NN can be understood intuitively in a simple way. It is a type of instance-based learning, also called lazy learning, in which the function is only approximated locally. It can be used for both classification and regression: for each unseen input on which a prediction is needed, it simply finds the k data points that are closest to the input data point in terms of Euclidean distance.
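The k-NN procedure described above can be sketched in a few lines. This is an illustrative majority-vote classifier under stated assumptions: the function names, the two-dimensional sample points, and the labels are invented for the example and do not come from the system described in this paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query; no model is fitted
    # beforehand, which is why k-NN is called a lazy learner.
    order = sorted(range(len(train_points)),
                   key=lambda i: euclidean(train_points[i], query))
    nearest = order[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two loose clusters labelled "a" and "b".
points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
labels = ["a", "a", "b", "b", "b"]
prediction = knn_predict(points, labels, (0.2, 0.1), k=3)
```

For regression, the vote would be replaced by the average of the neighbours' target values; the neighbour search itself is unchanged.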

3.5 Hybrid recommender systems

Most recommender systems now use a hybrid approach that combines collaborative filtering, content-based filtering, and other methods. We do not need to go over the training stages again, because the k-NN algorithm was covered in the previous part. Even with such simplicity, it can produce results that rival those of far more mathematically complex methods. The k-NN algorithm can also be used for regression problems, which makes it more versatile than many other algorithms. The only difference from the classification procedure is that the average of the nearest neighbours' values is used instead of a vote. With suitable libraries, k-NN can be written in very few lines in high-level languages such as R and Python. A basic machine learning algorithm built on very simple mathematical ideas can therefore be powerful, and one does not have to be a mathematician to understand it.

3.6 Reproducibility in recommender system research

Previous research has had little impact on the practical use of recommender systems. By 2011, Ekstrand, Konstan, and colleagues had observed that it was difficult to reproduce and extend recommender systems research results, and that evaluations were not handled consistently. Konstan and Adomavicius concluded that the recommender systems research community was failing at evaluation, with a significant percentage of articles presenting findings that add little to collective knowledge. As a result, much of the research on recommender systems cannot be considered reproducible.

4 Design and Implementation

4.1 Architectural Design of the Work

Following the selection of the value chain, the fourth stage of the project cycle is design and implementation. It is critical that the analysis continue throughout the implementation phase in order to guide adjustments to the competitiveness strategy in response to the market, the enabling environment, or the chain itself.

Figure 1: Complete Design flow

1. Data acquisition: We used the Amazon Web Services (AWS) application programming interface to retrieve image data from Amazon's servers. This is simply a Python script that, when run, connects to the remote server and requests the image data, which starts an automatic download of that data to the client's computer.

2. Data cleaning: After collection there are 183138 data points and 19 features. The 19 features include fields such as 'asin', 'author', 'title', and so on. However, only a few of these are useful for uniquely identifying items, so we excluded all others and kept the features 'asin', 'brand', 'colour', 'product type name', 'medium image url', 'title', and 'formatted price'.

3. Removing near duplicate items: We have a pool image with digits, and each digit is put into collect, which is similar to the font of digits found on receipts; this allows us to reduce the size of training data, which is actually unnecessary. Let's take a look at the training dataset.

4. Data pre-processing: Average Word2Vec and IDF-weighted Word2Vec are closely related: average Word2Vec effectively sets the IDF of every word to 1, whereas here each word W(j) is weighted by its IDF. We obtain a 300-dimensional Word2Vec vector for each word, multiply that vector by the word's IDF (inverse document frequency), and finally sum the corresponding cells of all the word vectors to create a new vector.

5. Computing IDF: IDF is always calculated for a specific word W(j) and a specific document corpus D: IDF(W(j), D) = log(N / n(j)), where N is the number of titles in corpus D and n(j) is the number of titles in D that contain the word W(j).

6. Computing Word2Vec:

• Let T(i) be a title with k words W(1), ..., W(k). Running the Word2Vec model on each word W(j) gives a 300-dimensional vector for that word.

• For each title T(i), multiply the 300-dimensional vector of each word W(j) from the previous step by the IDF of W(j) computed in step 5.

• Make a vector of size 300 in which each cell is the sum of the corresponding cells of the weighted vectors of all words W(j). Finally, divide each cell of the resulting vector by the sum of the IDF values of all words W(j) in the title (from step 5).

7. Euclidean distance calculation:

k-NN can be understood intuitively in a simple way. It is a type of instance-based learning, also called lazy learning, in which the function is only approximated locally. It can be used for both classification and regression: for each unseen input on which a prediction is needed, it simply finds the k data points that are closest to the input data point in terms of Euclidean distance. Computer vision techniques likewise rely increasingly on intelligent algorithms and systems. In fact, pixel computing, also known as visual computing, has emerged as a critical component in making this technology work effectively. However, computer vision is more than applied machine learning. It also includes 3D scene modelling, multi-view camera geometry, structure from motion, stereo correspondence, point cloud processing, and motion prediction, in which ML is not a major component.[3]
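Steps 4 to 7 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional `embeddings` dictionary is a hypothetical stand-in for a trained 300-dimensional Word2Vec model, the titles are invented, the natural logarithm is assumed for the IDF, and all function names are chosen for this example.

```python
import math
import numpy as np


def compute_idf(titles):
    """IDF(W, D) = log(N / n_W) over a corpus of whitespace-tokenised titles."""
    n_titles = len(titles)
    doc_freq = {}
    for title in titles:
        for word in set(title.split()):  # count each word once per title
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_titles / df) for word, df in doc_freq.items()}


def idf_weighted_vector(title, embeddings, idf, dim):
    """Sum of IDF-weighted word vectors divided by the sum of the IDF weights."""
    total = np.zeros(dim)
    weight_sum = 0.0
    for word in title.split():
        if word in embeddings and word in idf:  # skip out-of-vocabulary words
            total += idf[word] * embeddings[word]
            weight_sum += idf[word]
    return total / weight_sum if weight_sum > 0 else total


def rank_by_distance(query_vec, title_vecs):
    """Indices of titles sorted by Euclidean distance to the query, plus distances."""
    dists = np.linalg.norm(title_vecs - query_vec, axis=1)
    return np.argsort(dists), dists


# Hypothetical toy data standing in for the product titles and Word2Vec vectors.
titles = ["pink tiger top", "zebra stripes top", "blue denim jeans"]
embeddings = {
    "pink": np.array([1.0, 0.0, 0.0]), "tiger": np.array([0.0, 1.0, 0.0]),
    "zebra": np.array([0.0, 0.9, 0.1]), "stripes": np.array([0.0, 0.8, 0.2]),
    "top": np.array([0.5, 0.5, 0.0]), "blue": np.array([0.0, 0.0, 1.0]),
    "denim": np.array([0.1, 0.0, 0.9]), "jeans": np.array([0.0, 0.1, 0.9]),
}

idf = compute_idf(titles)
vecs = np.array([idf_weighted_vector(t, embeddings, idf, dim=3) for t in titles])
order, dists = rank_by_distance(vecs[0], vecs)  # query with the first title
```

In this toy corpus the title about zebra stripes ranks closer to the tiger query than the jeans title does, even though it shares only the low-weight word "top" with the query, because the assumed vectors for "tiger", "zebra", and "stripes" lie near each other. A word that occurs in every title would receive an IDF of zero and drop out of the average.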

4.2 Details of Inputs/Data Used

The data were obtained using the public Amazon Product Advertising API: https://docs.aws.amazon.com/AWSECommerceService/latest/DG/Welcome.html

1. Number of data points: 183138

2. Number of features/variables: 19

4.3 Experimental Scenarios

4.3.1 Bag of Words results

The results indicate that items whose titles share more words with the query title have a smaller Euclidean distance and are therefore ranked higher in the search results. For example, for the query "pink tiger tiger zebra stripes xl xxl", the titles containing the most of these words have the smallest Euclidean distance.

4.3.2 Tf-idf

TF-IDF is also a frequency-based algorithm, so its results are similar to those of Bag of Words, but it outperforms Bag of Words because it takes into account how often each word occurs across the entire data corpus.
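The effect of corpus-wide weighting can be seen in a small sketch. It assumes raw term counts as the TF component and a natural-logarithm IDF; the helper name `tfidf_matrix` and the titles are hypothetical and serve only to illustrate the idea.

```python
import numpy as np


def tfidf_matrix(titles):
    """Return (vocabulary, matrix) with one TF-IDF row per title."""
    tokenized = [title.lower().split() for title in titles]
    vocab = sorted({word for words in tokenized for word in words})
    index = {word: i for i, word in enumerate(vocab)}
    n_titles = len(titles)

    # Document frequency: number of titles that contain each word.
    doc_freq = np.zeros(len(vocab))
    for words in tokenized:
        for word in set(words):
            doc_freq[index[word]] += 1
    idf = np.log(n_titles / doc_freq)

    matrix = np.zeros((n_titles, len(vocab)))
    for row, words in enumerate(tokenized):
        for word in words:
            matrix[row, index[word]] += 1  # raw term count as TF
        matrix[row] *= idf  # weight every count by the word's IDF
    return vocab, matrix


# Hypothetical titles; "top" occurs in every title, so its IDF is zero.
titles = ["pink tiger top", "tiger zebra top", "blue jeans top"]
vocab, matrix = tfidf_matrix(titles)
d01 = np.linalg.norm(matrix[0] - matrix[1])
d02 = np.linalg.norm(matrix[0] - matrix[2])
```

Because "top" appears in all three titles, it carries no weight, and the ranking is driven by the rarer words: the two titles that share "tiger" end up closer to each other than to the jeans title.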

4.3.4 Idf Weighted Word2Vec Results

In addition to the kinds of results returned by Bag of Words and TF-IDF, IDF-weighted Word2Vec returned the results shown in Fig. 4, which the previous two algorithms did not provide at all, because it combines semantic and frequency-based similarity. It favours words that appear frequently in a title while appearing in very few titles across the data corpus.

4.4 Performance evaluation

The results of Bag of Words and TF-IDF show that these algorithms are based solely on the frequency of words in the title and across the entire data corpus, whereas algorithms such as average Word2Vec place a greater emphasis on semantic recommendations. Eventually, paperless electronic systems with access to full documents and the ability to serve large user populations may be developed, providing a wide range of individually tailored information services to users. Many new features, such as natural language recognition, graphics processing, speech recognition, and low-cost point-to-point communications, could be included in such systems.

5 Experimental Results & Analysis

The results of Bag of Words and TF-IDF show that these algorithms are based solely on the frequency of words in the title and across the entire data corpus, whereas methods such as average Word2Vec emphasise semantic suggestions. Combining the two types of algorithm yields results that reflect both semantic and textual similarity. Word2Vec results with IDF weighting:

INPUT

OUTPUT

5.1 Comparison with other contemporary works

This article discusses clothing techniques for women. Clothing is an item that is worn on the body. The preceding chapters' content covers current information retrieval theory and practise, as well as potential expansions that, in theory, can be applied in the right environment at the same time. This final chapter discusses cutting-edge concepts and technology that can go beyond the current state of the art. Technological advancements are described that will significantly alter the current search and retrieval process, and theoretical developments are addressed that will provide new insight into the search and retrieval functions of information. One of the characteristics of today's operational recovery systems is the significant investment in people and resources required to provide even simple recovery services. Creating and implementing retrieval systems are important jobs; significant resources are also required to create or acquire data bases that will be modified and searched. In some cases, even minor procedural changes necessitate careful consideration.

6 Conclusions and Future Scope

6.1 Conclusion

The results show the benefits of IDF-weighted Word2Vec over the Bag of Words and TF-IDF methods. Its semantic results give a better forecast of future customer requirements, while its frequency-based results give predictions based on the product the user queried.

6.2 Future scope

Companies like Amazon spend a lot of money on research and development to improve the existing algorithms.

References

[1] R. C. Bagher, H. Hassanpour, and H. Mashayekhi, “User trends modeling for a content-based recommender system,” Expert Syst. Appl., vol. 87, pp. 209–219, 2017.

[2] M. S. Tajbakhsh and J. Bagherzadeh, “Microblogging hash tag recommendation system based on semantic TF-IDF: Twitter use case,” Proc. - 2016 4th Int. Conf. Futur. Internet Things Cloud Work. W-FiCloud 2016, pp. 252–257, 2016.

[3] G. Carullo, A. Castiglione, and A. De Santis, “Friendship recommendations in online social networks,” Proc. - 2014 Int. Conf. Intell. Netw. Collab. Syst. IEEE INCoS 2014, pp. 42–48, 2014.

[4] J. Hannon, M. Bennett, and B. Smyth, “Recommending Twitter users to follow using content and collaborative filtering approaches,” Proc. fourth ACM Conf. Recomm. Syst. - RecSys ’10, p. 199, 2010.

[5] S. Aral and D. Walker, “Identifying Influential and Susceptible Members of Social Networks,” Science, vol. 337, no. 6092, pp. 337–341, June 2012.

[6] Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg S.; Dean, Jeff (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems.

[7] Levy, Omer; Goldberg, Yoav; Dagan, Ido (2015). Improving Distributional Similarity with Lessons Learned from Word Embeddings.

[8] Luhn, Hans Peter (1957). "A Statistical Approach to Mechanized Encoding and Searching of Literary Information" (PDF). IBM Journal of Research and Development.

[9] Breitinger, Corinna; Gipp, Bela; Langer, Stefan (2015-07-26). "Research-paper recommender systems: a literature survey". International Journal on Digital Libraries.

[10] Breitinger, Corinna; Gipp, Bela; Langer, Stefan (2015-07-26). "Research-paper recommender systems: a literature survey". International Journal on Digital Libraries.

[11] Sivic, Josef; Zisserman, Andrew (2003-01-01). Video Google: A Text Retrieval Approach to Object Matching in Videos. Proceedings of the Ninth IEEE International Conference on Computer Vision.

[12] Langer, Stefan; Gipp, Bela (2017). "TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users' Personal Document Collections".

[13] S. Aral and D. Walker, “Identifying Influential and Susceptible Members of Social Networks,” Science, vol. 337, no. 6092, pp. 337–341, June 2012.

[14] Banerjee, Imon; Chen, Matthew C.; Lungren, Matthew P.; Rubin, Daniel L. (2018). Radiology report annotation using intelligent word embeddings: Applied to multi-institutional chest CT cohort.

[15] Visualizing Data using t-SNE. Journal of Machine Learning Research, 2008. Vol. 9, pg. 2595. Retrieved 18 March 2017.

[16] V. D. Ambeth Kumar, M. Ramakrishnan, V. D. Ashok Kumar and S. Malathi (2015) “Performance Improvement using an Automation System for Recognition of Multiple Parametric Features based on Human Footprint”, Kuwait Journal of Science & Engineering, Vol 42, No 1 (2015), pp. 109-132.

[17] T Ramya, S Malathi, GR Pratheeksha, VDA Kumar, "Personalized authentication procedure for restricted web service access in mobile phones", Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014) (DOI: 10.1109/ICADIWT.2014.6814702)

[18] VDA Kumar, D Elangovan, G Gokul, JP Samuel, VDA Kumar, "Wireless sensing system for the welfare of sewer labourers", Healthcare technology letters 5 (4), 107-112. DOI: 10.1049/htl.2017.0017

[19] Kumar, V.D.A., Sharmila, S., Kumar, A. et al. A novel solution for finding postpartum haemorrhage using fuzzy neural techniques. Neural Comput & Applic (2021). https://doi.org/10.1007/s00521-020-05683-z

[20] Ambeth Kumar V.D., Ramakrishan M. (2011) Footprint Based Recognition System. In: Das V.V., Thomas G., Lumban Gaol F. (eds) Information Technology and Mobile Communication. AIM 2011. Communications in Computer and Information Science, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20573-6_63.