Article Information
Corresponding Author: Nammee Moon* , nammee.moon@gmail.com
Jinah Kim*, Dept. of Computer Science and Engineering, Hoseo University, Asan, Korea, jina9406@gmail.com
Junhee Park*, Dept. of Computer Science and Engineering, Hoseo University, Asan, Korea, cach456@gmail.com
Minchan Shin*, Dept. of Computer Science and Engineering, Hoseo University, Asan, Korea, shinmc0322@gmail.com
Jihoon Lee*, Dept. of Computer Science and Engineering, Hoseo University, Asan, Korea, develomona@gmail.com
Nammee Moon*, Dept. of Computer Science and Engineering, Hoseo University, Asan, Korea, nammee.moon@gmail.com
Received: March 19 2020
Revision received: September 15 2020
Accepted: October 4 2020
Published (Print): August 31 2021
Published (Electronic): August 31 2021
1. Introduction
Recommendation systems are widely used in a variety of contexts, and the datasets used in related research have evolved as the proliferation of smart devices has made user data easier to accumulate. Earlier recommendation systems based their recommendations on a user’s item ratings using content-based, collaborative, and hybrid filtering methods [1-3]. Content-based filtering recommends items whose properties are similar to those of items the user previously preferred, whereas collaborative filtering recommends items preferred by other users whose preferences are similar to the user’s. The hybrid method combines content-based and collaborative filtering to overcome the shortcomings of each.
Recently, various types of recommendation systems have been developed to increase the accuracy of personalized recommendation prediction by linking more types of data to users or items. Recommendations can be obtained by acquiring additional information to understand the user’s preferences, such as the user’s post-purchase reviews, social relationships through SNS, and behavior logs until purchase [4,5]. Although these methods help predict the user’s preferences, it is difficult to understand the factors affecting the user’s purchasing decisions. To this end, research is being conducted on a multi-criteria recommendation system that integrates ratings according to the attributes of an item into one rating [6,7]. Because it can reflect the priority of each user’s item decision, this system can achieve higher prediction accuracy compared with traditional recommendation systems that use a single rating [6,8].
Multi-criteria recommendation systems generally require large amounts of rating input from the user, which can be cumbersome for the user. In addition, the rating is a numerical value that summarizes the satisfaction of the user and cannot explain the reason for the user’s evaluation. To compensate for this, many studies using explicit data such as ratings and implicit data, such as reviews, have been conducted and have demonstrated significantly high prediction accuracy [9-11]. Both explicit and implicit data can be used to reflect detailed features of user satisfaction with items effectively.
However, even if implicit data are used, the factors affecting the user’s purchasing decisions are not necessarily included in the data. Because product reviews evaluate item purchases, evaluating the user’s sentiment after purchase is possible, but determining what the user focuses on when purchasing the product is challenging.
This study proposes a method for generating new recommendation candidates by identifying the factors that affect users’ purchase decisions from reviews in a multi-criteria recommendation system. The method predicts multi-criteria ratings from reviews and predicts each user’s purchase decision priority. Deep learning is applied to the user’s reviews to infer the rating for each criterion, and the learned weights that describe how the multi-criteria ratings affect the overall rating are then extracted. These weights serve as the user’s purchasing decision factors and are reflected in the final rating for each user and item. The resulting rating scores are thus tailored to each user’s purchasing tendencies.
The remainder of this paper is organized as follows. Section 2 reviews research on multi-criteria recommendation systems and rating prediction from review data. Section 3 provides an overview of the recommendation candidate generation method proposed in this study, and Section 4 describes each model of the proposed recommendation system in detail. Section 5 compares and analyzes the existing approaches and the proposed method. Finally, in Section 6, we present our conclusions and suggest directions for future research.
2. Related Works
2.1 Multi-Criteria Recommendation System
Multi-criteria recommendation systems have been studied extensively. These systems are used to deal with uncertainty in the decision-making process. The difference between the existing single-criteria rating system and the multi-criteria rating system is that the latter has more information about the user or item [12]. Because the overall rating does not contain information about why the user selected such a rating, it is possible to address the difficulty of knowing the exact user preference from the overall rating by adding a rating for multiple criteria [8]. The general form of a rating in a multi-criteria recommendation system can be expressed as Eq. (1) [12], in which $R_{0}$ is the overall rating and $R_{c}$ is the rating value for each criterion $c$ $(c=1, \ldots, k)$.
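A common way to write this general form, consistent with the symbols defined above, is as a mapping from user-item pairs to a tuple of ratings. The following is a sketch only; the set names $\mathit{Users}$ and $\mathit{Items}$ are assumed labels that do not appear in the text.

```latex
R : \mathit{Users} \times \mathit{Items} \rightarrow R_{0} \times R_{1} \times \cdots \times R_{k}
```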
Previous multi-criteria recommendation-related studies focus on the process of defining criteria and calculating ratings through data statistics or data mining [13]. Although the validity of criteria has been shown in past studies, the process of automatically extracting ratings with real-time performance is required, because the data become larger and both implicit and explicit data are used together. Alotaibi [10] proposed an improved recommendation method using multi-criteria ratings, employing social networks as implicit data. Ebadi and Krzyzak [9] proposed a highly accurate hotel recommendation system that used a multi-aspect rating. This system proposes personalized and customized hotels to users by extracting implicit features from user reviews through natural language processing and topic modeling.
Although the use of implicit data has increased the accuracy of recommendations, the nature of recently collected data is shifting the direction of research toward learning-based approaches. Nassar et al. [14] proposed a novel deep multi-criteria collaborative filtering model for recommendation systems and effectively predicted the criteria ratings and overall rating based on deep learning. Hassan and Hamada [15] proposed a neural network approach to improve the accuracy of multi-criteria recommendation systems. Compared with single-criteria rating systems, the performance of their system was shown to be significantly better.
However, the systems proposed in studies conducted to date are lacking in their ability to sufficiently extract complex features between users and items according to multiple criteria. Specifically, predicting the user’s rating by multiple criteria for an item indicates the preference for the item but does not mean the priority when the user selects the item. Therefore, a supplementary method is needed. This study proposes a method for extracting a user’s priority features.
2.2 Prediction of Rating Using Review Data
Both explicit and implicit data are being used to improve the accuracy of recommendation systems. Users’ intentions or sentiments can be extracted using text data, such as product review data, SNS data, audio data, and video data. Among these, review data, which are implicit data, are most frequently used in the field of recommendation systems. Review data are used in recommendation systems to re-adjust ratings or make new predictions using sentiment analysis.
In particular, many studies have been conducted to improve the accuracy of recommendations by grasping the intentions or sentiments of users by linking the ratings with reviews [16]. When proceeding based on sentiment, text mining based on natural language processing (NLP) has been used. The methods of determining the polarity are point-wise mutual information (PMI) and semantic orientation from PMI (SO-PMI), and the polarity can be determined using a preset set of positive and negative words. De Albornoz et al. [17] conducted a study evaluating the overall sensitivity of a product by extracting the product characteristics from the review contents and assigning weights to each product feature. Zhang et al. [18] extracted product features and user opinions through sentiment analysis of user reviews, and personalized recommendations were made according to the users’ interests and product features.
However, most existing studies are limited in that they cannot capture the context of the overall review content. To address this, neural-network-based sentiment analysis methods have been proposed. As various types of deep learning-based sentiment analysis have been performed, hybrid methods tailored to the characteristics of the data have recently been proposed, because combining several techniques yields better results than using a single technique [19]. In particular, there are many combinations of convolutional neural networks (CNNs), which can automatically extract features from data, and long short-term memory (LSTM), which can capture dependencies across the overall time sequence. Using movie reviews, Park and Kim [20] confirmed that a CNN-LSTM model performed better than the traditional single learning techniques (CNN, LSTM). Yenter and Verma [21] described a novel approach for sentiment analysis using a combined kernel from multiple branches of a CNN with LSTM layers. Wang et al. [22] further proposed a deep neural network (DNN) architecture based on CNN-LSTM with an added attention mechanism and confirmed that it performed better than the architectures proposed in previous studies. An and Moon [23] proposed a system that enables custom recommendations for tourist spots through CNN-LSTM-based sentiment analysis of reviews and classification of seasons and weather. In recent years, studies on bidirectional learning, which uses both preceding and following data, have been conducted [24]. Based on previous research, this study focuses on multi-criteria rating prediction based on CNN and bidirectional long short-term memory (BiLSTM).
3. System Overview
This study describes a method for generating recommendation candidates based on user priority in multi-criteria recommendation systems. The goal is to predict the per-criterion ratings of items from reviews and to obtain weights that capture each user’s priorities among the criteria. A multi-criteria recommendation system based on the proposed candidate generation method then calculates a new rating to improve satisfaction with personalized recommendations. The method is significant in that it reflects the different priorities that individual users assign to the multiple criteria.
The proposed multi-criteria recommendation system is illustrated in Fig. 1. The data consist of reviews and ratings posted by users for each item. The review data are preprocessed, for example by handling stopwords and unifying verb tenses.
Generating a candidate for recommendation can be subdivided into three main steps. First is a process of deriving a rating for criteria for reviews through context identification and using it to derive the priority of items and users. Item priority is derived by synthesizing the predicted ratings for each review by item. User priority is derived by extracting weights in the process of deriving the overall rating from the ratings for each criterion predicted through linear regression. Second, the two processes are then combined to predict a new score for each user-specific item. Finally, the top-N recommendation list is provided to the user from the recommendation item candidates created by the previous process.
This study focuses on the proposed process for generating the candidates. In Section 4, the detailed process is divided into CNN-BiLSTM-based multi-criteria rating prediction, linear regression-based user priority prediction, and prediction of the overall rating.
4. Method of Recommended Item Candidate Generation
The process of the recommended item candidate generation is illustrated in Fig. 2. Based on the user’s reviews, the multi-criteria ratings are predicted, and the overall rating is predicted again by synthesizing all ratings. For this, a CNN-BiLSTM-based prediction model and a linear regression-based user priority prediction model are used.
First, a review table and rating table between the user and item were obtained by pre-processing the raw data. Using the CNN-BiLSTM model, a predicted multi-criteria rating table (PMR) was obtained from the review data. The PMR becomes the input value of the linear regression-based model along with the rating table, and the user’s priority of criteria (UPC) and the user’s predicted overall rating (UPR) are obtained. The item’s priority of criteria (IPC) is an average value calculated by grouping each item from the PMR, which means the item’s priority of criteria. Finally, by integrating all of the UPC, UPR, and IPC, a final user-item score (FS) is derived to generate a recommendation candidate for the user.
4.1 CNN-BiLSTM-based Multi-Criteria Rating Prediction
The process of the CNN-BiLSTM model for multi-criteria rating prediction is shown in Fig. 3. Using the review data that the user has left on the item, supervised learning is performed based on a rating for each criterion. Through this, the prediction model, CNN-BiLSTM, is used to derive prediction scores for criteria for evaluation data.
Fig. 2. Process of recommended item candidate generation.
The CNN part of this model applies three one-dimensional convolution layers to extract features. The first convolution layer has an output size of 256, a kernel size of 7, and a max-pooling size of 4. The second has an output size of 128, a kernel size of 5, and a max-pooling size of 3. The last has an output size of 64, a kernel size of 3, and a max-pooling size of 2. After the CNN, a BiLSTM layer is applied to improve accuracy by predicting the words and modeling the sequence vector. A dropout layer is added after the BiLSTM layer. Stacking multiple networks in the training model leads to high computational requirements and long training times; to mitigate this, the dropout layer randomly turns off nodes. Finally, to derive the predicted rating score of each criterion of the IPC matrix, the sigmoid function was used as the activation function and RMSprop as the optimization function.
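To make the layer sizes above concrete, the following sketch traces how the sequence length shrinks through the three convolution and max-pooling stages. It is illustrative only: stride-1 convolutions without padding, non-overlapping pooling, and the input length of 400 tokens are all assumptions, since the text does not state them.

```python
def conv_pool_length(length, kernel, pool):
    """Sequence length after a stride-1, unpadded 1-D convolution followed by
    non-overlapping max-pooling (both are assumptions, not stated in the text)."""
    after_conv = length - kernel + 1
    if after_conv < pool:
        raise ValueError("input too short for this layer")
    return after_conv // pool

# (output size, kernel size, max-pooling size) for the three layers in the text.
LAYERS = [(256, 7, 4), (128, 5, 3), (64, 3, 2)]

def trace_shapes(input_length):
    """Return the (length, channels) shape after each conv + pool stage."""
    shapes = []
    length = input_length
    for filters, kernel, pool in LAYERS:
        length = conv_pool_length(length, kernel, pool)
        shapes.append((length, filters))
    return shapes

# Hypothetical review length of 400 tokens.
shapes = trace_shapes(400)
print(shapes)  # [(98, 256), (31, 128), (14, 64)]
```

Under these assumptions, the BiLSTM layer would receive a sequence of 14 steps with 64 features each; a different padding mode or input length changes these numbers.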
Fig. 3. CNN-BiLSTM model for multi-criteria rating prediction.
Through these processes, the PMR is generated by predicting the score for the criteria corresponding to the review data of each item. The IPC consists of the average value of the predicted rating, which is determined by grouping the items in the review table.
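The grouping step that produces the IPC can be sketched as a per-item average over the predicted ratings. This is a minimal illustration; the function name, the row layout of the PMR, and the toy values are assumptions made for the example.

```python
from collections import defaultdict

def item_priority(pmr_rows, n_criteria):
    """Average the predicted criterion ratings of all reviews of each item.

    pmr_rows: iterable of (item_id, [rating per criterion]) pairs, one per review.
    Returns {item_id: [mean rating per criterion]}.
    """
    sums = defaultdict(lambda: [0.0] * n_criteria)
    counts = defaultdict(int)
    for item_id, ratings in pmr_rows:
        for c, value in enumerate(ratings):
            sums[item_id][c] += value
        counts[item_id] += 1
    return {item: [s / counts[item] for s in sums[item]] for item in sums}

# Toy PMR with two criteria; item ids and values are made up.
pmr = [("hotel_a", [4.0, 2.0]), ("hotel_a", [5.0, 3.0]), ("hotel_b", [1.0, 1.0])]
ipc = item_priority(pmr, n_criteria=2)
print(ipc)  # {'hotel_a': [4.5, 2.5], 'hotel_b': [1.0, 1.0]}
```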
4.2 Linear Regression-based User Priority Prediction
To derive the priority that users assign to the multiple criteria, this study uses a linear regression model consisting of one layer. For user $m$ and $n$ criteria, the model predicts $R_{m}$, the overall rating of user $m$, from the criteria ratings $CR_{m, n}$ predicted by the preceding CNN-BiLSTM model. This process can be expressed as Eq. (2), and the weight $Uw_{m, n}$ for each criterion of user $m$ is derived through linear regression.
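One plausible reading of this relation, written with the symbols defined above, is a weighted sum of the predicted criteria ratings. It is a sketch rather than the exact published form; the summation index $i$ is introduced here for illustration, and any bias term or output activation is omitted.

```latex
R_{m} = \sum_{i=1}^{n} U w_{m, i} \cdot C R_{m, i}
```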
This supervised learning model consists of only an input layer and an output layer, and one model is trained for each user to derive the overall rating from CR. The sigmoid activation function is used in the output layer and normalizes the predicted overall rating to a value between 0 and 1. Adam, which combines the advantages of Adagrad and RMSProp, is used as the optimizer; it is known to descend stably even when the gradient increases. The mean squared error (MSE), which penalizes errors by squaring the difference between predicted and actual values and has shown excellent performance in numerical prediction, was used as the loss function. Because the purpose of this model is to derive the weight $Uw$ through linear regression, no dropout layer is used. When learning is completed, the matrix UPC representing the user’s priorities is constructed from the learned weights extracted from the layer, as shown in Fig. 4, and the UPR, the matrix of overall ratings predicted by the trained model, is constructed.
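The idea of reading a user's priorities from the learned weights can be sketched with a single-layer model in plain Python. This is an illustration under stated assumptions, not the authors' implementation: plain batch gradient descent stands in for Adam, no bias term is used, ratings are assumed to be pre-scaled to [0, 1], and the function names and toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_user_weights(criteria_ratings, overall_ratings, lr=0.5, epochs=2000):
    """Fit one weight per criterion for a single user.

    Model: overall ~= sigmoid(sum_c w[c] * rating[c]), trained on MSE with
    batch gradient descent (a stand-in for Adam). Inputs are scaled to [0, 1].
    """
    n = len(criteria_ratings[0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in zip(criteria_ratings, overall_ratings):
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Gradient of the squared error through the sigmoid output.
            delta = 2.0 * (pred - y) * pred * (1.0 - pred)
            for c in range(n):
                grad[c] += delta * x[c]
        w = [wi - lr * g / len(criteria_ratings) for wi, g in zip(w, grad)]
    return w

def priority_order(weights):
    """Criterion indices sorted from highest to lowest learned weight."""
    return sorted(range(len(weights)), key=lambda c: weights[c], reverse=True)

# Toy user whose overall rating follows criterion 0 and largely ignores criterion 1.
X = [[1.0, 0.2], [0.2, 1.0], [0.8, 0.8], [0.2, 0.2], [1.0, 1.0], [0.6, 0.2]]
y = [0.9, 0.3, 0.8, 0.3, 0.9, 0.7]
w = fit_user_weights(X, y)
print(priority_order(w))
```

For this toy user, the weight learned for criterion 0 ends up larger than that for criterion 1, which is the kind of ordering a row of the UPC matrix is meant to capture.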
Fig. 4. Process of user priority prediction based on linear regression.
4.3 Prediction of Overall Rating
In the previous process, the IPC, UPC, and UPR matrices were derived based on the ratings for each user and item. In this process, the derived matrices are aggregated to derive a FS between users and items.
First, it is checked whether the user has already rated the item. Such a rating has the highest accuracy and reliability because it is the score that the user actually gave the item, so any existing rating is used as is. If no rating exists because the user has not previously purchased the item, the similarity between users is obtained using UPR. The similarity is calculated as cosine similarity, and the closer two users’ values are for each criterion, the closer the similarity is to 1. Based on this, the missing rating of an item that the user has not experienced is replaced with the average rating given by the most similar users.
Finally, for an item whose rating cannot be obtained by the previous two methods, the similarity between items is obtained using IPC. The similarity is again calculated as cosine similarity, and the closer two items’ ratings are for each criterion, the closer the similarity is to 1. The FS of such an item is taken from the most similar item previously purchased by the user. The final FS matrix is derived by combining the above methods, and the items with the highest scores are recommended to the user.
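The three-step fallback described in this section can be sketched as follows. This is one possible reading rather than a definitive implementation: the function names, the dictionary layouts, the number of neighbours `k`, and the toy vectors are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def final_score(user, item, ratings, user_vecs, item_vecs, k=2):
    """Three-step fallback: own rating, similar users' ratings, similar item's rating.

    ratings:   {(user, item): rating} of observed ratings
    user_vecs: {user: vector} used for user similarity (hypothetical layout)
    item_vecs: {item: vector} used for item similarity, e.g. rows of the IPC
    """
    # Step 1: an observed rating is used as is.
    if (user, item) in ratings:
        return ratings[(user, item)]
    # Step 2: mean rating of the k most similar users who rated this item.
    raters = [u for (u, i) in ratings if i == item and u != user]
    if raters:
        raters.sort(key=lambda u: cosine(user_vecs[user], user_vecs[u]), reverse=True)
        top = raters[:k]
        return sum(ratings[(u, item)] for u in top) / len(top)
    # Step 3: rating the user gave to the most similar item they already rated.
    own_items = [i for (u, i) in ratings if u == user]
    if own_items:
        best = max(own_items, key=lambda i: cosine(item_vecs[item], item_vecs[i]))
        return ratings[(user, best)]
    return None  # nothing to fall back on (cold start)

# Toy data; all ids and values are made up.
ratings = {("u1", "a"): 5.0, ("u2", "a"): 4.0, ("u2", "b"): 2.0, ("u3", "b"): 4.0}
user_vecs = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0]}
item_vecs = {"a": [4.0, 2.0], "b": [2.0, 4.0], "c": [4.1, 2.1]}

print(final_score("u1", "a", ratings, user_vecs, item_vecs))       # 5.0 (own rating)
print(final_score("u1", "b", ratings, user_vecs, item_vecs, k=1))  # 2.0 (from u2)
print(final_score("u2", "c", ratings, user_vecs, item_vecs))       # 4.0 (item a)
```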
5. Experiment
5.1 Collected Data
The experiment was conducted using review data on “TripAdvisor” provided by Wang et al. [25]. These data are suitable because the criteria are clearly demarcated compared to other data and provide a rating for the criteria. The details about the dataset, collected over a period of 10 years, from April 2002 to September 2012, are presented in Table 1.
Table 1. The dataset used in the experiment
“TripAdvisor” provides eight criteria in total: service, cleanliness, rooms, location, value, sleep quality, business service, and check-in/front desk. However, business service and check-in/front desk were excluded because 90% or more of their values were missing. As shown in Table 2, six criteria-specific ratings and the overall ratings reflecting them were collected. Service denotes the user’s rating for hotel services, cleanliness is the user’s rating for hotel cleanliness, rooms is the user’s rating for hotel rooms, location is the user’s rating for hotel location, value is the user’s rating for hotel prices, and sleep quality refers to the user’s rating of the quality of sleep in a hotel.
Because the proposed model performs personalized learning, a minimum amount of training data is required for each user; therefore, only users who had left at least 13 reviews were retained. The data also contained many public IDs shared by multiple people, which were identified and removed.
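The review-count filter can be sketched in a few lines. The threshold of 13 comes from the text; the record layout with a `"user"` field and the toy records are assumptions made for the example.

```python
from collections import Counter

MIN_REVIEWS = 13  # threshold stated in the text

def filter_active_users(reviews, min_reviews=MIN_REVIEWS):
    """Keep only reviews written by users with at least `min_reviews` reviews.

    reviews: list of dicts with a "user" key (the field name is an assumption).
    """
    counts = Counter(r["user"] for r in reviews)
    return [r for r in reviews if counts[r["user"]] >= min_reviews]

# Toy data: u1 has 13 reviews and is kept, u2 has 12 and is dropped.
reviews = [{"user": "u1", "text": "clean room"}] * 13 + [{"user": "u2", "text": "noisy"}] * 12
kept = filter_active_users(reviews)
print(len(kept))  # 13
```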
Table 2. Description of multi-criteria ratings used in the experiment
5.2 Environment
The model proposed in this study was designed and implemented using TensorFlow and Keras. The detailed experimental environment is presented in Table 3.
5.3 Experiment Result
To evaluate the performance of the proposed recommendation service, the proposed model is denoted M1, and the comparison model, a singular value decomposition (SVD)-based matrix factorization method, is denoted M2. SVD-based matrix factorization predicts ratings for items that a user has not yet evaluated by factorizing the matrix of the users’ existing item evaluations.
For the performance evaluation, the data were divided by time into 70% training data and 30% test data; after training, the predicted recommendation list was compared with the hotels at which the user actually stayed in the 30% test data. Precision, recall, and F-measure values were calculated, as shown in Table 4. Precision is the proportion of the predicted hotels at which the user actually stayed, as in Eq. (3), and recall is the proportion of the hotels at which the user stayed that were predicted, as in Eq. (4). The F-measure, based on precision and recall, is then calculated using Eq. (5).
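The three metrics follow their standard definitions and can be sketched for a single user's top-N list as below. The function name and the hotel ids are hypothetical, and the F-measure is taken here to be the usual harmonic mean of precision and recall.

```python
def precision_recall_f(recommended, visited):
    """Precision, recall, and F-measure of one top-N list against held-out stays.

    precision = hits / number of recommended hotels
    recall    = hits / number of hotels the user actually stayed at
    F-measure = 2 * precision * recall / (precision + recall)
    """
    recommended, visited = set(recommended), set(visited)
    hits = len(recommended & visited)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(visited) if visited else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f

# Toy example: 4 recommended hotels, 3 actual stays, 2 in common.
p, r, f = precision_recall_f(["h1", "h2", "h3", "h4"], ["h2", "h4", "h9"])
print(p, r, f)
```

In this toy case, precision is 2/4, recall is 2/3, and the F-measure is 4/7. Averaging such per-user values over all test users for each list length N would give curves of the kind reported in Fig. 5.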
The precision, recall, and F-measure values were calculated when the number of recommendations was from 1 to 300; the results are shown in Fig. 5. Although M2 showed high performance in some parts depending on the number of recommendations, the proposed model M1 generally shows high performance. Table 5 compares the average values of precision, recall, and F-measure when the number of recommendations ranges from 1 to 300. The precision, recall, and F-measure were approximately 32.4%, 24.5%, and 32.6%, respectively, which were higher than those of M2.
General recommendation models use only explicit data such as ratings, whereas the proposed model generates recommendations using only reviews, which are implicit data. Therefore, this study is meaningful in that it predicts multi-criteria ratings from reviews to grasp the user’s priorities and achieves better recommendation performance than the existing recommendation model.
Fig. 5. Result of performance evaluation between the SVD-based matrix factorization model and the proposed system.
Table 5. Average result of performance comparison
6. Conclusion
We proposed a method for generating a new recommendation candidate for users by employing a learning model that predicts multiple criteria ratings and overall ratings from reviews. Using the proposed model, it is possible to grasp the criteria that the user considers important with the review data written by the user, and provide a personalized recommendation by assigning it as a weight.
CNN-BiLSTM was used to predict the user’s rating for each criterion through word embedding in the user review data, and the overall rating was predicted using a linear regression model. In this process, we derived and used the priorities of criteria of the user and the item. The user’s priority of criteria uses the weights extracted from the linear regression model, whereas the item’s priority of criteria represents an average value obtained by grouping the rating for each criterion based on the item. Subsequently, a recommendation candidate is generated by synthesizing the predicted overall rating of the user’s item. The proposed method was applied to the user’s hotel recommendation using the “TripAdvisor” dataset. The experiment confirmed the high performance of the proposed model.
This study contributes to the research of recommendation systems. The proposed system uses both implicit and explicit data to generate recommendation candidates by inferring the priorities of users and items. In most cases, each detailed multiple criteria has no rating but instead has only an overall rating and review; thus, the method proposed in this study is suitable for such a system.
However, because most publicly available data exclude personal information (gender, age, preference, etc.) owing to data privacy issues, the data used in this study limited how finely users’ priorities could be subdivided. The method can also suffer from cold-start issues owing to a lack of data when a user has not written any reviews, and other information about the user is needed to compensate for this. To address these limitations, we will improve the accuracy of personalized recommendations in the future through linkage with other relevant data.
Acknowledgement
This research was supported by the Ministry of Culture, Sports and Tourism (MCST) and the Korea Creative Content Agency (KOCCA) in the Culture Technology (CT) Research & Development Program 2020 (No. R2018020083).