1. Introduction
In the era of networked media, a growing number of individuals have turned to online platforms to gather information, voice their opinions, and express their emotions [1]. Because microblogs are a widely used social medium that disseminates copious real-time information, analyzing the emotional content of microblog comments holds significant practical importance for platform management and public opinion regulation [1]. Sentiment analysis primarily involves scrutinizing the content of generated text to discern its emotional polarity: positive or negative [2]. There are three principal approaches to sentiment analysis: those based on emotion dictionaries [3], machine learning [4], and deep learning [5].
The emotion dictionary-based approach necessitates the construction of an emotion lexicon. HowNet stands as the most prevalent Chinese emotion dictionary [6]. This lexicon constitutes a crucial emotional resource for sentiment analysis, applicable to tasks of varying granularity such as words, phrases, and attribute sentences [7]. Leveraging the HowNet and SentiWordNet dictionaries, Zhou et al. [8] deconstructed Chinese words, calculated their emotional polarity, established the SLHS Chinese dictionary, and utilized an SVM classifier to analyze emotions in microblog texts, achieving an accuracy of 83.84%. Addressing issues of scale and the scarcity of colloquial words, Zhao et al. [9] constructed a 100,000-scale emotion lexicon based on extensive microblog data, resulting in a 1.13% improvement in microblog emotion classification performance. Jiang et al. [10] addressed unknown emotional words in text content by selecting HowNet as the seed and constructing a domain-specific emotion dictionary using pointwise mutual information (PMI) and Word2vec algorithms, yielding a substantial enhancement in dictionary accuracy compared to others.
Machine learning models have progressively found application in sentiment classification. Pang et al. [11] pioneered the incorporation of machine learning into sentiment analysis; in experimental comparisons on film reviews, support vector machines proved the most effective, achieving an accuracy of 82.92%. Li et al. [12] introduced a multi-label maximum entropy-based machine learning model and applied it to datasets comprising Twitter, microblog, and other comments, achieving an accuracy of 86.06%. Kaur et al. [13] leveraged N-grams for feature extraction and, in conjunction with the k-nearest neighbor classification algorithm, raised accuracy to 82%. Zhang et al. [14] integrated an emotion dictionary with a machine learning model, augmenting emotional features with negative words in the health domain, and applied the model to classify negative topics on microblogs, yielding an accuracy of 74.1%.
Advancements in technology have facilitated the integration of deep learning into natural language processing (NLP)-based sentiment analysis. For intricate content, hybrid models prove superior to single models on certain sentiment classification tasks. Sun et al. [15] harnessed the GloVe model for word vector training, employed the bidirectional gated recurrent unit (BiGRU) for contextual feature extraction, and incorporated an attention mechanism to achieve emotion classification; this model exhibited an accuracy of 91.21% on the IMDB dataset. Zhao et al. [16] introduced a serial hybrid network of bidirectional long short-term memory (BiLSTM) and a convolutional neural network (CNN) to capture lexeme information in texts, achieving an accuracy of 91.31% on a dataset encompassing review texts from six fields. Miao et al. [17] argued that the CNN-BiGRU model effectively captured static and sequential features through CNN and bidirectional GRU, thereby simplifying feature extraction, reducing dimensionality, and enhancing accuracy and training efficiency. Yang et al. [18] leveraged GloVe for preprocessing, incorporated an attention mechanism into BiGRU to extract critical information from texts, and further extracted features through a forward attention mechanism, improving the effectiveness of sentiment classification. To address the difficulty embedding models have in incorporating word sentiment information, Yan et al. [19] proposed a parallel CNN and BiLSTM-attention model for evaluating JD e-commerce review datasets, substantially enhancing overall effectiveness. Fan and Li [20] generated character and word vectors using FastText and employed a GRU neural network for sentiment classification, reaching a sentiment classification accuracy of 92% for networked short texts. Yang et al. [21] addressed the neglect of the potency of sentiment feature words and introduced an enhanced BiLSTM-CNN+Attention model based on a comprehensive emotion dictionary, proficiently extracting semantic features and elevating accuracy. Shen et al. [22] introduced a model integrating knowledge-enhanced semantic representation with a dual attention mechanism, combining contextual and sentiment features to enhance sentiment analysis accuracy. Ali Al-Abyadh et al. [23] conducted a comparative study between single models and various hybrid deep learning models, revealing that the hybrid deep learning-support vector machine model achieved a sentiment analysis accuracy of 91.3%. Hu et al. [24] integrated a multilayer attention mechanism with BiGRU and a multi-granularity convolutional neural network, achieving a sentiment analysis accuracy of 92.75% on a hotel review dataset. Khan et al. [25] applied a combination of machine learning models and hybrid deep learning models to scrutinize the sentiment of Urdu-language reviews; the experiments demonstrated that models utilizing BERT pre-trained word embeddings were exceptionally effective. Wu et al. [26] engineered a sentiment classifier based on word embedding and lexical polarity analysis, implementing a two-level long short-term memory network and effectively addressing the model's reliance on high-quality training sets characterized by high label accuracy.
The method founded on emotion dictionaries only considers the semantics of individual words, neglecting contextual semantic information. Additionally, it demands substantial time investment for constructing an emotion lexicon, imposing inherent limitations. Machine learning methods hinge on human annotation and struggle to discern deeper semantics. Although deep learning models excel in sentiment classification, there remains a need for refinement in efficiently eliminating special symbols and accurately comprehending contextual semantic information, particularly in the face of complex microblog comments, regardless of whether a singular or hybrid model is employed. In this study, a TextCNN-BiLSTM hybrid model is employed to comprehensively extract both local and contextual features, thereby enhancing the efficiency of sentiment classification for microblog comments.
2. Theoretical Basis
2.1 Word2vec Vectorization
Initially, sentiment analysis necessitates the vectorization of the target comment texts. Common vectorization methods include one-hot encoding, Word2vec, GloVe, and BERT. Word2vec constructs word embeddings through skip-gram or continuous bag-of-words (CBOW) training. CBOW predicts the central word from the several consecutive words preceding and following it, whereas skip-gram predicts the context from the central word. This article employs the CBOW model for word embedding.
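As a minimal sketch of this step, CBOW embeddings can be trained with gensim's Word2Vec implementation; the toy corpus, vector dimension, and window size below are illustrative assumptions rather than the settings used in the later experiments.

```python
# Minimal sketch: training CBOW word embeddings with gensim's Word2Vec.
# The toy corpus, vector_size, and window are illustrative assumptions,
# not the configuration used in this paper's experiments.
from gensim.models import Word2Vec

# Each comment is assumed to be pre-segmented into a list of tokens.
segmented_comments = [
    ["今天", "天气", "真好", "开心"],
    ["电影", "剧情", "太", "差", "失望"],
]

model = Word2Vec(
    sentences=segmented_comments,
    vector_size=100,  # embedding dimension
    window=5,         # context words on each side of the central word
    min_count=1,
    sg=0,             # sg=0 selects CBOW; sg=1 would select skip-gram
)

print(model.wv["开心"].shape)  # (100,) embedding for a single token
```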
2.2 TextCNN Network Structure
CNN, a feedforward neural network with a convolution structure, comprises convolution and pooling layers, proficient at extracting local features for text classification [27]. TextCNN represents a variant of CNN that can simultaneously employ filters of varying sizes to extract features of different dimensions from the text, thereby obtaining representative features. The TextCNN model consists of convolution, pooling, and classification layers. The schematic representation of the TextCNN structure is illustrated in Fig. 1.
At the heart of TextCNN lies the convolutional layer, responsible for acquiring diverse text features through multiple convolutional kernels. The calculation formula is as follows:

[TeX:] $$h_i=f\left(\sum_{x, y} w_{i(x, y)} \cdot c_{(x, y)}+b_i\right)$$

where f denotes the activation function, which may be Tanh, ReLU, sigmoid, or others; [TeX:] $$w_{i(x, y)}$$ signifies the weight of the filter input node (x, y) corresponding to the i-th node in the output matrix; [TeX:] $$c_{(x, y)}$$ denotes the value of node (x, y) in the filter; and [TeX:] $$b_i$$ represents the bias corresponding to the i-th node. Local feature extraction is achieved by employing three filters with convolution kernel sizes of 1, 3, and 5, yielding the convolution-layer output [TeX:] $$h_i.$$
The pooling layer serves to reduce the dimensionality of features post-convolution and guards against overfitting. Given that BiLSTM must be integrated after TextCNN, retaining the spatial information of the text becomes imperative. Since pooling would lead to the loss of this information, it is omitted in this study.
The classification layer amalgamates the features extracted from the pooling layer into a composite vector. Subsequently, this vector undergoes classification using the softmax classifier to accomplish emotion classification, with dropout implemented as a preventive measure against overfitting.
Fig. 1. Text convolutional neural network.
2.3 BiLSTM Network Structure
The LSTM network, a type of recurrent neural network [28], captures the dependencies between words and sentences, which is crucial given the diversity and complexity of Chinese texts. Microblog comments express users' sentiments and perspectives in varying tenses; hence, to gain a comprehensive contextual understanding, employing a bidirectional LSTM becomes imperative.
BiLSTM comprises two LSTM networks operating in opposite directions that jointly determine the output of the entire network. It remedies LSTM's inability to encode information bidirectionally. The internal structure of BiLSTM is depicted in Fig. 2.
[TeX:] $$f_t$$ represents the forgetting gate, which selects the information from the previous state that should be forgotten. The formula for the forgetting gate is as follows:

[TeX:] $$f_t=\sigma\left(W_f \cdot\left[h_{t-1}, x_t\right]+b_f\right)$$

where σ denotes the activation function, [TeX:] $$W_f$$ signifies the weight of the forgetting gate, and [TeX:] $$b_f$$ represents the offset of the forgetting gate.
[TeX:] $$i_t$$ represents the input gate. The gate's input includes the previous moment's hidden layer [TeX:] $$h_{t-1}$$ and the current input word [TeX:] $$x_t.$$ At this stage, a candidate cell state [TeX:] $$\tilde{c}_t$$ is also produced. The calculation formulas for [TeX:] $$i_t \text{ and } \tilde{c}_t$$ are as follows:

[TeX:] $$i_t=\sigma\left(W_i \cdot\left[h_{t-1}, x_t\right]+b_i\right)$$

[TeX:] $$\tilde{c}_t=\tanh \left(W_c \cdot\left[h_{t-1}, x_t\right]+b_c\right)$$

where [TeX:] $$W_i$$ represents the weight of the input gate, [TeX:] $$b_i$$ denotes the offset of the input gate, [TeX:] $$W_c$$ signifies the weight of the cell information, and [TeX:] $$b_c$$ represents the offset of the cell information.
Fig. 2. The internal structure of the bidirectional long short-term memory model.
Following the application of the forgetting gate and the input gate, the cell state from the last moment, [TeX:] $$c_{t-1},$$ undergoes an update to yield the new cell state, [TeX:] $$c_t.$$ This can be calculated using the following formula:

[TeX:] $$c_t=f_t \odot c_{t-1}+i_t \odot \tilde{c}_t$$
[TeX:] $$o_t$$ stands for the output gate, governing the output information of the network structure. The final output is represented by [TeX:] $$h_t.$$ The formulas for [TeX:] $$o_t \text{ and } h_t$$ are as follows:

[TeX:] $$o_t=\sigma\left(W_o \cdot\left[h_{t-1}, x_t\right]+b_o\right)$$

[TeX:] $$h_t=o_t \odot \tanh \left(c_t\right)$$

where [TeX:] $$W_o$$ denotes the weight of the output gate, and [TeX:] $$b_o$$ signifies the offset of the output gate.
3. Network Structure Design
3.1 Research Process
The research process in this article unfolds as follows: first, the comment data is input. The Jieba word segmentation tool is employed to process the input comments, and word vectors are formed through Word2vec. Subsequently, the word vectors are fed into the TextCNN model for local feature extraction, while BiLSTM is employed for context feature extraction. Finally, sentiment polarity classification is accomplished, categorizing each comment into one of two polarity classes. The sentiment analysis research process is illustrated in Fig. 3.
3.2 TextCNN-BiLSTM Network Structure
Firstly, the preprocessed microblog comments are converted into word vectors using the CBOW model in Word2vec, and these vectors serve as the input for the hybrid model. Next, TextCNN without a pooling layer is utilized to extract local features, while BiLSTM is employed to capture global features reflecting contextual information. Finally, a softmax layer performs the sentiment classification of microblog comments. The structure of the hybrid model is depicted in Fig. 4.
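A minimal PyTorch sketch of this hybrid structure is given below. The layer sizes, dropout rate, padding choices, and the way the pretrained Word2vec embeddings would be loaded are illustrative assumptions, not the exact configuration tuned in Section 4.

```python
# Minimal PyTorch sketch of the TextCNN-BiLSTM hybrid model described above.
# Layer sizes, dropout, and padding choices are illustrative assumptions.
import torch
import torch.nn as nn


class TextCNNBiLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, num_filters=64,
                 kernel_sizes=(1, 3, 5), hidden_dim=128, num_classes=2):
        super().__init__()
        # The embedding layer can be initialised from the Word2vec (CBOW) vectors.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One convolution per kernel size (1, 3, 5); padding keeps the sequence
        # length unchanged because pooling is omitted and the spatial
        # information is passed on to the BiLSTM.
        self.convs = nn.ModuleList([
            nn.Conv1d(embed_dim, num_filters, k, padding=k // 2)
            for k in kernel_sizes
        ])
        self.bilstm = nn.LSTM(num_filters * len(kernel_sizes), hidden_dim,
                              batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, x):                         # x: (batch, seq_len) token ids
        emb = self.embedding(x).transpose(1, 2)   # (batch, embed_dim, seq_len)
        # Local features from the three filter sizes, concatenated channel-wise;
        # Tanh is used here since it gave the best results in Section 4.4.
        feats = torch.cat([torch.tanh(conv(emb)) for conv in self.convs], dim=1)
        out, (h_n, _) = self.bilstm(feats.transpose(1, 2))
        # Concatenate the final forward and backward hidden states.
        h = self.dropout(torch.cat([h_n[-2], h_n[-1]], dim=1))
        return self.fc(h)   # logits; softmax is applied at classification time
```

At inference, the class with the larger softmax probability over the two logits gives the predicted polarity of the comment.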
4. Experimental Results and Analysis
4.1 Experimental Data
The dataset utilized in this article is derived from Weibo_senti_100K, which comprises over 100,000 microblog comments annotated with sentiment labels. Weibo_senti_100K encompasses 50,000 positive and 50,000 negative comments, labeled 1 and 0, respectively. For the experiment, 10,000 samples were selected from each of the positive and negative sentiment sets, yielding a total of 20,000 microblog comments for the model experiment. The proportion of training, testing, and validation data was set at 8:1:1. Sample comments are illustrated in Figs. 5 and 6.
Fig. 5. Examples of positive comments.
Fig. 6. Examples of negative comments.
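A brief sketch of this sampling and 8:1:1 split is shown below; the CSV file name and the "label"/"review" column names are assumptions about how the dataset is stored rather than verified details.

```python
# Sketch: sample 10,000 comments per polarity from Weibo_senti_100K and split
# them 8:1:1 into training, testing, and validation sets. The file name and
# column names ("label", "review") are assumptions about the CSV layout.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("weibo_senti_100k.csv")
sampled = pd.concat([
    df[df["label"] == 1].sample(10000, random_state=42),
    df[df["label"] == 0].sample(10000, random_state=42),
])

# First hold out 20% of the data, then split that portion equally into
# testing and validation sets, stratifying on the polarity label.
train_df, rest_df = train_test_split(sampled, test_size=0.2,
                                     random_state=42, stratify=sampled["label"])
test_df, val_df = train_test_split(rest_df, test_size=0.5,
                                   random_state=42, stratify=rest_df["label"])
print(len(train_df), len(test_df), len(val_df))  # 16000 2000 2000
```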
4.2 Data Processing
1) Text segmentation: Chinese text lacks explicit word boundaries, so it must be segmented before word vectors can be trained with the pre-training model. Common Chinese word segmentation tools include Jieba, SnowNLP, NLPIR, and THULAC. This article employs Jieba as the word segmentation tool.
2) Using a stop-word list: Chinese texts often contain high-frequency words that carry little meaning, such as conjunctions, prepositions, and modal words, which can impact the efficiency and accuracy of classification models; these words should therefore be removed during data processing. Microblog data also often contains interfering symbols and URLs. After regular-expression processing, the stop-word list is applied to the data. Because of the specific nature of microblog comments, this study combines the Harbin Institute of Technology (HIT) and Baidu stop-word lists and supplements them with additional specialized stop words to form an enhanced mixed stop-word list, thereby improving data processing effectiveness. The process is as follows: first, the texts are segmented using the combined Baidu and HIT stop-word list; then, based on the word segmentation results, high-frequency symbols and English words are added to the combined list, resulting in a new, more comprehensive mixed stop-word list (a code sketch of this cleaning and filtering step follows this list). The added stop words are detailed in Fig. 7.
Fig. 7. Examples of added stop words.
In the experimental data (Section 4.1), 30 positive and negative data samples were processed using the Jieba segmentation tool along with the newly integrated mixed stop-word list. The results of word segmentation for these examples are illustrated in Figs. 8 and 9.
Fig. 8. The word segmentation results of the positive comments in Fig. 5.
Fig. 9. The word segmentation results of the negative comments in Fig. 6.
3) Vectorization processing: After data preprocessing, the processed data undergoes vectorization, transforming the result of word segmentation into the model’s input vector. In this paper, the CBOW model in Word2vec is adopted for word vectorization. Partial results after word vectorization are depicted in Fig. 10.
Fig. 10. The result of vectorization by the continuous bag-of-words model.
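The cleaning, segmentation, and stop-word filtering described in items 1) and 2) above can be sketched as follows; the stop-word file names are illustrative placeholders, and the regular expressions are one plausible way of stripping URLs and interfering symbols rather than the exact rules used in this study.

```python
# Sketch of the preprocessing pipeline: regular-expression cleaning, Jieba
# segmentation, and filtering with the merged stop-word list. The stop-word
# file names are placeholders.
import re
import jieba

def load_stopwords(paths):
    """Merge several stop-word files (one word per line) into a single set."""
    words = set()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            words.update(line.strip() for line in f if line.strip())
    return words

stopwords = load_stopwords(["baidu_stopwords.txt", "hit_stopwords.txt",
                            "extra_weibo_stopwords.txt"])

def preprocess(comment):
    # Remove URLs and @mentions, then keep only Chinese characters,
    # letters, and digits.
    comment = re.sub(r"https?://\S+|@\S+", "", comment)
    comment = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]", " ", comment)
    # Segment with Jieba and drop stop words and empty tokens.
    return [w for w in jieba.lcut(comment) if w.strip() and w not in stopwords]

print(preprocess("今天天气真好，开心！http://t.cn/xxxx"))
```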
4.3 Experimental Environment and Evaluation Indicators
4.3.1 Experimental environment
The environmental parameters are shown in Table 1.
Table 1. Experimental environment and parameters
4.3.2 Evaluation indicators
The standard evaluation indicators for sentiment classification include the accuracy rate (Acc), precision rate (P), recall rate (R), and comprehensive evaluation value (F1). The calculation formulas are as follows:

[TeX:] $$A c c=\frac{T}{N}, \quad P=\frac{T P}{T P+F P}, \quad R=\frac{T P}{T P+F N}, \quad F 1=\frac{2 \times P \times R}{P+R}$$

where T represents the number of results correctly predicted by the classification model, N denotes the total number of samples, TP signifies the number of positive-class data correctly predicted as positive, FP stands for the number of negative-class data incorrectly predicted as positive, and FN represents the number of positive-class data incorrectly predicted as negative. This paper adopts the average values of the positive- and negative-class indicators, namely AP, AR, and AF1, as the evaluation criteria.
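As a brief illustration of how these averaged indicators can be computed, the sketch below uses scikit-learn's macro averaging; the label arrays are placeholder values, not results from this paper's experiments.

```python
# Sketch: computing Acc and the macro-averaged AP, AR, and AF1 with
# scikit-learn. The label arrays below are placeholder values.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth polarity (1 positive, 0 negative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

acc = accuracy_score(y_true, y_pred)
# average="macro" averages the per-class precision, recall, and F1,
# i.e., the mean of the positive-class and negative-class indicators.
ap, ar, af1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
print(f"Acc={acc:.4f}, AP={ap:.4f}, AR={ar:.4f}, AF1={af1:.4f}")
```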
4.4 Analysis of the Experimental Results
The TextCNN-BiLSTM hybrid sentiment classification model is built on the PyTorch deep learning framework. Extensive parameter tuning tests were conducted to achieve optimal effectiveness (Table 2).
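A rough sketch of this training setup is shown below; the optimizer, learning rate, batch size, epoch count, and the randomly generated tensors standing in for the real data are placeholders rather than the tuned values reported in Table 2.

```python
# Rough sketch of the PyTorch training loop for the hybrid model. The
# optimizer, learning rate, batch size, epochs, and random placeholder data
# are illustrative, not the tuned settings of Table 2.
import torch
from torch.utils.data import DataLoader, TensorDataset

model = TextCNNBiLSTM(vocab_size=50000)       # class from the Section 3.2 sketch
criterion = torch.nn.CrossEntropyLoss()       # applies softmax internally
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Placeholder tensors: padded token-id sequences and 0/1 polarity labels.
train_ids = torch.randint(0, 50000, (16000, 64))
train_labels = torch.randint(0, 2, (16000,))
loader = DataLoader(TensorDataset(train_ids, train_labels),
                    batch_size=64, shuffle=True)

for epoch in range(5):
    for batch_ids, batch_labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_ids), batch_labels)
        loss.backward()
        optimizer.step()
```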
1) Comparison experiment for stop-word list
As a critical step in sentiment analysis, data processing directly impacts the accuracy and efficiency of deep learning model training. In Jieba word segmentation, the stop-word list is applied to eliminate high-frequency interfering vocabulary, ensuring data reliability.
Based on the same parameters of the TextCNN-BiLSTM model, experiments were conducted using different stop-word lists: Baidu stop-word list (stopwords1), HIT stop-word list (stopwords2), Baidu and HIT mixed stop-word list (stopwords3), and the new mixed stop-word list (stopwords4). The experimental results are shown in Fig. 11.
Fig. 11. Experimental results of the TextCNN-BiLSTM model with different stop-word lists.
From Fig. 11, it is evident that the single Baidu and HIT stop-word lists are inadequate for data processing: the Baidu stop-word list covers only a limited number of specific words, while the HIT stop-word list lacks English stop words. Data processing based on the combined Baidu and HIT mixed stop-word list has a significantly improved effect. The new mixed stop-word list, built upon the combined Baidu and HIT list, demonstrates even better data processing effectiveness, particularly for the special emoticons and interfering English words in microblog comments. The accuracy rate with the new mixed stop-word list is 4.13%, 2.56%, and 0.28% higher than that with the Baidu stop-word list, the HIT stop-word list, and the Baidu and HIT mixed stop-word list, respectively.
2) Comparison experiment for activation function
Common activation functions include Tanh, ReLU, and sigmoid, with ReLU being the most frequently used in such models. In this paper, while keeping the other parameters constant, the activation functions of the TextCNN convolution layers and the softmax layer of the hybrid model were set to Tanh, ReLU, and sigmoid in turn. The experimental results are shown in Fig. 12.
From Fig. 12, it is evident that the model achieves the highest precision when Tanh is used as the activation function, outperforming ReLU by 0.3% and sigmoid by 0.41%. Given the numerous hidden layers in BiLSTM, using sigmoid can lead to vanishing gradients and relatively poor classification results. While ReLU helps mitigate the vanishing gradient problem, it suffers from some neurons remaining inactive, preventing parameter updates. The Tanh function proves effective for the large number of hidden layers in the hybrid model and is more suitable for the binary classification task, resulting in overall superior performance.
Fig. 12. Experimental results of the TextCNN-BiLSTM model with different activation functions.
3) Comparison experiment of the classification model
To validate the effectiveness of the proposed improved hybrid model, we conducted tests using the same dataset. Word vectors were again trained with Word2vec and used as inputs for each model. TextCNN, LSTM, BiLSTM, BiLSTM-ATT, and TextCNN-BiLSTM-ATT models were employed for comparative testing. The experimental results of each model are presented in Table 3.
Upon examination of Table 3, it becomes evident that TextCNN, a variant of CNN, excels at extracting local features. However, due to its limited ability to capture long-distance dependencies between words, its classification effectiveness is relatively modest. LSTM and BiLSTM models demonstrate similar classification effects. Yet, the BiLSTM model, with its focus on context, exhibits a slightly enhanced classification effect compared to the LSTM model which solely emphasizes the preceding context. The BiLSTM-ATT model, leveraging bidirectional LSTM for context feature extraction followed by the application of an attention mechanism to weigh the extracted crucial features, achieves an outstanding classification effect. The TextCNN-BiLSTM-ATT model combines a convolutional neural network and bidirectional LSTM to extract features, followed by a weighted feature processing step. However, given that microblog comments are inherently brief, the accuracy of the extracted features diminishes after weighting. The hybrid TextCNN-BiLSTM model in this paper, by integrating TextCNN for local feature extraction and BiLSTM for context feature extraction, improves accuracy, recall rate, and F1 value by 1.21%, 1.25%, and 1.25%, respectively, compared with the standalone TextCNN model. Furthermore, it demonstrates an improvement of 0.78%, 0.9%, and 0.9% respectively when compared with the BiLSTM model. The comparative results for these models are illustrated in Fig. 13.
Fig. 13. Experimental results of different sentiment analysis models.
Fig. 14. Examples of e-commerce reviews.
Fig. 15. Experimental results of different models on e-commerce reviews.
4) Verification experiment with different datasets
The aforementioned experimental results indicate that the TextCNN-BiLSTM classification model excels in sentiment analysis of microblog comments. To corroborate the validity of this model, we selected another dataset for verification under the same experimental environment and parameters as in experiment (3).
The online_shopping_10_cats dataset encompasses e-commerce reviews for 10 categories of products, ranging from books and mobile phones to computers and hotels. Each category consists of positive and negative comments, with positive comments labeled as 1 and negative comments as 0. This dataset offers broad coverage and is highly representative. We selected specific reviews from online_shopping_10_cats to form an experimental dataset comprising 10,000 positive and 10,000 negative reviews. A portion of the experimental dataset is displayed in Fig. 14.
Comparative experiment results demonstrate that the improved hybrid model continues to outperform other models in sentiment analysis of e-commerce reviews. The experimental results for these classification models are depicted in Fig. 15.
5. Conclusion
This paper introduces an enhanced hybrid deep learning model for classifying the sentiment of microblog user comments. The methodology uses Jieba word segmentation with a mixed stop-word list for data preprocessing, followed by the Word2vec model for word vectorization. Subsequently, the TextCNN model is employed for local feature extraction, while BiLSTM fully captures context features, yielding effective sentiment classification. Experimental results demonstrate that the improved hybrid neural network model surpasses single models in sentiment analysis, achieving a precision of 94.75%. By employing the mixed stop-word list and the Tanh activation function, the model demonstrates superiority over the unimproved version; specifically, the new stop-word list and the Tanh activation function improve the accuracy rate by 4.1% and 0.4%, respectively. However, this study focuses predominantly on classifying sentiment in Chinese texts, whereas microblog comments on major social platforms often include English content. Therefore, further research into sentiment classification of mixed Chinese and English texts is warranted.