ENHANCEMENT OF TEXT-BASED EMOTION RECOGNITION PERFORMANCE USING WORD CLUSTERS

Human-Computer Interaction (HCI) studies the use of computer technology, focusing on the interfaces between human users and computers. Recognizing emotion in text is challenging because emotion is expressed in plain text as well as in short messaging language. This paper surveys emotion recognition from text and presents an emotion detection methodology based on a Machine Learning Approach (MLA). It addresses the problem of feature sparseness and improves emotion recognition performance on short texts through three contributions: (i) representing short texts with word cluster features, (ii) presenting a novel word clustering algorithm, and (iii) applying a new feature weighting scheme for emotion classification. Experiments were performed to classify emotions with different features and weighting schemes on a publicly available dataset. When word clusters were used in place of unigrams as features, the micro-averages of accuracy improved by more than three percentage points, indicating that the overall accuracy of the text emotion classifier improved. All macro-averages improved by more than one percentage point, indicating that word cluster features can improve the generalization ability of the emotion classifier. The experimental results suggest that word cluster features and the proposed weighting scheme can moderately resolve the problem of feature sparseness and improve emotion recognition performance.


Introduction
One of the important capabilities of intelligent machines is 'affective capability', which enables them to understand and express emotions; it has become an emerging research area in the artificial intelligence community, as elaborated by Russell and Norvig in [01]. Recently, Cambria et al. presented the idea of 'affective computing' and the computational methods that relate to emotion [02]. Emotion recognition is one of the most important constituents of affective computing. Emotions can be expressed through various media, including speech characteristics, facial expressions, and physiological signals, as stated by Bravo-Marquez et al. in [03]. However, text messages are still the most popular communication medium at present. Text messages have many applications, and recognizing emotions from text effectively is an important task. An intelligent conversational agent on Twitter that can recognize emotions from a user's messages can give more adaptive and human-like responses. In text-to-speech synthesis, if a system can distinguish emotions in the text, it can produce speech that sounds more natural. Recognizing emotions from text messages is therefore important for implementing engaging human-computer interaction.
The 'machine learning based' and the 'knowledge based' approaches are the two main approaches in use today. The second approach typically begins by constructing an affective lexicon and then combines syntactic and semantic rules to recognize emotions in text. Such methods perform well in particular domains, but their performance depends heavily on the quality of the syntactic rules and the affective lexicon, and building high-quality affective lexicons and syntactic rules is expensive and complex. The first approach (the machine learning approach) treats emotion detection as a classification problem. It extracts features from emotion-labeled texts and represents each text message as a feature vector. The features may comprise thousands of words that occur in the labeled texts. Machine learning algorithms such as Naive Bayes, Support Vector Machine (SVM), and Maximum Entropy are then applied to train an emotion classifier. These techniques require building an emotion-labeled corpus beforehand, which is comparatively easier than manually constructing an affective lexicon and syntactic rules. Once the emotion classifier has been trained, these approaches are comparatively more efficient when recognizing emotions in a new text, as stated in [04] by Xu and Peng and in [05] by Gaudette and Japkowicz.
Many efficient classification algorithms are available today, and the machine learning based approach appears simple and efficient. However, research has shown that these algorithms do not perform as well for emotion recognition as they do for topic-based text classification. Aman and Szpakowicz in [06] and [07] observed that an important reason is that the texts processed in emotion recognition are typically very short, often only a few words, leading to a large, sparse feature space. This paper focuses on the feature sparseness problem to improve emotion recognition performance, particularly for short text messages. To address this problem, we propose using word clusters instead of the individual words that are normally used as features. Word clusters have been used as features in text classification; however, to the best of our knowledge from the literature survey, they have not been applied to emotion recognition. We propose a detailed word clustering algorithm as well as a new weighting scheme that can measure feature values more accurately. More significantly, the experiments suggest that this approach improves the effectiveness of emotion recognition on short texts.

Literature Review
The literature related to detecting emotions and sentiment in text is reviewed in detail in this paper. According to their research perspectives, the studies can be grouped into two categories. The first is the commercial perspective, focused on sentiment detection in user reviews (such as product reviews), as stated in [08] by Pang and Lee, and movie reviews, as explained in [09] by Feldman. These works aim to identify the polarity of sentiment in text, such as positive, neutral, and negative feelings; researchers call this sentiment analysis, and it is also termed opinion mining. A growing body of literature focuses on detecting multiple emotions in open social web texts, such as messages on social network sites, discussion forums, and blogs, from the perspective of emotional psychology, as elaborated in [10] by Neviarouskaya et al. Neviarouskaya et al. [10] proposed a rule-based affect analysis model to recognize emotions expressed in text messages. Approaches based on affective knowledge perform better in some specialized domains because they depend heavily on a previously created affective lexicon and rules. The generalization capability of these approaches is limited, because it is difficult and expensive to create an emotion lexicon and associated rules that are appropriate for all domains. Attempts to identify a variety of emotions in text, usually focusing on the six basic emotions of anger, fear, disgust, joy, sadness, and surprise, were pointed out by Haggag et al. in [11]; this is called fine-grained sentiment detection or emotion recognition. In this paper, we focus on identifying emotions in short text, targeting multiple emotion recognition. The emotion recognition approach presented by Subasic and Huettner in [12] is based on affective knowledge, i.e., on syntactic rules, semantic rules, and emotional keywords.
They built a fuzzy affective lexicon and proposed a combination of natural language processing and fuzzy logic techniques for analyzing affect content in free text. Elliott in [13] applied two hundred unambiguous emotional words with a small set of modifiers, cue phrases, and heuristic rules to distinguish emotions expressed in text. Boucouvalas [14] applied a tagged dictionary to extract the emotional words in a sentence and a parser to recognize the associated objects, after which he employed syntactic rules to recognize emotions in real-time dialogues.
Liu et al. [15] proposed an approach to learning a small society of linguistic affect models from a large-scale real-world generic knowledge base using Open Mind Common Sense. In [16], Alm modified and improved the feature sets and developed experimental results for fine-grained emotion classification. She constructed a corpus of blog posts, applied the salient emotion words present in the sentences as features, used Naive Bayes and Support Vector Machine (SVM) classifiers to separate the sentences into emotional and non-emotional categories, and employed unigrams and emotional words as features to categorize the sentences into the six basic emotions. To recognize emotions in news headlines, Katz et al. in [17] applied a supervised learning method based on a unigram model. Strapparava and Mihalcea in [18] proposed several semantic methods applying Latent Semantic Analysis (LSA). Machine learning based approaches are comparatively more efficient and generalizable, because they avoid the difficult task of creating emotion lexicons and related rules. However, text categorization methods suffer significant performance degradation when applied to emotion recognition, because of the intrinsic nature of the texts: statistical machine learning algorithms require longer inputs for practical accuracy. In the classification phase, an unlabeled sentence is represented as a feature vector and is classified into a basic emotion by the classifier.
Ryoko Tokuhisa et al. in [19] proposed a classification method based on massive text examples extracted from the web. The quantity and accuracy of the emotion-provoking text examples were collected to form the Emotion Protocol (EP) corpus. The process involves two steps: first, sentiment polarity classification, where an example whose classification value is close to the decision boundary is considered neutral; second, examples with positive or negative classification values are fed to fine-grained emotion classification, where a k-nearest-neighbor approach is applied and the k most similar examples from the Emotion Protocol corpus are retrieved. Matteo Baldoni et al. in [20] discussed the translation of tags from sentences to emotions. This method applies the idea that if a tag belongs to the ontology of emotions, it is classified as emotional; otherwise, SentiWordNet is applied to divide the words into objective and subjective tags, where only the subjective tags carry emotions. The subjective tags deliver the emotional notion to the user. Emotions are ranked in the process and associated with the user input.
An automated classification method for self-induced emotions has been developed in which wavelet transforms are used for signal processing, feature extraction is performed, and an ad-hoc Support Vector Machine (SVM) classifier yields a good success rate. Mohammad Soleymani et al. in [21] proposed multimodal sentiment analysis. Due to the enormous growth of social media, there is a need to analyze vocal expressions in addition to textual content; multimodal sentiment analysis addresses this by perceiving sentiment through affective traces such as vocal displays. This approach leverages emotion recognition and context inference to determine the polarity of an individual's sentiment. Chang-Tai Hsieh et al., in their paper [22], give an overview of emotion detection from text and explain current methods, which are categorized as keyword based, learning based, and hybrid approaches. To overcome the limitations of current detection methods, they suggest HCI techniques and keyword extraction with semantic analysis.

Emotion Recognition Based on Word Clustering
The framework for emotion detection based on word clustering is shown in Figure 1.
The framework consists of three main phases: (I) the text word clustering phase, (II) the training phase, and (III) the classification phase. In the first phase, all the effective words in the emotion corpus are clustered by their semantic similarity. In the second phase, each text is represented as a feature vector, and the SVM learning algorithm is applied to train an emotion classifier. In the last phase, an unlabeled text is likewise represented as a feature vector and is classified into a basic emotion by the classifier.

Text Representation based on Text Word Clusters
A text must be represented in computational form before it can be processed and analyzed. The commonly used vector space model is employed in text mining: certain features are selected to construct a high-dimensional vector space, and each text then becomes a vector in this space. Usually, the features are the words that occur in the corpus, and the feature values are calculated by their TFIDF (term frequency-inverse document frequency) values. These methods cannot be applied directly when the features are word clusters. We propose a new weighting scheme for word clusters, based on the discrimination degree of clusters and the representation degree of words. The representation degree of a word is the extent to which the word characterizes a word cluster: the closer a word is to the center of a cluster, the stronger its representation degree, and the smaller the variance of the semantic distances between that word and the other words in the cluster. Following this notion, assume that word cluster c is represented as c = (w1, w2, ..., wi, ..., wn), where wi is one of the words belonging to cluster c. The representation degree of word w for cluster c can then be quantified accordingly. In the following subsections, we illustrate the key steps of the framework.
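The vector space construction above can be sketched in a few lines of Python; the tiny corpus, pre-tokenized input, and raw-count TF with logarithmic IDF are illustrative assumptions rather than the paper's exact preprocessing:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Represent each tokenized text as a TF-IDF vector over the corpus vocabulary."""
    n_docs = len(docs)
    # Document frequency: number of texts containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency in this text
        vectors.append([tf[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors

# Three invented two-word "texts" standing in for short messages.
vocab, vecs = tfidf_vectors([["sad", "news"], ["happy", "news"], ["happy", "day"]])
```

Each short text yields a mostly-zero vector over the whole vocabulary, which illustrates the feature sparseness problem the paper targets.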

Semantic Similarity Measure
A semantic similarity measure is important in the word clustering process. Researchers have proposed various approaches to determining conceptual similarity in a taxonomy; these can be categorized as 'information content' approaches and 'conceptual space' approaches. Our semantic similarity measure is based on the combined approach proposed by Jiang in [23], which is derived from the conceptual distance approach by including information content as an assessment factor. WordNet is used here as the taxonomy to find the shortest path that links two concepts. When calculating the weight of the link between two adjacent concepts, only the link strength is considered; the link strength is defined as the difference in information content between a primitive concept and its parent concept. Assume there are two words w1 and w2, and let Dist(w1, w2) represent the semantic distance between w1 and w2. The semantic distance can be computed as follows:

Dist(w1, w2) = min [ IC(c1) + IC(c2) - 2 x IC(LSuper(c1, c2)) ]   Equation 1

where c1 and c2 denote possible senses of words w1 and w2, LSuper(c1, c2) is the lowest super-ordinate of c1 and c2, and IC(ci) is the information content of concept ci.
Following information theory, the information content of a concept c can be calculated as follows:

IC(c) = -log ( Σ_{w ∈ words(c)} freq(w) / N )   Equation 2

where words(c) is the set of words representing concept c, freq(w) denotes the frequency of word w in the corpus, and N denotes the total frequency of the corpus. The frequencies of concepts are estimated using the frequencies from a universal semantic concordance, as explained in detail by Miller et al. [24].
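Equations 1 and 2 can be checked on a toy taxonomy; the concept names and word frequencies below are invented for illustration, not taken from any real semantic concordance:

```python
import math

def information_content(concept_words, freq, n_total):
    """IC(c) = -log( sum of corpus frequencies of the words expressing c / N )."""
    return -math.log(sum(freq.get(w, 0) for w in concept_words) / n_total)

def jiang_distance(ic_c1, ic_c2, ic_lsuper):
    """Semantic distance of Equation 1 for a fixed pair of senses:
    IC(c1) + IC(c2) - 2 * IC(LSuper(c1, c2))."""
    return ic_c1 + ic_c2 - 2 * ic_lsuper

# Toy taxonomy: "emotion" subsumes "joy" and "delight"; frequencies are invented.
freq = {"joy": 30, "delight": 10, "emotion": 60}
N = 100
ic_joy = information_content(["joy"], freq, N)                            # -log(0.3)
ic_delight = information_content(["delight"], freq, N)                    # -log(0.1)
ic_emotion = information_content(["joy", "delight", "emotion"], freq, N)  # -log(1.0) = 0
dist = jiang_distance(ic_joy, ic_delight, ic_emotion)
```

A more abstract (rarer) lowest super-ordinate has higher IC and so pulls the distance down, which matches the intuition that words sharing a specific common concept are semantically close.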

Word Clustering Algorithm
There are many word clustering algorithms, but some are more difficult than others to apply to emotion recognition. Our semantic similarity measure is based on the hierarchical arrangement in WordNet, in which nouns and verbs are organized hierarchically; consequently, the measure can only be applied to noun pairs and verb pairs. Because emotion words are significant, it is more effective to group them by their grammatical and emotion categories. We therefore categorize words into content words and emotion words and apply different methods to cluster them. For the emotion words, we apply the emotion word list of Strapparava & Mihalcea in [25] to group them into twenty-four clusters; the words in each cluster have the same part-of-speech and the same emotion category. For the content words, we propose a clustering algorithm that collects similar words into a cluster. The algorithm starts with each word in a separate group and successively merges clusters until a stopping criterion is satisfied. The proposed word clustering algorithm comprises two steps. First, it computes the semantic similarity of each pair of words; if their similarity exceeds a predefined threshold α, the words are treated as synonyms of each other, and each word together with its synonyms is initialized as a separate cluster. Second, it finds the two closest word clusters and checks whether their similarity exceeds a predefined threshold β; if it does, the two clusters are merged, otherwise the procedure terminates. The similarity measure between word clusters is quantified as follows:
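The two-step algorithm can be sketched as follows. Since the paper's cluster-similarity formula is not reproduced here, average linkage is an assumption, and the similarity table is a toy stand-in for the WordNet-based measure:

```python
def cluster_words(words, sim, alpha=0.55, beta=0.60):
    """Two-step agglomerative word clustering.
    Step 1: seed each word with its synonyms (pairwise similarity > alpha).
    Step 2: repeatedly merge the two closest clusters while their similarity
    exceeds beta (average linkage is assumed for cluster similarity)."""
    clusters, used = [], set()
    for w in words:                       # Step 1: seed clusters
        if w in used:
            continue
        seed = ({w} | {v for v in words if v != w and sim(w, v) > alpha}) - used
        used |= seed
        clusters.append(seed)

    def avg_sim(a, b):                    # assumed cluster similarity
        return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:              # Step 2: merge closest clusters
        i, j = max(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        if avg_sim(clusters[i], clusters[j]) <= beta:
            break
        clusters[i] |= clusters.pop(j)
    return clusters

# Toy similarity table standing in for the WordNet-based measure.
pair_sim = {frozenset(("glad", "happy")): 0.9, frozenset(("sad", "gloomy")): 0.8}
sim = lambda a, b: pair_sim.get(frozenset((a, b)), 0.1)
clusters = cluster_words(["glad", "happy", "sad", "gloomy"], sim)
```

With the default thresholds, "glad"/"happy" and "sad"/"gloomy" each form a cluster, and the low cross-cluster similarity stops any further merging.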
The discrimination degree of a word cluster is its ability to discriminate between different kinds of emotions: when the distribution of a cluster's words across emotions is imbalanced, the discrimination degree is comparatively strong. Following information theory, the discrimination degree of a word cluster is quantified as the reciprocal of its information entropy. Assume a word cluster c containing n words is represented as c = (w1, w2, ..., wi, ..., wn), and the m emotion categories are represented as e = (e1, e2, ..., ej, ..., em). The discrimination degree of the word cluster is quantified as:

d = 1 / ( - Σ_{i=1}^{n} Σ_{j=1}^{m} P(wi|ej) log P(wi|ej) )   Equation 5

where d is the discrimination degree of the word cluster c, and P(wi|ej) denotes the conditional probability of encountering word wi given texts belonging to emotion ej. Based on the representation degree of words and the discrimination degree of word clusters, the weight of word cluster wci in a text can be quantified as follows:

weight(wci) = di × Σ_{j=1}^{n} kj × rij   Equation 6

where di denotes the discrimination degree of word cluster wci; rij denotes the representation degree of word wj in cluster wci; kj is the number of occurrences of word wj in the text; and n is the number of words in cluster wci.
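One plausible reading of Equation 6, given the variable definitions above, is a discrimination-degree-scaled sum of per-word occurrence counts times representation degrees; the function and its toy inputs are hypothetical:

```python
def cluster_weight(d, counts, repr_degree):
    """Weight of a word cluster in a text (one reading of Equation 6):
    the cluster's discrimination degree d times the sum, over the cluster's
    words, of (occurrences in the text k_j) x (representation degree r_ij)."""
    return d * sum(counts.get(w, 0) * r for w, r in repr_degree.items())

# Toy cluster {"glad", "happy"} with discrimination degree 0.5;
# the text contains "happy" twice and "glad" not at all.
w = cluster_weight(0.5, {"happy": 2}, {"glad": 0.4, "happy": 0.6})
# 0.5 * (0 * 0.4 + 2 * 0.6) = 0.6
```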

Classifier based on SVM
SVM has been a very successful tool in text categorization applications, where it has been shown to outperform other statistical and machine learning approaches, as explained by Joachims in [26]. Furthermore, Dumais et al. in [27] reported that a linear SVM performs better than a nonlinear one. In this paper, the linear SVM implemented in the LIBLINEAR package, following the guidelines of Fan et al. in [30], is used to train a multi-class emotion classifier with the one-vs-all strategy. If there are K classes, this strategy creates K binary SVMs: the original training data are separated into two classes, the first being one of the K classes and the second containing all the others. Each binary SVM is trained on this dataset, and a new text sample is assigned to the class whose binary SVM produces the greatest decision value.
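The one-vs-all decision rule can be illustrated directly; the class names and decision values below are hypothetical, standing in for the outputs of K trained binary SVMs:

```python
def one_vs_all_predict(scores):
    """One-vs-all decision: assign the sample to the class whose binary SVM
    returns the greatest decision value."""
    return max(scores, key=scores.get)

# Hypothetical decision values from K = 3 binary classifiers for one headline.
pred = one_vs_all_predict({"joy": 0.8, "sadness": -0.2, "anger": 0.1})
```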

Experiments and Evaluation
Experiments were conducted on a publicly available dataset. We first studied the impact of varying the parameters of the word clustering algorithm, and then compared emotion recognition performance under different features and weighting schemes.

Dataset
We conducted two groups of experiments on a publicly available dataset. In the first group, we studied the impact of varying the parameters of the word clustering algorithm. In the second group, we compared emotion recognition performance under different features and weighting schemes. Our experiments were conducted on the News Headlines corpus, a publicly available corpus developed by Strapparava and Mihalcea in [25] for affective text. The task focused on the emotion classification of news headlines and included an emotion annotation subtask and a valence labeling subtask. For the emotion annotation subtask, each news headline was annotated to indicate the degrees of the six basic emotions proposed by Ekman in [28]. For the valence labeling subtask, the interval for valence annotation was set to [-100, 100]: a value of 0 indicates that the headline is emotionally neutral, 100 indicates that it is highly positive, and -100 indicates that it is highly negative. The News Headlines corpus was divided into two datasets: a development dataset containing 250 annotated headlines and a test dataset containing 1,000 annotated headlines. In our experiments, we combined the two datasets and assigned each headline to the emotion category with the highest degree. The distribution of emotion classes is given in Table 1.

Performance Evaluation Criteria
In this paper, we treat emotion recognition as a multi-class classification problem. Precision, recall, and the F1 measure are used to evaluate the performance of the emotion classifier for each emotion category, and the macro-average (Macro (Avg.)) and micro-average (Micro (Avg.)) are used to evaluate its performance over all classes.
The macro-average weighs all classes equally, regardless of how many texts belong to them. For the 1,864 text samples and the distribution of emotion classes given in Table 1, it is quantified as follows:

Macro (Avg.) = (1/N) Σ_{i=1}^{N} Xi   Equation 7

where Xi is the precision, recall, or F1 value for emotion category i, and N is the number of emotion classes.

Micro (Avg.) = Σ_{i=1}^{N} Ci / M   Equation 8

where Ci is the number of texts that are correctly classified into emotion category i by the classifier, N is the number of emotion classes, and M is the total number of texts considered.
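The two averages can be computed as follows; the per-class F1 values and correct counts are invented for illustration:

```python
def macro_average(values):
    """Macro-average (Equation 7): the unweighted mean of a per-class metric."""
    return sum(values) / len(values)

def micro_average(correct_per_class, total):
    """Micro-average (Equation 8, as read here): pooled correct
    classifications over the total number of texts."""
    return sum(correct_per_class) / total

# Invented per-class F1 values and correct counts for three emotion classes.
macro = macro_average([0.50, 0.30, 0.10])
micro = micro_average([40, 25, 5], 100)
```

Note how the macro-average treats the rare third class as heavily as the frequent first one, while the micro-average is dominated by the frequent classes.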

Results and Analysis
Following the text emotion recognition procedure, we conducted the two groups of experiments on the above dataset of 1,864 samples. All experiments were performed using five-fold cross-validation, with approximately 81% of the data used for training and 19% for testing in each fold. When the LIBLINEAR package was applied to train the emotion classifier, the parameters were fixed at s = 1, c = 1.6. We compared our results with those obtained by existing systems. The details are as follows:
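A five-fold split over the 1,864 samples can be sketched as follows; the deterministic, unshuffled split is an assumption, since the paper does not describe its fold construction:

```python
def five_fold_splits(n_samples, n_folds=5):
    """Yield (train, test) index lists for k-fold cross-validation.
    A deterministic, unshuffled split is used here for illustration;
    each test fold holds roughly 1/k of the data."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        # The last fold absorbs the remainder when n_samples % n_folds != 0.
        stop = start + fold_size if k < n_folds - 1 else n_samples
        yield indices[:start] + indices[stop:], indices[start:stop]

splits = list(five_fold_splits(1864))
```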

Parameter Determining Process
When word clusters are used as features, the quality of the word clustering is critical to the performance of emotion recognition. We categorize words into emotion words and content words. The emotion words were assembled into twenty-four clusters based on their part-of-speech and emotion categories, while the content words were grouped by the clustering algorithm introduced earlier. The two parameters, α and β, determine the results of the word clustering. To discover their optimal values for emotion classification, experiments were conducted with the Presence weighting scheme: a word cluster feature receives a weight of one if any of its words is present in the text, and zero otherwise. In the experiment, we first set β to a nominal value of 0.45 and varied α from 0.45 to 0.70 to find the most favorable α. Next, we set α to this best value and varied β from 0.45 to 0.70 to find the most advantageous β. The macro-averages and micro-averages of the F1 values are shown in Tables 2 and 3. According to Tables 2 and 3, when α = 0.55 and β = 0.60, emotion classification performance on the News Headlines corpus is best. In the next experiment, where word clusters are applied as features, we used these optimal values and compared different features and weighting schemes, since both are important for the performance of emotion classifiers. To test the validity of word clusters as features, we compared the outcomes of using emotion words and word clusters as features under the Presence scheme.
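The two-stage threshold search described above can be sketched as follows; the scorer is a hypothetical stand-in for training the classifier and measuring F1, constructed so that its optimum matches the reported one:

```python
def two_stage_search(evaluate, alphas, betas):
    """Two-stage threshold search: hold beta at its nominal (first) value and
    pick the alpha that maximizes the evaluation score (e.g. macro-F1),
    then hold that alpha and pick the best beta."""
    nominal_beta = betas[0]
    best_alpha = max(alphas, key=lambda a: evaluate(a, nominal_beta))
    best_beta = max(betas, key=lambda b: evaluate(best_alpha, b))
    return best_alpha, best_beta

# Hypothetical scorer peaking at (0.55, 0.60), the optimum reported above.
score = lambda a, b: -((a - 0.55) ** 2 + (b - 0.60) ** 2)
grid = [round(0.45 + 0.05 * i, 2) for i in range(6)]  # 0.45 .. 0.70
best = two_stage_search(score, grid, grid)
```

This coordinate-wise search is cheaper than a full grid sweep, at the cost of assuming the two thresholds interact only weakly.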
To validate the weighting scheme for word clusters, we compared our weighting scheme to the Presence scheme, following the guideline stated by Wang et al. in [29]. It has been suggested that the Presence scheme provides the best classification results in sentiment classification; therefore, it is not necessary to compare our weighting scheme with Absolute Frequency, Relative Frequency, or TFIDF. Table 4 reports results using emotion words as features under the Presence scheme. Our weighting scheme is based on the discrimination degree of the word cluster and the representation degree of the word. Table 4 shows that when unigrams were used instead of emotion words as features, both the micro-averages and macro-averages of precision, recall, and F1 were largely improved, which suggests that some non-emotional words are also important for emotion recognition. When word clusters were used instead of unigrams as features, the micro-averages of precision, recall, and F1 improved by more than 3%, which suggests that the overall accuracy of the emotion classifier improved. All macro-averages improved by more than 1%, which suggests that word cluster features can improve the generalization ability of the emotion classifier. By applying the proposed weighting scheme instead of the Presence scheme, the micro-averages and macro-averages of precision and the F1 values were further improved. Based on these improvements, we conclude that word cluster features and the proposed weighting scheme improve the overall performance of emotion recognition.
Table 5 below shows the overall results of three systems participating in the emotion annotation task, UA, SWAT, and UPAR, and five systems described by Strapparava and Mihalcea in [25]: LSA single word, WN-affect presence, LSA emotion, LSA all emotion words, and NB trained on blogs. The outcomes suggest that emotion recognition is a complicated task, because the texts analyzed are typically very short and some related emotions may be expressed in only one or two texts.

Conclusions
In this paper, we proposed a novel emotion recognition approach based on word clustering for short texts. Based on the semantic similarity of words, we proposed a word clustering algorithm. We used word clusters as features and additionally proposed a weighting scheme based on the discrimination degree of word clusters and the representation degree of words. In the experiments, we examined emotion recognition performance on a publicly available dataset under different features and weighting schemes. The results showed that using word clusters as features greatly reduces the dimension of the feature space, and that when short texts are represented by word cluster features with the proposed weighting scheme, the emotion classifier performs better for most specific emotions. Finally, applying word cluster features with the proposed weighting scheme improves the overall performance of emotion recognition. This study also suggests some interesting problems for further exploration. The experimental results show that word cluster features and the proposed weighting scheme are not very effective for the emotion 'disgust', which may suggest that different features are more suitable for different emotions. In addition, considering the intensity of emotive words may further improve emotion recognition performance.