hist(data$Ratings_Service)
hist(data$Ratings_Cleanliness)
hist(data$Ratings_Value)
hist(data$Ratings_Sleep.Quality)
hist(data$Ratings_Rooms)
hist(data$Ratings_Location)

library(Hmisc)
data$Ratings_Service <- impute(data$Ratings_Service, median)
data$Ratings_Cleanliness <- impute(data$Ratings_Cleanliness, median)
data$Ratings_Value <- impute(data$Ratings_Value, median)
data$Ratings_Sleep.Quality <- impute(data$Ratings_Sleep.Quality, median)
data$Ratings_Rooms <- impute(data$Ratings_Rooms, median)
data$Ratings_Location <- impute(data$Ratings_Location, median)

datavar_imp <- data[, 1:7]
summary(datavar_imp)
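The median imputation above can also be written without any package, which makes the logic explicit. The following is a minimal base-R sketch, not the code used in this report: the data frame `ratings`, its values, and the helper `impute_median` are hypothetical and only mirror the column names used here.

```r
# Hypothetical ratings with gaps; the column names mirror the report's data frame.
ratings <- data.frame(
  Ratings_Service     = c(5, NA, 3, 4, NA),
  Ratings_Cleanliness = c(4, 4, NA, 2, 5)
)

# Hypothetical helper: replace each NA with the median of the observed values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Apply the helper to every column while keeping the data frame structure.
ratings[] <- lapply(ratings, impute_median)
print(ratings)
```

Looping over the columns with lapply avoids repeating one imputation line per rating variable.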
After checking the boxplots of the data, we found no outliers. The cleaned data was then ready and was used for all subsequent analysis. We saved the file locally for further use with ‘write.csv’.

boxplot(datavar_imp)
write.csv(data, "C://Users/hp/Desktop/Desk/XLRI/Term4/Business Intelligence and Data Mining(BIDB18-4)/trip_imp.csv")
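A visual check of a boxplot can be backed up numerically. The sketch below assumes the usual boxplot rule (points more than 1.5 times the interquartile range beyond the hinges) and uses base R's boxplot.stats. The two score vectors and the helper `outliers_in` are hypothetical and are not taken from the report's data.

```r
# Hypothetical rating vectors; the second one contains a deliberately low score.
clean_scores  <- c(3, 4, 4, 5, 3, 4, 5, 4)
skewed_scores <- c(4, 4, 5, 4, 4, 5, 4, 1)

# Hypothetical helper: boxplot.stats() lists the points beyond the whiskers in $out.
outliers_in <- function(x) boxplot.stats(x)$out

print(outliers_in(clean_scores))   # empty: nothing lies beyond the whiskers
print(outliers_in(skewed_scores))  # the single low score is flagged
```

Applied to each column of the imputed data frame, an empty result for every column would agree with the visual finding of no outliers.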
Sentiment Analysis

To understand the sentiments of the reviews on Tripadvisor, we develop a sentiment analysis model. To do this, we first create a separate text file from the reviews. Customer reviews contain many stop words, special characters, and social media shorthand (cuz, b, etc.), none of which help us understand the sentiment of a sentence. We use the tm package to prepare the data.

We first create a corpus from the text file. Next, we change everything to lowercase so that case sensitivity does not affect our analysis. Then we remove stopwords, which are words such as is, and, the, etc. These words tell us nothing about the sentiments of the speaker and are therefore not useful to us. Next, we remove punctuation and numbers. On checking, we see that some special characters such as $, €, and # remain, so we use the gsub function to remove them as well. Finally, we remove all extra whitespace.

Then we create a wordcloud of the top 50 words. The reviews contain too many words, and a word cloud built from all of them did not provide any useful insight. We therefore limited the number of words to 50, which should be enough to give a good idea of the main topics.
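The cleaning steps described above can be sketched in base R as follows. This is an illustrative stand-in under stated assumptions, not the tm code used in the report: the sample `reviews`, the very short `stop_words` list, and all variable names are hypothetical, and a real stopword list would be much longer.

```r
# Hypothetical reviews standing in for the TripAdvisor text file.
reviews <- c(
  "The ROOM was great, and the resort staff were friendly!!",
  "Room cost \u20ac200 (#overpriced) but the hotel is clean.",
  "Noisy room; the stay was 2 nights too long..."
)

# A tiny illustrative stopword list (an assumption, far shorter than a real one).
stop_words <- c("the", "was", "and", "is", "but", "were", "too", "a")
stop_pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")

clean <- tolower(reviews)                             # lowercase everything
clean <- gsub(stop_pattern, " ", clean, perl = TRUE)  # drop stopwords
clean <- gsub("[[:punct:]]", " ", clean)              # drop punctuation
clean <- gsub("[0-9]+", " ", clean)                   # drop numbers
clean <- gsub("[^a-z ]", " ", clean)                  # drop any leftover symbols
clean <- trimws(gsub("\\s+", " ", clean))             # collapse extra whitespace

# Word frequencies, keeping only the most frequent terms (50 in the report).
words <- unlist(strsplit(clean, " ", fixed = TRUE))
freq <- sort(table(words), decreasing = TRUE)
top_words <- head(freq, 50)
print(top_words)
```

A frequency table like `top_words` is the kind of input a word cloud is drawn from; capping it at 50 entries keeps the plot readable.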
As expected, most of the topics centre on the word room, followed by resort, hotel, stay, etc. Since Tripadvisor is a review platform for holidays and hotel stays, this result is in line with expectations.

To understand the sentiments of the sentences, we next carry out sentiment analysis. The tidyverse package is used for this purpose. We use the “bing” list, developed by Bing Liu and collaborators. The “bing” list is a sentiment lexicon that classifies words as carrying either positive or negative sentiment.

To get the sentiments of the reviews, the document is first tokenized, i.e. broken into individual words. An inner join is then used to attach a sentiment to each entry in the column of words created by tokenizing the text in doc, our corpus after the cleaning operations.
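The tokenize-then-join logic can be illustrated on toy data with base R alone. The sketch below is not the code that produced the results in this report: the three cleaned reviews and the five-word `lexicon` are hypothetical, and the toy lexicon only imitates a two-class list such as “bing”.

```r
# Cleaned, hypothetical reviews (one string per review).
doc <- c("room great resort staff friendly",
         "room overpriced hotel clean",
         "noisy room stay long")

# Toy lexicon with one sentiment label per word.
# These five entries are illustrative assumptions, not the real "bing" list.
lexicon <- data.frame(
  word      = c("great", "friendly", "clean", "overpriced", "noisy"),
  sentiment = c("positive", "positive", "positive", "negative", "negative")
)

# Tokenize: one row per word.
tokens <- data.frame(word = unlist(strsplit(doc, " ", fixed = TRUE)))

# Inner join: merge() keeps only the words that appear in both tables.
matched <- merge(tokens, lexicon, by = "word")

# Count words per sentiment and take the difference as the net sentiment.
counts <- table(factor(matched$sentiment, levels = c("negative", "positive")))
net <- counts[["positive"]] - counts[["negative"]]
print(counts)
print(net)
```

Words that are absent from the lexicon, such as room or hotel, drop out at the join, so only sentiment-bearing words are counted.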
We get the following result:

Negative    Positive    Sentiment
54031       168516      114485

The result shows that there were 54031 negative words and 168516 positive words in the list, i.e. there were 114485 more positive words than negative words (168516 - 54031 = 114485). This difference is the net sentiment of the reviews, and it tells us that the reviews are mostly positive.

CONCLUSION

Sentiment analysis can help service businesses such as hotels and restaurants understand the general feedback of their target group about their service. This helps them continuously monitor and improve their services and stay ahead of their competition. Sentiment analysis is also useful because there is a constant stream of reviews across platforms such as review sites and social media comments, and a single bad review may go viral and damage the image of the brand. Sentiment analysis helps the firm separate positive from negative feedback so that it can act promptly on the negative feedback, which has the highest potential to harm its image.

It can also be used to gather intelligence about the competition, since these reviews are often posted on public platforms and are easily accessible to all parties. The company can then identify the strengths it has over its competition, which can be used as its USP.

In our study, we took the reviews of one particular hotel on the TripAdvisor platform over the 2017-18 period. We found that the reviews for this hotel are largely positive. However, the negative reviews are substantial as well, so the hotel should drill down into these reviews to understand what is affecting customer satisfaction and take steps to improve customer delight.