Sentiment analysis of hotel reviews

Date	17.12.2020

BIDM Group A3 (1)

While checking the structure of the data, we identified a few columns that were inappropriate for our analysis or whose meaning was not clear, and these columns were removed. After removing them, the summary of the data showed that two of the review columns were almost empty.

# Load the review data
data <- read.csv("C://Users/hp/Desktop/Desk/XLRI/Term4/Business Intelligence and Data Mining(BIDB18-4)/trip.csv", header = TRUE)
str(data)
# Drop the columns that are not useful for the analysis
data$AuthorLocation <- NULL
data$Title <- NULL
data$Author <- NULL
data$ReviewID <- NULL
summary(data)
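One way to confirm which columns are almost empty, rather than reading it off the summary, is to compute the share of missing values per column. The following is a minimal sketch on a toy data frame, not the report's actual code: the data values, the column names `Ratings_Overall` and `Ratings_Service`, and the 0.5 cutoff are all assumptions made for illustration.

```r
# Toy stand-in for the review data; names and values are illustrative only.
reviews <- data.frame(
  Ratings_Overall = c(5, 4, NA, 3, 5, 4),
  Ratings_Service = c(4, NA, 5, 3, 4, 5),
  Ratings_Checkin = c(NA, NA, NA, NA, 2, NA)
)

# Share of missing values in each column.
na_share <- colMeans(is.na(reviews))

# Columns above the cutoff (0.5 here, an arbitrary assumption) count as almost empty.
almost_empty <- names(na_share)[na_share > 0.5]

# Drop those columns in one step, keeping the result a data frame.
reviews_clean <- reviews[, !(names(reviews) %in% almost_empty), drop = FALSE]

print(na_share)
print(names(reviews_clean))
```

Choosing the cutoff is a judgment call; a column with only a handful of observed values carries little information for the rating analysis, which is presumably why such columns were dropped here.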

Once we had removed the inappropriate columns as well as the almost empty columns, we split the data into two parts: one with the review rating variables such as cleanliness, service and overall rating, and the other with variables such as content. We then checked the summary of the first part (datavar) and found that it contained a few missing values.

# Drop the two almost empty review columns
data$Ratings_Business.service <- NULL
data$Ratings_Checkin <- NULL
str(data)
# Keep the first seven columns as the rating variables
datavar <- data[, 1:7]
str(datavar)
summary(datavar)

Data Imputation

To fill in the missing data, we used imputation. First, we checked the distribution of the data using histograms and found it slightly skewed towards the higher ratings. In such a case the median is a better choice for imputation than the mean, because the median is less affected by the tail of a skewed distribution. We therefore used the median, and as the post-imputation summary shows, all the missing values have been imputed.
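The imputation step itself is not listed in the report. A minimal sketch of median imputation in base R, assuming every column is numeric, might look like the following. The toy data, the column names and the helper `impute_median` are hypothetical and are not taken from the original analysis.

```r
# Toy ratings skewed towards high values; names and values are illustrative only.
ratings <- data.frame(
  Ratings_Overall = c(5, 5, 4, 5, NA, 1, 5, 4),
  Ratings_Service = c(4, NA, 5, 5, 5, 2, NA, 5)
)

# Hypothetical helper: replace each NA with the median of the observed values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Apply column by column, keeping the data frame structure.
ratings_imputed <- as.data.frame(lapply(ratings, impute_median))

# With a skew towards high ratings, the mean sits below the median.
print(sapply(ratings, mean, na.rm = TRUE))
print(sapply(ratings, median, na.rm = TRUE))

# The post-imputation summary should no longer report any NA values.
print(summary(ratings_imputed))
```

Because ratings take whole-number values, the median also tends to stay on or near the rating scale, whereas a mean-imputed value such as 4.14 would not correspond to any rating a reviewer could have given.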
