After removing the inappropriate columns and the almost empty columns from the data, we split the data into two parts. One part contained the review variables, such as cleanliness, service and overall rating, and the other contained variables such as content. We then checked the summary statistics of the basic variables.
On inspection, we found that the data contained a few missing values.

Data Imputation

To fill in the missing data, we used imputation. First, we examined the distribution of the data using histograms and found it slightly skewed toward the higher ratings. In such a case, the median is a better choice for imputation than the mean, because it is less affected by skew and extreme values. We therefore imputed with the median and, as shown in the post-imputation summary, all the missing values have been filled.

# Drop columns that are inappropriate or almost empty
data$Ratings_Business.service <- NULL
data$Ratings_Checkin <- NULL
str(data)
# Keep the first seven columns and inspect them
datavar <- data[, 1:7]
str(datavar)
summary(datavar)
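
The median imputation described in this section could be sketched in R roughly as follows. This is a minimal illustration under stated assumptions, not the code used in this project: the helper impute_median and the toy data frame are hypothetical, and the column names only stand in for the review variables.

```r
# Hypothetical helper: replace NA entries in every numeric column
# with that column's median, computed from the non-missing values.
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      med <- median(df[[col]], na.rm = TRUE)
      df[[col]][is.na(df[[col]])] <- med
    }
  }
  df
}

# Toy data standing in for the review variables (not the real data set).
toy <- data.frame(
  cleanliness = c(5, 4, NA, 5, 2, 5),
  service     = c(4, NA, 5, 5, 1, 4)
)

colSums(is.na(toy))            # missing values per column before imputation
imputed <- impute_median(toy)
colSums(is.na(imputed))        # missing values per column after imputation
summary(imputed)               # post-imputation summary
```

Comparing the summary before and after imputation, as done in this section, is a simple way to confirm that no NA values remain and that the medians of the columns are unchanged.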