The medical websites were ranked according to the time that users tend to spend on them, and the 100 websites with the longest and the 100 websites with the shortest time were selected in an attempt to identify the factors that can affect this variable.
According to Table 28, the factors that differ most between the medical websites with the longest and the shortest time spent on them by the users are: the status of the website, the nature/structure of the organization that supports or provides the website, the interactive services offered, the certification, the source of income, the presence in social networks, the various estimates based on the website's traffic, and the percentage of the website’s visits that are referred from search engines.
Table : Metrics of the websites with the longest and shortest time spent on them by the users
Characteristic | Medical websites with the highest percentage of visits referred from search engines (Percentage/Number) | Medical websites with the lowest percentage of visits referred from search engines (Percentage/Number)
Interactive service offering | 56% | 52%
Online websites | 84% | 79%
Specific health problem | 16% | 23%
General Health | 25% | 50%
Drug info | 6% | 2%
Medical resources | 27% | 30%
Medical products | 6% | 8%
Medical Education | 15% | 7%
Kids health | 3% | 4%
Senior Health | 2% | 1%
Profit organizations | 26% | 50%
Non-profit private organizations | 57% | 44%
Non-profit governmental | 17% | 6%
Ask a doctor | 5% | 11%
Find a doc/local care | 22% | 12%
Drug symptom checker | 9% | 9%
Other | 26% | 23%
Social network | 52% | 38%
Certification | 15% | 24%
Value for patients | 69% | 70%
Blog | 18% | 15%
Average global rank | 266,136 | 37,350,140
Average % of visits referred from search engines | 24% | 16.3%
Average time on website | 4.4 min | 1 min
Average lifetime | 11.1 years | 9.5 years
Average unique visits | 964,192 | 163,321
Average linked websites | 780 | 180
Governmental funding | 22% | 6%
Products/services/shop | 26% | 40%
Donations/sponsors/support from non-profit org./grants | 17% | 21%
Membership/registration | 5% | 5%
University funding/research/project funding | 21% | 6%
Partners/mother profit org. | 2% | 7%
Advertisements | 5% | 8%
Average Google rank | 6.5 | 4.5
Average Facebook likes | 587 | 200
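The grouping step behind the table above can be illustrated with a minimal Python sketch. It assumes each website is a record with a numeric ranking variable and boolean characteristics. The field names and the five sample records are hypothetical and are not the study's data.

```python
def split_extremes(sites, key, n):
    """Return the n sites with the largest and the n with the smallest `key` value."""
    ranked = sorted(sites, key=lambda s: s[key], reverse=True)
    return ranked[:n], ranked[-n:]


def share(group, flag):
    """Percentage of sites in `group` for which the boolean `flag` is true."""
    return 100.0 * sum(1 for s in group if s[flag]) / len(group)


# Hypothetical records; the field names are assumptions, not the study's variables.
sites = [
    {"name": "a", "time_on_site_min": 6.0, "social_network": True},
    {"name": "b", "time_on_site_min": 4.5, "social_network": True},
    {"name": "c", "time_on_site_min": 2.0, "social_network": False},
    {"name": "d", "time_on_site_min": 0.8, "social_network": True},
    {"name": "e", "time_on_site_min": 0.5, "social_network": False},
]

longest, shortest = split_extremes(sites, "time_on_site_min", 2)
print(share(longest, "social_network"), share(shortest, "social_network"))
```

With the study's sample, `n` would be 100 and `share` would be evaluated once per characteristic in each group.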
To examine which factors in our dataset can affect the time that users tend to spend on the medical websites, a linear regression model was fitted. The hypothesis tested is that the factors that presented the greatest difference between the websites with the longest and the shortest time spent on them do affect the dependent variable, which is the “time the users spend on website” variable. The independent variables/predictors are the status of the medical website, the interactive service offering, the presence in social networks, the global rank, the kind of interactive services, the category to which the medical website belongs, the presence of any kind of certification, the number of unique visits and of linked websites, the nature/structure of the organization that supports the medical website, and the source of income of the website.
In line with the aforementioned assumptions of the linear regression model, the first step in preparing the data for the regression was to check the normality criterion. In the sample, the “time the users spend on the website” variable had to be transformed by taking the decimal logarithm (log10) of its values so that it approximately follows the normal distribution. Before the transformation the variable presents high positive Skewness and Kurtosis, as shown in Figure 27, which presents the histogram and the descriptive statistics of the variable’s values. In a normal distribution Skewness should be 0 and Kurtosis 3.
Figure : Histograms of the dependent variable before and after the transformation in order to follow a normal distribution
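The effect of a decimal-logarithm transformation on a right-skewed variable can be illustrated with a minimal Python sketch. The values are synthetic log-normal draws used as a stand-in for "time on website" and are not the study's data. The moment-based skewness formula is one common definition and may differ slightly from the one used by the statistical package.

```python
import math
import random


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
# Synthetic right-skewed durations in minutes (assumption, not the study's data).
times = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
log_times = [math.log10(t) for t in times]

skew_before = skewness(times)
skew_after = skewness(log_times)
print(round(skew_before, 2), round(skew_after, 2))
```

The skewness of the transformed values should be much closer to 0 than that of the raw values, which is the behaviour the two histograms are meant to show.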
Table : Descriptive Statistics of the “Log10 time on website” variable
N | 307
Missing | 9
Mean | .4300
Median | .4771
Std. Deviation | .48
Variance | .20882
Skewness | .044
Std. Error of Skewness | -.347
Kurtosis | .139
Std. Error of Kurtosis | 0.785
As mentioned earlier, P-P plots and Q-Q plots are also useful in assessing whether the data follow a specific distribution.
Figure : P-P and Q-Q plots of the “time spent on website” variable
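The points drawn in such plots can be computed as in the following minimal Python sketch. It assumes a normal distribution fitted with the sample mean and standard deviation and the plotting positions (i - 0.5) / n. Statistical packages may use slightly different plotting positions, and the sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted observation with the matching normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    return [(fitted.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]


def pp_points(values):
    """Pair each empirical cumulative probability with the fitted normal CDF."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    return [((i - 0.5) / n, fitted.cdf(x))
            for i, x in enumerate(ordered, start=1)]


# Hypothetical log10 "time on website" values (not the study's data).
sample = [0.10, 0.25, 0.30, 0.42, 0.48, 0.55, 0.61, 0.70, 0.95]
for theoretical, observed in qq_points(sample):
    print(round(theoretical, 3), observed)
```

When the data are close to normal, both sets of points lie near the diagonal.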
Although the normality tests indicate non-normality, the sample is large, with more than 200 observations (N=316), so it is considered better to look at the shape of the distribution and at the values of the Skewness and Kurtosis rather than at their significance (Field, 2009; MVP Programs-Normality Testing Guidelines, accessed 11/1/2012). The z-scores of the Skewness and the Kurtosis are nevertheless useful to calculate, since they allow these metrics to be compared across different samples. Each z-score is obtained by subtracting the mean of the normal distribution, which is 0, from the statistic and dividing the result by the standard error of the Skewness or Kurtosis:
$$Z_{skewness} = \frac{S - 0}{SE_{skewness}}, \qquad Z_{kurtosis} = \frac{K - 0}{SE_{kurtosis}}$$
Zskewness = 0.486 and Zkurtosis = 2.83. Both values are below the threshold of 3.29, above which a value is significant at p<0.001, so we can assume normality (Field, 2009).
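The calculation described above can be sketched in Python as follows. The standard-error formulas are the usual large-sample expressions for sample skewness and excess kurtosis. That the statistical package used exactly these expressions is an assumption. The function names are illustrative.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


def z_score(statistic, standard_error):
    """z = (statistic - 0) / SE, where 0 is the value expected under normality."""
    return (statistic - 0.0) / standard_error


def looks_normal(z_skew, z_kurt, threshold=3.29):
    """True when both |z| values stay below the p < 0.001 threshold."""
    return abs(z_skew) < threshold and abs(z_kurt) < threshold


print(round(se_skewness(307), 3), round(se_kurtosis(307), 3))
print(looks_normal(0.486, 2.83))  # z-scores reported in the text
```

The threshold of 3.29 is applied to the absolute value of each z-score, so a strongly negative Skewness would be flagged in the same way as a strongly positive one.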
Table : Tests of Normality for the “Log10 time on website” variable
Variable | Kolmogorov-Smirnov Statistic | Kolmogorov-Smirnov df | Kolmogorov-Smirnov Sig. | Shapiro-Wilk Statistic | Shapiro-Wilk df | Shapiro-Wilk Sig.
LOG10timeonite | .148 | 307 | .000 | .949 | 307 | .000
The next step of the analysis was to assess the linear relationship between the dependent variable and the numeric independent variables by estimating Pearson’s R and Spearman’s rho correlation coefficients, as was done for the “Global Rank” dependent variable. As summarized in Table 31, the independent variables present a statistically significant positive linear relationship with the dependent variable, except for the “Global Rank” and the “Percentage of visits referred from search engines” predictors, which present a negative linear relationship with the dependent variable. This means that as the percentage of visits to the website that are referred from search engines increases, the time that users spend on the website tends to decrease. A possible explanation is that users become familiar with the website as they visit it more frequently through a search engine and know where to find the information they need.
Table : Pearson’s R and Spearman estimation for the linear relationship between dependent-independent variables
Variable | Statistic | LOG10 “Time spent on the website” | Google Rank
LOG10 “Time spent on the website” | Spearman’s rho Correlation Coefficient | 1 | -.578**
LOG10 “Time spent on the website” | Sig. (2-tailed) | | .000
LOG10 “Time spent on the website” | N | 307 | 279
Google Rank | Spearman’s rho Correlation Coefficient | -.578** | 1
Google Rank | Sig. (2-tailed) | .000 |
Google Rank | N | 279 | 307

Variable | Statistic | LOG10 “Time spent on the website” | Unique Visits
LOG10 “Time spent on the website” | Spearman’s rho Correlation Coefficient | 1.000 | .346**
LOG10 “Time spent on the website” | Sig. (2-tailed) | . | .000
LOG10 “Time spent on the website” | N | 279 | 261
Unique Visits | Spearman’s rho Correlation Coefficient | .346** | 1.000
Unique Visits | Sig. (2-tailed) | .000 | .
Unique Visits | N | 261 | 279

Variable | Statistic | LOG10 “Time spent on the website” | Linked Websites
LOG10 “Time spent on the website” | Spearman’s rho Correlation Coefficient | 1.000 | .229**
LOG10 “Time spent on the website” | Sig. (2-tailed) | . | .000
LOG10 “Time spent on the website” | N | 279 | 279
Linked Websites | Spearman’s rho Correlation Coefficient | .229** | 1.000
Linked Websites | Sig. (2-tailed) | .000 | .
Linked Websites | N | 279 | 316

Variable | Statistic | LOG10 “Time spent on the website” | Fraction of visits from search engines
LOG10 “Time spent on the website” | Pearson’s R Correlation Coefficient | 1 | -.189**
LOG10 “Time spent on the website” | Sig. (2-tailed) | | .003
LOG10 “Time spent on the website” | N | 279 | 253
Fraction of visits from search engines | Spearman’s rho Correlation Coefficient | -.189** | 1
Fraction of visits from search engines | Sig. (2-tailed) | .003 |
Fraction of visits from search engines | N | 253 | 258
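The two coefficients reported in the table above can be computed as in the following minimal Python sketch. Spearman's rho is obtained here as Pearson's R applied to average ranks, which is the standard definition when ties are present. The two short series are hypothetical and only illustrate a negative, monotonic but non-linear relationship.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's R computed on the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


search_share = [5, 10, 20, 40, 60]         # hypothetical % of visits from search engines
log_time = [0.90, 0.70, 0.65, 0.30, 0.28]  # hypothetical log10 minutes on site
print(round(pearson_r(search_share, log_time), 3),
      round(spearman_rho(search_share, log_time), 3))
```

Because Spearman's rho depends only on the ordering of the values, it reaches -1 for any strictly decreasing relationship, while Pearson's R reaches -1 only when the relationship is exactly linear.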
After fitting the linear regression model, the R-squared value showed that the independent variables account for 20.9% of the variation in the dependent variable, so there may be other variables that also explain the variation. The ANOVA test evaluates the predictive ability of the model. Since the value of the F-test is significant (p<0.01), the model provides a significantly better prediction than would be obtained by using the mean value of the dependent variable (Table 32). In addition, the value of the Durbin-Watson test for serial correlation (2.143) is well above 1 and close to 2, which indicates independence of errors, in line with the assumptions of linear regression.
Table : Linear regression model’s summary for the Log10 “Time spent on website” variable
Linear Regression Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson
1 | .457 | .209 | .131 | .19526 | 2.143

ANOVA
Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 2.662 | 26 | .102 | 2.686 | .000
Residual | 10.103 | 265 | .038 | |
Total | 12.765 | 291 | | |
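The summary figures in the table above are linked by simple identities, which the following minimal Python sketch reproduces from the reported sums of squares and degrees of freedom. The Durbin-Watson function is included for completeness and is applied only to a toy residual series in the test, because the study's residuals are not available here. The function names are illustrative.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Derive R-squared, adjusted R-squared and F from the ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual
    r2 = ss_regression / ss_total
    adj_r2 = 1.0 - (1.0 - r2) * df_total / df_residual
    f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r2, adj_r2, f_stat


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Sums of squares and degrees of freedom as reported in the ANOVA table.
r2, adj_r2, f_stat = anova_summary(2.662, 10.103, 26, 265)
print(round(r2, 3), round(adj_r2, 3), round(f_stat, 3))
```

A Durbin-Watson value near 2 indicates uncorrelated successive residuals, values towards 0 indicate positive serial correlation and values towards 4 indicate negative serial correlation.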
Moreover, the coefficients output of the linear regression model (Table 33) showed that the percentage of visits to the website referred from search engines, the “specific healthcare issues” website category, governmental funding as a source of income, the unique visits and the Google rank are significant predictors of the dependent variable. The sign of the coefficient shows that the impact is negative for the “percentage of visits to the website referred from search engines” predictor: as the fraction of visits referred from search engines increases, the time that users spend on the website tends to decrease. The coefficient of the “specific healthcare issues” category is also negative, while for governmental funding, unique visits and Google rank the positive sign implies a positive relationship with the dependent variable.
Table : Coefficients output
Model | Unstandardized Coefficients B | Unstandardized Coefficients Std. Error | Standardized Coefficients Beta | t | Sig.
(Constant) | .388 | .128 | | 3.027 | .003
Service | .017 | .030 | .041 | .590 | .555
Website status | .033 | .038 | .054 | .872 | .384
Ask a Doctor | -.063 | .046 | -.085 | -1.353 | .177
Find a doctor or local care | .062 | .035 | .118 | 1.767 | .078
Drug or symptom checker | -.003 | .047 | -.004 | -.061 | .951
Social Networks | .020 | .026 | .049 | .800 | .424
Certification | .013 | .030 | .026 | .429 | .669
Fraction of visits from Search engines | -.004 | .001 | -.210 | -3.552 | .000
Profit | -.053 | .034 | -.122 | -1.565 | .119
Non-profit governmental | -.107 | .065 | -.160 | -1.656 | .099
Specific health issues | -.158 | .059 | -.323 | -2.687 | .008
General Health | -.102 | .060 | -.215 | -1.696 | .091
Medical resources | -.081 | .062 | -.172 | -1.317 | .189
Governmental Funding | .158 | .062 | .251 | 2.526 | .012
Products services shop | .097 | .104 | .211 | .938 | .349
Donations sponsors Grants | .052 | .103 | .122 | .507 | .613
Membership | .096 | .109 | .128 | .881 | .379
University and research funding | .178 | .109 | .273 | 1.635 | .103
Partners/ mother profit org. | .133 | .122 | .110 | 1.087 | .278
Advertisements | .070 | .110 | .087 | .639 | .523
Drug info | -.107 | .079 | -.117 | -1.358 | .176
Medical products | .003 | .086 | .002 | .030 | .976
Medical education | -.126 | .073 | -.175 | -1.734 | .084
Unique visits | 1.395E-8 | .000 | .122 | 1.980 | .049
Google Rank | .018 | .009 | .136 | 1.997 | .047
Linked Websites | 6.529E-6 | .000 | .045 | .720 | .472
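Two quantities help in reading the coefficients table above, and the following minimal Python sketch illustrates both. The t value is the coefficient divided by its standard error. Because the dependent variable is the decimal logarithm of the time spent on the website, a coefficient B corresponds to multiplying the untransformed time by 10^B for each one-unit increase in the predictor. The inputs are the rounded values printed in the table, so the results match the reported t values only approximately, and the unit of each predictor is an assumption.

```python
def t_statistic(b, std_error):
    """t value of a coefficient: the estimate divided by its standard error."""
    return b / std_error


def multiplicative_effect(b, delta=1.0):
    """Factor applied to the untransformed outcome when a predictor rises by
    `delta`, for a model fitted on the log10 of the outcome."""
    return 10.0 ** (b * delta)


# Rounded B and Std. Error taken from the coefficients table.
t_gov = t_statistic(0.158, 0.062)                      # Governmental Funding
factor_search = multiplicative_effect(-0.004)          # one-unit increase
factor_search_10 = multiplicative_effect(-0.004, 10)   # ten-unit increase
print(round(t_gov, 2), round(factor_search, 4), round(factor_search_10, 4))
```

If the fraction of visits from search engines is measured in percentage points, a ten-point increase would be associated with the time on the website falling to roughly 91% of its previous value, other predictors being equal.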
Furthermore, to examine the regression residuals for traces of heteroscedasticity and to check the normality of the residual errors, a scatterplot of the regression standardised residuals against the regression standardised predicted values was produced. As shown in Figure 29, and more specifically in the histogram and the P-P plot, the residuals deviate only very slightly from the normal distribution. Moreover, the scatterplot of the standardised residuals against the standardised predicted values does not present any specific pattern, which strengthens the assumption that there is no heteroscedasticity.
Figure : Examining the regression residuals for traces of heteroscedasticity and the normality criterion of the residual errors
Finally, to validate the linear regression model and test its generalizability with a cross-validation analysis, the sample was split into two subsamples in a 50% - 50% ratio, as proposed by Bagley et al. (2001), Godfrey (1985), Marill (2004), Schneider et al. (2010) and Field (2009). The validation results showed that the model can predict the relationship between the dependent and the independent variables, since both subsamples had the same significance and the same significant independent variables, but it cannot predict the strength of this relationship, since the R-squared of one subsample differed by more than 5% from the R-squared of the other (Table 34).
Table : Cross validation of the “LOG10 Time spent on website” linear model
Model Summary
Model | R, split = .00 (Selected) | R, split ~= .00 (Unselected) | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .513 | .212 | .263 | .098 | .18467

Model | R, split = .00 (Selected) | R, split ~= .00 (Unselected) | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .596 | .158 | .355 | .218 | .19759

ANOVA
Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 1.414 | 26 | .054 | 1.595 | .049
Residual | 3.956 | 116 | .034 | |
Total | 5.370 | 142 | | |

Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 2.624 | 26 | .101 | 2.585 | .000
Residual | 4.763 | 122 | .039 | |
Total | 7.387 | 148 | | |
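The split-sample check described above can be sketched in Python as follows. The random split is a generic 50% - 50% split with a fixed seed and is not the split actually used in the study. The row count of 292 and the function names are illustrative assumptions. The R-squared values are recomputed from the sums of squares reported in the table.

```python
import random


def split_half(rows, seed=0):
    """Shuffle a copy of the rows and cut it into two halves (50% - 50%)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


def r_squared(ss_regression, ss_total):
    """Share of the total variation explained by the regression."""
    return ss_regression / ss_total


def strength_is_stable(r2_a, r2_b, tolerance=0.05):
    """True when the two subsample R-squared values differ by at most `tolerance`."""
    return abs(r2_a - r2_b) <= tolerance


first, second = split_half(range(292))  # 292 placeholder row indices (assumption)

# Sums of squares reported for the two subsamples in the cross-validation table.
r2_first = r_squared(1.414, 5.370)
r2_second = r_squared(2.624, 7.387)
print(len(first), len(second), round(r2_first, 3), round(r2_second, 3),
      strength_is_stable(r2_first, r2_second))
```

The two R-squared values differ by about nine percentage points, which is why the strength of the relationship is not considered reproducible across the subsamples.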
18.5 Factors that Affect the Offering of Interactive Web Applications/Services
To investigate the factors that affect whether medical websites offer interactive web applications/services, a logistic regression model was applied with the “Interactive applications/services offering” dummy variable as the dependent variable. The independent variables/predictors were basic metrics of the medical websites’ performance, such as:
- The global rank of the medical websites
- The lifetime of the medical websites
- The fraction of the visits to the medical website that were referred from search engines
- The time that the users tend to spend on the medical website
- The nature of the organization that supports/provides the medical website
- The web presence of the medical website in social networks
- The web presence of any certification
The hypothesis to be tested was that the offering of interactive medical web-based applications/services increases as the global rank of the website moves to better positions, and/or the lifetime of the medical website increases, and/or the time that the users spend on the website increases, and/or the medical website is active in social networks, and/or the website is supported by non-profit organizations.
In the Logistic Regression Model Summary (Table 35), the values of two pseudo-R-squared measures, the Cox and Snell R-squared and the Nagelkerke R-squared, were taken into consideration. The Cox and Snell R-squared reflects the improvement of the full model over the intercept-only model and is based on the ratio of their likelihoods (the smaller the ratio, the greater the improvement). Since the Cox and Snell pseudo-R-squared has a maximum value that is less than one, the Nagelkerke R-squared rescales it so that the range of possible values extends to 1.
Table : Logistic Regression Model Summary
Logistic Regression Model Summary
Model | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square
1 | 288.363 | .193 | .260
Moreover, the coefficients output (Table 36) shows that the variables that have a significant impact on the dependent variable are:
- The fraction of visits referred from search engines
- The status (online/offline) of the medical website
- The web presence of a certification from an accredited organization
- The category of the medical website
To conclude, the probability that a medical website offers interactive medical web-based applications/services increases as the fraction of visits referred from search engines increases, and when the website is online, belongs to the “drug information” category and presents a certification from an accredited organization.
Table : Coefficients Output
Model | B | S.E. | Wald | df | Sig.
Global Rank of the medical websites | .000 | .000 | .747 | 1 | .387
Fraction of visits referred from search engines | .033 | .015 | 4.520 | 1 | .034
Time spent on the medical website | .060 | .104 | .333 | 1 | .564
Lifetime | .051 | .039 | 1.718 | 1 | .190
Status (Online/ Offline) of the medical website | 1.380 | .508 | 7.398 | 1 | .007
Certification | .474 | .362 | 1.710 | 1 | .050
Social network’s web presence | .150 | .301 | .248 | 1 | .619
Specific healthcare issues category | 1.558 | 1.313 | 1.406 | 1 | .236
General Health category | 2.173 | 1.320 | 2.708 | 1 | .100
Drug information category | 2.947 | 1.534 | 3.690 | 1 | .049
Medical Resources category | .860 | 1.312 | .430 | 1 | .512
Medical product category | -.546 | 1.711 | .102 | 1 | .749
Medical education category | 1.126 | 1.359 | .686 | 1 | .407
Children health | 1.743 | 1.479 | 1.388 | 1 | .239
Non-for-Profit organization | -.073 | .496 | .022 | 1 | .883
Profit organization | .044 | .508 | .007 | 1 | .931
Constant | -4.051 | 1.629 | 6.182 | 1 | .013
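The columns of the coefficients table above are related as in the following minimal Python sketch. The Wald statistic is the squared ratio of B to its standard error, its significance is the upper tail of a chi-square distribution with 1 degree of freedom, and exp(B) gives the odds ratio. The inputs are the rounded values printed in the table, so the results match the reported figures only approximately. The function names are illustrative.

```python
import math


def wald_statistic(b, std_error):
    """Wald chi-square with 1 degree of freedom: (B / S.E.) squared."""
    return (b / std_error) ** 2


def wald_p_value(wald):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))


def odds_ratio(b):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(b)


# Rounded B and S.E. of the "Status (Online/ Offline)" row of the table.
wald = wald_statistic(1.380, 0.508)
p_value = wald_p_value(wald)
print(round(wald, 2), round(p_value, 3), round(odds_ratio(1.380), 2))
```

Under these figures, the odds of offering interactive services are roughly four times higher for online websites than for offline ones, other predictors being equal.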