18.4 Other Factors that Affect the Excellence and the Survivability of the Medical Websites
Factors that Affect the Frequency of Web Presence of the Medical Websites in Search Engines
The frequency with which medical websites appear on the first pages of various search engines (Google.com, Yahoo.com, etc.) can prove to be a crucial factor for their success, since websites that appear there can attract more users. In order to identify the factors that have a significant impact on the web presence of the medical websites in search engines, the same methodological steps will be applied, using as criterion variable the “Fraction of visits to the medical website referred to from search engines”.
The medical websites were ranked according to the fraction of their visits referred from search engines, in the same way as for the aforementioned final taxonomies of the “highest ranked of the highest ranked” and “lowest ranked of the lowest ranked” medical websites. Comparing the 100 websites with the highest percentage of visits referred from search engines against the 100 with the lowest shows that almost the same factors that differed greatly between the final taxonomies differ greatly in this case too.
Table : Metrics of the websites with the highest and lowest percentage of visits referred to from search engines
Characteristic | Medical websites with the highest percentage of visits referred from search engines (Percentage/Number) | Medical websites with the lowest percentage of visits referred from search engines (Percentage/Number)
Interactive service offering | 74% | 53%
Online websites | 96% | 78%
Specific health problem | 30% | 18%
General Health | 31% | 27%
Drug info | 6% | 4%
Medical resources | 22% | 28%
Medical products | 2% | 13%
Medical Education | 4% | 8%
Kids health | 3% | 1%
Senior Health | 2% | 1%
Profit organizations | 33% | 48%
Non-profit private organizations | 51% | 45%
Non-profit governmental | 16% | 7%
Ask a doctor | 13% | 6%
Find a doc/local care | 23% | 17%
Drug symptom checker | 19% | 4%
Other | 29% | 27%
Social network | 54% | 42%
Certification | 26% | 22%
Value for patients | 72% | 66%
Blog | 21% | 15%
Average global rank | 440632 | 40668455
Average % of visits referred from search engines | 37.4% | 4.6%
Average time on website | 3 min | 2 min
Average lifetime | 11.4 years | 11.1 years
Average unique visits | 890911 | 326175
Average linked websites | 927 | 382
Governmental funding | 20% | 8%
Products/services/shop | 22% | 40%
Donations/sponsors/support from non-profit org./grants | 26% | 18%
Membership/registration | 9% | 7%
University funding/research/project funding | 2% | 10%
Partners/mother profit org. | 7% | 4%
Advertisements | 9% | 9%
Average Google rank | 6.3 | 5.4
Average Facebook likes | 892 | 218
In order to examine which factors in our dataset affect the web presence on search engines, measured as the fraction of visits to the websites that are referred from them, a linear regression model will be implemented. Through this model, we aim to test the hypothesis that the factors that differed most between the medical websites with the highest percentage of visits referred from search engines and those with the lowest do indeed affect the dependent variable, the “Fraction of visits referred from search engines”. The following will be used as independent variables/predictors:
- The website status (online/offline)
- The offering of interactive medical web-based services/applications
- The web presence of the medical website in social networks
- The global rank
- The kind of interactive services
- The category to which the medical website belongs according to its content and services
- The presence of any kind of certification accredited to the medical website by well-known organizations
- The number of unique visits and linked websites to the medical website
- The nature/structure of the organization that supports and provides the medical website
- The source of the revenue stream generated from the medical website operations
According to the aforementioned criteria of the linear regression model, the first step in preparing the data is to check whether the dependent variable follows a normal distribution. In our sample, the “Fraction of visits referred from search engines” variable presents Skewness and Kurtosis values close to zero (both slightly negative), as shown in Figure 24, which presents the histogram and the descriptive statistics of the variable’s values. In a normal distribution Skewness should be 0 and Kurtosis 3, which corresponds to an excess Kurtosis of 0; the Kurtosis reported below is the excess form, so it is compared against 0.
Statistic | Value
N | 301
Missing | 15
Mean | 26.6049
Median | 25.5000
Std. Deviation | 10.78403
Variance | 116.295
Skewness | -.038
Std. Error of Skewness | .140
Kurtosis | -.438
Std. Error of Kurtosis | .280
Figure : Histogram and descriptive statistics for the “Fraction of visits referred from search engines” variable
As mentioned in previous sections, P-P plots and Q-Q plots are also useful in assessing whether the data follow a specific distribution.
Figure : P-P and Q-Q plots of the “Fraction of visits referred from search engines” variable
The normality tests disagree: the Kolmogorov-Smirnov test indicates non-normality, while the Shapiro-Wilk test indicates normality, since its p-value is not significant at the 5% level and the null hypothesis of normality therefore cannot be rejected. Because we have a large sample with more than 200 observations (N=316), it is considered better to look at the shape of the distribution and the values of the Skewness and Kurtosis rather than to rely on their significance (Field, 2009; MVP Programs-Normality Testing Guidelines, accessed 11/1/2012). It is useful to calculate the z-score for the Skewness and the Kurtosis, since it enables us to compare the values of these metrics across different samples. According to Field (2009), “the z-score is simply a score from a distribution that has a mean of 0 and a standard deviation of 1”. To calculate the z-scores of the Skewness and the Kurtosis, we use the formula below, subtracting the mean of the normal distribution, which is 0, and dividing the result by the standard error of the Skewness or Kurtosis:
$$Z_{skewness} = \frac{Skewness - 0}{SE_{skewness}}, \qquad Z_{kurtosis} = \frac{Kurtosis - 0}{SE_{kurtosis}}$$
In absolute value, Zskewness = 0.2714 and Zkurtosis = 1.56, so both are below the threshold of 3.29, above which the values are significant at p<0.001, and we can assume normality (Field, 2009).
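The z-score calculation described above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `z_score` and the constant `THRESHOLD` are our own, and the input values are the Skewness, Kurtosis and standard errors reported in the descriptive statistics.

```python
def z_score(statistic: float, std_error: float, expected: float = 0.0) -> float:
    """Standardize a shape statistic against its expected value under normality."""
    return (statistic - expected) / std_error


# Values copied from the descriptive statistics of the dependent variable.
skewness, se_skewness = -0.038, 0.140
kurtosis, se_kurtosis = -0.438, 0.280

z_skewness = z_score(skewness, se_skewness)
z_kurtosis = z_score(kurtosis, se_kurtosis)

# |z| above 3.29 would be significant at p < 0.001 (Field, 2009).
THRESHOLD = 3.29

for name, z in (("skewness", z_skewness), ("kurtosis", z_kurtosis)):
    verdict = "within" if abs(z) < THRESHOLD else "beyond"
    print(f"z_{name} = {z:.4f} ({verdict} the +/-{THRESHOLD} threshold)")
```

Both z-scores come out negative because the reported Skewness and Kurtosis are negative; the values quoted in the text are their absolute values.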
Table : Tests of Normality for the “Fraction of visits referred from search engines” variable
Variable | Kolmogorov-Smirnov Statistic | Kolmogorov-Smirnov df | Kolmogorov-Smirnov Sig. | Shapiro-Wilk Statistic | Shapiro-Wilk df | Shapiro-Wilk Sig.
LG10 Global Rank | .066 | 301 | .003 | .992 | 301 | .106
The next step of the data analysis process involves assessing the linear relationship between the dependent variable and the numeric independent variables by estimating Pearson’s R and Spearman’s rho correlation coefficients, as was done in the case of the “Global Rank” dependent variable (Table 24). Variables such as the “Global Rank”, the “Google Rank”, “Lifetime” and the “Reputation” did not seem to present any significant linear relationship even after the transformation, so we will avoid using them in the analysis, since with them the model would lose part of its explanatory strength.
Table : Pearson’s R estimation for the dependent-independent variables linear relationship
Variable | Statistic | Fraction of visits referred from search engines | Percentage of total Internet users visiting the medical website
Fraction of visits referred from search engines | Spearman’s rho Correlation Coefficient | 1.000 | -.089*
 | Sig. (2-tailed) | . | .037
 | N | 258 | 257
Percentage of total Internet users visiting the medical website | Spearman’s rho Correlation Coefficient | -.089 | 1.000
 | Sig. (2-tailed) | .037 | .
 | N | 257 | 258

Variable | Statistic | Fraction of visits referred from search engines | Time spent on the website
Fraction of visits referred from search engines | Spearman’s rho Correlation Coefficient | 1.000 | -.226**
 | Sig. (2-tailed) | . | .000
 | N | 258 | 253
Time spent on the website | Spearman’s rho Correlation Coefficient | -.226** | 1.000
 | Sig. (2-tailed) | .000 | .
 | N | 253 | 258

Variable | Statistic | Fraction of visits referred from search engines | Lifetime
Fraction of visits referred from search engines | Spearman’s rho Correlation Coefficient | 1.000 | -.128**
 | Sig. (2-tailed) | . | .025
 | N | 307 | 307
Lifetime | Spearman’s rho Correlation Coefficient | -.128* | 1.000
 | Sig. (2-tailed) | .025 | .
 | N | 307 | 307

Variable | Statistic | Fraction of visits referred from search engines | Number of Unique Visits
Fraction of visits referred from search engines | Spearman’s rho Correlation Coefficient | 1.000 | -.178**
 | Sig. (2-tailed) | . | .005
 | N | 258 | 242
Number of Unique Visits | Spearman’s rho Correlation Coefficient | -.026 | 1.000
 | Sig. (2-tailed) | .679 | .
 | N | 242 | 258

Variable | Statistic | LG10 Global Rank | Linked Websites
LG10 Global Rank | Spearman’s rho Correlation Coefficient | 1.000 | -.228*
 | Sig. (2-tailed) | . | .032
 | N | 258 | 258
Linked Websites | Spearman’s rho Correlation Coefficient | -.228* | 1.000
 | Sig. (2-tailed) | .032 | .
 | N | 258 | 258
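For readers unfamiliar with the coefficient reported in the tables above, Spearman’s rho is Pearson’s correlation computed on the ranks of the observations. The following is a minimal pure-Python sketch of that idea; the helper names and the two short data lists are hypothetical and are not taken from the study’s dataset.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: share of visits from search engines (%) and
# minutes spent on the website for six imaginary websites.
search_share = [42.0, 35.5, 30.1, 22.4, 18.0, 9.3]
minutes_on_site = [1.5, 2.0, 2.0, 3.1, 2.8, 4.0]

rho = spearman_rho(search_share, minutes_on_site)
print(f"Spearman's rho = {rho:.3f}")
```

A negative rho, as in this toy example, means that higher values of one variable tend to go with lower values of the other, which is the pattern reported for the significant pairs above.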
What seems interesting is that the “Percentage of total Internet users visiting the medical website” has a significant negative correlation with the dependent variable, which means that as the fraction of visits referred from search engines increases, the percentage of Internet users visiting the website tends to decrease, and vice versa. A possible explanation is that well-known medical websites that attract a large number of Internet users as visitors/users/patients do not rely on their web presence in search engines; instead, their quality is advertised in other ways, such as in forums and conversations among users, or via other websites that link to them.
The next step of the analysis involved the implementation of the linear regression model. The R-squared value shows that the independent variables account for 28.6% of the variation in the dependent variable, so there may be other variables that explain the remaining variation. The ANOVA table presents an evaluation of the degree of prediction of the model. Since the F-test is significant (p<0.01), the model provides a significantly better prediction than simply using the mean value of the dependent variable (Table 25). “Adjusted R-squared indicates the loss of predictive value for the model and it shows the variance accounted on the dependent variable if the model had been derived from the population the sample was taken” (Field, 2009). In addition, the value of the Durbin-Watson test for serial correlation is 1.935; since it is well above 1 and close to 2, it indicates independence of errors, in line with the assumptions of linear regression.
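The Durbin-Watson statistic referred to above is the sum of squared differences between successive residuals divided by the sum of squared residuals; it ranges from 0 to 4, with values near 2 suggesting no first-order serial correlation. The sketch below illustrates the calculation on two hypothetical residual series (the function name and the data are ours, not the study’s).

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, chosen only to show the two extremes.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every step
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]     # long runs of the same sign

print(f"alternating residuals: DW = {durbin_watson(alternating):.3f}")
print(f"trending residuals:    DW = {durbin_watson(trending):.3f}")
```

Residuals that flip sign at every step push the statistic towards 4 (negative serial correlation), while long runs of same-signed residuals push it towards 0 (positive serial correlation).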
Table : Linear regression model summary for the “Percentage of visits referred from search engines” variable
Linear Regression Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson
1 | .535a | .286 | .202 | 9.67895 | 1.935

ANOVA
Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 8234.747 | 26 | 316.721 | 3.381 | .000
Residual | 20516.389 | 219 | 93.682 | |
Total | 28751.135 | 245 | |
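The summary figures can be cross-checked against the ANOVA sums of squares. The sketch below recomputes R-squared, adjusted R-squared and the F statistic from the values in the table, assuming the usual adjustment 1 − (1 − R²)(n − 1)/(n − k − 1), where n − 1 and n − k − 1 are the total and residual degrees of freedom. The variable names are ours.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 8234.747, 26
ss_residual, df_residual = 20516.389, 219
ss_total, df_total = 28751.135, 245

# Share of the total variation explained by the model.
r_squared = ss_regression / ss_total

# Adjustment for the number of predictors: (n - 1) / (n - k - 1).
adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual

# F is the ratio of the regression and residual mean squares.
mean_square_regression = ss_regression / df_regression
mean_square_residual = ss_residual / df_residual
f_statistic = mean_square_regression / mean_square_residual

print(f"R squared          = {r_squared:.3f}")
print(f"Adjusted R squared = {adjusted_r_squared:.3f}")
print(f"F({df_regression}, {df_residual})         = {f_statistic:.3f}")
```

Up to rounding, the recomputed values agree with the reported R Square (.286), Adjusted R Square (.202) and F (3.381).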
Examining the output of the linear regression model, it is observed that the current status of the medical website (online/offline), governmental funding as a source of income, the presence of drug/symptom checkers as interactive medical applications, the percentage of total Internet users visiting the website and the time they tend to spend on it are significant predictors of the dependent variable (p<0.05). The sign of the beta coefficient is negative for the time on website and for the percentage of total Internet users, implying that websites on which users spend more time, or which reach a larger share of Internet users, tend to receive a smaller fraction of their visits from search engines. For the rest of the significant predictors, the positive sign implies a positive relationship with the dependent variable.
Table : Linear regression outputs for the “Percentage of visits referred from search engines” variable
Model | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig.
(Constant) | 26.286 | 6.870 | | 3.826 | .000
Service | 2.803 | 1.608 | .125 | 1.743 | .083
Website status | 6.341 | 2.854 | .131 | 2.222 | .027
Ask a Doctor | -2.525 | 2.334 | -.072 | -1.081 | .281
Find a doctor or local care | -1.078 | 1.856 | -.042 | -.581 | .562
Drug or symptom checker | 5.659 | 2.510 | .164 | 2.254 | .025
Social Networks | .415 | 1.397 | .019 | .297 | .767
Certification | -1.126 | 1.586 | -.045 | -.710 | .478
Percentage of total Internet users | -.963 | .484 | -.132 | -1.990 | .048
LOG10 unique visits | 1.503 | .859 | .149 | 1.749 | .082
LOG10 Linked websites | -.846 | 1.202 | -.060 | -.704 | .482
Specific health issues | -1.187 | 3.117 | -.048 | -.381 | .704
General Health | -.763 | 3.147 | -.031 | -.242 | .809
Medical resources | -1.966 | 3.257 | -.077 | -.604 | .547
Profit | -2.791 | 1.930 | -.123 | -1.446 | .150
Non-profit Governmental | -6.064 | 3.487 | -.172 | -1.739 | .083
Governmental Funding | 10.323 | 3.450 | .317 | 2.992 | .003
Products/services/shop | -1.872 | 5.237 | -.079 | -.358 | .721
Donations/sponsors/grants | -2.804 | 5.234 | -.126 | -.536 | .593
Membership | 2.669 | 5.570 | .070 | .479 | .632
University and research Funding | -1.945 | 5.512 | -.057 | -.353 | .725
Partners/ mother profit org. | 5.943 | 6.287 | .098 | .945 | .346
Advertisements | -4.368 | 5.560 | -.105 | -.786 | .433
Drug info | -7.154 | 4.265 | -.153 | -1.678 | .095
Medical products | -4.999 | 4.748 | -.082 | -1.053 | .294
Medical education | -3.258 | 3.800 | -.088 | -.857 | .392
Time on website | -2.460 | .453 | -.357 | -5.436 | .000
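To make the reading of the coefficients table explicit, the following sketch lists the predictors whose reported Sig. value is below 0.05 together with the sign of their unstandardized coefficient B. The data are copied from the table; the 0.05 cut-off, the tuple layout and the variable names are assumptions made for this illustration.

```python
# (predictor, unstandardized B, Sig.) copied from the regression output.
coefficients = [
    ("Service", 2.803, 0.083),
    ("Website status", 6.341, 0.027),
    ("Ask a Doctor", -2.525, 0.281),
    ("Find a doctor or local care", -1.078, 0.562),
    ("Drug or symptom checker", 5.659, 0.025),
    ("Social Networks", 0.415, 0.767),
    ("Certification", -1.126, 0.478),
    ("Percentage of total Internet users", -0.963, 0.048),
    ("LOG10 unique visits", 1.503, 0.082),
    ("LOG10 Linked websites", -0.846, 0.482),
    ("Specific health issues", -1.187, 0.704),
    ("General Health", -0.763, 0.809),
    ("Medical resources", -1.966, 0.547),
    ("Profit", -2.791, 0.150),
    ("Non-profit Governmental", -6.064, 0.083),
    ("Governmental Funding", 10.323, 0.003),
    ("Products/services/shop", -1.872, 0.721),
    ("Donations/sponsors/grants", -2.804, 0.593),
    ("Membership", 2.669, 0.632),
    ("University and research Funding", -1.945, 0.725),
    ("Partners/ mother profit org.", 5.943, 0.346),
    ("Advertisements", -4.368, 0.433),
    ("Drug info", -7.154, 0.095),
    ("Medical products", -4.999, 0.294),
    ("Medical education", -3.258, 0.392),
    ("Time on website", -2.460, 0.000),  # reported as .000, i.e. below .0005
]

ALPHA = 0.05  # assumed significance level for this illustration

# Keep only the significant predictors and record the direction of the effect.
significant = [
    (name, "+" if b > 0 else "-")
    for name, b, sig in coefficients
    if sig < ALPHA
]

for name, sign in significant:
    print(f"{sign} {name}")
```

Under these assumptions, five predictors pass the cut-off: three with a positive coefficient and two (percentage of total Internet users and time on website) with a negative one.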
To check the regression residuals for traces of heteroscedasticity and to verify the normality criterion of the residual errors, the distribution of the regression residuals was examined and a scatterplot of the regression standardised residuals against the regression standardised predicted values was formed. As shown in Figure 26, and more specifically in the histogram and the P-P plot, the residuals deviate only very slightly from the normal distribution. Moreover, the scatterplot of the standardised residuals against the standardised predicted values does not show any specific pattern, which further supports the assumption that there is no heteroscedasticity.
Figure : Examining heteroscedasticity and normal distribution of errors in linear regression model
In order to validate and test the generalizability of the linear regression model with a cross-validation analysis, we split the sample into two subsamples in a 50%-50% ratio, as proposed by Bagley et al. (2001), Godfrey (1985), Marill (2004), Schneider et al. (2010) and Field (2009). The validation results show that the model can predict the relationship between the dependent and the independent variables, since the model was significant in both subsamples, with the same significant independent variables. It can also predict the strength of this relationship, since the R-squared of one subsample differed by no more than 5% from the R-squared of the other (Table 27).
Table : Cross validation of the linear regression model for the “Percentage of visits referred from search engines” variable
Model Summary
Model | R, split = .00 (Selected) | R, split ~= .00 (Unselected) | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .613 | .345 | .376 | .198 | 10.06533

Model | R, split = .00 (Selected) | R, split ~= .00 (Unselected) | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .612 | .250 | .375 | .214 | 9.29657

ANOVA
Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 5554.046 | 26 | 213.617 | 2.109 | .005
Residual | 9219.282 | 91 | 101.311 | |
Total | 14773.328 | 117 | |

Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 5230.055 | 26 | 201.156 | 2.327 | .001
Residual | 8729.055 | 101 | 86.426 | |
Total | 13959.110 | 127 | |
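The stability criterion used in the cross-validation can be checked directly from the two ANOVA tables. The sketch below recomputes R-squared for each half and compares the difference against a tolerance; reading the “5%” criterion as 0.05 in absolute R-squared terms is our assumption, as are the function and variable names.

```python
def r_squared(ss_regression: float, ss_total: float) -> float:
    """Proportion of the total sum of squares explained by the regression."""
    return ss_regression / ss_total


# (regression SS, total SS) copied from the two cross-validation ANOVA tables.
halves = {
    "first half": (5554.046, 14773.328),
    "second half": (5230.055, 13959.110),
}

r2 = {name: r_squared(*sums) for name, sums in halves.items()}
difference = abs(r2["first half"] - r2["second half"])

# Assumption: "not more than 5% different" is read as 0.05 in R-squared units.
TOLERANCE = 0.05
stable = difference <= TOLERANCE

for name, value in r2.items():
    print(f"{name}: R squared = {value:.3f}")
print(f"difference = {difference:.4f}, within tolerance: {stable}")
```

With the reported sums of squares, the two halves give nearly identical R-squared values, which is consistent with the conclusion drawn from Table 27.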