Following the first ranking and comparison of the medical websites, which used their global rank value as the ranking and comparison criterion, the next step involved forming a final rank of the websites and comparing the “highest ranked of the highest ranked” with the “lowest ranked of the lowest ranked”. This step aims to discover the characteristics of this newly formed classification and the factors that affect the grade of excellence and the sustainability of the medical websites. In a feature-selection spirit, it builds on the factors that presented the greatest difference in the aforementioned comparison of the higher and lower ranked medical websites.
In order to investigate the factors that affect the grade of excellence and the sustainability of the medical websites, the factors that presented the greatest difference in the aforementioned comparison will be used as predictors in a linear regression analysis, with the “Global Rank” of the medical websites as the dependent variable, as described in the following sections (Figure 19). Variables such as the “Google rank” and “The percentage of the total Internet users visiting the website” were not used in the model, as they were considered a priori highly correlated with the “Global rank” variable; they were instead used directly for ranking the medical websites.
Figure : Regression model and Final Rank formation (Step 3: choose the factors that present the greatest difference, such as website status, percentage of visits from search engines, number of linked websites, nature of the organization that supports the website, category of the website, average reputation, source of income, time on website and other factors, and run a linear regression model to measure their impact. Step 4: take the significant variables from the regression outcome, rank the websites according to each of them, choose the 100 “highest ranked” and 100 “lowest ranked” websites from each ranking, and keep those that appear most frequently across the lists.)
The significant variables from the output of the regression analysis were then used to rank the medical websites again, forming one ranking for each of the regression model’s significant variables. From each of these newly formed rankings, the 100 highest and the 100 lowest ranked medical websites were chosen, and the frequency with which each website appeared in these lists was assessed (Figure 18). Afterwards, the websites that appeared in all or most of these ranking lists were identified, and two final ranks were formed: one for the “highest ranked of the highest ranked” and one for the “lowest ranked of the lowest ranked” websites. The difference from the previous section is that the websites are now chosen and ranked according to several criteria that influence excellence, rather than only by their “Global Rank” value.
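A minimal Python sketch of the list-based selection described above, assuming each website is stored as a dictionary of its metric values. The function names, the metric keys and the toy data are hypothetical; the study’s own ranking was not necessarily implemented this way, and ties or missing values may have been handled differently.

```python
from collections import Counter

def select_extremes(websites, metric, higher_is_better=True, n=100):
    """Rank websites by one metric and return the names of the n best and n worst.

    Websites with a missing value for the metric are left out of that ranking.
    n must be at least 1.
    """
    scored = [w for w in websites if w.get(metric) is not None]
    ranked = sorted(scored, key=lambda w: w[metric], reverse=higher_is_better)
    names = [w["name"] for w in ranked]
    return names[:n], names[-n:]

def frequency_across_lists(websites, metrics, n=100):
    """Count how many per-metric top-n and bottom-n lists each website appears in.

    `metrics` maps a metric name to True when larger values are better
    (e.g. unique visits) and False when smaller values are better (e.g. Global Rank).
    """
    top_counts, bottom_counts = Counter(), Counter()
    for metric, higher_is_better in metrics.items():
        top, bottom = select_extremes(websites, metric, higher_is_better, n)
        top_counts.update(top)
        bottom_counts.update(bottom)
    return top_counts, bottom_counts

# Hypothetical toy data, using n=1 instead of 100.
sites = [
    {"name": "a", "time_on_site": 4.0, "global_rank": 10},
    {"name": "b", "time_on_site": 1.0, "global_rank": 900},
    {"name": "c", "time_on_site": 3.0, "global_rank": 50},
]
top, bottom = frequency_across_lists(
    sites, {"time_on_site": True, "global_rank": False}, n=1
)
print(top.most_common(1), bottom.most_common(1))
```

Websites with the highest counts in `top` (or `bottom`) correspond to those appearing in all or most of the per-criterion lists.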
In order to form the final classification, the hypothesis that the variables which presented a large difference between the higher and the lower ranked medical websites in the previous section have an impact on the “Global rank” variable will be examined:
H0: The factors that presented the greatest difference between the higher and lower ranked medical websites do not have an impact on the “Global Rank” variable value.
H1: The factors that presented the greatest difference between the higher and lower ranked medical websites have an impact on the “Global Rank” variable value.
In order to test these hypotheses, a linear regression model was implemented in SPSS. The dependent variable was the “Global rank” value, while the “Specific healthcare issues”, “General healthcare issues”, “Medical resources”, “Linked websites”, “Governmental Funding”, “University Funding/Research/Project Funding”, “Time on website”, “Percentage of visits referred from search engines”, “Website status”, “Lifetime”, “Drug/Symptom checker”, “Unique Visits”, “Reputation of the website among the Alexa users”, “Profit organizations”, “Non-profit governmental organizations”, “Social network” and “Facebook Likes” variables were used as independent variables (predictors).
Extant literature on regression models in healthcare-related research (Bagley et al., 2001; Bender et al., 1996; Barros et al., 2003; Godfrey et al., 1985; Marill, 2004; Schneider et al., 2010) suggests that, for a linear regression model to produce valid results, five steps must be taken into consideration:
- Sufficient events per variable. There should be enough observations relative to the number of predictors for the model to fit the data reliably. For logistic regression, a rule of thumb is proposed: the number of outcomes in the less frequent outcome category divided by the number of predictors should be equal to or greater than ten.
- Conformity with a linear gradient for continuous variables. Extant literature suggests that any change in a continuous independent variable used in the model “should have an effect on the log-odds of a positive outcome that is of the same magnitude, regardless of the value of the predictor variable” (in a linear model, the corresponding requirement applies to the expected value of the dependent variable).
- Test for interactions. Knowledge of the study domain can help researchers identify interactions between pairs of variables that, if included, could influence the model; the significance of such interactions should be measured and reported.
- Test for collinearity. Highly correlated predictors reduce the precision with which their contributions to the model are estimated and inflate the variance of their coefficient estimates.
- Validation of the model.
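The events-per-variable rule and the collinearity check can be illustrated with a short Python sketch. It assumes NumPy and uses the variance inflation factor (VIF), a common collinearity diagnostic that the text does not name explicitly; the function names and the example predictors are hypothetical. The case count of 295 and the 26 predictors are taken from the ANOVA table reported later in this section (total df = 294, regression df = 26).

```python
import numpy as np

def cases_per_predictor(n_cases, n_predictors):
    """Rule-of-thumb ratio: observations (or, for logistic regression, events in
    the less frequent outcome) per predictor; 10 or more is desired."""
    return n_cases / n_predictors

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared obtained by regressing
    predictor j (with an intercept) on all the other predictors."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        centred = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centred @ centred)
        # Perfectly collinear columns give r2 == 1 and hence inf (with a warning).
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# 295 cases and 26 predictors, as in the regression reported later.
print(round(cases_per_predictor(295, 26), 2))

# Hypothetical predictors: the second is almost a copy of the first.
x1 = np.arange(10.0)
x2 = x1 + np.where(np.arange(10) % 2 == 0, 0.1, -0.1)
x3 = (np.arange(10.0) ** 2) % 5
print([round(v, 1) for v in variance_inflation_factors(np.column_stack([x1, x2, x3]))])
```

The near-duplicate pair receives very large VIFs, the kind of variance inflation the collinearity step warns about.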
These methodological steps and suggestions will be strictly followed in this research in order to form a valid and comprehensive model. Moreover, according to Field A. (2009) and Montgomery D. et al. (2006), the main assumptions of linear regression that must hold are:
- Normality of the dependent variable. Non-normality can often be corrected by transforming the variable, for example with a logarithmic transformation.
- A linear relationship between the dependent variable and the independent variables. If there is no linear relationship between them, the model will have very large errors that make it invalid.
- No serial correlation (dependence) of the errors. Large serial correlation of the errors indicates that the model is mis-specified and needs substantial improvement.
- Homoscedasticity of the errors. If heteroscedasticity is present and not taken into account, the standard errors of the estimates can be biased, resulting in confidence intervals that are too narrow or too wide.
According to the aforementioned criteria, the first step in preparing the data for the linear regression is to check whether the dependent variable follows a normal distribution. In our sample, the “Global Rank” variable presents high positive skewness and kurtosis, as shown in Figure 19, which presents the histogram and the descriptive statistics of the variable’s values. In a normal distribution, skewness is 0 and kurtosis is 3; SPSS reports excess kurtosis (kurtosis minus 3), so the reference value for both statistics below is 0. Judging from the histogram and the values of skewness (4.105) and kurtosis (19.188), normality cannot be assumed, and the dependent variable must be transformed by taking the decimal logarithm of its values.
Statistic | Value
N | 316
Missing | 1
Mean | 1492058.40
Median | 148745.00
Std. Deviation | 3622875.188
Variance | 1.313E13
Skewness | 4.105
Std. Error of Skewness | .137
Kurtosis | 19.188
Std. Error of Kurtosis | .273
Figure : Histogram and descriptive statistics for “Global rank” variable
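The descriptive statistics above and the logarithmic transformation can be reproduced with a short, dependency-free Python sketch. It assumes the bias-adjusted skewness (G1) and excess-kurtosis (G2) formulas and the usual large-sample standard-error formulas; the standard errors it gives for N = 316 agree with the .137 and .273 in the table. The rank values in the example are hypothetical.

```python
import math

def _central_moments(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return n, m2, m3, m4

def skewness(values):
    """Bias-adjusted sample skewness G1 (0 for a symmetric distribution)."""
    n, m2, m3, _ = _central_moments(values)
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis G2 (0 for a normal distribution)."""
    n, m2, _, m4 = _central_moments(values)
    g2 = m4 / m2 ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

def se_skewness(n):
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1.0) / ((n - 3) * (n + 5)))

# Hypothetical, strongly right-skewed rank values; log10 makes them symmetric.
ranks = [1, 10, 100, 1000, 10000]
log_ranks = [math.log10(r) for r in ranks]
print(skewness(ranks), skewness(log_ranks), excess_kurtosis(log_ranks))
print(round(se_skewness(316), 3), round(se_kurtosis(316), 3))
```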
Moreover, P-P plots and Q-Q plots are also useful in assessing whether the data follow a specific distribution. More specifically, according to Field (2009), a P-P plot (Figure 20) “…plots the cumulative probability of a variable against the cumulative probability of a particular distribution”: the data are ranked and, for each rank, a z-score is estimated, which is the score that the rank should have in the normal distribution. A Q-Q plot, by contrast, “...plots the quantiles of the data set instead of every individual score in the data”. Quantiles are values that split a data set into equal portions (Field, 2009).
Figure : P-P and Q-Q plots of the “Global Rank” variable
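The coordinates behind P-P and Q-Q plots can be computed as sketched below in Python. This assumes a normal distribution fitted with the sample mean and standard deviation and uses Blom’s plotting positions, one common choice; statistical packages may use a different formula. Points close to the diagonal in either plot are consistent with normality.

```python
from statistics import NormalDist, mean, stdev

def plotting_positions(n):
    """Blom's plotting positions (i - 3/8) / (n + 1/4) for ranks i = 1..n."""
    return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]

def pp_and_qq_points(values):
    """Return P-P and Q-Q coordinates against a normal distribution fitted
    to the data (sample mean and standard deviation)."""
    data = sorted(values)
    fitted = NormalDist(mean(data), stdev(data))
    probs = plotting_positions(len(data))
    # P-P: observed cumulative proportion vs. expected normal cumulative probability.
    pp = [(p, fitted.cdf(x)) for p, x in zip(probs, data)]
    # Q-Q: observed value vs. the value the normal distribution expects at that rank.
    qq = [(x, fitted.inv_cdf(p)) for p, x in zip(probs, data)]
    return pp, qq

pp, qq = pp_and_qq_points([1, 2, 3, 4, 5])
print(pp[2], qq[2])
```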
Since normality cannot be assumed, a transformation of the dependent variable was necessary. The decimal logarithm of the variable’s values was computed so that the transformed variable better satisfies the normality criterion (skewness and excess kurtosis close to 0), allowing the regression model to produce valid results without losing its explanatory value.
Statistic | Value
N | 316
Missing | 1
Mean | 1492058.40
Median | 148745.00
Std. Deviation | 3622875.188
Variance | 1.313E13
Skewness | 4.105
Std. Error of Skewness | .137
Kurtosis | -.360
Std. Error of Kurtosis | .274
Figure : Log10 “Global rank” histogram and descriptive statistics
After the transformation, the histogram presents an image much closer to the normal distribution. However, the Kolmogorov-Smirnov and Shapiro-Wilk normality tests (Table 15) both give p < 0.05, so both reject the null hypothesis that the values are normally distributed; that is, they indicate that the distribution of the “Log10 Global Rank” values deviates significantly from the normal distribution.
Table : Tests of Normality
Variable | Kolmogorov-Smirnov Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig.
LG10Global Rank | .071 | 315 | .001 | .975 | 315 | .000
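For intuition about the Kolmogorov-Smirnov column of the table, the sketch below computes the K-S distance between the data and a normal distribution fitted to them. It is an illustration only: because the mean and SD are estimated from the sample, the appropriate p-value comes from the Lilliefors correction, which is not implemented here, and the Shapiro-Wilk test is not shown. The function name and example values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest distance between the empirical CDF of the data and a normal CDF
    whose mean and SD are estimated from the same data (Lilliefors form)."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

print(ks_statistic_normal([1, 2, 3, 4, 5]))
```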
Although the normality tests indicate non-normality, for a large sample of more than 200 observations (N = 315) it is considered better to look at the shape of the distribution and the values of skewness and kurtosis rather than to rely on significance tests (Field A., 2009; MVP Programs-Normality Testing Guidelines, accessed 11/1/2012). Calculating the z-scores of skewness and kurtosis is useful because it allows their values to be compared across samples. According to Field (2009), “the z-score is simply a score from a distribution that has a mean of 0 and a standard deviation of 1”. The z-scores of skewness (S) and kurtosis (K) are calculated with the formulas below, by subtracting the value expected under a normal distribution, which is 0, and dividing the result by the standard error of the skewness or kurtosis:

$$z_{\text{skewness}} = \frac{S - 0}{SE_{\text{skewness}}}, \qquad z_{\text{kurtosis}} = \frac{K - 0}{SE_{\text{kurtosis}}}$$
The resulting values are z_skewness = 3 and |z_kurtosis| = 1.313. Both are below the threshold of 3.29, beyond which values are significant at p < 0.001, so normality can be assumed (Field A., 2009).
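The z-score check can be written in a few lines of Python. The function names are hypothetical; the example uses the excess kurtosis (-.360) and its standard error (.274) from the Log10 table, which gives |z| of about 1.31.

```python
def z_score(statistic, standard_error):
    """z = (statistic - 0) / SE, where 0 is the value expected under normality."""
    return (statistic - 0.0) / standard_error

def within_normal_range(statistic, standard_error, critical=3.29):
    """True when |z| is below the critical value; 3.29 corresponds to p < .001."""
    return abs(z_score(statistic, standard_error)) < critical

# Excess kurtosis and its standard error from the Log10 "Global rank" table.
z_kurtosis = z_score(-0.360, 0.274)
print(round(z_kurtosis, 3), within_normal_range(-0.360, 0.274))
```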
The second assumption of linear regression that should be checked, because it can affect the validity of the model, is the presence of a linear relationship between the dependent variable and the continuous independent variables. To investigate it, scatterplots of the dependent variable against each independent variable were created (Figure 22).
Figure : Scatterplots of the relationship between the dependent and the independents
Pearson’s R correlation coefficient measures the presence and strength of a linear relationship between two variables, and its value ranges between -1 and 1 (Table 17). A negative value of Pearson’s R indicates a negative linear relationship: as the value of one variable increases, the value of the other decreases. The opposite holds for a positive value, that is, a positive linear relationship.
Spearman’s rank correlation coefficient, or Spearman’s rho, is a nonparametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other (Maritz, J.S., 1981).
Although the scatterplots give a first impression of whether a linear relationship exists between the dependent and the independent variables, a more accurate picture was obtained by calculating Pearson’s R for parametric data and Spearman’s rho for non-parametric data. Data are considered parametric when an underlying normal distribution can be assumed, and non-parametric when it cannot.
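A small, dependency-free Python sketch of the two coefficients, for illustration: Spearman’s rho is computed as Pearson’s R on ranks, with tied values given the average of the ranks they share. Significance tests are not included, and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's R computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v * v for v in x]
print(pearson_r(x, y), spearman_rho(x, y))
```

With y = x², the relationship is perfectly monotonic but not linear, so Spearman’s rho is exactly 1 while Pearson’s R is slightly below 1.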
Table 17, which presents the Pearson’s R and Spearman’s rho calculations, shows a statistically significant and strong negative relationship between the dependent variable, the decimal logarithm of the “Global Rank” values, and the “Reputation” of the medical website among Alexa users (rho = -0.912). The relationship between the dependent variable and the “Time that users tend to spend on the medical website” is also statistically significant and negative, of moderate strength (rho = -0.578). Statistically significant negative relationships are likewise observed between the dependent variable and the “Lifetime”, “Percentage of total Internet users visiting the website”, “Facebook likes”, “Unique visits” and “Linked websites” variables. Because a lower global rank value means a better ranked website (the best websites are ranked closest to #1, while the scale extends to a few million for the lowest ranked ones), these negative coefficients mean that the independent variables increase as the websites become better ranked, showing a positive association with rank improvement. As we move towards better ranked medical websites, the number of unique visits, the number of linked websites, the time users tend to spend on the website, its reputation among users and the length of time the website has been online all increase.
Pearson’s R and Spearman’s rho show no statistically significant linear relationship between the “Fraction of visits referred from search engines” variable and the dependent variable (Pearson’s R = -0.083, p = 0.186), so this variable would ideally be transformed before being used in the regression model. However, it does not present a significant linear relationship with the dependent variable even after logarithmic, square-root or inverse transformation, and including it may cost the model some of its explanatory power. Since it was considered an important predictor, it was nevertheless used in the model, accepting a possible loss of explanatory value.
Table : Dependent-independent variables linear relationship estimations
Variable paired with LG10 Global Rank | Coefficient | Correlation | Sig. (2-tailed) | N
Reputation | Spearman’s rho | -.912** | .000 | 307
Fraction of visits referred from search engines | Pearson’s R | -.083 | .186 | 258
Time spent on the website | Spearman’s rho | -.578** | .000 | 279
Lifetime | Spearman’s rho | -.128* | .025 | 307
Number of Unique Visits | Spearman’s rho | -.691** | .000 | 286
Linked Websites | Spearman’s rho | -.428** | .000 | 307
** Correlation is significant at the 0.01 level (2-tailed); * Correlation is significant at the 0.05 level (2-tailed).
Having checked the assumptions of linearity and normality, the next step of the data analysis involved implementing the linear regression model. The dependent variable was “Log10 Global Rank”, the decimal logarithm of the “Global Rank” values, used so as to conform with the normality assumption of linear regression. The independent variables used in the model were:
- the “Specific healthcare issues” and “General healthcare issues” variables, which are levels of the “Category to which the website belongs” variable
- the “Medical resources” variable
- the “Linked websites” variable
- the “Governmental Funding” and “University Funding/Research/Project Funding” variables, which are levels of the “Source of Income” variable
- the “Time on website” variable
- the “Percentage of visits referred from search engines” variable
- the “Website status” variable
- the “Lifetime” variable
- the “Drug/Symptom checker” variable
- the “Unique Visits” variable
- the “Reputation of the website among the Alexa users” variable
- the “Profit organizations” and “Non-profit governmental organizations” variables, which are levels of the “Nature of the organization that supports the medical website” variable
- the “Social network” variable
- the “Facebook Likes” variable
Table 17 presents the linear regression model summary as well as the ANOVA test, produced with the Statistical Package for the Social Sciences (SPSS). The R-squared value shows that the independent variables account for 50.7% of the variation in the dependent variable, so other variables could be added to the model to explain further variation. The adjusted R-squared (0.459) indicates how much of the variance in the dependent variable the model would account for if it had been derived from the population from which the sample was taken, and its gap from the R-squared reflects the loss of predictive power (Field A., 2009). Furthermore, the ANOVA test evaluates how well the model predicts the dependent variable. Since the F-test is significant (F = 10.613, p < 0.01), the model provides a significantly better prediction than the mean value of the dependent variable would (Table 18). In addition, the Durbin-Watson statistic, which tests for serial correlation of the errors, is 1.829; values close to 2, and within the range of 1 to 3 suggested by Field (2009), indicate independence of errors as required by the assumptions of linear regression.
Table : “LOG10 Global Rank” variable’s linear regression model summary
Linear Regression Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson
1 | 0.712 | 0.507 | 0.459 | 0.92787 | 1.829
ANOVA
Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 237.566 | 26 | 9.137 | 10.613 | .000
Residual | 230.730 | 268 | 0.861 | |
Total | 468.296 | 294 | | |
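The summary statistics in the table follow directly from the ANOVA sums of squares and degrees of freedom; the Python sketch below recomputes them and also shows the Durbin-Watson formula. The function names are hypothetical, and the Durbin-Watson statistic assumes the residuals are supplied in case order, which for cross-sectional website data simply reflects how the cases happen to be sorted.

```python
def summary_from_anova(ss_regression, ss_residual, df_regression, df_residual):
    """R-squared, adjusted R-squared and F computed from an ANOVA table."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual  # n - 1
    r2 = ss_regression / ss_total
    adj_r2 = 1 - (ss_residual / df_residual) / (ss_total / df_total)
    f = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r2, adj_r2, f

def durbin_watson(residuals):
    """Sum of squared successive differences of the residuals over their sum of squares.

    Ranges from 0 to 4; values near 2 suggest no first-order serial correlation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

r2, adj_r2, f = summary_from_anova(237.566, 230.730, 26, 268)
print(round(r2, 4), round(adj_r2, 4), round(f, 3))
```

The recomputed values (R² of about 0.507, adjusted R² of about 0.4595, F of about 10.61) agree with the table to within the rounding of the reported sums of squares.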
The output of the linear regression analysis (Table 18) shows that the presence and activity of the medical websites in social networks, the time users tend to spend on them, the fraction of visits referred from search engines, governmental funding as a source of revenue, the reputation of the medical website, the number of unique visits and the number of linked websites are all significant variables (p < 0.05) that influence the logarithm of the global rank value and, hence, the global rank value itself. The negative sign of these predictors’ B coefficients implies that as they increase, the global rank value decreases, so an increase in these predictors is associated with better ranked medical websites. In other words, as the global rank value decreases (the minimum value being the highest rank), the traffic of the website, its popularity, the time users spend on it, the number of websites linking to it and the share of its visits referred from search engines increase. These findings will be used for further analysis of the sample in order to form two final taxonomies, the “highest ranked of the highest ranked” and the “lowest ranked of the lowest ranked” websites, which can then be compared to identify the criteria that enhance the excellence of medical websites.
Table : Linear Regression Model Coefficients
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
(Constant) | 6.450 | .573 | | 11.262 | .000
Ask a Doctor | -.195 | .209 | -.044 | -.935 | .351
Find a doctor or local care | .101 | .151 | .032 | .664 | .507
Drug or symptom checker | -.240 | .212 | -.058 | -1.132 | .259
Social Networks | -.235 | .117 | -.093 | -2.016 | .045
Fraction of visits from search engines | -.012 | .005 | -.103 | -2.183 | .030
Time on website | -.206 | .039 | -.254 | -5.274 | .000
Lifetime | -.022 | .014 | -.072 | -1.606 | .109
Facebook Likes | -3.117E-5 | .000 | -.053 | -1.075 | .283
General Health | .161 | .285 | .057 | .565 | .572
Specific health issues | .279 | .282 | .095 | .989 | .324
Medical resources | .318 | .292 | .112 | 1.088 | .277
Profit | -.205 | .161 | -.078 | -1.275 | .203
Non-profit governmental | -.066 | .306 | -.016 | -.214 | .830
Governmental Funding | -1.039 | .299 | -.273 | -3.471 | .001
Products/services/shop | .222 | .491 | .080 | .452 | .651
Membership | .237 | .515 | .052 | .459 | .646
Donations sponsors grants | .167 | .487 | .065 | .342 | .732
University/research funding | -.716 | .512 | -.186 | -1.399 | .163
Partners/mother profit org. | .030 | .572 | .004 | .053 | .958
Advertisements | .185 | .515 | .038 | .358 | .721
Drug info | -.090 | .375 | -.016 | -.239 | .812
Medical products | .754 | .419 | .113 | 1.801 | .073
Medical education | .025 | .344 | .006 | .073 | .942
Unique visits | -1.622E-7 | .000 | -.235 | -4.708 | .000
Linked Websites | -9.662E-5 | .000 | -.110 | -2.154 | .032
Reputation | -9.531E-7 | .000 | -.221 | -4.800 | .000
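For readers who want to see where the columns of the coefficients table come from, the following NumPy sketch computes unstandardized B, standard errors, t values and standardized Beta for an ordinary least squares model with an intercept. The data and function name are hypothetical, and p-values (the Sig. column) would additionally require the t distribution with n - k - 1 degrees of freedom, for example via `scipy.stats.t.sf`, which is not used here.

```python
import numpy as np

def ols_table(X, y):
    """Ordinary least squares with an intercept.

    Returns unstandardized B (constant first), their standard errors, t values
    and standardized Beta coefficients (predictors only).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    mse = resid @ resid / (n - k - 1)             # residual mean square
    cov = mse * np.linalg.inv(design.T @ design)  # covariance matrix of B
    se = np.sqrt(np.diag(cov))
    t = b / se
    beta = b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return b, se, t, beta

# Hypothetical data: y = 1 + 2*x1 - 3*x2 plus a small deterministic disturbance.
i = np.arange(20.0)
X = np.column_stack([i, (i ** 2) % 7])
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + 0.1 * np.sin(i)
b, se, t, beta = ols_table(X, y)
print(np.round(b, 2), np.round(t, 1), np.round(beta, 2))
```

As in the table, the sign of B, t and Beta agree for each predictor, while Beta makes predictors measured on different scales comparable.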
Moreover, in order to examine the regression residuals for traces of heteroscedasticity and to check the normality of the residual errors, the distribution of the regression residuals was examined and a scatterplot of the regression standardised residuals against the regression standardised predicted values was formed. As the histogram and the P-P plot in Figure 23 show, the residuals deviate only very slightly from the normal distribution. In addition, the scatterplot of the standardised residuals against the standardised predicted values does not present any specific pattern, which further supports the assumption that there is no heteroscedasticity.
Figure : Examining heteroscedasticity and normal distribution of errors in linear regression
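The standardised predicted values and standardised residuals plotted in the heteroscedasticity check (called *ZPRED and *ZRESID in SPSS) can be computed as sketched below, assuming NumPy and hypothetical data. The function name is hypothetical and plotting is left out.

```python
import numpy as np

def standardized_predicted_and_residuals(X, y):
    """Fit OLS with an intercept and return (zpred, zresid):
    zpred  = predicted values rescaled to mean 0 and SD 1,
    zresid = residuals divided by the standard error of the estimate."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coef
    resid = y - predicted
    see = np.sqrt(resid @ resid / (len(y) - design.shape[1]))  # std. error of estimate
    zpred = (predicted - predicted.mean()) / predicted.std(ddof=1)
    zresid = resid / see
    return zpred, zresid

# Hypothetical data; plotting zresid against zpred (e.g. with matplotlib) should
# show a patternless band of roughly constant width when errors are homoscedastic.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(100, 1))
y = 3 - 0.5 * x[:, 0] + rng.normal(0, 1, size=100)
zpred, zresid = standardized_predicted_and_residuals(x, y)
```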
The final, and one of the most important, steps of linear regression modelling is model validation. In order to validate the linear regression model and test its generalizability with a cross-validation analysis, the sample was split into two subsamples in a 50%-50% ratio, as proposed by Bagley et al. (2001), Godfrey (1985), Field (2009) and Marill (2004). The validation results show that the model can predict the relationship between the dependent and the independent variables, since both subsamples had the same level of significance and the same significant independent variables. However, it cannot reliably predict the strength of this relationship, since the R-squared of one subsample was much higher than that of the other (0.744 versus 0.547), a difference of more than 5% (Table 19).
Table : Cross validation of the “LOG10 Global rank” linear model
Model Summary (first subsample)
Model | R (split = .00, Selected) | R (split ~= .00, Unselected) | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .739 | .576 | .547 | .446 | .91351
Model Summary (second subsample)
Model | R (split = .00, Selected) | R (split ~= .00, Unselected) | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .862 | .375 | .744 | .690 | .72006
ANOVA (first subsample)
Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 117.791 | 26 | 4.530 | 5.429 | .000
Residual | 97.638 | 117 | .835 | |
Total | 215.429 | 143 | | |
ANOVA (second subsample)
Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | 186.669 | 26 | 7.180 | 13.847 | .000
Residual | 64.293 | 124 | .518 | |
Total | 250.962 | 150 | | |
The coefficients of the validation models show that the same independent variables have a statistically significant impact on the dependent variable. More specifically, the “Social Networks”, “Time on website”, “Fraction of visits referred from search engines”, “Unique Visits”, “Governmental Funding” and “Linked websites” variables are significant. The direction and strength of each impact are indicated by the sign and the value of the B coefficient. The B coefficients of all these significant predictors are negative, indicating a negative relationship with the dependent variable, the decimal logarithm of the “Global rank” values; since the logarithm is monotonically increasing, these variables also have a negative relationship with the “Global Rank” variable itself. This negative relationship in the validation models is consistent with the results of the full linear regression model.
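A split-half validation along these lines can be sketched in Python with NumPy. The split indicator, the data and the function names are hypothetical; the two R values returned, for the cases used to fit the model and for the held-out cases, appear to correspond to the “Selected” and “Unselected” R columns in the SPSS output of Table 19.

```python
import numpy as np

def fit_ols(X, y):
    """OLS coefficients with the intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def validate_on_split(X, y, selected):
    """Fit on the selected cases and report the multiple R (correlation of observed
    with predicted values) on the selected and on the unselected cases."""
    coef = fit_ols(X[selected], y[selected])
    predicted = coef[0] + X @ coef[1:]
    r_selected = np.corrcoef(y[selected], predicted[selected])[0, 1]
    r_unselected = np.corrcoef(y[~selected], predicted[~selected])[0, 1]
    return coef, r_selected, r_unselected

# Hypothetical data and a random 50%-50% split indicator.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 1 + 2 * X[:, 0] + rng.normal(0, 0.5, size=200)
selected = rng.random(200) < 0.5
first = validate_on_split(X, y, selected)    # model fitted on one half
second = validate_on_split(X, y, ~selected)  # model fitted on the other half
for coef, r_sel, r_unsel in (first, second):
    print(np.round(coef, 2), round(r_sel, 3), round(r_unsel, 3))
```

Comparing the coefficient signs and the R values across the two fits mirrors the comparison of significant predictors and R-squared described in the text.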
Formation and comparison of Final Taxonomies
After the significant variables were identified through the above linear regression model, the medical websites in the dataset were ranked based on these variables. A ranking list, from the highest ranked to the lowest ranked medical website, was formed for each of the significant variables, as well as for the Google rank and the percentage of total Internet users visiting the medical website. The latter two were considered a priori highly correlated with the dependent variable, since they are estimated with almost the same methodology and metrics, based on the traffic flow of the website and thus on the total number of users visiting it. The “Lifetime” variable was also taken into consideration although it was not statistically significant, in order to rank the medical websites more accurately and to narrow down the number of websites that were “equal” according to all other criteria. The factors that affect the lifetime of the medical websites will be examined later, using the same methodological concept and steps, in order to identify the factors that enhance the survivability of the medical websites.
From each of these newly formed ranking lists the first 100 medical websites were chosen, a number corresponding to almost 33% of the sample, and the lists were compared in order to identify the websites that appear in all or most of them. The outcome of this process was a ranking of the 40 medical websites whose performance was exceptional according to the seven criteria/variables used. The same methodology was applied to the “lowest ranked” websites, resulting in a final list of the 50 websites with the lowest performance according to the chosen criteria. Both the initial selection of the “lowest ranked” medical websites for each criterion and the final rank of the “lowest ranked” websites included more medical websites (150 and 50 respectively) than the corresponding lists for the “highest ranked” websites, in order to avoid misleading results due to unavailable or inaccessible information.
Comparing the variables’ measurements and some derived metrics and percentages between the “highest ranked” and “lowest ranked” medical websites (Table 20) yields some very interesting results. To begin with, over two-fifths (41.9%) of the “highest ranked” medical websites have content related to general health issues, while none has content related to the promotion or review of medical products. The picture for the “lowest ranked” medical websites is quite different: only 24.4% have content related to general health issues, a considerable share (26.5%) offer content specialised around specific healthcare problems, and 12.2% offer content promoting various medical products. It is also remarkable that none of the “lowest ranked” medical websites offers drug-related information, compared with 14% of the “highest ranked” medical websites.
Furthermore, the majority (81.4%) of the “highest ranked” medical websites offer interactive medical applications: 6.9% offer an “Ask a doctor” service, 16.2% a “Find a doctor/local care” service, 32.5% a “Drug/Symptom checker” and 32.5% other kinds of interactive services, such as body mass calculators or online diaries for various medical conditions (Table 21). In contrast, only 55.1% of the “lowest ranked” websites offer interactive web applications, and only 2% offer a “Drug/Symptom checker”, a large difference compared with the “highest ranked” medical websites (Table 20).
Continuing the comparison, websites supported by “for profit” organizations make up 25.5% of the “highest ranked” and 49% of the “lowest ranked” medical websites, while websites supported by non-profit private and governmental organizations make up 72% of the “highest ranked” and 53.1% of the “lowest ranked” medical websites. In addition, 34.8% of the “highest ranked” medical websites have governmental (non-profit governmental organization) support, compared with 6.1% of the “lowest ranked” websites.
The “highest ranked” medical websites also have a greater presence in social networks, with 60.4% of them offering social network contact compared with 32.6% of the “lowest ranked” websites, and a tremendous advantage in “Facebook likes”, with an average of 2000 likes compared with an average of about 20 for the “lowest ranked” group. In addition, the “highest ranked” websites have a better (numerically lower) average global rank, a higher average Google rank and reputation, and a greater share of blog/online community offerings. Finally, on average a greater percentage of their visits is referred from search engines (31% versus 11%), and users tend to spend about three times as long on them as on the “lowest ranked” websites (an average of 4 minutes compared with 1.3 minutes).
Table : “Highest ranked” and “Lowest ranked” medical websites metrics
Characteristic | “Highest ranked” medical websites | “Lowest ranked” medical websites
Interactive service offering | 81.4% | 55.1%
Online websites | 100% | 81.6%
Specific health problem | 11.6% | 26.5%
General Health | 41.9% | 24.4%
Drug info | 14% | 0%
Medical resources | 18.6% | 24.4%
Medical products | 0% | 12.2%
Medical Education | 6.7% | 8.1%
Kids health | 2.3% | 2%
Senior Health | 0% | 2%
Profit organizations | 25.5% | 49%
Non-profit private organizations | 37.2% | 47%
Non-profit governmental | 34.8% | 6.1%
Ask a doctor | 6.9% | 8.1%
Find a doc/local care | 16.2% | 16.3%
Drug symptom checker | 32.5% | 2%
Other | 32.5% | 26.5%
Social network | 60.4% | 32.6%
Certification | 23.2% | 22.4%
Value for patients | 72% | 75.5%
Blog | 21% | 14.2%
Average global rank | 6597 | 5148166
Average reputation | 113284.6 | 444.5
Average % of visits referred from search engines | 31% | 11%
Fast loading time | 27.9% | 2%
Average loading time | 46.5% | 0%
Slow loading time | 25.6% | 0%
Average time on website | 4 min | 1.3 min
Average lifetime | 12.8 years | 9.8 years
Average fraction of total Internet users visiting the website | 0.7% | 0.00016%
Average unique visits | 223510 | 11658
Average linked websites | 2279.2 | 69.7
Governmental funding | 40.4% | 6.1%
Products/services/shop | 14% | 32.6%
Donations/sponsors/support from non-profit org./grants | 16.3% | 32.6%
Membership/registration | 2.38% | 4%
University funding/research/project funding | 2.38% | 4%
Partners/mother profit org. | 9.5% | 4%
Advertisements | 9.5% | 10.2%
Average Google rank | 7.49 | 4.7
Average Facebook likes | 2000 | 20.53