Application of multilevel models to the study of the individual digital divide in Russia
Introduction and prior research
The digital divide has emerged as an important research and policy issue during the past thirty years. It can be defined as inequality in access to Information and Communication Technology (ICT), such as personal computers, the Internet and mobile phones (Norris, 2001). The digital divide at the individual level is defined as inequality in access to and ability to use ICTs among individuals (Dewan & Riggins, 2005); it causes inequality in the ability to obtain important information, which in turn leads to inequality in the ability to gain employment, participate in online communities and e-government, and receive important health information. While extensive research already exists on this subject, most existing quantitative digital divide studies are limited to descriptive statistics or simple linear models.
Most existing digital divide studies examine differences in access to ICTs across demographic groups. Such studies in the USA demonstrate the existence of income, geographical and racial divides, and some studies find a gender divide (Chakraborty & Bosman, 2005; Fairlie, 2004; Martin, 2003). Demunter (2005), in his study of the digital divide across the countries of the European Union, concludes that age and education level predict the level of use of ICTs – individuals with higher income and higher levels of education have better access to ICTs. Gender is not a significant factor in the divide, and the gender gap is widest in rural areas. Sciadas (2002), in a study of the digital divide in Canada, also finds that income is a significant predictor of access to ICTs. The growth rate of Internet use is higher at lower income levels than at higher income levels. Willis and Tranter (2006) find income, age and the level of education to be predictors of the individual digital divide in Australia, whereas gender is found not to be a significant predictor.
Research on the digital divide in Russia, like individual divide research overall, is mostly qualitative or includes only basic descriptive statistics. The same predictors that were found significant in studies of the digital divide in the USA, such as gender, age, and income, are identified as predictors of the digital divide in Russia (Beketov, 2009; Delitsin, 2006; Lihobabin, 2006). Individuals with lower income, older people and women have lower access to ICTs. In addition to these common problems, a language barrier limits the ability of most Russians to use Internet content: most Internet content exists in English and is inaccessible to them (Beketov, 2009). Another aspect specific to the situation in Russia is a very significant regional divide (Beketov, 2009; Delitsin, 2006). A gap in the rate of use of ICTs arises between regional capitals and areas outside the capitals, with the highest rates of ICT use concentrated in Moscow and St. Petersburg. Lack of competition in the telecommunication market and the low purchasing power of the population are found to be reasons for the regional divide.
While the studies described above shed some light on the problem of the individual digital divide in Russia, they are mostly qualitative and descriptive in nature. To the best of our knowledge, no detailed quantitative study addressing the problem of the individual digital divide in Russia exists. In this study we employ multilevel models to provide a comprehensive analysis of the state of the individual digital divide in Russia and assess differences among individuals in their use of the Internet.
Data
We use data from the Russian Longitudinal Monitoring Survey (RLMS), a comprehensive living standards survey covering every year from 1992 to 2010. Data collection is conducted at the individual, household and communal level. The advantages of this data source are that it is nationally representative and that it provides a variety of information.
The RLMS dataset includes several variables representing ownership and use of technology by individuals. We chose to build models for measures of the use, rather than the ownership, of ICTs, since ownership does not guarantee use and individuals gain an advantage only when ICTs are actually used. Based on the high level of correlation between the measures of Internet and PC use (.85), we chose to select one of these variables – Used Internet in the last 12 months – as the target variable.
The choice of independent variables is guided by previous literature. Variables include gender, age, measures of education, marital status, nationality and knowledge of a foreign language at the individual level; household income per capita at the household level; and the rural/urban indicator, availability of an Internet Café, steady mobile communications and several other measures at the population center (site) level. The complete list of variables included in the model is presented in Table 1. We also included several interactions in the model to determine whether the gender effect is different at different age levels. Originally the model included all of the independent variables listed in Table 1, but the final model includes only variables that turned out to be significant.
Table 1: Indicators used in the multilevel model
Variable name | Question | Level | Distribution
Dependent Variables
Internet | Used Internet in the last 12 months? | Individual | Yes = 41%, No = 59%
Independent Variables
ISGENDER - Male | Gender | Individual | Male = 42%, Female = 58%
ISBIRTHY - Age | Year of birth | Individual | Range: 18 - 104, Mean = 44
ISSPFLAN - For_language | Do you speak any foreign language other than the languages of the USSR’s former republics? | Individual | Yes = 19%, No = 81%
ISNATION - Russian | Nationality | Individual | Russian = 86%, Non-Russian = 14%
ISHIEDUL | Highest educational level confirmed by certificate or diploma? | Individual | Incomplete secondary education = 26%; Complete secondary education = 27%; Completed professional school = 23%; Bachelor, Masters or post graduate degree = 24%
ISMARIST | Marital status | Individual | Single = 21%, Married = 53%, Divorced = 13%, Widowed = 13%
tincm_rs - Income_per_cap | Real household income (000)/number of household members | Individual | Range: 0 - 391, Mean = 5.2
sett_typ | 1=Urban 2=Small Urban 3=Rural | Site | Urban = 68%, Small Urban = 6%, Rural = 26%
Elec_inter | Frequent electricity interruptions | Site | Yes = 22%, No = 78%
High_sp_itr | High speed Internet available | Site | Yes = 93%, No = 7%
Inter_cafe | Internet Café | Site | Yes = 76%, No = 24%
Steady_mob_comm | Steady Mobile connection (1 = Yes, 0 = No) | Site | Yes = 95%, No = 5%
Trip_to_mos_000 | Cost of trip to Moscow (000) | Site | Mean = 3, Range: 0 - 20
sites | | Site |
Region | | Region |
Interactions
Age*Gender
Income*Gender
Foreign Language*Gender
Educational variables*Income
Urban*Gender
Urban*Income
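As a rough illustration of how raw codes like those in Table 1 could be turned into model covariates, the sketch below derives age from year of birth, builds 0/1 dummies, and forms the Male*Urban interaction as the product of two dummies. The record field names, the choice to treat only `sett_typ == 1` as urban, and the use of log(1 + income) to keep zero incomes defined are all assumptions made for this example; the paper does not state its exact coding.

```python
import math

SURVEY_YEAR = 2010  # the model in this paper is built for the year 2010


def encode(record: dict) -> dict:
    """Turn one raw survey record into covariates (illustrative coding only)."""
    male = 1 if record["gender"] == "male" else 0
    # Assumption: only settlement code 1 ("Urban") counts as urban.
    urban = 1 if record["sett_typ"] == 1 else 0
    return {
        "age": SURVEY_YEAR - record["birth_year"],
        "male": male,
        "urban": urban,
        # An interaction term is the product of the two dummies.
        "male_x_urban": male * urban,
        # Assumption: log(1 + x) so that an income of zero stays defined.
        "log_income_per_cap": math.log1p(record["income_per_cap_000"]),
        "higher_education": 1 if record["education"] == "higher" else 0,
    }


# Hypothetical respondent, not an actual RLMS record.
example = {
    "gender": "male",
    "sett_typ": 1,
    "birth_year": 1966,
    "income_per_cap_000": 5.2,
    "education": "higher",
}
features = encode(example)
print(features)
```

With this coding the interaction equals 1 only for urban men, which is what lets a model estimate a separate urban effect for each gender.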
Methodology – Multilevel models
Multilevel models are an extension of the regression model that takes into account the complex structure of the data. For example, in education, when analyzing differences in students’ test scores, one needs to account for the fact that the data are hierarchical: students belong to schools, schools belong to districts, and so on. Students in the same school tend to have some similarities due to the effect of the school; households in the same area tend to be more similar than households in different areas. In these cases the assumption of independence of observations is violated. Multilevel models take into consideration the hierarchical structure of the data, simultaneously analyze variables from different levels and correct the estimates of statistical significance, which are overstated by conventional regression models (Goldstein, 2002).
Multilevel models are used in many disciplines, such as education, psychology, medical research, economics etc. (Goldstein, 2007).
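To make the point about overstated significance concrete, the sketch below uses the standard Kish design-effect approximation, 1 + (m - 1) * ICC, which shows how much the variance of a sample mean is inflated when respondents come in clusters of size m with intra-class correlation ICC. This is a textbook approximation for equal-sized clusters, not a calculation performed in this study, and the numbers are hypothetical.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect: variance inflation caused by clustered sampling."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n / design_effect(cluster_size, icc)


# Hypothetical values chosen for illustration; they are not RLMS estimates.
n_respondents = 10_000
avg_cluster_size = 100
icc = 0.05

deff = design_effect(avg_cluster_size, icc)
n_eff = effective_sample_size(n_respondents, avg_cluster_size, icc)
print(f"design effect = {deff:.2f}, effective n = {n_eff:.0f}")
```

Even a modest intra-class correlation shrinks the effective sample size considerably, which is why standard errors from a model that ignores the hierarchy tend to be too small.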
We build a 4-level model for the year 2010. The 4 levels include the individual level i (lowest level), the household level h, the site level s and the regional level r (highest level). The term “site” refers to a population center.
In general, the model takes the form:
$$y_{ihsr} = \beta_{0hsr} + \sum_{w=1}^{W} \beta_{whsr}\, x_{wihsr} + e_{ihsr}$$
$$\beta_{0hsr} = \gamma_{00} + \gamma_{01} z_{hsr} + u_{0hsr} + v_{0sr} + f_{0r}$$
$$\beta_{whsr} = \gamma_{w0} + \gamma_{w1} z_{hsr} + u_{whsr} + v_{wsr} + f_{wr}$$
where the dependent variable $y_{ihsr}$ is the indicator of Internet use by the ith person from the hth household from the sth site from the rth region. The $x_{wihsr}$ represent values of the wth explanatory variable measured for the ith person from the hth household, from the sth site, from the rth region. The $\beta_{0hsr}$ represent intercepts for each household h from site s and region r. The $\beta_{whsr}$ represent coefficients for each household h from site s from region r for each independent variable. The term $e_{ihsr}$ represents the level 1 random error, which has mean 0 and variance $\sigma_e^2$. The term $z_{hsr}$ represents one independent variable, measured at the household level h at site s from region r. The coefficients $\gamma$ represent fixed regression coefficients; the $u_{0hsr}$ and $u_{whsr}$ are random residual terms at the household level (they have mean 0 and are independent from the $e_{ihsr}$ term) with constant variance. The terms $v_{0sr}$ and $v_{wsr}$ represent random residual terms at the site level, which have mean zero and are independent from $e_{ihsr}$. The terms $f_{0r}$ and $f_{wr}$ represent random residual error terms at the regional level. This model allows the linear combination for each individual to shift from the overall linear combination by an amount of $u_{0hsr} + v_{0sr} + f_{0r}$. We also allow the coefficients of some of the covariates to shift by a random amount.
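The nesting described above can be illustrated by simulating data from a simplified random-intercept version of such a model. The sketch assumes a logit link (the outcome is binary and the results are reported as odds), a single individual-level covariate, and random intercepts only; the residual names `f_r`, `v_sr` and `u_hsr`, the coefficient values and the standard deviations are all hypothetical and are not estimates from this study.

```python
import math
import random


def simulate(n_regions=3, sites_per_region=2, households_per_site=4,
             persons_per_household=2, seed=0):
    """Draw binary outcomes from a 4-level random-intercept logit model."""
    rng = random.Random(seed)
    gamma_0, gamma_1 = -0.5, 0.7                        # hypothetical fixed effects
    sd_region, sd_site, sd_household = 0.3, 0.6, 0.8    # hypothetical SDs
    rows = []
    for r in range(n_regions):
        f_r = rng.gauss(0.0, sd_region)                 # regional residual
        for s in range(sites_per_region):
            v_sr = rng.gauss(0.0, sd_site)              # site residual
            for h in range(households_per_site):
                u_hsr = rng.gauss(0.0, sd_household)    # household residual
                for i in range(persons_per_household):
                    x = rng.gauss(0.0, 1.0)             # individual covariate
                    # Everyone in a household shares u_hsr, everyone in a
                    # site shares v_sr, everyone in a region shares f_r.
                    eta = gamma_0 + gamma_1 * x + f_r + v_sr + u_hsr
                    p = 1.0 / (1.0 + math.exp(-eta))    # inverse logit
                    y = 1 if rng.random() < p else 0
                    rows.append((r, s, h, i, x, y))
    return rows


rows = simulate()
share = sum(row[-1] for row in rows) / len(rows)
print(len(rows), "simulated persons; share using the Internet:", round(share, 2))
```

Because the higher-level residuals are shared, outcomes within the same household, site or region are correlated, which is exactly the dependence a single-level regression ignores.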
Results
A final model was selected based on the lowest value of the Bayesian Deviance Information Criterion (DIC). Results of the Internet use model are presented in Table 2.
Results indicate that, holding all other variables constant, married people have estimated odds of Internet use approximately 63% higher than those of single people, and those who are divorced have estimated odds approximately 20% higher than those of single people. Ability to understand a foreign language, being younger, having Russian nationality, having higher income, living in an urban area and having a higher level of education all positively affect the level of Internet use. Being Russian, compared to other nationalities, increases the estimated odds of Internet use by approximately 46%. The higher the level of education, the higher the level of Internet use – the magnitude of the coefficients increases gradually with each educational category, compared to the level of Internet use by individuals with incomplete secondary education. The level of education has the highest influence on Internet use among all of the predictors used in the model. Holding all other predictors constant, having at least a bachelor's degree or completed professional education multiplies the odds of using the Internet by approximately 9.7 and 2.8 respectively, compared to those who have not completed secondary education.
Several population center level variables are significant in the model. The odds of using the Internet almost double for those who live in an area with an Internet café available. Those who live in an area with steady mobile communications have odds approximately 62% higher than those who do not. One possible explanation of this result is that steady mobile communication in the area coincides with other infrastructure developments, which in turn increase the odds of using the Internet. Living in an urban area, compared to living in a rural area, increases the odds of Internet use by 57%. We used the cost of a trip to Moscow as a proxy for remoteness and, not surprisingly, it is significant. The higher the cost of a trip to Moscow, the lower the odds of using the Internet.
Interestingly, the effect of gender is only significant in interaction with the urban effect. This means that, holding all other variables constant, the positive effect of an urban area on the odds of using the Internet is higher for males than for females. This pattern is visible even outside the model: the rate of Internet use for both men and women in rural and small urban areas is very low, with only approximately 25% of both men and women having used the Internet in the past 12 months, whereas in urban areas the rate is higher for males than for females: 45% of women and 53% of men have used the Internet in the past 12 months.
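The interaction can be read directly off the log-odds scale. The short sketch below combines the Urban and Male*Urban coefficients reported in Table 2 to obtain the urban versus rural odds ratio separately for women and men; it assumes the coefficients are on the logit scale and that no separate main effect for gender remains in the final model, as the text above suggests.

```python
import math

# Log-odds coefficients as reported in Table 2
B_URBAN = 0.448        # urban vs. rural
B_MALE_URBAN = 0.291   # additional urban effect for males

# For women the interaction dummy is 0, so only the urban term applies.
or_urban_female = math.exp(B_URBAN)
# For men both terms apply, so the coefficients add before exponentiating.
or_urban_male = math.exp(B_URBAN + B_MALE_URBAN)
# Male vs. female odds ratio within urban areas.
or_male_in_urban = math.exp(B_MALE_URBAN)

print(f"urban vs. rural odds ratio, women: {or_urban_female:.2f}")
print(f"urban vs. rural odds ratio, men:   {or_urban_male:.2f}")
print(f"male vs. female odds ratio, urban: {or_male_in_urban:.2f}")
```

Under these assumptions, living in an urban area multiplies the odds of Internet use by about 1.57 for women and about 2.09 for men, while in rural areas the model implies no gender difference.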
Table 2 Results*
Variable | Coefficient | St Error | t value | Odds ratio (exp of coefficient)
Fixed effects
Foreign language | 0.738 | 0.063 | 11.71 | 2.09
Age | -0.109 | 0.002 | 54.5 | 0.9
Russian | 0.376 | 0.079 | 4.76 | 1.46
Higher Education (1) | 2.275 | 0.072 | 31.6 | 9.73
Completed Secondary Education (1) | 0.353 | 0.064 | 5.52 | 1.42
Completed Professional education (1) | 1.027 | 0.068 | 15.10 | 2.79
Urban (2) | 0.448 | 0.173 | 2.6 | 1.57
Divorced (3) | 0.184 | 0.074 | 2.5 | 1.20
Married (3) | 0.486 | 0.055 | 8.84 | 1.63
Male*Urban | 0.291 | 0.057 | 5.11 | 1.34
Log income per capita | 0.687 | 0.030 | 22.9 | 1.99
Steady mobile communication | 0.482 | 0.153 | 3.15 | 1.62
Cost of trip to Moscow (000) | -0.50 | 0.017 | 29.4 | 0.61
Internet Café | 0.687 | 0.141 | 4.87 | 1.99
Random effects
Site | 0.357 | 0.068 | 5.25 | 1.43
Intercept | -4.478 | 0.257 | 17.42 | 0.01
1 Reference category – Incomplete Secondary School
2 Reference category – Rural
3 Reference category – Single
*DIC=12327, n=16922, only site random effects were significant
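The derived columns of Table 2 can be checked from the coefficients and standard errors alone. The sketch below assumes that the final column is the exponentiated coefficient (an odds ratio) and that the t value is the absolute ratio of coefficient to standard error; both assumptions reproduce the printed values for the rows shown.

```python
import math

# (coefficient, standard error) pairs copied from Table 2
table2 = {
    "Foreign language": (0.738, 0.063),
    "Married": (0.486, 0.055),
    "Higher Education": (2.275, 0.072),
    "Cost of trip to Moscow (000)": (-0.50, 0.017),
}


def summarize(coef: float, se: float) -> dict:
    """Derive the odds ratio, |t| and percent change in odds for one row."""
    odds_ratio = math.exp(coef)
    return {
        "odds_ratio": odds_ratio,
        "abs_t": abs(coef) / se,
        "pct_change_in_odds": 100.0 * (odds_ratio - 1.0),
    }


for name, (coef, se) in table2.items():
    s = summarize(coef, se)
    print(f"{name:30s} OR={s['odds_ratio']:.2f} |t|={s['abs_t']:.2f} "
          f"change in odds={s['pct_change_in_odds']:+.0f}%")
```

For example, the Married coefficient of 0.486 gives an odds ratio of about 1.63, which is the source of the "approximately 63% higher" statement in the text, and the trip-cost coefficient of -0.50 gives about 0.61, a reduction in the odds of roughly 39% per unit of cost.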
Interestingly, after accounting for all of the above-mentioned factors, regional effects become insignificant, but site effects remain significant, which indicates that location itself affects the odds of Internet use. Figure 1 shows the magnitude of this effect and the ranks of residuals for all the sites included in the dataset.
Figure 1 Site residuals ranking
Site effects holding all predictors constant are listed in Appendix 1, sorted from the highest negative to the highest positive value. The RLMS excludes site names in order to protect the privacy of the respondents, so sites are identified by region names to which they belong and site numbers.
Not surprisingly, Moscow, Moscow Oblast and the city of St. Petersburg are among the regions with the highest positive effect on the level of Internet use, even after accounting for all variables included in the regression model. Three Moscow Oblast sites and three sites from St. Petersburg: Leningrad Oblast: Volosovskij Rajon are in the top 15 sites with the highest positive effect on the level of Internet use. Other sites with the highest positive effects are Vladivostok, Krasnodarskij Kraj: Kushchevskij Rajon, Khanty-Mansiiskij AO: Surgut CR and Krasnojarskij Kraj: Nazarovo CR. Interestingly, one site from the Komi ASSR: Usinsk CR region has one of the highest positive effects and another site from the same region has one of the highest negative effects. This indicates a high level of diversity in Internet use within the region.
Multiple sites with the highest negative effects belong to the same regions: Volgograd Oblast: Rudnjanskij Rajon, Kalinin Oblast: Rzhev CR, Amurskaja Oblast: Arkharinhskij Rajon and Altaiskij Kraj: Biisk CR. This indicates that these regions have a consistently negative effect on the level of Internet use. Some of these sites have some of the highest costs of a trip to Moscow, which is a measure of site remoteness. However, even after accounting for this effect in the model, these sites still have a high negative effect.
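To give the site residuals a more tangible scale, the sketch below exponentiates a few of the values listed in Appendix 1. It assumes the residuals are on the same log-odds scale as the coefficients in Table 2, so that exp(residual) is the factor by which a site multiplies the odds of Internet use relative to an average site for otherwise identical respondents.

```python
import math

# Site residuals copied from Appendix 1
site_residuals = {
    "Lipetskaya Obl: Lipetsk CR, site 72": -1.403,
    "Komi ASSR: Usinsk CR, site 89": -0.764,
    "Komi ASSR: Usinsk CR, site 91": 0.774,
    "Moscow Obl, site 143": 1.177,
}


def odds_multiplier(residual: float) -> float:
    """exp(residual): odds relative to an average site (logit scale assumed)."""
    return math.exp(residual)


for site, resid in site_residuals.items():
    print(f"{site:40s} x{odds_multiplier(resid):.2f}")

# Odds ratio between the two Usinsk sites for otherwise identical respondents:
# a difference of residuals on the log-odds scale becomes a ratio of odds.
usinsk_ratio = odds_multiplier(0.774 - (-0.764))
print(f"Usinsk site 91 vs. site 89: x{usinsk_ratio:.2f}")
```

Read this way, the most favorable site in the appendix roughly triples the odds of Internet use, the least favorable cuts them to about a quarter, and the two Usinsk sites differ by a factor of more than four despite belonging to the same region.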
Discussion
We find similarities and differences between significant predictors of the use of the Internet in Russia and in other countries. Similarly to studies in the US, EU, Australia and Canada, we find an income divide in the rate of use of the Internet. The geographical divide we find in our study is also significant in the US. The level of education is also found to be significant in studies in the EU, Australia and the Netherlands. Gender was found to be insignificant in most studies, and the greatest gender differences were found in rural areas. We find the opposite result in our study: the gender divide is greatest in urban areas, with men more likely to use the Internet than women. However, the effect of gender on the use of mobile phones is reversed: women are more likely than men to use mobile phones.
Our results support some of the findings of previous studies in Russia. We confirm that age, income and the ability to speak a foreign language, identified as important in previous studies, are significant predictors of Internet use. We also find a significant regional divide, identified as important in previous studies. We confirm the existence of a gender divide, but find that the effect is greatest in urban areas. While the gender effect in rural areas is smaller, the rate of Internet use by both genders in rural areas is very low. We find other significant predictors of Internet use not used in previous studies, such as Russian nationality, marital status and population center characteristics, including availability of an Internet Café, steady mobile communication and remoteness. We are able to account for these differences while holding other predictors constant, extending past studies which used descriptive statistics. We are also able to identify and quantify the effects of different population centers on the rate of Internet and mobile phone use.
This study has a number of limitations. The most significant limitation is data availability. The RLMS dataset includes a sample of 38 Russian regions that were selected out of a total of 1,850 regions. This limits our ability to explore the spillover effects of population centers on one another. Another data limitation is the limited availability of data related to technology use by individuals. For instance, we have very limited information on the extent of the use of the Internet by individuals.
Appendix 1: Site effects on Internet use
*Includes 10 sites with the highest positive and highest negative results
Region name | Site # | Site residual | Rank
Lipetskaya Obl: Lipetsk CR | 72 | -1.403 | 1
Kalinin Obl: Rzhev CR | 68 | -0.988 | 2
Volgograd Obl: Rudnjanskij Rajon | 41 | -0.984 | 3
Kalinin Obl: Rzhev CR | 67 | -0.899 | 4
Penzenskaya Obl: Zemetchinskij Rajon | 121 | -0.823 | 5
Amurskaja Obl: Arkharinhskij Rajon | 96 | -0.800 | 6
Kabardino-Balkarija, Zolskij Rajon | 77 | -0.773 | 7
Komi ASSR: Usinsk CR | 89 | -0.764 | 8
Volgograd Obl: Rudnjanskij Rajon | 42 | -0.738 | 9
Amurskaja Obl: Arkharinhskij Rajon | 99 | -0.713 | 10
... | ... | ... | ...
Moscow City | 138 | 0.557 | 138
Khanty-Mansiiskij AO: Surgut CR | 162 | 0.686 | 142
Moscow Obl | 152 | 0.726 | 143
Komi ASSR: Usinsk CR | 91 | 0.774 | 144
St. Petersburg: Leningrad Oblast: Volosovskij Rajon | 8 | 0.813 | 145
Krasnojarskij Kraj: Nazarovo CR | 76 | 0.821 | 146
St. Petersburg | 141 | 0.878 | 147
St. Petersburg: Leningrad Obl: Volosovskij Rajon | 6 | 1.014 | 148
Moscow Obl | 178 | 1.169 | 149
St. Petersburg: Leningrad Obl: Volosovskij Rajon | 4 | 1.169 | 150
Moscow Obl | 143 | 1.177 | 151
References
Beketov, H. B. 2009. The role of the digital diversity and the digital divide in Russian development.
Chakraborty, J., & Bosman, M. M. 2005. Measuring the Digital Divide in the United States: Race, Income, and Personal Computer Ownership. The Professional Geographer, 57(3): 395-410.
Delitsin, L. L. 2006. The problem of digital divide and development potential of the Internet in Russia.
Demunter, C. 2005. The Digital Divide in Europe. Statistics in Focus, 38: 1-7.
Dewan, S., & Riggins, F. 2005. The Digital Divide: Current and Future research Directions. Journal of the Association for Information Systems, 6(12): 298-337.
Fairlie, R. W. 2004. Race and the Digital Divide. Contributions to Economic Analysis & Policy, 3(1): 1-38.
Goldstein, H. 2002. Multilevel Statistical Models: A Hodder Arnold Publication.
Goldstein, H. 2007. Becoming familiar with multilevel modeling. Significance, 4(3): 133-135.
Lihobabin, M. Y. 2006. Gender Determinants of Information Societies.
Martin, S. P. 2003. Is the Digital Divide Really Closing? A Critique of Inequality Measurement in a Nation Online. IT & Society, 1(4): 1-13.
Norris, P. 2001. Digital Divide? Civic Engagement, Information Poverty and the Internet in the Democratic Societies. New York: Cambridge University Press.
Willis, S., & Tranter, B. 2006. Beyond the 'Digital Divide'. Journal of Sociology, 42(1): 43-59.