Methylome-wide comparison of human genomic DNA extracted from whole blood and from EBV transformed lymphocyte cell lines
Karolina Åberg (PhD)a*; Amit N. Khachane (PhD)a; Gábor Rudolf (PhD)a; Srilaxmi Nerella (MS)a; Douglas A. Fugman (PhD)b; Jay A. Tischfield (PhD)b; Edwin J.C.G. van den Oord (PhD)a
a Center for Biomarker Research and Personalized Medicine, School of Pharmacy, Virginia Commonwealth University, PO Box 980533, Richmond, VA 23298, USA; b Department of Genetics, Rutgers University, 145 Bevier Road, Piscataway, NJ 08854, USA.
*Correspondence to: Karolina Åberg, Center for Biomarker Research and Personalized Medicine, School of Pharmacy, Virginia Commonwealth University, 1112 East Clay Street, P.O. Box 980533, Richmond, VA 23298.
Lymphocyte cell lines (LCLs) were established by separating lymphocytes from whole blood (WB) by centrifugation on a Nycoprep density gradient (Axis-Shield, Oslo, Norway) and transforming them with Epstein-Barr virus (EBV) isolated from the B95-8 cell line (in-house preparation), according to standard operating procedures in place at the Rutgers University Cell and DNA Repository (RUCDR). Briefly, the lymphocyte layer from the Nycoprep gradient was washed, re-suspended in culture medium with 25% fetal calf serum (FCS), RPMI-1640, and 0.1% phytohemagglutinin (PHA), and incubated with the EBV at 37°C and 5% CO2 in a humidified incubator. Cultures were examined twice weekly and supplemented with 15% FCS/RPMI-1640 medium as needed. After 5-6 weeks, when the transformed LCL cultures exhibited the desired density of healthy aggregates, they were transferred to larger flasks for expansion, followed by DNA extraction and cryopreservation. For all samples included in this investigation, DNA was extracted from the LCLs at the same passage at which they were cryopreserved. DNA was also extracted directly from aliquots of the same WB samples that were used to create the LCLs.
DNA was extracted from LCL cultures or WB using AutoPURE LS automated DNA extractors (Qiagen) with standard PureGene extraction reagents [1,2]. This is an inorganic, salt-precipitation (i.e., phenol-free) method that eliminates the hazards of phenol exposure and alleviates environmental concerns. All buffers and reagents are standardized and meet Qiagen's strict quality control procedures. The quality of the DNA from WB and from LCLs was verified by restriction enzyme digestion and agarose gel electrophoresis, PCR, and UV spectroscopy according to RUCDR standard operating procedures. RUCDR maintains secure, state-of-the-art facilities where each operation is computerized to minimize sample mislabeling and cross-contamination.
Probes that have low variation between the two samples from the same individual but high variation between samples from different individuals will have a high probe correlation, indicating a variable methylation site. A low probe correlation, which arises when the variation between the two samples from the same individual is high, is likely to indicate a methodological issue, such as a failing probe or an empty probe (a probe located in a genomic region without any methylation sites). To identify the variable methylation sites, we used a previously developed procedure [3]. In short, the array signal y_{ijk} for biosample i on probe j and replicate number k can be written as:
y_{ijk} = m_j + a_{ij} + e_{ijk} (Equation 1)
where m_j is the average signal at probe j, a_{ij} the biosample-specific deviation at probe j, and e_{ijk} the measurement error for biosample i on probe j for replicate k. In this study, we obtained two replicates (k = 1, 2) and calculated, for a given probe j, the Pearson (product-moment) correlation between the two replicates using the data from all biosamples. This correlation is labeled the “probe correlation”. It can be shown that the probe correlation r_j for probe j across the two replicates equals:
r_j = VAR(A)_j / (VAR(A)_j + VAR(E)_j) (Equation 2)
where VAR(A)_j and VAR(E)_j are the variances of the methylation signals and of the measurement error at probe j, respectively. This probe correlation is an index of the signal-to-error ratio, as it equals the biological variation in methylation signals across biosamples divided by the total variance.
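For readers who want the intermediate steps, the variance ratio can be derived from Equation 1 in a few lines. The sketch below rests on assumptions that are not spelled out in the text above: the errors e_{ij1} and e_{ij2} are taken to be uncorrelated with each other and with a_{ij}, and to have the same variance VAR(E)_j in both replicates. Because m_j is constant for a given probe, it does not contribute to any variance or covariance.

```latex
\begin{align}
\mathrm{COV}(y_{ij1}, y_{ij2})
  &= \mathrm{COV}(a_{ij} + e_{ij1},\; a_{ij} + e_{ij2})
   = \mathrm{VAR}(A)_j \\
\mathrm{VAR}(y_{ijk})
  &= \mathrm{VAR}(A)_j + \mathrm{VAR}(E)_j, \qquad k = 1, 2 \\
\mathrm{COR}(y_{ij1}, y_{ij2})
  &= \frac{\mathrm{COV}(y_{ij1}, y_{ij2})}
          {\sqrt{\mathrm{VAR}(y_{ij1})\,\mathrm{VAR}(y_{ij2})}}
   = \frac{\mathrm{VAR}(A)_j}{\mathrm{VAR}(A)_j + \mathrm{VAR}(E)_j}
\end{align}
```

As a worked example under these assumptions, a probe with VAR(A)_j = 4 and VAR(E)_j = 1 would have an expected probe correlation of 4/5 = 0.8, whereas a probe with VAR(A)_j = 0 (for instance an empty probe) would have an expected probe correlation of 0.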
The sample correlation for a given biosample i equals the correlation between the two replicates calculated across the data from all probes. Using assumptions similar to those underlying Equation 2, the sample correlation r_i for biosample i measured on two occasions equals:
r_i = VAR(M)_i / (VAR(M)_i + VAR(E)_i) (Equation 3)
where VAR(M)_i is the variance in methylation signals across all probes for biosample i and VAR(E)_i is the variance in the measurement error across all probes for biosample i. If measurement error is large relative to the differences in methylation status among probes, we would expect low sample correlations in addition to low probe correlations.
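The behavior of the sample correlation can be illustrated with a small simulation. The Python sketch below is illustrative only and is not code from this study: the function names (`pearson`, `simulate`, `sample_correlation`), the use of normal distributions, and the standard deviations are all assumptions chosen for the example. It draws two replicates per biosample under Equation 1 and compares each observed sample correlation with the ratio of signal variance to total variance across probes.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n_biosamples, n_probes, sd_m, sd_a, sd_e, seed=0):
    """Return data[i][j] = (replicate 1, replicate 2) following Equation 1.

    sd_m, sd_a and sd_e are hypothetical standard deviations of the probe
    means m_j, the biosample deviations a_ij and the errors e_ijk.
    """
    rng = random.Random(seed)
    probe_means = [rng.gauss(0.0, sd_m) for _ in range(n_probes)]
    data = []
    for _ in range(n_biosamples):
        row = []
        for j in range(n_probes):
            signal = probe_means[j] + rng.gauss(0.0, sd_a)
            # Both replicates share the signal but have independent errors.
            row.append((signal + rng.gauss(0.0, sd_e),
                        signal + rng.gauss(0.0, sd_e)))
        data.append(row)
    return data


def sample_correlation(row):
    """Correlation between the two replicates of one biosample, across probes."""
    rep1 = [pair[0] for pair in row]
    rep2 = [pair[1] for pair in row]
    return pearson(rep1, rep2)


sd_m, sd_a, sd_e = 3.0, 1.0, 2.0
data = simulate(n_biosamples=5, n_probes=5000, sd_m=sd_m, sd_a=sd_a, sd_e=sd_e)
sample_cors = [sample_correlation(row) for row in data]

# Across probes, the signal variance for one biosample is sd_m^2 + sd_a^2.
var_signal = sd_m ** 2 + sd_a ** 2
expected = var_signal / (var_signal + sd_e ** 2)

for i, r in enumerate(sample_cors):
    print(f"biosample {i}: observed {r:.3f}, expected about {expected:.3f}")
```

Increasing `sd_e` in this sketch lowers every sample correlation, which mirrors the expectation that large measurement error relative to between-probe differences produces low sample correlations.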
Inter-correlation between adjacent probes
To investigate the methylation pattern, we combined highly inter-correlated adjacent probes into blocks. Differences in block structures indicate differences in the methylation pattern between WB DNA and LCL DNA. To create these blocks, we used a two-step algorithm. Starting with the first two probes at the p-telomere of each chromosome, we first calculated the inter-correlation between adjacent probes and kept adding probes to that “block” until the average inter-correlation dropped below a threshold of 0.5. The idea is that the methylation signal will span a larger chromosomal region but that altered methylation patterns may cause the inter-correlation to drop below our threshold, thereby producing multiple blocks. As poor probes (i.e., probes with a large measurement error) will also “break up” methylation patterns, we used a second step. In this second step, we calculated the average inter-correlation between probes in adjacent blocks. If the adjacent blocks were no further apart than 500 bp and their average inter-correlation was higher than our threshold of 0.5, we combined them again into a single block. The R script for block construction and the blocks created in this investigation are available through the authors' web site: http://www.people.vcu.edu/~kaaberg/
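The two-step procedure can be sketched in Python as follows. This is not the authors' R script, and several details are assumptions made for the illustration: the "average inter-correlation" of a block is taken to be the mean of all pairwise probe correlations within it, a block is taken to need at least two probes (so isolated poor probes are set aside before the second step), and the distance between two blocks is measured from the last probe of one to the first probe of the next. The function names, the toy correlation matrix, and the probe positions are hypothetical.

```python
from itertools import combinations, product

THRESHOLD = 0.5    # correlation threshold from the text
MAX_GAP_BP = 500   # maximum distance between blocks from the text


def mean(values):
    return sum(values) / len(values)


def build_blocks(cor, threshold=THRESHOLD):
    """Step 1: grow blocks of consecutive probes along the chromosome.

    cor[a][b] is the correlation between probes a and b, with probes
    indexed in chromosomal order. A probe joins the current block while
    the mean pairwise correlation of the enlarged block stays at or
    above the threshold; otherwise it starts a new block.
    """
    blocks, current = [], [0]
    for p in range(1, len(cor)):
        candidate = current + [p]
        avg = mean([cor[a][b] for a, b in combinations(candidate, 2)])
        if avg >= threshold:
            current = candidate
        else:
            blocks.append(current)
            current = [p]
    blocks.append(current)
    return blocks


def merge_blocks(blocks, cor, positions,
                 threshold=THRESHOLD, max_gap=MAX_GAP_BP):
    """Step 2: re-join neighbouring blocks that a poor probe split apart."""
    # Assumption: single-probe "blocks" are treated as poor probes.
    multi = [b for b in blocks if len(b) >= 2]
    if not multi:
        return []
    merged = [multi[0]]
    for block in multi[1:]:
        prev = merged[-1]
        gap = positions[block[0]] - positions[prev[-1]]
        avg = mean([cor[a][b] for a, b in product(prev, block)])
        if gap <= max_gap and avg > threshold:
            merged[-1] = prev + block
        else:
            merged.append(block)
    return merged


# Toy example: nine probes, of which probes 3 and 6 are "poor"
# (uncorrelated with everything); all other pairs correlate at 0.8.
poor = {3, 6}
n = 9
cor = [[1.0 if a == b else (0.0 if a in poor or b in poor else 0.8)
        for b in range(n)] for a in range(n)]
positions = [1000, 1100, 1200, 1300, 1400, 1500, 1950, 2400, 2500]

step1 = build_blocks(cor)
step2 = merge_blocks(step1, cor, positions)
print(step1)
print(step2)
```

In this toy example the poor probe 3 splits probes 0-2 from probes 4-5 in the first step, and the second step joins them again because they are 200 bp apart and well correlated. Probes 7-8 stay in a separate block because they lie 900 bp from probe 5, even though their correlation with the first block is above the threshold. In practice the correlation matrix would be computed from the array data rather than written by hand.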