An assessment of nucleic acid amplification testing for active mycobacterial infection





Figure 9 Summary of the process used to identify and select studies for the impact of early versus delayed treatment for TB



Figure 10 Summary of the process used to identify and select studies for the impact of inappropriate antibiotic use


Data extraction and analysis


A profile of key characteristics was developed for each included study (see Appendix F). Each study profile provides the level of evidence, design and quality of the study, authors, publication year, location, criteria for including/excluding patients, study population characteristics, type of intervention, comparator intervention and/or reference standard (where relevant), and outcomes assessed. Studies that could not be retrieved or that met the inclusion criteria but contained insufficient or inadequate data for inclusion are listed in Appendix G. Definitions of all technical terms and abbreviations are provided in the Glossary (page 28). Descriptive statistics were extracted or calculated for all safety and effectiveness outcomes in the individual studies.

Assessing diagnostic accuracy


To assess the diagnostic accuracy of NAAT, studies were only included if they provided data that could be extracted into a classic 2×2 table (Table 11), in which the results of the index diagnostic test or the comparator were cross-classified against the results of the reference standard (Armitage, Berry & Matthews 2002; Deeks 2001), to which Bayes’ theorem was then applied:

Table 11 Diagnostic accuracy data extraction for NAAT

| | | Reference standard (culture ± DST) | | |
|---|---|---|---|---|
| | | Disease + | Disease – | Total |
| Index test (NAAT) or comparator (AFB) | Test + | true positive | false positive | Total test positive |
| | Test – | false negative | true negative | Total test negative |
| | Total | Total with MTB or NTM | Total without MTB or NTM | |

AFB = acid-fast bacilli; DST = drug susceptibility testing; MTB = Mycobacterium tuberculosis; NAAT = nucleic acid amplification testing; NTM = non-tuberculous mycobacteria
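
To make the cross-classification concrete, the following is a minimal Python sketch that tallies paired index-test and reference-standard results into the four cells of Table 11; the function and specimen data are hypothetical, for illustration only.

```python
# A minimal sketch (hypothetical data) of cross-classifying index-test
# results against the reference standard to fill the cells of Table 11.

def two_by_two(index_results, reference_results):
    """Tally TP, FP, FN, TN from paired booleans (True = positive result)."""
    tp = fp = fn = tn = 0
    for test_pos, disease_pos in zip(index_results, reference_results):
        if test_pos and disease_pos:
            tp += 1
        elif test_pos:
            fp += 1
        elif disease_pos:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical NAAT results for 10 specimens versus culture:
naat    = [True, True, False, True, False, False, True, False, True, False]
culture = [True, True, True, False, False, False, True, False, True, False]
print(two_by_two(naat, culture))  # -> (4, 1, 1, 4)
```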
Primary measures

Test sensitivity was calculated as the proportion of people with MTB or NTM infections (as determined by the reference standard) who had a positive test result using AFB and/or NAAT:

Sensitivity (true positive rate) = number with true positive result / total with MTB or NTM infections

Test specificity was calculated as the proportion of people without infection (as determined by the reference standard) who had a negative test result using AFB and/or NAAT:

Specificity (true negative rate) = number with true negative result / total without MTB or NTM infections

The 95%CI was calculated by exact binomial methods.
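
As a worked illustration of these definitions, the sketch below (assuming Python with scipy available; all counts hypothetical) computes sensitivity, specificity and exact Clopper-Pearson 95%CIs from 2×2 counts.

```python
# A minimal sketch (hypothetical counts; assumes scipy is installed) of the
# sensitivity and specificity calculations with exact Clopper-Pearson 95% CIs.
from scipy.stats import beta

def exact_ci(successes, n, alpha=0.05):
    """Exact (Clopper-Pearson) binomial confidence interval."""
    lower = 0.0 if successes == 0 else beta.ppf(alpha / 2, successes, n - successes + 1)
    upper = 1.0 if successes == n else beta.ppf(1 - alpha / 2, successes + 1, n - successes)
    return lower, upper

tp, fp, fn, tn = 90, 5, 10, 95  # hypothetical 2x2 counts

sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate

lo, hi = exact_ci(tp, tp + fn)
print(f"Sensitivity = {sensitivity:.3f} (95% CI {lo:.3f}-{hi:.3f})")
lo, hi = exact_ci(tn, tn + fp)
print(f"Specificity = {specificity:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```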

Positive and negative likelihood ratios (LR+ and LR–) were also reported. These ratios compare the probability of a given test result in patients with MTB or NTM infections with the probability of that result in patients without.

LR+ = sensitivity / (1 – specificity)

LR– = (1 – sensitivity) / specificity

An LR of 1 means that the test does not provide any useful diagnostic information, whereas LR+ > 5 and LR– < 0.2 can suggest strong diagnostic ability (MSAC 2005).
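
A short worked example of the likelihood-ratio formulas, using hypothetical sensitivity and specificity values:

```python
# A worked example of the likelihood-ratio formulas above, with
# hypothetical sensitivity and specificity values.
sensitivity, specificity = 0.90, 0.95

lr_pos = sensitivity / (1 - specificity)  # probability ratio for a positive result
lr_neg = (1 - sensitivity) / specificity  # probability ratio for a negative result

print(f"LR+ = {lr_pos:.1f}")   # 18.0: > 5, suggests strong rule-in ability
print(f"LR- = {lr_neg:.2f}")   # 0.11: < 0.2, suggests strong rule-out ability
```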

Summary measures

Diagnostic test accuracy meta-analysis was undertaken to assess the accuracy of NAAT compared with AFB microscopy in the diagnosis of MTB or NTM infections, with culture as the reference standard, using Stata version 12 (Stata Corporation 2013). Only studies that provided raw (2×2) data were included. Summary receiver operating characteristic (SROC) curves, forest plots and LR scattergrams were generated using the ‘midas’ command in Stata, which requires a minimum of four studies for analysis and calculates summary operating sensitivity and specificity (with confidence and prediction contours in SROC space). Heterogeneity was calculated using the formula I² = 100% × (Q – df)/Q, where Q is Cochran’s heterogeneity statistic and df is the degrees of freedom (Higgins et al. 2003). Summary estimates for sensitivity, specificity, LR+ and LR– were also calculated. Confidence intervals were computed assuming asymptotic normality after a log transformation for variance parameters and for LR+ and LR–.
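
The analyses reported here were run in Stata; purely to illustrate the heterogeneity formula quoted above, the following Python sketch computes Cochran’s Q and I² from hypothetical study-level estimates and variances under an inverse-variance fixed-effect model.

```python
# Illustration only: the report's meta-analysis used Stata's 'midas' command.
# This sketch shows the quoted heterogeneity formula, I2 = 100% x (Q - df)/Q,
# applied to hypothetical study-level estimates (e.g. log-transformed LRs)
# and variances under an inverse-variance fixed-effect model.
estimates = [2.0, 3.5, 2.2, 3.8]       # hypothetical per-study estimates
variances = [0.05, 0.08, 0.06, 0.10]   # hypothetical per-study variances

weights = [1 / v for v in variances]   # inverse-variance weights
pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

# Cochran's Q: weighted squared deviations from the pooled estimate
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
df = len(estimates) - 1
i_squared = max(0.0, 100 * (q - df) / q)  # truncated at 0%

print(f"Q = {q:.2f}, df = {df}, I2 = {i_squared:.1f}%")  # Q ~ 34.0, I2 ~ 91%
```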

Subgroup analyses were performed for results according to specimen type, incidence of TB in the study population and the presence of an HIV infection.

Where meta-analysis could not be performed, the median (range) sensitivity and specificity values were calculated.

Appraisal of the evidence


Appraisal of the evidence was conducted in three stages:

Stage 1: Appraisal of the applicability and quality of individual studies included in the review (strength of the evidence).

Stage 2: Appraisal of the precision, size of effect and clinical importance of the results for primary outcomes in individual studies—used to determine the safety and effectiveness of the intervention.

Stage 3: Integration of this evidence for conclusions about the net clinical benefit of the intervention in the context of Australian clinical practice.


Stage 1: strength of the evidence


The evidence presented in the selected studies was assessed and classified using the dimensions of evidence defined by the National Health and Medical Research Council (NHMRC 2000).

These dimensions (Table 12) consider important aspects of the evidence supporting a particular intervention and include three main domains: strength of the evidence, size of the effect and relevance of the evidence. The first domain is derived directly from the literature identified as informing a particular intervention; the last two require expert clinical input as part of their determination.

Table 12 Evidence dimensions

| Type of evidence | Definition |
|---|---|
| Strength of the evidence: level | The study design used, as an indicator of the degree to which bias has been eliminated by design a |
| Strength of the evidence: quality | The methods used by investigators to minimise bias within a study design |
| Strength of the evidence: statistical precision | The p-value or, alternatively, the precision of the estimate of the effect; reflects the degree of certainty about the existence of a true effect |
| Size of effect | The distance of the study estimate from the ‘null’ value and the inclusion of only clinically important effects in the confidence interval |
| Relevance of evidence | The usefulness of the evidence in clinical practice, particularly the appropriateness of the outcome measures used |

a See Table 13

The three sub-domains (level, quality and statistical precision) are collectively a measure of the strength of the evidence.

The ‘level of evidence’ reflects how effectively a study design can answer a particular research question, based on the probability that the design of the study has reduced or eliminated the impact of bias on the results. The NHMRC evidence hierarchy provides a ranking of various study designs (‘levels of evidence’) by the type of research question being addressed (Table 13).

Table 13 Designations of levels of evidence according to type of research question (including table notes)

| Level | Intervention a | Diagnostic accuracy b |
|---|---|---|
| I c | A systematic review of level II studies | A systematic review of level II studies |
| II | A randomised controlled trial | A study of test accuracy with an independent, blinded comparison with a valid reference standard d, among consecutive persons with a defined clinical presentation e |
| III-1 | A pseudo-randomised controlled trial (i.e. alternate allocation or some other method) | A study of test accuracy with an independent, blinded comparison with a valid reference standard d, among non-consecutive persons with a defined clinical presentation e |
| III-2 | A comparative study with concurrent controls: non-randomised experimental trial f; cohort study; case-control study; interrupted time series with a control group | A comparison with reference standard that does not meet the criteria required for level II and III-1 evidence |
| III-3 | A comparative study without concurrent controls: historical control study; two or more single-arm studies g; interrupted time series without a parallel control group | Diagnostic case-control study e |
| IV | Case series with either post-test or pre-test/post-test outcomes | Study of diagnostic yield (no reference standard) h |
Source: Merlin, Weston & Tooher (2009)

Explanatory notes:



a Definitions of these study designs are provided on pages 7-8 in ‘How to use the evidence: assessment and application of scientific evidence’ (NHMRC 2000) and in the accompanying Glossary.

b These levels of evidence apply only to studies assessing the accuracy of diagnostic or screening tests. To assess the overall effectiveness of a diagnostic test there also needs to be a consideration of the impact of the test on patient management and health outcomes (MSAC 2005; Sackett & Haynes 2002). The evidence hierarchy given in the ‘Intervention’ column should be used when assessing the impact of a diagnostic test on health outcomes relative to an existing method of diagnosis/comparator test(s). The evidence hierarchy given in the ‘Screening’ column should be used when assessing the impact of a screening test on health outcomes relative to no screening or alternative screening methods.

c A systematic review will only be assigned a level of evidence as high as the studies it contains, excepting where those studies are of level II evidence. Systematic reviews of level II evidence provide more data than the individual studies and any meta-analyses will increase the precision of the overall results, reducing the likelihood that the results are affected by chance. Systematic reviews of lower level evidence present results of likely poor internal validity, and thus are rated on the likelihood that the results have been affected by bias rather than whether the systematic review itself is of good quality. Systematic review quality should be assessed separately. A systematic review should consist of at least two studies. In systematic reviews that include different study designs, the overall level of evidence should relate to each individual outcome/result, as different studies and study designs might contribute to each different outcome.

d The validity of the reference standard should be determined in the context of the disease under review. Criteria for determining the validity of the reference standard should be pre-specified. This can include the choice of the reference standard(s) and its timing in relation to the index test. The validity of the reference standard can be determined through quality appraisal of the study (Whiting et al. 2003).

e Well-designed population based case-control studies (e.g. screening studies where test accuracy is assessed on all cases, with a random sample of controls) do capture a population with a representative spectrum of disease and thus fulfil the requirements for a valid assembly of patients. However, in some cases the population assembled is not representative of the use of the test in practice. In diagnostic case-control studies a selected sample of patients already known to have the disease is compared with a separate group of normal/healthy people known to be free of the disease. In this situation patients with borderline or mild expressions of the disease, and conditions mimicking the disease are excluded, which can lead to exaggeration of both sensitivity and specificity. This is called spectrum bias or spectrum effect because the spectrum of study participants will not be representative of patients seen in practice (Mulherin & Miller 2002).

f This also includes controlled before-and-after (pre-test/post-test) studies, as well as adjusted indirect comparisons (i.e. use A vs B and B vs C, to determine A vs C with statistical adjustment for B).

g Comparing single-arm studies, i.e. case series from two separate studies. This would also include unadjusted indirect comparisons (i.e. use A vs B and B vs C, to determine A vs C but where there is no statistical adjustment for B).

h Studies of diagnostic yield provide the yield of diagnosed patients, as determined by an index test, without confirmation of the accuracy of this diagnosis by a reference standard. These may be the only alternative when there is no reliable reference standard.

Note A: Assessment of comparative harms/safety should occur according to the hierarchy presented for each of the research questions, with the proviso that this assessment occurs within the context of the topic being assessed. Some harms (and other outcomes) are rare and cannot feasibly be captured within RCTs, in which case lower levels of evidence may be the only type of evidence that is practically achievable; both physical and psychological harms may need to be addressed by different study designs; harms from diagnostic testing include the likelihood of false positive and false negative results; harms from screening include the likelihood of false alarm and false reassurance results.

Note B: When a level of evidence is attributed in the text of a document, it should also be framed according to its corresponding research question, e.g. level II intervention evidence; level IV diagnostic evidence; level III-2 prognostic evidence.

Note C: Each individual study that is attributed a ‘level of evidence’ should be rigorously appraised using validated or commonly used checklists or appraisal tools to ensure that factors other than study design have not affected the validity of the results.

Source: Hierarchies adapted and modified from: Bandolier editorial (1999); NHMRC (1999); Phillips et al. (2001).

Individual studies assessing diagnostic effectiveness were graded according to pre-specified quality and applicability criteria (MSAC 2005), as shown in Table 14.

Table 14 Grading system used to rank included studies

| Validity criteria | Description | Grading system |
|---|---|---|
| Appropriate comparison | Did the study evaluate a direct comparison of the index test strategy versus the comparator test strategy? | C1 direct comparison; CX other comparison |
| Applicable population | Did the study evaluate the index test in a population that is representative of the subject characteristics (age and sex) and clinical setting (disease prevalence, disease severity, referral filter and sequence of tests) for the clinical indication of interest? | P1 applicable; P2 limited; P3 different population |
| Quality of study | Was the study designed to avoid bias? High quality = no potential for bias based on pre-defined key quality criteria; medium quality = some potential for bias in areas other than those pre-specified as key criteria; poor quality = poor reference standard and/or potential for bias based on key pre-specified criteria | Q1 high quality; Q2 medium quality; Q3 poor reference standard, poor quality or insufficient information |
Intervention studies pertaining to treatment safety and effectiveness (trials and cohort studies) were appraised using the Downs and Black (1998) checklist. Studies of diagnostic accuracy were assessed using the QUADAS-2 quality assessment tool (Whiting et al. 2011), whereas SRs included in the last step of the linked analysis were assessed with the PRISMA checklist (Liberati et al. 2009).

Stage 2: precision, size of effect and clinical importance


Statistical precision was assessed using standard statistical principles: narrow CIs and small p-values indicate a higher probability that the reported effect is real and not attributable to chance (NHMRC 2000). Studies need to be appropriately powered to ensure that a real difference between groups will be detected in the statistical analysis.

For intervention studies it was important to assess whether statistically significant differences between patients receiving the intervention and those receiving the comparator were also clinically important. The size of the effect needed to be determined, as well as whether the 95%CI included only clinically important effects.

The outcomes being measured in this report were assessed as to whether they were appropriate and clinically relevant (NHMRC 2000).

Stage 3: assessment of the body of evidence


Appraisal of the body of evidence was conducted along the lines suggested by the NHMRC in their guidance on clinical practice guideline development (NHMRC 2009). The five components considered essential by the NHMRC when judging the body of evidence are the:

  • evidence-base—which includes the number of studies sorted by their methodological quality and relevance to patients

  • consistency of the study results—whether the better quality studies had results of a similar magnitude and in the same direction, i.e. homogeneous or heterogeneous findings

  • potential clinical impact—appraisal of the precision, size and clinical importance or relevance of the primary outcomes used to determine the safety and effectiveness of the test

  • generalisability of the evidence to the target population

  • applicability of the evidence—integration of this evidence for conclusions about the net clinical benefit of the intervention in the context of Australian clinical practice.

A matrix for assessing the body of evidence for each research question, according to the components above, was used for this assessment (Table 15).

Table 15 Body of evidence matrix

| Component | A Excellent | B Good | C Satisfactory | D Poor |
|---|---|---|---|---|
| Evidence-base a | One or more level I studies with a low risk of bias or several level II studies with a low risk of bias | One or two level II studies with a low risk of bias, or an SR or several level III studies with a low risk of bias | One or two level III studies with a low risk of bias, or level I or II studies with a moderate risk of bias | Level IV studies, or level I to III studies/SRs with a high risk of bias |
| Consistency b | All studies consistent | Most studies consistent and inconsistency may be explained | Some inconsistency reflecting genuine uncertainty around clinical question | Evidence is inconsistent |
| Clinical impact | Very large | Substantial | Moderate | Slight or restricted |
| Generalisability | Population(s) studied in body of evidence are the same as target population | Population(s) studied in the body of evidence are similar to target population | Population(s) studied in body of evidence differ to target population for guideline but it is clinically sensible to apply this evidence to target population c | Population(s) studied in body of evidence differ from target population and it is hard to judge whether it is sensible to generalise to target population |
| Applicability | Directly applicable to Australian healthcare context | Applicable to Australian healthcare context with few caveats | Probably applicable to Australian healthcare context with some caveats | Not applicable to Australian healthcare context |
SR = systematic review; several = more than two studies

a Level of evidence determined from the NHMRC evidence hierarchy (see Table 13).

b If there is only one study, rank this component as ‘not applicable’.

c For example, results in adults that are clinically sensible to apply to children OR psychosocial outcomes for one cancer that may be applicable to patients with another cancer.

Source: Adapted from NHMRC (2009)



