Zhang M., Pandolfino J.E., Zhou X. et al. Assessing different diagnostic tests for gastroesophageal reflux disease: a systematic review and network meta-analysis / Ther Adv Gastroenterol 2019, Vol.12:1–17.
Assessing different diagnostic tests for gastroesophageal reflux disease: a systematic review and network meta-analysis
Mengyu Zhang1, John E. Pandolfino2, Xuyu Zhou3, Niandi Tan1, Yuwen Li1, Minhu Chen1, Yinglian Xiao1,*
1 Department of Gastroenterology, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China.
Abstract
Background: The aim of the current systematic review and network meta-analysis (NMA) was to assess the diagnostic characteristics of the gastroesophageal reflux disease questionnaire (GERDQ), proton-pump inhibitor (PPI) test, baseline impedance, mucosal impedance, dilated intercellular spaces (DIS), salivary pepsin, esophageal pH/pH impedance monitoring and endoscopy for gastroesophageal reflux disease (GERD).
Methods: We searched PubMed and the Cochrane Controlled Trial Register database (from inception to 10 April 2018) for studies assessing the diagnostic characteristics of the GERDQ, PPI test, baseline impedance, mucosal impedance, DIS, or salivary pepsin and esophageal pH/pH impedance monitoring/endoscopy in patients with GERD. Direct pairwise comparison and an NMA using Bayesian methods under random effects were performed. We also assessed the ranking probability.
Results: A total of 40 studies were identified. The NMA found no significant difference among baseline impedance, mucosal impedance, and esophageal pH/pH impedance monitoring and endoscopy in terms of both sensitivity and specificity. It also demonstrated that salivary pepsin detected by the Peptest device had comparable specificity to esophageal pH/pH impedance monitoring and endoscopy. Ranking probability indicated that esophageal pH/pH impedance monitoring and endoscopy had the highest sensitivity and specificity, followed by mucosal impedance and baseline impedance, whereas GERDQ had the lowest sensitivity and the PPI test had the lowest specificity.
Conclusions: In a systematic review and NMA of studies of patients with GERD, we found that baseline impedance and mucosal impedance have relatively high diagnostic performance, similar to esophageal pH/pH impedance monitoring and endoscopy.
Keywords: baseline impedance, dilated intercellular space, GERDQ, mucosal impedance, network meta-analysis, proton-pump inhibitor test, salivary pepsin
* Correspondence to: Yinglian Xiao, Department of Gastroenterology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China, xyingl@mail.sysu.edu.cn
Introduction
Gastroesophageal reflux disease (GERD) is one of the most common healthcare issues, with an estimated worldwide prevalence of up to 33%,1,2 resulting in a heavy economic burden of approximately $13.0 billion/year to the healthcare system in the USA alone, driven largely by diagnostic testing and overuse of proton-pump inhibitors (PPIs).3 Despite advances in diagnostic tests for GERD, the lack of a ‘gold standard’ has made identifying patients with GERD one of the biggest dilemmas in clinical practice.
Although GERD is generally diagnosed empirically based on typical reflux symptoms (heartburn and regurgitation),4,5 the sensitivity and specificity of symptom-based diagnosis are limited because of the complex symptom spectrum of GERD.6 Patients with suspected GERD symptoms are often first tested for a response to PPI therapy, which leads to unnecessary overuse of PPIs because the test has a high placebo effect and low specificity.7 So far, upper endoscopy and esophageal pH/pH impedance testing are usually performed to detect GERD complications and to document the presence of reflux for an objective GERD diagnosis.8 To develop a better understanding of the pathophysiology and improve GERD diagnosis, several new diagnostic tests, such as baseline impedance, esophageal mucosal impedance, salivary pepsin, and histopathology, have been developed in recent years.9
To the best of our knowledge, little has been published comparing the diagnostic performance of the individual tests. Therefore, we performed a systematic review and network meta-analysis (NMA) to assess the diagnostic characteristics of the GERDQ, PPI test, baseline impedance, mucosal impedance, dilated intercellular spaces (DIS), salivary pepsin, esophageal pH/pH impedance monitoring and endoscopy for GERD.
An electronic and manual search of PubMed and the Cochrane Controlled Trial Register database for relevant articles from inception to April 2018 was performed by two authors independently. Combinations of keywords and free-text terms were used as search terms for the different diagnostic tests for GERD: GERD questionnaires; omeprazole, lansoprazole, pantoprazole, rabeprazole, esomeprazole, and PPI test; baseline impedance; mucosal impedance; dilated intercellular spaces; pepsin; esophageal pH/pH impedance monitoring; and endoscopy. Additional search terms were: GERD or GORD, expanded to diagnosis, screening, reproducibility of results, sensitivity and specificity, false-negative reactions, false-positive reactions, predictive value, accuracy, and likelihood ratio.10
The searches were limited to English- or Chinese-language studies performed in adults (age > 18 years). Only data accessible in peer-reviewed journals were included to minimize potential sources of bias and inaccuracy.11
Inclusion criteria and exclusion criteria
Studies were screened for inclusion, and final decisions on exclusion were made, by two authors independently. Studies assessing the diagnostic characteristics of the GERDQ, PPI test, baseline impedance, mucosal impedance, DIS, salivary pepsin, esophageal pH/pH impedance monitoring, or endoscopy in adults with presumptive GERD were eligible for inclusion. Studies were excluded if they focused only on children (age < 18 years), on patients with specific diseases (such as cardiovascular disease), or on patients who had undergone operations, or if they focused exclusively on patients with extraesophageal GERD symptoms (such as asthma or laryngitis).
Data from the included studies were extracted by two authors independently. Information extracted included patient characteristics, study design, setting, gold standard and diagnostic modalities, and definitions of outcomes. Numbers were extracted directly from the tables or derived from percentages if only the total number of patients was available. Discrepancies were resolved by discussion until consensus was achieved for all data.
Quality assessment of studies
The quality of all included studies was assessed by researchers, according to the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool.12 The QUADAS-2 tool comprises four key domains: patient selection, index test, reference standard, and flow and timing (the flow of patients through the study and the timing of the index tests and reference standard). Review Manager 5 (RevMan 5.2.3, Cochrane Collaboration, Oxford, UK) statistical computing software was used to carry out quality assessment and investigation of publication bias.
The positive standard of diagnostic tests for GERD
The following standards for GERD were used in the current study, all of which were based on commonly accepted measures (Table 1).
Upper endoscopy or esophageal pH/pH impedance monitoring. Esophageal mucosal breaks on upper endoscopy suggest the presence of GERD. GERD was diagnosed in studies when patients had esophagitis of any grade in one of the commonly used classification systems, such as the Los Angeles or Hetzel–Dent grading systems.
Ambulatory esophageal pH/pH impedance monitoring is generally considered to provide the most objective evidence for pathologic reflux. The results were considered abnormal based on criteria defined in the individual studies. Whenever possible, we chose definitions that would reasonably be interpreted as abnormal in clinical practice, including: (a) acid exposure time (AET) ⩾ 3.2%–5.5% of the monitoring time; (b) DeMeester score ⩾ 14; (c) positive symptom reflux association, such as symptom associated probability (SAP) ⩾ 95% or symptom index (SI) ⩾ 50%.
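The abnormality criteria (a) to (c) above can be written as a small rule, shown here only as a minimal sketch. The record type `RefluxMonitoring` and the function `is_abnormal` are hypothetical names introduced for illustration, and treating any single criterion as sufficient is an assumption: the included studies each applied their own definitions. Because the AET cut-off ranged from 3.2% to 5.5% across studies, the caller must supply it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefluxMonitoring:
    """Summary of one ambulatory pH or pH-impedance study (hypothetical record)."""
    aet_percent: Optional[float] = None  # acid exposure time, % of monitoring time
    demeester: Optional[float] = None    # DeMeester score
    sap_percent: Optional[float] = None  # symptom association probability, %
    si_percent: Optional[float] = None   # symptom index, %


def is_abnormal(m: RefluxMonitoring, aet_cutoff: float) -> bool:
    """Return True if any listed criterion is met (assumption: one is enough).

    aet_cutoff is study-specific; the text reports values from 3.2 to 5.5 (%).
    Missing measurements (None) are never counted as positive.
    """
    checks = [
        m.aet_percent is not None and m.aet_percent >= aet_cutoff,  # criterion (a)
        m.demeester is not None and m.demeester >= 14,              # criterion (b)
        m.sap_percent is not None and m.sap_percent >= 95,          # criterion (c), SAP
        m.si_percent is not None and m.si_percent >= 50,            # criterion (c), SI
    ]
    return any(checks)


# The same recording can be classified differently under different study cut-offs.
study = RefluxMonitoring(aet_percent=4.0)
print(is_abnormal(study, aet_cutoff=3.2), is_abnormal(study, aet_cutoff=5.5))
```

The example call illustrates why varying thresholds add clinical heterogeneity: an AET of 4.0% is abnormal under the lowest cut-off in the range and normal under the highest.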
Baseline impedance. Baseline impedance was assessed on the pH/impedance system; cut-off values ranging from 2100 Ω to 2292 Ω were established and used in individual studies.
Salivary pepsin. Salivary pepsin was detected and quantitatively/semiquantitatively measured by non-invasive rapid salivary pepsin lateral flow device (LFD) (Peptest, RDBiomed, Hull, UK). The cut-off value was used based on criteria defined in individual studies.
Dilated intercellular space. The quantitative or semiquantitative measurement of DIS under light microscopy was performed, and the cut-off value was used based on criteria defined in individual studies.
GERDQ. GERDQ was considered positive if the score was >8.
Proton-pump inhibitor test. The definition of ‘a positive PPI test’ was based on criteria defined in individual studies. Whenever possible, we chose definitions that would reasonably be interpreted as representing success in clinical practice; ‘complete relief of heartburn’ was the most commonly adopted criterion.
Mucosal impedance. The cut-off value of mucosal impedance was also established and used in the individual studies.
Firstly, traditional pairwise meta-analyses were performed to compare different diagnostic modalities using the Stata version 12.0 software (StataCorp, College Station, TX, USA). The pooled estimates of odds ratios (ORs) and 95% confidence intervals (CIs) for sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio (LR−), and diagnostic odds ratio of GERD were calculated if at least four studies without a threshold effect were included. The area under the receiver operating characteristic curve (AUROC) was also calculated. The closer the AUROC is to 1, the greater the clinical value: an AUROC between 0.5 and 0.7 indicates lower clinical value, and an AUROC > 0.7 indicates good clinical value. Heterogeneity among studies was tested using the I² and Chi-square tests.53 Secondly, the evidence network structure was drawn using the R version 3.5.1 statistical computing software and network package. Each node represents a diagnostic test, with the node size reflecting the number of patients, and the thickness of lines between nodes indicating the number of included studies. Thirdly, Bayesian network meta-analyses were performed to combine the effect sizes of direct and indirect comparisons.
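For readers unfamiliar with the accuracy measures named above, the following is a minimal sketch of how they follow from a single 2×2 table, together with the usual I² statistic derived from Cochran's Q. It is not the Stata procedure used in this study; the function names and the example counts are hypothetical, and no continuity correction is applied, so a table with a zero cell will raise a division error.

```python
import math


def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Per-study diagnostic accuracy measures from a 2x2 table.

    tp/fn: reference-positive patients with a positive/negative index test.
    fp/tn: reference-negative patients with a positive/negative index test.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio, LR+
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio, LR-
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": lr_pos / lr_neg,                # diagnostic odds ratio
    }


def i_squared(effects: list, variances: list) -> float:
    """I² (%) from Cochran's Q using inverse-variance (fixed-effect) weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical counts, for illustration only (not data from any included study).
example = accuracy_measures(tp=80, fp=30, fn=20, tn=70)
print({name: round(value, 3) for name, value in example.items()})
```

The diagnostic odds ratio computed this way equals (TP × TN) / (FP × FN), which is a quick consistency check on any extracted table.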
Convergence and lack of autocorrelation were checked and confirmed using four chains and a 20,000-simulation burn-in phase; finally, direct probability statements were derived from an additional 50,000-simulation phase.54 The consistency between direct and indirect evidence was assessed with the node-splitting method, and the consistency or inconsistency model was selected accordingly.55 The ranking probability was then used to calculate, using a Bayesian approach, the probability of each diagnostic test being the most effective diagnostic method, and bar charts of the ranking probability were also produced; the larger the value, the better the rank of the diagnostic test.56,57 Comparison-adjusted funnel plots were drawn to detect small-study effects.56,58 R (version 3.5.1) package GeMTC was used for this network meta-analysis.
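The idea behind a ranking probability can be sketched as follows: for each posterior simulation, the tests are ordered by their sampled value, and the proportion of simulations in which a test occupies each rank is tabulated. This is an illustrative sketch only and does not reproduce the GeMTC computation; the function `rank_probabilities` and the four toy draws are hypothetical, with made-up values that are not results of this study.

```python
from collections import Counter


def rank_probabilities(draws: list, higher_is_better: bool = True) -> dict:
    """Proportion of posterior draws in which each test holds each rank.

    draws: one dict per simulation, mapping test name -> sampled value
           (for example, a sampled sensitivity).
    Returns: test name -> list of probabilities for rank 1, 2, ..., K.
    """
    names = list(draws[0])
    counts = {name: Counter() for name in names}
    for draw in draws:
        ordered = sorted(names, key=lambda n: draw[n], reverse=higher_is_better)
        for rank, name in enumerate(ordered, start=1):
            counts[name][rank] += 1
    total = len(draws)
    return {
        name: [counts[name][rank] / total for rank in range(1, len(names) + 1)]
        for name in names
    }


# Toy posterior draws (made-up values, for illustration only).
toy_draws = [
    {"pH/endoscopy": 0.90, "baseline impedance": 0.85, "GERDQ": 0.60},
    {"pH/endoscopy": 0.88, "baseline impedance": 0.91, "GERDQ": 0.65},
    {"pH/endoscopy": 0.92, "baseline impedance": 0.80, "GERDQ": 0.55},
    {"pH/endoscopy": 0.87, "baseline impedance": 0.84, "GERDQ": 0.70},
]
for test_name, probs in rank_probabilities(toy_draws).items():
    print(test_name, probs)
```

In a real analysis the draws would come from the post-burn-in simulation phase, so each probability rests on tens of thousands of orderings rather than four.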
Results
Study search flow
A total of 6223 potentially relevant studies were initially retrieved and identified, of which 705 duplicate studies were excluded. Of the remaining 5518 citations, 5450 were ruled out during the first screen by abstract review. A total of 68 studies were then evaluated for eligibility by full-text review. After full-text review, studies unrelated to diagnostic tests for GERD (n = 23), studies not in English/Chinese language (n = 2), and studies not eligible for enrollment (n = 3) were excluded. Altogether, 40 published studies meeting the predetermined inclusion criteria were identified (Figure 1).13–52
Figure 1. The PRISMA study search flow.
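The counts reported in the study search flow can be checked with simple arithmetic. The short sketch below uses only the numbers stated in the text; the variable names are introduced here for illustration.

```python
# Counts taken from the study search flow described in the text.
retrieved = 6223
duplicates = 705
excluded_on_abstract = 5450
excluded_on_full_text = {
    "unrelated to diagnostic tests for GERD": 23,
    "not in English/Chinese language": 2,
    "not eligible for enrollment": 3,
}

after_deduplication = retrieved - duplicates                  # citations screened by abstract
full_text_reviewed = after_deduplication - excluded_on_abstract
included = full_text_reviewed - sum(excluded_on_full_text.values())

print(after_deduplication, full_text_reviewed, included)
```

The three intermediate totals agree with the 5518 citations, 68 full-text reviews, and 40 included studies reported above.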
Study characteristics and qualities
Of the 40 included studies, esophageal impedance was evaluated for GERD diagnosis in 4 studies, salivary pepsin by Peptest in 2 studies, DIS in 5 studies, GERDQ in 6 studies, and the PPI test in 23 studies, each compared with esophageal pH/pH impedance monitoring or endoscopy. One study compared the diagnostic accuracy of GERDQ and the PPI test. The clinical information of the included studies is shown in Table 1. The diagnostic characteristics of the tests for GERD varied across studies (Table 2). The evaluation of the risk of bias and applicability concerns using QUADAS-2 is shown in Figure 2 and Supplementary Figure 1.
Pairwise meta-analysis for diagnostic tests for GERD
A direct pairwise meta-analysis of the diagnostic performance of six different tests for GERD diagnosis was conducted. The results revealed that baseline impedance, GERDQ and the PPI test exhibited lower sensitivity and specificity when compared with esophageal pH/pH impedance monitoring or endoscopy. We also calculated the AUROC for each diagnostic test and found that the AUROCs of esophageal impedance and the PPI test were higher than 0.70, indicating that they had relatively high diagnostic value (Supplementary Table 1). The pairwise meta-analysis of DIS could not be performed because of a threshold effect. The pairwise meta-analyses of salivary pepsin and mucosal impedance could not be performed either, because only two studies were included.
Evidence network of diagnostic tests for GERD
The evidence network structure included seven diagnostic tests. The highest number of evaluable patients underwent esophageal pH/pH impedance monitoring or endoscopy, and most studies compared the PPI test with esophageal pH/pH impedance monitoring or endoscopy for GERD diagnosis (Figure 3). The direct comparisons of different tests with esophageal pH/pH impedance monitoring or endoscopy had similar effects on the entire network meta-analysis (Supplementary Figure 2).
Figure 2. The evaluation of risks of bias of included studies.
Main results of network meta-analysis of diagnostic tests for GERD
The NMA found no significant difference among the baseline impedance, mucosal impedance, and esophageal pH/pH impedance monitoring or endoscopy in terms of both sensitivity and specificity. It was also demonstrated that the salivary pepsin detected by Peptest had comparable specificity with esophageal pH/pH impedance monitoring or endoscopy (Figure 4).
Ranking probability of diagnostic tests for GERD
Ranking probability indicated that esophageal pH/pH impedance monitoring and/or endoscopy had the highest sensitivity, followed by baseline impedance or mucosal impedance, PPI test, salivary pepsin or DIS, and GERDQ [Figure 5(a)]. Moreover, esophageal pH/pH impedance monitoring and/or endoscopy also had the highest specificity, followed by mucosal impedance, baseline impedance, DIS or salivary pepsin, GERDQ, and PPI test [Figure 5(b)]. The ranking probabilities of positive-predictive value and negative-predictive value are provided in Figure 5(c) and 5(d).
Assessment of publication bias
The assessment of publication bias demonstrated a symmetrical distribution, indicating no small-study effect or publication bias in this NMA (Supplementary Figure 3).
Figure 3. The evidence network structure.
Figure 4. The forest plots based on sensitivity, specificity, positive-predictive value and negative-predictive value of different diagnostic tests for GERD.
Figure 5. The ranking probability based on sensitivity, specificity, positive-predictive value and negative-predictive value of different diagnostic tests for GERD.
A: esophageal pH/pH impedance monitoring and/or endoscopy; B: baseline impedance; C: salivary pepsin; D: DIS; E: GERDQ; F: PPI test; G: mucosal impedance.
DIS, dilated intercellular spaces; GERD, gastroesophageal reflux disease; GERDQ, GERD questionnaire; PPI, proton-pump inhibitor.
Discussion
We present the first systematic review and NMA comparing the diagnostic performance of GERDQ, PPI test, baseline impedance, mucosal impedance, DIS, salivary pepsin and esophageal pH/pH impedance monitoring/endoscopy for GERD. The NMA and ranking probabilities reveal that mucosal impedance and baseline impedance had sensitivity and specificity comparable to those of esophageal pH/pH impedance monitoring or endoscopy. GERDQ had the lowest sensitivity, and the PPI test had the lowest specificity.
The results of direct pairwise comparison and NMA show that esophageal reflux monitoring or endoscopy is superior to other diagnostic tests for GERD, which is in accordance with the recommendation in current guidelines.59 In the current meta-analysis, the criteria for abnormal reflux varied across the included studies, which may add clinical heterogeneity to our analysis to some extent. However, this heterogeneity could not be avoided, since the understanding of GERD pathophysiology and the diagnostic criteria for GERD have developed and changed during the past several decades. For instance, the latest Lyon Consensus lists conclusive evidence, including Los Angeles grade C and D erosive esophagitis from upper endoscopy or AET > 6% from pH/pH impedance monitoring, for the definitive diagnosis of GERD.60 However, most previous studies diagnosed GERD based on lower AET thresholds and esophageal mucosal breaks, regardless of grade. Further studies are needed to compare different grades of esophagitis and different criteria for reflux for the diagnosis and management of GERD.
We also found that esophageal mucosal impedance and baseline impedance had diagnostic performance comparable to that of esophageal reflux monitoring and endoscopy. Baseline impedance reflects the integrity of the esophageal mucosa,61 with low values observed in patients with GERD and associated with increased acid reflux as well as DIS.62,63 It is reported that mucosal impedance can help differentiate GERD from eosinophilic esophagitis (EoE), achalasia, and healthy controls with higher specificity (95%) than reflux testing (64%).50 Given their high diagnostic performance, esophageal mucosal impedance and baseline impedance may become promising diagnostic tools for GERD in the future. However, normative values for them still need to be determined. Also, mucosal impedance is not yet commercially available or widely used despite its high diagnostic utility.
Additionally, salivary pepsin detection was initially proposed as a non-invasive method for the diagnosis of GERD. Our results demonstrated that salivary pepsin detected by the Peptest device had comparable specificity to esophageal pH/pH impedance monitoring or endoscopy. Nevertheless, Sifrim and coworkers failed to reproduce good specificity of salivary pepsin for diagnosing GERD, and they found that salivary pepsin could not differentiate GERD from functional heartburn.64 Therefore, salivary pepsin detection cannot be recommended for clinical application at present. In addition, our results showed that the measurement of DIS had only modest diagnostic characteristics. It has been reported that esophageal mucosal changes such as DIS may help differentiate GERD from other disorders.65,66 Even so, the measurement of DIS on electron microscopy is not yet ready for clinical practice.
There are limitations to these findings. First, esophageal pH/pH impedance monitoring and endoscopy were examined together in the current NMA and, combined, were found to have the highest sensitivity and specificity. However, upper endoscopy alone has low sensitivity for GERD, so the results should be interpreted with caution. Second, we analyzed only the GERDQ, although several other questionnaires exist for GERD diagnosis. They all have different scoring systems, and pooling them may lead to high heterogeneity in the NMA, so we had to choose a single questionnaire. The GERDQ is derived from validated questionnaires, including the Reflux Disease Questionnaire, Gastrointestinal Symptom Rating Scale, and the GERD Impact Scale, and it is the most commonly used questionnaire for GERD. Also, we did not include the postreflux swallow-induced peristaltic wave (PSPW) index in the current NMA because studies evaluating the diagnostic value of the PSPW index are limited at present; future NMA studies can include it, since it is also a promising metric for GERD diagnosis. Third, rankings and probabilities can sometimes be misleading because they ignore the basic principles of evaluating the certainty of evidence in NMA. The results of the NMA should therefore be analyzed with caution, taking the results of pairwise comparison into account. Furthermore, there was high heterogeneity in the study populations, including varying baseline characteristics, testing procedures, and outcome measures, which may make the results incomparable to an extent. Finally, the results of pairwise comparison and NMA were associated with wide confidence intervals, and some included studies were of moderate-to-low quality.
In conclusion, the current systematic review and NMA show that esophageal mucosal impedance and baseline impedance have high diagnostic performance, similar to esophageal reflux monitoring or endoscopy. The future direction of GERD management should be to develop and improve techniques with high diagnostic performance, not only for a precise phenotype definition but also for a tailored treatment strategy.
Mengyu Zhang: data acquisition and analysis, manuscript drafting; John E Pandolfino: critical revision of the manuscript; Xuyu Zhou: statistical analysis; Niandi Tan: data acquisition and analysis; Yuwen Li: data acquisition and analysis; Minhu Chen and Yinglian Xiao: study design, data analysis, study supervision, and finalizing the manuscript.
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was supported by grants from the National Natural Science Foundation of China (81770544).
Conflict of interest statement
The authors declare that there is no conflict of interest.
Mengyu Zhang iD https://orcid.org/0000-0002-2838-1103
Minhu Chen iD https://orcid.org/0000-0001-9925-135X
Supplemental material for this article is available online.