Is the Bland-Altman plot method useful without inferences for accuracy, precision, and agreement?
DOI: https://doi.org/10.11606/s1518-8787.2024058005430
Keywords: Confidence Intervals, Statistical Inference, Data Interpretation, Statistical, Regression Analysis
Abstract
OBJECTIVE: This study aims to propose a comprehensive alternative to the Bland-Altman plot method that addresses its limitations and provides a statistical framework for evaluating the equivalence of measurement techniques. It introduces an innovative three-step approach for assessing accuracy, precision, and agreement between techniques, which makes equivalence assessment more objective. In addition, an easy-to-use R package was developed so that researchers can analyze and interpret technique equivalence efficiently.

METHODS: Inferential statistical support for equivalence between measurement techniques was proposed as three nested tests. These tests were based on structural regressions and assessed the equivalence of structural means (accuracy), the equivalence of structural variances (precision), and concordance with the structural bisector line (agreement between measurements obtained from the same subject), using both analytical methods and a robust bootstrap approach. To aid understanding, graphical outputs following Bland and Altman's principles were also implemented.

RESULTS: The method's performance was demonstrated on, and compared against, five data sets from previously published articles that had used Bland and Altman's method. One case showed strict equivalence, three showed partial equivalence, and one showed poor equivalence. The R package, with open code and data, is freely available with installation instructions at Harvard Dataverse (https://doi.org/10.7910/DVN/AGJPZH).

CONCLUSION: Although easy to communicate, the widely cited and applied Bland and Altman plot method is often misinterpreted because it lacks suitable inferential statistical support. Common alternatives, such as Pearson's correlation or ordinary least-squares linear regression, also fail to locate the weakness of each measurement technique.
It may be possible to test whether two techniques are fully equivalent while preserving graphical communication in accordance with Bland and Altman's principles and adding robust, suitable inferential statistics. Decomposing equivalence into three features (accuracy, precision, and agreement) helps locate the source of a problem when adjusting a new technique.
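The three features named above can be illustrated with textbook stand-ins. The following is a minimal Python sketch (standard library only) that computes the classic Bland-Altman bias and 95% limits of agreement, then obtains percentile bootstrap confidence intervals, resampling subjects, for three statistics: the mean difference (accuracy), the log ratio of sample variances (precision), and the intercept and slope of a Deming regression with an assumed error-variance ratio of 1, compared against the bisector line with intercept 0 and slope 1 (agreement). The function names, the simulated data, and the choice of these statistics are assumptions for illustration; this is not the eirasagree package's API, and it uses observed sample statistics rather than the structural means and variances the authors describe.

```python
import math
import random
import statistics


def bland_altman(x, y, z=1.96):
    """Bland-Altman bias (mean of y - x) and limits of agreement bias +/- z*SD."""
    d = [yi - xi for xi, yi in zip(x, y)]
    bias = statistics.mean(d)
    sd = statistics.stdev(d)
    return bias, bias - z * sd, bias + z * sd


def deming(x, y, lam=1.0):
    """Deming (structural) regression y = a + b*x, returning (a, b).

    lam is the ASSUMED ratio var(error in y) / var(error in x);
    lam = 1 gives orthogonal regression.
    """
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ZeroDivisionError("slope is undefined when the covariance is zero")
    u = syy - lam * sxx
    b = (u + math.sqrt(u * u + 4 * lam * sxy * sxy)) / (2 * sxy)
    return my - b * mx, b


def features(x, y, lam=1.0):
    """Hypothetical stand-in statistics for accuracy, precision and agreement."""
    a, b = deming(x, y, lam)
    return {
        # accuracy: 0 when the two means coincide
        "mean_diff": statistics.mean(y) - statistics.mean(x),
        # precision: 0 when the two sample variances are equal
        "log_var_ratio": math.log(statistics.variance(y) / statistics.variance(x)),
        # agreement: the bisector line has intercept 0 and slope 1
        "intercept": a,
        "slope": b,
    }


def percentile(sorted_vals, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a sorted list."""
    k = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def bootstrap_ci(x, y, lam=1.0, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CIs, resampling subjects (x, y pairs) with replacement."""
    rng = random.Random(seed)
    n = len(x)
    draws = {}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            f = features([x[i] for i in idx], [y[i] for i in idx], lam)
        except (ZeroDivisionError, ValueError):
            continue  # degenerate resample (e.g. zero variance): skip it
        for key, value in f.items():
            draws.setdefault(key, []).append(value)
    ci = {}
    for key, values in draws.items():
        values.sort()
        ci[key] = (percentile(values, alpha / 2), percentile(values, 1 - alpha / 2))
    return ci


if __name__ == "__main__":
    # Simulated (hypothetical) data: technique B reads 2 units higher than A.
    rng = random.Random(42)
    truth = [rng.uniform(50, 150) for _ in range(40)]
    a_meas = [t + rng.gauss(0, 3) for t in truth]
    b_meas = [t + 2 + rng.gauss(0, 3) for t in truth]
    print("bias, lower LoA, upper LoA:", bland_altman(a_meas, b_meas))
    for name, (lo, hi) in bootstrap_ci(a_meas, b_meas).items():
        print(f"{name}: 95% CI [{lo:.3f}, {hi:.3f}]")
```

In this simplified reading, an interval that covers its reference value (0 for the mean difference, log variance ratio, and intercept; 1 for the slope) indicates only that no departure was detected for that feature; it does not by itself establish equivalence.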
License
Copyright (c) 2024 Paulo Sergio Panse Silveira, Joaquim Edson Vieira, José de Oliveira Siqueira
This work is licensed under a Creative Commons Attribution 4.0 International License.