Is the Bland-Altman plot method useful without inferences for accuracy, precision, and agreement?

Paulo Sergio Panse Silveira; Joaquim Edson Vieira; José de Oliveira Siqueira

doi:10.11606/s1518-8787.2024058005430

Authors

Paulo Sergio Panse Silveira Universidade de São Paulo. Faculdade de Medicina. Departamento de Patologia. São Paulo, SP, Brasil https://orcid.org/0000-0003-4110-1038
Joaquim Edson Vieira Universidade de São Paulo. Faculdade de Medicina. Departamento de Cirurgia. São Paulo, SP, Brasil https://orcid.org/0000-0002-6225-8985
José de Oliveira Siqueira Universidade de São Paulo. Faculdade de Medicina. Departamento de Patologia. São Paulo, SP, Brasil https://orcid.org/0000-0002-3357-8939


Keywords:

Confidence Intervals; Statistical Inference; Data Interpretation, Statistical; Regression Analysis

Abstract

OBJECTIVE: To propose a comprehensive alternative to the Bland-Altman plot method that addresses its limitations and provides a statistical framework for evaluating the equivalence of measurement techniques. The proposal introduces a three-step approach for assessing accuracy, precision, and agreement between techniques, which makes equivalence assessment more objective. An easy-to-use R package was also developed so that researchers can efficiently analyze and interpret the equivalence of techniques.

METHODS: Inferential statistical support for equivalence between measurement techniques was proposed as three nested tests. These were based on structural regressions and assess the equivalence of structural means (accuracy), the equivalence of structural variances (precision), and concordance with the structural bisector line (agreement between measurements obtained from the same subject), using analytical methods and a robust bootstrap approach. To aid understanding, graphical outputs following Bland and Altman’s principles were also implemented.

RESULTS: The performance of the method was demonstrated and tested on five data sets from previously published articles that used Bland and Altman’s method. One case demonstrated strict equivalence, three cases showed partial equivalence, and one showed poor equivalence. The R package, with open code and data, is freely available, together with installation instructions, at Harvard Dataverse at https://doi.org/10.7910/DVN/AGJPZH.

CONCLUSION: Although easy to communicate, the widely cited and applied Bland and Altman plot method is often misinterpreted because it lacks suitable inferential statistical support. Common alternatives, such as Pearson’s correlation or ordinary least-squares linear regression, also fail to locate the weakness of each measurement technique. It may be possible to test whether two techniques are fully equivalent while preserving graphical communication, in accordance with Bland and Altman’s principles, by adding robust and suitable inferential statistics. Decomposing equivalence into three features (accuracy, precision, and agreement) helps to locate the source of the problem when fixing a new technique.
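
For orientation, the two classical building blocks the abstract refers to can be sketched in a few lines of Python: the Bland-Altman bias with its 95% limits of agreement, and a structural (Deming) regression line. This is only an illustrative sketch. It is not the authors' three-step method and not the eirasagree R package. The function names `bland_altman` and `deming_fit`, the default error-variance ratio of 1, and the example readings are all assumptions made here.

```python
import math
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def deming_fit(x, y, delta=1.0):
    """Return (intercept, slope) of a Deming regression of y on x.

    delta is the assumed ratio var(errors in y) / var(errors in x);
    delta = 1 corresponds to orthogonal regression.
    Assumes the sample covariance of x and y is nonzero.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    return y_bar - slope * x_bar, slope


if __name__ == "__main__":
    # Hypothetical paired readings from two techniques on the same subjects.
    technique_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    technique_b = [1.1, 1.9, 3.2, 3.8, 5.0]
    print("bias, lower LoA, upper LoA:", bland_altman(technique_a, technique_b))
    print("intercept, slope:", deming_fit(technique_a, technique_b))
```

If two techniques were equivalent in the sense described above, one would expect a bias near 0 and a fitted line close to the bisector (intercept near 0, slope near 1). The inferential tests that decide whether "near" is near enough are what the paper contributes, and they are not reproduced in this sketch.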

