Understanding Severe Asthma Through Small and Big Data in Spanish Hospitals: The PAGE Study
Melero Moreno C1,2, Almonacid Sánchez C3, Bañas Conejero D4, Quirce S5,6, Álvarez Gutiérrez FJ7, Cardona V8, Sánchez-Herrero MG4, Soriano JB6,9,10,11, on behalf of the PAGE Study Group*
1Hospital Universitario La Princesa, Madrid, Spain
2Hospital Universitario 12 de Octubre, Madrid, Spain
3Hospital Universitario de Toledo, Toledo, Spain
4Specialty Care Medical Department, GlaxoSmithKline, Madrid, Spain
5Hospital Universitario La Paz, IdiPAZ, Madrid, Spain
6CIBER of Respiratory Diseases (CIBERES), Madrid, Spain
7Hospital Universitario Virgen del Rocío, Sevilla, Spain
8Hospital Universitario Vall d’Hebron, Barcelona, Spain
9Hospital Universitario de La Princesa, Madrid, Spain
10Facultad de Medicina, Universidad Autónoma de Madrid, Madrid, Spain
11Instituto de Salud Carlos III (ISCIII), Madrid, Spain
*See Online Appendix for a full list of collaborators.
J Investig Allergol Clin Immunol 2023; Vol 33(5)
Background: Data on the prevalence of severe asthma (SA) are limited. Electronic health records (EHRs) offer a unique research opportunity to test machine learning (ML) tools in epidemiological studies. Our aim was to estimate the prevalence of SA among asthma patients seen in hospital asthma units, using both ML-based and traditional research methodologies. Our secondary objective was to describe patients with nonsevere asthma (NSA) and SA over a follow-up of 12 months.
Methods: PAGE is a multicenter, controlled, observational study conducted in 36 Spanish hospitals and split into 2 phases: a cross-sectional phase for estimation of the prevalence of SA and a prospective phase (3 visits in 12 months) for the follow-up and characterization of SA and NSA patients. A substudy with ML was performed in 6 hospitals. Our ML tool uses EHRead technology, which extracts clinical concepts from EHRs and standardizes them to SNOMED CT.
Results: The prevalence of SA among asthma patients in Spanish hospitals was 20.1% using traditional methods, compared with 9.7% using the ML tool. The distribution of SA phenotypes and the characteristics of the patients followed up were consistent with those reported in previous studies. Clinical predictions of the patients' course were unreliable, and ML identified only 2 predictive models with discriminatory power to predict outcomes.
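Both prevalence figures are proportions of SA among all asthma patients (SA plus NSA) identified in the participating hospitals. The abstract does not report the underlying counts, so the symbols below are illustrative notation rather than the authors' own. As a minimal sketch, assuming each method classifies every identified asthma patient as either SA or NSA:

```latex
\[
\hat{P}_{\mathrm{SA}} = \frac{n_{\mathrm{SA}}}{n_{\mathrm{SA}} + n_{\mathrm{NSA}}} \times 100\%,
\qquad
\hat{P}_{\mathrm{SA}}^{\mathrm{traditional}} = 20.1\%,
\qquad
\hat{P}_{\mathrm{SA}}^{\mathrm{ML}} = 9.7\%,
\qquad
\frac{9.7}{20.1} \approx 0.48
\]
```

Under this definition, the ML-based estimate is roughly half of the estimate obtained with traditional methods.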
Conclusion: This study is the first to estimate the prevalence of SA among asthma patients seen in hospital asthma units and to predict patient outcomes using both standard and ML-based research techniques. Our findings offer relevant insights for further epidemiological and clinical research in SA.
Key words: Severe asthma, Prevalence, Big data, Machine learning, Natural language processing, Predictive models