A novel approach for phenotypic characterization of sleep disorders
Abstract
Obstructive sleep apnea (OSA) is associated with increased morbidity and mortality and afflicts nearly one billion individuals worldwide. Despite its prevalence, accurate and widely available assessment remains an open challenge. The standard approach requires individuals to visit a sleep lab, where a broad set of physiological signals is monitored as they cycle through sleep stages. Clinicians reduce these data to a single number, the frequency of apneas and hypopneas per hour, which is used to grade OSA severity. This apnea-hypopnea index (AHI) is an observer-dependent, time-consuming, and expensive measure that relies on clinical expertise and sophisticated medical equipment. To treat the affected population, there is a pressing need for repeatable approaches to assessing OSA that can be deployed inside and outside the clinic. Modern artificial intelligence (AI) techniques offer an opportunity to reexamine the rich physiological measures captured during sleep lab sessions and to develop a more accurate and usable OSA assessment. We aggregated a large (n=10,000) dataset from a historical clinical polysomnography (PSG) registry, combining electronic health record data with raw high-density time-series recordings of electroencephalography (EEG), electrocardiography (ECG), airflow, oximetry, CO2 monitoring, and thoracoabdominal effort. Aggregation introduced significant challenges attributable to the dataset’s real-world context, including identification of artifacts, historical variability in scoring standards, and the array of diagnoses derived from patient-clinic interactions. We describe the processes implemented for data curation and organization, which aim for a wide age range, an even distribution of sex, and enriched representation of under-represented minorities, and which facilitate quasi-experimental research within this dataset.
Such an approach underscores the dataset’s distinct features and leverages its diversity to derive novel insights into sleep disorders. We used cloud technology to manage and curate these data at scale, enabling the development of advanced time-series foundational models that autonomously annotate events such as desaturations, hypopneas, and apneas and produce informative data embeddings. A comparative analysis with prospective study datasets revealed significant discrepancies, including differences in demographic distributions, comorbidity profiles, longitudinal data continuity, and AHI metrics. These differences emphasize the value of real-world clinical data in providing a comprehensive representation of patient conditions and outcomes, thereby offering deeper insights into the clinical epidemiology of sleep disorders. Our proposed approach to automating PSG analysis employs time-series foundational models for embedding generation, which we anticipate will lead to a more efficient and informative characterization of the pathophysiology of sleep-disordered breathing. Moreover, the breadth of PSG studies offers the field of AI an unparalleled opportunity to craft models that decipher human physiological patterns, with potential applications extending beyond sleep medicine to disciplines such as neurology, pulmonology, psychiatry, and cardiology. We propose to address the multi-dimensional nature of sleep dysfunction by combining commonly available wearables and ambient sensors into a multi-modal OSA assessment that can be broadly fielded and would be transformative for healthcare delivery and value-based care. Contemporary time-series models are generally limited to forecasting, classification, and anomaly detection. We introduce a novel task for these models: predicting one physiological signal from others across varying scales and temporal distances.
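To make the cross-signal prediction task concrete, the following is a minimal sketch under stated assumptions, not the method used in this work. It replaces the time-series model with a plain ridge regression as a simple stand-in, uses purely synthetic signals in place of PSG channels, and every name in it (`make_lagged_features`, `fit_ridge`, `predict`, `run_demo`, the `airflow` and `effort` arrays, the `window` and `delay` parameters) is hypothetical. It illustrates only the shape of the task: a window of input channels is used to estimate a different channel at a fixed temporal distance.

```python
import numpy as np


def make_lagged_features(inputs, window, horizon):
    """Turn a (T, C) array of input channels into flattened sliding windows.

    Row i holds samples i .. i+window-1 of every channel; the matching target
    index lies `horizon` samples after the last sample in that window.
    """
    n_samples, _ = inputs.shape
    n_windows = n_samples - window - horizon + 1
    X = np.stack([inputs[i:i + window].ravel() for i in range(n_windows)])
    target_idx = np.arange(n_windows) + window - 1 + horizon
    return X, target_idx


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an appended bias column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    gram = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(gram, Xb.T @ y)


def predict(X, weights):
    """Apply the fitted linear model (bias column appended as in fit_ridge)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ weights


def run_demo(seed=0, n_samples=2000, delay=5, window=10):
    """Fit on the first 75% of windows and return R^2 on the remainder."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)
    # Synthetic stand-ins for two input channels (assumption: NOT real PSG data).
    airflow = np.sin(2 * np.pi * t / 40) + 0.1 * rng.standard_normal(n_samples)
    effort = np.cos(2 * np.pi * t / 40) + 0.1 * rng.standard_normal(n_samples)
    # Synthetic target channel that lags the inputs by `delay` samples.
    target = np.zeros(n_samples)
    target[delay:] = 0.8 * airflow[:-delay] - 0.3 * effort[:-delay]
    target += 0.05 * rng.standard_normal(n_samples)

    inputs = np.column_stack([airflow, effort])
    X, idx = make_lagged_features(inputs, window, horizon=delay)
    y = target[idx]
    split = int(0.75 * len(y))  # chronological train/test split
    weights = fit_ridge(X[:split], y[:split])
    residual = y[split:] - predict(X[split:], weights)
    ss_tot = np.sum((y[split:] - y[split:].mean()) ** 2)
    return 1.0 - np.sum(residual ** 2) / ss_tot


r2 = run_demo()
print(f"held-out R^2: {r2:.3f}")
```

Varying `window` and `delay` in this sketch corresponds loosely to varying the scale and temporal distance of the prediction. A linear model is used here only because it is easy to verify; it makes no claim about how physiological signals actually relate to one another.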