Research Topics


Theoretical and statistical machine learning

We develop theory, methods and algorithms for machine learning, data science and artificial intelligence. Our current emphasis is on causal inference, deep learning, time series, ensemble learning, reinforcement learning and multi-armed bandits, and recovering missing data.


MedAdvance

Current medical practice is driven by the experience of clinicians, by the difficulties of integrating enormous amounts of complex and heterogeneous static and dynamic data, and by clinical guidelines designed for the "average" patient. MedAdvance aims to transform medical practice by developing novel, specially crafted machine learning theories, methods and systems that extract actionable intelligence from the wide variety of information that is becoming available (in electronic health records and elsewhere) and that enable every aspect of medical care to be personalized to the patient at hand. The intended outputs of MedAdvance include clinical decision support systems for personalized risk assessment, diagnosis, prognosis and treatment; disease atlases, which contain electronic representations of data-driven medical knowledge; and early warning systems. All of these are intended to help clinicians make the best choices for the particular patient at hand. Constructing these outputs will require novel models for learning the non-stationary trajectory of disease progression, new learning architectures for learning from multiple heterogeneous time series, novel conceptions of causal inference for learning individualized treatment effects in the absence of counterfactuals, and novel methods for learning from data that are missing not at random, among others. MedAdvance is intended to improve healthcare for many types of populations and for patients suffering from many diseases (including cancer, cardiovascular disease and cystic fibrosis) and to make medicine more systematic, consistent and effective. MedAdvance also aims to transform the process of medical discovery, from using data to test hypotheses suggested by clinicians and researchers to using data to create and test hypotheses suggested by the data itself, thereby leading to new theories of disease and the discovery of new risk factors and new modes of treatment.


Examples of Current Projects

ForecastICU: Timely Prognosis and Intervention Management for Critically Ill Patients

Timely risk assessment for critical care patients is crucial for early intensive care unit (ICU) admission and efficient therapeutic interventions; more than 200,000 inpatients suffer cardiac arrest each year in the US, and many of these arrests could be prevented by accurate prognosis. In this project, we use electronic health records (EHR) to learn personalized Hidden Markov Models of each patient's physiological behavior over time, and exploit these models to construct a precise risk scoring system that informs clinicians which patients are vulnerable to clinical deterioration, when they should be admitted to the ICU, and what type of intervention they need.
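To make the idea of an HMM-based risk score concrete, here is a minimal sketch of forward filtering in a two-state Hidden Markov Model. It is an illustration under stated assumptions, not the ForecastICU model itself: the two states ("stable" and "deteriorating"), the binary observation ("normal" or "abnormal" vital signs), all parameter values, and the names forward_filter and risk_scores are hypothetical. In practice the parameters would be learned from EHR data and could differ from patient to patient.

```python
from typing import List, Sequence


def forward_filter(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
    observations: Sequence[int],
) -> List[List[float]]:
    """Return the filtered posteriors P(state_t | obs_1..obs_t) for every t."""
    n_states = len(init)
    belief = list(init)
    posteriors: List[List[float]] = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Prediction step: propagate the belief through the transition matrix.
            belief = [
                sum(belief[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        # Correction step: weight by the observation likelihood, then normalize.
        unnorm = [belief[j] * emit[j][obs] for j in range(n_states)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        posteriors.append(belief)
    return posteriors


def risk_scores(init, trans, emit, observations, risky_state=1):
    """Risk score at each time step = posterior probability of the risky state."""
    return [p[risky_state] for p in forward_filter(init, trans, emit, observations)]


# Hypothetical parameters (assumptions for illustration only).
# State 0 = stable, state 1 = deteriorating.
INIT = [0.9, 0.1]
TRANS = [[0.95, 0.05],   # stable -> stable / deteriorating
         [0.10, 0.90]]   # deteriorating -> stable / deteriorating
# Observation 0 = normal vitals, observation 1 = abnormal vitals.
EMIT = [[0.8, 0.2],
        [0.3, 0.7]]

if __name__ == "__main__":
    readings = [0, 1, 1, 1]  # one normal reading followed by three abnormal ones
    for step, score in enumerate(risk_scores(INIT, TRANS, EMIT, readings)):
        print(f"step {step}: risk of deterioration = {score:.2f}")
```

With these assumed parameters, the score rises as abnormal readings accumulate; a clinical alert could then be raised when the score crosses a chosen threshold.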


Hippolyta: An Integrated Tool for Data-driven Breast Cancer Risk Assessment and Screening Management

Early detection of breast cancer is essential for effective subsequent treatment. Current screening guidelines are designed for the "average woman", and hence they suffer from poor accuracy and cost-effectiveness for many subgroups of women. Hippolyta is an integrated tool for data-driven breast cancer risk stratification and screening management. It uses a Hidden Markov Model of the patient's longitudinal breast cancer state trajectory and a Partially Observable Markov Decision Process to learn a customized optimal screening policy for "every woman".
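The core object in a Partially Observable Markov Decision Process is the belief state, which here would be the probability that a woman has an undetected cancer given her screening history. The sketch below shows a belief update for a two-state model together with a simple threshold rule for deciding when to screen. It is an illustration only: the threshold rule is a stand-in for an optimal POMDP policy, not a solution of one, and the function names, onset probability, sensitivity, specificity and threshold are all hypothetical values rather than Hippolyta's.

```python
def predict(belief: float, p_onset: float) -> float:
    """Advance the belief by one period with no observation.

    belief is P(undetected cancer); a healthy woman develops cancer
    during the period with probability p_onset.
    """
    return belief + (1.0 - belief) * p_onset


def update_after_screen(
    belief: float, positive: bool, sensitivity: float, specificity: float
) -> float:
    """Bayes update of P(undetected cancer) after a screening result."""
    if positive:
        like_cancer, like_healthy = sensitivity, 1.0 - specificity
    else:
        like_cancer, like_healthy = 1.0 - sensitivity, specificity
    numerator = belief * like_cancer
    return numerator / (numerator + (1.0 - belief) * like_healthy)


def should_screen(belief: float, threshold: float) -> bool:
    """Threshold rule (an assumed stand-in for an optimal policy)."""
    return belief >= threshold


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    belief, p_onset = 0.002, 0.004
    sensitivity, specificity, threshold = 0.85, 0.90, 0.01
    for year in range(1, 6):
        belief = predict(belief, p_onset)
        if should_screen(belief, threshold):
            # Assume the screen comes back negative in this toy run.
            belief = update_after_screen(belief, False, sensitivity, specificity)
            print(f"year {year}: screened, belief now {belief:.4f}")
        else:
            print(f"year {year}: no screen, belief {belief:.4f}")
```

A personalized policy would let the onset probability and the threshold depend on the individual's risk factors, so that higher-risk women are screened more often than lower-risk women.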


CFCare: Integrated Patient Care and Management for Cystic Fibrosis

Cystic Fibrosis (CF) is an inherited genetic disease that causes different forms of lung dysfunction and can eventually lead to respiratory failure. Despite recent advances in CF therapeutic management, only half of current CF patients are expected to live beyond the age of 40. Because CF patients are highly heterogeneous, different treatment decisions may be suitable for different patients at different times. In this project, we use the UK CF Trust registry data to learn a comprehensive model of a CF patient's trajectory. The model combines genetic data, information on microbiological infections, onsets of complications, and indicators of lung function in a single model, from which clinicians can obtain predictions of a patient's response to different treatments applied at different points in time. Our model is intended to guide individual-level short-term and long-term treatment and surgical decisions, with the hope of learning the optimal treatment plan for every patient.