Publications
A machine learning approach to predict self-protecting behaviors during the early wave of the COVID-19 pandemic
(joint with Liyousew G. Borga, Samuel Greiff, Claus Vögele, Conchita D’Ambrosio)
In Scientific Reports - Nature, 2023
Using a unique harmonised real‐time data set from the COME-HERE longitudinal survey that covers five European countries (France, Germany, Italy, Spain, and Sweden) and applying a non-parametric machine learning model, we identify the main individual and macro-level predictors of self-protecting behaviours against the coronavirus disease 2019 (COVID-19) during the first wave of the pandemic. Exploiting the interpretability of a random forest algorithm via Shapley values, we find that a higher regional incidence of COVID-19 triggers higher levels of self-protective behaviour, as does a stricter government policy response. The level of individual knowledge about the pandemic, confidence in institutions, and population density also rank high among the factors that predict self-protecting behaviours. We also identify a steep socioeconomic gradient, with lower levels of self-protecting behaviours being associated with lower income and poor housing conditions. Among socio-demographic factors, gender, marital status, age, and region of residence are the main determinants of self-protective measures.
Working papers
Vulnerability to Poverty: An Explainable Machine Learning Approach
(joint with C. D’Ambrosio)
Working paper, 2023
Building on the definition of vulnerability as expected poverty, we train supervised machine-learning algorithms and a baseline OLS model on data from the German Socio-Economic Panel (version 37) for the years 1984-2020 under two scenarios: 1) considering only cross-sectional data; 2) using information over time on the relative position of the household in the income distribution. In the first scenario, Random Forest (RF), Gradient Boosted Trees (GBT), and Neural Networks (NN) identify the vulnerable group with average sensitivity gains over OLS of 20%, 15.3%, and 12.3%, respectively. The hit rate and the overall accuracy of all vulnerability estimates increase in the second scenario, but the sensitivity gains shrink to 15.6%, 11.5%, and 6.6%, respectively. With Shapley values from the RF model, we explain the sources of vulnerability and their evolution. We find that weak ties to the labour market, single-person households, the number of dependants in the family, living in East Germany, and the sociodemographic characteristics of the household head are associated with vulnerability to poverty.
Predicting Material and Social Deprivations in EU with ML
Working paper, 2023
Using the European Union Statistics on Income and Living Conditions (EU-SILC) microdata and applying machine learning (ML) algorithms, I explore three questions: 1) How accurately can one classify unseen individuals’ deprivation status given their observable personal, household, and country-specific factors? 2) How well do targeted subsets of features, such as sociodemographic, socioeconomic, health, and location, identify the deprived? 3) What are the key predictors and their partial effects? The empirical analysis shows that tree-based ML algorithms yield a positive and significant accuracy gain relative to the standard generalised linear model (a 7.3% relative gain with extreme gradient boosted trees and 5.9% with random forests). Socioeconomic factors alone yield a classification accuracy close to that obtained with the whole set of features. Feature importance and partial effect analyses based on Shapley values reveal insightful relationships consistent with theoretical and empirical evidence.
Theses
Essays on the Prediction and Measurement of Individual Well-being
PhD Dissertation, University of Luxembourg (2023)
Supervisor: Prof. Dr. Conchita D’Ambrosio