Young Data Science Researcher Seminar Zurich

Please subscribe here if you would like to be notified about these events via e-mail. You can also subscribe to the iCal/ics calendar.

Spring Semester 2023

Title: Representation Learning: A Causal Perspective
Speaker, Affiliation: Yixin Wang, University of Michigan
Date, Time: 9 March 2023, 16:00-17:00
Location: Zoom Call
Abstract: Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data such as images and texts. Ideally, such a representation should efficiently capture non-spurious features of the data. It should also be disentangled, so that we can interpret which feature each of its dimensions captures. However, these desiderata are often intuitively defined and challenging to quantify or enforce. In this talk, we take a causal perspective on representation learning. We show how the desiderata of representation learning can be formalized using counterfactual notions, enabling metrics and algorithms that target efficient, non-spurious, and disentangled representations of data. We discuss the theoretical underpinnings of the algorithm and illustrate its empirical performance in both supervised and unsupervised representation learning. This is joint work with Michael Jordan: https://arxiv.org/abs/2109.03795
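
For readers unfamiliar with the counterfactual notions the abstract refers to: the linked paper builds on Pearl-style probabilities of causation. As one reference point (a standard definition, not necessarily the paper's exact criterion), the probability of necessity and sufficiency of a binary feature X for a binary outcome Y is

\mathrm{PNS} \;=\; P\big( Y_{X=1} = 1,\; Y_{X=0} = 0 \big),

i.e. the probability that the outcome responds to the feature in both directions. Roughly, non-spurious and efficient representations are those whose dimensions score highly on such counterfactual quantities, rather than on mere correlation.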

Title: MS New Researchers Group, Young Data Science Researcher Seminar Zürich, and the YoungStatS Project: Distribution generalization and causal inference
Speaker, Affiliation: Sara Magliacane, University of Amsterdam and MIT-IBM Watson AI Lab; Zijian Guo, Rutgers University
Date, Time: 20 March 2023, 15:00-17:00
Location: I
Abstract:

Zijian Guo (Rutgers University)
Title: Statistical Inference for Maximin Effects: Identifying Stable Associations Across Multiple Studies
Abstract: Integrative analysis of data from multiple sources is critical to making generalizable discoveries. Associations that are consistently observed across multiple source populations are more likely to generalize to target populations with possible distributional shifts. In this paper, we model the heterogeneous multi-source data with multiple high-dimensional regressions and make inferences for the maximin effect (Meinshausen and Bühlmann, Annals of Statistics, 43(4), 1801-1830). The maximin effect provides a measure of stable associations across multi-source data. A significant maximin effect indicates that a variable has commonly shared effects across multiple source populations, and these shared effects may generalize to a broader set of target populations. Inference for the maximin effect is challenging because its point estimator can have a non-standard limiting distribution. We devise a novel sampling method to construct valid confidence intervals for maximin effects. The proposed confidence interval attains a parametric length. This sampling procedure and the related theoretical analysis are of independent interest for solving other non-standard inference problems. Using genetic data on yeast growth in multiple environments, we demonstrate that the genetic variants with significant maximin effects have generalizable effects under new environments.

Sara Magliacane (University of Amsterdam and MIT-IBM Watson AI Lab)
Title: Causality-inspired ML: what can causality do for ML?
Abstract: Applying machine learning to real-world cases often requires methods that are robust to heterogeneity, data that are missing not at random or corrupt, selection bias, non-i.i.d. data, etc., and that can generalize across different domains. Moreover, many tasks are inherently trying to answer causal questions and gather actionable insights, a task for which correlations are usually not enough. Several of these issues are addressed in the rich causal inference literature. On the other hand, classical causal inference methods often require either complete knowledge of a causal graph or enough experimental data (interventions) to estimate it accurately. Recently, a new line of research has focused on causality-inspired machine learning, i.e. on applying ideas from causal inference to machine learning methods without necessarily knowing, or even trying to estimate, the complete causal graph. In this talk, I will present an example of this line of research in the unsupervised domain adaptation setting, in which we have labelled data in a set of source domains and unlabelled data in a target domain ("zero-shot"), for which we want to predict the labels. In particular, given certain assumptions, our approach is able to select a set of provably "stable" features (a separating set) for which the generalization error can be bounded, even in the case of arbitrarily large distribution shifts. As opposed to other works, it also exploits the information in the unlabelled target data, allowing for some shifts unseen in the source domains. While using ideas from causal inference, our method never aims at reconstructing the causal graph or even the Markov equivalence class, showing that causal inference ideas can help machine learning even in this more relaxed setting.
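
As background for Guo's talk: in the cited Meinshausen and Bühlmann paper, with source populations g = 1, ..., G carrying regression vectors b_g and a shared covariate covariance Σ, the maximin effect is (up to notational details) the coefficient vector maximizing the worst-case explained variance,

\beta^{*} \;=\; \arg\max_{\beta} \; \min_{1 \le g \le G} \Big( 2\,\beta^{\top} \Sigma\, b_{g} \;-\; \beta^{\top} \Sigma\, \beta \Big),

where the inner quantity is the variance of the response explained by β in population g. A nonzero component of β* therefore corresponds to an association shared by every source population, which is what the confidence intervals in the talk target.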
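
The separating-set idea in Magliacane's abstract can be illustrated with a small, hypothetical sketch (not the speakers' code; the invariance proxy, the subset search, and the synthetic data below are all illustrative assumptions): treat the domain index as a context variable, keep only feature subsets given which the label is approximately independent of the domain on the labelled source data, and use those stable features to predict the unlabelled target domain.

import itertools

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def make_domain(n, shift):
    # x0 -> y is a stable (causal) mechanism shared by all domains; x1 is
    # generated from y with a domain-dependent mechanism, so P(y | x1) shifts.
    x0 = rng.normal(size=n)
    y = (x0 + 0.5 * rng.normal(size=n) > 0).astype(int)
    x1 = shift * y + rng.normal(size=n)
    return np.column_stack([x0, x1]), y

def domain_gain(X, y, d, S):
    # Invariance proxy: if adding the domain index d improves prediction of y
    # beyond the features in S, then P(y | X_S) is not invariant across domains.
    base = cross_val_score(LogisticRegression(), X[:, S], y, cv=5).mean()
    aug = cross_val_score(
        LogisticRegression(), np.column_stack([X[:, S], d]), y, cv=5
    ).mean()
    return aug - base

# Pool two labelled source domains; the target is never used for selection.
Xs, ys, ds = [], [], []
for i, shift in enumerate([1.0, 1.5]):
    X_i, y_i = make_domain(500, shift)
    Xs.append(X_i); ys.append(y_i); ds.append(np.full(500, i))
X, y, d = np.vstack(Xs), np.concatenate(ys), np.concatenate(ds)
perm = rng.permutation(len(y))
X, y, d = X[perm], y[perm], d[perm]  # shuffle so CV folds mix the domains

# Among subsets passing the invariance proxy, keep the most predictive one.
best_S, best_acc = None, -np.inf
for k in range(1, X.shape[1] + 1):
    for S in map(list, itertools.combinations(range(X.shape[1]), k)):
        if domain_gain(X, y, d, S) < 0.02:  # tolerance is a tuning choice
            acc = cross_val_score(LogisticRegression(), X[:, S], y, cv=5).mean()
            if acc > best_acc:
                best_S, best_acc = S, acc
assert best_S is not None, "no subset passed the invariance proxy"

X_t, y_t = make_domain(500, shift=-1.0)  # target: the x1 mechanism is flipped
clf = LogisticRegression().fit(X[:, best_S], y)
print("stable features:", best_S, "target accuracy:", clf.score(X_t[:, best_S], y_t))

The actual method uses a proper conditional-independence test and derives the generalization bound the abstract mentions; the predictive-gain heuristic above only conveys the overall flow of selecting invariant features on the sources and then transferring.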