Statistics research seminar

Spring Semester 2024

Date / Time

Speaker

Title

Location

7 March 2024
16:15-17:15

Elliot Young
The University of Cambridge

Research Seminar in Statistics

Title	Sandwich Boosting for accurate estimation in partially linear models for grouped data
Speaker, Affiliation	Elliot Young, The University of Cambridge
Date, Time	7 March 2024, 16:15-17:15
Location	HG G 19.1
Abstract	We study partially linear models in settings where observations are arranged in independent groups but may exhibit within-group dependence. Existing approaches estimate linear model parameters through weighted least squares, with optimal weights (given by the inverse covariance of the response, conditional on the covariates) typically estimated by maximising a (restricted) likelihood from random effects modelling or by using generalised estimating equations. We introduce a new ‘sandwich loss’ whose population minimiser coincides with the weights of these approaches when the parametric forms for the conditional covariance are well-specified, but which can yield arbitrarily large improvements in linear parameter estimation accuracy when they are not. Under relatively mild conditions, the coefficients estimated by our weighted least squares procedure (within a double machine learning framework) are asymptotically Gaussian and enjoy minimal variance among estimators with weights restricted to a given class of functions, when user-chosen regression methods are used to estimate nuisance functions. We further expand the class of functional forms for the weights that may be fitted beyond parametric models by leveraging the flexibility of modern machine learning methods within a new gradient boosting scheme for minimising the sandwich loss. We demonstrate the effectiveness of both the sandwich loss and what we call ‘sandwich boosting’ in a variety of settings with simulated and real-world data.

21 March 2024
16:15-17:15

Bryon Aragam
The University of Chicago Booth School of Business

Title	Research Seminar on Statistics - FDS Seminar joint talk: Statistical aspects of nonparametric latent variable models and causal representation learning
Speaker, Affiliation	Bryon Aragam, The University of Chicago Booth School of Business
Date, Time	21 March 2024, 16:15-17:15
Location	HG D 1.2
Abstract	One of the key paradigm shifts in statistical machine learning over the past decade has been the transition from handcrafted features to automated, data-driven representation learning. A crucial step in this pipeline is to identify latent representations from observational data along with their causal structure. In many applications, the causal variables are not directly observed and must be learned from data, often using flexible, nonparametric models such as deep neural networks. These settings present new statistical and computational challenges that will be the focus of this talk. We will revisit the statistical foundations of nonparametric latent variable models as a lens into the problem of causal representation learning. We discuss our recent work on developing methods for identifying and learning causal representations from data with rigorous guarantees, and show how even basic statistical properties are surprisingly subtle. Along the way, we will explore the connections between causal graphical models, deep generative models, and nonparametric mixture models, and how these connections lead to a useful new theory for causal representation learning.

26 April 2024
15:15-16:15

Richard De Veaux
Williams College

Title	The Seven Deadly Sins of Data Science
Speaker, Affiliation	Richard De Veaux, Williams College
Date, Time	26 April 2024, 15:15-16:15
Location	HG G 19.1
Abstract	As we are all too aware, organizations accumulate vast amounts of data from a variety of sources nearly continuously. Big data and data science advocates promise the moon and the stars as you harvest the potential of all these data. And now, AI threatens our jobs and perhaps our very existence. There is certainly a lot of hype. There’s no doubt that some savvy organizations are fueling their strategic decision making with insights from big data, but what are the challenges? Much can go wrong in the data science process, even for trained professionals. In this talk I'll discuss a wide variety of case studies from a range of industries to illustrate the potential dangers and mistakes that can frustrate problem solving and discovery, and that can unnecessarily waste resources. My goal is that by seeing some of the mistakes I (and others) have made, you will learn how to better take advantage of data insights without committing the "Seven Deadly Sins."

16 May 2024
15:15-16:15

Jiwei Zhao
University of Wisconsin–Madison

Title	A Semiparametric Perspective on Unsupervised Domain Adaptation
Speaker, Affiliation	Jiwei Zhao, University of Wisconsin–Madison
Date, Time	16 May 2024, 15:15-16:15
Location	HG G 19.1
Abstract	In studies ranging from clinical medicine to policy research, complete data are usually available from a population P, but the quantity of interest is often sought for a related but different population Q. In this talk, we consider the unsupervised domain adaptation setting under the label shift assumption. In the first part, we estimate a parameter of interest in population Q by leveraging information from P, where three ingredients are essential: (a) the common conditional distribution of X given Y, (b) the regression model of Y given X in P, and (c) the density ratio of the outcome Y between the two populations. We propose an estimation procedure that only needs a standard nonparametric technique to approximate the conditional expectations with respect to (a), while requiring no estimate or model for (b) or (c); i.e., it is doubly flexible to misspecification of both (b) and (c). In the second part, we pay special attention to the case where the outcome Y is categorical. In this scenario, traditional label shift adaptation methods either suffer from large estimation errors or require cumbersome post-prediction calibrations. To address these issues, we propose a moment-matching framework for label shift adaptation, and an efficient label shift adaptation method in which the adaptation weights can be estimated by solving linear systems. We rigorously study the theoretical properties of our proposed methods. Empirically, we illustrate our proposed methods on the MIMIC-III database as well as on benchmark datasets including MNIST, CIFAR-10, and CIFAR-100.

Archive: SS 24 AS 23 SS 23 AS 22 SS 22 AS 21 SS 20 AS 19 SS 19 AS 18 SS 18 AS 17 SS 17 AS 16 SS 16 AS 15 SS 15 AS 14 SS 14 AS 13 SS 13 AS 12 SS 12 AS 11 SS 11 AS 10 SS 10 AS 09