Seminar overview

Autumn Semester 2016

Date & Time Speaker Title Location
Thu 15.09.2016
16:15-17:00
Emmanuel Lesaffre
L-BioStat, Leuven
Abstract
We propose a novel multivariate multilevel model that expresses both the mean and covariance structure as a multivariate mixed effects model. We call this the multilevel covariance regression (MCR) model. Two versions of this model are presented. In the first version the covariance matrix of the multivariate response is allowed to depend on covariates and random effects, and the random effects of the covariance part are assumed to be independent of the random effects of the mean structure. In the second version this assumption is relaxed by allowing the two types of random effects to be dependent. The motivating data set is obtained from the RN4CAST (Sermeus et al., 2011) FP7 project, which involves 33,731 registered nurses in 2,169 nursing units in 486 hospitals in 12 European countries. As response we have taken the three classical burnout dimensions (Maslach and Jackson, 1981) extracted from a 22-item questionnaire: emotional exhaustion (EE), depersonalization (DP) and personal accomplishment (PA). There are four levels in the total data set: nurses, nursing units, hospitals and (for the whole data set) countries. The first model is applied to the total data set, while the second model is applied to only the Belgian part of the data. The two models address the following nursing research questions simultaneously: 1) how much of the variation in burnout can be explained by the level-specific fixed and random effects? 2) do the variances of and correlations among the burnout dimensions stay constant across level-specific characteristics and units at each level? The two models are explored with respect to their statistical properties, but are also compared on the Belgian part of the study. We opted for the Bayesian approach to estimate the parameters of the model. To this end we made use of the JAGS Markov chain Monte Carlo program through the R package rjags.
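For readers unfamiliar with the rjags workflow mentioned at the end of the abstract, a minimal sketch follows. The model file name "mcr.bug", the data objects and the monitored parameter names are hypothetical placeholders, not the authors' actual MCR implementation.

```r
library("rjags")

## compile the model and adapt the samplers
## ("mcr.bug", y, x are placeholders for a user-supplied model and data)
jm <- jags.model("mcr.bug",
                 data = list(y = y, x = x, N = nrow(y)),
                 n.chains = 3, n.adapt = 1000)
update(jm, n.iter = 5000)  # burn-in
## draw posterior samples of the mean and covariance parameters
post <- coda.samples(jm,
                     variable.names = c("beta", "Sigma"),
                     n.iter = 20000, thin = 10)
summary(post)
```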
ZüKoSt Zürcher Kolloquium über Statistik
Modeling multivariate multilevel continuous responses with a hierarchical regression model for the mean and covariance matrix applied to a large nursing data set
HG G 19.1
Fri 16.09.2016
15:15-16:00
Venkat Chandrasekaran
California Institute of Technology, USA
Abstract
Regularization techniques are widely employed in the solution of inverse problems in data analysis and scientific computing because of their effectiveness in addressing the difficulties caused by ill-posedness. In their most common manifestation, these methods take the form of penalty functions added to the objective in optimization-based approaches for solving inverse problems. The purpose of the penalty function is to induce a desired structure in the solution, and these functions are specified based on prior domain-specific expertise. We consider the problem of learning suitable regularization functions from data in settings in which prior domain knowledge is not directly available. Previous work under the title of 'dictionary learning' or 'sparse coding' may be viewed as learning a polyhedral regularizer from data. We describe generalizations of these methods to learn semidefinite regularizers by computing structured factorizations of data matrices. Our algorithmic approach for computing these factorizations combines recent techniques for rank minimization problems with operator analogs of Sinkhorn scaling. The regularizers obtained using our framework can be employed effectively in semidefinite programming relaxations for solving inverse problems. (Joint work with Yong Sheng Soh)
Research Seminar in Statistics
Learning Semidefinite Regularizers via Matrix Factorization
HG G 19.1
Fri 23.09.2016
15:15-16:00
Helen Ogden
University of Southampton, UK
Abstract
Many statistical models have likelihoods which are intractable: it is impossible or infeasibly expensive to compute the likelihood exactly. In such settings, a common approach is to replace the likelihood with an approximation, and proceed with inference as if the approximate likelihood were the exact likelihood. For example, in latent variable models, where the likelihood is an integral over the latent variables, a Laplace approximation to the likelihood is often used in place of the exact likelihood to do inference. I will describe general conditions which guarantee that this naive inference with an approximate likelihood has the same first-order asymptotic properties as inference with the exact likelihood, and discuss in detail the implications of these results for inference using a Laplace approximation to the likelihood in generalized linear mixed models.
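As a concrete illustration of the setting discussed here, the sketch below fits a binomial GLMM in R with the Laplace approximation and contrasts it with adaptive Gauss-Hermite quadrature. It uses lme4's bundled cbpp data and is only a generic example, not material from the talk.

```r
library("lme4")

## binomial GLMM for the cbpp data; nAGQ = 1 is the Laplace approximation
fit_laplace <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                     data = cbpp, family = binomial, nAGQ = 1)
## adaptive Gauss-Hermite quadrature (9 nodes) as a more accurate reference
fit_agq <- update(fit_laplace, nAGQ = 9)
logLik(fit_laplace)
logLik(fit_agq)
```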
Research Seminar in Statistics
Inference with approximate likelihoods
HG G 19.1
Thu 13.10.2016
16:15-17:00
Torsten Hothorn
Universität Zürich
Abstract
Transformation models are a surprisingly large and useful class of models for conditional and also unconditional distributions. Many transformation models, for example the Cox proportional hazards model or proportional odds logistic regression, have been known for decades in survival or categorical data analysis. The strong connections between these models and other commonly used procedures, for example normal or binary linear models, are not very well known. It is very stimulating, both from an intellectual and a practical point of view, to interpret such classical models as transformation models and therefore as models for describing distributions instead of means. We will look at a cascade ranging from very simple to rather complex unconditional and conditional transformation models, both theoretically and practically. The R add-on package "mlt" (Most Likely Transformations) allows many such transformation models to be fitted in the maximum likelihood framework and will be used to illustrate how one can estimate and analyse interesting transformation models in R. References:
http://dx.doi.org/10.1111/rssb.12017
http://arxiv.org/abs/1508.06749
http://CRAN.R-project.org/package=mlt
https://cran.r-project.org/web/packages/mlt.docreg/vignettes/mlt.pdf
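A minimal sketch of fitting an unconditional transformation model with "mlt", following the pattern of the package vignette referenced above. The specific choices (the airquality data, a Bernstein basis of order 6, a standard normal reference distribution) are illustrative assumptions.

```r
library("mlt")  # also attaches the 'basefun' and 'variables' helper packages

## response: daily ozone concentrations from the airquality data
aq <- subset(airquality, !is.na(Ozone))
y_var <- numeric_var("Ozone", support = c(1, 168))

## monotone Bernstein polynomial as the transformation h(y)
b_y <- Bernstein_basis(y_var, order = 6, ui = "increasing")

## unconditional transformation model: P(Y <= y) = Phi(h(y))
mod <- ctm(response = b_y, todistr = "Normal")
fit <- mlt(mod, data = aq)
logLik(fit)
```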
ZüKoSt Zürcher Kolloquium über Statistik
Understanding and Applying Transformation Models
HG G 19.1
Thu 20.10.2016
16:15-17:00
Thomas Hofmann
ETH Zürich
Abstract
This talk will provide an overview of recent trends in deep learning for natural language understanding. The focus will be on the structure and architecture of the network models used in this area, which has seen significant advances and innovations in recent years. In passing, the talk will also give a cursory introduction to key problems in language understanding.
ZüKoSt Zürcher Kolloquium über Statistik
Natural Language Understanding by Deep Networks
HG G 19.1
Fri 28.10.2016
15:15-16:00
Samantha Leorato
Università Tor Vergata, Roma
Abstract
Given a continuous random variable Y and a random vector X defined on the same probability space, the conditional distribution function (CDF) and the conditional quantile function (CQF) give rise to two competing approaches to the estimation of the conditional distribution of Y given X. One approach -- distribution regression (DR) -- is based on direct estimation of the CDF; the other approach -- quantile regression (QR) -- is instead based on direct estimation of the CQF. Since the CDF and the CQF are generalized inverses of each other, estimates of any functional of the distribution may be obtained by appropriately transforming the direct estimates of the CDF and the CQF. Similarly, indirect estimates of the CQF and the CDF may be obtained by taking the generalized inverse of the direct estimates. Unlike the QR estimator, which typically refers to a conditional ALAD estimator, there is no unique choice for the DR estimator. One possibility is to define a binary choice model for any given threshold $y$ and the corresponding dummy variable $\mathbf{1}\{Y\leq y\}$. This choice is particularly suited to comparisons with the QR estimator, since, in the unconditional case, the two approaches are equivalent. Our paper focuses on comparing the QR and DR approaches and their performances in terms of efficiency, both asymptotically and for finite samples. Asymptotic efficiency is measured by the asymptotic MSE of the rescaled estimators of the CDF (or of the CQF), where the asymptotic MSE is the sum of the asymptotic variance and the squared asymptotic bias. The asymptotic bias is allowed to be nonzero, thus taking into account some form of local misspecification of either the QR or the DR model. For the asymptotic variance, we show that the choice of the link function used for DR estimation matters, and that under the most popular error distributions (i.e. logistic and normal) QR is uniformly more efficient (in expectation). The finite sample performance is assessed by an extensive Monte Carlo exercise.
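To make the two direct estimators concrete, here is a minimal simulated example: quantreg's rq() for QR (a standard implementation of the conditional ALAD estimator) and a logit binary choice model for DR. The data-generating process and threshold are illustrative assumptions, not the paper's simulation design.

```r
library("quantreg")

set.seed(42)
n <- 500
x <- runif(n)
y <- 1 + 2 * x + rlogis(n)  # logistic errors, one of the cases in the talk

## QR: direct estimate of the conditional quantile function at the median
fit_qr <- rq(y ~ x, tau = 0.5)

## DR: binary choice (logit) model for the dummy 1{Y <= y0} at threshold y0
y0 <- 2
fit_dr <- glm(I(y <= y0) ~ x, family = binomial(link = "logit"))

coef(fit_qr)
coef(fit_dr)
```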
Research Seminar in Statistics
Distribution and Quantile Regressions
HG G 19.1
Wed 02.11.2016
16:15-17:00
Søren Højsgaard
Aalborg University, DK
Abstract
Mixed models in R (www.r-project.org) are usually handled with the 'lme4' package. Until recently, inference (hypothesis testing) in linear mixed models with 'lme4' was commonly based on the limiting $\chi^2$ distribution of the likelihood ratio statistic. The 'pbkrtest' package provides two alternatives: 1) a Kenward-Roger approximation for calculating (or estimating) the numerator degrees of freedom for an "F-like" test statistic; 2) $p$-values based on simulating the reference distribution of the likelihood ratio statistic via parametric bootstrap. In the talk, I will illustrate the package through various examples and discuss some directions for further developments.
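A minimal sketch of the two 'pbkrtest' alternatives on lme4's sleepstudy data; the model pair is a generic illustration, not necessarily one of the talk's examples.

```r
library("lme4")
library("pbkrtest")

## nested linear mixed models for the sleepstudy data (REML fits,
## as required by the Kenward-Roger approach)
large <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
small <- lmer(Reaction ~ 1 + (Days | Subject), data = sleepstudy)

## 1) Kenward-Roger approximation for an "F-like" test
KRmodcomp(large, small)

## 2) parametric bootstrap reference distribution for the LR statistic
PBmodcomp(large, small, nsim = 500)
```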
ZüKoSt Zürcher Kolloquium über Statistik
Inference in mixed models in R - beyond the usual asymptotic likelihood ratio test
HG G 26.1
Fri 04.11.2016
15:15-16:00
Davy Paindaveine
Université libre de Bruxelles
Abstract
We revisit, in an original and challenging perspective, the problem of testing the null hypothesis that the mode of a directional signal is equal to a given value. Motivated by a real data example where the signal is weak, we consider this problem under asymptotic scenarios for which the signal strength goes to zero at an arbitrary rate $\eta_n$. Both under the null and the alternative, we focus on rotationally symmetric distributions. We show that, while they are asymptotically equivalent under fixed signal strength, the classical Wald and Watson tests exhibit very different (null and non-null) behaviours when the signal becomes arbitrarily weak. To fully characterize how challenging the problem is as a function of $\eta_n$, we adopt a Le Cam convergence-of-statistical-experiments point of view and show that the resulting limiting experiments crucially depend on $\eta_n$. In the light of these results, the Watson test is shown to be adaptively rate-consistent and essentially adaptively Le Cam optimal. Throughout, our theoretical findings are illustrated via Monte Carlo simulations. The practical relevance of our results is also shown on the real data example that motivated the present work.
Research Seminar in Statistics
CANCELED: Inference on the mode of weak directional signals: a Le Cam perspective on hypothesis testing near singularities
HG G 19.2
Thu 17.11.2016
16:15-17:00
Gilles Monneret
Université Pierre et Marie Curie, Paris
Abstract
Gene network inference from transcriptomic data is a recent and major methodological challenge, usually based on partial correlations within a Gaussian graphical model framework. Recent methodological advances that fully exploit both observational and interventional (i.e., knock-out or knock-down) data go one step further by enabling the inference of causal networks. I will start with a method first proposed by Rau et al. (2013), which is based on Bayesian networks and can use a mix of observational and interventional data, even with several interventions at the same time. To do so, we use an MCMC procedure that works on the space of topological orders and yields a posterior probability over orderings. We can then compute, for example, a mean network for our data. Second, I will define a novel causal test to identify marginal causality for each of the interaction pairs. The proposed procedure is very fast and can be applied to thousands of genes simultaneously, which allows the pre-selection of a group of genes of interest for downstream causal network inference around an interventional gene. I will show that we obtain results very similar to those of the differential analyses currently used in genomics. I will illustrate these two methods with an application to biological data.
Research Seminar in Statistics
Identification of causal relationships in gene networks, from observational and interventional expression data
HG G 19.1
Fri 18.11.2016
15:15-16:00
Gabor Lugosi
Universitat Pompeu Fabra
Abstract
Given $n$ independent, identically distributed copies of a random variable, one is interested in estimating the expected value. Perhaps surprisingly, there are still open questions concerning this very basic problem in statistics. In this talk we are primarily interested in non-asymptotic sub-Gaussian estimates for potentially heavy-tailed random variables. We discuss various estimates and extensions to high dimensions. We apply the estimates to statistical learning and regression function estimation problems. The methods improve on classical empirical minimization techniques. This talk is based on joint work with Emilien Joly, Luc Devroye, Matthieu Lerasle, Roberto Imbuzeiro Oliveira, and Shahar Mendelson.
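One classical estimator with non-asymptotic sub-Gaussian guarantees in this literature is the median-of-means. A minimal sketch, with the block count chosen arbitrarily for illustration:

```r
## median-of-means: split the sample into k blocks, average within each
## block, and return the median of the block means
median_of_means <- function(x, k = 10) {
  blocks <- split(x, rep_len(seq_len(k), length(x)))
  median(vapply(blocks, mean, numeric(1)))
}

set.seed(1)
x <- rt(1e4, df = 2.5)  # heavy-tailed sample with true mean 0
c(empirical = mean(x), median_of_means = median_of_means(x, k = 15))
```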
Research Seminar in Statistics
How to estimate the mean of a random variable?
HG G 19.1
Fri 02.12.2016
15:15-16:00
Martyn Plummer
IARC Lyon, France
Abstract
We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher information matrices may fail to be invertible along other competing submodels. Such singular models do not obey the regularity conditions underlying the derivation of Schwarz’s Bayesian information criterion (BIC) and the penalty structure in BIC generally does not reflect the frequentist large sample behaviour of their marginal likelihood. Although large sample theory for the marginal likelihood of singular models has been developed recently, the resulting approximations depend on the true parameter value and lead to a paradox of circular reasoning. Guided by examples such as determining the number of components of mixture models, the number of factors in latent factor models or the rank in reduced rank regression, we propose a resolution to this paradox and give a practical extension of BIC for singular model selection problems.
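For context, the contrast at issue can be written out as follows; this is a sketch following Schwarz's classical expansion and Watanabe's singular learning theory, where the learning coefficient $\lambda$ and multiplicity $m$ come from that literature rather than from the abstract itself.

```latex
% Regular models (Schwarz): the log marginal likelihood expands as
\log m(X_n) = \log L(\hat{\theta}_n) - \tfrac{d}{2}\log n + O_p(1),
% with d the model dimension, which yields the usual BIC penalty.
% Singular models (Watanabe): the expansion instead reads
\log m(X_n) = \log L(\theta_0) - \lambda(\theta_0)\log n
            + (m(\theta_0) - 1)\log\log n + O_p(1),
% where the learning coefficient \lambda and multiplicity m depend on the
% true parameter \theta_0 -- the source of the circularity noted above.
```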
Research Seminar in Statistics
A Bayesian Information Criterion for Singular Models
HG G 19.1
Thu 08.12.2016
16:15-17:00
Nicolas Städler
F. Hoffmann-La Roche Ltd, Basel
Abstract
Our aim at Roche is for every person who needs our products to be able to access and benefit from them. Market access, that is the coverage and reimbursement of our products by payers, is a crucial success factor in achieving this goal. As healthcare spending accelerates, payers and public health authorities are carefully assessing the benefits of new drugs over and above those of drugs already on the market. Health Technology Assessment (HTA) agencies have therefore adopted stringent product evaluation strategies, and their expectations in terms of evidence on the effectiveness of a new product very often exceed those required for regulatory approval. In this talk I will present work-in-progress examples where we use advanced statistics to build robust payer evidence. Firstly, I will discuss surrogate endpoint validation and show how in some cases this is a useful approach for predicting how effects measured on biomarkers or on surrogate endpoints translate into effects which are considered payer relevant. Secondly, I will discuss network meta-analysis and explain how we used this approach in chronic lymphocytic leukemia to inform payers on the comparative effectiveness of our product relative to others on the market. I will further discuss our ideas on how to extend network meta-analysis to also include non-randomized trials. Finally, I will discuss extrapolation of survival curves as a key ingredient in calculating the so-called Incremental Cost Effectiveness Ratio (ICER), which serves many payers as an important reference value in their decision making. I will discuss the limitations of classical parametric extrapolation and show how we use advanced techniques based on mixture models to improve extrapolation and obtain more accurate estimates of the ICER.
ZüKoSt Zürcher Kolloquium über Statistik
Opportunities and Challenges of Statistics in Health Technology Assessment
HG G 19.1
Tue 13.12.2016
15:15-16:00
William Aeberhard
Dalhousie University, Halifax
Abstract
State-space models (SSMs) encompass a wide range of popular models encountered in various fields such as mathematical finance, control engineering and ecology. SSMs are essentially characterized by a hierarchical structure, with latent (unobserved) variables governed by Markovian dynamics. Classical estimation of fixed parameters in these models, for instance by maximizing an approximated marginal likelihood, is known to be highly sensitive to the correct specification of the model. This sensitivity is all the more problematic since assumptions about latent variables cannot be verified by the data analyst. Motivated by the highly non-linear models used for fish stock assessments, we introduce robust estimators for general SSMs which remain stable under deviations from the assumed model. The implementation relies on Laplace's method, where automatic differentiation allows the user to robustly fit such a model in a matter of minutes. A real-life fish stock assessment example illustrates the reliable inference these estimators can yield and how robustness weights can be used as diagnostic tools.
Research Seminar in Statistics
Robust fitting of state-space models with application to fish stock assessments
HG G 19.2