Seminar overview


Spring Semester 2011

Date & Time Speaker Title Location
Thu 03.03.2011
16:15-17:45
Peter Bühlmann
Seminar für Statistik, ETH Zürich
Abstract
This tutorial surveys methodology and aspects of theory for high-dimensional statistical inference when the number of variables or features greatly exceeds sample size. In the high-dimensional setting, major challenges include designing computational algorithms that are feasible for large-scale problems, assigning statistical error rates (e.g., p-values), and developing theoretical insights about the limits of what is possible. We will present some of the most important recent developments and discuss their implications for statistical practice.
ZüKoSt Zürcher Kolloquium über Statistik
High-dimensional statistics: a 90-minute tutorial
HG G 19.1
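
As a rough illustration of the kind of sparse estimation problem the tutorial above covers (many more variables than observations), here is a minimal sketch using scikit-learn's cross-validated Lasso; the data, dimensions, and settings are made up for illustration and are not material from the talk.

# Minimal sketch of sparse estimation when p >> n (illustrative, not the tutorial's own code).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 50, 500                              # far more variables than observations
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]      # only 5 truly active variables
y = X @ beta + rng.standard_normal(n)

# Lasso with the regularization strength chosen by cross-validation
model = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(model.coef_)
print("selected variables:", selected)
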
Thu 10.03.2011
16:15-17:30
Thomas Kneib
Universität Oldenburg
Abstract
Sample selection models attempt to correct for non-randomly selected data in a two-model hierarchy where, on the first level, a binary selection equation determines whether a particular observation will be available for the second level, i.e. in the outcome equation. Ignoring the non-random selection mechanism induced by the selection equation may result in biased estimation of the coefficients in the outcome equation. In the application that motivated this research, we analyse relief supply in earthquake-affected communities in Pakistan, where the decision to deliver goods represents the dependent variable in the selection equation, whereas factors that determine the amount of goods supplied are analysed in the outcome equation. In this application, the inclusion of spatial effects is necessary since the available covariate information on the community level is rather scarce. Moreover, the high temporal dynamics underlying the immediate delivery of relief supply after a natural disaster call for non-linear, time-varying effects. We propose a geoadditive sample selection model that allows us to address these issues in a general Bayesian framework, with inference based on Markov chain Monte Carlo simulation techniques, and apply it to the relief supply data from Pakistan.
ZüKoSt Zürcher Kolloquium über Statistik
Bayesian Geoadditive Sample Selection Models
HG G 19.1
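
The bias that motivates the sample selection model above can be seen in a simple simulation (my own illustrative sketch of a linear selection/outcome pair with correlated errors, not the Bayesian geoadditive model of the talk; all numbers are made up): ordinary least squares on the selected observations alone misses the true slope.

# Illustrative simulation of selection bias in a simple linear sample selection model
# (not the geoadditive Bayesian model of the talk; parameters are made up).
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.standard_normal(n)
# correlated errors link the selection and outcome equations
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=n)

selected = (0.5 + 1.0 * x + eps[:, 0]) > 0     # selection equation
y = 1.0 + 2.0 * x + eps[:, 1]                  # outcome equation, true slope = 2

X = np.column_stack([np.ones(selected.sum()), x[selected]])
slope = np.linalg.lstsq(X, y[selected], rcond=None)[0][1]
print("OLS slope on selected observations only:", round(slope, 3))  # visibly below 2
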
Fri 11.03.2011
15:15-16:30
Elvezio Ronchetti
University of Geneva
Abstract
Indirect inference provides a broad class of estimators and testing procedures that can be used to carry out inference in complex models where, e.g., the likelihood function is not available in closed form. These techniques have now been successfully applied in a variety of fields, including engineering, biostatistics, and finance. Given a model and the data, an estimate of the parameter of an auxiliary (simpler) model is first obtained. Then, pseudo-data are simulated from the original (complex) model and the auxiliary estimate is computed on the pseudo-data. Finally, the estimate of the parameter of the original model is obtained by minimizing a distance between the auxiliary estimates computed on the data and on the pseudo-data. In this talk we address two important issues. First, it is known that classical (especially over-identification) tests based on asymptotic theory have poor finite-sample accuracy. Therefore, we introduce new accurate parameter and over-identification tests for indirect inference which exhibit excellent finite-sample behavior. Second, we address the robustness of these procedures by providing estimators and tests for indirect inference which are not unduly influenced by small deviations from the assumed model. By combining these two properties, we obtain accurate and reliable procedures for indirect inference. The theoretical results are illustrated in various models, including nonlinear regression, Poisson regression with over-dispersion, and diffusion models. Joint work with Veronika Czellar, HEC Paris.
Research Seminar in Statistics
Accurate and Robust Indirect Inference
HG G 19.1
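
The abstract above spells out the indirect inference recipe: fit an auxiliary model to the data, fit the same auxiliary model to pseudo-data simulated from the complex model, and minimize the distance between the two auxiliary estimates. Below is a minimal one-parameter toy version of that recipe (my own sketch with a made-up model; it does not implement the accurate or robust procedures of the talk).

# Toy indirect inference for one parameter theta in y = exp(theta * x) + noise,
# using the OLS slope of y on x as the auxiliary statistic (illustrative only).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0.0, 1.0, n)
theta_true = 1.3
y = np.exp(theta_true * x) + 0.1 * rng.standard_normal(n)

def aux_slope(xv, yv):
    """Auxiliary (simpler) model: OLS slope of y on x."""
    X = np.column_stack([np.ones_like(xv), xv])
    return np.linalg.lstsq(X, yv, rcond=None)[0][1]

beta_data = aux_slope(x, y)                       # auxiliary estimate on the data

noise = 0.1 * rng.standard_normal((20, n))        # common random numbers across theta

def distance(theta):
    # auxiliary estimate on pseudo-data simulated from the complex model at theta
    sims = [aux_slope(x, np.exp(theta * x) + noise[k]) for k in range(noise.shape[0])]
    return (beta_data - np.mean(sims)) ** 2

est = minimize_scalar(distance, bounds=(0.1, 3.0), method="bounded")
print("indirect inference estimate of theta:", round(est.x, 3))
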
Fri 01.04.2011
15:15-16:30
Nicolas Vayatis
CMLA, ENS Cachan
Abstract
In a world of overwhelming information, ranking has become one of the most critical tasks for high level data processing. In the talk, we focus on the problem of learning to rank high dimensional vectors based on past data with binary feedback, say positive and negative. From a mathematical point of view, the problem is to find a real-valued scoring function which leads to the highest possible density of positive observations among the highest values of the scoring function. We discuss the nature of optimal scoring functions, performance measures, statistical aspects and practical algorithms inspired from decision trees and random forests.
Research Seminar in Statistics
Nonparametric scoring and ranking trees
HG G 19.1
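
As a rough stand-in for the scoring problem described above (not the speaker's ranking-tree algorithm), one can learn a real-valued scoring function from binary feedback with random-forest class probabilities and judge the resulting ranking by ROC AUC; the sketch below uses scikit-learn on synthetic data, and every setting is illustrative.

# Rough stand-in for nonparametric scoring from binary feedback: score with
# random-forest class probabilities and judge the ranking by ROC AUC (illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = forest.predict_proba(X_te)[:, 1]          # real-valued scoring function
print("ROC AUC of the learned ranking:", round(roc_auc_score(y_te, scores), 3))
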
Tue 05.04.2011
16:00-17:15
Volkan Cevher
EPFL
Abstract
Special talk, invited and organized by Prof. Andreas Krause, Learning and Adaptive Systems, Department of Computer Science, ETH Zurich.

We develop a principled way of identifying probability distributions whose independent and identically distributed (iid) realizations are compressible, i.e., can be approximated as sparse. We focus on the context of Gaussian random underdetermined linear regression (GULR) problems, where compressibility is known to ensure the success of estimators exploiting sparse regularization. We prove that many of the conventional priors revolving around probabilistic interpretations of the p-norm (p<=1) regularization algorithms are in fact incompressible in the limit of large problem sizes. To show this, we identify nontrivial undersampling regions in GULR where the simple least squares solution almost surely outperforms an oracle sparse solution, when the data is generated from a prior such as the Laplace distribution. We provide rules of thumb to characterize large families of compressible and incompressible priors based on their second and fourth moments. Generalized Gaussians and generalized Pareto distributions serve as running examples for concreteness. We then conclude with a study of the statistics of wavelet coefficients of natural images in the context of compressible priors.

Bio: Prof. Volkan Cevher received his BSc degree (valedictorian) in Electrical Engineering from Bilkent University in 1999, and his PhD degree in Electrical and Computer Engineering from Georgia Institute of Technology in 2005. He held Research Scientist positions at the University of Maryland, College Park during 2006-2007 and at Rice University during 2008-2009. Currently, he is an Assistant Professor at École Polytechnique Fédérale de Lausanne with a joint appointment at the Idiap Research Institute, and a Faculty Fellow at Rice University. His research interests include signal processing theory, machine learning, graphical models, and information theory.

Time and location: April 5, 2011, 4pm in HG E 22.
More information: Andreas Krause, http://las.ethz.ch; Volkan Cevher, http://lions.epfl.ch
Research Seminar in Statistics
Compressible priors for high-dimensional statistics
HG E 22
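
A quick numerical look at the compressibility notion used in the abstract above (my own check, not the talk's analysis): compare how much of the l2 energy of an iid draw is captured by its largest-magnitude entries for a Laplace prior versus a heavy-tailed generalized Pareto prior; the sample size and tail index below are arbitrary.

# How much of the l2 energy of an iid draw sits in its k largest-magnitude entries?
# (illustrative comparison of a Laplace draw and a heavy-tailed Pareto-type draw)
import numpy as np

rng = np.random.default_rng(3)
N, k = 100_000, 1_000                     # keep the top 1% of coefficients

def energy_in_top_k(x, k):
    mags = np.sort(np.abs(x))[::-1]
    return np.sum(mags[:k] ** 2) / np.sum(mags ** 2)

laplace = rng.laplace(size=N)
pareto = rng.pareto(1.2, size=N) * rng.choice([-1, 1], size=N)   # heavy-tailed draw

print("Laplace, energy in top 1%:     ", round(energy_in_top_k(laplace, k), 3))
print("Pareto-type, energy in top 1%: ", round(energy_in_top_k(pareto, k), 3))
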
Fri 08.04.2011
15:15-16:30
Stanislav Anatolyev
New Economic School, Moscow
Abstract
Sequential procedures for testing structural stability do not provide enough guidance on the shape of the boundaries used to decide on acceptance or rejection, requiring only that the overall size of the test is asymptotically controlled. We introduce and motivate a reasonable criterion for the shape of boundaries which requires that the test size be uniformly distributed over the testing period. Under this criterion, we numerically construct boundaries for the most popular sequential tests, which are characterized by a test statistic behaving asymptotically either as a Wiener process or a Brownian bridge. We handle this problem both in the context of retrospection over a historical sample and in the context of monitoring newly arriving data. We tabulate the boundaries by fitting them to certain flexible but parsimonious functional forms. Interesting patterns emerge in an illustrative application of sequential tests to the Phillips curve model. Key words: structural stability; sequential tests; CUSUM; retrospection; monitoring; boundaries; asymptotic size.
Research Seminar in Statistics
Sequential testing with uniformly distributed size
HG G 19.1
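
The criterion above asks that the test size be spread evenly over the monitoring period. A crude Monte Carlo construction in that spirit (my own sketch, not the authors' tabulated boundaries) picks, at each monitoring point, the boundary as a quantile among still-surviving simulated Wiener paths so that each point contributes an equal share of the overall size; the number of paths, monitoring points, and level are arbitrary.

# Crude Monte Carlo boundary construction for a statistic behaving like |W(t)|:
# at each of K monitoring points, reject an equal share alpha/K of all paths
# (sketch of the "uniformly distributed size" idea, not the paper's tabulation).
import numpy as np

rng = np.random.default_rng(4)
n_paths, K, alpha = 200_000, 50, 0.05
dt = 1.0 / K
W = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, K)), axis=1)

alive = np.ones(n_paths, dtype=bool)
boundaries = np.empty(K)
per_step = alpha * n_paths / K                     # rejections allotted to each step
for t in range(K):
    stat = np.abs(W[alive, t])
    # boundary = quantile such that ~per_step of the surviving paths cross it
    q = 1.0 - per_step / alive.sum()
    boundaries[t] = np.quantile(stat, q)
    crossed = np.zeros(n_paths, dtype=bool)
    crossed[np.flatnonzero(alive)[stat > boundaries[t]]] = True
    alive &= ~crossed

print("overall size:", round(1.0 - alive.mean(), 4))         # approximately alpha
print("boundary at first/last point:", boundaries[0].round(3), boundaries[-1].round(3))
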
Fri 13.05.2011
15:15-16:30
Michael Chichignoud
Université de Provence, Marseille
Abstract
We study the nonparametric regression model ((X1, Y1), ..., (Xn, Yn)), where (Xi)i is the deterministic design and (Yi)i is a sequence of independent variables. Assume that the density of Yi is known and can be written as g(., f(Xi)), i.e. it depends on a regression function f at the point Xi. This function is assumed to be smooth, i.e. to belong to a Hölder ball. The aim is to estimate the regression function at a given point from the observations (pointwise estimation) and to find the optimal estimator (in the sense of rates of convergence) for each density g. We use the locally parametric approach to construct a new local Bayesian estimator. Under some conditions on g, we propose an adaptive procedure based on the so-called Lepski's method (adaptive selection of the bandwidth) which allows us to construct an optimal adaptive Bayesian estimator.
Research Seminar in Statistics
Statistical performance of a Bayesian estimator
HG G 19.2
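
For orientation on the bandwidth-selection step mentioned above, here is a rough sketch of a Lepski-type comparison rule applied to simple local averaging at a point in a Gaussian-noise regression (my own construction with an ad hoc threshold constant; it is not the speaker's local Bayesian estimator).

# Rough sketch of Lepski-type bandwidth selection for estimating f at a point x0
# with local averaging (not the talk's local Bayesian estimator; constants ad hoc).
import numpy as np

rng = np.random.default_rng(5)
n, sigma, x0 = 2000, 0.3, 0.5
x = np.sort(rng.uniform(0, 1, n))
f = lambda t: np.sin(6 * t)
y = f(x) + sigma * rng.standard_normal(n)

def local_mean(h):
    """Local-constant estimate of f(x0) and its standard deviation."""
    mask = np.abs(x - x0) <= h
    return y[mask].mean(), sigma / np.sqrt(mask.sum())

bandwidths = 0.01 * 1.5 ** np.arange(10)              # increasing bandwidths
estimates = [local_mean(h) for h in bandwidths]
C = 2.0                                               # ad hoc threshold constant

chosen = bandwidths[0]
for i, h in enumerate(bandwidths):
    est_h = estimates[i][0]
    # Lepski rule: keep enlarging h while the estimate at h stays within the noise
    # level of every estimate computed at a smaller bandwidth
    ok = all(abs(est_h - estimates[j][0]) <= C * estimates[j][1] for j in range(i))
    if ok:
        chosen = h
    else:
        break

print("Lepski-selected bandwidth:", round(float(chosen), 4))
print("estimate of f(x0):", round(local_mean(chosen)[0], 3), "truth:", round(f(x0), 3))
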
Thu 19.05.2011
16:15-17:30
Mario Fritz
MPI Saarbrücken
Abstract
Visual recognition is one of the key technologies for future applications in robotics, surveillance, media retrieval, personal assistance, etc. Despite the dramatic progress over the last decade, many fundamental questions remain unanswered. In my talk I will elaborate on one of those questions: how to best encode visual information in order to facilitate robust and scalable recognition. Recently, sparse coding approaches have shown superior performance in comparison to the predominant vector quantization paradigm. We have proposed a probabilistic version of such coding schemes in a Bayesian setting. Based on Latent Dirichlet Allocation (LDA), we have presented a latent additive feature model that has shown state-of-the-art performance in visual category recognition and detection as well as in the treatment of transparent objects. Most recently, the approach has been extended in a hierarchical fashion in order to provide a joint inference scheme in a multi-layered representation. The talk will give a brief introduction to the research area, describe the outlined approach in detail and show its merits on real-world data.
ZüKoSt Zürcher Kolloquium über Statistik
Latent Additive Feature Models for Visual Recognition
HG G 19.1
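
As generic background to the LDA-based modelling mentioned above, the snippet below fits plain Latent Dirichlet Allocation to a synthetic bag-of-visual-words count matrix with scikit-learn and extracts per-image topic proportions as features; it is only a sketch of the underlying building block, not the latent additive feature model of the talk, and the count data are made up.

# Generic sketch: fit LDA to a (synthetic) bag-of-visual-words count matrix and
# use the per-image topic proportions as features (not the talk's model).
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(6)
n_images, vocab_size, n_topics = 500, 1000, 20

# stand-in for visual-word histograms (one row per image)
counts = rng.poisson(0.2, size=(n_images, vocab_size))

lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
theta = lda.fit_transform(counts)      # per-image topic proportions
print("topic-feature matrix shape:", theta.shape)   # (500, 20)
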
Thu 26.05.2011
16:15-17:30
Robert G. Staudte
La Trobe University, Melbourne
Abstract
In the beginning was R.A. Fisher, who created variance stabilization and saw that it was good. Many statistical descendants have found it to be a powerful tool in applications. A recent book by Kulinskaya, Morgenthaler and Staudte (Wiley, 2008) formalized this powerful tool to define statistical evidence as the transformed statistic, because it allowed for simple calibration and interpretation, as well as combination of evidence from different studies in a meta-analysis. It turns out that the Key function of their book that arises in variance stabilization has a remarkably strong link to the Kullback-Leibler symmetrized divergence. This result leads to some surprising global approximations and new variance-stabilized statistics.
Research Seminar in Statistics
The relationship between variance stabilized statistics and the Kullback-Leibler symmetrized divergence.
HG G 19.1
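
As background to the variance-stabilization theme of the abstract above, a quick simulation (my own check of a standard textbook fact, not a result from the talk) confirms that the classic transform 2*sqrt(X) approximately stabilizes the variance of Poisson counts across a range of means.

# Quick check of the classic Poisson variance-stabilizing transform 2*sqrt(X):
# the transformed variance is roughly 1 regardless of the mean (textbook fact).
import numpy as np

rng = np.random.default_rng(7)
for lam in [2, 5, 20, 100]:
    x = rng.poisson(lam, size=200_000)
    print(f"lambda={lam:>3}:  var(X)={x.var():7.1f}   var(2*sqrt(X))={np.var(2 * np.sqrt(x)):.3f}")
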
Tue 31.05.2011
15:15-16:30
Steve Scott
Google
Abstract
A multi-armed bandit is a sequential experiment with the goal of accumulating the largest possible reward from a payoff distribution with unknown parameters that are learned through experimentation. This talk describes a heuristic for managing multi-armed bandits called randomized probability matching, which randomly allocates observations to arms according to the Bayesian posterior probability that each arm is optimal. Advances in Bayesian computation have made randomized probability matching easy to apply to virtually any payoff distribution. This flexibility frees the experimenter to work with payoff distributions that correspond to certain classical experimental designs that have the potential to outperform "optimal" sequential methods.
ZüKoSt Zürcher Kolloquium über Statistik
A Modern Bayesian Look at the Multi-Armed Bandit
HG G 19.1
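
For Bernoulli rewards with Beta priors, the randomized probability matching heuristic described above reduces to drawing one sample per arm from its posterior and playing the arm with the largest draw (also known as Thompson sampling). The sketch below illustrates this on made-up arm success rates; it is an illustration of the general idea, not code from the talk.

# Randomized probability matching (Thompson sampling) for a Bernoulli bandit with
# independent Beta(1,1) priors; the arm success rates below are made up.
import numpy as np

rng = np.random.default_rng(8)
true_rates = np.array([0.05, 0.04, 0.07])      # unknown to the algorithm
successes = np.ones(3)                          # Beta posterior parameters (alpha)
failures = np.ones(3)                           # Beta posterior parameters (beta)

pulls = np.zeros(3, dtype=int)
for _ in range(5000):
    # one posterior draw per arm; play the arm whose draw is largest
    draws = rng.beta(successes, failures)
    arm = int(np.argmax(draws))
    reward = rng.random() < true_rates[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward
    pulls[arm] += 1

print("pulls per arm:", pulls)                  # concentrates on the best arm (index 2)
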
Thu 16.06.2011
15:15-16:30
Jonas Peters
Max-Planck-Campus, Tübingen
Abstract
This work addresses the following question: under what assumptions on the data generating process can one infer the causal graph from the joint distribution? Constraint-based methods like the PC algorithm assume the Markov condition and faithfulness. These two conditions relate conditional independences and the graph structure, which allows one to infer properties of the graph from conditional independences that can be found in the joint distribution. These methods, however, encounter the following difficulties: (1) one can discover causal structures only up to Markov equivalence classes; in particular, one cannot distinguish between X -> Y and Y -> X. (2) Conditional independence testing is very difficult in practice. (3) When the process is not faithful, the results may be wrong, but the user does not realize it. We propose an alternative by defining Identifiable Functional Model Classes (IFMOCs) and provide the example of additive noise models with additional constraints (e.g. X3 = f(X1, X2) + N, but N should not be Gaussian when f is linear). Based on these classes we develop a causal inference method that overcomes some of the difficulties from before: (1) one can identify causal relationships even within an equivalence class. (2) Intuitively, fitting the model is in a sense easier than conditional independence testing. (3) We do not require faithfulness, but rather impose a model class on the data. When the model assumptions are violated, however (e.g. the data do not follow the considered IFMOC or some of the variables are unobserved), the method would output "I do not know" rather than giving wrong answers. We regard our work as being theoretical. Although results on simulated data and on some real-world data sets look promising, extensive experiments on real systems are necessary to verify the proposed principles.
Research Seminar in Statistics
Causal Inference using Identifiable Functional Model Classes.
HG G 19.1
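
A rough illustration of the additive-noise-model idea behind the IFMOC framework above (my own sketch): fit a nonlinear regression in both directions and prefer the direction whose residuals look more independent of the input. The estimated mutual information between residuals and input is used here only as a crude stand-in for a proper independence test; this is not the authors' procedure, and the data-generating model is made up.

# Rough additive-noise-model direction test: regress each variable on the other
# with a nonlinear fit and prefer the direction whose residuals look more
# independent of the input (mutual information as a crude independence score).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(9)
n = 2000
x = rng.standard_normal(n)
y = x ** 3 + rng.uniform(-1, 1, n)              # true direction: X -> Y, non-Gaussian noise

def residual_dependence(cause, effect):
    model = GradientBoostingRegressor().fit(cause.reshape(-1, 1), effect)
    resid = effect - model.predict(cause.reshape(-1, 1))
    return mutual_info_regression(cause.reshape(-1, 1), resid, random_state=0)[0]

score_xy = residual_dependence(x, y)            # residual dependence if X -> Y
score_yx = residual_dependence(y, x)            # residual dependence if Y -> X
print("inferred direction:", "X -> Y" if score_xy < score_yx else "Y -> X")
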