Seminar overview

Autumn Semester 2011

Date & Time Speaker Title Location
Fri 21.10.2011
15:15-16:30
Marco Scarsini
LUISS - Libera Università Internazionale degli Studi Sociali Guido Carli
Abstract
We compare estimators of the (essential) supremum and the integral of a function f defined on a measurable space when f may be observed at a sample of points in its domain, possibly with error. The estimators compared vary in their levels of stratification of the domain, with the result that more refined stratification is better with respect to different criteria. The emphasis is on criteria related to stochastic orders. For example, rather than compare estimators of the integral of f by their variances (for unbiased estimators) or mean square error, we attempt the stronger comparison of convex order when possible. For the supremum, the criterion is based on the stochastic order of the estimators. Joint work with Larry Goldstein and Yosi Rinott.
Research Seminar in Statistics
Stochastic comparisons of stratified sampling techniques for some Monte Carlo estimators
HG G 19.1
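As a small illustration of the setting of this talk (not the speaker's analysis), the following Python sketch compares a crude Monte Carlo estimator of an integral with a proportionally stratified one; the integrand, sample size and number of replications are arbitrary choices, and only variances are compared rather than the stronger convex order studied in the talk.

```python
# Illustrative comparison of a crude and a stratified Monte Carlo estimator
# of the integral of f on [0, 1]. Both are unbiased; stratification typically
# reduces the variance.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(-x) * np.sin(5 * x)    # example integrand (arbitrary)
n = 100                                      # points per estimate
reps = 5000                                  # replications to estimate variances

def crude(n):
    # plain Monte Carlo: n i.i.d. uniform points on [0, 1]
    return f(rng.uniform(0, 1, n)).mean()

def stratified(n):
    # proportional stratification: one uniform point in each of n equal strata
    u = (np.arange(n) + rng.uniform(0, 1, n)) / n
    return f(u).mean()

crude_vals = np.array([crude(n) for _ in range(reps)])
strat_vals = np.array([stratified(n) for _ in range(reps)])
print("crude MC      mean %.5f  var %.2e" % (crude_vals.mean(), crude_vals.var()))
print("stratified MC mean %.5f  var %.2e" % (strat_vals.mean(), strat_vals.var()))
```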
Thu 27.10.2011
16:15-17:30
Tanja Stadler
ETH Zürich, Institute of Integrative Biology
Abstract
Phylogenetic trees of present-day species allow the inference of the rates of speciation and extinction which led to the present-day diversity. Classically, inference methods assume a constant rate of diversification. I will present a new inference methodology which can estimate changes in diversification rates through time, can detect mass extinction events, and can account for density-dependent speciation. The method is based on an in-depth analysis of a birth-death process with birth and death parameters being a function of time and/or the number of individuals alive. I use the method to test the hypothesis of accelerated mammalian diversification following the extinction of the dinosaurs (65 Ma); none of the analyzed mammalian phylogenies showed a change in diversification rates at 65 Ma. Application of the method to bird data (Dendroica) reveals a density-dependent speciation process, agreeing with previous studies. The new method further allows one to quantify the extinction rate, which is estimated to be significantly larger than zero for these birds. The methods can easily be applied to other phylogenies using the R package TreePar, available on CRAN.
ZüKoSt Zürcher Kolloquium über Statistik
Recovering macroevolutionary processes using phylogenetic methods
HG G 19.1
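For readers unfamiliar with birth-death models of diversification, here is a toy Python simulation of a birth-death process whose speciation rate shifts at a fixed time; all rates, times and the shift point are invented for illustration, and the actual inference on phylogenies is done with the R package TreePar mentioned in the abstract, not with this sketch.

```python
# Toy Gillespie-style simulation of a birth-death process with a shift in the
# speciation rate at a fixed time (piecewise-constant rates).
import numpy as np

rng = np.random.default_rng(1)

def simulate_bd(t_max=10.0, t_shift=5.0, lam=(0.3, 0.6), mu=0.1, n0=1):
    """Speciation rate lam[0] before t_shift, lam[1] afterwards; constant
    extinction rate mu. Returns the number of species alive at t_max."""
    t, n = 0.0, n0
    while t < t_max and n > 0:
        rate_lam = lam[0] if t < t_shift else lam[1]
        total = n * (rate_lam + mu)
        wait = rng.exponential(1.0 / total)
        # if the proposed event would cross the rate shift, restart the clock
        # at the shift (valid by memorylessness for piecewise-constant rates)
        if t < t_shift <= t + wait:
            t = t_shift
            continue
        t += wait
        if t >= t_max:
            break
        if rng.uniform() < rate_lam / (rate_lam + mu):
            n += 1          # speciation
        else:
            n -= 1          # extinction
    return n

sizes = [simulate_bd() for _ in range(1000)]
print("mean number of species at t_max:", np.mean(sizes))
```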
Thu 03.11.2011
16:15-17:30
Niel Hens
Universität Hasselt
Abstract
Hepatitis A is one of the most common vaccine-preventable infectious diseases, causing significant though usually self-limiting morbidity and mortality (especially in developing-country settings). Vaccination of individuals, implemented for more than ten years according to various, mostly targeted strategies, together with improved sanitary conditions, has contributed to a substantial reduction of the economic burden associated with disease management. We aim to document and analyse the evolving epidemiology in Belgium, and use it as an example to demonstrate novel methods for the estimation of infectious disease parameters, while accounting for vaccine-unrelated time heterogeneity and vaccine uptake. Using two age-specific seroprevalence datasets from 1993 and 2002, respectively, we show how to estimate important epidemiological parameters in a time-heterogeneous setting. More specifically, using a semi-parametric proportional hazards model we show how the time-heterogeneous transmission parameters, and consequently the basic reproduction number, can be estimated. We supplement the analysis of serial seroprevalence data with an analysis of final size data on a series of recent hepatitis A clusters. In the absence of knowledge about the number of initial cases, several authors have inferred the effective reproduction number based on final size data. We extend these approaches, taking into account data complexities such as truncation, censoring and heterogeneity. Moreover, using a spatial analysis we are able to link these results to the results obtained from analysing serial seroprevalence data. The basic reproduction number has been shown to decrease over the past few decades and is currently estimated at about 1. The effective reproduction number is estimated to be smaller than one, even in provinces where this number is relatively higher due to a greater presence of second-generation immigrants who maintain strong links with HAV-endemic countries. In conclusion, hepatitis A is no longer endemic in Belgium, likely due to improved sanitary conditions. This changing situation also indicates that susceptible people, especially children, who travel to countries where hepatitis A is still endemic should be preferred recipients of the vaccine.
ZüKoSt Zürcher Kolloquium über Statistik
The Statistical Analysis of Serial Seroprevalence and Final Size Data to Estimate Infectious Disease Parameters for Hepatitis A in Belgium
HG G 19.1
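As a much-simplified companion to the abstract above, the sketch below fits a time-homogeneous catalytic model, P(seropositive by age a) = 1 - exp(-λa), to invented age-specific seroprevalence counts by maximum likelihood; the talk itself works with time-heterogeneous models and real Belgian survey data.

```python
# Minimal sketch: estimate a constant force of infection from age-specific
# seroprevalence via a simple catalytic model. All counts are fictitious.
import numpy as np
from scipy.optimize import minimize_scalar

ages       = np.array([2, 5, 10, 20, 30, 40, 50], dtype=float)
n_tested   = np.array([80, 90, 100, 120, 110, 100, 90])
n_positive = np.array([5, 10, 18, 35, 45, 55, 60])

def neg_log_lik(lam):
    p = 1.0 - np.exp(-lam * ages)            # catalytic model
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.sum(n_positive * np.log(p) + (n_tested - n_positive) * np.log(1 - p))

fit = minimize_scalar(neg_log_lik, bounds=(1e-6, 1.0), method="bounded")
print("estimated force of infection per year:", fit.x)
```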
Fri 11.11.2011
15:15-16:30
Florian Frommlet
Universität Wien
Abstract
In many research areas today the number of features p for which data are collected is much larger than the sample size n on which inference is based. This is especially true for genetic applications like QTL mapping or genome-wide association studies (GWAS). Sparsity is a key notion for being able to perform statistical analysis when p >> n: it means that the number of true signals is small compared with the sample size. This talk will focus on certain modifications of Schwarz's Bayesian information criterion (mBIC and mBIC2) which have been developed to perform model selection under sparsity. These selection criteria are designed in such a way that, in the case of orthogonal regressors, mBIC controls the family-wise error rate while mBIC2 controls the false discovery rate. After introducing the notion of asymptotic Bayes optimality under sparsity (ABOS), we will present recent results concerning some classical multiple testing procedures: while the Bonferroni procedure is ABOS only in the case of extreme sparsity, it turns out that the Benjamini-Hochberg procedure nicely adapts to the unknown level of sparsity. These results can be translated to mBIC and mBIC2 in the context of model selection. While the theory has so far been developed only for the case of orthogonal designs, simulation studies indicate that the good properties of mBIC and mBIC2 also hold in more general situations. We will discuss the case of densely spaced markers in QTL mapping with experimental populations, for which specific theory has been developed on how to account for the correlation structure of markers. Finally we will present results from a comprehensive simulation study based on real SNP data, which illustrate the relevance of our approach for analyzing GWAS data.
Research Seminar in Statistics
Modifications of BIC for model selection under sparsity: Theory and applications in genetics
HG G 19.1
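To make the flavour of mBIC-type model selection concrete, here is a small Python sketch that exhaustively scores all submodels of a toy sparse regression with one commonly quoted form of the penalties (mBIC adding 2k·log(p/4) to BIC, and mBIC2 further subtracting 2·log(k!)); the exact definitions and constants should be taken from the speaker's papers, not from this sketch.

```python
# Exhaustive model selection with mBIC-style penalties on a toy example.
import itertools
import math
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 8                          # small p so exhaustive search is feasible
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[[0, 3]] = 2.0         # two true signals
y = X @ beta + rng.standard_normal(n)

def criteria(idx):
    k = len(idx)
    if k:
        Xk = X[:, list(idx)]
        coef, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ coef
    else:
        resid = y
    rss = float(resid @ resid)
    base = n * math.log(rss / n) + k * math.log(n)     # ordinary BIC (up to a constant)
    mbic = base + 2 * k * math.log(p / 4)              # assumed mBIC form
    mbic2 = mbic - 2 * math.log(math.factorial(k))     # assumed mBIC2 correction
    return mbic, mbic2

models = [idx for r in range(p + 1) for idx in itertools.combinations(range(p), r)]
print("mBIC selects columns: ", min(models, key=lambda m: criteria(m)[0]))
print("mBIC2 selects columns:", min(models, key=lambda m: criteria(m)[1]))
```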
Thu 17.11.2011
16:15-17:30
Erik van Zwet
Universität Leiden
Abstract
Researchers often want to know if one thing causes another. Statisticians respond that it is possible to test for association, but that association does not imply causation -- at least, not without further assumptions. Causal inference is about exploring what happens if we are willing to make such additional assumptions. Unfortunately, with its particular notation and terminology, causal inference seems very different from standard statistics. Judea Pearl, who wrote a book on causality, even states: "Almost by definition, causal and statistical concepts do not mix". I respectfully disagree, and in this talk I will argue that causal and statistical concepts mix very well.
ZüKoSt Zürcher Kolloquium über Statistik
Introduction to Causal Inference
HG G 19.1
Fri 18.11.2011
15:15-16:30
Davy Paindaveine
Universität Brüssel
Abstract
We consider semiparametric location-scatter models for which the p-variate observation is obtained as X = ΛZ + μ, where μ is a p-vector, Λ is a full-rank p × p matrix, and the (unobserved) random p-vector Z has marginals that are centered and mutually independent but are otherwise unspecified. As in blind source separation and independent component analysis (ICA), the parameter of interest throughout the paper is Λ. On the basis of n i.i.d. copies of X, we develop, under a symmetry assumption on Z, signed-rank one-sample testing and estimation procedures for Λ. We exploit the uniform local and asymptotic normality (ULAN) of the model to define signed-rank procedures that are semiparametrically efficient under correctly specified densities. Yet, as usual in rank-based inference, the proposed procedures remain valid (correct asymptotic size under the null, for hypothesis testing, and root-n consistency, for point estimation) under a very broad range of densities. We derive the asymptotic properties of the proposed procedures and investigate their finite-sample behavior through simulations.
Research Seminar in Statistics
Semiparametrically Efficient Inference Based On Signed Ranks In Symmetric Independent Component Models
HG G 19.1
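The multivariate signed-rank procedures of the talk are beyond a short example, but the basic phenomenon they rely on, namely signed-rank tests keeping their nominal size across very different symmetric densities, can be illustrated in one dimension with the classical Wilcoxon signed-rank test; the densities and sample sizes below are arbitrary.

```python
# Empirical size of the one-sample Wilcoxon signed-rank test under several
# symmetric null densities (the null of symmetry about 0 is true in all cases).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps, alpha = 50, 2000, 0.05

for name, sampler in [("normal", lambda: rng.standard_normal(n)),
                      ("Student t(2)", lambda: rng.standard_t(2, n)),
                      ("Laplace", lambda: rng.laplace(0, 1, n))]:
    rejections = 0
    for _ in range(reps):
        x = sampler()
        if stats.wilcoxon(x).pvalue < alpha:
            rejections += 1
    print(f"{name:12s} empirical size: {rejections / reps:.3f}")
```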
Fri 25.11.2011
15:15-16:30
Yanyuan Ma
Texas A&M University, Department of Statistics
Abstract
We provide a novel approach to dimension reduction problems that is completely different from the existing literature. We cast the dimension reduction problem in a semiparametric estimation framework and derive estimating equations. Viewing the problem from this new angle allows us to derive a rich class of estimators and to obtain the classical dimension reduction techniques as special cases within this class. The semiparametric approach also reveals that, in the inverse regression context, the common assumptions of linearity and/or constant variance on the covariates can be removed while keeping the estimation structure intact, at the cost of performing an additional nonparametric regression. The semiparametric estimators without these common assumptions are illustrated through simulation studies and a real data example.
Research Seminar in Statistics
A Semiparametric View to Dimension Reduction
HG G 19.1
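One of the classical techniques recovered as a special case of the semiparametric framework is sliced inverse regression; the following self-contained Python sketch implements plain SIR on simulated single-index data (the model, dimensions and slice count are arbitrary illustrative choices).

```python
# Minimal sliced inverse regression (SIR): standardize X, average within
# slices of the response, and take leading eigenvectors of the covariance
# of the slice means.
import numpy as np

rng = np.random.default_rng(4)
n, p = 500, 6
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -1.0, 0, 0, 0, 0])
y = (X @ beta_true) ** 3 + rng.standard_normal(n)    # monotone single index

def sir(X, y, n_slices=10, n_dir=1):
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    L = np.linalg.cholesky(Sigma)
    Z = np.linalg.solve(L, Xc.T).T                    # standardized predictors
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    M = np.zeros((p, p))
    for s in slices:                                   # covariance of slice means
        m = Z[s].mean(axis=0)
        M += (len(s) / n) * np.outer(m, m)
    vals, vecs = np.linalg.eigh(M)
    dirs = np.linalg.solve(L.T, vecs[:, ::-1][:, :n_dir])   # back to X scale
    return dirs / np.linalg.norm(dirs, axis=0)

print("estimated direction:", sir(X, y).ravel().round(3))
```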
Thu 01.12.2011
16:15-17:30
Steffen Unkel
Open University, UK
Abstract
The relative frailty variance among survivors provides a readily interpretable measure of how the heterogeneity of a population, as represented by a frailty model, evolves over time. In the first part of this talk, a new measure for assessing the temporal variation in the strength of association in bivariate current status data is proposed. This novel measure is relevant for shared frailty models. We show that this measure is particularly convenient, owing to its connection with the relative frailty variance and its interpretability in suggesting appropriate frailty models. We introduce a method of estimation and standard errors for this measure. We discuss its properties and compare it to two existing measures of association applicable to current status data. Small-sample performance of the measure in realistic scenarios is investigated using simulations. In the second part of this talk, we investigate the possible shapes of the relative frailty variance function for the purpose of model selection, and review available frailty distribution families in this context. Several new families of frailty distributions are introduced, including simple but flexible time-varying frailty models. The methods are illustrated with bivariate serological survey data on different pairs of infections.
ZüKoSt Zürcher Kolloquium über Statistik
On assessing time-varying association for shared frailty models with bivariate current status data
HG G 19.1
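To give a concrete handle on the quantity in the first part of the abstract, the sketch below numerically evaluates the relative frailty variance among survivors, RFV(t) = Var(Z | T > t) / E[Z | T > t]^2, under a unit baseline hazard for two illustrative frailty distributions; the gamma family gives a constant RFV while other families generally do not, which is the kind of shape information used for model selection, but the distributions and parameters here are arbitrary.

```python
# Numerical relative frailty variance among survivors under baseline hazard 1:
# the frailty density among survivors at time t is proportional to f(z)*exp(-z*t).
import numpy as np
from scipy import stats
from scipy.integrate import quad

def rfv(frailty_pdf, t, upper=50.0):
    m = [quad(lambda z: z**k * frailty_pdf(z) * np.exp(-z * t), 0, upper)[0]
         for k in (0, 1, 2)]
    mean = m[1] / m[0]
    var = m[2] / m[0] - mean**2
    return var / mean**2

gamma_pdf = stats.gamma(a=2.0, scale=0.5).pdf                  # mean 1, variance 0.5
lognorm_pdf = stats.lognorm(s=0.8, scale=np.exp(-0.32)).pdf    # roughly mean 1

for t in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(f"t={t:3.1f}  gamma RFV={rfv(gamma_pdf, t):.3f}"
          f"  lognormal RFV={rfv(lognorm_pdf, t):.3f}")
```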
Fri 02.12.2011
15:15-16:30
Marc G. Genton
Texas A&M University, Department of Statistics
Abstract
In many statistical experiments the observations are functions by nature, such as temporal curves or spatial surfaces/images, where the basic unit of information is the entire observed function rather than a string of numbers. For example, the temporal evolution of several cells, the intensity of medical images of the brain from MRI, the spatio-temporal records of precipitation in the U.S., or the output from climate models are such complex data structures. Our interest lies in the visualization of such data and the detection of outliers. With this goal in mind, we have defined functional boxplots and surface boxplots. Based on the center-outwards ordering induced by band depth for functional or surface data, the descriptive statistics of such boxplots are the envelope of the 50% central region, the median curve/image, and the maximum non-outlying envelope. In addition, outliers can be detected in a functional/surface boxplot by the empirical rule of 1.5 times the 50% central region, analogous to the rule for classical boxplots. We illustrate the construction of a functional boxplot on a series of sea surface temperatures related to the El Niño phenomenon, and its outlier detection performance is explored by simulations. As applications, the functional boxplot is demonstrated on spatio-temporal U.S. precipitation data for nine climatic regions and on climate general circulation model (GCM) output. Further adjustments of the functional boxplot for outlier detection in spatio-temporal data are discussed as well. The talk is based on joint work with Ying Sun.
Research Seminar in Statistics
Functional Boxplots for Visualization of Complex Curve/Image Data: An Application to Precipitation and Climate Model Output
HG G 19.1
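The following Python sketch (with arbitrary simulated curves, not the data from the talk) computes a pairwise modified band depth, orders curves from the center outwards, and extracts the functional-boxplot ingredients described in the abstract: the median curve, the 50% central envelope, and outliers flagged by the 1.5-rule.

```python
# Modified band depth (bands formed by pairs of curves) and functional-boxplot
# summaries for a small set of simulated curves with one planted outlier.
import numpy as np

rng = np.random.default_rng(5)
n_curves, n_grid = 30, 100
t = np.linspace(0, 1, n_grid)
curves = np.array([np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal()
                   + 0.1 * rng.standard_normal(n_grid) for _ in range(n_curves)])
curves[0] += 2.0                                  # obvious outlier

def modified_band_depth(curves):
    n = len(curves)
    depth = np.zeros(n)
    for i in range(n):
        inside = 0.0
        for j in range(n):
            for k in range(j + 1, n):
                lo = np.minimum(curves[j], curves[k])
                hi = np.maximum(curves[j], curves[k])
                inside += np.mean((curves[i] >= lo) & (curves[i] <= hi))
        depth[i] = inside / (n * (n - 1) / 2)     # average over all pairs
    return depth

depth = modified_band_depth(curves)
order = np.argsort(depth)[::-1]                   # deepest first
median_curve = curves[order[0]]
central = curves[order[: n_curves // 2]]          # 50% central region
lo, hi = central.min(axis=0), central.max(axis=0)
fence_lo, fence_hi = lo - 1.5 * (hi - lo), hi + 1.5 * (hi - lo)
outliers = [i for i in range(n_curves)
            if np.any(curves[i] < fence_lo) or np.any(curves[i] > fence_hi)]
print("flagged outlier curves:", outliers)
```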
Thu 08.12.2011
16:15-17:30
Marcel Dettling
Zürcher Hochschule für angewandte Wissenschaften, Winterthur
Abstract
In the Canton of Lucerne the pollutant input into Lake Sempach is monitored. Discharge is measured continuously, whereas pollutant concentrations are only measured periodically. After statistical analysis, the unobserved concentrations are estimated with a regression approach and finally aggregated into a total load. The nonlinear regression model originally used by the water and soil experts caused "problems": in a first step it had to be made fit for use by estimating starting values, restricting parameter ranges, and decoupling the parameters through a reparametrization. In addition, a novel second approach was developed which "linearises" the application problem at hand with a simple heuristic. This reduces the technical and mathematical challenges and, at the same time, allows additional explanatory variables to be included as well as a bootstrap-based error calculation. The talk contrasts and compares the two approaches, highlights the challenges of the practical implementation, and explains the improvements made in short theory blocks.
ZüKoSt Zürcher Kolloquium über Statistik
Modelling pollutant loads in the watercourses around Lake Sempach
HG G 19.1
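A toy Python version of the "linearised" approach mentioned in the abstract, with entirely simulated data: regress log concentration on log discharge at the sparsely measured times, predict concentrations over the whole discharge record, aggregate to a total load, and attach a bootstrap standard error. The real analysis uses the Canton Lucerne monitoring data and a more careful model; retransformation bias is ignored here.

```python
# Load estimation from a continuous discharge record and sparse concentration
# measurements, with a bootstrap standard error. All data are simulated.
import numpy as np

rng = np.random.default_rng(6)

n_all, n_lab = 2000, 60
discharge = rng.lognormal(mean=1.0, sigma=0.5, size=n_all)       # e.g. hourly record
lab_idx = rng.choice(n_all, size=n_lab, replace=False)            # measured times
conc_lab = 0.8 * discharge[lab_idx] ** 0.4 * rng.lognormal(0.0, 0.2, n_lab)

def total_load(idx, conc):
    # fit log(concentration) = a + b * log(discharge) on the measured subset
    b, a = np.polyfit(np.log(discharge[idx]), np.log(conc), 1)
    conc_hat = np.exp(a + b * np.log(discharge))      # predict everywhere
    return np.sum(discharge * conc_hat)               # aggregate into a total load

estimate = total_load(lab_idx, conc_lab)
boot = []
for _ in range(500):
    res = rng.integers(0, n_lab, n_lab)               # resample lab measurements
    boot.append(total_load(lab_idx[res], conc_lab[res]))
print(f"estimated total load {estimate:.0f}, bootstrap s.e. {np.std(boot):.0f}")
```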
Thu 15.12.2011
16:15-17:30
Johannes Textor
Universität Utrecht
Abstract
A causal diagram (also called a Bayesian network, graphical model, or DAG) encodes assumptions about causal relationships between a set of observed and unobserved variables of interest. Provided that the encoded assumptions are correct, one can use the causal diagram to determine whether and how it is possible to estimate a causal effect of interest from observed (non-experimental) data by means of covariate adjustment. This is a key methodological issue in empirical disciplines like epidemiology, psychology, and the social sciences. Depending on the type of causal effect to be estimated (e.g. total, direct, or mediated effect), there exist different criteria for deciding on the validity of adjustment for a given covariate set. Unfortunately, most of the existing criteria are not complete, in the sense that an adjustment may still be valid even if the criterion is violated. Moreover, they lead to exponential-time algorithms, which may be prohibitive for even moderately sized diagrams. We propose a new criterion that unifies several existing criteria and is both sound and complete. Moreover, we discuss under which circumstances it is possible to efficiently enumerate all minimal covariate sets in a causal diagram that fulfill the criterion. Finally, we briefly describe our implementation of this method in the open source tool DAGitty (www.dagitty.net), and outline its relevance in current epidemiological research.
Research Seminar in Statistics
Using Causal Diagrams to Dissect Causal from Biasing Effects
HG G 19.1
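As a pointer to what "validity of adjustment" means in the simplest case, the sketch below checks the classical back-door condition for a total effect on a small invented DAG using networkx (Z must contain no descendant of X and must d-separate X from Y once the edges leaving X are removed); this is the textbook criterion, not the sound-and-complete criterion of the talk, and DAGitty itself is the tool to use in practice.

```python
# Back-door adjustment check on a hypothetical causal diagram:
# a confounder U of X and Y, and a mediator M on the path X -> M -> Y.
import networkx as nx

G = nx.DiGraph([("U", "X"), ("U", "Y"), ("X", "M"), ("M", "Y")])

def valid_backdoor(G, x, y, Z):
    Z = set(Z)
    if Z & nx.descendants(G, x):              # no descendants of X allowed in Z
        return False
    G_back = G.copy()
    G_back.remove_edges_from(list(G.out_edges(x)))
    # d_separated is called is_d_separator in recent networkx releases
    return nx.d_separated(G_back, {x}, {y}, Z)

for Z in [set(), {"U"}, {"M"}, {"U", "M"}]:
    print("adjusting for", sorted(Z) or "nothing", "-> valid:",
          valid_backdoor(G, "X", "Y", Z))
```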
Fri 20.01.2012
15:15-16:30
Boaz Nadler
Weizmann Institute of Science, Israel
Abstract
Roy's largest root is one of the four most common tests in multivariate analysis of variance (MANOVA), with applications in many other problems, including signal detection in noise and canonical correlation analysis. The other three popular tests, namely Wilks' Lambda, the Hotelling-Lawley trace and the Pillai-Bartlett trace, have been thoroughly studied, and accurate F-type approximations to their distributions have been derived. In contrast, accurate and tractable approximations to the distribution of Roy's largest root test have so far resisted such analysis and remained an open problem for several decades. In this talk, I'll derive a simple yet accurate approximation for the distribution of Roy's largest root test in the extreme case of a rank-one alternative, also known as concentrated non-centrality, where the difference between groups is concentrated in a single direction or, similarly, only a single signal is present. Our results allow power calculations for Roy's test and provide a lower bound on the minimal number of samples required to detect a given group difference or a given signal strength. Joint work with Iain Johnstone (Stanford).
Research Seminar in Statistics
On Roy's largest root test for signal detection in noise, MANOVA and canonical correlation analysis
HG G 19.1
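To fix notation, here is a minimal Python computation of Roy's largest root statistic in a two-group MANOVA (the largest eigenvalue of W⁻¹B, with B and W the between- and within-group scatter matrices) on simulated data with a rank-one group difference, mirroring the concentrated non-centrality setting; group sizes, dimension and effect size are arbitrary.

```python
# Roy's largest root statistic for a one-way MANOVA with two groups.
import numpy as np

rng = np.random.default_rng(7)
p, n_per_group = 5, 40
shift = np.array([1.0, 0, 0, 0, 0])               # rank-one group difference
X1 = rng.standard_normal((n_per_group, p))
X2 = rng.standard_normal((n_per_group, p)) + shift

def roys_largest_root(groups):
    grand = np.vstack(groups).mean(axis=0)
    B = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
            for g in groups)                      # between-group scatter
    W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
    eigvals = np.linalg.eigvals(np.linalg.solve(W, B))
    return float(np.max(eigvals.real))            # largest eigenvalue of W^{-1} B

print("Roy's largest root:", round(roys_largest_root([X1, X2]), 3))
```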