Seminar overview

Autumn Semester 2009

Date & Time Speaker Title Location
Fri 11.09.2009
15:15-16:15
Nicolai Meinshausen
University of Oxford, UK
Abstract
In high-dimensional regression and classification, there is a natural tradeoff between the simplicity of an algorithm and its predictive power. Simple procedures like trees are intuitive to understand, yet they are clearly beaten in terms of predictive accuracy by more complex methods like tree ensembles, including Random Forests. I will share some thoughts and notes on this and show that a convex relaxation to an optimal partitioning of the data yields a new algorithm, which I call group aggregation. Predictions are simply weighted averages over empirical group means, for suitably selected groups of observations. The group selection amounts to a quadratic programming problem and can be solved efficiently. Even though the algorithm contains no explicit tuning parameters in its most simple version, group aggregation gets close to the predictive power of Random Forests, while maintaining or surpassing the simplicity of trees. An application to emulation in climate modeling will be discussed.
Research Seminar in Statistics
Trees, Forests and Group aggregation
HG G 19.1
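For orientation, the tradeoff the abstract starts from can be seen in a few lines of code. This is a minimal sketch, assuming scikit-learn is available and using a synthetic regression problem; it only illustrates the tree-versus-forest accuracy gap and is not the speaker's group aggregation method.

```python
# A minimal sketch (not the speaker's group-aggregation code) of the
# simplicity/accuracy tradeoff the talk starts from: a single regression
# tree versus a Random Forest on a synthetic regression problem.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=500, n_features=10, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("single tree R^2:  ", tree.score(X_te, y_te))
print("random forest R^2:", forest.score(X_te, y_te))
```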
Fri 18.09.2009
15:15-16:15
Valerie Isham
University College London, UK
Abstract
The Susceptible-Infected-Removed (SIR) epidemic model is a fundamental model for the spread of infection in a homogeneously-mixing population. It is a special case of a more general stochastic rumour model in which there is an extra interaction. Thus, not only does an ignorant (susceptible) contacted by a spreader (infective) become a spreader, and spreaders may "forget" the rumour and become stiflers (removals), but also spreaders may become stiflers if they attempt to spread the rumour to a spreader or stifler (who will have already heard it). For both epidemics and rumours, there is particular interest in using a random network to represent population structure, with applications to the spread of infection or information on social networks. The talk will discuss a) the effect of the population size on thresholds for epidemic/rumour spread, and b) the effect of different network structures.
Research Seminar in Statistics
Rumours and epidemics on random networks
HG G 19.1
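A toy simulation can make the extra rumour interaction in the abstract concrete. The sketch below, assuming networkx is available, runs a simple Maki-Thompson-style rumour on an Erdős-Rényi random graph; the graph parameters, update rule and stopping condition are illustrative assumptions, not the models analysed in the talk.

```python
# Toy rumour spread on a random network: an ignorant contacted by a spreader
# becomes a spreader; a spreader contacting someone who has already heard the
# rumour becomes a stifler. Parameters are illustrative only.
import random
import networkx as nx

def simulate_rumour(n=1000, p=0.01, seed=0):
    rng = random.Random(seed)
    g = nx.erdos_renyi_graph(n, p, seed=seed)
    state = {v: "ignorant" for v in g}
    start = rng.choice(list(g))
    state[start] = "spreader"
    spreaders = {start}
    while spreaders:
        u = rng.choice(list(spreaders))
        neighbours = list(g[u])
        if not neighbours:                      # isolated spreader gives up
            state[u] = "stifler"; spreaders.discard(u); continue
        v = rng.choice(neighbours)
        if state[v] == "ignorant":              # rumour passed on
            state[v] = "spreader"; spreaders.add(v)
        else:                                   # already heard it: u stifles
            state[u] = "stifler"; spreaders.discard(u)
    return sum(s != "ignorant" for s in state.values()) / n

print("final fraction who heard the rumour:", simulate_rumour())
```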
Thu 01.10.2009
16:15-17:30
Jörg Rahnenführer
Technische Universität Dortmund
Abstract
Bioinformatics research in the post-genomic era has to cope with a flood of high-dimensional data sets. The ultimate goal is a personalized medicine that uses measurements from individual patients for an improved diagnosis and therapy of diseases. The high complexity and noise levels in the data require the development and application of suitable statistical models and algorithmic procedures. However, to answer biologically relevant questions, expertise in statistics and computer science has to be combined with meaningful biological modelling. The result of a typical microarray experiment is a long list of genes with corresponding expression measurements. The interpretation of such high-dimensional data is difficult, both statistically and with regard to biology and medicine. A modern, popular and promising approach for a meaningful dimension reduction is to integrate biological a priori knowledge into the analysis, in the form of predefined functional gene groups, for example based on the Gene Ontology (GO). Instead of identifying important single genes, relevant groups of genes with a common biological function are detected. We present two applications for this approach, cancer classification and survival prognosis. In the first part, we describe the general procedure for scoring the statistical significance of gene groups and thereby the impact of corresponding biological processes on cancer classification. In addition, we demonstrate how this approach can be improved by integrating information on the relationships between gene groups. In the second part, we show how gene groups can be used for building survival prediction models based on the Cox regression model. We apply several feature selection procedures in order to generate predictive models for future patients. We show that adding gene groups as covariates to survival models built from single genes improves interpretability while prediction performance remains stable.
ZüKoSt Zürcher Kolloquium über Statistik
Improved interpretation of microarray data with gene groups: Cancer classification and survival prognosis
HG G 19.1
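The core idea of replacing single genes by gene-group covariates can be sketched in a few lines. Everything below is a simplified illustration with synthetic data: the toy GO groups, the simulated survival times and the use of the lifelines package for the Cox fit are assumptions, not the speaker's pipeline.

```python
# Summarise each predefined gene group by its mean expression and use these
# group scores (instead of thousands of single genes) as Cox covariates.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter   # assumed available

rng = np.random.default_rng(0)
n_patients, n_genes = 120, 200
expr = rng.normal(size=(n_patients, n_genes))              # toy expression matrix
groups = {"GO_A": list(range(0, 50)), "GO_B": list(range(50, 120))}

scores = pd.DataFrame({g: expr[:, idx].mean(axis=1) for g, idx in groups.items()})
true_time = rng.exponential(scale=np.exp(-0.8 * scores["GO_A"].to_numpy()))
cens_time = rng.exponential(scale=2.0, size=n_patients)
df = scores.assign(time=np.minimum(true_time, cens_time),
                   event=(true_time <= cens_time).astype(int))

CoxPHFitter().fit(df, duration_col="time", event_col="event").print_summary()
```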
Fri 02.10.2009
15:15-16:15
Jörg Rahnenführer
Technische Universität Dortmund
Abstract
Human tumors are often associated with typical genetic events like tumor-specific chromosomal alterations. The identification of characteristic pathogenic routes in such tumors can improve the prediction of (disease-free) survival times and thus helps in choosing the optimal therapy. In recent years we have developed a biostatistical model for estimating the most likely pathways of chromosomal alterations from cross-sectional data. In this model, progression is described by the irreversible, typically sequential, accumulation of somatic changes in cancer cells. The model was validated both statistically and clinically in various ways. We have also introduced a method to determine the optimal number of tree components based on a new BIC criterion. The new model is characterized by a high level of interpretability. Further, it allows the introduction of a genetic progression score (GPS) that quantifies univariately the progression status of a disease. Progression of a single patient along such a model is typically correlated with increasingly poor prognosis. Using Cox regression models we could demonstrate that the GPS is a medically relevant prognostic factor that can be used to discriminate between patient subgroups with different expected clinical outcomes. Both for prostate cancer patients and for patients with different types of brain tumors, a higher GPS is correlated with shorter time to relapse or death. The clinical relevance of such a disease progression model depends on the stability of the statistical model estimation process and on the predictive power of the derived progression score regarding survival times. Simulation studies show that the topology of our model cannot always be estimated precisely. We present a study for determining the necessary sample size for recovering a true relationship between genetic progression and disease-free survival times. All studies are performed with the new R package Rtreemix for the estimation of such progression models.
Research Seminar in Statistics
Statistical methods for estimating cancer progression from genetic measurements
HG G 19.1
Thu 22.10.2009
16:15-17:30
Paul Fearnhead
Lancaster University, UK
Abstract
We describe an efficient algorithm for Bayesian analysis of multiple changepoint models. In many scenarios it enables i.i.d. samples from the posterior distribution. Approximate versions (which introduce negligible error) have a computational cost that is linear in the number of observations, and thus can be applied to large data sets (such as arise in modern bioinformatic applications). The method is demonstrated on applications that range from inference about the divergence of Salmonella Typhi and Paratyphi A, to inference about the isochore structure of the human genome.
ZüKoSt Zürcher Kolloquium über Statistik
Efficient Bayesian analysis of multiple changepoint models
HG G 19.1
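As a toy counterpart to the exact recursions mentioned in the abstract, the sketch below computes the exact posterior over a single changepoint location in Gaussian data with known noise variance and a conjugate prior on each segment mean. All priors and data are illustrative assumptions; the multiple-changepoint algorithms of the talk are far more general.

```python
# Exact posterior over one changepoint in Gaussian data: each segment mean has
# a conjugate N(mu0, tau^2) prior, noise variance is known.
import numpy as np
from scipy.stats import multivariate_normal

def log_marginal(y, sigma=1.0, mu0=0.0, tau=5.0):
    """Marginal log-likelihood of one segment with mean ~ N(mu0, tau^2)."""
    n = len(y)
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    return multivariate_normal.logpdf(y, mean=np.full(n, mu0), cov=cov)

def changepoint_posterior(y):
    n = len(y)
    logp = np.array([log_marginal(y[:k]) + log_marginal(y[k:]) for k in range(1, n)])
    logp -= logp.max()
    p = np.exp(logp)
    return p / p.sum()              # posterior over changepoint positions 1..n-1

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 60), rng.normal(2, 1, 40)])
print("MAP changepoint location:", changepoint_posterior(y).argmax() + 1)
```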
Fri 23.10.2009
15:15-16:15
Paul Fearnhead
Lancaster University, UK
Abstract
We present a general approach for performing sequential importance sampling for general diffusion models. This method avoids any time-discretisation approximation, and thus enables unbiased estimates of expectations of functions of the diffusion. It can be derived by considering simple sequential importance samplers for discrete-time approximations to diffusions, together with the tricks of Rao-Blackwellisation and retrospective sampling. The approach is related to recent work on unbiased estimation (and perfect simulation) of diffusions, but considerably extends the class of diffusion models that can be considered. It is also related to work on unbiased estimation by Wagner (1989). The links to these previous works will be discussed.
Research Seminar in Statistics
Sequential Importance Sampling for General Diffusions
HG G 19.1
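For contrast, here is the discrete-time baseline the abstract improves upon: a simple sequential importance sampler (bootstrap particle filter) with Euler-Maruyama proposals for an Ornstein-Uhlenbeck diffusion observed with Gaussian noise. All parameters are illustrative assumptions, and the exact, discretisation-free construction of the talk is not reproduced here.

```python
# Bootstrap particle filter with Euler-Maruyama transitions: the kind of
# time-discretised approximation that the talk's exact approach avoids.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, obs_sd, dt, n_sub, T = 0.5, 1.0, 0.3, 0.1, 10, 20

# Simulate a "true" OU path and noisy observations at times 1, ..., T.
x, path = 0.0, []
for _ in range(T * n_sub):
    x += -theta * x * dt + sigma * np.sqrt(dt) * rng.normal()
    path.append(x)
obs = np.array(path[n_sub - 1::n_sub]) + obs_sd * rng.normal(size=T)

particles, means = np.zeros(500), []
for y in obs:
    for _ in range(n_sub):                       # propagate between observations
        particles = (particles - theta * particles * dt
                     + sigma * np.sqrt(dt) * rng.normal(size=particles.size))
    logw = -0.5 * ((y - particles) / obs_sd) ** 2
    w = np.exp(logw - logw.max()); w /= w.sum()
    means.append(np.sum(w * particles))          # filtered mean at this time
    particles = rng.choice(particles, size=particles.size, p=w)  # resample

print("last observation:", obs[-1], " filtered mean:", means[-1])
```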
Thu 05.11.2009
16:15-17:30
Juliane Schäfer
University Hospital Basel, Institute for Clinical Epidemiology and Biostatistics
Abstract
HIV may accelerate the loss of renal function. Evidence on the protective effect of combination antiretroviral therapy (cART) on renal function is conflicting, owing to the failure of past studies to adequately model risk factors and cART components known to be related to renal function. We estimate glomerular filtration rate (GFR) with the Modification of Diet in Renal Disease (MDRD) Study equation and consider linear mixed effects models to characterize change over time and the factors that influence change, such as exposure to antiretrovirals. I will present results from this case study and share some thoughts on statistical model building.
ZüKoSt Zürcher Kolloquium über Statistik
Predictors for change in glomerular filtration rate in HIV-infected individuals: the Swiss HIV Cohort Study
HG G 19.1
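For reference, the MDRD Study equation mentioned in the abstract is a simple closed-form formula. The sketch below uses the commonly quoted four-variable coefficients; these are assumptions taken from the usual re-expressed form of the equation and should be checked against the original publication rather than this entry.

```python
# One commonly quoted form of the four-variable MDRD equation (assumed
# coefficients, serum creatinine in mg/dL, GFR in mL/min/1.73 m^2).
def mdrd_gfr(serum_creatinine_mg_dl, age_years, female, black):
    gfr = 175.0 * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        gfr *= 0.742
    if black:
        gfr *= 1.212
    return gfr

print(round(mdrd_gfr(1.1, 45, female=True, black=False), 1))
```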
Thu 12.11.2009
16:15-17:30
Jelle Goeman
Leiden University
Abstract
We propose three-sided testing, a testing framework for simultaneous testing of inferiority, equivalence and superiority in clinical trials, based on the partitioning principle. Like the usual two-sided testing approach, this approach is completely symmetric in the two treatments compared. Still, because the hypotheses of inferiority and superiority are tested with one-sided tests, the proposed approach has more power than the two-sided approach to infer non-inferiority. Applied to the classical point null hypothesis of equivalence, the three-sided testing approach shows that it is sometimes possible to make an inference on the sign of the parameter of interest, even when the null hypothesis itself could not be rejected. Relationships with confidence intervals are explored, and the effectiveness of the three-sided testing approach is demonstrated in a number of recent clinical trials.
ZüKoSt Zürcher Kolloquium über Statistik
Three-sided Hypothesis Testing: Simultaneous Testing of Superiority, Equivalence and Inferiority
HG G 19.1
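A rough sketch of the three questions being asked, using plain one-sided t-tests on synthetic data and an assumed equivalence margin. This only illustrates testing inferiority, equivalence and superiority side by side; it is not the partitioning-based procedure developed in the talk.

```python
# Three one-sided questions about treatment a versus control b, with an
# assumed equivalence margin delta; data are synthetic.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
a = rng.normal(0.30, 1.0, 200)   # treatment
b = rng.normal(0.00, 1.0, 200)   # control
delta = 0.5                      # equivalence margin (assumption)

p_superior = ttest_ind(a, b, alternative="greater").pvalue         # H0: mu_a <= mu_b
p_inferior = ttest_ind(a, b, alternative="less").pvalue            # H0: mu_a >= mu_b
p_equiv = max(ttest_ind(a + delta, b, alternative="greater").pvalue,
              ttest_ind(a - delta, b, alternative="less").pvalue)  # TOST

print(f"superiority p={p_superior:.3f}  inferiority p={p_inferior:.3f}  "
      f"equivalence p={p_equiv:.3f}")
```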
Fri 13.11.2009
15:15-16:15
Jelle Goeman
Leiden University
Abstract
We present a general sequentially rejective multiple testing procedure for multiple hypothesis testing. Many well-known familywise error rate (FWER) controlling methods can be constructed as special cases of this procedure, among them the procedures of Holm, Shaffer and Hochberg, parallel and serial gatekeeping procedures, modern procedures for multiple testing in graphs, resampling-based multiple testing procedures, and even the closed testing and partitioning procedures. It is possible to prove that sequentially rejective multiple testing procedures strongly control the FWER if they fulfill simple criteria of monotonicity of the critical values and weak FWER control in each single step. The sequential rejection principle thus gives a novel theoretical perspective on many well-known multiple testing procedures, emphasizing the sequential aspect. Its main practical usefulness is for the development of multiple testing procedures for null hypotheses, possibly logically related, that are structured in a graph. We illustrate the general procedure with many examples of graph-based and other procedures.
Research Seminar in Statistics
The Sequential Rejection Principle of Familywise Error Rate Control
HG G 19.1
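As one concrete special case recovered by the sequential rejection principle, Holm's step-down procedure can be written in a few lines; the sketch below is a generic illustration, not code from the talk.

```python
# Holm's step-down procedure: test ordered p-values against alpha / (m - rank),
# stopping at the first non-rejection. Controls the FWER strongly.
def holm(pvalues, alpha=0.05):
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    rejected = []
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (len(pvalues) - rank):
            rejected.append(i)
        else:
            break                       # stop at the first non-rejection
    return rejected

print(holm([0.001, 0.04, 0.012, 0.9]))  # -> [0, 2] at alpha = 0.05
```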
Thu 19.11.2009
16:15-17:30
Reinhard Furrer
University of Zurich, Institute of Mathematics
Abstract
While the cause of projected global climate change is largely undisputed, many details about uncertainties, about the inter-relation of the climate models, or about regional projections still need to be addressed. We present a (Bayesian) hierarchical framework to synthesize multi-model climate projections, aiming to address the aforementioned open questions. This flexible statistical technique can be applied to current or future projections, regionally or globally, and is based on the assumption that spatial patterns of climate projections can be separated into a large-scale signal related to the true forced climate signal and a small-scale signal stemming from model bias and internal variability. The different scales are represented via a dimension reduction technique in a hierarchical Bayes model. Posterior probabilities are obtained using a Markov chain Monte Carlo simulation technique. The method presented here takes into account uncertainty due to the use of structurally different climate models and provides PDFs of localized climate change that are nevertheless coherent with the distribution of climate change in neighboring locations.
ZüKoSt Zürcher Kolloquium über Statistik
Hierarchical framework for multi-model climate projections
HG G 19.1
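A heavily simplified toy of the synthesis idea: treat each model's projection as a common forced signal plus model-specific noise and sample the posterior of that signal by Gibbs sampling. The projections, priors and one-dimensional setup below are illustrative assumptions; the model in the talk is spatial and hierarchical in a much richer sense.

```python
# Toy hierarchy: projection_j ~ N(mu, sigma^2), mu ~ N(mu0, tau0^2),
# sigma^2 ~ InvGamma(a0, b0); Gibbs sampling of the posterior of mu.
import numpy as np

rng = np.random.default_rng(0)
proj = np.array([2.1, 2.9, 3.4, 2.6, 3.0])     # synthetic model projections (deg C)
mu0, tau0 = 0.0, 10.0                          # vague prior on the forced signal
a0, b0 = 2.0, 2.0                              # prior on the model-noise variance

mu, sig2, draws = proj.mean(), proj.var(), []
for _ in range(5000):
    prec = len(proj) / sig2 + 1.0 / tau0**2                   # update mu | sigma^2
    mean = (proj.sum() / sig2 + mu0 / tau0**2) / prec
    mu = rng.normal(mean, np.sqrt(1.0 / prec))
    a = a0 + len(proj) / 2                                    # update sigma^2 | mu
    b = b0 + 0.5 * np.sum((proj - mu) ** 2)
    sig2 = 1.0 / rng.gamma(a, 1.0 / b)
    draws.append(mu)

print("posterior mean and 90% interval for the forced signal:",
      np.mean(draws), np.percentile(draws, [5, 95]))
```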
Fri 04.12.2009
15:15-16:00
Richard Samworth
Cambridge University, UK
Abstract
Research Seminar in Statistics
Maximum likelihood estimation of a multidimensional log-concave density
HG G 19.1
Fri 04.12.2009
16:20-17:05
Ya'acov Ritov
Hebrew University, Jerusalem
Abstract
We consider the maximum a-posteriori path estimator of an HMM process. We show that this estimator may be unreasonable when the state space is non-finite, or the process is in continuous time. We argue that this casts doubt on the usefulness of the concept in the standard finite-state-space, discrete-time HMM model. We will then discuss some results concerning the good behavior of the a-posteriori probability of a state given the data.
Research Seminar in Statistics
A map to nowhere
HG G 19.1
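For context, the standard finite-state, discrete-time setting whose MAP path the talk scrutinises is the one solved by the Viterbi algorithm; a minimal sketch with toy parameters (all values below are assumptions for illustration) follows.

```python
# Viterbi algorithm: maximum a-posteriori state path of a finite, discrete-time
# HMM, given log initial, transition and emission probabilities.
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    n_states, T = log_A.shape[0], len(obs)
    score = np.zeros((T, n_states)); back = np.zeros((T, n_states), dtype=int)
    score[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        trans = score[t - 1][:, None] + log_A          # all previous-state options
        back[t] = trans.argmax(axis=0)
        score[t] = trans.max(axis=0) + log_B[:, obs[t]]
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):                      # backtrack the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1]

log = np.log
pi = np.array([0.6, 0.4]); A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])                 # emission probabilities
print(viterbi([0, 1, 1, 0], log(pi), log(A), log(B)))
```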