Seminar overview

Spring Semester 2014

Date & Time Speaker Title Location
Thu 20.02.2014
16:15-17:00
Felix Franke
BSSE, ETH Zürich
Abstract
Extracellular recordings are one of the workhorses of neuroscience. Here, one or multiple electrodes are brought close to the cell bodies of neurons to measure their electrical activity. Neural activity can then be related to behavioral parameters or external stimuli to infer the function and mechanics of the underlying neural networks. However, despite its importance and the considerable amount of research directed towards it, extracting neural activity from extracellular recordings, a process called "spike sorting", remains one of the bottlenecks of neuroscience, and many laboratories still rely on custom-made software with a large human component in the analysis. This not only ties up expensive human resources; manual spike sorting has also been shown to lead to high error rates and, depending on who did the analysis, idiosyncratic biases in the resulting data. Furthermore, since the amount of recorded data is increasing dramatically, manual spike sorting will soon no longer be a viable option. I will discuss the approaches our lab has taken to solve this problem, highlight their shortcomings, and hint at a potentially better solution.
ZüKoSt Zürcher Kolloquium über Statistik
The curse of dimensionality in neuroscience. How to extract single neuronal activity from multi-electrode recordings.
HG G 19.1
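As a rough, hedged illustration of the spike-sorting process referred to in the abstract above, the sketch below runs the classical pipeline (threshold detection, waveform extraction, clustering) on a synthetic trace. The signal, thresholds and the two-unit assumption are invented for illustration and are not the lab's actual method.

# Hedged sketch of a classical spike-sorting pipeline on synthetic data.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)

# Synthetic single-channel recording: noise plus two neurons with different spike shapes
n_samples = 200_000
trace = 0.1 * rng.normal(size=n_samples)
shapes = [np.hanning(30), -1.5 * np.hanning(30)]          # two crude spike templates
true_times = rng.choice(np.arange(100, n_samples - 100), size=400, replace=False)
for t in true_times:
    trace[t:t + 30] += shapes[rng.integers(2)]

# 1) Detect threshold crossings (robust noise estimate, simple refractory gap)
thresh = 5 * np.median(np.abs(trace)) / 0.6745
crossings = np.flatnonzero(np.abs(trace) > thresh)
spikes = crossings[np.insert(np.diff(crossings) > 30, 0, True)]

# 2) Cut out waveforms and reduce dimension with PCA (via SVD)
waves = np.stack([trace[t - 10:t + 20] for t in spikes if 10 <= t < n_samples - 20])
u, s, _ = np.linalg.svd(waves - waves.mean(0), full_matrices=False)
features = u[:, :2] * s[:2]

# 3) Cluster the features into putative units
_, labels = kmeans2(features, 2, minit="++")
print("detected spikes:", len(spikes), "per cluster:", np.bincount(labels))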
Fri 21.02.2014
15:15-16:00
Valen Johnson
Texas A&M University
Abstract
Uniformly most powerful Bayesian tests are defined and compared to classical uniformly most powerful tests. By equating the rejection regions of these two tests, an equivalence between Bayes factors based on these tests and frequentist p-values is illustrated. The implications of this equivalence for the reproducibility of scientific research are examined. Approximately uniformly most powerful Bayesian tests are described for t tests, and the power of these tests is compared to ideal Bayes factors (defined by determining the best test alternative for each true value of the parameter), as well as to Bayes factors obtained using the true parameter value as the alternative. Interpretations and asymptotic properties of these Bayes factors are also discussed.
Research Seminar in Statistics
Uniformly most powerful Bayesian tests and the reproducibility of scientific research
HG G 19.1
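To make the rejection-region equivalence in the abstract above concrete, here is a hedged numerical sketch for the simplest case, a one-sided test of a normal mean with known variance; the sample size, level and evidence threshold below are illustrative choices, not results from the talk.

# Hedged sketch: UMPBT-style Bayes factor vs. one-sided z-test rejection region.
import numpy as np
from scipy import stats

def umpbt_alternative(gamma, sigma, n):
    # Point alternative maximising the probability that the Bayes factor exceeds gamma
    return sigma * np.sqrt(2.0 * np.log(gamma) / n)

def bayes_factor(xbar, mu1, sigma, n):
    # BF of H1: mu = mu1 against H0: mu = 0 for sample mean xbar of n N(mu, sigma^2) draws
    return np.exp(n * (xbar * mu1 - 0.5 * mu1 ** 2) / sigma ** 2)

sigma, n = 1.0, 30
alpha = 0.05                        # one-sided frequentist level
z_alpha = stats.norm.ppf(1 - alpha)
gamma = np.exp(0.5 * z_alpha ** 2)  # evidence threshold whose rejection region matches z > z_alpha
mu1 = umpbt_alternative(gamma, sigma, n)

xbar = z_alpha * sigma / np.sqrt(n)  # observation exactly on the alpha = 0.05 boundary
print("gamma matching alpha = 0.05:", round(gamma, 2))                                # ~3.87
print("Bayes factor at the boundary:", round(bayes_factor(xbar, mu1, sigma, n), 2))   # ~gamma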
Fri 28.02.2014
15:15-16:00
Tom Claassen
Radboud University Nijmegen, The Netherlands
Abstract
Causal discovery lies at the heart of most scientific research today. It is the science of identifying the presence or absence of cause-effect relations between certain variables in a model. Building up such a causal model from (purely) observational data can be hard, especially when latent confounders (unobserved common causes) may be present. For example, it is well known that learning a minimal Bayesian network (BN) model over a (sub)set of variables from an underlying causal DAG is NP-hard, even for sparse networks with node degree bounded by k. Given that finding a minimal causal model is more complicated than finding a minimal DAG, it was often tacitly assumed that causal discovery in general was NP-hard as well. Indeed, the famous FCI algorithm, long the only provably sound and complete algorithm in the presence of latent confounders and selection bias, has a worst-case running time that is exponential in the number of nodes N, even for sparse graphs. Perhaps surprisingly, it turns out that we can exploit the structure in the problem to reconstruct the correct causal model in at most N^(2k+4) independence tests in the worst case, i.e. polynomial in the number of nodes. In this talk I will present the FCI+ algorithm as the first sound and complete causal discovery algorithm that implements this approach. It does not solve an NP-hard problem, and does not contradict any known hardness results: it just shows that causal discovery is perhaps more complicated, but not as hard as learning a minimal BN. In practice the running time remains close to the PC limit (without latent confounders, of order k*N^(k+2), similar to RFCI). Current research aims to tighten complexity bounds and further optimize the algorithm.
Research Seminar in Statistics
FCI+ or Why learning sparse causal models is not NP-hard
HG G 19.1
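FCI+ itself is not reproduced here; as a hedged sketch of why bounding the node degree by k keeps the number of conditional-independence tests polynomial, the code below runs the closely related PC-style adjacency (skeleton) search on a toy chain x -> y -> z, using a partial-correlation test. The test, alpha level and data are illustrative stand-ins.

# Hedged sketch: PC-style skeleton search with conditioning sets of size at most k.
import numpy as np
from itertools import combinations
from scipy import stats

def ci_test(data, i, j, cond, alpha=0.01):
    # Test X_i independent of X_j given X_cond via partial correlation (Fisher z)
    idx = [i, j] + list(cond)
    prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(data.shape[0] - len(cond) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z))) > alpha   # True = judged independent

def skeleton(data, k):
    p = data.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    tests = 0
    for size in range(k + 1):                          # conditioning sets of size 0..k
        for i in range(p):
            for j in list(adj[i]):
                if j < i or len(adj[i] - {j}) < size:
                    continue
                for cond in combinations(adj[i] - {j}, size):
                    tests += 1
                    if ci_test(data, i, j, cond):
                        adj[i].discard(j); adj[j].discard(i)
                        break
    return adj, tests

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 1)); y = x + rng.normal(size=(2000, 1)); z = y + rng.normal(size=(2000, 1))
adj, tests = skeleton(np.hstack([x, y, z]), k=1)
print(adj, "CI tests used:", tests)   # x and z become non-adjacent given y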
Thu 06.03.2014
16:15-17:00
David Rossell
University of Warwick
Abstract
Gene expression in general, and alternative splicing (AS) in particular, is a phenomenon of great biomedical relevance. For instance, AS differentiates humans from simpler organisms and is involved in multiple diseases such as cancer and malfunctions at the cellular level. Although high-throughput sequencing now makes it possible to study AS at full resolution, having adequate statistical methods to design and analyze such experiments remains a challenge. We propose a Bayesian model to estimate the expression of known variants (an estimation problem), find variants de novo (a model selection problem), and design RNA-seq experiments. The model captures several experimental biases and uses novel data summaries that preserve more information than the current standard. Regarding model selection, a critical challenge is that the number of possible models increases super-exponentially with gene complexity (measured by the number of exons). It is therefore paramount to elicit prior distributions that are effective at inducing parsimony. We use non-local priors on model-specific parameters, which improve both parameter estimation and model selection. The model space prior is derived from the available genome annotations, so that it represents the current state of knowledge. Compared to three popular methods, our approach reduces MSE severalfold, increases the correlation between experimental replicates, and is efficient at finding previously unknown variants. By using posterior predictive simulation, we compare several experimental setups and sequencing depths to indicate how best to continue experimentation. Overall, the framework illustrates the value of incorporating careful statistical considerations when analyzing RNA-sequencing data.
ZüKoSt Zürcher Kolloquium über Statistik
RNA-seq and alternative splicing. A high-dimensional estimation, model selection & experimental design problem.
HG G 19.1
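The talk's RNA-seq model is not reproduced here; the hedged sketch below only illustrates what a non-local prior looks like, using a first-order product-moment (pMOM) density that vanishes at zero (and therefore penalises models containing negligible effects), compared with a local normal prior. The scale tau = 0.348 is an illustrative choice.

# Hedged illustration: non-local pMOM prior vs. local normal prior on a coefficient theta.
import numpy as np
from scipy import stats

def pmom_density(theta, tau=0.348):
    # First-order pMOM density: theta^2 / tau * N(theta; 0, tau); integrates to 1
    return theta ** 2 / tau * stats.norm.pdf(theta, scale=np.sqrt(tau))

theta = np.linspace(-3, 3, 7)
print("pMOM  :", np.round(pmom_density(theta), 3))                        # zero at theta = 0
print("normal:", np.round(stats.norm.pdf(theta, scale=np.sqrt(0.348)), 3))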
Thu 13.03.2014
16:15-17:00
Michael Amrein
UBS AG, Zürich
Abstract
UBS predicts the risk measure 1-day Value-at-Risk (VaR) using historical simulation. For this, the profit-and-loss per day (PnL) of a financial asset is represented as a function of current market data and daily risk factor returns, e.g. returns of equity prices, interest rates or foreign exchange rates. The distribution of these risk factor returns for the next day is assumed to follow the empirical multivariate distribution of the daily risk factor returns over a window of the past 1305 trading days. The VaR of a portfolio consisting of several assets is then given by a quantile of the resulting portfolio PnL distribution. Missing values (singlets or short runs) are common in historical data of risk factors due to foreign holidays or improper data collection, and for some risk factors only a limited data history exists. The calculation of the portfolio PnLs usually involves many risk factors. If just one of these risk factors is unobserved on a specific day in the window, the corresponding portfolio PnL cannot be evaluated. To address the problem, we use a Monte Carlo method to impute ("backfill") the missing risk factor returns. First, a statistical model featuring time-varying volatility and correlation across assets is fitted. Second, the missing values are simulated conditional on the observed values and based on the estimated model. As a result, the imputed values are consistent with the data history. Furthermore, an adapted version of the method can be used to detect outliers in the data. In the talk, we will discuss the main features of the method and show some applications, which are of course not limited to VaR calculation given the generic nature of the imputation/detection problem.
ZüKoSt Zürcher Kolloquium über Statistik
The Backfiller: simulation based imputation in multivariate financial time series
HG G 19.1
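The sketch below is a hedged toy version of the two ingredients described in the abstract above: historical-simulation VaR from a return window, and backfilling a missing risk-factor return by drawing from the Gaussian conditional distribution given the factors observed that day. The window length, covariance, portfolio weights and the plain (non-time-varying) Gaussian model are all simplifications, not UBS's production method.

# Hedged sketch: historical-simulation VaR and conditional Gaussian backfilling.
import numpy as np

rng = np.random.default_rng(1)

# Toy window of daily returns for 3 risk factors (1305 days in the text; 500 here)
cov_true = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.4], [0.3, 0.4, 1.0]]) * 1e-4
window = rng.multivariate_normal(np.zeros(3), cov_true, size=500)

def hist_var(pnl, level=0.99):
    # 1-day VaR as the loss quantile of the historical PnL distribution
    return -np.quantile(pnl, 1 - level)

def impute_conditional(mu, cov, observed_idx, observed_val, n_draws=1):
    # Draw the missing components given the observed ones under a joint Gaussian
    miss_idx = [i for i in range(len(mu)) if i not in observed_idx]
    S_oo = cov[np.ix_(observed_idx, observed_idx)]
    S_mo = cov[np.ix_(miss_idx, observed_idx)]
    S_mm = cov[np.ix_(miss_idx, miss_idx)]
    cond_mu = mu[miss_idx] + S_mo @ np.linalg.solve(S_oo, observed_val - mu[observed_idx])
    cond_cov = S_mm - S_mo @ np.linalg.solve(S_oo, S_mo.T)
    return rng.multivariate_normal(cond_mu, cond_cov, size=n_draws)

weights = np.array([1e6, 5e5, 2e6])          # linear portfolio exposures (illustrative)
pnl = window @ weights
print("99% 1-day VaR:", round(hist_var(pnl), 0))

# Backfill factor 2 on a day where only factors 0 and 1 were observed
mu_hat, cov_hat = window.mean(axis=0), np.cov(window, rowvar=False)
print("imputed return:", impute_conditional(mu_hat, cov_hat, [0, 1], window[0, :2]))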
Fri 21.03.2014
15:15-16:00
Eric Gautier
ENSAE-CREST, Paris
Abstract
In this talk we present a one-stage method to compute joint confidence sets for the coefficients of a high-dimensional regression with random design under sparsity. The confidence sets have finite sample validity and are robust to non-Gaussian errors of unknown variance and to heteroscedastic errors. Nonzero coefficients can be arbitrarily close to zero. This extends previous work with Alexandre Tsybakov, where we relied on a conic program to obtain joint confidence sets and estimation for a pivotal linear programming procedure. The method we present only relies on linear programming, which is important for dealing with high-dimensional models. We will explain how this method extends to linear models with regressors that are correlated with the error term (so-called endogenous regressors), as is often the case in econometrics. The procedure relies on the use of so-called instrumental variables. The method is then robust to weak identification and weak instruments.
Research Seminar in Statistics
Uniform confidence sets for high dimensional regression and instrumental regression via linear programming
HG G 19.1
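The confidence-set construction itself (the pivotal procedure mentioned in the abstract) is not reproduced here; as a hedged illustration of why everything can stay within linear programming, the sketch below solves the classical Dantzig selector, an l1 minimisation under a sup-norm constraint, with scipy's LP solver. The problem sizes and tuning constant are illustrative.

# Hedged sketch: the Dantzig selector as a linear program.
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    # min ||beta||_1  s.t.  ||X'(y - X beta)||_inf <= lam, using beta = u - v, u, v >= 0
    n, p = X.shape
    G = X.T @ X
    c = np.ones(2 * p)
    A_ub = np.vstack([np.hstack([G, -G]), np.hstack([-G, G])])
    b_ub = np.concatenate([lam + X.T @ y, lam - X.T @ y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v

rng = np.random.default_rng(0)
n, p = 100, 200                                  # more coefficients than observations
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]  # sparse truth
y = X @ beta + rng.normal(scale=0.5, size=n)
lam = 0.5 * np.sqrt(2 * n * np.log(p))           # ~ sigma * sqrt(2 n log p), a rough choice
print(np.round(dantzig_selector(X, y, lam)[:6], 2))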
Thu 27.03.2014
16:15-17:00
Kaspar Rufibach
Roche Biostatistics Oncology, Basel
Abstract
Bayesian predictive power is the expectation of the probability of meeting the primary endpoint of a clinical trial, or any statistical test, at the final analysis. The expectation is computed with respect to a distribution over the true underlying effect, and Bayesian predictive power is a way of quantifying the probability of success for the trial sponsor while the trial is still running. The existing framework typically assumes that once the trial is not stopped at an interim analysis, Bayesian predictive power is updated with the resulting interim estimate. However, in blinded Phase III trials, typically an independent committee looks at the data and no effect estimate is revealed to the sponsor after passing the interim analysis. Instead, the sponsor only knows that the effect estimate was between predefined futility and efficacy boundaries. We show how Bayesian predictive power can be updated based on such knowledge only and illustrate potential pitfalls of the concept. This is joint work with Markus Abt and Paul Jordan, both Roche Biostatistics, Basel.
ZüKoSt Zürcher Kolloquium über Statistik
Update Bayesian predictive power after a blinded interim analysis
HG G 19.1
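A hedged Monte Carlo sketch of the updating idea in the abstract above: Bayesian predictive power is the prior-weighted probability of final success, and the blinded update conditions only on the event that the interim statistic fell between the futility and efficacy boundaries, not on its value. The one-sample normal setting, prior and boundaries are invented for illustration and are not the authors' exact derivation.

# Hedged sketch: Bayesian predictive power, before and after a blinded interim.
import numpy as np

rng = np.random.default_rng(7)

n_interim, n_final = 50, 100          # observations at interim and final analysis (illustrative)
prior_mean, prior_sd = 0.2, 0.15      # prior on the true standardised effect theta
z_crit = 1.96                         # success criterion at the final analysis
z_futility, z_efficacy = 0.5, 2.8     # interim boundaries; only "passed" is revealed

theta = rng.normal(prior_mean, prior_sd, size=500_000)
xbar_int = rng.normal(theta, 1 / np.sqrt(n_interim))
xbar_inc = rng.normal(theta, 1 / np.sqrt(n_final - n_interim))
xbar_fin = (n_interim * xbar_int + (n_final - n_interim) * xbar_inc) / n_final

z_int = np.sqrt(n_interim) * xbar_int
z_fin = np.sqrt(n_final) * xbar_fin

success = z_fin > z_crit
passed = (z_int > z_futility) & (z_int < z_efficacy)

print("Bayesian predictive power before the interim:", round(success.mean(), 3))
print("updated, knowing only that the interim passed:", round(success[passed].mean(), 3))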
Fri 04.04.2014
15:15-16:00
Alessio Sancetta
Royal Holloway, University of London
Abstract
Many quantities of interest in economics and finance can be represented as partially observed functional data. Examples include structural business cycle estimation, the implied volatility smile, and the yield curve. Having embedded these quantities into continuous random curves, estimation of the covariance function is needed to extract factors, perform dimensionality reduction, and conduct inference on the factor scores. A series expansion for the covariance function is considered. Under summability restrictions on the absolute values of the coefficients in the series expansion, an estimation procedure that is resilient to overfitting is proposed. Under certain conditions, the rate of consistency for the resulting estimator is nearly the parametric rate when the observations are weakly dependent. When the domain of the functional data is K (> 1) dimensional, the absolute summability restriction on the coefficients avoids the so-called curse of dimensionality. As an application, a Box-Pierce statistic to test independence of partially observed functional data is derived. Simulation results and an empirical investigation of the efficiency of the Eurodollar futures contracts on the Chicago Mercantile Exchange are included.
Research Seminar in Statistics
A Nonparametric Estimator for the Covariance Function of Functional Data
HG G 19.1
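As a hedged toy version of the series-expansion idea (not the estimator or the rates from the talk), the sketch below expands an empirical covariance in a tensor-product Fourier basis and keeps only coefficients that are large in absolute value; the basis size, threshold and simulated curves are illustrative.

# Hedged sketch: basis expansion of a covariance function with coefficient thresholding.
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)

def fourier_basis(t, K=7):
    cols = [np.ones_like(t)]
    for k in range(1, (K - 1) // 2 + 1):
        cols += [np.sqrt(2) * np.cos(2 * np.pi * k * t), np.sqrt(2) * np.sin(2 * np.pi * k * t)]
    return np.column_stack(cols)           # approximately orthonormal on [0, 1]

Phi = fourier_basis(grid)                  # 50 x 7
# Simulate curves from a two-component Karhunen-Loeve expansion plus noise
scores = rng.normal(size=(200, 2)) * np.array([1.0, 0.5])
curves = scores @ Phi[:, [1, 2]].T + 0.1 * rng.normal(size=(200, len(grid)))

C_hat = np.cov(curves, rowvar=False)                 # empirical covariance on the grid
coef = Phi.T @ C_hat @ Phi / len(grid) ** 2          # projection onto the tensor-product basis
coef[np.abs(coef) < 0.05] = 0.0                      # keep only large absolute coefficients
C_smooth = Phi @ coef @ Phi.T                        # reconstructed covariance function
print(np.round(coef, 2))                             # roughly diag(0, 1, 0.25, 0, ...)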
Fri 09.05.2014
15:15-16:00
Iain Currie
Heriot-Watt University, Edinburgh
Abstract
A Generalized Linear Array Model (GLAM) is a generalized linear model where the data lie on an array and the model matrix can be expressed as a Kronecker product. GLAM is conceptually attractive since its high-speed, low-footprint algorithms exploit the structure of both the data and the model. GLAMs have been applied in mortality studies, density estimation, spatio-temporal smoothing, variety trials, etc. In this talk we (1) describe the GLAM ideas and algorithms in the setting of the original motivating example, a two-dimensional smooth model of mortality, and (2) give an extended discussion of a recent application to the Lee-Carter model, an important model in the forecasting of mortality. References: Currie, I. D., Durban, M. and Eilers, P. H. C. (2006). Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society, Series B, 68, 259-280. DOI: https://doi.org/10.1111/j.1467-9868.2006.00543.x. Currie, I. D. (2013). Smoothing constrained generalized linear models with an application to the Lee-Carter model. Statistical Modelling, 13, 69-93. DOI: https://doi.org/10.1177/1471082X12471373
Research Seminar in Statistics
GLAM: Generalized Linear Array Models
HG G 19.1
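The structure GLAM exploits can be shown in a few lines: the Kronecker-product model matrix never has to be formed, because multiplying it into vec(Theta) equals two small matrix products. The sketch below only checks this identity; the dimensions are invented (think ages by years in a mortality table), and the full GLAM fitting algorithms are not reproduced.

# Hedged sketch: (B2 kron B1) vec(Theta) == vec(B1 Theta B2'), the core GLAM identity.
import numpy as np

rng = np.random.default_rng(0)
B1 = rng.normal(size=(90, 10))     # age basis   (90 ages, 10 coefficients, illustrative)
B2 = rng.normal(size=(50, 8))      # year basis  (50 years, 8 coefficients, illustrative)
Theta = rng.normal(size=(10, 8))   # coefficient array

# Naive linear predictor: build the full Kronecker model matrix (large)
eta_naive = np.kron(B2, B1) @ Theta.reshape(-1, order="F")

# GLAM-style linear predictor: two small matrix products, no Kronecker product
eta_glam = (B1 @ Theta @ B2.T).reshape(-1, order="F")

print(np.allclose(eta_naive, eta_glam))   # True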
Thu 15.05.2014
16:15-17:00
Andreas Krause
Department of Computer Science, ETH Zürich
Abstract
In many applications, ranging from autonomous experimental design to environmental monitoring to system tuning, we wish to gather information about some unknown function. Often, acquiring samples is noisy and expensive. In this talk, I will discuss how Bayesian confidence bounds can play a natural role in focusing exploration: reducing uncertainty in a structured way to reliably estimate properties of interest such as extremal values, locations of critical regions, Pareto frontiers, etc. First, I will show how a simple confidence-guided sampling rule attains near-minimal regret for bandit problems involving objectives modeled via Gaussian process priors or having low RKHS norm. I will further demonstrate how the approach allows scaling up through parallelization, effectively localizing level sets, and addressing multi-objective tradeoffs. I will illustrate the approach in several real-world applications. Applied to experimental design for protein structure optimization, our approach enabled the engineering of active P450 enzymes that are more thermostable than any previously made by chimeragenesis, rational design, or directed evolution.
ZüKoSt Zürcher Kolloquium über Statistik
Learning to Optimize with Confidence
HG G 19.1
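As a hedged sketch of confidence-bound-guided exploration in the spirit of the abstract above (not the exact algorithm or its regret analysis), the code below maintains a Gaussian-process posterior and repeatedly samples where the upper confidence bound mu + beta*sigma is largest; the kernel, noise level, beta and the toy objective are illustrative.

# Hedged sketch: upper-confidence-bound sampling with a Gaussian process surrogate.
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, ls=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-2):
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_star = rbf(x_grid, x_obs)
    mu = k_star @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
    return mu, np.sqrt(np.clip(var, 1e-12, None))

f = lambda x: np.sin(6 * x) + 0.5 * x          # unknown objective, observed with noise
x_grid = np.linspace(0, 1, 200)
x_obs, y_obs = np.array([0.5]), np.array([f(0.5) + 0.1 * rng.normal()])

beta = 2.0                                     # confidence-bound width (illustrative)
for _ in range(15):
    mu, sd = gp_posterior(x_obs, y_obs, x_grid)
    x_next = x_grid[np.argmax(mu + beta * sd)]           # upper-confidence-bound rule
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, f(x_next) + 0.1 * rng.normal())

print("best sampled point:", x_obs[np.argmax(y_obs)].round(3))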
Fri 16.05.2014
15:15-16:00
Mohammad Sadeh
Max Planck Institute for Molecular Genetics
Abstract
Our current understanding of virtually all cellular signaling pathways is almost certainly incomplete. We miss important but so far unknown players in the pathways. Moreover, we only have a partial account of the molecular interactions and modifications of the known players. When analyzing the cell, we look through narrow windows, leaving potentially important events in blind spots. Many network reconstruction methods are based on investigating unknown relations of known players, assuming there are no unknown players. This might severely bias both the computational and the manual reconstruction of underlying biological networks. Here we ask which features of a network can be confounded by incomplete observations and which cannot. In the context of nested effects model based network reconstruction, we show that in the presence of missing observations or hidden factors with their unknown effects (unknown unknowns), a reliable reconstruction of the full network is not feasible. Nevertheless, we can show that certain characteristics of signaling networks, like the existence of cross talk between certain branches of the network, can be inferred in a non-confoundable way. We derive a simple polynomial test for inferring such non-confoundable characteristics of signaling networks. We also define a set of edges to partially reconstruct the signaling networks when unknown players exist. Finally, we evaluate the performance of the proposed method on simulated data and two biological studies, a first application to embryonic stem cell differentiation in mice and a recent study on the Wnt signaling pathway in colorectal cancer cells. We demonstrate that taking unknown hidden mechanisms into account changes our account of real biological networks. References: [1] Sadeh, M. J., Moffa, G. and Spang, R. (2013). Considering Unknown Unknowns - Reconstruction of Non-confoundable Causal Relations in Biological Networks. In RECOMB, 234-248. [2] Anchang, B., Sadeh, M., Jacob, J., Tresch, A., Vlad, M., et al. (2009). Modeling the temporal interplay of molecular signaling and gene expression by using dynamic nested effects models. Proceedings of the National Academy of Sciences 106: 6447. [3] Tresch, A. and Markowetz, F. (2008). Structure learning in nested effects models. Stat Appl Genet Mol Biol 7. [4] Markowetz, F., Bloch, J. and Spang, R. (2005). Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics 21:4026-32. [5] Markowetz, F., Kostka, D., Troyanskaya, O. G. and Spang, R. (2007). Nested effects models for high-dimensional phenotyping screens. Bioinformatics 13:i305-12. Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, D-14195 Berlin, Germany.
Research Seminar in Statistics
Considering Unknown Unknowns - Reconstruction of Non-confoundable Causal Relations in Biological Networks
HG G 19.1
Fri 23.05.2014
15:15-16:00
Elisabeth Gassiat
Université Paris-Sud
Abstract
In this talk I will present recent results on the nonparametric identifiability of hidden Markov models, and some consequences for nonparametric estimation. References: E. Gassiat, A. Cleynen and S. Robin. Finite state space non parametric hidden Markov models are in general identifiable. arXiv preprint, 2013. E. Gassiat and J. Rousseau. Non parametric finite translation mixtures with dependent regime. Bernoulli, to appear. E. Vernet. Posterior consistency for nonparametric Hidden Markov Models with finite state space. T. Dumont and S. Le Corff. Nonparametric regression on hidden phi-mixing variables: identifiability and consistency of a pseudo-likelihood based estimation procedure.
Research Seminar in Statistics
Non parametric hidden Markov models.
HG G 19.1
Fri 23.05.2014
16:30-17:15
Jane L. Hutton
University of Warwick, UK
Abstract
Chain event graphs (CEGs) extend graphical models to address situations in which, after one variable takes a particular value, the possible values of future variables differ from those following alternative values (Thwaites et al 2010). These graphs are a useful framework for modelling discrete processes which exhibit strong asymmetric dependence structures, and are derived from probability trees by merging together those vertices of the tree whose associated conditional probabilities are the same. We exploit this framework to develop new classes of models where missingness is influential and data are unlikely to be missing at random (Barclay et al 2014). Context-specific symmetries are captured by the CEG. As models can be scored efficiently and in closed form, standard Bayesian selection methods can be used to search over a range of models. The selected maximum a posteriori model can be easily read back to the client in a graphically transparent way. The efficacy of our methods is illustrated using survival of people with cerebral palsy, and a longitudinal study from birth to age 25 of children in New Zealand, analysing their hospital admissions at ages 18-25 years with respect to family functioning, education, and substance abuse at ages 16-18 years. P Thwaites, JQ Smith, and E Riccomagno (2010) "Causal Analysis with Chain Event Graphs", Artificial Intelligence, 174, 889-909. LM Barclay, JL Hutton and JQ Smith (2014) "Chain Event Graphs for Informed Missingness", Bayesian Analysis, 9, 53-76.
Research Seminar in Statistics
Chain Event Graphs for Informative Missingness
HG G 19.1
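To make the "merging vertices with equal conditional probabilities" step in the abstract above concrete, here is a hedged toy sketch of how tree vertices are grouped into stages; the tree, labels and probabilities are invented and unrelated to the cerebral palsy or New Zealand studies.

# Hedged toy sketch: grouping probability-tree vertices into stages for a CEG.
from collections import defaultdict

# Conditional distribution over outgoing edge labels at each internal vertex
tree = {
    "v0: root":               {"treated": 0.5, "untreated": 0.5},
    "v1: treated":            {"respond": 0.7, "no response": 0.3},
    "v2: untreated":          {"respond": 0.4, "no response": 0.6},
    "v3: treated, respond":   {"follow-up": 0.9, "lost": 0.1},
    "v4: untreated, respond": {"follow-up": 0.9, "lost": 0.1},   # same vector as v3 -> same stage
}

stages = defaultdict(list)
for vertex, dist in tree.items():
    stages[tuple(sorted(dist.items()))].append(vertex)   # group vertices by identical distributions

for i, members in enumerate(stages.values()):
    print(f"stage {i}:", members)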