Activities @ NCRA
NCRA Seminar Series
Marc Schoenauer (INRIA Saclay - Ile-de-France)
Abstract: Support Vector Machines (SVMs) are today widely used in Machine Learning. Originally designed to handle classification problems, their success relies on two main ingredients: linear classification, which leads to simple constrained quadratic optimization problems that can be solved using standard, powerful algorithms; and the "kernel trick", which allows the programmer to solve the quadratic problem in a very high-dimensional feature space while computing only in the low-dimensional original space. Furthermore, beyond classification, several other Machine Learning problems can be addressed by placing other constraints on the resulting quadratic problem, from continuous and ordinal regression to one-class identification. Surrogate models are mandatory for any optimization method when tackling expensive objective functions - all the more so for Evolutionary Algorithms, which are well known for their appetite for CPU cycles. Regression-SVMs are one of the many methods that have been used to build approximations of the objective function in the evolutionary framework. However, the flexibility of SVMs in addressing the other ML paradigms mentioned above can lead to original surrogate models. Along these lines, in the framework of continuous evolutionary optimization, this talk will present two Pareto-oriented surrogate models for multi-objective optimizers, and a comparison-based surrogate for (comparison-based) CMA-ES in the context of single-objective optimization.
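As a rough illustration of the surrogate idea mentioned in the abstract (not the Pareto-oriented or comparison-based models of the talk), the hedged sketch below fits a standard regression SVM to a handful of evaluated points of an expensive objective and uses it to pre-screen candidate offspring; the toy objective, kernel settings and screening rule are illustrative assumptions.

```python
# A minimal sketch of a regression-SVM surrogate in an evolutionary loop.
# Assumptions: the toy objective, the RBF-kernel settings and the
# "evaluate only the 10 most promising offspring" rule are illustrative,
# not the models presented in the talk.
import numpy as np
from sklearn.svm import SVR

def expensive_objective(x):
    return float(np.sum(x ** 2))        # stand-in for a costly simulation

rng = np.random.default_rng(0)
archive_X = rng.uniform(-5, 5, size=(30, 2))                 # already-evaluated points
archive_y = np.array([expensive_objective(x) for x in archive_X])

surrogate = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(archive_X, archive_y)

offspring = rng.uniform(-5, 5, size=(100, 2))                # candidate solutions
predicted = surrogate.predict(offspring)
promising = offspring[np.argsort(predicted)[:10]]            # surrogate pre-screening
true_values = [expensive_objective(x) for x in promising]    # only 10 real evaluations
```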
Prof Dietmar Maringer (University of Basel)
Abstract:
Prof Chris Stephens (UNAM)
Abstract:
Prof Philip Hamill (University of Ulster)
Abstract:
Miguel Nicolau (UCD)
Abstract:
Prof Barry McMullin (Dublin City University)
Abstract: In this talk I will present an idealised model of an RNA-world "protocell", and some of the evolutionary phenomena this gives rise to. In this model a protocell is simply a vesicle-like container carrying a well stirred mixture of "polymers" which can function as abstract analogues of certain (hypothetical) kinds of ribozymes - i.e., they function both as informational templates to be replicated, and as replicase enzymes to catalyse such replication. Replicases vary in the classes of templates which they can replicate (i.e., in the template motifs which they recognise and bind to). Some molecular species function as self-replicases. As the molecular population increases through replication, the vesicle grows until it reaches a critical size at which it fissions. The result is a system in which there are two distinct but coupled selectional processes: one at the molecular level, one at the cellular level. I will illustrate some of the (perhaps unexpected?) consequences of this architecture.
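For readers unfamiliar with the setup, the hedged sketch below shows only the coupling between the two levels described above, molecular replication and vesicle fission; the replication rates, fission threshold and random partitioning rule are illustrative assumptions, and cell-level selection is not modelled.

```python
# A minimal sketch of the two coupled levels: polymers replicate inside a
# well-stirred vesicle, and the vesicle fissions once it reaches a critical
# size. Rates, threshold and partitioning are illustrative assumptions.
import random

RATES = {"self_replicase": 0.30, "parasite": 0.45}   # per-molecule copy probability

def step(vesicle, fission_size=100):
    """One round of molecular replication, then fission if the vesicle is too large."""
    for species in vesicle:                            # molecular-level replication
        copies = sum(random.random() < RATES[species] for _ in range(vesicle[species]))
        vesicle[species] += copies
    if sum(vesicle.values()) >= fission_size:          # cellular-level fission
        a, b = {}, {}
        for species, count in vesicle.items():
            left = sum(random.random() < 0.5 for _ in range(count))
            a[species], b[species] = left, count - left
        return [a, b]
    return [vesicle]

population = [{"self_replicase": 10, "parasite": 1}]
for _ in range(20):
    population = [daughter for v in population for daughter in step(v)]
```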
Colin Johnson (University of Kent, Canterbury, England)
Abstract: This talk will consist of an overview of a number of projects that we have carried out in recent years. The general topic is the interaction between theoretical issues in programming languages and genetic programming. The talk will split into two sections. In the first section I will discuss the idea of analysing programs to derive fitness properties. I will give two case studies: the first is the use of multimodal optimisation to evolve a robot control task with a safety constraint that must be satisfied by the final evolved program; the second is the application of model checking as a means of constructing analysis-based fitness functions that apply across the input space. The second section will look at the application of program semantics to initialisation and operators in GP. I will start with a discussion of the idea of a "semantic wrapper" as a general principle for introducing semantics to aspects of GP, and then present some examples from initialisation, crossover and mutation.
Ronald Hochreiter (University of Vienna)
Abstract: Multi-stage decision optimization under uncertainty can be useful to obtain quantitative decision support solutions for a wide range of application areas. However, both the modeling process of the underlying real-world problem as well as the solution of the resulting stochastic optimization problems can be tedious and complicated, especially if non-textbook applications are considered. An overview of various issues which need to be considered to obtain useful results will be given, and a set of solution approaches for each step from pre-processing data to analyzing solutions will be presented.
Prof Wolfgang Banzhaf (Memorial University of Newfoundland)
Abstract: In the past 50 years Computer Scientists and Engineers have been very successful in gleaning recipes from Natural Evolution. With the advent of Genome Biology, however, there have been many new discoveries in Biology not yet reflected in computational models of evolution. This talk discusses some of the more recent developments in Biology, and how they might possibly impact our thinking about algorithms.
Jorge Tavares (INRIA Lille)
Abstract: A fundamental issue in the development of evolutionary algorithms is the representation of a candidate solution. The success of this type of approach relies on choosing a genetic representation that allows the algorithm to be efficient and accurate. Additionally, it is important to understand the representation's role and its interplay with heuristics. Does a particular representation need auxiliary mechanisms? Which of these complementary procedures are influential? For combinatorial optimization problems this becomes even more relevant, since such problems have strong practical applications in many diverse fields. Understanding the effects of a representation when solving a problem is important so that new and more efficient encodings can be developed. The ultimate case is when the encoding designer is the evolutionary algorithm itself. Preliminary work in this direction has been done, bringing self-organization and evolution into a common problem-solving framework. This is the natural next step, since finding a good representation for an optimization problem is a problem in itself. Even studying the known representations for a given problem may not be enough to ensure that the algorithm will encode candidate solutions in the best way.
Leonardo Vanneschi (University of Milano-Bicocca)
Abstract: Genetic Programming (GP) is an evolutionary approach which extends the genetic model of learning to the space of programs. It is a major variation of Genetic Algorithms in which the evolving individuals are themselves computer programs instead of fixed-length strings from a limited alphabet of symbols. In the last few years, GP has become more and more popular for biomedical applications. In particular, GP has recently been used to mine large datasets with the goal of automatically generating the underlying (hidden) functional relationship between data and correlating the behavior of latent features with some interesting parameters bound to drug activity patterns. In this seminar, after a brief introduction to GP and an informal discussion of how and why GP works as an optimization method, an application of GP in drug discovery and development will be discussed in which GP has been used for quantitative prediction of drug-induced toxicity and bioavailability.
Prof. Per Mykland (Dept of Statistics, University of Illinois at Chicago)
Abstract: The econometric literature on high frequency data usually relies on moment estimators which are derived by assuming local constancy of volatility and related quantities. We show here that this first-order approximation is not always valid if used naively. We find that such approximations require an ex post adjustment involving asymptotic likelihood ratios; these are given. Several examples (powers of volatility, leverage effect, ANOVA) are provided. The first-order approximations in this study can be over the period of one observation, or over blocks of successive observations. The theory relies heavily on the interplay between stable convergence and measure change, and on asymptotic expansions for martingales. Practically, the procedure permits (1) the definition of estimators of hard-to-reach quantities, such as the leverage effect of volatility, (2) the improvement of efficiency in classical estimators, and (3) easy analysis.
Prof. Lan Zhang (Dept of Finance, University of Illinois at Chicago)
Abstract: The availability of high frequency data for financial instruments has opened the possibility of accurately determining volatility in small time periods, such as one day. Recent work on such estimation indicates that it is necessary to analyze the data with a hidden semimartingale model, typically by incorporating microstructure noise. We develop the methodology for analyzing such data, including two- and multiscale sampling. We also show how to make these estimators robust to dependent noise.
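As a rough illustration of two-scale sampling (a sketch on simulated data, not the speakers' exact estimator or results), one can combine a noise-dominated every-tick realized variance with an average over sparse subsampling grids and correct the bias:

```python
# A hedged sketch of a two-scale realized-variance estimate on simulated
# noisy high-frequency prices. The simulation parameters and the specific
# bias-correction constant are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, sigma, noise_sd = 23400, 0.2, 0.0005                        # one "day" of 1-second data
efficient = np.cumsum(rng.normal(0, sigma / np.sqrt(n), n))    # latent log-price
observed = efficient + rng.normal(0, noise_sd, n)              # + microstructure noise

def rv(prices):
    """Realized variance: sum of squared returns on one sampling grid."""
    return np.sum(np.diff(prices) ** 2)

K = 30
rv_all = rv(observed)                                     # every tick: noise dominates
rv_avg = np.mean([rv(observed[j::K]) for j in range(K)])  # average over K sparse grids
n_bar = (n - K + 1) / K
tsrv = rv_avg - (n_bar / n) * rv_all                      # two-scale, bias-corrected
print(f"two-scale estimate: {tsrv:.4f}  true integrated variance: {sigma ** 2:.4f}")
```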
Nic McPhee (University of Minnesota, Morris)
Abstract: Subtree crossover is one of the oldest and remains one of the most widely used recombination operators in genetic programming (GP). It is still unclear, however, why or how it works. It's hardly obvious that yanking a random chunk of code from one program, and plopping it unceremoniously in a random location in a second program would be a good thing. Yet it clearly works (at some level) in GP. But why? How (and when) does subtree crossover move the population closer to the solution?
I will present a new mechanism for studying the impact of subtree crossover in terms of semantic building blocks. This approach allows us to completely and compactly describe the semantic action of crossover, and provide insight into what does (or doesn't) make crossover effective. Our results make it clear that a very high proportion of crossover events (typically over 75% in our experiments) are guaranteed to perform no immediately useful search in the semantic space...
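To make the operator under discussion concrete, here is a hedged sketch of plain subtree crossover on expression trees encoded as nested lists; the encoding and the uniform choice of crossover points are illustrative assumptions, and the semantic building-block analysis itself is not reproduced.

```python
# A minimal sketch of subtree crossover: copy a random subtree of one parent
# into a random position of the other. The tree encoding (nested lists with
# the operator at index 0) and uniform node selection are illustrative choices.
import copy
import random

def nodes(tree, path=()):
    """Yield (path, subtree) for every node in the tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace(tree, path, subtree):
    """Return a copy of `tree` with the node at `path` replaced by `subtree`."""
    if not path:
        return copy.deepcopy(subtree)
    tree = copy.deepcopy(tree)
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = copy.deepcopy(subtree)
    return tree

def subtree_crossover(parent1, parent2):
    path, _ = random.choice(list(nodes(parent1)))       # crossover point in parent 1
    _, donor = random.choice(list(nodes(parent2)))      # donated subtree from parent 2
    return replace(parent1, path, donor)

p1 = ["+", ["*", "x", "x"], 1]      # x*x + 1
p2 = ["-", ["+", "x", 3], "x"]      # (x + 3) - x
child = subtree_crossover(p1, p2)
```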
Youwei Li (Queens University Belfast)
Abstract:
Erik Hemberg (NCRA)
Abstract: Xgrid acts as a job scheduler, allowing a group of networked computers to be used for distributed computing. An introduction to NCRA's Xgrid is presented and advice is provided on how best to utilise this resource for our needs.
Prof. Bob McKay (Seoul National University)
Abstract:
Prof Doyne Farmer (Santa Fe Institute)
Abstract: Prof. Farmer is a senior member of the Santa Fe Institute and previously worked at the Los Alamos National Laboratory. He has a PhD in Physics from the University of California (Santa Cruz). While at Santa Cruz, he was a founding member of the 'Dynamical Systems Collective', an informal group which played an important role in the development of Chaos Theory. After finishing his PhD, Doyne took up a post-doctoral appointment at the Center for Nonlinear Studies at the Los Alamos National Laboratory, and in 1988 became leader of the Complex Systems Group there.
Martin Hemberg (Imperial College London)
Abstract: We present an algorithmic growth process that is an extension of Lindenmayer's Map L-systems. This growth process relies upon a set of rewrite rules, a map axiom and a geometric interpreter which is integrated with a 3D simulated environment. The outcome of the growth process is a digital surface in 3D space which has grown within, and in response to, its environment. The growth procedure can be derived from a context-free grammar, and we have developed an evolutionary algorithm based on Grammatical Evolution (GE) which is able to generate sets of rewrite rules. Using a quantitative multi-objective fitness function that evaluates a variety of surface properties, the integrated system (evolutionary algorithm and growth process) can explore and generate diverse and interesting surfaces with a resemblance to organic form. The algorithms have been implemented to create a design tool for architects called Genr8.
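Genr8's Map L-systems rewrite the edges of a map and are interpreted geometrically in a 3D environment; the hedged sketch below shows only the basic parallel rewrite-rule growth on a string, with a purely illustrative axiom and rule set.

```python
# A minimal sketch of L-system growth: apply all rewrite rules in parallel
# for a fixed number of derivation steps. The axiom and the single branching
# rule are illustrative; they are not Genr8's map rewriting or interpreter.
RULES = {"F": "F[+F]F[-F]"}   # hypothetical branching rule
AXIOM = "F"

def grow(axiom, rules, steps):
    word = axiom
    for _ in range(steps):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word

print(grow(AXIOM, RULES, 3))
```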
Erik Hemberg (NCRA)
Abstract: The ability of Genetic Programming to scale to problems of increasing difficulty operates on the premise that it is possible to capture regularities that exist in a problem environment by decomposing the problem into a hierarchy of modules. As computer scientists, and more generally as humans, we tend to adopt a similar divide-and-conquer strategy in our problem solving. In this paper we consider the adoption of such a strategy for Genetic Algorithms. By adopting a modular representation in a Genetic Algorithm we can make efficiency gains that enable superior scaling characteristics on problems of increasing size. We present a comparison of two modular Genetic Algorithms, one of which is a Grammatical Genetic Programming algorithm, the meta-Grammar Genetic Algorithm (mGGA), which generates binary string sentences instead of traditional GP trees. A number of problem instances are tackled which extend the Checkerboard problem by introducing different kinds of regularity and noise.
Miguel Nicolau (INRIA)
Abstract: A recently introduced artificial gene regulatory network is presented and analysed. Created through a process of whole genome duplication and divergence, it presents remarkable similarities to natural transcriptional regulatory networks. Its dynamic regulatory nature has the potential for use on optimisation problems, and some potential applications are presented.
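The duplication-and-divergence construction can be illustrated roughly as follows; the bit-string encoding, genome length and mutation rate are assumptions made for the sketch, not the model's actual settings.

```python
# A minimal sketch of growing a genome by whole-genome duplication followed
# by divergence (point mutation of the new copy). Encoding and rates are
# illustrative assumptions, not the parameters of the model in the talk.
import random

def duplicate_and_diverge(genome, rounds, mutation_rate=0.02):
    for _ in range(rounds):
        copy = [bit ^ int(random.random() < mutation_rate) for bit in genome]
        genome = genome + copy            # the genome doubles each round
    return genome

seed = [random.randint(0, 1) for _ in range(32)]
genome = duplicate_and_diverge(seed, rounds=5)     # 32 * 2**5 = 1024 bits
```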
Kai Fan (NCRA)
Abstract:
Carlos Cotta (Departamento de Lenguajes y Ciencias de la Computacion, University of Malaga)
Abstract: Memetic algorithms are population-based metaheuristics aimed at solving hard optimization problems. These techniques are explicitly concerned with exploiting available knowledge in order to achieve the most effective resolution of the target problem. The rationale behind these optimization techniques and their main features will be presented. A glimpse of current research issues and future challenges will be provided as well.
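As a generic illustration of the template (not any specific algorithm from the talk), the hedged sketch below combines a steady-state genetic algorithm with a bit-flip local search applied to every offspring; the OneMax objective and all parameters are assumptions.

```python
# A minimal sketch of the memetic template: population-based search in which
# every offspring is refined by local search before competing for survival.
# The OneMax objective, operators and parameters are illustrative choices.
import random

def fitness(x):
    return sum(x)                          # OneMax: count the ones

def local_search(x):
    """First-improvement bit-flip hill climbing (the 'meme')."""
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            y = x[:]
            y[i] = 1 - y[i]
            if fitness(y) > fitness(x):
                x, improved = y, True
    return x

def memetic(pop_size=20, length=30, generations=50):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        a, b = random.sample(pop, 2)
        cut = random.randrange(1, length)
        child = local_search(a[:cut] + b[cut:])          # crossover + refinement
        worst = min(range(pop_size), key=lambda i: fitness(pop[i]))
        if fitness(child) > fitness(pop[worst]):
            pop[worst] = child                           # steady-state replacement
    return max(pop, key=fitness)

best = memetic()
```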
Dietmar Maringer (Centre of Computational Finance and Economic Agents (CCFEA), University of Essex)
Abstract: Though heavily criticized for its shortcomings, Value at Risk (VaR) has become a standard measure of portfolio risk over the last decade. It even became one of the cornerstones of the Basel II accord on banks' equity requirements. Nevertheless, the practical application of the VaR concept suffers from two problems: how to estimate VaR, and how to optimise a portfolio for a given level of VaR? For the first problem, several approaches have been suggested, including the historical simulation method. The second problem, however, can become a severe challenge: even under the simple assumption of normally distributed returns, constraints faced in real life, such as non-divisibility of the assets and disallowed short-selling, make the optimisation problem too complex for standard optimisation techniques. Empirical distributions are often claimed to be superior to parametric distributions, yet they also increase the computational complexity and are therefore hard to apply in portfolio optimisation. However, if non-parametric distributions are used to describe the assets' returns, the optimisation problem becomes virtually inapproachable for traditional numerical methods. In this paper, a memetic algorithm was implemented in which a population-based search strategy is combined with local neighbourhood search elements. In a preliminary study, several variants of the local search part and the usefulness of elitist strategies were tested on different kinds of constraints. We found that different types of constraints seem to favour different variants of the heuristic. Next, stock portfolios (Standard and Poor's 100 stocks) and bond portfolios (Swiss bonds) are optimised. Our results indicate that empirical distributions might turn into a Pandora's Box: though highly reliable for predicting the assets' risks, employing these distributions in the optimisation process might result in severe mis-estimation of the resulting portfolios' actual risk. It is found that even a simple mean-variance approach can be superior despite its known specification errors. In other words, a solution to the two aforementioned problems gives rise to a third one: the actual VaR of portfolios optimised under a VaR constraint might exceed its nominal level to a large extent.
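For readers unfamiliar with the historical simulation method mentioned above, the hedged sketch below computes a one-day 99% VaR for a fixed portfolio as an empirical loss quantile; the simulated returns, weights and confidence level are illustrative, and the memetic optimisation itself is not reproduced.

```python
# A minimal sketch of historical-simulation Value at Risk for a fixed
# portfolio: the empirical quantile of past portfolio returns. Data,
# weights and the 99% level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
returns = rng.standard_t(df=4, size=(1000, 5)) * 0.01      # 1000 days, 5 assets
weights = np.full(5, 0.2)                                   # equally weighted portfolio

portfolio_returns = returns @ weights
var_99 = -np.quantile(portfolio_returns, 0.01)              # one-day 99% VaR (a loss)
print(f"99% historical-simulation VaR: {var_99:.4f}")
```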
William Langdon (Computer Science, University of Essex)
Abstract: Experiments with a linear genetic programming (GP) system, including memory, IF and loops, suggest that the fraction of terminating programs is vanishingly small. However, their number grows exponentially with their length. The halting probability in von Neumann architectures scales as the square root of the program's length, suggesting that almost all infinite loops can be tested for on GHz machines in one millisecond.
Robin Matthews (Kingston University Business School London)
Abstract: