Report on the EPSRC Network on Evolvability in Biology and Software Systems Meeting

Gatherer, D. (2002). Report on the EPSRC Network on Evolvability in Biology and Software Systems Meeting.
Journal of Memetics - Evolutionary Models of Information Transmission, 6.
http://cfpm.org/jom-emit/2002/vol6/gatherer_d_report.html

Software Evolution (SE) is a discipline that applies evolutionary analogies to the study of historical trends in software, in order to learn lessons for the future development of better software. Evolutionary Computation (EC) is a field in which computer programs, or components thereof, are actively evolved, through rounds of mutation and selection, towards the solution of a problem. EC is thus a kind of selective breeding of software, whereas SE is the ecological study of software `in the wild'. This EPSRC Symposium brought together experts in both areas for two days of fruitful exchange of ideas. Here we briefly review a selection of the 21 talks and posters presented. All abstracts are available at http://homepages.feis.herts.ac.uk/~nehaniv/EN/seec.html.

Manny Lehman (Imperial College: http://www.doc.ic.ac.uk/~mml) began his study of SE following the commission of a report by IBM in the 1960s, and in the ensuing three and a half decades has brought together a set of general principles known as `The Laws of Software Evolution', supported by a variety of empirical, formal and sociological studies. Lehman and his collaborator Juan Ramil (Open University: http://mcs.open.ac.uk/jfr46) described the development of The Laws, the metrics used in their definition, and their recommendations for future `good practice' in writing software with `high evolvability'. The notion of evolvability had previously been explored by the symposium chair, Chrystopher Nehaniv (Univ. of Hertfordshire), touching upon issues such as robustness, redundancy, phenotypic plasticity and reusability, and noting that all of these features are found to varying extents in the way that biological systems encode information in DNA.

Symposium co-organizer Paul Wernick (Univ. of Hertfordshire) provided a detailed critique and comparison of the Darwinian and Lamarckian metaphors for SE. Biological systems are usually highly variable at the genetic level, and consequently considerably variable at the phenotypic level. Natural selection acts by the differential retention or disposal of this variability. A piece of software, however, is generally phenotypically homogeneous. There are not, for instance, multiple variations of the most popular word processing program, of which only the most successful will survive. Variation is generally confined to the design phase and even then is usually deliberately introduced in response to some perceived problem with the software, or in order to add a further feature to the next release. The `purposeful' nature of introduced variation suggests to Wernick a more Lamarckian model than the random variation that would be required for a truly neo-Darwinian perspective to be adopted. The identification of a `bug', its elimination by recoding and the subsequent appearance of the repaired code in all further releases of the software, is almost analogous to the `inheritance of acquired characteristics', in that the bug-fixing `patch' is directly produced by an encounter with the environment (users) and is then inherited (appears in the next release). While maintaining reservations about both Darwinian and Lamarckian models, Wernick supports the use of the term `evolution' to describe how software changes. One might conclude that it is `evolution, Jim, but not as we know it'. This was paralleled by Lehman's statement that he was essentially uninterested in exact parallels between biological and software evolution. `Software evolution' may be taken as a kind of evolutionary mechanism in its own right. To illustrate this, imagine an organism (e.g. Tyrannosoftus wordprocessorus) with a genotype (code) and a phenotype (functions), living, but not directly reproducing, in a tough environment where great demands are made on the phenotype (by a completely different type of organism, Userus badtemperus). T. wordprocessorus individuals have identical genotypes. Whenever a phenotype is consistently sub-standard, the genotype of T. wordprocessorus is altered in a direct attempt to correct the phenotypic problem. At a single swoop, all individuals of T. wordprocessorus now have the new genotype and new phenotype, the old genotype and phenotype disappear, and selection recommences. There is no intra-population variability, and no reproduction of individuals, merely a periodic refashioning of the entire species in response to the environment. Now is this Darwinian or Lamarckian or neither? Now consider the situation for a rather different kind of software organism, e.g. Viroscriptus malevolentus. Does that have the same kind of evolution as T. wordprocessorus?

The field of SE may perhaps be regarded as a sub-discipline of cultural evolution in general and, whatever the exact mechanism(s) of evolution in culture, it is still possible to derive clear indications of that most classically Darwinian of features, `homology by descent'. This was illustrated by Andrew Lord and If Price (Sheffield Hallam Univ. - see JoM passim: http://cfpm.org/jom-emit/2001/vol5/lord_a&price_i.html), in their reconstruction of the history of post-Reformation Christianity from a data vector compiled for each church from its doctrines and practices. The trees produced from these vectors by agglomerative clustering are sensitive to the weightings of each variable. By repeated reweighting, recalculation of the tree and comparison with the historical record, Lord and Price find that the presence of episcopal structure in a religion is approximately 8 times more important than any other factor in maintaining the integrity of the other parts of the vector, an elegant demonstration from first principles of the restraining force of sociological structure on the rapidity of change of doctrine and practices.
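The sensitivity of such trees to the weightings can be illustrated with a minimal sketch. The three churches, their trait vectors, the weighted Hamming distance and the average-linkage rule below are all assumptions made for illustration; they are not Lord and Price's data or necessarily their method. The first position in each vector stands for episcopal structure, and the weight of 8 simply echoes the figure quoted above.

```python
from itertools import combinations

def weighted_distance(a, b, weights):
    """Weighted Hamming distance between two equal-length trait vectors."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def cluster_distance(c1, c2, vectors, weights):
    """Average-linkage distance between two clusters (tuples of names)."""
    total = sum(weighted_distance(vectors[p], vectors[q], weights)
                for p in c1 for q in c2)
    return total / (len(c1) * len(c2))

def agglomerate(vectors, weights):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [(name,) for name in sorted(vectors)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1],
                                                     vectors, weights))
        merges.append((a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical data: (episcopal structure, trait 2, trait 3) for each church.
churches = {"A": (1, 1, 0), "B": (1, 0, 1), "C": (0, 1, 0)}

equal = agglomerate(churches, [1, 1, 1])      # A joins C first
episcopal = agglomerate(churches, [8, 1, 1])  # A joins B first
```

With equal weights, A and C are grouped first because they differ in only one trait; once episcopal structure is weighted heavily, A and B (which share it) are grouped first, so the tree topology changes.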

Symposium co-organizer Martin Loomes (Univ. of Hertfordshire) provided a persuasive critique of the `software development lifecycle model', arguing that what was informally devised in the late 1960s as a rough guide to aid large scale software development projects became, within very few years, reified into a rigid methodology taught with almost religious intensity to subsequent generations of software engineering undergraduates. Loomes argues that this has resulted in a `technocentricity' which subordinates the real needs of users to the satisfaction of abstract notions of good design. He suggests that it is not software itself that evolves, but rather theoretical representations of systems. The actual physically instantiated software is only produced at critical points where human agents decide that the current theoretical representation is worth constructing.

Ever since people began to write computer code, it has been known that computer languages are syntactically fragile. Even small alterations (and perhaps most small alterations) can cause the malfunction or collapse of the system. Another side of the SE effort is the attempt to learn lessons from biological systems that can be incorporated in software design, thus reducing this fragility. Robert Laddaga (MIT AI Lab: http://www.ai.mit.edu/projects/dynlangs/People/bob.htm) described how `self-adaptive' software allows the introduction of other elements from the biological metaphor, such as self-organization of a complex structure from a simple set of instructions. Meurig Beynon (Univ. of Warwick: http://www.dcs.warwick.ac.uk/modelling) described the Empirical Modelling (EM) approach, which goes beyond the now ubiquitous object-orientation in allowing software entities to redefine their relationship with the environment (their `semantic relation') while they are running. Christopher Landauer (The Aerospace Corp.) described the use of `wrappings', i.e. machine-interpretable descriptions of resources and how they can be applied to problems. All aspects of the computer system are considered resources, including the wrappings themselves - thus introducing elements of self-knowledge into the system. Neil McBride (De Montfort Univ.) described the concept of `homoeotic' programming, by analogy with the homoeotic genetic control systems of animal developmental systems. Homoeotic programs are programs that control the activity of batteries of other programs in response to some environmental cue, arising either externally or within the system itself. As in biological systems, there are multiple overlaps in the sets of programs called from within each homoeotic program. The basic programs may be legacy code or other basic utilities. Homoeosis thus suggests a mechanism for designing flexible, sensitive systems without the necessity to recode all aspects from scratch.
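The homoeotic idea of controllers switching on overlapping batteries of existing programs might be sketched as follows. This is only an illustration under stated assumptions: the cue names, the three `basic programs' and the dispatch-table design are all hypothetical and are not taken from McBride's talk.

```python
# Hypothetical "basic programs": stand-ins for legacy code or utilities.
def backup(state):
    return state + ["backup"]

def compress(state):
    return state + ["compress"]

def notify(state):
    return state + ["notify"]

# Each cue selects a battery of basic programs. The batteries overlap:
# every basic program here is reused by two different controllers.
BATTERIES = {
    "disk_full": [compress, notify],
    "nightly": [backup, compress],
    "intrusion": [backup, notify],
}

def homoeotic_control(cue, state=None):
    """Run, in order, every program in the battery selected by the cue."""
    state = list(state or [])
    for program in BATTERIES.get(cue, []):
        state = program(state)
    return state

nightly_run = homoeotic_control("nightly")
```

Adding a new behaviour then means adding a new battery that reuses existing programs, rather than recoding the programs themselves.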

William Langdon (University College London: http://www.cs.ucl.ac.uk/staff/W.Langdon/) described Genetic Programming (GP). Whereas Genetic Algorithms (GAs) are represented as linear strings or arrays containing parameter information for program input, a Genetic Program is represented as a tree structure, which itself can evolve. Therefore, while GAs evolve solutions to combinatorially complex problems within a standard programming framework, GPs evolve the programming framework itself. Limiting the depth of the trees in GP limits the possible length of programs and can crucially change the search landscape. Further details can be found in Langdon's new book "Foundations of Genetic Programming" (written in collaboration with Riccardo Poli - http://www.amazon.co.uk/exec/obidos/ASIN/3540424512/202-3156583-8246220).
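The contrast between the two representations, and the effect of a depth limit, can be made concrete with a minimal sketch. The operator set, the terminals, the mutation probabilities and all function names below are assumptions chosen for illustration, not details from Langdon's talk or book.

```python
import random

# GA: a fixed-length linear string; mutation changes values, never structure.
def mutate_ga(genome, rng, rate=0.1):
    return [1 - bit if rng.random() < rate else bit for bit in genome]

# GP: a program is a tree. Leaves are "x" or constants; internal nodes
# are tuples of the form (operator, left_subtree, right_subtree).
OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def random_tree(rng, max_depth):
    """Grow a random tree whose depth never exceeds max_depth."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", 1.0, 2.0])
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, max_depth - 1), random_tree(rng, max_depth - 1))

def depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))

def size(tree):
    """Number of nodes, a simple measure of program length."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + size(tree[1]) + size(tree[2])

def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def mutate_gp(tree, rng, max_depth):
    """Subtree mutation: regrow one subtree while respecting the depth limit."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, max_depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate_gp(left, rng, max_depth - 1), right)
    return (op, left, mutate_gp(right, rng, max_depth - 1))
```

Because every node here has two children, a depth limit of d caps the program at 2^(d+1) - 1 nodes, which is one way of seeing how the depth limit bounds program length and hence reshapes the space being searched.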

Peter Bentley (University College London: http://www.cs.ucl.ac.uk/staff/p.bentley/) in a plenary lecture described numerous areas in which he is seeking to apply computer techniques, especially those involving GAs, to biological problems. A major example is "PlantWorld", a digital environment developed by him and some of his students. PlantWorld is a two dimensional landscape where plants germinate, grow, reproduce and die. It is hoped that it will help in the investigation of ecological theories, both by modelling the population dynamics of evolving plants in a digital environment and by testing the veracity of predictions about population dynamics that arise from numerical models.

Symposium co-organizer Paul Marrow (British Telecom Intelligent Systems Laboratory: http://www.labs.bt.com/people/marrowp/index.htm) described the lack of fault tolerance in current systems, both in computer architecture and in computer software. Living organisms, by contrast, are adaptable and flexible in changing environments, and he expects to be able to apply the lessons from natural systems to computers. One result is the "Flyphones" system, which uses an analogy from nature to produce a channel allocation method for a mobile phone network. More specifically, its solution is based on the way the pattern of bristles in a fruit fly is set during development by interactions among its cells. By comparing fruit fly cells to base stations and bristles to channels, the Flyphones system provides a rapid solution not available using conventional algorithms.

Jon Bird (University of Sussex) and Paul Layzell (Hewlett-Packard Laboratories) talked about hardware evolution. A human designer is constrained by the abstractions, modelling and methodologies used in conventional design. Electronic circuits can instead be produced by using artificial evolution to control a configurable device, with measurements of the circuit's performance serving as the fitness evaluations. Evolved circuits can be produced entirely in simulation, but Bird and Layzell's novel slant on this process is to use evolution to configure real, physical circuits, thereby allowing the exploitation of all the physical properties inherent in the silicon medium. The hardware evolution experiment they described in their talk resulted in an "Evolved Radio Receiver", a network of transistors sensing and utilising the radio waves emanating from nearby PCs. Such an experiment would be practically impossible to implement in simulation, as one would not normally think of including the nearby PCs in the laboratory in the simulation.

Julian Miller (University of Birmingham: http://www.cs.bham.ac.uk/~jfm/) talked about the development of a new kind of GP called Cartesian Genetic Programming. In standard GP there is no clear difference between the phenotype and the genotype, as the genotypes are the programs and the phenotype the result of executing the programs. Cartesian Genetic Programming uses a representation where the genotypes are represented as cells and have a fixed length, while allowing phenotypes of variable length. This representation manages to avoid one of the common problems with standard GP, i.e., bloating of programs. Miller has further developed a new form of Cartesian Genetic Programming where an initial single cell program is allowed to differentiate and divide to evolve into an organism of multiple cells. This representation appeared advantageous, as it allows greater evolvability. The implications of this are important for the question of what makes a good genotype-phenotype mapping for the evolution of computer programs.
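How a fixed-length genotype can decode to a variable-length phenotype is easiest to see in a small sketch. The one below is a simplified, single-row encoding assumed for illustration; the function set, the node layout and the names are hypothetical and should not be read as Miller's exact representation. Each node gene is a triple (function index, input a, input b), inputs refer only to earlier indices, and a separate output gene selects which node's value is returned.

```python
# Assumed function set: index 0 adds, 1 subtracts, 2 multiplies.
FUNCS = [lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b]

def active_nodes(genotype, n_inputs, output):
    """Indices of the nodes the output depends on: the expressed phenotype."""
    active, stack = set(), [output]
    while stack:
        idx = stack.pop()
        if idx < n_inputs or idx in active:
            continue  # program inputs and already-visited nodes need no work
        active.add(idx)
        _, a, b = genotype[idx - n_inputs]
        stack.extend((a, b))
    return sorted(active)

def evaluate(genotype, inputs, output):
    """Evaluate only the active nodes; inactive genes are never executed."""
    n = len(inputs)
    needed = set(active_nodes(genotype, n, output))
    values = list(inputs)
    for i, (f, a, b) in enumerate(genotype):
        idx = n + i
        values.append(FUNCS[f](values[a], values[b]) if idx in needed else None)
    return values[output]

# Two inputs (indices 0 and 1) and four node genes (indices 2 to 5).
genotype = [(0, 0, 1),   # node 2 = in0 + in1
            (2, 0, 0),   # node 3 = in0 * in0
            (1, 2, 1),   # node 4 = node2 - in1
            (2, 3, 3)]   # node 5 = node3 * node3

phenotype_a = active_nodes(genotype, 2, output=4)  # [2, 4]
phenotype_b = active_nodes(genotype, 2, output=2)  # [2]
```

The genotype always holds four node genes, yet the expressed program uses two nodes or one depending on the output gene. Genes that are not connected to the output are simply ignored, which is one commonly cited reason this style of encoding resists bloat.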

What was clear from two days of fascinating exposition and discussion was that the boundaries between the various disciplines involved are being readily crossed in all directions. Anybody who understands evolution and who can work a computer can make a contribution to this open and burgeoning field.

Meeting Report: Engineering and Physical Sciences Research Council Network on Evolvability in Biology and Software Systems, Symposium on Software Evolution and Evolutionary Computation, University of Hertfordshire, UK, 7th - 8th February 2002