VSNS BioComputing Division

BioMOO Electronic Conference transcript, HTML Edition

July 27th, 1996, 16:00 h GMT


Genetic Algorithms and Protein Structure Analysis


Special guest and invited lecturer:

Dr. Steffen Schulze-Kremer
Max-Planck-Institute for Molecular Genetics
Dept. Lehrach / Biocomputing
Berlin, Germany


A list of participants can be found below. There are also links to other VSNS-BCD Guest Lectures at BioMOO.

GeorgF says, "Welcome to a special session with Dr Steffen Schulze-Kremer, author of Chapter 5, "Genetic Algorithms and Protein Structure Analysis". I'm very happy that Steffen has agreed to discuss his chapter with you, answer your questions, etc. Thanks a lot !"

SteffenSK says, "Hello to everybody from Berlin, Germany!"

GeorgF [to SteffenSK]: For this meeting, I expect more of a free-flow discussion rather than a lecture. Maybe, as an intro statement, can you tell us how you first encountered Genetic Algorithms? Did you ``invent them yourself'', as some people do or did you read about them?"

SteffenSK says, "Well, that was actually some eight years ago, when I met John Holland on one of our AI seminars of then Brainware GmbH. At that time I was working on secondary structure prediction when John Holland introduced me to the concept. So I thought protein structure prediction would make a nice example of applying a concept to the objects it was originally derived from in the beginning."

John Towell grins

SteffenSK says, "I.e. Genes -> GA -> Proteins, or maybe better: Proteins <- Genes -> GA."

GeorgF ahs. See Fig.1 in Steffen's text in chapter 5 of the hypertext course book.

SteffenSK says, "Now, many people have caught on and there are many papers out there using GAs for protein structure prediction. Has anybody in the audience him/herself experimented in this field? Was anybody using GAs for protein related stuff?

Hersh shakes his head no.

GeorgF used GAs to evolve Pseudo-Random Number Generators, in 1992.

AndreaSch shakes her head.

EdgarA for sure hasn't.

GeorgF says, "Seems to me GAs are still not mainstream for biologists. Which things changed since 1995, when you wrote the course-text?"

SteffenSK says, "Ok, a number of things have emerged since the early research work: 1) GAs are NO function optimizers per se but can be used like them under certain circumstances. 2) The protein structure prediction problem is hard ... actually harder than many enthusiasts have anticipated. The consequences, in brief, are that people actually trying to think about ways around the protein structure prediction problem instead of solving it. GAs have - and I am sorry to say - helped to undermine that point but did NOT revolutionize structure prediction. That is probably the reason why they did not (yet) catch on the biosciences. However, there are some things that can naturally be done with GAs and in those cases I believe they come in useful. One just should not make the mistake of trying to squeeze an arbitrary application into the GA paradigm."

Hersh says, "Do you think that GAs have any particular advantages over other approaches to problems with complex solution spaces, such as neural nets or simulated annealing? On which problems are GAs most likely to succeed? Can you elaborate on that?"

SteffenSK says, "It was made plain clear by Kenneth de Jong on the PPSN 93 conference that assuming that GAs are function optimizers can be quite disastrous. - So what are they good for? Any application that can be decomposited into ''genes'' with natural building blocks I think is worth an attempt using GAs. Actually, that was what led me to protein structure prediction: It seemed that secondary structures could actually take the role of Holland's building blocks and be easily shuffled around to assemble a fine tertiary structure. Unfortunately, I myself did never observe secondary structures appearing as building blocks or schemata ... :-("

Hersh says, "I have previously seen some attempts to formulate problems in this way in what seemed to me to be quite unnatural ways. What criteria can one apply to determine if a problem fits well into this paradigm?"

EdgarA says, "Holland's building blocks?"

GeorgF says, "Natural building blocks... Can you expand a little?"

SteffenSK says, "Concerning the ''unnatural ways'' of designing building blocks: this problem is closely related to application specific operators, i.e. the genetic operators. You have to find a way where shuffling and exchanging parts of the chromosomes captures 'nearly independent decomposable subparts' which is clearly but unfortunately not true for proteins because proteins are folding in a cooperative manner."

John Towell nods and also mentions chaperones.

GeorgF [to SteffenSK]: "Some people seem to distinguish schemata and building blocks, no?"

SteffenSK says, "EdgarA asked about building blocks: ideally they correspond to the schemata in a gene as defined by John Holland. Schemata are (sub-)patterns in a gene, building blocks are fragments of an actual gene."

GeorgF [to SteffenSK]: "Isn't there also the distinction schemata are syntax, building blocks are a semantic notion ?"

SteffenSK [to GeorgF]: "Yes, schemata are GA constructs whereas building blocks refer to the application model. - A few months ago, ''it was like being hit by a ton of bricks'' when I heard some recognized, established protein researcher announcing that he would move away from protein structure prediction to genome analysis. This seems to me one of a number of signs that the focus in molecular biology is shifting from protein structure analysis to comparative genome analysis. However, as a 'scientific' subject of research GAs made their way into protein structure analysis. Even the most important journal in molecular biology (Journal of Molecular Biology, Academic Press) had a few papers about it. But all concluded that the prediction accuracy is not sufficiently high enough yet."

DWild finds his way in.

Robertgi finds his way in.

DWild says, "Sorry I missed the beginning of the discussion..."

Hersh says, "I'm still interested in getting some idea of the kinds of problems (either in this domain or not) for which GAs seem to be more effective than neural nets and simulated annealing."

SteffenSK [to Hersh]: "Problems that have a structure and are not easily encoded as numbers (which could be fed in ANNs). Eg. symbolic function approximation."

DWild says, "I also wanted to ask how GA's compare to simulated annealing :-)"

SteffenSK says, "GA - SA: well, I'd say the latter require much less implementation effort and have been shown to be quite efficient. GAs - on the other hand - allow a more detailed representation of the model (in gene, operators, competition in reproduction)."

DWild says, "Have there been any practical comparisons of GA/SA in protein structure modelling?"

SteffenSK [to DWild]: "Yes, see Lavery et al. I think the reference should be in my chapter. There is some work on side chain placement where both approaches have been compared."

DWild nods.

GeorgF says, "Btw, I found the reference "P. Tuffery, C. Etchebest, S. Hazout, R. Lavery, A new approach to the rapid determination of protein side chain conformations, J. Biomol. Struct. Dyn., vol 8, no 6, pp. 1267-1289, 1991."

SteffenSK says, "Yes, that's it!"

GeorgF [to SteffenSK]: What was the `most successful' application of GAs in biocomputing / in general? Not structure prediction, I guess? (in your honest opinion, of course ;-)"

SteffenSK [to GeorgF]: "The most successful application of GAs in biocomputing / in general? Hm. I don't know in general. There are so many things going on."

GeorgF says, "Probably better, some very successful applications?"

SteffenSK says, "I have difficulties even with 'very successful applications': would that be something that revolutionized the application domain or that performed well compared to standard approaches?"

GeorgF [to SteffenSK]: "Let's start with the latter (performed well compared to standard approaches)."

SteffenSK [to GeorgF]: "Then it could very well be tertiary structure prediction. GAs did about as good as other methods ..."

John Towell says, "I'm bothered about the notion that the information is contained in the primary structure - when it is clear that proteins fold in vitro differently than the way they fold in vivo with chaperone assistance- shouldn't computational approaches include the 'perturbation' from the chaperone influence?"

SteffenSK [to John Towell]: "Sure, chaperones have always been neglected in this work. However, there are certainly proteins that won't need chaperones."

John Towell nods.

Ed finds his/her way in.

1st Guest (HLA) says, "As I understood from a recent review (Pedersen & Moult Curr Opin Struct Biol 1996 6: 227-231) a main advantage with GAs is the possibility to easily implement them to run in parallel than other trajectory search procedures?"

SteffenSK [to HLA]: "Yes, true. I myself took the advantage of easy parallelization in GA / protein work when using Parallab's Intel Paragon with 96 nodes."

DWild would like to know if there is any code for GAs applied to protein structure 'on the net'.

SteffenSK [to DWild]: "Well, there is always GENESIS by J. Greffenstette. I wrote some simple functions (in C) that implement a 2D model of proteins (Unger & Moult). That code could be made available. It is also documented in my book."

DWild says, "Thanks! I also wanted to ask if the simulations described in chapter 5 were your own work and if the code was available?"

SteffenSK says, "I cannot distribute the code for the actual force field simulations because they belong to my previous employer, Brainware GmbH (although I implemented a large part of it). The 2D code is mine and I am prepared and willing to share it for free. I could add the source code to my home page within the next couple of days. See http://mycroft.rz-berlin.mpg.de/~steffen."

Robertgi finds his way out.

SteffenSK says, "So, is there anybody in this room planning to apply GAs to some topic in molecular biology?"

DWild says, "I'd thought about GA's for protein x-ray refinement, but I'm not sure there would be an advantage over existing SA methods..."

SteffenSK [to DWild]: "Have you seen Lavery's work? They tried some in that direction, although the GA is not their best algorithm (but which must not be the final word in this case)."

DWild says, "re: Lavery - no - I'll look it up - thanks"

GeorgF says, "Genetic Algorithms for Multiple Alignment. I'm interested to apply GAs there, but I know just one paper about it: Notredame, C. and Higgins, D.G., SAGA: sequence alignment by genetic algorithm, Nucleic Acids Research 1996 24(8): 1515-1524."

SteffenSK [to GeorgF]: "Sounds to me as a good application: there are building blocks (the homologous fragments) and they are independent in the application model."

GeorgF [to SteffenSK]: I've got the following quote from Martin Vingron's thesis
-----------------------------GeorgF at BioMOO------------------------------
"[...] In the case of alignments it is very difficult to perturb a given alignment without sacrificing the score acquired thus far. Established methods, be they deterministic like gradient walking or stochastic like simulated annealing, require such ways to introduce perturbations. [...]"
----------------------------------GeorgF-----------------------------------

SteffenSK [to GeorgF]: "Yes, I think that's just what I meant: GAs could be very efficient to maintain already fitting parts without sacrificing much of the score acquired so far."

GeorgF says, "You mean, GAs in contrast to simulated annealing/gradient walking ?"

SteffenSK [to GeorgF]: "Yes."

GeorgF [to SteffenSK]: "Do you know any other papers on this topic?"

SteffenSK [to GeorgF]: "No, sorry."

GeorgF says, "I think we should devote some time now for more technical questions, in particular on the course-text. I know that there are some :-)
SteffenSK says, "Yes, any technical questions?"

Bergeron has a question

GeorgF [to Bergeron]: go ahead :-)

Bergeron says, "Why do you conclude that s(H,t) will grow exponentially. It seems to me that there should be additional hypothesis on the form of function 'g'? For example if g=(1-b)^i there is no growth..."

SteffenSK says, "Ok, you can rewrite the schema propagation function. I am just loading the hypertext course book ..."

GeorgF says, "On GA theory: http://www.techfak.uni-bielefeld.de/bcd/Curric/ProtEn/111.html. (If you have the HTML, Find string ``This equation is of the form''; Postscript: it's on page 6)."

SteffenSK says, "Ok, I have it right in front of me. Let's see: For that equation I assume a schema H could always outperform other schemata by a fraction b of the total mean fitness."

Bergeron nods.

SteffenSK says, "where (1 + b) is always > 1 but g < 1. The condition g=(1-b)^i would therefore not occur. Is that correct?"

Bergeron says, "g is the last factor and is of the form (1-a) so that fi=f0*(1+b)^i*(1-a)^i."

SteffenSK [to Bergeron]: "Yes."

Ed finds his/her way out.

GeorgF smiles. such technical questions are the everyday challenge for the instructors:)

SteffenSK [to Bergeron]: Yes. And if that is done a number of times the exponent increases which is synonymous with exponential increase (only for good schemata!).

Bergeron says, "But if a = b the last equation is equivalent to fi=f0*(1-b*b)^i."

SteffenSK [to Bergeron]: I doubt that fi=f0*(1+b)^i*(1-a)^i with a = b is equivalent to fi=f0*(1-b*b)^i. Are you sure?"

Bergeron says, "(1-b)*(1+b) = (1-b^2)."

SteffenSK [to Bergeron]: "Yes, you are right! Stupid me ... but what does that mean?"

Bergeron says, "If a is not really small, fi will not grow..."

SteffenSK says, "It is still exponential: in that case an exponential decay occurs for schemata with certain a and b. So we found a constraint for 'good' schemata!"

Bergeron is mildly unconvinced :)

SteffenSK says, "Yes, and Rechenberg and Holland also argue for small mutation and crossover rates!

Bergeron nods.

SteffenSK [to Bergeron]: Thank you for that question. I think that is an interesting point about the relative size of a and b. I'll have to think about the implications of that observation to the ratio and size of 'good' individuals (and schemata) to population size and mutation/crossover rates!

Bergeron nods.

GeorgF says, "For the non-maths people... I've got a few technical non-maths questions in a few minutes :)"

Syed has disconnected.

GeorgF says, "I think the conversation we just had shows what the course is about: how some (maths) knowledge about the tools helps using them better."

SteffenSK [to GeorgF]: "That's what I think!"

GeorgF says, "We should adjourn in 10 minutes, I guess."

SteffenSK says, "Why?"

GeorgF says, ">Why?< after 2 hours, people get tired. We should have another session in 3 weeks instead :)"

SteffenSK [to GeorgF]: "Yes, you are right. I'll try to be present!"

EdgarA . o O ( another session in 2 or three weeks is a good idea )"

GeorgF says, "May I ask 2 small questions ?"

Bergeron will be on a desert island in three weeks. She wont bother SteffenSK.

GeorgF smiles to Bergeron.

John Towell ews

SteffenSK [to Bergeron]: "I actually might also be out of town at that time, ..."

DWild [to GeorgF] "Sorry I missed the first part of the discussion - could you email the transcript?"

GeorgF says, "Yes, we intent to mail the transcript." He looks around. "Does anyone else have a question now? (technical or non-technical) ... - Ok, can you explain the difference torsion angle <-> bond angle? I think I've got an approximate idea, but not a clear grasp."

SteffenSK says, "Bond angles are between 3 atoms, the angle enclosed between two bonds. Torsion angles are between 4 atoms. It is the angle between the first and the third bond (of four consecutive atoms)."

------------------------John_Towell_'jt' at BioMOO-------------------------
 A                             Y----A
 |            <- bond angle    |<- torsional angle = rotation about 
 X---- B                       X----B                   X-Y bond  
-----------------------------John_Towell_'jt'------------------------------
John Towell plays around.

GeorgF thanks to John Towell and asks SteffenSK (grinning): "Maybe there is room for one very short last question: Do you think you'll move away from protein structure prediction anytime soon or, are there still things to do, that you expect can be done successfully?"

SteffenSK says, "Actually, I've shifted focus already - at least temporarily! This department is more concerned with genomes and genes but I am working on a project to build a system that can rationalise over genes, proteins and functions. A kind of electronic, knowledge based library on a large scale."

John Towell smiles.

GeorgF hehs.

GeorgF says, "Well, I'd like to thank Steffen Schulze-Kremer for his patience with our questions, and his insightful comments on GA applicability in particular. I hope we'll have another session, maybe in 3 weeks, or maybe in 5 weeks. This will be announced via vsns-bcd-students."

John Towell likes that a lot.

John Towell stands and applauds.

DWild says, "Thanks, Steffen!"

SteffenSK says, "And thanks to you, GeorgF - Without your help I couldn't get my Moo connection running! Plus thanks for kindly moderating this session!"

Bergeron waves goodbye to everyone.

John Towell [to SteffenSK]: thank you :)

SteffenSK says, "My pleasure!"

EdgarA waves to Steffen. Thanks!

AndreaSch join the others in thanking Steffen

1st Guest (HLA) smiles happily

GeorgF turns the BCD recorder [bcdr] off.

Participants (most descriptions taken from the public BioMOO user database):

Back to VSNS BioComputing Division Home Page.