From: "Wade T.Smith" <wade_smith@harvard.edu>
To: "memetics list" <memetics@mmu.ac.uk>
Subject: Fwd: What Is a Gene, Anyway?
Date: Tue, 6 Jun 2000 09:08:51 -0400
Special Report Updated: May 29, 2000
http://news.bmn.com/news/sreport
What Is a Gene, Anyway?
by Tabitha M. Powledge
You've heard, of course, that genome scientists gathered at a Cold Spring 
Harbor Laboratory meeting a couple of weeks ago started a betting pool on 
the number of human genes. This humanizing news - that even staid 
scientists like a little flutter - transcended the weekly journals to be 
featured in an eclectic array of popular media ranging from Der Spiegel 
to Wired. The 228 wagers laid down as of this writing range from 27,462 
all the way to 200,000, but the median falls at the lower end: 53,700.
Lost in the numbers game was the fact that you can't count genes unless 
you know what counts as genes. "It's interesting in that this seems to 
have focused so many genomics people, many so wrapped up in mapping and 
sequencing projects in the last few years, in what is, in a way, a very 
old debate - that is, the nature of the gene," says Cold Spring Harbor's 
David Stewart, keeper of the wagers.
For Genesweep - the pool's name - there had to be a definition. For 
example, one of the stipulations (found on the pool's Web site) is that 
for betting purposes, a gene is a set of connected transcripts. The 
European Bioinformatics Institute's Ewan Birney, who organized Genesweep 
and is the keeper of the official count, cheerfully concedes that the 
specs were selected largely for practical reasons. "It's the first time 
we've been able to come up with an operational definition of a gene 
that's not based around genetics. This is a true operational definition 
from a bioinformatics perspective," he recounts. A curious boast, but 
never mind.
So what is a gene?
Nobody knows.
If anyone knew, it would be Human Genome Project head Francis Collins, 
right? But Collins, director of the National Human Genome Research 
Institute, has acknowledged, "The closer you look at the definition of a 
gene, the harder it is to be sure you've got it right." For discussion 
purposes, he's willing to settle for calling a gene a packet of 
information that carries out a particular instruction, usually to make a 
particular protein.
The Genesweep definition, however, will have no truck with weaseling 
qualifiers like "usually." It counts only genes that make proteins, on 
the pragmatic grounds that genes that make RNA will be too difficult to 
assess by 2003. That's Genesweep's end date, selected, Birney says, 
partly on a whim and partly because "in three years we really should have 
this number, plus or minus 100." (It doesn't hurt that it's also the 50th 
anniversary of Watson and Crick's famously terse description of DNA's 
structure.)
Noting that a gene can make two or more proteins, Collins asks, "Do you 
call that one gene or two?" The resounding answer from Cold Spring Harbor 
is "one." For Birney, and many others, a key feature of the definition is 
that it does not count alternatively spliced products as separate genes.
"The definition seems pretty reasonable to me - in particular, the 
restriction to protein-coding genes, and not counting alternatively 
spliced products as separate genes," says the University of Washington's 
Phil Green, coauthor of the estimate of 35,000 human genes appearing in 
June 2000's Nature Genetics.
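The counting rule described so far (protein-coding only, a gene as a set of connected transcripts, splice variants folded into one gene) can be sketched in a few lines of Python. This is an illustration, not Genesweep's actual procedure: the article does not say what makes two transcripts "connected," so the sketch assumes they are connected when they share an exon, and the Transcript record, exon labels, and sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    exons: frozenset      # hypothetical exon identifiers
    makes_protein: bool   # False for transcripts that only make RNA


def count_genes(transcripts):
    """Count sets of connected protein-coding transcripts.

    Assumption: two transcripts are "connected" if they share an exon.
    """
    coding = [t for t in transcripts if t.makes_protein]
    parent = list(range(len(coding)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    exon_owner = {}  # exon -> index of first transcript seen with it
    for i, t in enumerate(coding):
        for exon in t.exons:
            if exon in exon_owner:
                parent[find(i)] = find(exon_owner[exon])
            else:
                exon_owner[exon] = i

    return len({find(i) for i in range(len(coding))})


# Made-up data: two splice variants of one gene, a second gene,
# and an RNA-only transcript that the rule leaves out.
transcripts = [
    Transcript("A-long", frozenset({"a1", "a2", "a3"}), True),
    Transcript("A-short", frozenset({"a1", "a3"}), True),
    Transcript("B", frozenset({"b1"}), True),
    Transcript("R", frozenset({"r1"}), False),
]
print(count_genes(transcripts))  # 2
```

Under these assumptions the two splice variants collapse into one gene and the RNA-only transcript is not counted, so four transcripts yield two genes. A different notion of "connected" would change the tally.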
It also seems reasonable to Samuel Aparicio of Cambridge University, 
author of the News and Views commentary on the three papers in that 
issue, which provide wildly varying estimates of human gene number, 
ranging from 30,000 to 120,000. Despite Birney's claim that the Genesweep 
definition is not based on genetics, Aparicio argues that it takes off 
from the classical genetic definition: a gene is a heritable unit that 
corresponds to an observable phenotype. Phenotype variation is mostly due 
to mutations in a protein-coding sequence. "So one is really talking 
about gene locus," he declares. "The definition is not completely 
inclusive, but it embodies the concept of gene locus."
The definition certainly is not inclusive. Among the missing are 
regulatory regions and enhancers. Of course, these sequences are 
sometimes distant from the coding region they superintend, and - to 
further confuse matters - can also preside over more than one coding 
region. Cold Spring Harbor's Jan Witkowski says he tends to think of a 
gene as a protein-coding region plus all related sequences, looking at it 
as a functional unit rather than a structural unit or a sequence unit. 
But, he allows, "Even in the fuzzy real world you can have a structural 
definition, where the DNA elements are close together, or a functional 
definition, where the elements need not be close together."
Lest you think for a moment that a distinction between functional and 
structural definitions will help keep you clear on what a gene is, listen 
to Green: "Genes have both functions and structures, so I'm not sure it 
makes any sense to talk about the definition being functional or 
structural. Regulatory elements sometimes are considered part of a gene, 
but sometimes they control two or more different genes, so they aren't 
part of any gene. I think most biologists would not consider a regulatory 
element to be a gene in itself, so these considerations don't really have 
any bearing on the gene number."
It may seem peculiar not to count regulatory sequences as at least parts 
of genes, since they are as essential as the coding sequence for 
producing a protein. But there are reasons for excluding them besides 
mere topographical convenience. The chief one is that a gene does not 
have to be expressed to exist. Somatic cells all have the same genes, but 
particular cell types express only some of them, as Christopher D. Epp, 
of Massey University in New Zealand, pointed out in a 1997 letter to 
Nature.
Apparently, they spend a lot of time down under considering what a gene 
is. Paul Griffiths is head of history and philosophy of science at the 
University of Sydney in Australia, and coauthor of a paper on gene 
definitions that appeared in August 1999 in BioScience. The Genesweep 
definition, he says, pretty much hews to what he calls the classical 
molecular gene concept, which arose in the 1960s. "A recent survey I did 
here suggests that it is still the dominant definition amongst molecular 
biologists," he reports. On the other hand, he points out, the Genesweep 
definition also specifies "that the translation machinery does translate 
the sequence at some time." This proviso is not part of the classical 
molecular gene. "Interesting," he says. "This will exclude lots of 
sequences often counted as genes."
Nor does that conclude our definitional ambiguities. "One issue is 
whether multiple copies of a gene that are identical in sequence should 
all be counted as distinct genes. By the definition they would be, but 
some investigators would argue that they shouldn't," Green notes.
There are other issues too, but enough already.
Confused? As you can see, you've got lots of classy company. And not all 
of it is human. New Scientist recently described a report, which appeared 
in the April issue of Genome Research, on a test of gene-detection 
software from 12 labs. The weekly said the researchers set the programs 
loose on a stretch of well-characterized Drosophila DNA nearly 3 million 
base pairs long. The programs were able to detect up to 97 percent of the 
protein-coding sequences. But they blew it when it came to grouping the 
sequences into genes and distinguishing genes from junk: they missed up 
to 16 percent of the genes, and up to 52 percent of the sequences they 
identified as genes probably are not genes. So cheer up. Computers aren't 
sure what a gene is either.
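For readers who want to see how two such error figures differ, here is a rough Python sketch. It is not the method used in the Genome Research assessment: it assumes genes can be compared by simple identifier, and the function name and the toy gene sets are invented for illustration.

```python
def gene_level_errors(annotated, predicted):
    """Return (fraction of annotated genes missed,
               fraction of predicted genes that are not annotated)."""
    annotated, predicted = set(annotated), set(predicted)
    missed = len(annotated - predicted) / len(annotated)
    wrong = len(predicted - annotated) / len(predicted)
    return missed, wrong


# Made-up example: 4 known genes, 6 predictions, 3 of them correct.
annotated = {"g1", "g2", "g3", "g4"}
predicted = {"g1", "g2", "g3", "x1", "x2", "x3"}

missed, wrong = gene_level_errors(annotated, predicted)
print(f"missed {missed:.0%}, wrong {wrong:.0%}")  # missed 25%, wrong 50%
```

The two rates have different denominators: the first is measured against the genes known to be there, the second against the calls the program made. That is why a program can find most of the real genes and still be wrong about half of what it reports.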