From: "Wade T.Smith" <wade_smith@harvard.edu>
To: "memetics list" <memetics@mmu.ac.uk>
Subject: Fwd: What Is a Gene, Anyway?
Date: Tue, 6 Jun 2000 09:08:51 -0400
Special Report Updated: May 29, 2000
http://news.bmn.com/news/sreport
What Is a Gene, Anyway?
by Tabitha M. Powledge
You've heard, of course, that genome scientists gathered at a Cold
Spring Harbor Laboratory meeting a couple of weeks ago started a betting
pool on the number of human genes. This humanizing news - that even staid
scientists like a little flutter - transcended the weekly journals to be
featured in an eclectic array of popular media ranging from Der Spiegel
to Wired. The 228 wagers laid down as of this writing range from 27,462
all the way to 200,000, but the median falls at the lower end: 53,700.
Lost in the numbers game was the fact that you can't count genes unless
you know what counts as genes. "It's interesting in that this seems to
have focused so many genomics people, many so wrapped up in mapping and
sequencing projects in the last few years, in what is, in a way, a very
old debate - that is, the nature of the gene," says Cold Spring Harbor's
David Stewart, keeper of the wagers.
For Genesweep - the pool's name - there had to be a definition. For
example, one of the stipulations (found on the pool's Web site) is that
for betting purposes, a gene is a set of connected transcripts. The
European Bioinformatics Institute's Ewan Birney, who organized Genesweep
and is the keeper of the official count, cheerfully concedes that the
specs were selected largely for practical reasons. "It's the first time
we've been able to come up with an operational definition of a gene
that's not based around genetics. This is a true operational definition
from a bioinformatics perspective," he recounts. A curious boast, but
never mind.
So what is a gene?
Nobody knows.
If anyone knew, it would be Human Genome Project head Francis Collins,
right? But Collins, director of the National Human Genome Research
Institute, has acknowledged, "The closer you look at the definition of a
gene, the harder it is to be sure you've got it right." For discussion
purposes, he's willing to settle for calling a gene a packet of
information that carries out a particular instruction, usually to make a
particular protein.
The Genesweep definition, however, will have no truck with weaseling
qualifiers like "usually." It counts only genes that make proteins, on
the pragmatic grounds that genes that make RNA will be too difficult to
assess by 2003. That's Genesweep's end date, selected, Birney says,
partly on a whim and partly because "in three years we really should have
this number, plus or minus 100." (It doesn't hurt that it's also the 50th
anniversary of Watson and Crick's famously terse description of DNA's
structure.)
Noting that a gene can make two or more proteins, Collins asks, "Do you
call that one gene or two?" The resounding answer from Cold Spring Harbor
is "one." For Birney, and many others, a key feature of the definition is
that it rules out counting alternatively spliced products as separate
genes.
"The definition seems pretty reasonable to me - in particular, the
restriction to protein-coding genes, and not counting alternatively
spliced products as separate genes," says the University of Washington's
Phil Green, coauthor of the estimate of 35,000 human genes appearing in
June 2000's Nature Genetics.
It also seems reasonable to Samuel Aparicio of Cambridge University,
author of the News and Views commentary on the three papers in that
issue, which provide wildly varying estimates, ranging from 30,000 to
120,000, of human gene number. Despite Birney's claim that the Genesweep
definition is not based on genetics, Aparicio argues that it takes off
from the classical genetic definition: a gene is a heritable unit that
corresponds to an observable phenotype. Phenotype variation is mostly due
to mutations in a protein-coding sequence. "So one is really talking
about gene locus," he declares. "The definition is not completely
inclusive, but it embodies the concept of gene locus."
The definition certainly is not inclusive. Among the missing are
regulatory regions and enhancers. Of course, these sequences are
sometimes distant from the coding region they superintend, and - to
further confuse matters - can also preside over more than one coding
region. Cold Spring Harbor's Jan Witkowski says he tends to think of a
gene as a protein-coding region plus all related sequences, looking at it
as a functional unit rather than a structural unit or a sequence unit.
But, he allows, "Even in the fuzzy real world you can have a structural
definition, where the DNA elements are close together, or a functional
definition, where the elements need not be close together."
Lest you think for a moment that a distinction between functional and
structural definitions will help keep you clear on what a gene is, listen
to Green: "Genes have both functions and structures, so I'm not sure it
makes any sense to talk about the definition being functional or
structural. Regulatory elements sometimes are considered part of a gene,
but sometimes they control two or more different genes, so they aren't
part of any gene. I think most biologists would not consider a regulatory
element to be a gene in itself, so these considerations don't really have
any bearing on the gene number."
It may seem peculiar not to count regulatory sequences as at least parts
of genes, since they are as essential as the coding sequence for
producing a protein. But there are reasons for excluding them besides
mere topographical convenience. The chief one is that a gene does not
have to be expressed to exist. Somatic cells all have the same genes, but
particular cell types express only some of them, as Christopher D. Epp,
of Massey University in New Zealand, pointed out in a 1997 letter to
Nature.
Apparently, they spend a lot of time down under considering what a gene
is. Paul Griffiths is head of history and philosophy of science at the
University of Sydney in Australia, and coauthor of a paper on gene
definitions that appeared in August 1999 in BioScience. The Genesweep
definition, he says, pretty much hews to what he calls the classical
molecular gene concept, which arose in the 1960s. "A recent survey I did
here suggests that it is still the dominant definition amongst molecular
biologists," he reports. On the other hand, he points out, the Genesweep
definition also specifies "that the translation machinery does translate
the sequence at some time." This proviso is not part of the classical
molecular gene. "Interesting," he says. "This will exclude lots of
sequences often counted as genes."
Nor does that conclude our definitional ambiguities. "One issue is
whether multiple copies of a gene that are identical in sequence should
all be counted as distinct genes. By the definition they would be, but
some investigators would argue that they shouldn't," Philip Green notes.
There are other issues too, but enough already.
Confused? As you can see, you've got lots of classy company. And not all
of it is human. New Scientist recently described a report, which appeared
in the April issue of Genome Research, on a test of gene-detection
software from 12 labs. The weekly said the researchers set the programs
loose on a stretch of well-characterized Drosophila DNA nearly 3 million
base pairs long. The programs were able to detect up to 97 percent of the
protein-coding sequences. But they blew it when grouping the sequences
into genes and distinguishing genes from junk: they failed to find up to
16 percent of the genes, and up to 52 percent of the sequences they
identified as genes probably are not genes. So cheer up. Computers aren't sure
what a gene is either.