Fwd: What Is a Gene, Anyway?

From: Wade T.Smith (wade_smith@harvard.edu)
Date: Tue Jun 06 2000 - 14:08:51 BST



    Special Report Updated: May 29, 2000

    http://news.bmn.com/news/sreport

    What Is a Gene, Anyway?

    by Tabitha M. Powledge

    You've heard, of course, that genome scientists gathered at a Cold
    Spring Harbor Laboratory meeting a couple of weeks ago started a betting
    pool on the number of human genes. This humanizing news - that even staid
    scientists like a little flutter - transcended the weekly journals to be
    featured in an eclectic array of popular media ranging from Der Spiegel
    to Wired. The 228 wagers laid down as of this writing range from 27,462
    all the way to 200,000, but the median falls at the lower end: 53,700.

    Lost in the numbers game was the fact that you can't count genes unless
    you know what counts as genes. "It's interesting in that this seems to
    have focused so many genomics people, many so wrapped up in mapping and
    sequencing projects in the last few years, in what is, in a way, a very
    old debate - that is, the nature of the gene," says Cold Spring Harbor's
    David Stewart, keeper of the wagers.

    For Genesweep - the pool's name - there had to be a definition. For
    example, one of the stipulations (found on the pool's Web site) is that
    for betting purposes, a gene is a set of connected transcripts. The
    European Bioinformatics Institute's Ewan Birney, who organized Genesweep
    and is the keeper of the official count, cheerfully concedes that the
    specs were selected largely for practical reasons. "It's the first time
    we've been able to come up with an operational definition of a gene
    that's not based around genetics. This is a true operational definition
    from a bioinformatics perspective," he recounts. A curious boast, but
    never mind.

    So what is a gene?

    Nobody knows.

    If anyone knew, it would be Human Genome Project head Francis Collins,
    right? But Collins, director of the National Human Genome Research
    Institute, has acknowledged, "The closer you look at the definition of a
    gene, the harder it is to be sure you've got it right." For discussion
    purposes, he's willing to settle for calling a gene a packet of
    information that carries out a particular instruction, usually to make a
    particular protein.

    The Genesweep definition, however, will have no truck with weaseling
    qualifiers like "usually." It counts only genes that make proteins, on
    the pragmatic grounds that genes that make RNA will be too difficult to
    assess by 2003. That's Genesweep's end date, selected, Birney says,
    partly on a whim and partly because "in three years we really should have
    this number, plus or minus 100." (It doesn't hurt that it's also the 50th
    anniversary of Watson and Crick's famously terse description of DNA's
    structure.)

    Noting that a gene can make two or more proteins, Collins asks, "Do you
    call that one gene or two?" The resounding answer from Cold Spring Harbor
    is "one." For Birney, and many others, a key feature of the definition is
    that it rules out alternative splicing.
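
    As a rough illustration of how "a set of connected transcripts" turns
    into a count, here is a minimal sketch. It assumes, purely for the sake
    of the example, that two transcripts are "connected" when they share at
    least one exon; the pool's actual rules may draw the line differently.
    The function name count_genes and the transcript and exon labels are
    hypothetical.

```python
def count_genes(transcripts):
    """Count genes as connected groups of transcripts.

    transcripts: dict mapping a transcript id to the set of exon ids it uses.
    Assumption (not from the Genesweep rules): two transcripts are
    "connected" if they share at least one exon.
    """
    # Union-find: each transcript starts as its own group.
    parent = {t: t for t in transcripts}

    def find(t):
        # Follow parent links to the group's representative.
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    # Remember the first transcript seen for each exon and merge any
    # later transcript that uses the same exon into that group.
    first_seen = {}
    for t, exons in transcripts.items():
        for exon in exons:
            if exon in first_seen:
                union(t, first_seen[exon])
            else:
                first_seen[exon] = t

    # Each remaining group counts as one gene.
    return len({find(t) for t in transcripts})


# Hypothetical data: tx1 and tx2 are alternative splice forms of one locus.
example = {
    "tx1": {"e1", "e2", "e3"},
    "tx2": {"e1", "e3"},
    "tx3": {"e7", "e8"},
}
print(count_genes(example))  # 2
```

    Under this reading, alternatively spliced products collapse into a
    single gene, which matches the "one" answer above.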

    "The definition seems pretty reasonable to me - in particular, the
    restriction to protein-coding genes, and not counting alternatively
    spliced products as separate genes," says the University of Washington's
    Phil Green, coauthor of the estimate of 35,000 human genes appearing in
    June 2000's Nature Genetics.

    It also seems reasonable to Samuel Aparicio of Cambridge University,
    author of the News and Views commentary on the three papers in that
    issue, which provide wildly varying estimates, ranging from 30,000 to
    120,000, of human gene number. Despite Birney's claim that the Genesweep
    definition is not based on genetics, Aparicio argues that it takes off
    from the classical genetic definition: a gene is a heritable unit that
    corresponds to an observable phenotype. Phenotype variation is mostly due
    to mutations in a protein-coding sequence. "So one is really talking
    about gene locus," he declares. "The definition is not completely
    inclusive, but it embodies the concept of gene locus."

    The definition certainly is not inclusive. Among the missing are
    regulatory regions and enhancers. Of course, these sequences are
    sometimes distant from the coding region they superintend, and - to
    further confuse matters - can also preside over more than one coding
    region. Cold Spring Harbor's Jan Witkowski says he tends to think of a
    gene as a protein-coding region plus all related sequences, looking at it
    as a functional unit rather than a structural unit or a sequence unit.
    But, he allows, "Even in the fuzzy real world you can have a structural
    definition, where the DNA elements are close together, or a functional
    definition, where the elements need not be close together."

    Lest you think for a moment that a distinction between functional and
    structural definitions will help keep you clear on what a gene is, listen
    to Green: "Genes have both functions and structures, so I'm not sure it
    makes any sense to talk about the definition being functional or
    structural. Regulatory elements sometimes are considered part of a gene,
    but sometimes they control two or more different genes, so they aren't
    part of any gene. I think most biologists would not consider a regulatory
    element to be a gene in itself, so these considerations don't really have
    any bearing on the gene number."

    It may seem peculiar not to count regulatory sequences as at least parts
    of genes, since they are as essential as the coding sequence for
    producing a protein. But there are reasons for excluding them besides
    mere topographical convenience. The chief one is that a gene does not
    have to be expressed to exist. Somatic cells all have the same genes, but
    particular cell types express only some of them, as Christopher D. Epp,
    of Massey University in New Zealand, pointed out in a 1997 letter to
    Nature.

    Apparently, they spend a lot of time down under considering what a gene
    is. Paul Griffiths is head of history and philosophy of science at the
    University of Sydney in Australia, and coauthor of a paper on gene
    definitions that appeared in August 1999 in BioScience. The Genesweep
    definition, he says, pretty much hews to what he calls the classical
    molecular gene concept, which arose in the 1960s. "A recent survey I did
    here suggests that it is still the dominant definition amongst molecular
    biologists," he reports. On the other hand, he points out, the Genesweep
    definition also specifies "that the translation machinery does translate
    the sequence at some time." This proviso is not part of the classical
    molecular gene. "Interesting," he says. "This will exclude lots of
    sequences often counted as genes."

    Nor does that conclude our definitional ambiguities. "One issue is
    whether multiple copies of a gene that are identical in sequence should
    all be counted as distinct genes. By the definition they would be, but
    some investigators would argue that they shouldn't," Green notes.

    There are other issues too, but enough already.

    Confused? As you can see, you've got lots of classy company. And not all
    of it is human. New Scientist recently described a report, which appeared
    in the April issue of Genome Research, on a test of gene-detection
    software from 12 labs. The weekly said the researchers set the programs
    loose on a stretch of well-characterized Drosophila DNA nearly 3 million
    base pairs long. The programs were able to detect up to 97 percent of the
    protein-coding sequences. But they blew it when grouping the sequences
    into genes and distinguishing genes from junk: they failed to find up to
    16 percent of the genes, and up to 52 percent of the sequences they
    identified as genes probably are not genes at all. So cheer up. Computers
    aren't sure what a gene is either.
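
    The two gene-level figures have different denominators, which is easy
    to miss: the share of genes not found is measured against the known
    genes, while the share of calls that are probably not genes is measured
    against the program's own predictions. A minimal sketch, assuming a
    prediction counts as correct only when its identifier matches a known
    gene exactly (real evaluations compare gene structures, not labels);
    the function name and the data are hypothetical:

```python
def gene_level_error_rates(annotated, predicted):
    """Return (missed_rate, wrong_rate) for a set of gene predictions.

    missed_rate: fraction of annotated genes that were not predicted.
    wrong_rate:  fraction of predicted genes that are not annotated.
    Matching by identifier is a simplifying assumption for illustration.
    """
    annotated, predicted = set(annotated), set(predicted)
    if not annotated or not predicted:
        raise ValueError("both gene sets must be non-empty")
    missed_rate = len(annotated - predicted) / len(annotated)
    wrong_rate = len(predicted - annotated) / len(predicted)
    return missed_rate, wrong_rate


# Hypothetical example: 5 known genes, 6 predictions, 4 of them correct.
known = {"g1", "g2", "g3", "g4", "g5"}
calls = {"g1", "g2", "g3", "g4", "x1", "x2"}
missed, wrong = gene_level_error_rates(known, calls)
print(f"missed: {missed:.0%}, wrong: {wrong:.0%}")
```

    Because the denominators differ, a program can miss few real genes and
    still report many spurious ones.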

    ===============================================================
    This was distributed via the memetics list associated with the
    Journal of Memetics - Evolutionary Models of Information Transmission
    For information about the journal and the list (e.g. unsubscribing)
    see: http://www.cpm.mmu.ac.uk/jom-emit


