Re: maybe Walpiri is a language, but English isn't

From: Grant Callaghan (
Date: Wed May 01 2002 - 16:13:38 BST


    >Several people have made similar points about just how much "our" language
    >has grown within living memory. "Our" apparently means "English", the
    >language used on this list, though some of you read and write it as a
    >second (or third, etc.) language.
    >You may have noticed my recent comments about getting trapped in a semantic
    >dilemma where I find myself wanting to say that English, which some people
    >obviously take as the very paradigm of language, is not a language at all,
    >while some computer coding systems like Pascal, Prolog, or Python which
    >didn't exist at all until recently seem quite obviously to be languages.
    >This dilemma is quite real, and probably worthy of more discussion, but I'm
    >all semanticked-out at the moment and will keep my nutty notions to
    >myself -- please let me instead mention some relatively well understood
    >linguistic ideas, intellectually respectable ones.
    >Many linguists who would immediately insist that Walpiri, Ainu, or Inuit are
    >languages, and could only laugh at my preferences for strange artificial
    >concoctions (yes, Prof. Newton, I mean you!), nevertheless have some
    >suspicions about English. Whatever English is, it sure doesn't have much
    >in common with the "real" languages spoken (mostly) by illiterate people (or
    >people illiterate until recently). For one thing, English just has too
    >many damn words.
    >My knowledge of the linguistics literature is rapidly becoming out-of-date
    >and even more rapidly being forgotten, but I think many of the linguists who
    >study obscure natural languages like Walpiri that lack any written culture
    >have described such "real" languages as containing about 6,000 words -- of
    >course this applies only to languages in which the notion of "word" is
    >As a long-time science fiction reader I have no trouble imagining an
    >alternative universe in which English has either swallowed up or replaced
    >all other literary languages, leaving a world with a couple of thousand
    >small 6,000 word natural languages plus the one great monster, English, with
    >hundreds of thousands of words. In such a universe many more people would
    >have wondered if it makes any sense to say that the label "language" is as
    >correct when applied to English as it is when applied to all the little
    >non-literary languages. And even in our own universe, this question makes
    >sense and has been debated by some linguists (though I can't remember whom,
    >at the moment).
    >Those of you who responded to my earlier messages by pointing out how
    >large the vocabulary of English has grown might be surprised to learn just
    >how few words they use themselves. I routinely run various text files,
    >HTML pages, and e-mail messages of mine through corpus-linguistics software,
    >generating wordlists and frequency counts. It is usually embarrassing and
    >still surprising to learn that my seemingly literate text, which I so
    >put before the world whenever I get a chance, is written with a vocabulary
    >of about 3,000 words.
    >Some linguists downplay the significance of such small numbers by pointing
    >to an always larger number that is supposed to measure "recognition
    >vocabulary". Since I'm not set up to do controlled laboratory experiments
    >to count the words I can recognise, I've come up with an approximate number
    >by looking at various dictionaries, of various sizes.
    >I used to keep a Merriam-Webster paperback dictionary beside my chair, but a
    >few years ago I noticed that if I needed to look up a word, then that word
    >would almost never be found in that 60 thousand word dictionary. I'm very
    >fond of the Collins Concise Dictionary Plus which has about 115 thousand
    >words, and it remains useful to me because if I needed to look up a word,
    >then about half the time the word would be there. From these two data
    >points I estimate that my "recognition vocabulary" is about 60,000 words.
    >But this is pretty dubious stuff. From morphological clues we can all
    >"recognise" words we have never seen before and be quite sure of their
    >meaning. And contextual clues help even more.
    >I have a little computer program that randomly replaces a few real words in
    >a text file with morphologically-opaque non-words, word-like sequences of
    >letters that are not in English or any other (common) language. These
    >non-words are often easily understood from context alone, and "feel" like
    >part of one's recognition vocabulary, although they certainly are not.
    >If we can imagine scrapping the dubious notion of "recognition vocabulary",
    >I think we can shed some light on the status of English. Great literary
    >geniuses aside, we all seem to communicate using only a very small number of
    >words, a few thousand, a number suspiciously like the number of words used
    >by speakers of all those non-literary languages.
    >Those of us whose first "language" is English like to think of ourselves as
    >what Churchill called "The English Speaking Peoples", but I'm increasingly
    >willing to admit that I speak and write only a subset of English, and not a
    >very big subset, at that.
    >That I can read the works of Churchill, a man with a very large active
    >vocabulary, much larger than my own, proves little -- to read or understand
    >spoken speech we use morphological and contextual clues, as explained above.
    >I've modelled that capability with computer programs, which were quite good
    >(about a 30% success rate) at guessing the meaning of words not in their
    >internal dictionary.
    >So, perhaps we don't speak and write English (Warning, warning, semantic
    >paradox alert!) because English is not a language. We speak and write our
    >own small subsets of English, each subset being essentially a language in
    >itself (the kind of thing our brains have evolved to use). English itself
    >might not be a language at all, just an amorphous collection of
    >Our subset-languages do overlap to some extent. But I can testify from
    >and sometimes bitter experience that even old school-friends who met in
    >childhood, often have long discussions, and read many of the same books
    >speak subset-languages that don't overlap completely and so these people
    >sometimes misunderstand each other quite badly, when conversations drift
    >into topics and terminology outside the overlapping areas. When that
    >happens it often feels like we are speaking different languages, and I now
    >suspect that we are.
    >Well, anyway, the status of English as a language has been questioned, and I
    >am most certainly not the first person to do so. It is much more novel to
    >argue that none of the (other?) natural languages are languages either
    >(just memetic content expressed in some underlying ideal language -- see my
    >earlier messages) -- but I'm not the first person to do that, either. The
    >idea is clearly visible in von Humboldt's writings, and in the papers of
    >some of the generative-semantics people who took Chomsky's "deep structure"
    >theories, and ran with them. If there is any novelty in my version of the
    >idea it is perhaps that I am the first person foolish enough to try to
    >explain it to people by telling them that "languages are not languages".
    > dpw
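
The wordlist and frequency counts described in the quoted message can be
approximated with a few lines of Python. This is only a minimal sketch, not
the corpus-linguistics tooling dpw actually used: the function name
word_frequencies, the sample text, and the rule that a "word" is a run of
letters with optional internal apostrophes are all assumptions made for
illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # Assumption: a "word" is a run of letters, optionally joined by apostrophes
    # (so "didn't" counts as one word). Real corpus tools use richer rules.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# Hypothetical sample text; in practice this would be read from a file.
sample = "The cat sat on the mat. The mat didn't mind the cat."
freq = word_frequencies(sample)

# Vocabulary size is the number of distinct words; tokens are all occurrences.
print(len(freq), "distinct words out of", sum(freq.values()), "tokens")

# The most frequent words, as a simple frequency list.
for word, count in freq.most_common(3):
    print(count, word)
```

The number of distinct words, len(freq), corresponds to the "vocabulary"
figure discussed above. How words are split and whether inflected forms are
merged will change that number considerably.
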
    That's an interesting outlook on the subject of "Is English really a
    language?" But since that is the term we most often use to describe the
    tools we use to communicate with each other and you didn't suggest any other
    name for them, I don't see why you balk at using it.

    As to the programming "languages" you mentioned, just take the statements
    from those programs and say them aloud and I'm pretty sure you will come to
    the conclusion they are a subset of English, just as mathematical notation
    is in the hands of an English speaker. A French or German speaker will
    pronounce them in their own language. But 1 + 1 = 2, when spoken aloud,
    comes out "One plus one equals two," a valid English sentence. Such
    structures as "If A = 1 then dowhile A NOT > 5" is also a statement that
    would be pretty obvious to any English speaking person when said aloud. A
    Japanese programmer would pronounce "if A then B" more like English than
    Japanese. It might come out, "Ifu Ah zhen Bi," which is what the Japanese
    have done to their own language since the American occupation after World
    War II. They have adopted words like "necktie" and "taxi" by making them
    sound like "nekutai" and "takushi."

    As for vocabulary, the huge number of words available to us are also being
    adopted by people of other cultures in ways that fit into the grammar and
    pronunciation schemes of the language they happen to speak. DNA has the
    same spelling in most languages of the world, but is pronounced differently
    in each, as the common pronunciation of those letters sounds in their own
    language. It's a lot like the traffic signs we see on the highway. The red
    octagon that tells me to "stop" in English tells the people in Mexico
    "Alto." So DNA in Mexico comes out "de ene a" as the rules of pronunciation
    in their language require.

    Everyone uses a small subset of the total set we call English. But like I
    said many times before, the words you choose depend on the work you do and I
    doubt anyone is ever going to use all of the words available to him/her.
    There just isn't enough time in one person's life. When I was in high
    school, my biology teacher told me a cell had three parts. In today's
    world, the cell has nearly as many parts as the human body. If you're going
    to describe a cell, you will have to use the names of the parts in order to
    be understood. Many of these names didn't exist 50 years ago, or were taken
    from other disciplines to use in this context. The same applies to the
    fields of physics, electronics, computers, cosmology, etc., etc. The more
    new things we learn, the more new words are required to talk about them.
    Most of science in the world today is adapted from English or Russian
    because America and Russia put out the most used text books on the subjects
    and the young people in China, Japan, and North and South America had to
    learn enough English or Russian to read them. In Vietnam, they mostly
    learned them from French textbooks.

    But this escalation of vocabulary is not just a phenomenon of English. It
    applies equally to every language that has a large enough population
    speaking it to make the adoption of new ideas necessary. A hundred years or
    so ago, books written in Chinese didn't have punctuation marks. But Chinese
    scholars who saw the value of these tools for writing quickly adapted them
    to their own language. Modern Chinese use the same marks for the same
    reasons that European writers do. They convey necessary information that
    makes reading easier.

    You might be able to argue that in today's world, all languages are subsets
    of one big world language. Call it the human language. But as the culture
    of science and technology spreads to more and more cultures, the people of
    those cultures adapt what they learn to the structures and forms of their
    own native tongues.

    Close to a thousand distinct languages were once spoken on the American
    continents. Now there are fewer than a hundred and soon there will be just
    three: English, Spanish and Portuguese. At some point in time, the world
    will have reduced it all to a human language that has large elements of
    English, Russian, Chinese, French and Spanish. Over time, due to the
    influence of radio, TV, movies, the internet and the spread of textbooks in
    these languages being used in schools everywhere, it will be hard to
    separate the world into distinct language groups. The vocabulary of science
    and technology belongs to everyone.

    I heard from a scientist at UCSD that when he goes to a conference in
    Europe, he has no trouble talking shop to just about everyone there. They
    all understand the same vocabulary items that apply to their field of
    interest, no matter how they are pronounced. The grammar is the grammar of
    mathematical notation. Science is becoming the universal human language.
    But the elements you choose depend on what you are using the language for.

    Love making and literature usually take you back into your native tongue.
    Business has adopted a more universal vocabulary that can be understood
    everywhere in the world today. The name "pidgin English" came from the way
    Chinese in Hong Kong pronounced the English word "business." Words pass from
    culture to culture at the speed of TV and the internet, which means at the
    speed of light. The vocabulary of science, technology and business belongs
    to no particular culture anymore. Everyone is borrowing from everyone else
    and the result can't be placed at the feet of any one linguistic group.


    This was distributed via the memetics list associated with the
    Journal of Memetics - Evolutionary Models of Information Transmission
    For information about the journal and the list (e.g. unsubscribing)

    This archive was generated by hypermail 2b29 : Wed May 01 2002 - 16:31:16 BST