maybe Walpiri is a language, but English isn't


    Several people have made similar points about just how much "our" language
    has grown within living memory. "Our" apparently means "English", the
    language used on this list, though some of you read and write it as a second
    (or third, etc.) language.

    You may have noticed my recent comments about getting trapped in a semantic
    dilemma where I find myself wanting to say that English, which some people
    obviously take as the very paradigm of language, is not a language at all,
    while some computer coding systems like Pascal, Prolog, or Python, which
    didn't exist at all until recently, seem quite obviously to be languages.

    This dilemma is quite real, and probably worthy of more discussion, but I am
    all semanticked-out at the moment and will keep my nutty notions to
    myself -- please let me instead mention some relatively well understood
    linguistic ideas, intellectually respectable ones.

    Many linguists who would immediately insist that Walpiri, Ainu, or Inuit are
    languages, and could only laugh at my preferences for strange artificial
    concoctions (yes, Prof. Newton, I mean you!), nevertheless have some
    suspicions about English. Whatever English is, it sure doesn't have much
    in common with the "real" languages spoken (mostly) by illiterate people (or
    people illiterate until recently). For one thing, English just has too
    many damn words.

    My knowledge of the linguistics literature is rapidly becoming out-of-date
    and even more rapidly being forgotten, but I think many of the linguists who
    study obscure natural languages like Walpiri that lack any written culture
    have described such "real" languages as containing about 6,000 words -- of
    course this applies only to languages in which the notion of "word" is

    As a long-time science fiction reader I have no trouble imagining an
    alternative universe in which English has either swallowed up or replaced
    all other literary languages, leaving a world with a couple of thousand
    small 6,000-word natural languages plus the one great monster, English, with
    hundreds of thousands of words. In such a universe many more people might
    have wondered if it makes any sense to say that the label "language" is as
    correct when applied to English as it is when applied to all the little
    non-literary languages. And even in our own universe, this question makes
    sense and has been debated by some linguists (though I can't remember whom,
    at the moment).

    Those of you who responded to my earlier messages by pointing out how
    large the vocabulary of English has grown might be surprised to learn just
    how few words you use yourselves. I routinely run various text files,
    HTML pages, and e-mail messages of mine through corpus-linguistics software,
    generating wordlists and frequency counts. It is usually embarrassing and
    still surprising to learn that my seemingly literate text, which I so boldly
    put before the world whenever I get a chance, is written with a vocabulary
    of about 3,000 words.
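
    The kind of wordlist and frequency count described above can be sketched
    in a few lines of Python. This is only an illustration, not the software
    actually used: the function name word_frequencies and the sample sentence
    are made up for the example, and the tokenizer is deliberately crude. It
    counts distinct word forms, so "cat" and "cat's" are two entries, which
    will inflate the total compared with a count of dictionary headwords.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word form to its count."""
    # Crude tokenizer (an assumption of this sketch): runs of ASCII letters,
    # optionally joined by internal apostrophes, e.g. "don't" or "cat's".
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens)


sample = "The cat sat on the mat. The cat's mat was flat."
freqs = word_frequencies(sample)

# The size of the active vocabulary is the number of distinct word forms.
vocabulary_size = len(freqs)

# The frequency list, most common word forms first.
for word, count in freqs.most_common(3):
    print(word, count)
print("distinct word forms:", vocabulary_size)
```

    Running the same function over a larger file (read with open(...).read())
    gives the figure discussed here: the number of distinct forms in the text.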

    Some linguists downplay the significance of such small numbers by pointing
    to an always larger number that is supposed to measure "recognition
    vocabulary". Since I'm not set up to do controlled laboratory experiments
    to count the words I can recognise, I've come up with an approximate number
    by looking at various dictionaries, of various sizes.

    I used to keep a Merriam-Webster paperback dictionary beside my chair, but a
    few years ago I noticed that if I needed to look up a word, then that word
    would almost never be found in that 60 thousand word dictionary. I'm very
    fond of the Collins Concise Dictionary Plus which has about 115 thousand
    words, and it remains useful to me because if I needed to look up a word,
    then about half the time the word would be there. From these two data
    points I estimate that my "recognition vocabulary" is about 60,000 words.

    But this is pretty dubious stuff. From morphological clues we can all
    "recognise" words we have never seen before and be quite sure of their
    meaning. And contextual clues help even more.

    I have a little computer program that randomly replaces a few real words in
    a text file with morphologically-opaque non-words, word-like sequences of
    letters that are not in English or any other (common) language. These
    non-words are often easily understood from context alone, and "feel" like
    part of one's recognition vocabulary, although they certainly are not.
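
    A rough sketch of such a word-replacement program follows, in Python. It
    is a reconstruction under stated assumptions and not the original program:
    the names make_nonword and replace_random_words, the consonant-vowel
    syllable scheme, the rate parameter, and the rule of leaving words shorter
    than four letters alone are all invented for the illustration. The
    non-words are built from plain syllables with no English prefixes or
    suffixes, so they carry no morphological clues. A caller who wants to be
    sure that no real word is produced by accident has to supply a wordlist
    as known_words.

```python
import random
import re

CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def make_nonword(rng, syllables=3):
    """Build a word-like string from consonant-vowel syllables."""
    return "".join(
        rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables)
    )


def replace_random_words(text, rate=0.1, known_words=frozenset(), seed=None):
    """Replace roughly `rate` of the longer words in `text` with non-words.

    Returns the altered text and a list of (original, nonword) pairs.
    """
    rng = random.Random(seed)
    replaced = []

    def substitute(match):
        word = match.group(0)
        # Leave short words alone, and most other words too.
        if len(word) < 4 or rng.random() >= rate:
            return word
        nonword = make_nonword(rng)
        # Reject candidates that happen to be in the supplied wordlist.
        while nonword in known_words:
            nonword = make_nonword(rng)
        replaced.append((word, nonword))
        return nonword

    return re.sub(r"[A-Za-z]+", substitute, text), replaced


altered, pairs = replace_random_words(
    "The quick brown foxes jump over lazy dogs", rate=0.3, seed=1
)
print(altered)
print(pairs)
```

    Reading the altered text and then checking the list of pairs is a quick
    way to see how often the missing word can be guessed from context alone.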

    If we can imagine scrapping the dubious notion of "recognition vocabulary",
    I think we can shed some light on the status of English. Great literary
    geniuses aside, we all seem to communicate using only a very small number of
    words, a few thousand, a number suspiciously like the number of words used
    by speakers of all those non-literary languages.

    Those of us whose first "language" is English like to think of ourselves as
    what Churchill called "The English Speaking Peoples", but I'm increasingly
    willing to admit that I speak and write only a subset of English, and not a
    very big subset, at that.

    That I can read the works of Churchill, a man with a very large active
    vocabulary, much larger than my own, proves little -- to read text or
    understand speech we use morphological and contextual clues, as explained above.
    I've modelled that capability with computer programs, which were quite good
    (about a 30% success rate) at guessing the meaning of words not in their
    internal dictionary.

    So, perhaps we don't speak and write English (Warning, warning, semantic
    paradox alert!) because English is not a language. We speak and write our
    own small subsets of English, each subset being essentially a language in
    itself (the kind of thing our brains have evolved to use). English itself
    might not be a language at all, just an amorphous collection of

    Our subset-languages do overlap to some extent. But I can testify from long
    and sometimes bitter experience that even old school-friends who met in
    childhood, often have long discussions, and read many of the same books
    speak subset-languages that don't overlap completely, and so these people
    sometimes misunderstand each other quite badly when conversations drift
    into topics and terminology outside the overlapping areas. When that
    happens it often feels like we are speaking different languages, and I now
    suspect that we are.

    Well, anyway, the status of English as a language has been questioned, and I
    am most certainly not the first person to do so. It is much more novel to
    argue that none of the (other?) natural languages are languages either (but
    just memetic content expressed in some underlying ideal language -- see my
    earlier messages), but I'm not the first person to do that, either. The
    idea is clearly visible in von Humboldt's writings, and in the papers of
    some of the generative-semantics people who took Chomsky's "deep structure"
    theories and ran with them. If there is any novelty in my version of this
    idea it is perhaps that I am the first person foolish enough to try to
    explain it to people by telling them that "languages are not languages".


