Date: Wed, 01 May 2002 03:42:00 -0700
From: "Douglas P. Wilson" <email@example.com>
Subject: maybe Walpiri is a language, but English isn't
To: firstname.lastname@example.org
Cc: Robert Neville <email@example.com>
Several people have made similar points about just how much "our" language
has grown within living memory. "Our" apparently means "English", the
language used on this list, though some of you read and write it as a second
(or third, etc.) language.
You may have noticed my recent comments about getting trapped in a semantic
dilemma where I find myself wanting to say that English, which some people
obviously take as the very paradigm of language, is not a language at all,
while some computer coding systems like Pascal, Prolog, or Python which
didn't exist at all until recently seem quite obviously to be languages.
This dilemma is quite real, and probably worthy of more discussion, but I am
all semanticked-out at the moment and will keep my nutty notions to
myself -- please let me instead mention some relatively well understood
linguistic ideas, intellectually respectable ones.
Many linguists who would immediately insist that Walpiri, Ainu, or Inuit are
languages, and could only laugh at my preferences for strange artificial
concoctions (yes, Prof. Newton, I mean you!), nevertheless have some
suspicions about English. Whatever English is, it sure doesn't have much
in common with the "real" languages spoken (mostly) by illiterate people (or
people illiterate until recently). For one thing, English just has too
many damn words.
My knowledge of the linguistics literature is rapidly becoming out-of-date
and even more rapidly being forgotten, but I think many of the linguists who
study obscure natural languages like Walpiri that lack any written culture
have described such "real" languages as containing about 6,000 words -- of
course this applies only to languages in which the notion of "word" is
As a long-time science fiction reader I have no trouble imagining an
alternative universe in which English has either swallowed up or replaced
all other literary languages, leaving a world with a couple of thousand
small 6,000 word natural languages plus the one great monster, English, with
hundreds of thousands of words. In such a universe many more people might
have wondered if it makes any sense to say that the label "language" is as
correct when applied to English as it is when applied to all the little
non-literary languages. And even in our own universe, this question makes
sense and has been debated by some linguists (though I can't remember whom,
at the moment).
Those of you who responded to my earlier messages by pointing out how
large the vocabulary of English has grown might be surprised to learn just
how few words you use yourselves. I routinely run various text files,
HTML pages, and e-mail messages of mine through corpus-linguistics software,
generating wordlists and frequency counts. It is usually embarrassing and
still surprising to learn that my seemingly literate text, which I so boldly
put before the world whenever I get a chance, is written with a vocabulary
of about 3,000 words.
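A wordlist and frequency count of the kind mentioned above can be sketched in a few lines of Python. This is only an illustration, not the corpus-linguistics software referred to: the function name word_frequencies and the rule for what counts as a "word" (a run of letters, optionally with internal apostrophes) are assumptions made for the example.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return a Counter mapping each lowercased word to how often it occurs."""
    # Assumption: a "word" is a run of ASCII letters, optionally joined by
    # apostrophes (so "cat's" counts as one word). Real corpus tools use
    # more careful tokenisation.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)

sample = "The cat saw the dog, and the dog saw the cat's dinner."
freq = word_frequencies(sample)

# The number of distinct words is the size of the active vocabulary
# actually used in this text.
print(len(freq), "distinct words")
for word, count in freq.most_common(3):
    print(word, count)
```

Running the same function over a large collection of one's own messages gives the vocabulary size for that collection; the figure depends heavily on the tokenisation rule and on whether inflected forms are counted separately, as they are here.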
Some linguists downplay the significance of such small numbers by pointing
to an always larger number that is supposed to measure "recognition
vocabulary". Since I'm not set up to do controlled laboratory experiments
to count the words I can recognise, I've come up with an approximate number
by looking at various dictionaries, of various sizes.
I used to keep a Merriam-Webster paperback dictionary beside my chair, but a
few years ago I noticed that if I needed to look up a word, then that word
would almost never be found in that 60 thousand word dictionary. I'm very
fond of the Collins Concise Dictionary Plus which has about 115 thousand
words, and it remains useful to me because if I needed to look up a word,
then about half the time the word would be there. From these two data
points I estimate that my "recognition vocabulary" is about 60,000 words.
But this is pretty dubious stuff. From morphological clues we can all
"recognise" words we have never seen before and be quite sure of their
meaning. And contextual clues help even more.
I have a little computer program that randomly replaces a few real words in
a text file with morphologically-opaque non-words, word-like sequences of
letters that are not in English or any other (common) language. These
non-words are often easily understood from context alone, and "feel" like
part of one's recognition vocabulary, although they certainly are not.
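The replacement idea can be sketched as follows. This is a rough reconstruction under stated assumptions, not the program described above: the names make_nonword and replace_some_words, the consonant-vowel recipe for inventing non-words, and the parameters fraction and min_length are all hypothetical. The sketch also does not check the invented strings against any dictionary, so a real word may occasionally turn up by accident.

```python
import random
import re

CONSONANTS = "bcdfghjklmnprstvz"
VOWELS = "aeiou"

def make_nonword(rng, syllables=2):
    """Build a pronounceable consonant-vowel string with no English morphology."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                   for _ in range(syllables))

def replace_some_words(text, fraction=0.1, min_length=5, seed=None):
    """Replace a random fraction of the longer word types with non-words.

    Returns the altered text and a dict mapping each replaced word
    (lowercased) to its non-word, so that readers' guesses can be checked.
    """
    rng = random.Random(seed)
    decisions = {}  # word -> non-word, or None if the word is kept

    def substitute(match):
        word = match.group(0)
        key = word.lower()
        if key not in decisions:
            # Decide once per word type so every occurrence is treated alike.
            chosen = len(word) >= min_length and rng.random() < fraction
            decisions[key] = make_nonword(rng) if chosen else None
        return decisions[key] or word

    altered = re.sub(r"[A-Za-z]+", substitute, text)
    replaced = {k: v for k, v in decisions.items() if v is not None}
    return altered, replaced

altered, replaced = replace_some_words(
    "The fisherman repaired his net before the fisherman sailed at dawn.",
    fraction=0.5, seed=42)
print(altered)
print(replaced)
```

Replacing every occurrence of a word with the same non-word matters here: repeated contexts are what let a reader work out the meaning.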
If we can imagine scrapping the dubious notion of "recognition vocabulary",
I think we can shed some light on the status of English. Great literary
geniuses aside, we all seem to communicate using only a very small number of
words, a few thousand, a number suspiciously like the number of words used
by speakers of all those non-literary languages.
Those of us whose first "language" is English like to think of ourselves as
what Churchill called "The English Speaking Peoples", but I'm increasingly
willing to admit that I speak and write only a subset of English, and not a
very big subset, at that.
That I can read the works of Churchill, a man with a very large active
vocabulary, much larger than my own, proves little -- to read or understand
speech we use morphological and contextual clues, as explained above.
I've modelled that capability with computer programs, which were quite good
(about a 30% success rate) at guessing the meaning of words not in their
So, perhaps we don't speak and write English (Warning, warning, semantic
paradox alert!) because English is not a language. We speak and write our
own small subsets of English, each subset being essentially a language in
itself (the kind of thing our brains have evolved to use). English itself
might not be a language at all, just an amorphous collection of
Our subset-languages do overlap to some extent. But I can testify from long
and sometimes bitter experience that even old school-friends who met in
childhood, often have long discussions, and read many of the same books
speak subset-languages that don't overlap completely and so these people
sometimes misunderstand each other quite badly, when conversations drift
into topics and terminology outside the overlapping areas. When that
happens it often feels like we are speaking different languages, and I now
suspect that we are.
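One crude way to put a number on how far two people's subset-languages overlap is to compare the sets of words each has actually used. The sketch below is an assumption-laden illustration, not a method taken from the linguistics literature cited here: the names vocabulary and overlap are hypothetical, and the measure used (shared words divided by all words used by either person, the Jaccard index) ignores word meanings entirely.

```python
import re

def vocabulary(text):
    """Return the set of distinct lowercased words used in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(text_a, text_b):
    """Return the shared words and the Jaccard index of two vocabularies."""
    vocab_a, vocab_b = vocabulary(text_a), vocabulary(text_b)
    shared = vocab_a & vocab_b
    combined = vocab_a | vocab_b
    # Jaccard index: 1.0 means identical vocabularies, 0.0 means disjoint.
    score = len(shared) / len(combined) if combined else 0.0
    return shared, score

shared, score = overlap("the cat sat", "the dog sat")
print(sorted(shared), score)
```

Two people can share a word and still use it differently, so a high score from this measure would not rule out the kind of misunderstanding described above.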
Well, anyway, the status of English as a language has been questioned, and I
am most certainly not the first person to do so. It is much more novel to
argue that none of the (other?) natural languages are languages either (but
just memetic content expressed in some underlying ideal language -- see my
earlier messages), but I'm not the first person to do that, either. The
idea is clearly visible in von Humboldt's writings, and in the papers of
some of the generative-semantics people who took Chomsky's "deep structure"
theories and ran with them. If there is any novelty in my version of this
idea it is perhaps that I am the first person foolish enough to try to
explain it to people by telling them that "languages are not languages".
This was distributed via the memetics list associated with the
Journal of Memetics - Evolutionary Models of Information Transmission