From: Lawrence DeBivort (debivort@umd5.umd.edu)
Date: Sun 02 Mar 2003 - 02:08:03 GMT
WORD 'BURSTS' MAY REVEAL ONLINE TRENDS
By Will Knight
New Scientist
February 18, 2003
http://www.newscientist.com/news/news.jsp?id=ns99993405
Searching for sudden "bursts" in the usage of particular words could be used
to rapidly identify new trends and sort information more efficiently, says a
US computer scientist.
Jon Kleinberg, at Cornell University in New York, has developed computer
algorithms that identify bursts of word use in documents.
While other popular search techniques simply count how often words or
phrases occur in documents, Kleinberg's approach also takes into account the
rate at which a word's usage increases.
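The difference between counting a word and measuring how fast its use is growing can be shown with a short Python sketch. This is not Kleinberg's algorithm, which the article does not describe in detail; it is a simplified stand-in. The function names, the add-one smoothing and the toy documents are all assumptions made for illustration.

```python
from collections import Counter

def word_counts_per_period(periods):
    """Return one Counter of lowercase word counts per time period.

    `periods` is a list of periods, each a list of document strings
    (a hypothetical input layout chosen for this sketch).
    """
    return [
        Counter(word for doc in docs for word in doc.lower().split())
        for docs in periods
    ]

def growth(counts, word, smoothing=1.0):
    """Ratio of each period's smoothed count to the previous period's.

    The smoothing term is an assumption: it avoids division by zero
    when a word is absent from the earlier period.
    """
    series = [c[word] for c in counts]
    return [(b + smoothing) / (a + smoothing) for a, b in zip(series, series[1:])]

# Toy data, invented for illustration only.
periods = [
    ["the cat sat", "the dog ran"],
    ["the cat sat", "the scooter craze"],
    ["the scooter scooter craze", "the scooter again scooter"],
]
counts = word_counts_per_period(periods)

print(growth(counts, "the"))      # steady word: ratios stay at 1.0
print(growth(counts, "scooter"))  # emerging word: ratios above 1.0
```

A plain count would rank "the" level with or above "scooter" in the first two periods, while the growth ratio singles out "scooter" as the word whose use is accelerating.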
Kleinberg suggests that the method could be applied to weblogs to track new
social trends. For example, identifying word bursts in the hundreds of
thousands of personal diaries now on the web could help advertisers quickly
spot an emerging craze.
Hot or not
The algorithms used to identify these sudden bursts are relatively simple,
but very powerful, says Christos Papadimitriou, at the University of
California at Berkeley.
"The key is to find unexpected changes in the frequency of the appearance of
words," he told New Scientist. Papadimitriou agrees the method could prove
valuable when searching for new trends in weblogs.
The approach could also be applied to sifting through other types of
information. Identifying word bursts within email messages sent to a
company's customer support address might help maintenance staff spot a major
new problem.
Researchers at Google, the world's most widely used internet search engine,
have already shown that identifying spikes in search terms can be used to
track the spread of news and rumours around the world. The algorithms that
run Google's automated news aggregation service remain secret, but it is not
difficult to imagine that word bursts could, or do, have a useful role.
In a simple historical test of the technique, Kleinberg analysed all the
annual State of the Union addresses given by US Presidents since 1790. He
found that particular word "bursts" could indeed be linked to important
events at the time the speeches were delivered.
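One rough way to locate bursts in a yearly series like this is to flag the years in which a word's frequency rises well above its long-run average, then merge consecutive flagged years into spans. The Python sketch below does that. It is a simple threshold heuristic rather than the method Kleinberg used, and the function name, the `factor` parameter and the sample figures are hypothetical.

```python
def burst_intervals(freq_by_year, factor=2.0):
    """Return (start_year, end_year) spans where a word's frequency
    exceeds `factor` times its mean over all years.

    Assumption: a fixed multiple of the mean serves as the baseline;
    this is an illustrative heuristic, not a published algorithm.
    """
    baseline = sum(freq_by_year.values()) / len(freq_by_year)
    spans, start, prev = [], None, None
    for year in sorted(freq_by_year):
        hot = freq_by_year[year] > factor * baseline
        if hot and start is None:
            start = year                 # a burst begins
        elif not hot and start is not None:
            spans.append((start, prev))  # the burst ended the year before
            start = None
        prev = year
    if start is not None:                # burst runs to the end of the data
        spans.append((start, prev))
    return spans

# Invented counts for a made-up word, not taken from any real speeches.
toy = {2000: 1, 2001: 1, 2002: 0, 2003: 1, 2004: 6,
       2005: 7, 2006: 5, 2007: 1, 2008: 0, 2009: 1}

print(burst_intervals(toy))
```

With these toy figures the mean is 2.3, so the threshold is 4.6 and only 2004 to 2006 qualify as a burst. Raising `factor` makes the detector stricter; lowering it reports more, and longer, spans.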
In the years immediately following the American Revolution, for example,
words such as "militia", "British" and "savages" show sudden bursts in
use.
From 1930 to 1937 the use of the word "depression" spikes. And
from 1949 to 1959 "atomic" is the word with the greatest "burstiness". Later
in the 20th century, words such as "Vietnam", "Soviet", "communist" and
"Afghanistan" increase sharply in usage.
Kleinberg presents his findings on Tuesday at the American Association for
the Advancement of Science's annual meeting in Denver, Colorado.
===============================================================
This was distributed via the memetics list associated with the
Journal of Memetics - Evolutionary Models of Information Transmission
For information about the journal and the list (e.g. unsubscribing)
see: http://www.cpm.mmu.ac.uk/jom-emit