Fwd: At MIT, they can put words in our mouths

From: Wade Smith (wade_smith@harvard.edu)
Date: Wed May 15 2002 - 15:41:05 BST


    To: memetics@mmu.ac.uk

    At MIT, they can put words in our mouths

    By Gareth Cook, Globe Staff, 5/15/2002

    http://www.boston.com/dailyglobe2/135/metro/At_MIT_they_can_put_words_in_our_mouthsP.shtml

    CAMBRIDGE - Scientists at the Massachusetts Institute of
    Technology have created the first realistic videos of people
    saying things they never said - a scientific leap that raises
    unsettling questions about falsifying the moving image.

    In one demonstration, the researchers taped a woman speaking
    into a camera, and then reprocessed the footage into a new video
    that showed her speaking entirely new sentences, and even
    mouthing words to a song in Japanese, a language she does not
    speak. The results were enough to fool viewers consistently, the
    researchers report.

    The technique's inventors say it could be used in video games
    and movie special effects, perhaps reanimating Marilyn Monroe or
    other dead film stars with new lines. It could also improve
    dubbed movies, a lucrative global industry.

    But scientists warn the technology will also provide a powerful
    new tool for fraud and propaganda - and will eventually cast
    doubt on everything from video surveillance to presidential
    addresses.

    "This is really groundbreaking work," said Demetri
    Terzopoulos, a leading specialist in facial animation who is a
    professor of computer science and mathematics at New York
    University. But "we are on a collision course with ethics. If
    you can make people say things they didn't say, then potentially
    all hell breaks loose."

    The researchers have already begun testing the technology on
    video of Ted Koppel, anchor of ABC's "Nightline," with the aim
    of dubbing a show in Spanish, according to Tony F. Ezzat, the
    graduate student who heads the MIT team. Yet as this and similar
    technology makes its way out of academic laboratories, even the
    scientists involved see ways it could be misused: to discredit
    political dissidents on television, to embarrass people with
    fabricated video posted on the Web, or to illegally use trusted
    figures to endorse products.

    "There is a certain point at which you raise the level of
    distrust to where it is hard to communicate through the
    medium," said Kathleen Hall Jamieson, dean of the Annenberg
    School for Communication at the University of Pennsylvania.
    "There are people who still believe the moon landing was
    staged."

    Currently, the MIT method is limited: It works only on video of
    a person facing a camera and not moving much, like a newscaster.
    The technique only generates new video, not new audio.

    But it should not be difficult to extend the discovery to work
    on a moving head at any angle, according to Tomaso Poggio, a
    neuroscientist at the McGovern Institute for Brain Research, who
    is on the MIT team and runs the lab where the work is being
    done. And while state-of-the-art audio simulations are not as
    convincing as the MIT software, that barrier is likely to fall
    soon, researchers say.

    "It is only a matter of time before somebody can get enough
    good video of your face to have it do what they like," said
    Matthew Brand, a research scientist at MERL, a Cambridge-based
    laboratory for Mitsubishi Electric.

    For years, animators have used computer technology to put words
    in people's mouths, as they do with the talking baby in CBS's
    "Baby Bob" - creating effects believable enough for
    entertainment, but still noticeably computer-generated. The MIT
    technology is the first that is "video-realistic," the
    researchers say, meaning volunteers in a laboratory test could
    not distinguish between real and synthesized clips. And while
    current computer-animation techniques require an artist to
    smooth out trouble spots by hand, the MIT method is almost
    entirely automated.

    Previous work has focused on creating a virtual model of a
    person's mouth, then using a computer to render digital images
    of it as it moves. But the new software relies on an ingenious
    application of artificial intelligence to teach a machine what a
    person looks like when talking.

    Starting with between two and four minutes of video - the
    minimum needed for the effect to work - the computer captures
    images which represent the full range of motion of the mouth and
    surrounding areas, Ezzat said.

    The computer is able to express any face as a combination of
    these faces (46 in one example), the same way that any color can
    be represented by a combination of red, green, and blue. The
    computer then goes through the video, learning how a person
    expresses every sound, and how it moves from one to the next.
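    The color analogy can be made concrete with a small sketch. It
    treats each prototype image as a flattened vector of pixel values
    and uses least squares to find the weights that best rebuild a
    target image from the prototypes. This is a deliberate
    simplification for illustration: the function names, the 32x32
    image size and the random data are hypothetical, and the article
    does not say how the MIT system actually computes its combinations.

```python
import numpy as np


def fit_weights(prototypes: np.ndarray, face: np.ndarray) -> np.ndarray:
    """Find weights w so that w @ prototypes approximates face (least squares).

    prototypes: shape (k, n_pixels), one flattened prototype image per row.
    face: shape (n_pixels,), a flattened target image.
    """
    # Solve prototypes.T @ w ~= face; each column of prototypes.T is one prototype.
    w, *_ = np.linalg.lstsq(prototypes.T, face, rcond=None)
    return w


def synthesize(prototypes: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Blend the prototypes into one image, like mixing red, green and blue."""
    return weights @ prototypes


# Hypothetical example: 46 prototypes (the count from the article's example),
# each a random 32x32 "image" flattened to 1024 pixel values.
rng = np.random.default_rng(0)
prototypes = rng.random((46, 32 * 32))
true_w = rng.random(46)
face = synthesize(prototypes, true_w)   # a face that is exactly a blend
w = fit_weights(prototypes, face)        # recover the blend weights
print(np.allclose(w, true_w))            # True
```

    Because the target here is built as an exact blend of linearly
    independent prototypes, the recovered weights match the originals;
    for a real frame the fit would only be approximate.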

    Given a new sound, the computer can then generate an accurate
    picture of the mouth area and virtually superimpose it on the
    person's face, according to a paper describing the work. The
    researchers are scheduled to present the paper in July at
    Siggraph, the world's top computer graphics conference.
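    "Superimposing" a generated mouth on the original footage can be
    pictured as ordinary alpha blending: a mask selects where the new
    pixels replace the old ones, with soft values at the edge so the
    seam is not visible. The article gives no details of the actual
    compositing step, so this is a generic sketch with hypothetical
    names and toy 4x4 grayscale images, not the MIT implementation.

```python
import numpy as np


def composite_mouth(frame: np.ndarray, mouth: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Alpha-blend a synthesized mouth region into an original frame.

    frame, mouth: float arrays of shape (H, W) with values in [0, 1].
    mask: float array of shape (H, W) in [0, 1]; 1 where the synthesized
    mouth replaces the frame, 0 where the original frame is kept.
    """
    if not (frame.shape == mouth.shape == mask.shape):
        raise ValueError("frame, mouth and mask must have the same shape")
    return (1.0 - mask) * frame + mask * mouth


# Toy example: a black frame, an all-white "mouth", and a mask covering
# the bottom two rows with a half-strength (feathered) row above them.
frame = np.zeros((4, 4))
mouth = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[2:, :] = 1.0
mask[1, :] = 0.5
out = composite_mouth(frame, mouth, mask)
# Row 0 stays 0.0, row 1 becomes 0.5, rows 2 and 3 become 1.0.
```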

    The effect is significantly more convincing than a previous
    effort, called Video Rewrite, which recorded a huge number of
    small snippets of video and then recombined them. Still, the new
    method only seems lifelike for a sentence or two at a time,
    because over longer stretches, the speaker seems to lack emotion.

    MIT's Ezzat said that he would like to develop a more complex
    model that would teach the computer to simulate basic emotions.

    A specialist can still detect the video forgeries, but as the
    technology improves, scientists predict that video
    authentication will become a growing field - in the courts and
    elsewhere - just like the authentication of photographs. As
    video, too, becomes malleable, a society increasingly reliant on
    live satellite feeds and fiber optics will have to find even
    more direct ways to communicate.

    "We will probably have to revert to a method common in the
    Middle Ages, which is eyewitness testimony," said the
    University of Pennsylvania's Jamieson. "And there is probably
    something healthy in that."

    Compare original and synthetic videos from MIT on www.boston.com/globe.

    Gareth Cook can be reached at cook@globe.com.

    This story ran on page A1 of the Boston Globe on 5/15/2002. ©
    Copyright 2002 Globe Newspaper Company.

    ===============================
    This was distributed via the memetics list associated with the
    Journal of Memetics - Evolutionary Models of Information Transmission
    For information about the journal and the list (e.g. unsubscribing)
    see: http://www.cpm.mmu.ac.uk/jom-emit


