Google Is All About Large Amounts of Data

Tags: , , , , ,
Categories: electronic culture
Hits for this post:231
Tiny URL: http://r-echos.net/lk/11803
Monday, December 17th, 2007 at 11:23 am
Bookmark on del.icio.us | Twitter This Stumble This

Google Is All About Large Amounts of Data


In a very interesting interview from October, Google’s VP Marissa Mayer confessed that having access to large amounts of data is in many instances more important than creating great algorithms.

Right now Google is really good with keywords, and that’s a limitation we think the search engine should be able to overcome with time. People should be able to ask questions, and we should understand their meaning, or they should be able to talk about things at a conceptual level. We see a lot of concept-based questions — not about what words will appear on the page but more like “what is this about?” A lot of people will turn to things like the semantic Web as a possible answer to that. But what we’re seeing actually is that with a lot of data, you ultimately see things that seem intelligent even though they’re done through brute force.

When you type in “GM” into Google, we know it’s “General Motors.” If you type in “GM foods” we answer with “genetically modified foods.” Because we’re processing so much data, we have a lot of context around things like acronyms. Suddenly, the search engine seems smart like it achieved that semantic understanding, but it hasn’t really. It has to do with brute force. That said, I think the best algorithm for search is a mix of both brute-force computation and sheer comprehensiveness and also the qualitative human component.

Marissa Mayer admitted that the main reason why Google launched the free 411 service is to get a lot of data necessary for training speech recognition algorithms.

You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search.

The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. … So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we’re trying to get the voice out of video, we can do it with high accuracy.

Peter Norvig, director of research at Google, seems to agree. “I have always believed (well, at least for the past 15 years) that the way to get better understanding of text is through statistics rather than through hand-crafted grammars and lexicons. The statistical approach is cheaper, faster, more robust, easier to internationalize, and so far more effective.” Google uses statistics for machine translation, question answering, spell checking and more, as you can see in this video. The same video explains that the more data you have, the better your AI algorithm will perform, even if it isn’t the best.

Peter Norvig says that Google developed its own speech recognition technology. “We wanted speech technology that could serve as an interface for phones and also index audio text. After looking at the existing technology, we decided to build our own. We thought that, having the data and computational resources that we do, we could help advance the field. Currently, we are up to state-of-the-art with what we built on our own, and we have the computational infrastructure to improve further. As we get more data from more interaction with users and from uploaded videos, our systems will improve because the data trains the algorithms over time.”

Google is in the privileged position to gain access to large amounts of data that could be used to improve other services.

(Via Google Operating System.)

Related Posts




Comments are closed.

R-Echos

Subscribe in a reader




R-Echos context

Collections

* at the occasion of R-Echos issue 1 we organised some pages into topic oriented piles:

  • Displaying
  • un-Realisation
  • Physical Interface
  • Augmented Reality
  • Publishing
  • Geometry
  • Visualisation
  • Open Source Mobile Phone
  • Fab


  • Since 2004, R-Echos is an experimental online magazine dedicated to republication; topics vary from biology to graphic design, from ecology to business. It agglomerates anything which is about art, computing, science. His form is made out of collages of texts, links, images, references, videos and sounds - choosen with care to take part to this very personnal publication.



  • About
  • Articles
  • Beta version
  • Categories
  • Defragmentation
  • Directory
  • Fab
  • Index
  • Links
  • Monthly Archives
  • Open Source Mobile Phone
  • R-Echos issue 1
  • Somewhere else
  • Tags
  • Visual Index
  • Visualisation


  • Search R-Echos



    * curation / edition / selection is made by Electronest

    On Purpose: Design Concepts

    On Purpose: Design Concepts

    On Purpose: Design Concepts looks at conceptual design practices, the emergence of ‘meta design’, and the question of who or what can define something as design…
    With Åbäke, Droog Design, Daniel Eatock, Electronest, Ann-Sofie Back, Will Holder, Peter Jensen, Onkar Kular & Noam Toran, Metahaven, Alex Rich, Savage, Yuri Suzuki
    September 13 - [...]

    websites and White Cubes

    websites and White Cubes

    Dumb sign, originally uploaded by blackbeltjones.
    Been asked to work on the nominations for designs of the year again at the Design Museum, which is very nice.But it leads me back to this hoary old question – how should interactive work best be shown in a museum or gallery context? Should it be [...]

    R-Echos issue 1 - AMP001

    R-Echos issue 1

    An experiment in the economics of production: how can we shift focus from consumption of a finished product to investment in the processes of design, print & production?

    This is a poster and a text: an analog R-Echos
    Would you be interested in investing in the tangible production of this work?
    1. You can download the digital archive
    and [...]

    What if, VACANT LOT, Hoxton, London

    What if, VACANT LOT, Hoxton, London

    Related PostsBuilding and designing Digitalism’s IdealisticPaper Circuitssub-studio design blog: Herzog and de Meuron Parisian PyramidThe best CNC project machines - Hack a Daygreenpix zero-energy massive LED displayDIY Blubber BotBotanicalls Twitter DIYBuild Your Own War Bot - Wired How-To WikiHOW TO - Embroider digital imagesThe Shipyard ReturnsBottoms Up DoorbellThey [...]

    magazines as objects exhibition

    Colophon events this week

    Colophon events this week

    There are a couple of Colophon-related events in Europe this week. First up, Andrew Losowsky – that’s him above next to a copy of IsNotMagazine – has curated an exhibition of magazines as objects in Milan. CR Blog has an in-depth report with details – it sounds great, lots of magazine-y-ness. Andrew’s [...]



    R-Echos has its own tiny url system:

    * tiny url are url you can copy/paste into email without the risk of having a long line that surely will get broken and a link unusable.

    To get updates via email:

    mailinglist delivered via FeedBurner



    free advertising network