Quantity Over Quality at Google Book Search

Categories: electronic culture
Tiny URL: http://r-echos.net/lk/11854
Saturday, January 5th, 2008 at 5:01 pm


Campus Technology has a well-documented article, “Google Book Search: The Good, the Bad & the Ugly,” which suggests that Google’s project is more about quantity than quality. For example, under its agreement with Google, the University of California has to deliver 3,000 books a day. “All of the libraries are talking about that, in the sense of what might be the most interesting materials to scan. But I’ll be very frank: There’s a real balance point between volume and selection, especially when looking at these numbers. UC is trying to meet the needs of the contract it’s signed,” says Robin Chandler, former director of data acquisitions for UC’s California Digital Library.

And because Google has to scan so many books, it needs scanning technology that scales. “When it first started, the technical challenge was simply building a scanning device that worked. The next technical challenge was being able to run this scanning process at scale. We would have been quite happy to use commercial scanning technologies if they were adequate to scale to this. We only built our own scanning process because that was the way to make this project achievable for Google,” says Dan Clancy of Google.

Surprisingly, the scanning process involves humans, as you can see in some books from Google’s index (TechCrunch, Google Blogoscoped, George Hernandez and The Genealogue have all spotted fingers). “If you go into Google [Book Search] and look at any book, you’ll be able to see by the number of body parts and fingerprints that [the pages] are being turned manually,” says Linda Becker, VP at Kirtas, the company that produces the fastest robotic book scanner in the world, the APT BookScan 2400. “If you were to go to the Google site, you’d see that one out of every five pages is either missing, or has fingers in it, or is cut off, or is blurry.”


Larry Page announced in October 2007 that the Book Search index held “over a million books”. A search for “now” returns 2,190,600 results (1,740,600 available in limited preview and 214,600 fully available for reading and downloading).
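To put those search figures in proportion, here is a rough back-of-the-envelope sketch. The three counts come from the search quoted above; the "other" bucket and the helper name `share` are assumptions made for illustration, since the search does not say what the remaining results are.

```python
# Figures quoted above for a search for "now" on Google Book Search.
total_results = 2_190_600
limited_preview = 1_740_600
full_view = 214_600

# Assumption: anything that is neither limited preview nor full view
# is lumped into one "other" bucket; the source does not describe it.
other = total_results - limited_preview - full_view


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


breakdown = {
    "limited preview": share(limited_preview, total_results),
    "full view": share(full_view, total_results),
    "other": share(other, total_results),
}

for label, percent in breakdown.items():
    print(f"{label}: {percent}%")
```

On these numbers, only about one result in ten is fully readable and downloadable, while roughly four in five are limited previews. Note that these are search results, which may not map one-to-one onto distinct books.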

The conclusion of the article is optimistic:

When it comes down to it, then, this brave new world of book search probably needs to be understood as Book Search 1.0. And maybe participants should not get so hung up on quality that they obstruct the flow of an astounding amount of information. Right now, say many, the conveyor belt is running and the goal is to manage quantity, knowing that with time the rest of what’s important will follow. Certainly, there’s little doubt that in five years or so, Book Search as defined by Google will be very different. The lawsuits will have been resolved, the copyright issues sorted out, the standards settled, the technologies more broadly available, the integration more transparent.

(Via Google Operating System.)

