How Copiepresse and Google could have avoided their lawsuit
Wed 14 Feb 07 01:25 | Tags: Internet
News broke earlier today that a Belgian court handed down a ruling against Google in a lawsuit brought by Copiepresse (site in French), a group representing various newspapers in the country. It's difficult to ascertain exactly what their complaint was from that AP article, which at times makes it seem like Copiepresse objects to Google merely linking to their web sites, but I doubt that's the case; not many commercial sites complain about potential visitors being able to find them too easily. My best guess is that they object to Google's caching feature storing a copy of their articles on Google's servers and then offering that cached copy for searchers to read. The newspaper group objects because they want to charge for reading those articles on their own site.
My first thought was that it must be the newspapers' own fault for not locking down their sites well enough to keep the Google spider and cache bots away from their premium articles, because if those bots can see those articles for free, anyone can. Then I realized that what's probably going on here is that the papers initially offer the articles for free, at which time Google's bots cache them, and then lock the articles down for paid reading after a certain amount of time, while the cached free version lives on in Google's servers. This seems like a more likely scenario.
Something still seemed fishy, though. I know that there are commands one can put in the code of a web page to stop search engine bots from indexing it. Surely there's a command to stop a bot from caching a page, but still allow it to crawl and index that page? Sure enough, after doing a little research, I found the answer -- on Google's site, no less.
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
Or, in proper XHTML:
<meta name="robots" content="noarchive" />
So with just this single extra line of code on their web pages, the Belgian group could have stopped their copyrighted information from being cached by Google. Instead, they got lawyers involved. Par for the course, I suppose.
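To make the mechanism concrete, here is a rough sketch of how a well-behaved caching bot might check a page for that line before storing a copy. This is only an illustration using Python's standard `html.parser` module; the names `RobotsMetaParser`, `robots_directives`, and `may_cache` are made up for this example, and I have no idea how Google's bots actually implement the check.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us, and
        # routes self-closing tags like <meta ... /> through here too.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of lowercase robots directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_cache(html):
    """Hypothetical opt-out rule: cache unless the page says noarchive."""
    return "noarchive" not in robots_directives(html)
```

Both the uppercase HTML version and the XHTML version of the tag above would make `may_cache` return `False`, while a page with no robots meta tag at all would be considered fair game.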
The bigger issue, though, is whether Google is violating copyright laws with its caching activities. (The same could be asked about other web sites that cache, like the Wayback Machine.) If it is an act of copyright violation to make an unauthorized copy of copyrighted materials and offer it to others, then Google would definitely be in violation, even if the original copyright holder could have taken very simple steps to prevent the "theft." Perhaps the proper thing for Google (and other caching servers) to do is not to cache every page that lacks the "no archive" line above, but instead to cache only those pages that carry a "please archive" line. That way, those who don't mind having their pages cached can still allow Google to do so, and Google doesn't have to worry about future copyright trouble along these lines.
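The difference between the two policies is small enough to sketch in a few lines of Python. This assumes the bot has already pulled the page's robots directives into a set of lowercase strings. Note that `pleasearchive` is a token I invented for this illustration; as far as I know, no such directive exists, and the function names are hypothetical too.

```python
# "noarchive" is the real opt-out directive quoted above.
OPT_OUT_TOKEN = "noarchive"
# "pleasearchive" is an invented opt-in token, used only for illustration.
OPT_IN_TOKEN = "pleasearchive"


def may_cache_opt_out(directives):
    """Today's rule: cache everything that does not forbid it."""
    return OPT_OUT_TOKEN not in directives


def may_cache_opt_in(directives):
    """Proposed rule: cache only pages that explicitly ask for it.

    An explicit noarchive still wins if a page carries both tokens.
    """
    return OPT_IN_TOKEN in directives and OPT_OUT_TOKEN not in directives


# A page that says nothing about caching is treated differently:
untagged_page = set()
print(may_cache_opt_out(untagged_page))  # True
print(may_cache_opt_in(untagged_page))   # False
```

Under the opt-in rule, silence means "don't cache," which is the whole point: the burden shifts from the copyright holder to the party making the copy.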
Let's see how this case turns out.
Oh yeah, one more thing: Folks, the word "cache" is pronounced like "cash," not like "cash-ay." The newsreader on the talk radio station where I first heard about this story pronounced it "cash-ay," and she's not the only one I've heard mispronounce it. It's wrong, dang it.

