December 22, 2008 9:01 am GMT
Ethics of content scraping and feed aggregation
by Gary Illyes
One of the most interesting phenomena on the internet nowadays is content scraping and the aggregation of blog feeds. Many complain about it, but I think not every case is a good reason to complain.
There are two types of content scraping. In the first, the scraper links back to your website, making it clear that the published content is “borrowed” from a third party. In the second, the content is stolen as is, without a link back to the author, and published as if it were the scraper’s own.
The second case is considered plagiarism, and it gives the original author headaches because, even though he worked hard to publish an article, he gets no profit in return. This is bad, and action should be taken in every case. Mammoth sites like CNN or BBC have dedicated teams that handle these cases; usually they contact the scraper’s host with a take-down notice. If that fails, they either give up (unlikely) or contact a DMCA attorney who handles the case through legal channels. Small websites usually have only one option: contacting the webmaster of the site that stole the content with a take-down or link-back request. What happens next depends solely on the scraper: if he is in a good mood, he will link back or take the content down; if he is not, he will ‘forget’ to answer.
Another form of scraping is when the webmaster posts an excerpt of your content and links back. This is how Technorati works: it grabs your RSS feed, takes an excerpt of each post, and republishes it on its website, linking back to the original article.
This kind of scraping works in your favor, as search engines count the link to your article as an inbound link, thus slightly increasing your page’s rank. Additionally, if a Technorati user finds your post’s excerpt interesting, they will visit your site to read the whole article.
In conclusion, if you decide to scrape content, keep the following in mind:
- Publish only an excerpt of the original post, so that users who find it interesting have to visit the author’s website
- Never forget to link back to the original article, if possible with a nofollow-free link
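
To make the two rules concrete, here is a minimal sketch in Python of what a well-behaved aggregator might do with an RSS 2.0 feed. It is only an illustration under stated assumptions: the sample feed, the `make_excerpt` and `render_items` helpers, and the five-word limit are all made up for this example, and it is not how Technorati or any other service is actually implemented.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Ethics of scraping</title>
      <link>http://example.com/ethics-of-scraping</link>
      <description>&lt;p&gt;One two three four five six seven eight nine ten.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>"""


def make_excerpt(text, max_words=5):
    """Strip HTML tags and keep only the first max_words words."""
    plain = re.sub(r"<[^>]+>", " ", text)
    words = plain.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " [...]"


def render_items(feed_xml, max_words=5):
    """Return one HTML snippet per feed item: excerpt plus link back."""
    root = ET.fromstring(feed_xml)
    blocks = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        body = item.findtext("description", default="")
        excerpt = make_excerpt(body, max_words)
        # Rule 1: publish only the excerpt, never the full post.
        # Rule 2: plain link to the original, with no rel attribute.
        blocks.append(
            '<p>{}<br>Source: <a href="{}">{}</a></p>'.format(
                html.escape(excerpt),
                html.escape(link, quote=True),
                html.escape(title),
            )
        )
    return blocks


if __name__ == "__main__":
    for block in render_items(SAMPLE_FEED):
        print(block)
```

The anchor tag carries no `rel` attribute on purpose, so search engines can count the link for the original author.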
