Web 3.0: When Web Sites Become Web Services

by iatanasov 10. June 2007 08:48
Today's Web has terabytes of information available to humans, but hidden from computers. It is a paradox that information is stuck inside HTML pages, formatted in esoteric ways that are difficult for machines to process. The so called Web 3.0, which is likely to be a pre-cursor of the real semantic web, is going to change this. What we mean by 'Web 3.0' is that major web sites are going to be transformed into web services - and will effectively expose their information to the world.

The transformation will happen in one of two ways. Some web sites will follow the example of Amazon, del.icio.us and Flickr and will offer their information via a REST API. Others will try to keep their information proprietary, but it will be opened via mashups created using services like Dapper, Teqlo and Yahoo! Pipes. The net effect will be that unstructured information will give way to structured information - paving the road to more intelligent computing. In this post we will look at how this important transformation is taking place already and how it is likely to evolve.

The Amazon E-Commerce API - open access to Amazon's catalog

We have look about Amazon's visionary WebOS strategy. The Seattle web giant is reinventing itself by exposing its own infrastructure via a set of elegant APIs. One of the first web services opened up by Amazon was the E-Commerce service. This service opens access to the majority of items in Amazon's product catalog. The API is quite rich, allowing manipulation of users, wish lists and shopping carts. However its essence is the ability to lookup Amazon's products.

Why has Amazon offered this service completely free? Because most applications built on top of this service drive traffic back to Amazon (each item returned by the service contains the Amazon URL). In other words, with the E-Commerce service Amazon enabled others to build ways to access Amazon's inventory. As a result many companies have come up with creative ways of leveraging Amazon's information.

The rise of the API culture

The web 2.0 poster child, del.icio.us, is also famous as one of the first companies to open a subset of its web site functionality via an API. Many services followed, giving rise to a true API culture. John Musser over at programmableweb has been tirelessly cataloging APIs and Mashups that use them. This page shows almost 400 APIs organized by category, which is an impressive number. However, only a fraction of those APIs are opening up information - most focus on manipulating the service itself. This is an important distinction to understand in the context of this article.

The del.icio.us API offering today is different from Amazon's one, because it does not open the del.icio.us database to the world. What it does do is allow authorized mashups to manipulate the user information stored in del.icio.us. For example, an application may add a post, or update a tag, programmatically. However, there is no way to ask del.icio.us, via API, what URLs have been posted to it or what has been tagged with the tag web 2.0 across the entire del.icio.us database. These questions are easy to answer via the web site, but not via current API.

Standardized URLs - the API without an API

Despite the fact that there is no direct API (into the database), many companies have managed to leverage the information stored in del.icio.us. Here are some examples...

How Web Scraping Works

Web Scraping is essentially reverse engineering of HTML pages. It can also be thought of as parsing out chunks of information from a page. Web pages are coded in HTML, which uses a tree-like structure to represent the information. The actual data is mingled with layout and rendering information and is not readily available to a computer. Scrapers are the programs that "know" how to get the data back from a given HTML page. They work by learning the details of the particular markup and figuring out where the actual data is. For example, in the illustration below the scraper extracts URLs from the del.icio.us page. By applying such a scraper, it is possible to discover what URLs are tagged with any given tag.

This sounds great, but is this legal?

Scraping technologies are actually fairly questionable. In a way, they can be perceived as stealing the information owned by a web site. The whole issue is complicated because it is unclear where copy/paste ends and scraping begins. It is okay for people to copy and save the information from web pages, but it might not be legal to have software do this automatically. But scraping of the page and then offering a service that leverages the information without crediting the original source, is unlikely to be legal.

But it does not seem that scraping is going to stop. Just like legal issues with Napster did not stop people from writing peer-to-peer sharing software, or the more recent YouTube lawsuit is not likely to stop people from posting copyrighted videos. Information that seems to be free is perceived as being free. 

The opportunities that will come after the web has been turned into a database are just too exciting to pass up. So if conversion is going to take place anyway, would it not be better to rethink how to do this in a consistent way?

Why Web Sites should offer Web Services

There are several good reasons why Web Sites (online retailers in particular), should think about offering an API. The most important reason is control. Having an API will make scrapers unnecessary, but it will also allow tracking of who is using the data - as well as how and why. Like Amazon, sites can do this in a way that fosters affiliates and drives the traffic back to their sites.

The old perception is that closed data is a competitive advantage. The new reality is that open data is a competitive advantage. The likely solution then is to stop worrying about protecting information and instead start charging for it, by offering an API. Having a small fee per API call (think Amazon Web Services) is likely to be acceptable, since the cost for any given subscriber of the service is not going to be high. But there is a big opportunity to make money on volume. This is what Amazon is betting on with their Web Services strategy and it is probably a good bet.

 

 

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: , ,

web service | asp.net 2.0

Related posts

Add comment


(Will show your Gravatar icon)  

  Country flag




Live preview

October 12. 2008 01:18

Gravatar

Powered by BlogEngine.NET 1.1.0.7
Theme by Mads Kristensen

About the author

Ivan Atanasov - web developer
E-mail me Send mail Subscribe Feed

Calendar

<<  October 2008  >>
MoTuWeThFrSaSu
293012345
6789101112
13141516171819
20212223242526
272829303112
3456789

View posts in large calendar

Pages

    Recent posts

    Recent comments

    Authors

    Disclaimer

    The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

    © Copyright 2008 it-coder.com

    Sign in