Wednesday, April 22, 2009

Portfolio 7

I could not complete Portfolio 7 because I could not get pysqlite to work on my computer or on the lab computers. As far as I could tell, the pysqlite version I tried needs Python 2.5 and will not work with Python 2.6....

Chapter 4 dealt with searching and ranking, for example Google's PageRank algorithm. The first step is crawling: starting with a small set of documents and following the links in those documents to find new ones. After a large set of documents has been found, they are indexed into a table that records each document and the locations of its words. The last step is returning a ranked list of documents for a query.
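To make the crawl and index steps concrete, here is a minimal sketch. It is not the book's code: instead of fetching real pages it walks a small made-up in-memory "web" (the PAGES dictionary, the page names, and the function names crawl and build_index are all my own assumptions), and it keeps the index in a plain dictionary rather than a database table.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": url -> (page text, outgoing links).
PAGES = {
    "a.html": ("search engines crawl the web", ["b.html", "c.html"]),
    "b.html": ("an index maps words to locations", ["c.html"]),
    "c.html": ("ranking orders the search results", ["a.html"]),
    "d.html": ("nothing links to this page", []),
}

def crawl(seeds, pages, max_depth=2):
    """Breadth-first crawl: start at the seeds and follow links."""
    found = []
    visited = set()
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in visited or url not in pages:
            continue
        visited.add(url)
        found.append(url)
        if depth < max_depth:
            for link in pages[url][1]:
                queue.append((link, depth + 1))
    return found

def build_index(urls, pages):
    """Map each word to a list of (url, position) pairs."""
    index = defaultdict(list)
    for url in urls:
        words = re.findall(r"[a-z0-9]+", pages[url][0].lower())
        for position, word in enumerate(words):
            index[word].append((url, position))
    return index

found = crawl(["a.html"], PAGES)
index = build_index(found, PAGES)
```

Starting from a.html, the crawl never reaches d.html because no page links to it, so its words never enter the index. Looking up a word such as index["search"] gives every page that contains it together with the word's position, which is what the ranking step needs.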

Query results can also be ranked with a neural network that learns to associate searches with results, based on which links people click after they get a list of results. Each click becomes new training data, so the ranking of documents for a query shifts over time toward the results people actually choose.
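A rough sketch of the click-training idea follows. To keep it short I am assuming a single-layer network with no hidden layer, which is much simpler than a full neural network: there is one weight per (query word, URL) pair, and each click nudges the weights with a basic delta rule. The class name ClickNet, the learning rate, and the example URLs are hypothetical.

```python
from collections import defaultdict

class ClickNet:
    """Single-layer network: one weight per (query word, url) pair."""

    def __init__(self, rate=0.5):
        self.rate = rate
        self.weights = defaultdict(float)  # (word, url) -> weight

    def score(self, words, url):
        # Output for a url is the sum of the weights from each query word.
        return sum(self.weights[(w, url)] for w in words)

    def rank(self, words, urls):
        return sorted(urls, key=lambda u: self.score(words, u), reverse=True)

    def train(self, words, urls, clicked):
        """Delta rule: push the clicked url toward 1 and the rest toward 0."""
        for url in urls:
            target = 1.0 if url == clicked else 0.0
            error = target - self.score(words, url)
            for w in words:
                self.weights[(w, url)] += self.rate * error / len(words)

net = ClickNet()
query = ["world", "bank"]
results = ["river.html", "worldbank.html", "earth.html"]
for _ in range(10):
    net.train(query, results, clicked="worldbank.html")
ranked = net.rank(query, results)
```

After a few simulated clicks the clicked page moves to the top of the list for that query. A network with a hidden layer could also generalize to word combinations it has not seen before, which this simplified version cannot do.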

Content-based ranking gives each page a score for a given query. The score is built from word frequency (the number of times the query words appear in the document), document location (the closer to the beginning of the document a word appears, the higher the score), and word distance (if a query has multiple words, the closer together those words are in a document, the higher the score).
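The three measures above can be sketched as follows. This is only an illustration under my own assumptions: documents are plain strings split on whitespace, each measure is normalized so the best document gets 1.0, and the three are added with equal weight. The function names and sample documents are hypothetical.

```python
def word_positions(text):
    """Map each word to the list of positions where it occurs."""
    positions = {}
    for i, word in enumerate(text.lower().split()):
        positions.setdefault(word, []).append(i)
    return positions

def raw_scores(query, docs):
    words = query.lower().split()
    freq, loc, dist = {}, {}, {}
    for name, text in docs.items():
        pos = word_positions(text)
        if not all(w in pos for w in words):
            continue  # only score documents containing every query word
        freq[name] = sum(len(pos[w]) for w in words)   # word frequency
        loc[name] = sum(pos[w][0] for w in words)      # first occurrences
        # Smallest gap between occurrences of neighbouring query words.
        dist[name] = sum(
            min(abs(a - b) for a in pos[w1] for b in pos[w2])
            for w1, w2 in zip(words, words[1:])
        )
    return freq, loc, dist

def normalize(scores, small_is_better=False):
    """Scale scores so the best document gets 1.0."""
    if not scores:
        return {}
    if small_is_better:
        smallest = min(scores.values())
        return {k: (smallest + 1) / (v + 1) for k, v in scores.items()}
    largest = max(scores.values()) or 1
    return {k: v / largest for k, v in scores.items()}

def rank(query, docs):
    freq, loc, dist = raw_scores(query, docs)
    parts = [normalize(freq), normalize(loc, True), normalize(dist, True)]
    total = {name: sum(p[name] for p in parts) for name in freq}
    return sorted(total, key=total.get, reverse=True)

docs = {
    "one": "python search engine notes about a search engine",
    "two": "notes about cooking and later a search for an engine",
    "three": "gardening tips",
}
ranking = rank("search engine", docs)
```

Document "one" wins on all three measures: it mentions the query words more often, earlier, and closer together. Document "three" is left out because it does not contain the query words at all.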

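Many search engines also rely on the number of times a link is clicked. Google's famous PageRank algorithm works differently, though: it scores a page by the links pointing to it, so a page ranks higher when many important pages link to it. That link-based score is then combined with content-based ranking to order the results.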
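For reference, PageRank as it is usually described is computed from the links between pages rather than from clicks. Here is a small sketch of the common iterative form, where each page's score is 0.15 plus 0.85 times the sum of (score / number of outgoing links) over the pages that link to it. The link graph, the function name, and the iteration count are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Collect the pages that link to each page, and outgoing link counts.
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in set(targets):
            inbound[target].append(source)
    out_count = {p: len(set(links.get(p, []))) for p in pages}

    ranks = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_ranks = {}
        for p in pages:
            # Each linking page passes on an equal share of its own rank.
            total = sum(ranks[s] / out_count[s] for s in inbound[p])
            new_ranks[p] = (1 - damping) + damping * total
        ranks = new_ranks
    return ranks

# Hypothetical link graph: three pages link to c, and c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
```

Page c ends up with the highest score because three pages link to it. Page a comes next because its one inbound link is from the high-scoring page c, while b and d have no inbound links and stay at the minimum of 0.15.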
