I was unable to complete portfolio 7 because I could not get pysqlite to work on my computer or on the lab computers. Pysqlite needs Python 2.5 and will not work with Python 2.6.
Chapter 4 deals with searching and ranking, the techniques behind search engines like Google and its PageRank algorithm. The first step is crawling: starting with a small set of documents and following the links in those documents to find new ones. After a large set of documents has been found, they are indexed into a table that records each document and the locations of the words in it. The last step is returning a ranked list of documents for a query.
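To make the crawl and index steps concrete, here is a minimal sketch (written for Python 3). It assumes a hypothetical in-memory `WEB` dictionary, mapping each URL to its page text and outgoing links, in place of real downloads and HTML parsing, and it keeps the index in a plain dictionary instead of a database. The names `WEB`, `crawl` and `build_index` are made up for illustration and are not the book's code.

```python
from collections import deque

# Hypothetical toy "web": url -> (page text, list of outgoing links).
# A real crawler would download and parse HTML pages instead.
WEB = {
    "a.html": ("search engines crawl the web", ["b.html", "c.html"]),
    "b.html": ("the crawler follows links", ["c.html"]),
    "c.html": ("an index stores word locations", ["a.html", "d.html"]),
    "d.html": ("ranking returns the best documents", []),
}

def crawl(seeds, web, depth=2):
    """Breadth-first crawl: start at the seed pages and follow links up to `depth` hops."""
    seen = set(seeds)
    frontier = deque((url, 0) for url in seeds)
    visited = []
    while frontier:
        url, d = frontier.popleft()
        if url not in web:
            continue  # broken link: nothing to fetch
        visited.append(url)
        if d == depth:
            continue  # far enough from the seeds; don't follow this page's links
        for link in web[url][1]:
            if link not in seen:
                seen.add(link)
                frontier.append((link, d + 1))
    return visited

def build_index(urls, web):
    """Map each word to the pages it appears on and its positions within each page."""
    index = {}
    for url in urls:
        for pos, word in enumerate(web[url][0].lower().split()):
            index.setdefault(word, {}).setdefault(url, []).append(pos)
    return index

pages = crawl(["a.html"], WEB)
index = build_index(pages, WEB)
```

Storing word positions, not just which pages contain a word, is what later makes the location and distance scores possible.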
Results can also be ranked by a neural network that learns to associate searches with results based on which links people click after they get a list of results. As new queries and clicks come in, the network adjusts how documents are ranked.
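A minimal sketch of the click-learning idea follows. It is loosely modeled on the chapter's approach (query words as inputs, one hidden node per distinct query, URLs as outputs, tanh activations, backpropagation toward the clicked URL), but the class name `ClickNet`, the starting weights (1/number-of-words and 0.1) and the learning rate of 0.5 are assumptions for illustration, and it keeps everything in memory instead of a database.

```python
import math

def dtanh(y):
    """Slope of tanh, written in terms of its output y = tanh(x)."""
    return 1.0 - y * y

class ClickNet:
    """Toy network: query words -> hidden nodes -> result URLs (hypothetical design)."""

    def __init__(self, urls):
        self.urls = list(urls)
        self.hidden = []  # one hidden node per distinct set of query words
        self.wi = {}      # (word, hidden index) -> input weight
        self.wo = {}      # (hidden index, url) -> output weight

    def _add_hidden(self, words):
        key = frozenset(words)
        if key in self.hidden:
            return
        self.hidden.append(key)
        h = len(self.hidden) - 1
        for w in key:
            self.wi[(w, h)] = 1.0 / len(key)
        for u in self.urls:
            self.wo[(h, u)] = 0.1

    def feed_forward(self, words):
        """Return hidden activations and a score in (-1, 1) for every URL."""
        ah = [math.tanh(sum(self.wi.get((w, h), 0.0) for w in words))
              for h in range(len(self.hidden))]
        ao = {u: math.tanh(sum(ah[h] * self.wo[(h, u)] for h in range(len(ah))))
              for u in self.urls}
        return ah, ao

    def train(self, words, clicked, rate=0.5):
        """One backpropagation step: push `clicked` toward 1 and the other URLs toward 0."""
        self._add_hidden(words)
        ah, ao = self.feed_forward(words)
        out_delta = {u: dtanh(ao[u]) * ((1.0 if u == clicked else 0.0) - ao[u])
                     for u in self.urls}
        # Hidden errors are computed from the output weights *before* they change.
        hid_delta = [dtanh(ah[h]) * sum(out_delta[u] * self.wo[(h, u)] for u in self.urls)
                     for h in range(len(ah))]
        for h in range(len(ah)):
            for u in self.urls:
                self.wo[(h, u)] += rate * out_delta[u] * ah[h]
            for w in words:
                if (w, h) in self.wi:
                    # The input activation of a present word is 1.0.
                    self.wi[(w, h)] += rate * hid_delta[h]

net = ClickNet(["wiki.html", "bank.html", "river.html"])
for _ in range(30):
    net.train(["world", "bank"], "bank.html")  # users keep clicking bank.html
_, scores = net.feed_forward(["world", "bank"])
```

After repeated clicks, `bank.html` scores higher than the other URLs for that query, while a query the network has never seen produces neutral (zero) scores.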
Content-based ranking gives each page a score for a given query. It combines word frequency (the number of times the words from the query appear in the document), document location (the closer to the beginning of the document a word appears, the higher the score) and word distance (for multi-word queries, the closer together those words appear in the document, the higher the score).
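The three content-based metrics can be sketched as follows, assuming a small word-position index built from a hypothetical dictionary of URL to page text. Each metric is rescaled to 0..1 so the best page gets 1.0 (for location and distance, smaller raw values are better), and the scaled metrics are added with equal weights. The function names, the 1-based positions and the equal weights are illustrative choices, not the book's exact code.

```python
import itertools

def index_documents(docs):
    """word -> {url: [positions]} for a dict of url -> text (toy index)."""
    index = {}
    for url, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index.setdefault(word, {}).setdefault(url, []).append(pos)
    return index

def frequency_score(index, query, url):
    # Total number of times the query words occur in the page.
    return sum(len(index[w][url]) for w in query)

def location_score(index, query, url):
    # Sum of each query word's first position (1-based); smaller = nearer the top.
    return sum(min(index[w][url]) + 1 for w in query)

def distance_score(index, query, url):
    # Smallest total gap between consecutive query words; smaller = closer together.
    if len(query) < 2:
        return 1  # distance is meaningless for one word, so every page ties
    return min(sum(abs(combo[i] - combo[i - 1]) for i in range(1, len(combo)))
               for combo in itertools.product(*(index[w][url] for w in query)))

def normalize(scores, small_is_better=False):
    # Rescale to 0..1 so that the best page gets 1.0 (all raw scores are positive here).
    if small_is_better:
        best = min(scores.values())
        return {u: best / v for u, v in scores.items()}
    best = max(scores.values())
    return {u: v / best for u, v in scores.items()}

def rank(docs, query_text, weights=(1.0, 1.0, 1.0)):
    """Return (url, combined score) pairs for pages containing every query word."""
    index = index_documents(docs)
    query = list(dict.fromkeys(query_text.lower().split()))  # drop repeated words
    urls = set.intersection(*(set(index.get(w, {})) for w in query)) if query else set()
    if not urls:
        return []
    metrics = [
        normalize({u: frequency_score(index, query, u) for u in urls}),
        normalize({u: location_score(index, query, u) for u in urls}, small_is_better=True),
        normalize({u: distance_score(index, query, u) for u in urls}, small_is_better=True),
    ]
    totals = {u: sum(w * m[u] for w, m in zip(weights, metrics)) for u in urls}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    "a": "python programming language guide",
    "b": "a guide to snakes python habitats and python diets",
    "c": "cooking recipes",
}
results = rank(docs, "python guide")
```

Changing `weights` shifts how much each metric matters; which balance works best depends on the pages being searched.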
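Many search engines also rank pages by the links that point to them (inbound links). The simplest version just counts how many other pages link to a page. Google's well-known PageRank algorithm refines this by weighting each inbound link by the importance of the page it comes from, and this link-based score can be combined with the content-based scores.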
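Both link-based ideas fit in a few lines, sketched here on a hypothetical three-page link graph. The sketch uses the common damping factor of 0.85 and the form PR(p) = (1 - d) + d × Σ PR(q) / outlinks(q) over the pages q that link to p; some versions divide (1 - d) by the number of pages instead. The function names and the fixed iteration count are assumptions, and pages with no outgoing links simply pass on no score here.

```python
def inbound_link_counts(links):
    """Simple count: how many distinct pages link to each page."""
    counts = {}
    for src, targets in links.items():
        counts.setdefault(src, 0)
        for t in set(targets):
            counts[t] = counts.get(t, 0) + 1
    return counts

def pagerank(links, damping=0.85, iterations=20):
    """Iterative PageRank: PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)) over pages q linking to p."""
    out = {src: set(ts) for src, ts in links.items()}
    pages = set(out) | {t for ts in out.values() for t in ts}
    pr = {p: 1.0 for p in pages}  # every page starts with the same score
    for _ in range(iterations):
        # The right-hand side reads the previous round's scores before `pr` is rebound.
        pr = {p: (1 - damping) + damping * sum(pr[q] / len(ts) for q, ts in out.items() if p in ts)
              for p in pages}
    return pr

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # hypothetical link graph
counts = inbound_link_counts(links)
ranks = pagerank(links, iterations=100)
```

Here `a` and `b` each have one inbound link, but `a` ends up ranked above `b` because its single link comes from the highly linked page `c`, which is the point of weighting links by the importance of their source.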
Wednesday, April 22, 2009