Wednesday, April 29, 2009

Portfolio Assignment 9

For our final project we took a closer look at the N.E.R.O. machine-learning game. N.E.R.O. stands for Neuro-Evolving Robotic Operatives. The game lets the user train a team of robots to perform simple tasks, such as navigating around a wall.
When you train your robots for the first time, your troops just run around, because they have not yet been trained to do anything. The first step is to place a static enemy on the field and reward your troops for approaching it and firing upon it. Next, the user can put a wall between the troops and the enemy, so the robots cannot see the enemy. Depending on how your robots are rewarded, your troops should start to navigate around the wall and fire upon the enemy. After this task, you can begin training your troops to defeat a maze, a turret, and even a group of moving enemies.
There were several setbacks while using N.E.R.O., such as the program crashing after about every twenty minutes of use. This was a major setback because training your robots to do certain tasks takes time. We wanted to be able to start the training and let the troops learn overnight, and with the program crashing we couldn't do it. Another setback was the intensive computation the program requires, which made running N.E.R.O. on my laptop for more than a few minutes at a time difficult without the laptop heating up.
I enjoyed using N.E.R.O., and I found it interesting to watch the robots learn. After training for an hour I compared my troops to untrained troops, and the difference was amazing. If we had been able to let the program run for 24 hours, converging the population on the brains of the fittest robots every five minutes, we would have seen the full extent of N.E.R.O.'s training capabilities.

Monday, April 27, 2009

Random Thing I found on Engadget

http://www.engadget.com/2009/04/27/ibms-watson-to-rival-humans-in-round-of-jeopardy/

Thursday, April 23, 2009

Portfolio 8

Chapter 10: Finding Independent Features

The goal of the unsupervised techniques in chapter 10 is not to determine outcomes from the data, but rather to characterize datasets that are not labeled with a specific outcome. Feature extraction is the process of finding new data rows that can be used in combination to reconstruct the rows of the original dataset. The book uses the cocktail party problem to describe feature extraction.
The cocktail party problem is the problem of understanding one person talking while many people are talking in the same room. The human brain separates all the sounds and focuses on the one voice, and a computer can be programmed to do the same thing. Feature extraction can also identify recurring word-usage patterns in a set of documents, which allows the computer to determine the independent features in each document. The computer can then categorize articles into themes using the independent features extracted from the documents.
Non-Negative Matrix Factorization (yes, this is a real-world application of linear algebra) uses a features matrix, which has a row for each feature and a column for every word, with the values showing how important each word is to each feature, and a weights matrix, which maps the features onto the articles matrix. When the features matrix and the weights matrix are multiplied together, they recreate a dataset close to the original.
Matrix operations can be done in Python with the NumPy package. The factorization algorithm uses NumPy to reconstruct the articles matrix as closely as possible by repeatedly improving the features and weights matrices with a set of multiplicative update rules.

The following quantities are used in each update step (a code sketch follows the list):

data matrix: The original articles matrix.
hn: The transposed weights matrix multiplied by the data matrix.
hd: The transposed weights matrix multiplied by the weights matrix multiplied by the features matrix.
wn: The data matrix multiplied by the transposed features matrix.
wd: The weights matrix multiplied by the features matrix multiplied by the transposed features matrix.
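To make the update rules concrete, here is a minimal NumPy sketch of the factorization loop. The names (data, w for weights, h for features) and the feature count are my own choices for illustration; the updates themselves are the standard multiplicative rules listed above.

```python
import numpy as np

def factorize(data, feature_count=4, iterations=50):
    """Approximate data (articles x words) as w @ h using multiplicative updates."""
    rows, cols = data.shape
    # Start from random non-negative weights and features
    w = np.random.rand(rows, feature_count)   # weights: articles x features
    h = np.random.rand(feature_count, cols)   # features: features x words

    for _ in range(iterations):
        # Update the features matrix
        hn = w.T @ data            # transposed weights times data
        hd = w.T @ w @ h           # transposed weights times weights times features
        h = h * hn / (hd + 1e-9)   # small constant avoids division by zero

        # Update the weights matrix
        wn = data @ h.T            # data times transposed features
        wd = w @ h @ h.T           # weights times features times transposed features
        w = w * wn / (wd + 1e-9)

    return w, h

# Tiny demo: the product w @ h should approximate the original matrix
data = np.random.rand(10, 20)
w, h = factorize(data)
print(np.linalg.norm(data - w @ h))  # reconstruction error shrinks with more iterations
```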

To display the results, the program goes through each feature and builds a list of all the words with their weights. It then displays the top-weighted words for that feature, goes through all the articles, and sorts them by their weights. Usually only the top articles are displayed.
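A minimal sketch of that display step, assuming w and h come from the factorize sketch above and that titles and wordvec hold the row and column labels (both names are mine):

```python
def show_features(w, h, titles, wordvec, top_words=6, top_articles=3):
    """Print the top-weighted words and articles for every feature."""
    feature_count = h.shape[0]
    for feature in range(feature_count):
        # Pair every word with its weight in this feature, highest first
        word_weights = sorted(zip(h[feature], wordvec), reverse=True)
        print([word for _, word in word_weights[:top_words]])

        # Sort the articles by how strongly they use this feature
        article_weights = sorted(zip(w[:, feature], titles), reverse=True)
        for weight, title in article_weights[:top_articles]:
            print(f'  {weight:.2f}  {title}')
```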

The chapter ends with a stock market example that uses feature extraction and Non-Negative Matrix Factorization.

Wednesday, April 22, 2009

Portfolio 7

I was unable to complete Portfolio 7 because I could not get pysqlite to work on my computer or the lab computers. Pysqlite needs Python 2.5 and will not work with Python 2.6.

Chapter 4 deals with searching and ranking, for example Google's PageRank algorithm. The first step is crawling: starting with a small set of documents and following their links to find new documents. After a large set of documents has been found, they are indexed into a table recording the documents and the locations of the words within them. The last step is returning a ranked list of documents for a query.
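A minimal sketch of the crawl-and-index step, using an in-memory index instead of the pysqlite database the book uses (the function and variable names are my own):

```python
import re
from collections import defaultdict
from urllib.parse import urljoin
from urllib.request import urlopen

def crawl(seed_urls, max_pages=25):
    """Breadth-first crawl: fetch pages, index word locations, follow links."""
    index = defaultdict(list)   # word -> list of (url, position)
    queue = list(seed_urls)
    seen = set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url).read().decode('utf-8', errors='ignore')
        except Exception:
            continue                       # skip pages that fail to load
        # Index every word by its position in the tag-stripped page text
        text = re.sub(r'<[^>]+>', ' ', html)
        for position, word in enumerate(re.findall(r'[a-z]+', text.lower())):
            index[word].append((url, position))
        # Follow every link found on the page
        for link in re.findall(r'href="([^"]+)"', html):
            queue.append(urljoin(url, link))
    return index
```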

Ranking queries can be done by building a neural network that associates searches with results based on which links people click after they get a list of results. The neural network then uses new queries to alter the ranking of documents.
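The book builds a small multilayer network for this; as a drastically simplified stand-in, the sketch below just strengthens a weight between each query word and the clicked URL, and uses those weights to re-rank results (all names here are mine):

```python
from collections import defaultdict

class ClickRanker:
    """Toy stand-in for a click-trained network: one weight per (word, url) pair."""
    def __init__(self, learning_rate=0.1):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate

    def train(self, query_words, clicked_url):
        # Strengthen the association between each query word and the clicked result
        for word in query_words:
            self.weights[(word, clicked_url)] += self.learning_rate

    def score(self, query_words, url):
        # A result's score is the total learned association with the query words
        return sum(self.weights[(word, url)] for word in query_words)

ranker = ClickRanker()
ranker.train(['machine', 'learning'], 'http://example.com/ml')
print(ranker.score(['machine', 'learning'], 'http://example.com/ml'))  # 0.2
```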

Content-based ranking gives each page a score for each query. This is done using word frequency (the number of times the words from a given query appear in the document), document location (the closer a word is to the beginning of the document, the higher the score), and word distance (if multiple words are used in a query, the closer together those words appear in a document, the higher the score).
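A minimal sketch of those three metrics, built on the in-memory index from the crawl sketch above (the normalization and the equal weighting are my own simplifications):

```python
def content_score(index, query_words, url):
    """Combine word frequency, document location, and word distance for one page."""
    positions = [pos for word in query_words
                 for (u, pos) in index[word] if u == url]
    if not positions:
        return 0.0
    frequency = len(positions)                 # more matches -> higher score
    location = 1.0 / (1 + min(positions))      # earlier first match -> higher score
    distance = 1.0 / (1 + max(positions) - min(positions))  # tighter cluster -> higher
    return frequency + location + distance     # equal weighting, for illustration
```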

Many search engines also rely upon link structure. Google's famous PageRank algorithm scores each page by the number and importance of the pages that link to it, and that score is combined with content-based rankings (and signals such as click counts) to produce the final ordering.
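A minimal sketch of the iterative PageRank calculation, using the standard 0.85 damping factor (the link graph at the bottom is a hypothetical example):

```python
def pagerank(links, iterations=20, damping=0.85):
    """links maps each url to the list of urls it links to."""
    pages = list(links)
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # A page receives a share of the rank of every page linking to it
            incoming = sum(rank[other] / len(links[other])
                           for other in pages if page in links[other])
            new_rank[page] = (1 - damping) + damping * incoming
        rank = new_rank
    return rank

links = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A']}
print(pagerank(links))  # C ends up highest: it collects rank from both A and B
```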