Exercise 1: Tanimoto Score
Here is my Tanimoto score function:
# Returns the Tanimoto score for person1 and person2
def tanimoto_score(prefs,person1,person2):
    # Get the list of shared_items
    si={}
    for item in prefs[person1]:
        if item in prefs[person2]: si[item]=1
    # if they have no ratings in common, return 0
    if len(si)==0: return 0
    # Add up the per-item ratio a*b/(a^2+b^2-a*b) over all the shared items
    tanimoto=sum([prefs[person1][item]*prefs[person2][item] / (pow(prefs[person1][item],2)+pow(prefs[person2][item],2)-prefs[person1][item]*prefs[person2][item]) for item in prefs[person1] if item in prefs[person2]])
    return 1/(1+tanimoto)
When I ran the function I got:
>>> recommendations.tanimoto_score(recommendations.critics,'Lisa Rose', 'Gene Seymour')
0.15581371067992209
To calculate the Tanimoto Score I used the Tanimoto coefficient (extended Jaccard coefficient) formula from: http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_coefficient_.28extended_Jaccard_coefficient.29
The Tanimoto Score is often used to find similarity between two documents.
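For comparison, here is a minimal sketch of the vector form of the coefficient, T(A, B) = A·B / (|A|^2 + |B|^2 - A·B), where A and B are the two people's ratings over the items they share. The function above adds up a per-item ratio and returns 1/(1+sum), so its value gets smaller as the ratings agree more; the vector form instead returns 1.0 for identical ratings. The name tanimoto_coefficient and the toy_prefs ratings are hypothetical and only there for illustration; they are not part of the recommendations module.

```python
def tanimoto_coefficient(prefs, person1, person2):
    """Vector-form Tanimoto coefficient over the items both people rated."""
    # Items rated by both people
    shared = [item for item in prefs[person1] if item in prefs[person2]]
    # If they have no ratings in common, return 0
    if not shared:
        return 0.0
    # Dot product and squared norms of the two rating vectors
    dot = sum(prefs[person1][i] * prefs[person2][i] for i in shared)
    norm1 = sum(prefs[person1][i] ** 2 for i in shared)
    norm2 = sum(prefs[person2][i] ** 2 for i in shared)
    denom = norm1 + norm2 - dot
    # denom is 0 only when every shared rating is 0
    return dot / denom if denom else 0.0


# Hypothetical toy ratings (made up for this example)
toy_prefs = {
    'Ann': {'Film A': 3.0, 'Film B': 4.0, 'Film C': 2.0},
    'Bob': {'Film A': 3.0, 'Film B': 4.0},
    'Cy': {'Film A': 1.0, 'Film B': 2.0, 'Film D': 5.0},
}

print(tanimoto_coefficient(toy_prefs, 'Ann', 'Bob'))
print(tanimoto_coefficient(toy_prefs, 'Ann', 'Cy'))
```

With this form a higher value means more similar ratings, so it can be dropped in wherever a similarity function is expected.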
Part 1: Weka
Part 2: Cleveland Heart Disease Dataset
I downloaded the dataset in the ARFF file format, which made this part very quick to do. When I classified the data with the J48 classifier, I found:
Correctly Classified Instances: 77.558%
Incorrectly Classified Instances: 22.4422%
Total Instances: 303
I believe the J48 classifier is a fairly accurate method of analyzing data; however, I believe running several types of classifiers would help us make more accurate predictions. When I ran the Random Forest classifier, I found:
Correctly Classified Instances: 81.5182%
Incorrectly Classified Instances: 18.4818%
Total Instances: 303
The Random Forest classifier was more accurate than the J48 classifier. When I ran the Decision Stump classifier, it was less accurate than the J48 classifier:
Correctly Classified Instances: 71.6172%
Incorrectly Classified Instances: 28.3828%
Total Instances: 303
After the comparisons, I came to the conclusion that multiple classifiers should be run when analyzing datasets in order to get the best results possible.
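To put the three runs side by side, here is a small sketch that ranks the classifiers by the percentages reported above and converts each percentage back into instance counts out of 303. It assumes the first set of results came from J48, and the counts are recovered by rounding, so they are derived values rather than numbers copied from Weka. The names reported_accuracy, correct_count and summarize are hypothetical.

```python
TOTAL_INSTANCES = 303

# Percent correctly classified, as reported in this post
reported_accuracy = {
    'J48': 77.558,
    'Random Forest': 81.5182,
    'Decision Stump': 71.6172,
}


def correct_count(percent, total=TOTAL_INSTANCES):
    """Recover the number of correct instances from a percentage (rounded)."""
    return int(round(percent / 100.0 * total))


def summarize(accuracy, total=TOTAL_INSTANCES):
    """Return (name, correct, incorrect, percent) rows, best classifier first."""
    rows = []
    ranked = sorted(accuracy.items(), key=lambda kv: kv[1], reverse=True)
    for name, pct in ranked:
        correct = correct_count(pct, total)
        rows.append((name, correct, total - correct, pct))
    return rows


for name, correct, incorrect, pct in summarize(reported_accuracy):
    print('%-15s %3d correct, %3d incorrect (%.4f%%)' % (name, correct, incorrect, pct))
```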
I found that the majority of the people in the dataset were male (Total: 207) while women were the minority (Total: 96). The number of people with fbs was 258, while 48 did not have it. The median age was 55.366, and thal was the first attribute at the top of the tree.