Could it be true? No need to ever use R again? Well, that's how it looks to me. Scikit-learn is a Python module for machine learning which seems to replicate almost all of the multivariate analysis modules I used to use in R. Thanks to Nikolas Fechner at the RDKit UGM for turning me on to this.
Let's see it in action for a simple example that uses SVM to classify irises (not the eyeball type). First, the R:
library(e1071)
library(MASS)
data(iris)
mysvm <- svm(Species ~ ., iris)
mysvm.pred <- predict(mysvm, iris)
table(mysvm.pred, iris$Species)
# mysvm.pred   setosa versicolor virginica
#   setosa         50          0         0
#   versicolor      0         48         2
#   virginica       0          2        48
And now the Python:
from sklearn import svm, datasets
from sklearn.metrics import confusion_matrix
iris = datasets.load_iris()
mysvm = svm.SVC().fit(iris.data, iris.target)
mysvm_pred = mysvm.predict(iris.data)
print(confusion_matrix(mysvm_pred, iris.target))
# [[50  0  0]
#  [ 0 48  2]
#  [ 0  0 50]]

This library is quite new, but data processing in Python seems to have quite a bit of momentum right now. See also Statsmodels and Pandas. These videos from PyData 2012 give an overview of some of these projects.
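A footnote on reading the two tables: both the R and the Python snippets pass the predictions first, so each row is a predicted class and each column is the true class. (scikit-learn's `confusion_matrix` conventionally takes the true labels first, which would put the true classes on the rows instead.) Here is a minimal pure-Python sketch of that tally. The `confusion_table` helper and the six-flower data are made up for illustration; they are not part of scikit-learn and not the iris results.

```python
from collections import Counter

def confusion_table(predicted, actual, labels):
    """Count (predicted, actual) pairs: rows are predictions, columns are true classes."""
    counts = Counter(zip(predicted, actual))
    return [[counts[(p, a)] for a in labels] for p in labels]

# Hypothetical toy data, chosen only to show the layout
labels = ["setosa", "versicolor", "virginica"]
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

table = confusion_table(predicted, actual, labels)
for label, row in zip(labels, table):
    print(label.ljust(10), row)
```

In this toy case the one misclassified flower is a versicolor predicted as virginica, so it lands in the virginica row and the versicolor column.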