Could it be true? No need to ever use R again? Well, that's how it looks to me. Scikit-learn is a Python module for machine learning that seems to replicate almost all of the multivariate analysis modules I used to use in R. Thanks to Nikolas Fechner at the RDKit UGM for tuning me in to this.
Let's see it in action with a simple example that uses a support vector machine (SVM) to classify irises (not the eyeball type). First, the R:
library(e1071)
library(MASS)
data(iris)
mysvm <- svm(Species ~ ., iris)
mysvm.pred <- predict(mysvm, iris)
table(mysvm.pred, iris$Species)
# mysvm.pred   setosa versicolor virginica
#   setosa         50          0         0
#   versicolor      0         48         2
#   virginica       0          2        48
And now the Python:
from sklearn import svm, datasets
from sklearn.metrics import confusion_matrix

iris = datasets.load_iris()
mysvm = svm.SVC().fit(iris.data, iris.target)
mysvm_pred = mysvm.predict(iris.data)
# confusion_matrix takes (y_true, y_pred), so the rows are the true classes
print(confusion_matrix(iris.target, mysvm_pred))
# [[50  0  0]
#  [ 0 48  2]
#  [ 0  0 50]]

This library is still young, but Python seems to have a lot of momentum in data processing right now; see also Statsmodels and Pandas. These videos from PyData 2012 give an overview of some of these projects.
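The R and Python snippets above do not use quite the same settings, so their confusion matrices need not match exactly. Both e1071 and scikit-learn's SVC wrap LIBSVM. However, e1071's svm() scales the data by default, and SVC does not.

Here is a minimal sketch that brings the Python call closer to the R one. It assumes e1071's documented defaults for svm(): scale = TRUE, radial kernel, cost = 1, and gamma = 1 / number of features. Check ?svm for your installed version. Small differences may remain, for example in how the two libraries standardize the data. The sketch also transposes scikit-learn's matrix so that predictions sit on the rows, as in R's table(pred, truth).

```python
# Sketch: mirror (assumed) e1071 svm() defaults with scikit-learn.
# Assumed R defaults: scale = TRUE, kernel = "radial", cost = 1,
# gamma = 1 / (number of features).
from sklearn import datasets, svm
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
n_features = iris.data.shape[1]  # 4 measurements per flower

model = make_pipeline(
    # Standardize each feature, like scale = TRUE in e1071.
    # StandardScaler divides by the population SD; R's scale() uses the sample SD.
    StandardScaler(),
    # RBF ("radial") kernel with cost C = 1 and gamma = 1 / n_features
    svm.SVC(kernel="rbf", C=1.0, gamma=1.0 / n_features),
)
model.fit(iris.data, iris.target)
pred = model.predict(iris.data)

# scikit-learn puts true labels on the rows: confusion_matrix(y_true, y_pred).
cm = confusion_matrix(iris.target, pred)

# Transpose to match R's table(pred, truth) layout (predictions on the rows).
print(cm.T)
```

Like the original snippets, this scores the model on the same data it was trained on, so the counts are optimistic about how it would do on new flowers.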
2 comments:
Great post, Noel, but I don't think I'm ready to walk away from R just yet. Although I'm a huge fan of Python and have been using scikit-learn and pandas, there are a few reasons I'm keeping my allegiance to R:
R is a fantastic tool for interactive data analysis. There are so many great tools for slicing and dicing data, and packages like plyr give R capabilities that I can't find anywhere else. It's possible that all of this can be done with pandas, and I just need to get better at it.
There is incredible breadth in what's available in R. Over the last few years, R has become the standard for academic statistics and machine learning. When I go looking for an implementation of a new method, I can usually find an R package.
R has excellent plotting capabilities, especially with the addition of lattice and ggplot2. I haven't found any other package that gives me the power and control over plots that I get with R.
I agree that R can be syntactically strange and that things are not implemented in a consistent fashion. After 10 years of using R, my brain is sufficiently warped that I'm starting to get the hang of it.
I don't disagree with any of this, but I have always found the process of using R frustrating. Unfortunately, R seemed to be so well established that I had despaired of ever being able to dispense with it.
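On the first comment's point about plyr: the split-apply-combine style that plyr popularized has a counterpart in pandas' groupby. Below is a minimal sketch, not code from either commenter, of a per-species summary that in R might be written with ddply.

A few assumptions apply:
- It uses scikit-learn's bundled copy of the iris data and its column names.
- The summary column names (mean_sepal_length, n) are illustrative.
- The named-aggregation syntax needs pandas 0.25 or later.

```python
# Rough pandas analogue of an (illustrative) plyr call such as:
#   ddply(iris, .(Species), summarise,
#         mean_sepal_length = mean(Sepal.Length), n = length(Sepal.Length))
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map integer class codes (0, 1, 2) to species names
df["species"] = iris.target_names[iris.target]

# Split by species, apply mean and count, combine into one table
summary = df.groupby("species").agg(
    mean_sepal_length=("sepal length (cm)", "mean"),
    n=("sepal length (cm)", "size"),
)
print(summary)
```

Calling summary.reset_index() would turn the species index back into an ordinary column, closer to the data frame that ddply returns.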