Tuesday 13 November 2007

Using R from Python - the best of both worlds

R is a programming environment that implements just about every statistical method you're likely to need; however, it's ugly and difficult to program in. Python has only a small number of statistical methods (available through SciPy), but it's elegant and easy to program in. Both have graphing libraries that leave a lot to be desired in terms of ease of use, but the graphs from Python's matplotlib are a lot more polished. So, I'd like to use R from Python...

To do this, you will need to install the RPy library (as well as Python and R, of course). I'm working on Windows, but I've also used it on Linux. As an example, here are three ways to get at the same hclust object from Python. Each one illustrates a different aspect of the Python/R interface:
from rpy import *

# START OF METHOD 1
hclust = r("""
a <- read.table("cpOfSimMatrix.txt")
mydist <- dist(1-a)
hclust(mydist)
""")
r("rm(a)") # Ends with an error if you leave anything in memory
# END OF METHOD 1

# START OF METHOD 2
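set_default_mode(NO_CONVERSION) # Keep results as R objects, not Python ones
a = r.read_table("cpOfSimMatrix.txt") # rpy maps read_table to R's read.table
mydist = r.dist(r["-"](1,a)) # The trick for '1-a': look up R's "-" operator by name
set_default_mode(BASIC_CONVERSION) # Switch conversion back on
hclust = r.hclust(mydist) # Converts R object to Python dict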
# END OF METHOD 2

# START OF METHOD 3 (here's one I created earlier)
r.load(".RData")
hclust = r('myHclust')
# END OF METHOD 3
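
Once you have the hclust result as a Python dict, you will probably want to do something with it. Here is a rough sketch of how the merge table could be walked in plain Python. It assumes the dict has 'merge' and 'height' entries named as in R's hclust documentation, and that 'merge' arrives as rows of two numbers (depending on your setup, rpy may hand it over as an array rather than nested lists). The function merge_steps and the example data are made up for illustration; they are not part of rpy.

```python
# Hypothetical helper (not part of rpy): walk the 'merge' table of an
# hclust result that has already been converted to a Python dict.

def merge_steps(hclust, labels=None):
    """Return the members of each cluster formed, in merge order.

    R's convention for each row of 'merge': a negative entry -i is the
    single observation i (1-based); a positive entry j is the cluster
    that was formed at step j (also 1-based).
    """
    formed = []  # formed[j - 1] holds the members of the cluster from step j

    def members(entry):
        entry = int(entry)
        if entry < 0:
            obs = -entry
            return [labels[obs - 1] if labels else obs]
        return formed[entry - 1]

    for left, right in hclust["merge"]:
        # List concatenation builds a new list, so earlier steps are untouched
        formed.append(members(left) + members(right))
    return formed


# Made-up data standing in for a real result: four observations, three merges
example = {
    "merge": [[-1, -2], [-3, -4], [1, 2]],
    "height": [0.1, 0.2, 0.7],
}

for height, cluster in zip(example["height"], merge_steps(example)):
    print(height, cluster)
```

Passing a list of names as labels gives you the members by name instead of by observation number.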

For further examples, check out Peter Cock's excellent pages on combining R and Python.

2 comments:

Mandar said...

This is super helpful. I am a new user to both R and Python and definitely want to combine both. Would you mind if I hit you up with more questions as I come across them?

Noel O'Boyle said...

I'm glad you found this useful, but for any additional info, you should check out the rpy mailing list and associated docs.