For many cheminformaticians, R is the preferred way of analysing multivariate data and developing predictive models. However, it is not so widely known that there are R packages available that are directly aimed at handling chemical data.
Over the last few years, Rajarshi Guha (Indiana University) has been doing some nice work integrating the CDK and R. His publication in J. Stat. Soft., Chemical Informatics Functionality in R, describes the rcdk and rpubchem packages. The rcdk package allows the user to read SDF files directly into R, calculate fingerprints and descriptors, calculate Tanimoto values, view molecules in 2D (JChemPaint) and 3D (Jmol), calculate SMILES strings, and access the property fields of file formats. The rpubchem package is focused on downloading compounds, property values and assay data from PubChem. See also articles in R news and CDK news , , .
A more recent development is ChemmineR, described in the latest issue of Bioinformatics: "ChemmineR: a compound mining framework for R". The authors appear to be unaware of the earlier work by Rajarshi, and so there is no comparison of available features. However, based on the documentation on their website, it seems that much of the functionality revolves around a type of fingerprint called atom-pair descriptors (APD). SDF files, when read in, are converted to a database of APDs and these can be used for similarity searching, clustering, removal of duplicates and so on. Sets of molecules can be visualised using a web connection to the ChemMine portal (I'm not sure what software is used). According to the documentation, future work will include descriptor calculation with JOELib.
So, there you have it. An exhaustive survey of the two available methods for bringing chemistry into R. Is the time ripe for a cheminformatics equivalent to Bioconductor?