Thursday 18 February 2010

Chemistry in R - Upcoming workshop

Some time ago, I described the available libraries for carrying out cheminformatics in R. Much of this work has been done by Rajarshi Guha, who has developed the R packages RCDK, RPubChem and fingerprint.

Now's your chance for a masterclass in these packages: Rajarshi is giving an intensive workshop on their use in May at the EBI. Details are available on the workshop's webpage, but all you need to know is that it costs a whopping £25 per day (the second day is on metabolomics and R) and that you might want to book early.

I already have. See you there!

Here's the timetable for the first day:
  • 09:15 Brief Introduction to R
    A very brief overview of R from a programming and application point of view. Will look at some basic programming constructs and then survey packages that are useful for cheminformatics problems. This session will also briefly touch on RDBMS access from R.
  • 10:45 Brief Introduction to the CDK
    An overview of CDK functionality. This will be relatively high level and will not go into the nitty gritty details of Java programming. Will serve to highlight what can (and cannot) be done from R.
  • 11:30 Input/Output & Molecular Manipulations
    Reading and writing chemical structures from various sources and in various formats. What does I/O entail in the CDK programming model? How does it affect working in R? Once we have a set of molecules, what can we do with them? We’ll cover accessing atoms and bonds, setting and getting properties on molecules and so on.
  • 13:30 Fingerprints & Similarity Searching
    I’ll discuss accessing the various fingerprint methods of the CDK and manipulating fingerprints using the fingerprint R package. I’ll also address reading fingerprint data from files generated by other programs.
  • 14:00 Descriptors and QSAR Modeling
    QSAR modeling is a common cheminformatics task. Key to developing QSAR models is the evaluation of molecular descriptors. In this session, I’ll cover the available types of descriptors and how one evaluates them. We’ll then run through examples of developing QSAR models, starting from molecule loading and ending at a final model.
  • 15:45 R, CDK and Chemical Databases
    Getting chemical structure and bioassay information from PubChem using R. This section will overview the functions that let one retrieve structures, assay information and assay data. I will also highlight current limitations in terms of data size and ways around these limits.
  • 16:45 Adding New Functionality
    This session will highlight how one might go about extending the package – either by wrapping calls to the CDK in R or by adding your own Java methods and then calling them.
  • 17:15 Additional Practical and Questions
