Friday 8 February 2008

Searching PubMed with Python

[Update 30Nov09] The Bio.EUtils module has been replaced by Bio.Entrez, so the code and links below no longer work. However, see the comments at the end of this post, where readers have posted updated code.

Given a DOI or PMID, how can you find metadata for a publication using Python? The EUtils module of BioPython, by Andrew Dalke, is your friend.

For a DOI, you need to do a search:
from Bio import EUtils
from Bio.EUtils import DBIdsClient

doi = "10.1016/j.jmb.2007.02.065"

client = DBIdsClient.DBIdsClient()
result = client.search(doi + "[aid]", retmax = 1)
summary = result[0].summary()

For a PMID, there's a more direct method:
from Bio import EUtils
from Bio.EUtils import DBIdsClient

PMID = "17238260"
result = DBIdsClient.from_dbids(EUtils.DBIds("pubmed", PMID))
summary = result[0].summary()
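
With Bio.EUtils gone, here is a rough sketch of how the two lookups above might look using Bio.Entrez instead. It assumes a recent Biopython install and network access. The function names (doi_query, pmid_for_doi, summary_for_pmid) and the email argument are made up for illustration and are not part of Biopython, and using esummary as a stand-in for the old summary() call is an assumption rather than a confirmed one-to-one replacement.

```python
def doi_query(doi):
    """Build the PubMed search term used above: the DOI limited to the [aid] field."""
    return doi + "[aid]"


def pmid_for_doi(doi, email):
    """Return the first PMID matching a DOI, or None if nothing is found.

    Hypothetical helper: needs Biopython and network access.
    """
    from Bio import Entrez  # imported here so the module loads without Biopython

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="pubmed", term=doi_query(doi), retmax=1)
    result = Entrez.read(handle)  # parse the XML reply into dict-like objects
    handle.close()
    ids = result["IdList"]
    return ids[0] if ids else None


def summary_for_pmid(pmid, email):
    """Return the document summary for a PMID (assumed to be dict-like).

    Hypothetical helper: needs Biopython and network access.
    """
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esummary(db="pubmed", id=pmid)
    records = Entrez.read(handle)  # one entry per requested ID
    handle.close()
    return records[0]
```

Something like summary_for_pmid(pmid_for_doi(doi, "you@example.com"), "you@example.com") should then give a record for the paper, provided the search finds a match.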

So, what can you do with the summary? Something like the following maybe:
>>> data = summary.dataitems
>>> print data.keys()
['DOI', 'Title', 'Source', 'Volume', ...., ]
>>> print "%s. %s %s %s, %s, %s." % (
... ", ".join(data['AuthorList'].allvalues()),
... data['Title'], data['Source'], data['PubDate'].year,
... data['Volume'], data['Pages'])
...
O'Boyle NM, Holliday GL, Almonacid DE, Mitchell JB. Using
reaction mechanism to measure enzyme similarity. J Mol Biol
2007, 368, 1484-99.
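
The same formatting can be wrapped in a small function that works on a plain dictionary. This is only a sketch: format_citation is a made-up name, and it assumes the record has the keys used above, that 'AuthorList' is a list of strings, and that 'PubDate' is a string starting with the year. The example record is typed in by hand from the output shown above, not fetched from PubMed.

```python
def format_citation(summary):
    """Format a dict-like summary as 'Authors. Title Source Year, Volume, Pages.'"""
    authors = ", ".join(summary["AuthorList"])
    # Assumption: PubDate is a string whose first word is the year
    year = summary["PubDate"].split()[0]
    return "%s. %s %s %s, %s, %s." % (
        authors,
        summary["Title"],
        summary["Source"],
        year,
        summary["Volume"],
        summary["Pages"],
    )


# Hand-typed example record, using the values from the output above
example = {
    "AuthorList": ["O'Boyle NM", "Holliday GL", "Almonacid DE", "Mitchell JB"],
    "Title": "Using reaction mechanism to measure enzyme similarity.",
    "Source": "J Mol Biol",
    "PubDate": "2007",
    "Volume": "368",
    "Pages": "1484-99",
}

print(format_citation(example))
```

Keeping the formatting separate from the lookup means it can be tried out on a hand-made dictionary without touching the network.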

For more info:

7 comments:

  1. Thankyouthankyouthankyou.

    This is just what I needed, so score one victory for Noel O'Blog and Google-Fu!

  2. Update: just spent an hour or so figuring out that EUtils is gone, correct module is Bio.Entrez.

  3. the link, http://biopython.org/DIST/docs/api/public/Bio.EUtils-module.html is 404.

  4. @Bennest, Skylar: Yes - the blog post is now out of date. If you figure out the new code, I will be happy to update the post.

  5. Download BioPython from
    http://biopython.org/wiki/Download

    Then use the following code:

    from Bio import Entrez

    Entrez.email='foo@bar.com'
    database='pubmed'
    myPMID='12345678'

    handle = Entrez.efetch(db=database, id=myPMID, retmode="text", rettype="medline", tool="BioPython, Bio.Entrez")
    record = handle.read()

    Note that you can use other retmodes, for instance HTML or XML.

    Hope this helps, feel free to modify and repost.

    Ben

  6. Thanks Bennest. I will publish an updated post in the next few weeks and add a link to it on this blog post.

  7. Just adding a few lines to Bennest's post.

    from Bio import Entrez

    Entrez.email='foo@bar.com'
    database='pubmed'
    myPMID='12345678'

    # Search first: esearch returns XML, which Entrez.read parses into a
    # dict-like record with an "IdList" key (handle.read() only gives a string)
    handle = Entrez.esearch(db=database, term=myPMID, tool="BioPython, Bio.Entrez")
    record = Entrez.read(handle)

    firstPMID = record["IdList"][0]
    handle = Entrez.efetch(db="pubmed", id=firstPMID)
    print(handle.read())
