Wednesday, 5 September 2007

Scooby, D. D., where are you? - Searching for papers online

One of my pet annoyances is searching on a journal website for an article that I know exists, but not being able to find it. Let's take a real-life example...

I met Douglas Hawkins at the ACS and wanted to look up his papers in JCIM/JCICS. I've bookmarked the JCIM TOC page, so I go there. At the top is a handy little shortcut for finding papers by a particular author. So I type in "Hawkins" next to the Author drop-down box, and click "Search"...

10 documents. Of which half are Douglas Hawkins (the right guy), and the other half are Donald Hawkins (the wrong guy). In addition, both JCIM and JCICS were searched. Great. This is where I should have stopped. Instead I decided I only wanted to find those papers by Douglas Hawkins. The following are my attempts to find his papers:

Hawkins, D. M.: 7 articles. All false positives.
Hawkins D M: ditto
"Hawkins, D. M.": 0 documents
D M Hawkins: It's my favourite 7 articles again.
D Hawkins: 24 articles. No articles by D. M. Hawkins included.

But how can I have found 24 articles for the last search? That's more than I found with just "Hawkins". Wait a second, I've been moved onto the "Advanced Article Search" page, which is searching all of the journals. So what did it find instead? It found "...Mass, J. D.; Hawkins, A. R.;". Pretty advanced searching, eh?

Thankfully, I happen to know that unlike most chemistry journals, the ACS has contributed data to PubMed. So off I go, and try:

Hawkins, D. M.: first hit contains "M. M. Hawkins"
DM Hawkins: first hit contains "Hawkins EC"
Hawkins DM: jackpot!

Now I just want those papers where Basak is a co-author:
Hawkins DM Basak: jackpot!

Incidentally, this is the first time that Pubmed has ever worked for me. This is because I always try "DM Hawkins Basak" which gives no hits (even after reading the instructions I was never able to find anything). As for how to limit the results to a particular journal? Why isn't there a drop-down box with a list of journals? Why do I have to read the instructions for the obscure syntax used by PubMed every time I want to find a paper?
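For what it's worth, PubMed does have a way to restrict by author and by journal: field tags appended to each term. The sketch below only builds the query string and an NCBI E-utilities `esearch` URL; it does not fetch anything. It assumes PubMed's `[au]` (author) and `[ta]` (journal title abbreviation) tags, and the helper names `pubmed_term` and `esearch_url` are hypothetical, made up for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here as the target for the query).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_term(authors, journals=()):
    """Hypothetical helper: AND together author terms tagged [au],
    plus an optional OR-group of journal abbreviations tagged [ta]."""
    clauses = [f"{name}[au]" for name in authors]
    if journals:
        # Any of the listed journals is acceptable, e.g. JCIM or its predecessor.
        clauses.append("(" + " OR ".join(f'"{j}"[ta]' for j in journals) + ")")
    return " AND ".join(clauses)


def esearch_url(term):
    """Hypothetical helper: URL-encode the term into an esearch request on PubMed."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term})


if __name__ == "__main__":
    term = pubmed_term(["Hawkins DM", "Basak"],
                       ["J Chem Inf Model", "J Chem Inf Comput Sci"])
    print(term)
    print(esearch_url(term))
```

The term it prints is `Hawkins DM[au] AND Basak[au] AND ("J Chem Inf Model"[ta] OR "J Chem Inf Comput Sci"[ta])`, which can equally be pasted into the PubMed search box by hand.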

What about Google Scholar? "DM Hawkins" or "D. M. Hawkins" gives 1700 hits, of which at least the first page are true positives. Advanced Search allows me to specify the journal, and I get 1 hit with "J Chem Inf Comput Sci", a different hit for "J Chem Inf Comp Sci" (of which there are 5 versions, one of which has "Comput Sci" instead of "Comp Sci") and 1 for "J Chem Inf Model". In fact, Hawkins has 3 papers in JCICS and 2 in JCIM. You look good Google, but you're not trying very hard.

Searching by author should be easy. It's not quantum mechanics. It's not even rocket science. Can't they make it work better?

6 comments:

Mike said...

Hi. I have just read your post and thought I'd have a go at searching for those articles you mention. It also annoys me that article searching is such an art rather than the easy, understandable task it should be. I always seem to end up spending more time than I should have searching for papers that I know exist!

So here's my attempt (for what it is worth). If you search for Hawkins from the JCIM TOC page it returns 10 results (Hawkins DM and Hawkins DT). Then I searched for Basak using the 'Search within results' box and got the 5 articles you refer to.

It is strange that it searches both JCIM and JCompInfSci at the same time. When you search all of the ACS for Hawkins you get 311 results then searching for Basak gives you 7, two are for Hawkins GD and irrelevant.

Next, Google. I do think Google Scholar is good, but it is more like a shotgun than a scalpel. If you use site:pubs.acs.org basak hawkins you get 6 results, 3 of which are from JCIM and JCompInfSci; again it is strange that the search misses 2 of the articles. Just a quick side note: if you do a normal web search for basak hawkins dm your blog post is the first result, so you can't complain that Google isn't up to date.

Using the Web of Knowledge finds 5 articles for DM Hawkins in JCIM and JCompInfSci and 4 with Basak. I never use PubMed; I can never seem to work it out either. Still some way to go before Google becomes my default scholarly search engine.

PS Like the blog.

Rich Apodaca said...

I've had exactly the same experience with the ACS site - even for subjects. I thought it was just me. My fall-backs are SciFinder (when I had access to it) or even Google. Good to know PubChem works (and how it works).

Rich Apodaca said...

Make that "PubMed", not "PubChem".

baoilleach said...

@mike: Thanks for that. So it seems that on the ACS website, if you know two surnames you're fine. The problem is finding papers by a single person whose initials you know, which was the actual problem (I just digressed a bit in my blog post to bulk it up a bit :-) ).

About two years ago, JCICS and JCID joined to form JCIM. So it's actually quite a nice feature that it searches the historical ancestor of JCIM.

It's interesting that you use Google instead of Google Scholar. "site:" is a really useful feature of Google that I keep forgetting. For example, to search the CCL mailing list, you can use "site:ccl.net".

Regarding quick indexing by Google, it's not so surprising given that Google owns Blogger. It's strange that it comes top of the list though; I wonder whether it prioritises Blogger posts in search results. I'm not sure, though, whether being top of the list for "basak hawkins dm" is equivalent to the Slashdot effect.

@rich: Incidentally, I also checked out HubMed ("Pubmed reloaded") but it's the same old Pubmed syntax. It might be nice to have a BOMed, where some web-savvy chemist has created a Web of Knowledge-type interface to Pubmed. Hint, hint :-)

baoilleach said...

@Rich: Or even RubMed?

Egon Willighagen said...

Noel, on PubMed you can also do "Hawkins[au] Basak[au]", but given that the names are quite unique it would not improve the results here. Just a general tip.