Comments on Noel O'Blog: Reduce, recycle and reuse bond closure symbols?
Feed author: Noel O'Boyle (http://www.blogger.com/profile/03288289351940689018)
Feed updated: 2024-01-31T09:23:26.925+00:00. 7 comments, newest first.

Comment posted 2012-04-05T18:46:24.379+01:00:
I usually don't reuse when manually entering a SMILES notation. But my current approach in generating unique SMILES and CurlySMILES notations applies reuse - without reusing the same ring digit on the same ring atom, but in such a way that each polycyclic subsystem of a molecule starts with ring digit 1.
Posted by Axel D. (https://www.blogger.com/profile/06438831836545774008)

Comment posted 2012-04-05T09:54:06.978+01:00:
@Andrew, @Orion: Regarding same atom reuse, I see that OB avoids this (whether by accident or design I don't know) by listing the ring openings first, and then the ring closures. Seems counterintuitive, but then again, it does avoid this problem.

@Orion: Interesting idea and I think I've heard you mention it before. I wonder would it be implementable.
Posted by Noel O'Boyle (https://www.blogger.com/profile/03288289351940689018)

Comment posted 2012-04-04T02:39:12.183+01:00:
I believe that SMILES should be a good balance of human-readable and machine-readable. If we were only interested in the latter, probably this wouldn't even be worth discussing (and there are many other choices, besides). As such, I agree with Andrew that reuse is a good thing (and certainly that same atom reuse is evil!).
Still wish it was scoped within parens, though. Would make composing substructures much more predictable. (i.e. C1C(C1CC1)CC1 and C1C(C2CC2)CC1 should mean the same thing, but they don't).
Posted by Orion (https://www.blogger.com/profile/00412216321696123699)

Comment posted 2012-04-03T13:46:04.013+01:00:
@Andrew: Feel free to heapify the implementation in OB :-) (from Line 2503: http://openbabel.svn.sourceforge.net/viewvc/openbabel/openbabel/trunk/src/formats/smilesformat.cpp?revision=4736&view=markup)

Nice example with the two digits on one atom...[five minutes later]...actually that's a really nice corner case. OB seems to do as you suggest and open up a new ring digit. I need to look into exactly how it does that...
Posted by Noel O'Boyle (https://www.blogger.com/profile/03288289351940689018)

Comment posted 2012-03-31T19:35:28.311+01:00:
I'm a reuse person, but I think it's partially because I get to use a heap to maintain the list of next-available-item, and show off my mad data structure skillz.

The fastest, if there is a small number of rings, is of course to not reuse. The complexity comes when you need to reuse. The heuristics for those (rare) cases become more complicated than maintaining the heap in the first place.

There's another (minor) con with reuse: I think reusing the same ring digit on the same atom is confusing, as in C1CCCC11CCCC1.
Posted by Andrew Dalke (https://www.blogger.com/profile/17091314849699854287)

Comment posted 2012-03-30T15:38:38.040+01:00:
Great comment.
The key question is whether adopting a split strategy (i.e. different behaviour in different circumstances) is a good idea. I would feel that it is better to adopt a single strategy in all cases as it makes it easier for the user to deduce the behaviour from a small number of test examples; otherwise they'd have to read the manual...and no-one does that!
Posted by Noel O'Boyle (https://www.blogger.com/profile/03288289351940689018)

Comment posted 2012-03-29T17:57:29.830+01:00:
My opinions:
For molecules with fewer than 9 bond closures: never reuse.

For molecules with more than 9 bond closures, but where no more than 9 closures are interweaved/intermixed in any one group: reuse, but with the changeover happening between groups (e.g. you might get 1-5 and then 1-6). Bonus points if there's a large linear linker under the changeover.

Cycle-of-cycle molecules where everything is intermixed due only to a few large macrocycles: use the low numbers for the macrocycle, without reuse, and then reuse the remaining ones for the disconnected subgroups, as above.

Large hairy molecules where everything is intermixed and interconnected without an easy way of separating them into sub-groups: do whatever works. You're unlikely to understand the structure without some SMILES -> Figure program anyway, so does it really matter?
Posted by RM
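
Editor's sketch: the thread mentions keeping a heap of next-available ring digits (Andrew Dalke) and avoiding same-atom reuse by writing ring openings before ring closures (Noel O'Boyle's observation about OB). The Python below is only a minimal illustration of those two ideas under stated assumptions. It is not Open Babel's code, and the names `RingDigitPool`, `ring_labels` and `carbon_chain`, as well as the ring ids "a" and "b", are hypothetical.

```python
import heapq


class RingDigitPool:
    """Hands out the lowest free ring-closure number, reusing released ones."""

    def __init__(self):
        self._free = []   # min-heap of numbers that have been released
        self._next = 1    # smallest number that was never handed out

    def acquire(self):
        if self._free:
            return heapq.heappop(self._free)
        n = self._next
        self._next += 1
        return n

    def release(self, n):
        heapq.heappush(self._free, n)


def label(n):
    # SMILES writes ring-closure numbers 10..99 with a '%' prefix.
    return str(n) if n < 10 else "%" + str(n)


def ring_labels(atoms, opens_first):
    """atoms: list of (closes, opens) tuples of hypothetical ring ids,
    given in output order. Returns one ring-label string per atom."""
    pool = RingDigitPool()
    assigned = {}   # ring id -> number currently in use
    out = []

    def do_opens(opens, parts):
        for ring in opens:
            assigned[ring] = pool.acquire()
            parts.append(label(assigned[ring]))

    def do_closes(closes, parts):
        for ring in closes:
            n = assigned.pop(ring)
            parts.append(label(n))
            pool.release(n)

    for closes, opens in atoms:
        parts = []
        if opens_first:
            do_opens(opens, parts)
            do_closes(closes, parts)
        else:
            do_closes(closes, parts)
            do_opens(opens, parts)
        out.append("".join(parts))
    return out


def carbon_chain(labels):
    # Toy writer: every atom is an aliphatic carbon, no branches.
    return "".join("C" + s for s in labels)


# Spiro compound from the thread: ring "a" closes and ring "b" opens
# on the same (fifth) atom.
plain = ([], [])
spiro = [([], ["a"]), plain, plain, plain, (["a"], ["b"]),
         plain, plain, plain, (["b"], [])]

print(carbon_chain(ring_labels(spiro, opens_first=False)))  # C1CCCC11CCCC1
print(carbon_chain(ring_labels(spiro, opens_first=True)))   # C1CCCC21CCCC2
```

With closures processed first, digit 1 is released and immediately taken again on the same atom, which gives the "11" that Andrew finds confusing. Processing openings first forces a fresh digit at that atom, while digits are still reused on later atoms.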