Comments on Noel O'Blog: Pybel - Just how unique are your molecules? Part II
Blog author: Noel O'Boyle
Feed updated: 2024-01-31T09:23:26.925+00:00

Comment posted 2007-08-10T20:02:00.000+01:00:

The first pair of SMILES are different enantiomers. They are also incorrect in that they have a 4-valent uncharged nitrogen. If you remove the H, CONCORD will generate different 3D conformations.

The second pair, as written, are identical since the two are superimposable on each other. The chiral markings serve to differentiate them from the trans case, where one hydroxyl is above the ring and the other below. That the canonicalizer doesn't generate identical SMILES is a bug.

Also, I think that there are two other centers that are not marked.

Posted by: Tahuti

Comment posted 2007-07-17T15:23:00.000+01:00:

Thanks for the feedback, GMC. For the record, I will be following up these issues either with the OpenBabel development team, or with John Irwin, over at ZINC. I will post an update if and when.

Posted by: Noel O'Boyle

Comment posted 2007-07-15T20:35:00.000+01:00:

SMILES certainly need to be canonicalised if you're going to be using them for duplicate recognition. However, the SMILES that you appear to have generated with 'smi' (as opposed to 'can') have lost stereochemical information.
You might want to check that 'smi' isn't just writing a canonical SMILES without stereochemical information. Does 'smi' reproduce the order of the input SMILES, and does it output stereochemical information?

I’ve taken a look at your SMILES. I believe that ZINC03883383 and ZINC03883386 are distinct structures because the chiral center in the substituent renders the two faces of the morpholine ring non-equivalent. One worrying feature of these two SMILES is that the 4-connected nitrogen does not have a positive charge, so the SMILES encode free radicals.

The other two SMILES present more of a mystery since they appear to encode the same structure. Only two of the stereocenters have defined configuration, and the two that do not are C1 and C4 of a cyclohexane ring. Encoding cis/trans relationships in 1,4-disubstituted cyclohexanes with SMILES did pose problems in the past (maybe still) because the plane of symmetry is incompatible with chirality and some software ‘knows’ that the relevant carbon atoms can’t be chiral centers. It could be that the canonicalisation has somehow lost some information about stereochemical relationships. There are a couple of things that you could check. First, if you started with the structures in another format, you may be able to see which stereocenters have defined configuration and therefore whether any of this has been lost in SMILES generation and/or canonicalisation. Secondly, you could edit in a specific configuration for each (and both) of the stereocenters with undefined configuration and check that this information is not lost in the canonicalisation process.

Posted by: Georg-Martin Krapper

Comment posted 2007-07-13T07:57:00.000+01:00:

I think my interpretation is still correct.
The non-canonical SMILES are definitely not canonicalised to begin with, so there is certainly the potential for them to give rise to different SMILES.

Secondly, the two molecules with different isomeric SMILES (thanks for the definition) have identical 3D structures, which surely is inconsistent with different isomeric SMILES.

Posted by: Noel O'Boyle

Comment posted 2007-07-12T23:37:00.000+01:00:

Possibly a little confusion here. The @ and @@ specify configuration, and when these are present you're dealing with what sometimes gets called an isomeric SMILES. Double bond geometry and isotopes can also be specified in isomeric SMILES. The SMILES that you're calling non-canonical does not have the stereochemistry specified. However, it may still be canonical. It's just not an isomeric SMILES.

Posted by: Georg-Martin Krapper
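
To make the isomeric versus non-isomeric distinction in the comment above concrete, here is a minimal sketch that reports which isomeric features appear textually in a SMILES string. It is a plain string heuristic using only the Python standard library, not a SMILES parser and not part of Pybel or OpenBabel. The names `isomeric_features` and `is_isomeric` are hypothetical, chosen for illustration, and the example strings are generic molecules rather than the ZINC entries discussed in the thread.

```python
import re

# Matches the contents of bracket atoms such as [C@@H], [13CH4] or [N+].
# Hypothetical helper pattern; a real toolkit parses SMILES properly.
BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")


def isomeric_features(smiles):
    """Return the set of isomeric features written in a SMILES string.

    Purely textual check (an assumption of this sketch): it looks for
    '@' and a leading isotope number inside bracket atoms, and for
    '/' or '\\' bond symbols outside them.
    """
    features = set()
    for body in BRACKET_ATOM.findall(smiles):
        if "@" in body:
            features.add("tetrahedral")   # @ or @@ configuration
        if body[:1].isdigit():
            features.add("isotope")       # e.g. [13CH4]
    # Directional bonds are written outside bracket atoms.
    outside = BRACKET_ATOM.sub("", smiles)
    if "/" in outside or "\\" in outside:
        features.add("double-bond")       # e.g. F/C=C/F
    return features


def is_isomeric(smiles):
    """True if the string carries any isomeric markup at all."""
    return bool(isomeric_features(smiles))


if __name__ == "__main__":
    for smi in ["C[C@H](N)C(=O)O", "CC(N)C(=O)O", "F/C=C/F", "[13CH4]"]:
        print(smi, sorted(isomeric_features(smi)))
```

A string for which `is_isomeric` returns `False` may still be canonical; the check says nothing about atom ordering, only about whether stereochemistry or isotopes are written down. Running it on the output of each writer ('smi' and 'can') would be one quick way to see whether stereochemical markup survives.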