The code is shown below, but first, the results for ChEMBL 23 (you can copy the whole lot and paste into John Mayfield's CDKdepict to view). There are a couple of Texas carbons, oxygen radicals, and TEMPO-like nitrogens that should be neutral. That's not to say that everything found is dodgy - nitrogen monoxide is rumoured to be stable, for example - but they are definitely unusual and warrant further inspection.
F[Al-3](F)(F)(F)(F)F 181624 C[C@H]([C@@H](CO)NC(=O)[C@@H]1CSSC[C@@H](C(=O)N[C@H](C(=O)N[C@@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N1)C(C)O)CCCCN)CC2=CN=C3=CC=CC=C32)Cc4ccccc4)NC(=O)[C@@H](Cc5ccccc5)N)O 3349005 CC[C@@]1(C[C@@H]2C[C@@]([C]3C(=c4ccccc4=N3)CCN(C2)C1)(c5cc6c(cc5OC)N([C@@H]7[C@]68CCN9[C@H]8[C@@](C=CC9)([C@H]([C@@]7(C(=O)N)O)O)CC)C)C(=O)OC)O.OS(=O)(=O)O 3349007 CC(C)(C)OC(=O)NCCC(=O)N[C@@H](CC1=CN=C2=CC=CC=C21)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](Cc3ccccc3)C(=O)N 3348969 CCCCC1(N(C(CO1)(C)C)[O-])CCCCCCCCCCCCCOP(=O)([O])OC2CC[N+](CC2)(C)C 606677 CCCCCCCCCCCC1(N(C(CO1)(C)C)[O-])CCCCCCOP(=O)([O])OC2CC[N+](CC2)(C)C 606678 CCCCCCC1(N(C(CO1)(C)C)[O-])CCCCCCCCCCCOP(=O)([O])OC2CC[N+](CC2)(C)C 606679 CCCCCCCCCCCCCC1(N(C(CO1)(C)C)[O-])CCCCOP(=O)([O])OC2CC[N+](CC2)(C)C 606742 CCCCCCCCC1(N(C(CO1)(C)C)[O-])CCCCCCCCCOP(=O)([O])OC2CC[N+](CC2)(C)C 606743 CC(C)(C)c1cc(cc(c1[O])C(C)(C)C)CCNc2c3c(ncn2)n(cn3)[C@H]4[C@@H]([C@@H]([C@H](O4)CO)O)O 1098520 CC(C)(C)c1cc(cc(c1[O])C(C)(C)C)CCNc2c3c(ncn2)n(cn3)[C@H]4[C@@H]([C@@H]([C@H](O4)C(=O)NC)O)O 1098715 CC(C)(C)c1cc(cc(c1[O])C(C)(C)C)C(=O)NCCc2ccc(cc2)Nc3c4c(ncn3)n(cn4)[C@H]5[C@@H]([C@@H]([C@H](O5)C(=O)NC)O)O 1094994 [N]=O 1200689 CCCCCCC1(N(C(CO1)(C)C)[O-])CCCCCCCCCCCOP(=O)([O])OCC[N+](C)(C)C 606744 c1cc(ccc1C(=N)N)OCCCCCOc2ccc(cc2)C(=N)N.C(CO[S](=O)=O)O.C(CO[S](=O)=O)O 1405011 OP1(=O)OP(=O)(OP(=O)(O1)O)O.[Al] 1433450 C1[C@@H]2[C@@H]([C@H](O1)[C@H]([CH]O2)O)C[C@H]3[C@@H]([C@H]([C@H]([C@H](O3)CO)OS(=O)(=O)O)O[C@@H]4[C@@H]([C@@H]5[C@H]([C@H](O4)CO5)O[C@H]6[C@@H]([C@H]([C@H]([C@H](O6)CO)O)[O])O)OS(=O)(=O)O)O 1525673 c1ccc(cc1)N(c2ccccc2)[N]c3c(cc(cc3[N+](=O)[O-])[N+](=O)[O-])[N+](=O)[O-] 1668332 [B].c1ccc(cc1)[Si](c2ccccc2)(c3ccccc3)OCCN4CC4 1985326 [B].CS(=O)(=O)O[C@H]1CN2C[C@H]([C@H]([C@H]2[C@H]1CO)O)O 1974383 [B]C(=O)NC(CC(C)C)C(=O)NC1c2ccsc2C(=O)C1O.CN(C)C 1974647 C(C(C(=O)O)S)(C(=O)O)S.[As] 1991929 [B]C(=O)NC(CC(C)C)C(=O)NCC(=O)NC1c2ccsc2C(=O)C1O.CN(C)C 1986958 [B].CN(C)C/C(=C(\c1ccc(c(c1)OC)OC)/Cl)/c2ccc(c(c2)OC)OC 
2005771 CC[n+]1c2ccccc2sc1/C=C(/C)\C#C/C=C\3/N(c4ccccc4S3)CC.[F-][P+5]([F-])([F-])([F-])([F-])[F-] 1992520 CN(C)c1ccc(c(c1)[O-])N=O.CN(C)c1ccc(c(c1)[O-])N=O.[Si+4].Cl.[Cl-].[Cl-] 2146183 CC#N.C1CC[C@@H]([C@@H](C1)O)[O-].C1CC[C@@H]([C@@H](C1)O)[O-].[Cl-].[Cl-].[Te+4] 2146182 [H+].C1C[O-][Te+4][O-]1.N.[Cl-].[Cl-].[Cl-] 2146259 CCCCC1C[O-][Te+4][O-]1.N.[Cl-].[Cl-].[Cl-] 2146289 CCCCCCC1C[O-][Te+4][O-]1.N.[Cl-].[Cl-].[Cl-] 2146290 [C] 2106049 [S] 2105487 [F-].[F-].[F-].[F-].[F-].[F-].[Si+4] 2310952 CC[C@H](C)[C@@H]([C@@H](CC(=O)N1CCC[C@H]1[C@@H]([C@@H](C)C(=O)N[C@H](C)[C@H](c2ccccc2)O)OC)OC)N(C)C(=O)[C@H](C(C)C)NC(=O)[C@H](C(C)C)N(C)C(=O)OCc3ccc(cc3)NC(=O)[C@H](CCCNC(=O)N)NC(=O)[C@H](C(C)C)NC(=O)CCCCCN4C(=O)C[CH]C4=O 2364667 c1ccc(cc1)C(=O)CC(=O)c2ccccc2.c1ccc(cc1)C(=O)CC(=O)c2ccccc2.c1ccc(cc1)C(=O)CC(=O)c2ccccc2.[Si+4].Cl.Cl 2374292 c1ccc(c(=O)cc1)[O-].c1ccc(c(=O)cc1)[O-].c1ccc(c(=O)cc1)[O-].[Si+4].[Cl-] 2374293 CC(C)(C)C(=O)CC(=O)C(C)(C)C.CC(C)(C)C(=O)CC(=O)C(C)(C)C.[Si+4].Cl.Cl 2374299 c1cc2cccnc2c(c1)[O-].c1cc2cccnc2c(c1)[O-].c1cc2cccnc2c(c1)[O-].c1cc2cccnc2c(c1)[O-].[Si+4].Cl 2374305 CN1C=NNC1(=S)c2c(c3c(c(nnc3s2)c4ccccc4)c5ccccc5)O 2299271 [Al+3].[P-3] 2272784 Cc1ccc(c2c1oc-3c(c(=O)c(c(c3n2)C(=O)N[C@H]4[C@H](OC(=O)[C@@H](N(C(=O)CN(C(=O)[C@@H]5CCCN5C(=O)[C@H](NC4=O)C(C)C)C)C)C(C)C)C)NCCNC6CC([N+](C(C6)(C)C)[O-])(C)C)C)C(=O)N[C@H]7[C@H](OC(=O)[C@@H](N(C(=O)CN(C(=O)[C@@H]8CCCN8C(=O)[C@H](NC7=O)C(C)C)C)C)C(C)C)C 3228736 Cc1ccc(c2c1oc-3c(c(=O)c(c(c3n2)C(=O)N[C@H]4[C@H](OC(=O)[C@@H](N(C(=O)CN(C(=O)[C@@H]5CCCN5C(=O)[C@H](NC4=O)C(C)C)C)C)C(C)C)C)NC6CC([N+](C(C6)(C)C)[O-])(C)C)C)C(=O)N[C@H]7[C@H](OC(=O)[C@@H](N(C(=O)CN(C(=O)[C@@H]8CCCN8C(=O)[C@H](NC7=O)C(C)C)C)C)C(C)C)C 3228735 Cc1ccc(c2c1oc-3c(c(=O)c(c(c3n2)C(=O)N[C@H]4[C@H](OC(=O)[C@@H](N(C(=O)CN(C(=O)[C@@H]5CCCN5C(=O)[C@H](NC4=O)C(C)C)C)C)C(C)C)C)NCCCNC6CC([N+](C(C6)(C)C)[O-])(C)C)C)C(=O)N[C@H]7[C@H](OC(=O)[C@@H](N(C(=O)CN(C(=O)[C@@H]8CCCN8C(=O)[C@H](NC7=O)C(C)C)C)C)C(C)C)C 3228737 [NH4+].[NH4+].F[Si-2](F)(F)(F)(F)F 
3182693 Cc1cn(c(=O)[nH]c1=O)C2C(C(C(O2)C(COC(=O)C)NC(=O)C(CC3=C4=CC=CC=C4N=C3)NC(=O)OC(C)(C)C)OC(=O)C)OC(=O)C 3187332 C1CCN2CC3CC(C2C1)CN4C3CCCC4=O.O[Cl+3]([O-])([O-])[O-] 3186825 F[Si-2](F)(F)(F)(F)F.[Na+].[Na+] 3184182 C[Si](C)O[Si](C)(C)O[Si](C)(C)O[Si](C)C 3184420 CN(C)[N](=NOc1cc(c(cc1[N+](=O)[O-])[N+](=O)[O-])ON=[N+](N2CCN(CC2)C(=O)c3cc(ccc3F)Cc4c5ccccc5c(=O)[nH]n4)[O-])O 3188868 CNc1ccc(cc1)C(=O)Oc2cc(c(cc2C#N)[N+](=O)[O-])ON=[N](N(C)C)O 3187972 C[Si](O[Si](C)(C)C)O[Si](C)(C)C 3189129 c1cc(cc(c1)NC(=O)C(=O)O)C2=NN=N[N]2 3188982 CC(C)N1c2c(c(ncn2)N)C(=C3C=c4cc(ccc4=N3)O)[N]1 3209955 c1ccn2c(c1)c(c(=O)n(c2=O)CCCCN3CCC(CC3)c4c[nH]c5c4F=C(C=C5)F)c6ccc(cc6)F 3397072 CC[C@@]1(C[C@@H]2C[C@@]([C]3C(=c4ccccc4=N3)CCN(C2)C1)(c5cc6c(cc5OC)N([C@@H]7[C@]68CCN9[C@H]8[C@@](C=CC9)([C@H]([C@@]7(C(=O)N)O)O)CC)C)C(=O)OC)O 3545878 c1ccc2c(c1)C(=O)O[Mg]3(O2)Oc4ccccc4C(=O)O3.O.O.O.O 3561635 [H]1C[C@@H](C[C@@H]2[C@]1([C@H]3C[C@H]([C@@]4([C@H](CC[C@@]4([C@@H]3CC2)O)C\5=CC(=O)O/C5=C\c6ccc(cc6)N(C)C)C)O)C)O[C@H]7C[C@@H]([C@@H]([C@H](O7)C)O[C@H]8C[C@@H]([C@@H]([C@H](O8)C)O[C@H]9C[C@@H]([C@@H]([C@H](O9)C)O)O)O)O 3594279 [CH]=CCc1c[nH]c2c1cccc2 3623239 [CH2]c1c[nH]c2c1cccc2 3623240 [CH]=CCc1c[nH]c2c1cc(cc2)F 3623241 c1ccc2c(c1)C(=O)O[Mg]3(O2)Oc4ccccc4C(=O)O3 3580437 C1C[O-][Te+4][O-]1 3558859 CCCCC1C[O-][Te+4][O-]1 3558860 CCCCCCC1C[O-][Te+4][O-]1 3558861 c1ccc(cc1)C(=O)CC(=O)c2ccccc2.c1ccc(cc1)C(=O)CC(=O)c2ccccc2.c1ccc(cc1)C(=O)CC(=O)c2ccccc2.[Si+4] 3559378 CC(C)(C)C(=O)CC(=O)C(C)(C)C.CC(C)(C)C(=O)CC(=O)C(C)(C)C.[Si+4] 3559379 c1cc2cccnc2c(c1)[O-].c1cc2cccnc2c(c1)[O-].c1cc2cccnc2c(c1)[O-].c1cc2cccnc2c(c1)[O-].[Si+4] 3559380 CC#N.C1CC[C@@H]([C@@H](C1)O)O.C1CC[C@@H]([C@@H](C1)O)O.[Te+4] 3559382 CN(C)c1ccc(c(c1)O)N=O.CN(C)c1ccc(c(c1)O)N=O.[Si+4] 3559383 c1ccc(c(=O)cc1)O.c1ccc(c(=O)cc1)O.c1ccc(c(=O)cc1)O.[Si+4] 3559385
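The listing above is a single whitespace-separated run in which each SMILES string is followed by its numeric identifier. For working with the hits one at a time, a minimal sketch along the following lines should split it into pairs. The function name `parse_pairs` is hypothetical, and the code assumes the tokens strictly alternate between SMILES and identifier (SMILES never contain whitespace, so a plain split is enough).

```python
def parse_pairs(text):
    """Split a whitespace-separated 'SMILES id SMILES id ...' listing
    into a list of (smiles, identifier) tuples.

    Assumes strict alternation; raises ValueError if a token is left over.
    """
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (SMILES/id pairs)")
    # Even positions hold SMILES, odd positions hold the identifiers.
    return list(zip(tokens[0::2], tokens[1::2]))


if __name__ == "__main__":
    sample = "[N]=O 1200689 [C] 2106049 [S] 2105487"
    for smiles, ident in parse_pairs(sample):
        print(ident, smiles)
```

Keeping the identifiers as strings avoids any assumptions about what they are used for downstream.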
import sys
import pybel
ob = pybel.ob
ob.obErrorLog.StopLogging()
import multiprocessing as mp

# Allowed total bond counts, keyed by atomic number and then by formal charge
common_valencies = {
    1: {0: [1], 1: [0]},
    2: {0: [0]},
    3: {0: [1], 1: [0]},
    4: {0: [2], 1: [1], 2: [0]},
    5: {-2: [3], -1: [4], 0: [3], 1: [2], 2: [1]},
    6: {-2: [2], -1: [3], 0: [4], 1: [3], 2: [2]},
    7: {-2: [1], -1: [2], 0: [3, 5], 1: [4], 2: [3]},
    8: {-2: [0], -1: [1], 0: [2], 1: [3, 5]},
    9: {-1: [0], 0: [1], 1: [2], 2: [3, 5]},
    10: {0: [0]},
    11: {-1: [0], 0: [1], 1: [0]},
    12: {0: [2], 2: [0]},
    13: {-2: [3, 5], -1: [4], 0: [3], 1: [2], 2: [1], 3: [0]},
    14: {-2: [2], -1: [3, 5], 0: [4], 1: [3], 2: [2]},
    15: {-2: [1, 3, 5, 7], -1: [2, 4, 6], 0: [3, 5], 1: [4], 2: [3]},
    16: {-2: [0], -1: [1, 3, 5, 7], 0: [2, 4, 6], 1: [3, 5], 2: [4]},
    17: {-1: [0], 0: [1, 3, 5, 7], 1: [2, 4, 6], 2: [3, 5]},
    18: {0: [0]},
    19: {-1: [0], 0: [1], 1: [0]},
    20: {0: [2], 1: [1], 2: [0]},
    31: {-2: [3, 5], -1: [4], 0: [3], 1: [0], 2: [1], 3: [0]},
    32: {-2: [2, 4, 6], -1: [3, 5], 0: [4], 1: [3], 4: [0]},
    33: {-3: [0], -2: [1, 3, 5, 7], -1: [2, 4, 6], 0: [3, 5], 1: [4], 2: [3]},
    34: {-2: [0], -1: [1, 3, 5, 7], 0: [2, 4, 6], 1: [3, 5], 2: [4]},
    35: {-1: [0], 0: [1, 3, 5, 7], 1: [2, 4, 6], 2: [3, 5]},
    36: {0: [0, 2]},
    37: {-1: [0], 0: [1], 1: [0]},
    38: {0: [2], 1: [1], 2: [0]},
    49: {-2: [3, 5], -1: [2, 4], 0: [3], 1: [0], 2: [1], 3: [0]},
    50: {-2: [2, 4, 6], -1: [3, 5], 0: [2, 4], 1: [3], 2: [0], 4: [0]},
    51: {-2: [1, 3, 5, 7], -1: [2, 4, 6], 0: [3, 5], 1: [2, 4], 2: [3], 3: [0]},
    52: {-2: [0], -1: [1, 3, 5, 7], 0: [2, 4, 6], 1: [3, 5], 2: [2, 4]},
    53: {-1: [0], 0: [1, 3, 5, 7], 1: [2, 4, 6], 2: [3, 5]},
    54: {0: [0, 2, 4, 6, 8]},
    55: {-1: [0], 0: [1], 1: [0]},
    56: {0: [2], 1: [1], 2: [0]},
    81: {0: [1, 3]},
    82: {-2: [2, 4, 6], -1: [3, 5], 0: [2, 4], 1: [3], 2: [0]},
    83: {-2: [1, 3, 5, 7], -1: [2, 4, 6], 0: [3, 5], 1: [2, 4], 2: [3], 3: [0]},
    84: {0: [2, 4, 6]},
    85: {-1: [0], 0: [1, 3, 5, 7], 1: [2, 4, 6], 2: [3, 5]},
    86: {0: [0, 2, 4, 6, 8]},
    87: {0: [1], 1: [0]},
    88: {0: [2], 1: [1], 2: [0]},
}

def IsAttachedToNitrogen(atom):
    # Look at the first neighbour only; the default avoids a StopIteration
    # if the atom has no explicit neighbours (e.g. a lone [OH])
    nbr = next(ob.OBAtomAtomIter(atom), None)
    return nbr is not None and nbr.GetAtomicNum() == 7

def HasCommonValence(mol):
    for atom in ob.OBMolAtomIter(mol):
        elem = atom.GetAtomicNum()
        if elem not in common_valencies:
            continue # just skip unusual elements
            # return False # unusual elem
        chg = atom.GetFormalCharge()
        data = common_valencies[elem]
        if chg not in data:
            return False # unusual charge state
        totalbonds = atom.BOSum() + atom.GetImplicitHCount()
        if totalbonds not in data[chg]:
            if not(elem==8 and chg==0 and totalbonds==1 and IsAttachedToNitrogen(atom)): # TEMPO-like
                return False # unusual valence (and not TEMPO-like)
    return True

def calculate(smi):
    # Return the input line if the molecule has an unusual valence, else None
    mol = pybel.readstring("smi", smi).OBMol
    if not HasCommonValence(mol):
        return smi
    return None

if __name__ == "__main__":
    POOLSIZE = 6 # the number of CPUs
    CHUNKSIZE = 1000
    pool = mp.Pool(POOLSIZE)
    with open("output.txt", "w") as out:
        with open(r"D:\LargeData\ChEMBL\chembl_23.ism", "r") as inp:
            #for result in pool.imap(calculate, inp, CHUNKSIZE):
            for result in pool.imap_unordered(calculate, inp, CHUNKSIZE):
            # for result in map(calculate, inp): # no multiprocessing
                if result:
                    out.write(result) # each input line already ends with a newline
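To make the table lookup in the script above easier to follow, here is a minimal sketch of the same decision applied to plain numbers, with no Open Babel required. The `classify` function and `COMMON_VALENCIES_EXCERPT` are hypothetical names introduced only for illustration; the excerpt copies the entries for C, N, O and Al from the script's table, and the sketch deliberately leaves out the TEMPO-like exemption for oxygen attached to nitrogen.

```python
# Excerpt of the script's table: atomic number -> formal charge -> allowed
# total bond counts (bond order sum plus implicit hydrogens).
COMMON_VALENCIES_EXCERPT = {
    6: {-2: [2], -1: [3], 0: [4], 1: [3], 2: [2]},
    7: {-2: [1], -1: [2], 0: [3, 5], 1: [4], 2: [3]},
    8: {-2: [0], -1: [1], 0: [2], 1: [3, 5]},
    13: {-2: [3, 5], -1: [4], 0: [3], 1: [2], 2: [1], 3: [0]},
}


def classify(elem, charge, totalbonds, table=COMMON_VALENCIES_EXCERPT):
    """Mirror the order of checks used for a single atom in the script."""
    if elem not in table:
        return "skipped"          # element not covered by the table
    by_charge = table[elem]
    if charge not in by_charge:
        return "unusual charge"   # charge state not listed for this element
    if totalbonds not in by_charge[charge]:
        return "unusual valence"  # charge is fine, bond count is not
    return "common"


if __name__ == "__main__":
    print(classify(6, 0, 4))    # ordinary carbon
    print(classify(6, 0, 5))    # five-valent "Texas" carbon
    print(classify(7, 0, 2))    # the nitrogen in [N]=O
    print(classify(13, -3, 6))  # the aluminium in F[Al-3](F)(F)(F)(F)F
```

In the full script a molecule is reported as soon as any one atom falls into either of the "unusual" cases, while atoms of elements missing from the table are simply skipped.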