30 May 2020 (Last Modified 01 Jun 2022)
For my Analysis of Latent Spaces I created a convolutional neural network (link) to classify compounds as belonging to drug categories. This post describes how I obtained the drug categories and their labels.
The divide between the everyday terminology clinicians use and the formal language of the NDCs can affect observational studies (DeFalco et al., 2013), but our input here is SMILES strings.
One consideration when creating the training data sets was to have enough samples of novel psychoactive substances so that the relative magnitudes of the fractional classifications would be meaningful.
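A quick way to check this consideration is to count the samples per class before training. The sketch below is only an illustration: the label names, the counts, and the `min_samples` threshold are all hypothetical and are not taken from the actual training set.

```python
from collections import Counter


def underrepresented(labels, min_samples):
    """Return the classes that have fewer than min_samples examples."""
    counts = Counter(labels)
    return sorted(name for name, n in counts.items() if n < min_samples)


# Hypothetical labels, one per training compound (not the real data).
labels = ["opioid"] * 5 + ["tryptamine"] * 2 + ["phenethylamine"] * 4

# Classes below the (arbitrary) threshold of 3 samples.
print(underrepresented(labels, min_samples=3))  # ['tryptamine']
```

Any class flagged this way would need more samples before its share of a fractional classification could be compared with the others.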
Training Set Should Have
Classes (Is there a pre-existing classification I could use?). Links to CSV only work if you are a known collaborator for this project.
Testing sets:
I used PyBioMed (GitHub repo, tutorial) and ChemSpiPy. Here is a good overview of the databases describing approved substances (NB: openFDA might be useful).
I used requests to query RxNorm.
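import requests

# RxClass endpoint that lists the drug members of a class
URL = "https://rxnav.nlm.nih.gov/REST/rxclass/classMembers.json"
# Request the members of the class classId, with ATC as the relationship source
query = lambda classId: requests.get(url=URL, params={"classId": classId, "relaSource": "ATC"})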
The keyword relaSource refers to the source of the relationship that must hold between the drug and the class for the drug to be considered a member of the class. I’m not sure why, but while the source of the relationship has to be specified, the object of the relationship is optional; the documentation does not explain this design choice.
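Once a response comes back, the member names still have to be pulled out of the JSON. The sketch below assumes the decoded payload nests its members as `drugMemberGroup` → `drugMember` → `minConcept` → `name`, and that an empty result omits `drugMemberGroup`; both are assumptions to check against a real response. The helper name `member_names` and the sample payload are hypothetical, hand-written for illustration, not real API output.

```python
def member_names(payload):
    """Return the sorted, de-duplicated drug names in a decoded payload."""
    # Assumption: a class with no members comes back without "drugMemberGroup".
    members = payload.get("drugMemberGroup", {}).get("drugMember", [])
    return sorted({m["minConcept"]["name"] for m in members})


# Hand-written sample in the assumed response shape (placeholder values).
sample = {
    "drugMemberGroup": {
        "drugMember": [
            {"minConcept": {"rxcui": "0002", "name": "example drug b", "tty": "IN"}},
            {"minConcept": {"rxcui": "0001", "name": "example drug a", "tty": "IN"}},
            {"minConcept": {"rxcui": "0002", "name": "example drug b", "tty": "IN"}},
        ]
    }
}

print(member_names(sample))  # ['example drug a', 'example drug b']
```

With the `query` function above, this would be called as `member_names(query("N02A").json())`, where N02A is the ATC code for opioids.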
Compound Class | Data Source |
---|---|
Opioids | |
I used the following resources.
The Wikipedia page, List of Psychoactive Substances detailed serotonergic agonists and cannabinoid receptor agonists, which I extracted as
I didn’t include the benzofuran derivatives dimemebfe (also known as 5-MeO-BFE) and 5-MeO-DiBF, because neither is, structurally, a tryptamine.
I began with the Wikipedia page. I excluded LSD derivatives because they contain a phenethylamine substructure. (The D stands for diethylamide.) I saved them in [filename] because I think they are a good test for the CNN: I expect it to classify LSD and LSD derivatives as partly tryptamine and partly phenethylamine.
I excluded HU-211 because it binds to NMDA as well as CB receptors. (Wikipedia suggests it has therapeutic uses.) HU-211 is the enantiomer of HU-210. The structural diversity of cannabinoids could be a second paper.