Overall Goal
Determine whether the intermediate representations of convolutional neural networks identify molecular motifs whose importance was not previously appreciated.
Background
Hirohara et al. (2018) demonstrated that a convolutional neural network trained on the SMILES representation of a molecule
- could classify that molecule as toxic or not
- developed a latent representation (“fingerprint”) of toxicity
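To make the network's input concrete, a SMILES string can be turned into a fixed-size matrix with one row per token. The sketch below uses plain one-hot encoding; the vocabulary, the padding length, and the function names are assumptions for illustration and are not the featurization used by Hirohara et al.

```python
# Illustrative sketch: character-level one-hot encoding of a SMILES string.
# VOCAB, max_len, and both function names are hypothetical choices.

def tokenize(smiles):
    """Split a SMILES string into tokens, keeping two-letter atoms (Cl, Br) whole."""
    tokens = []
    i = 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def one_hot(smiles, vocab, max_len):
    """Return a max_len x len(vocab) matrix; rows past the last token stay all-zero."""
    index = {tok: k for k, tok in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for pos, tok in enumerate(tokenize(smiles)[:max_len]):
        matrix[pos][index[tok]] = 1  # raises KeyError for a token outside vocab
    return matrix


VOCAB = ["C", "N", "O", "Cl", "Br", "c", "n", "(", ")", "=", "1", "2"]

# Amphetamine, used here only as an arbitrary example molecule.
encoded = one_hot("CC(N)Cc1ccccc1", VOCAB, max_len=20)
```

Each row of `encoded` marks one token, so a 1-D convolution sliding along the rows sees short runs of adjacent atoms and bonds.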
Approach
- Specific Aim 1: Extract Names, 2-Dimensional Structures, and Effects of Novel Psychoactive Substances from Social Media
- Specific Aim 2: Train a Convolutional Neural Network to Classify Substances as Belonging to 1 of 5 Classes of Novel Psychoactive Substances
- Specific Aim 3: Assess How the Network from Aim 2 Classifies and Represents the Substances from Aim 1
Specific Aim 1
- We developed a custom named entity recognizer using the Python library spaCy. How do we train, test, and validate it?
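One possible answer to the validation question is to hold out annotated posts and score the recognizer at the entity level. The sketch below is independent of spaCy and assumes each document's entities are available as (start, end, label) tuples; the labels SUBSTANCE and EFFECT, the split proportions, and the function names are hypothetical.

```python
# Illustrative sketch: train/dev/test split and entity-level precision, recall, F1.
# An entity counts as correct only if start, end, and label all match exactly.
import random


def split(items, train=0.8, dev=0.1, seed=0):
    """Shuffle reproducibly and cut into train, dev, and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_dev = int(len(shuffled) * dev)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])


def precision_recall_f1(gold, predicted):
    """gold and predicted are parallel lists of sets of (start, end, label)."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # exact matches
        fp += len(p - g)   # predicted but not annotated
        fn += len(g - p)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up annotations for two documents, for illustration only.
gold = [{(0, 9, "SUBSTANCE"), (20, 28, "EFFECT")}, {(5, 12, "SUBSTANCE")}]
pred = [{(0, 9, "SUBSTANCE")}, {(5, 12, "SUBSTANCE"), (30, 35, "EFFECT")}]
precision, recall, f1 = precision_recall_f1(gold, pred)

train_docs, dev_docs, test_docs = split(range(10))
```

Under this scheme the dev portion would guide model selection and the test portion would be scored once at the end.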
Specific Aim 2
- We extended the approach of Hirohara et al. (2018). How do we test and validate it? How do we introspect its latent representations?
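One common way to introspect a convolutional filter is to find the input window that activates it most strongly and read off the corresponding SMILES substring. The sketch below assumes single-character tokens and uses a hand-set kernel as a stand-in for a learned filter; the vocabulary, the kernel weights, and the function names are all hypothetical.

```python
# Illustrative sketch: map a 1-D convolutional filter back to the SMILES
# substring that activates it most. In a trained network the kernel weights
# would come from training; here they are set by hand.

def conv1d_activations(rows, kernel):
    """Valid cross-correlation of a (length x vocab) matrix with a (width x vocab) kernel."""
    width = len(kernel)
    activations = []
    for start in range(len(rows) - width + 1):
        total = 0.0
        for offset in range(width):
            total += sum(x * w for x, w in zip(rows[start + offset], kernel[offset]))
        activations.append(total)
    return activations


def top_motif(smiles, vocab, kernel):
    """Return the substring under the filter's maximum activation and that activation."""
    index = {ch: k for k, ch in enumerate(vocab)}
    rows = [[1.0 if index[ch] == k else 0.0 for k in range(len(vocab))]
            for ch in smiles]
    activations = conv1d_activations(rows, kernel)
    best = max(range(len(activations)), key=activations.__getitem__)
    return smiles[best:best + len(kernel)], activations[best]


VOCAB = ["C", "N", "O", "="]
# Width-3 kernel that responds to the token sequence C, =, O.
KERNEL = [
    [1.0, 0.0, 0.0, 0.0],  # position 0 rewards "C"
    [0.0, 0.0, 0.0, 1.0],  # position 1 rewards "="
    [0.0, 0.0, 1.0, 0.0],  # position 2 rewards "O"
]

motif, activation = top_motif("NCC=O", VOCAB, KERNEL)
```

Repeating this over many molecules and collecting the top substrings per filter gives a rough list of candidate motifs to compare against known structure-activity knowledge.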
To train the CNN, we developed a library of chemical substances (Extended Discussion).
Creation of Library
Specific Aim 3
- We did this. How do we validate it?
Bibliography
- 1. Hirohara, M., Saito, Y., Koda, Y., Sato, K. & Sakakibara, Y. Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. BMC Bioinformatics 19, 83–94 (2018).