Overall Goal

Determine whether the intermediate representations of convolutional neural networks identify molecular motifs whose importance was not previously appreciated.

Background

Hirohara et al. (2018) demonstrated that a convolutional neural network trained on the SMILES representation of a molecule

  1. could classify that molecule as toxic or not
  2. developed a latent representation (“fingerprint”) of toxicity

Approach

  1. Specific Aim 1: Extract the Names, 2-Dimensional Structures, and Effects of Novel Psychoactive Substances from Social Media
  2. Specific Aim 2: Train a Convolutional Neural Network to Classify Substances as Belonging to 1 of 5 Classes of Novel Psychoactive Substances
  3. Specific Aim 3: Assess How the Network from Aim 2 Classifies and Represents the Substances from Aim 1

Specific Aim 1

  • We developed a custom named entity recognizer using the Python library spaCy. How should we train, test, and validate it?
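One way to answer the validation question is to hold out a set of hand-annotated posts and score the recognizer's predictions against them entity by entity. A minimal sketch in plain Python, independent of spaCy, assuming each entity is a (start, end, label) character span; the labels SUBSTANCE and EFFECT, the name `entity_prf`, and the example spans are hypothetical. A prediction counts as correct only if both its boundaries and its label match an annotation.

```python
# Sketch: exact-match entity-level precision, recall, and F1 on held-out posts.
# Assumptions (hypothetical): entities are (start_char, end_char, label) spans and
# the labels "SUBSTANCE" and "EFFECT" are placeholders.
from typing import Iterable, Set, Tuple

Span = Tuple[int, int, str]  # (start character, end character, entity label)


def entity_prf(gold: Iterable[Set[Span]], predicted: Iterable[Set[Span]]) -> Tuple[float, float, float]:
    """Pool true/false positives and false negatives over all documents."""
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, predicted):
        tp += len(gold_spans & pred_spans)  # same boundaries and same label
        fp += len(pred_spans - gold_spans)  # predicted but not annotated
        fn += len(gold_spans - pred_spans)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical gold annotations and predictions for two held-out posts.
    gold = [{(0, 8, "SUBSTANCE"), (20, 29, "EFFECT")}, {(5, 12, "SUBSTANCE")}]
    pred = [{(0, 8, "SUBSTANCE")}, {(5, 12, "SUBSTANCE"), (15, 19, "EFFECT")}]
    print(entity_prf(gold, pred))
```

spaCy's own `nlp.evaluate` reports comparable entity scores (`ents_p`, `ents_r`, `ents_f`); the sketch simply makes the counting explicit.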

Specific Aim 2

  • We extended the model of Hirohara et al. (2018). How should we test and validate it? How should we introspect its latent representations?
    • Ported from Chainer (name?) to Keras.
    • Developed a visualization module
    • Reproduced their published results
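The question of how to introspect the latent representations can be approached by asking which SMILES substring drives each convolutional filter hardest, since a global max-pooling layer, if the network uses one, keeps only that peak. The pure-Python sketch below illustrates the idea for a single filter; it is not the Keras port itself. The vocabulary, the one-hot encoding (simpler than, and not necessarily the same as, the feature encoding Hirohara et al. used), the names `one_hot` and `max_activation`, and the hand-set "C(=O)" filter are all hypothetical stand-ins for learned weights.

```python
# Sketch: which SMILES substring most strongly activates one 1-D convolutional filter?
# Assumptions (hypothetical, not from Hirohara et al.): a toy single-character
# vocabulary, plain one-hot features, and a hand-set filter in place of learned weights.
from typing import Dict, List, Tuple

VOCAB = "CNOSFcnos()=#123[]+-@H"  # hypothetical character set; ignores two-letter atoms like Cl
INDEX: Dict[str, int] = {ch: i for i, ch in enumerate(VOCAB)}


def one_hot(smiles: str) -> List[List[float]]:
    """Encode each character as a one-hot row; characters outside VOCAB stay all-zero."""
    rows = []
    for ch in smiles:
        row = [0.0] * len(VOCAB)
        if ch in INDEX:
            row[INDEX[ch]] = 1.0
        rows.append(row)
    return rows


def max_activation(smiles: str, kernel: List[List[float]], bias: float = 0.0) -> Tuple[float, str]:
    """Slide one filter along the sequence (stride 1, no padding), apply ReLU, and
    return the largest activation together with the substring that produced it."""
    x = one_hot(smiles)
    width = len(kernel)
    if len(x) < width:
        raise ValueError("SMILES string is shorter than the filter width")
    best_score, best_pos = -1.0, 0
    for pos in range(len(x) - width + 1):
        score = bias + sum(
            x[pos + k][c] * kernel[k][c]
            for k in range(width)
            for c in range(len(VOCAB))
        )
        score = max(score, 0.0)  # ReLU
        if score > best_score:  # ties keep the leftmost window
            best_score, best_pos = score, pos
    return best_score, smiles[best_pos:best_pos + width]


if __name__ == "__main__":
    # Hypothetical filter that fires hardest on an exact "C(=O)" (carbonyl) window.
    carbonyl_filter = one_hot("C(=O)")
    print(max_activation("CC(=O)C", carbonyl_filter))  # acetone; prints (5.0, 'C(=O)')
```

Applied to a trained Keras model, the same scan would read each filter's weights from its `Conv1D` layer (`layer.get_weights()` returns the kernel and bias arrays) and collect the top-scoring substrings across many substances; substrings that recur are candidate motifs.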

To train the CNN, we developed a library of chemical substances (see Extended Discussion).

Creation of Library
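A library meant to train a five-class classifier also needs a reproducible split into training, validation, and test sets. A minimal sketch, assuming each record is a (SMILES, class label) pair whose SMILES strings are already canonicalized, so that duplicate substances appear as literal duplicates; the 80/10/10 proportions, the name `stratified_split`, and the toy records are hypothetical. Splitting within each class keeps rarer classes represented in every subset, and removing duplicates first keeps a substance from leaking between training and test data.

```python
# Sketch: reproducible stratified train/validation/test split of a labeled library.
# Assumptions (hypothetical): records are (canonical SMILES, class label) pairs and
# the proportions default to 80/10/10.
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (canonical SMILES, class label)


def stratified_split(
    records: List[Record],
    train_frac: float = 0.8,
    val_frac: float = 0.1,
    seed: int = 0,
) -> Dict[str, List[Record]]:
    """Shuffle and split each class separately; whatever remains goes to the test set."""
    by_class: Dict[str, List[Record]] = defaultdict(list)
    for record in dict.fromkeys(records):  # drop exact duplicates so none leak across subsets
        by_class[record[1]].append(record)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits: Dict[str, List[Record]] = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):  # fixed class order: same input and seed give the same split
        items = by_class[label]
        rng.shuffle(items)
        n_train = int(len(items) * train_frac)
        n_val = int(len(items) * val_frac)
        splits["train"].extend(items[:n_train])
        splits["val"].extend(items[n_train:n_train + n_val])
        splits["test"].extend(items[n_train + n_val:])
    return splits


if __name__ == "__main__":
    # Hypothetical toy library: 5 placeholder classes with 10 placeholder records each.
    toy = [(f"SMILES_{label}_{i}", label) for label in "ABCDE" for i in range(10)]
    parts = stratified_split(toy)
    print({name: len(rows) for name, rows in parts.items()})  # prints {'train': 40, 'val': 5, 'test': 5}
```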

Specific Aim 3

  • We performed this assessment. How should we validate it?
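For the classification half of Aim 3, one candidate validation is a confusion matrix over held-out substances: beyond overall accuracy, it shows which of the five classes the network mistakes for which. A minimal sketch with hypothetical placeholder class names and hypothetical function names; scikit-learn's `sklearn.metrics.confusion_matrix` and `classification_report` compute the same quantities if that library is already in use.

```python
# Sketch: confusion matrix and per-class precision/recall for a multi-class classifier.
# Assumption: gold and predicted labels are parallel sequences of class names.
from typing import Dict, List, Sequence


def confusion_matrix(gold: Sequence[str], pred: Sequence[str], classes: Sequence[str]) -> List[List[int]]:
    """Rows index the true class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for g, p in zip(gold, pred):
        matrix[index[g]][index[p]] += 1
    return matrix


def per_class_scores(matrix: List[List[int]], classes: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision (down each column) and recall (across each row) for every class."""
    scores = {}
    for i, c in enumerate(classes):
        predicted_as_c = sum(row[i] for row in matrix)
        truly_c = sum(matrix[i])
        scores[c] = {
            "precision": matrix[i][i] / predicted_as_c if predicted_as_c else 0.0,
            "recall": matrix[i][i] / truly_c if truly_c else 0.0,
        }
    return scores


if __name__ == "__main__":
    classes = ["A", "B", "C"]  # placeholders for the five substance classes
    gold = ["A", "A", "B", "C", "C"]
    pred = ["A", "B", "B", "C", "A"]
    m = confusion_matrix(gold, pred, classes)
    print(m)  # prints [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
    print(per_class_scores(m, classes))
```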

Bibliography

  1. Hirohara, M., Saito, Y., Koda, Y., Sato, K. & Sakakibara, Y. Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. BMC Bioinformatics 19, 83–94 (2018).