Latent Representations of Chemical Activity

Overall Goal

Determine whether the intermediate representations of convolutional neural networks identify molecular motifs whose importance was not previously appreciated.

Background

Hirohara et al. (2018) demonstrated that a convolutional neural network trained on the SMILES representation of a molecule:

- could classify that molecule as toxic or not
- developed a latent representation (“fingerprint”) of toxicity
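
To make the input side concrete, the sketch below shows one simple way a SMILES string can be turned into a fixed-size matrix that a 1D CNN could consume. It is a simplification for illustration: Hirohara et al. use a richer per-symbol feature vector, and the vocabulary, `MAX_LEN`, and the helper names `tokenize` and `encode_smiles` are assumptions, not taken from their code.

```python
# Illustrative only: a simplified token-level one-hot encoding of SMILES.
# VOCAB and MAX_LEN are assumptions, not the published feature set.

VOCAB = ["C", "N", "O", "S", "F", "Cl", "Br", "c", "n", "o",
         "(", ")", "=", "#", "1", "2"]
TOKEN_INDEX = {tok: i for i, tok in enumerate(VOCAB)}
MAX_LEN = 12  # hypothetical fixed input length


def tokenize(smiles):
    """Split a SMILES string into tokens, keeping 'Cl' and 'Br' together."""
    tokens = []
    i = 0
    while i < len(smiles):
        pair = smiles[i:i + 2]
        if pair in ("Cl", "Br"):
            tokens.append(pair)
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def encode_smiles(smiles, max_len=MAX_LEN):
    """Return a max_len x len(VOCAB) matrix (list of lists) of 0/1 values.

    Tokens outside VOCAB become all-zero rows; rows past the end of the
    string are zero padding. Strings longer than max_len are truncated.
    """
    matrix = [[0] * len(VOCAB) for _ in range(max_len)]
    for row, tok in enumerate(tokenize(smiles)[:max_len]):
        col = TOKEN_INDEX.get(tok)
        if col is not None:
            matrix[row][col] = 1
    return matrix


encoded = encode_smiles("CC(=O)Cl")  # acetyl chloride
```

Each row corresponds to one position in the string, so a convolutional filter of width k sees k consecutive SMILES tokens at a time.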

Neural Networks
Latent Representations
Visualizing Latent Representations
Social Media as a Data Source

Approach

Specific Aim 1

We developed a custom named entity recognizer using the Python library spaCy. How do we train, test, and validate it?
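
One common way to test a named entity recognizer is to score its predictions on a held-out, hand-annotated test set using exact-match, entity-level precision, recall, and F1. The sketch below is a minimal version of that calculation in plain Python; the function name `entity_prf`, the `(start, end, label)` span format, the `CHEMICAL` label, and the toy data are all assumptions for illustration, not results from this project.

```python
def entity_prf(gold, predicted):
    """Exact-match entity-level precision, recall, and F1.

    gold, predicted: lists (one entry per document) of sets of
    (start, end, label) spans. A prediction counts as correct only if
    all three fields match a gold span in the same document.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # spans found in both
        fp += len(p - g)   # predicted but not annotated
        fn += len(g - p)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy example with a hypothetical CHEMICAL label (not real data).
gold = [{(0, 7, "CHEMICAL")},
        {(4, 11, "CHEMICAL"), (20, 27, "CHEMICAL")}]
pred = [{(0, 7, "CHEMICAL")},
        {(4, 11, "CHEMICAL"), (30, 35, "CHEMICAL")}]
precision, recall, f1 = entity_prf(gold, pred)
```

Keeping the training, validation (for tuning), and test (for final reporting) documents disjoint is what makes these scores meaningful.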

Specific Aim 2

We extended the model of Hirohara et al. (2018). How do we test and validate it? How do we introspect the latent representations?
- Ported from Chainer (name?) to Keras.
- Developed a visualization module.
- Reproduced their published results.
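
One way to introspect a convolutional filter is to trace its strongest activation back to the stretch of SMILES that produced it, in the spirit of the motif detection described by Hirohara et al. The sketch below assumes a stride-1 "valid" 1D convolution over a character-level encoding, so that activation i corresponds to the substring starting at position i. The function name `top_motif` and the activation values are hypothetical; they are not output from a trained model.

```python
def top_motif(smiles, activations, window):
    """Return (substring, start, score) for the most strongly activated window.

    activations[i] is assumed to be one filter's response to
    smiles[i:i + window], so there must be len(smiles) - window + 1 values.
    """
    expected = len(smiles) - window + 1
    if len(activations) != expected:
        raise ValueError(
            f"expected {expected} activations, got {len(activations)}"
        )
    start = max(range(expected), key=lambda i: activations[i])
    return smiles[start:start + window], start, activations[start]


# Made-up activations for a width-4 filter over aspirin-like SMILES.
smiles = "CC(=O)Oc1ccccc1"
activations = [0.1, 0.2, 0.9, 0.3, 0.1, 0.0, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1]
motif, start, score = top_motif(smiles, activations, window=4)
```

Collecting the top substrings per filter across many molecules gives a list of candidate motifs that can then be compared against known structural alerts.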

To train the CNN, we developed a library of chemical substances (Extended Discussion).

Creation of Library

Specific Aim 3

We did this. How do we validate it?

Bibliography

1. Hirohara, M., Saito, Y., Koda, Y., Sato, K. & Sakakibara, Y. Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. BMC Bioinformatics 19, 83–94 (2018).