11 Jun 2020 (Last Modified 18 May 2022)
Thiagarajan et al. (2020) published a preprint suggesting that an analysis of latent spaces provides a way to interpret neural networks. The idea is conceptually similar to previous [citation needed] ideas of identifying what internal constructs or representations a neural network is using. This bears a strong parallel to sensory neuroscience, which has worked out how the brain represents the luminance of adjacent areas of visual space with adjacent areas of the calcarine cortex, or adjacent frequencies with adjacent neurons in the primary auditory cortex.
I wondered how to apply this idea to language. There are many methods of dimensionality reduction (Latent Semantic Analysis, Latent Dirichlet Allocation) that project text onto a metric space, though it remains an open question how adequately these spaces represent all levels of meaning of the text.
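As a minimal sketch of what "projecting text onto a metric space" looks like in practice: LSA amounts to a truncated SVD of a TF-IDF term-document matrix, after which distances between documents can be measured directly. This assumes scikit-learn, and the toy corpus and dimensionality are purely illustrative.

```python
# A minimal sketch of projecting text onto a low-dimensional metric space
# via Latent Semantic Analysis (TF-IDF + truncated SVD).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative placeholder corpus.
corpus = [
    "the brain represents luminance in the calcarine cortex",
    "adjacent frequencies map to adjacent neurons in auditory cortex",
    "latent spaces can help interpret neural networks",
]

# TF-IDF turns each document into a sparse term-weight vector.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)

# Truncated SVD of the term-document matrix is the core of LSA;
# each document becomes a point in a k-dimensional latent space.
lsa = TruncatedSVD(n_components=2, random_state=0)
Z = lsa.fit_transform(X)

# Distances in that space stand in for (one level of) semantic similarity.
print(cosine_similarity(Z))
```

Whether cosine distances in such a space capture more than topical overlap is exactly the representational-adequacy issue raised above.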
Related: