08 Jun 2020 (Last Modified 08 Jun 2020)
Natural Language Processing To Extract Health Information from Social Media
The analysis of text from Social Media may overcome the limitations of traditional means of public health epidemiology. Surveys are slow and so may not provide timely information. They are also expensive, which limits their scope and introduces sampling bias.
The analysis of text from Social Media has its own issues. The data contain considerable noise, use nonstandard orthography and semantics, and are too large for manual curation.
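To make the orthography problem concrete, here is a minimal sketch of the kind of normalization that usually comes before any modeling. The rules and the function name `normalize_post` are my own illustrative assumptions, not taken from either paper, and real pipelines typically need many more rules (emoji, hashtags, spelling variants).

```python
import re

# Hypothetical cleanup rules for illustration only.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
ELONGATION_RE = re.compile(r"(.)\1{2,}")  # same character three or more times in a row


def normalize_post(text: str) -> str:
    """Lowercase a post, drop URLs and @mentions, and shorten elongated characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # "sooooo" becomes "soo": keep two characters so the emphasis is still visible.
    text = ELONGATION_RE.sub(r"\1\1", text)
    # Collapse any run of whitespace left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_post("Sooooo TIRED @bob http://t.co/x  can't sleep!!!"))
```

Rules like these reduce noise but do nothing about slang or polysemy, which is the harder part of the problem.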
Both of these papers inferred semantics from frequency. The ubiquity of polysemy and slang online makes it hard for me to believe that the model in (1) could be applied to other data without re-tuning. In fact, (2) could not replicate the importance of diurnal variation that (1) found. The topics that (2) uncovered made medical sense, referring to depressed mood, feelings of agitation or desolation, somatic complaints, and medical discussions. But I feel the authors of (2) got lucky. LDA (latent Dirichlet allocation) rarely gives such nice topics.
TODO: Discuss each of these papers more
Word embeddings provide an alternative to topic modeling or deriving semantics from syntax.
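The idea behind embeddings is that words appearing in similar contexts get similar vectors. The sketch below is a toy illustration of that idea using raw co-occurrence counts and cosine similarity. It is a simple count-based stand-in, not a learned embedding such as word2vec, and the three-sentence corpus, the window size, and the function names are all made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, not real social media data.
corpus = [
    "i feel sad and tired today",
    "i feel down and tired today",
    "the bus was late again",
]

vectors = cooccurrence_vectors(corpus)

if __name__ == "__main__":
    print(cosine(vectors["sad"], vectors["down"]))  # shared contexts
    print(cosine(vectors["sad"], vectors["bus"]))   # no shared contexts
```

In this toy corpus "sad" and "down" never co-occur, yet they end up similar because they share neighbors. That is the property that might help with slang, though whether it holds on real, noisy data is an open question here.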