22 Mar 2023 (Last Modified 23 Apr 2023)
Named Entity Recognition (NER) refers to mapping groups of characters to known entities in the real world, for example recognizing that the sequence of characters "ball" refers to the round object that bounces. NER is closely tied to named entity linking (NEL), where the recognized entity is mapped onto a unique identifier.
Biomedical NER is challenging. Biomedical texts contain many compound words and a large number of out-of-vocabulary terms. Except for RNN-based models, whose F1 scores are around 0.60, models using word embeddings (GloVe, Word2Vec) have F1 scores between 0.70 and 0.75 (Song et al., 2018). An ensemble model achieved an F1 score of 0.93 on biomedical texts, but no validation on social media text has been reported (Sung et al., 2022).
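For readers less familiar with the metric, the F1 scores quoted above are the harmonic mean of precision and recall. The sketch below shows one common way to compute it at the entity level, assuming an exact match on span boundaries and label. The function name `entity_f1`, the `(start, end, label)` span format, and the toy spans are illustrative assumptions, not the evaluation protocol of the cited papers.

```python
def entity_f1(gold, predicted):
    """Entity-level F1, counting a prediction as correct only on exact match.

    Each entity is a (start, end, label) tuple; this format is an assumption
    made for illustration.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 4 gold entities, 4 predictions, 2 exact matches.
gold = {(0, 2, "DRUG"), (5, 6, "DISEASE"), (9, 10, "DRUG"), (12, 14, "GENE")}
pred = {(0, 2, "DRUG"), (5, 6, "DISEASE"), (9, 10, "GENE"), (15, 16, "DRUG")}

print(entity_f1(gold, pred))  # precision 0.5, recall 0.5, so F1 is 0.5
```

Note that the third prediction has the right span but the wrong label, so under exact matching it counts as both a false positive and a false negative.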
Including character-level features can increase the generalizability of word-level embeddings. Indeed, adding a bidirectional LSTM to state-of-the-art biomedical NER systems based on word embeddings improved the F1 score from 0.70 to 0.75 (Gridach, 2017). This increase is modest: most biomedical NER systems that rely on word embeddings still have F1 scores between 0.70 and 0.75.
At the character level, related terms frequently vary in morphology (spelling) even while preserving phonology, for example carfentanil versus fentanyl (Kim & Kang, 2022).
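To make the carfentanil and fentanyl example concrete, here is a minimal sketch of why character-level features can help with out-of-vocabulary words. It compares the two names by their character trigrams. The helpers `char_ngrams` and `jaccard`, the boundary markers `<` and `>`, and the choice of trigrams are assumptions for illustration; this is not the method used in any of the cited papers.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"<{word.lower()}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two words."""
    a_grams, b_grams = char_ngrams(a, n), char_ngrams(b, n)
    return len(a_grams & b_grams) / len(a_grams | b_grams)


# A word-level lookup treats these as two unrelated tokens.
print("carfentanil" == "fentanyl")  # False

# At the character level they share the trigrams fen, ent, nta, tan.
print(sorted(char_ngrams("carfentanil") & char_ngrams("fentanyl")))
print(jaccard("carfentanil", "fentanyl"))  # 4 shared of 15 total trigrams

# An unrelated drug name shares no trigrams with fentanyl.
print(jaccard("aspirin", "fentanyl"))  # 0.0
```

The overlap also shows the limitation described above: the spelling difference at the end of the names (-il versus -yl) means the final trigrams do not match, even though the two endings sound alike.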