This page outlines the background I think one needs to be productive in my lab. My choices reflect my research focus (computational linguistics and applied mathematics) and training (PhD in computational neuroscience and MD specializing in toxicology). You don’t have to know everything here. Someone focusing on linguistics doesn’t need to know about Hilbert or Banach spaces. Mastery comes from directed learning and focused practice.

In my opinion, background also includes a working knowledge of funding agencies. There are many ways to skin a cat. This is a living document, and I am open to suggestions.

Fundamentals Everyone Should Know

  1. GitHub. I use GitHub to organize and share code. In the future I hope to release code packages and make more use of GitHub Issues and Discussions.
  2. Missing Semester. This MIT course teaches the day-to-day tools of computational work, for example how to move and rename masses of files, edit files quickly with vim, or set up a build system; a minimal Python sketch of the bulk-rename task appears after this list.
  3. LaTeX. LaTeX makes beautiful preprints and technical papers. I use Beamer for almost all my presentations. I write as many of my grants in LaTeX as possible. IEEE conferences accept TeX submissions. Biomedical journals sometimes do.
  4. Facility with Python, Prolog, Lisp, Julia, or Haskell.
  5. Anatomy of a Specific Aims Page. The Specific Aims page is the crystallization of your proposal, similar to but far more detailed and precise than an executive summary. All grant applications, whether to private foundations or the NIH, DoD, or NSF, hinge on the Specific Aims page.
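
As a companion to the Missing Semester item, here is a minimal Python sketch of the kind of bulk file renaming that course teaches with shell tools. The directory name, file pattern, and naming scheme are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Hypothetical example: normalize a batch of exported files,
# e.g. "Interview 03 FINAL.TXT" -> "interview_03_final.txt".
def normalize_names(directory: str, dry_run: bool = True) -> None:
    for path in Path(directory).glob("*.TXT"):
        new_name = path.name.lower().replace(" ", "_")
        target = path.with_name(new_name)
        if dry_run:
            print(f"{path.name} -> {target.name}")  # preview only
        else:
            path.rename(target)

if __name__ == "__main__":
    normalize_names("transcripts/", dry_run=True)
```

The course shows how to do the same job in a line or two of shell; the point is fluency with whichever tool you reach for, plus habits like previewing a destructive operation before running it.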

Things for the Computational Linguistics People

My focus is on representing biomedical knowledge in a computable format and on developing tools that use those representations to contextualize, and especially to assess the plausibility of, knowledge expressed in unstructured text.

Knowledge Representation

  1. Basic Formal Ontology. BFO is a framework for creating first-order logic predicates and axioms that, taken together, express a consistent picture of a “portion of reality”. It is a commonly used standard.
  2. Markov Logic Networks. MLNs attach weights to first-order formulas, so they can express uncertainty that BFO’s purely logical axioms cannot. Joining a BFO-compliant ontology with an MLN schema is an area of ongoing interest; schematic examples of both a BFO-style axiom and the MLN distribution appear after this list.
  3. spaCy. We use spaCy as our NLP backbone; a minimal usage sketch appears after this list.
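
To make items 1 and 2 concrete: the first formula below is a schematic BFO-style axiom (a paraphrase of time-indexed parthood transitivity, not the exact published axiomatization), and the second is the standard Markov logic network distribution, in which each first-order formula F_i carries a weight w_i and n_i(x) counts its true groundings in a possible world x.

```latex
% Schematic BFO-style axiom (paraphrased): continuant parthood,
% indexed by time t, is transitive.
\forall x, y, z, t \;
  \bigl( \mathrm{continuantPartOf}(x, y, t) \wedge
         \mathrm{continuantPartOf}(y, z, t) \bigr)
  \rightarrow \mathrm{continuantPartOf}(x, z, t)

% Markov logic network: probability of a possible world x,
% with weights w_i on formulas F_i and true-grounding counts n_i(x).
P(X = x) = \frac{1}{Z} \exp\!\Bigl( \sum_i w_i \, n_i(x) \Bigr),
\qquad
Z = \sum_{x'} \exp\!\Bigl( \sum_i w_i \, n_i(x') \Bigr)
```

Sending a weight to infinity turns the corresponding formula into a hard constraint, which is one way to see how a BFO-compliant ontology could sit inside an MLN schema.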
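
For item 3, here is a minimal spaCy sketch. The model name (en_core_web_sm) and the example sentence are placeholders, not the models or pipelines we actually run.

```python
import spacy

# Load a small English pipeline (assumes the model has been installed
# with `python -m spacy download en_core_web_sm`; the choice is a placeholder).
nlp = spacy.load("en_core_web_sm")

doc = nlp("Metformin reduced HbA1c in patients with type 2 diabetes.")

# Tokens with part-of-speech tags and dependency relations.
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

# Named entities recognized by the default pipeline.
for ent in doc.ents:
    print(ent.text, ent.label_)
```

Whether the default entity labels are useful for biomedical text depends on the model; treat the one above as a stand-in.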

Things for the Applied Mathematics People