Curriculum Vitae: Russell Sager Horton
Education
2002
AB Linguistics,
The University of Chicago
- Honors, Phi Beta Kappa
- 2000 - 2001 Université Paris X Nanterre
- French certification: High pass with honors
- GPA: 3.7; GRE: 800 verbal, 770 quantitative, 6 writing
High School, George Stevens Academy, Blue Hill, Maine, 1998
Salutatorian, National Merit Scholarship
SAT: 800 verbal, 780 quantitative, 800 writing
Employment
2016 - 2017
Lead Data Scientist
2015 - 2016
Data Scientist at
Verizon Labs
- Inventing and prototyping data-driven TV products. Built system to identify entities and concepts in closed-caption data and link them to Wikipedia-derived concept graph.
- Led team of data scientists and engineers building a recommender system for TV and movies backed by viewing data from millions of customers on a catalog of hundreds of thousands of items. Spark, Scala, Python.
2014 - 2015
Senior Data Scientist at
EcoHealth Alliance
- Led a cross-functional team developing an emerging disease detection dashboard, funded by a Defense Threat Reduction Agency grant. Intensive text analysis and proposition extraction from news reports and social media text fed a machine learning pipeline to identify anomalies in human and animal health. Worked closely with epidemiologists and public health scientists to identify textual markers of potential emerging pandemics. Python, scikit-learn, MeteorJS, Scala, Stanford NLP.
- Annie (https://github.com/ecohealthalliance/annie): A Python NLP annotation framework that creates a unified representation of diverse annotations such as tokenization, n-grams, POS tagging, entity linking, time and date recognition, and proposition extraction. Individual annotators are run over the text, and their results are organized in tiers linked to the original text by byte offset. This facilitates analysis across heterogeneous annotation types, for example allowing questions such as "are there any entities of type PERSON that are the object of the verb 'infect', in a document that references dates in the 1990s and entities of type PLACE with lat/lon within 100 miles of Nairobi?"
- JVM-NLP (https://github.com/ecohealthalliance/jvm-nlp): A Scala API server that handles requests to annotate text documents using Stanford NLP tools, and returns a JSON representation of various annotation layers for consumption by other services.
2008 - 2014
Computational Linguist at
Reverb (formerly Wordnik)
- Responsible for entity / concept disambiguation framework that formed the core document representation for our recommender system for news and web content. Scala, Python.
2004 - 2009
Programmer Analyst at
The University of Chicago
- Machine learning and text mining; full-text search, retrieval, and display. Built PhiloMine, an early application for machine learning on natural language text, allowing humanities researchers to perform classification and clustering experiments using a variety of algorithms via a simple web interface.
2002 - 2003
Web Developer at
eJungle Engineering, San Diego, California
- Dove Apparel: custom, ground-up shopping cart in ASP/SQL.
- Administration: Database (MSSQL, MySQL), Email, IIS, DNS, Windows NT/2000; maintained server in colocation facility
- Development: PHP, ASP, SQL, HTML, Flash, Photoshop, MS Access
- Customer liaison: meeting with clients, managing projects
2001 - 2002
Web Developer at
Baobab Software, Paris, France
- Software interface translation: French to English
- Creating Flash movie help files
- PHP/MySQL applications
2000 - 2002
Web Developer at
The University of Chicago Admissions IT
- Solo project: created the award-winning UC Virtual Tour. Shot and stitched spherical panoramas; pictures and descriptions load dynamically via XML from MySQL (through PHP) into a QuickTime/Flash movie (LiveStage Pro).
- Kiosk Site: touch-screen kiosk programming.
Publications
- with Mark Olsen and Glenn Roe, "Something Borrowed: Sequence Alignment and the Identification of Similar Passages in Large Text Collections", Digital Studies / Le Champ numérique Vol 2, No 1, 2010. (html)
- with Timothy Allen, Stéphane Douard, Charles Cooney, Robert Morrissey, Mark Olsen, Glenn Roe, and Robert Voyer, "Plundering Philosophers: Identifying Sources of the Encyclopédie", Journal of the Association for History and Computing, vol. 13, no. 1, Spring 2010. (html)
- with Shlomo Argamon, Charles Cooney, Mark Olsen, and Sterling Stein, "Gender, Race, and Nationality in Black Drama, 1850-2000: Mining Differences in Language Use in Authors and their Characters", Digital Humanities Quarterly, Spring 2009, Volume 3, Number 2. (html)
- with Shlomo Argamon, Jean-Baptiste Goulain, and Mark Olsen, "Vive la Différence! Text Mining Gender Difference in French Literature", Digital Humanities Quarterly, Spring 2009, Volume 3, Number 2. (html)
- with Robert Morrissey, Mark Olsen, Glenn Roe, and Robert Voyer, "Mining Eighteenth Century Ontologies: Machine Learning and Knowledge Classification in the Encyclopédie", Digital Humanities Quarterly, Spring 2009, Volume 3, Number 2. (html)
Conference Papers
- with Les Henderson, "Sequence Alignment and Similarity in Biology and the Humanities", Digital Humanities and Computer Science 2010, Northwestern University, November 20, 2010
- with Mark Olsen and Glenn Roe, "PAIR: Pairwise Alignment for Intertextual Relations", Annual Meeting of the Society for Digital Humanities / Société pour l'étude des médias interactifs, Carleton University, Ottawa, May 25-27, 2009
- with Charles Cooney, Mark Olsen, Glenn Roe, and Robert Voyer, "Deconstructing Machine Learning: A Challenge for Digital Humanities", Digital Humanities 2008, University of Oulu, Oulu, Finland, June 25-29, 2008
- with Charles Cooney, Mark Olsen, Glenn Roe, and Robert Voyer, "PhiloMine: An Integrated Environment for Humanities Text Mining", Digital Humanities 2008, University of Oulu, Oulu, Finland, June 25-29, 2008
- with Charles Cooney, Mark Olsen, Glenn Roe, and Robert Voyer, "Hidden Roads and Twisted Paths: Intertextual Discovery using Clusters, Classifications, and Similarities", Digital Humanities 2008, University of Oulu, Oulu, Finland, June 25-29, 2008
- with Charles Cooney, Mark Olsen, Glenn Roe, and Robert Voyer, "Feature Creep: Evaluating Feature Sets for Text Mining Literary Corpora", Digital Humanities 2008, University of Oulu, Oulu, Finland, June 25-29, 2008
- with Charles Cooney, Robert Morrissey, Mark Olsen, Glenn Roe, and Robert Voyer, "Re-engineering the tree of knowledge: Vector space analysis and centroid-based clustering in the Encyclopédie", Digital Humanities 2008, University of Oulu, Oulu, Finland, June 25-29, 2008
- with Shlomo Argamon, Mark Olsen, and Sterling Stein, "Gender, Race, and Nationality in Black Drama, 1850-2000: Mining Differences in Language Use in Authors and their Characters", Digital Humanities 2007, University of Illinois, June 2007
- with Shlomo Argamon, Jean-Baptiste Goulain, and Mark Olsen, "Discourse, power and écriture féminine: Text mining gender difference in 18th and 19th century French literature", Digital Humanities 2007, University of Illinois, June 2007
- with Robert Morrissey, Mark Olsen, Glenn Roe, and Robert Voyer, "Mining Eighteenth Century Ontologies: Machine Learning and Knowledge Classification in the Encyclopédie", Digital Humanities 2007, University of Illinois, June 2007
- with Charles Cooney, Mark Olsen, Glenn Roe, and Robert Voyer, "Extending PhiloLogic", Digital Humanities 2007, University of Illinois, June 2007
Invited Presentations
- "'Ever since nineteen, had a perfect rhyme scheme': A corpus study of English rap rhyme", Networks and Network Analysis for the Humanities: Reunion Conference, The University of California at Los Angeles, October 20-22, 2011. (html)
- with Mark Olsen, "Sequence Alignment, Shared Services, and Digital Humanities", Project Bamboo Workshop, Tucson, Arizona, January 2009.
- with Robert Morrissey, "The ARTFL Project: From words to works", The Dilemmas of Digitization, Oxford University, May 22-24, 2008.
Posters
- with Cody Brimhall and Emily Morgan, "A machine learning approach to rhythmic classification of languages", Journal of the Acoustical Society of America, Volume 128, Issue 4, p. 2478 (2010) (html).
Patents
- with Si Ying D. Hu and Suri B. Medapati, "Automatic Media Summary Creation Systems and Methods" (pending), United States 20150300, March 2015
NLP and machine learning techniques for computational media summarization and indexing
- with Timothy Allen and Erin McKean, "Data Mining for Free Range Definitions", United States WORD 1000-1, February 2011
Technique for data mining sentences that best illustrate the meaning of candidate words
Misc
- Winner, 2010 NAACL HLT Poetry Contest :)
- Erdős number 4: Shlomo Argamon → Sarit Kraus → Menachem Magidor → Paul Erdős