This event has passed.

ICPSR – Machine Learning for the Analysis of Text as Data

Name: ICPSR – Machine Learning for the Analysis of Text as Data
Start: 2018-07-16T09:00:00-04:00
End: 2018-07-20T17:00:00-04:00
Location: 219 Davis Library

July 16, 2018 @ 9:00 am - July 20, 2018 @ 5:00 pm

Instructor(s):

Brice Acree, Ohio State University

Quantitative analysis of digitized text represents an exciting and challenging frontier of data science across a broad spectrum of disciplines. From the analysis of physicians’ notes to identify patients with diabetes, to the assessment of global happiness through the analysis of speech on Twitter, patterns in massive text corpora have led to important scientific advancements. In this course we will cover several central computational and statistical methods for the analysis of text as data. Topics will include the manipulation and summarization of text data, dictionary methods of text analysis, prediction and classification with textual data, document clustering, text reuse measurement, and statistical topic models. Each method will be illustrated with hands-on examples using R. Participants will develop an understanding of the challenges and opportunities presented by the analysis of text as data, as well as the practical computational skills to complete independent analyses. The R packages covered in this course include tm, lda, textreuse, glmnet and openNLP.

One distinguishing focus of this course will be the use of text analytics to develop and test scientific theory reliably and validly. Most methods of text analysis were developed with predictive or descriptive motivations. For each method covered in this course, we will review how it has been, and can be, applied to draw theoretical inferences about the processes that generate text.

Prerequisites: Participants should be familiar with linear and generalized linear models (e.g., logit and Poisson regression) and should have at least some exposure to the R environment before the workshop. The class will review aspects of R on the first day. No prior knowledge of text processing or modeling is assumed.

Fee: $1,700 for members; $3,200 for non-members

For registration details, click here.

Details

Start: July 16, 2018 @ 9:00 am
End: July 20, 2018 @ 5:00 pm
Event Category: Short Course

Organizer

odum_bull2

Venue

219 Davis Library
208 Raleigh Street
Chapel Hill, NC 27514, United States
Phone: (919) 962-1151

Related Events

Google Earth Engine for Urban Studies (Online)

Version Control with Git and GitHub (Online)