- This event has passed.
Introduction to Big Data and Machine Learning for Survey Researchers and Social Scientists with Trent Buskirk
November 8, 2018 @ 9:00 am - 4:30 pm
The amount of data generated as a by-product in society is growing fast, including data from satellites, sensors, transactions, social media and smartphones, to name a few. Such data are often referred to as “big data” and can be used to create value in areas such as health and crime prevention, commerce and fraud detection.
An emerging practice in many areas is to append or link big data sources with more specific and smaller scale sources that often contain much more limited information. This practice has been used for some time by survey researchers in constructing frames by appending auxiliary information that is often not directly available on the frame, but can be obtained from an external source.
For survey researchers, Big Data has the potential to go beyond the sampling phase, and in fact to influence the social sciences in general. Big Data is of interest to public opinion researchers and to agencies that produce statistics, which are looking for alternative data sources to reduce costs, to improve estimates or to produce estimates in a more timely fashion.
However, Big Data pose several interesting and new challenges to survey researchers and others who want to extract information from data. As Robert Groves (2012) pointedly commented, the era is “appropriately called Big Data and not Big Information”, because there is a lot of work for analysts before information can be gained from “auxiliary traces of some process that is going on in society.”
In this course we explore how Big Data concepts, processes and machine learning methods can be used within the context of Survey and Social Science Research. Throughout this course we will illustrate key concepts using specific survey research examples including tailored survey designs and nonresponse adjustments and evaluation. This course will offer participants:
• an overview of key Big Data terminology and concepts
• an introduction to common data generating processes
• a discussion of some primary issues with linking Big Data with Survey Data
• issues of coverage and measurement errors within the Big Data context
• a discussion of information extraction and signal detection in the context of Big Data
• a discussion of the similarities and differences in model building for inference versus prediction
• an overview of four popular machine learning methods including k-means clustering, hierarchical clustering, classification and regression trees and random forests using R with example code provided
• a discussion and illustration of how these and other methods can be used in the survey research process
This course will count as 7.0 CPSM short course credits.
Instructor: Trent D. Buskirk
Trent D. Buskirk, PhD, received his doctorate in Statistics from Arizona State University in 1999 with an emphasis in Survey Sampling. Since that time Trent has developed expertise and extensive experience in sampling statistics and in survey and data collection methodology, with specific expertise in Mobile Survey Designs.
His research interests include dual frame weighting for smartphone surveys as well as mode effects related to smartphone surveys, online and in-person surveys and the use of data mining methods for predicting nonresponse in ABS samples using auxiliary frame data. Dr. Buskirk has also conducted research using both probability-based and non-probability-based panels in the context of nonresponse bias adjustments, mode effect evaluation and coverage issues.
Trent is currently the President of the Midwest Association of Public Opinion Research and the Publications Officer for the Survey Research Methods Section of the American Statistical Association. Prior to joining MSG, Dr. Buskirk conducted research and development at The Nielsen Company and prior to that was an Associate Professor of Biostatistics at the Saint Louis University School of Public Health. Dr. Buskirk also served as a faculty member in the Survey Research and Methodology Program and Statistics Unit at University of Nebraska-Lincoln.
When Trent is not working or thinking about surveys, sampling, smartphones and research in general, you can find him playing resident prince to his two princesses or playing an action-packed game of pickleball!
Register here.
Registrations will not be accepted on or after 11/5/18
Registration Fees:
- CPSM Students – $40
- UNC Students – $65
- Others – $90
Cancellation/Refund Policy: A full refund will be given to those who cancel their registration no later than 10 days prior to the course. If you cancel within the 10 days prior to the class, no refund will be given. Please allow 30 days to receive your refund.
Waitlist/Walk-ins: There may be a waitlist for the courses. Walk-ins will not be accepted. Each attendee must register and pay more than 3 days before the start of the course.