Working with Messy Data (Online)
April 21 @ 10:00 am - 4:00 pm
This course will be offered via Zoom only and will not be recorded.
When working with data, one thing is fairly certain: data is rarely in an optimal format. A misplaced space here or an extra comma there can mean the difference between two clicks and two hours of work. In this course, we will explore ways to isolate, extract, and transform data from webpages, text files, and published datasets using Python and Pandas. This class will also introduce regular expressions, a language for describing and matching specific patterns in text.
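As a small taste of the kind of cleanup described above, here is a minimal sketch using Python's built-in re module. It is not taken from the course materials: the messy values and the clean_number helper are hypothetical, and it assumes every value is meant to be a non-negative number written with only stray spaces, commas, or symbols around the digits.

```python
import re

# Hypothetical messy values: stray spaces, thousands separators,
# a trailing comma, and a currency symbol.
raw = ["  1,234 ", "56,", " 7 890", "$12.50"]


def clean_number(text: str) -> float:
    """Hypothetical helper: drop every character that is not a digit
    or a decimal point, then convert what remains to a float."""
    digits = re.sub(r"[^0-9.]", "", text)
    return float(digits)


cleaned = [clean_number(value) for value in raw]
print(cleaned)  # [1234.0, 56.0, 7890.0, 12.5]
```

The pattern [^0-9.] matches any single character that is not a digit or a period, and re.sub replaces each match with an empty string. This simple approach would mishandle negative numbers or European-style decimal commas, which is exactly the kind of edge case that makes messy data time-consuming.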
-Why take this course?
The tools for handling data are often complicated, and their learning curves can be steep. In this course, we will cover a range of techniques using industry-standard data-processing tools. If you are curious about how to handle tabular data but feel intimidated by the prospect of programming, this class will get you started on the path toward better data management.
-What will participants learn?
Participants in this course will learn basic and intermediate Python programming and scripting as they pertain to importing and exporting data. We will also cover some of the libraries used for mathematical and statistical analysis, as well as text processing with regular expressions.
-Prerequisites and requirements:
This course is intended for data scientists with a basic-to-intermediate understanding of one or more of the following: the Python programming language, data import/export formats, text processing, and statistical analysis. Participants will need a computer with a working installation of Python 3, a text editor that supports regular expressions, and a web browser with internet access. We will be using Anaconda Individual Edition for Python 3 and Sublime Text.
Brown Biggers is the IT Operations Manager for the UNC Greensboro University Libraries. He holds a master’s degree in computer science, and has over 18 years of systems and network management experience in academic, public, and private sectors. His current research interests include natural language processing, text mining, data visualization, and social media crisis analytics.
– UNC CH Students: $0, with a $25 deposit to hold your spot (the deposit is refunded if you attend at least 66% of the course)
– UNC CH Faculty/Staff/Postdoc: $40
– Non-UNC CH: $40
This class will be offered via Zoom ONLY. Registration closes at 12:01 am on 4/18/2021. No late registrations will be accepted after that time. NO EXCEPTIONS!
* Cancellation/Refund Policy: A full refund will be given to those who cancel their registration no later than 10 days before the course. If you cancel within 10 days of the class, no refund will be given. Please allow 30 days to receive your refund.
The Zoom link for this course will be sent before the course date. You must register at least 3 days before the course date to receive the Zoom link.