Language is the core of human communication, so it’s no surprise that a lot of data comes in the form of text. Customer emails, feedback forms, documents, and reports all contain useful textual information. Text mining techniques allow you to turn this text into useful insights quickly, for example by detecting customer sentiment or identifying important topics and keywords.
This training provides a foundation in text mining by focusing on the problem of text classification. We will take a look at how to use the Bag-of-Words approach to turn unstructured text into structured data that can be used to train classifiers with standard machine learning models. We will also discuss some variations on this approach that can help improve the performance of our models.
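To make the Bag-of-Words idea concrete, here is a minimal sketch in plain Python. The example sentences and the helper names (`tokenize`, `build_vocabulary`, `bag_of_words`) are illustrative assumptions, not material from the training itself. Each document becomes a vector of word counts over a shared vocabulary, which is the kind of structured input a standard classifier can be trained on.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = {token for doc in documents for token in tokenize(doc)}
    return {token: index for index, token in enumerate(sorted(tokens))}


def bag_of_words(text, vocabulary):
    """Return one count per vocabulary token; unknown words are ignored."""
    counts = Counter(tokenize(text))
    # Dicts preserve insertion order, so columns follow the vocabulary order.
    return [counts.get(token, 0) for token in vocabulary]


# Hypothetical example documents (not from the training data).
documents = [
    "The delivery was fast and the product is great",
    "The product broke and the support was slow",
]

vocabulary = build_vocabulary(documents)
matrix = [bag_of_words(doc, vocabulary) for doc in documents]

for row in matrix:
    print(row)
```

Word order is discarded: only the counts per token remain. In practice, a library vectorizer such as scikit-learn's `CountVectorizer` would typically replace these hand-written helpers.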
After this training, you should be able to:
- Do basic preprocessing of textual data
- Understand how to use and refine Bag-of-Words features
- Choose classifiers that are well suited to those features
Prerequisites:
- Basic knowledge of the Pandas library
- Basic knowledge of Scikit-Learn
- Jupyter Notebooks