This assignment is due on Monday, September 16, 2024, before 11:59 PM. This assignment may be done with a partner.
For this assignment, we’ll be building a text classifier. The goal of our text classifier will be to distinguish between words that are simple and words that are complex. Examples of simple words are heard, sat, feet, shops, and town, and examples of complex words are abdicate, detained, liaison, and vintners. Distinguishing between simple and complex words is the first step in a larger NLP task called text simplification, which aims to replace complex words with simpler synonyms. Text simplification is potentially useful for rewriting texts so that they can be more easily understood by younger readers, people learning English as a second language, or people with learning disabilities.
The learning goals of this assignment are:
We will provide you with training and development data that has been manually labeled. We will also give you a test set without labels. You will build a classifier to predict the labels on our test set. You can upload your classifier’s predictions to Gradescope. We will score its predictions and maintain a leaderboard showing whose classifier has the best performance.
Here are the materials that you should download for this assignment:
This assignment has several deliverables: