ML Engineer

Intermediate Task 1

Download and prepare the data for MIT Indoor Classification. Your final submission should be a Jupyter Notebook. We suggest that you use Google Colab and build your ML pipeline in PyTorch. When you are done, download your notebook and submit it to us at people@310.ai
Create a minimal baseline for the classification task using 20% of the data as validation. Use balanced accuracy as your metric. Name this notebook “Baseline”. In this part, we focus on your code being clean, simple, and understandable. We expect a clear pipeline that covers data loading, model creation, training, and evaluation.
Use any tricks you like (e.g. model architecture, augmentation) to improve the metric. The only method you are not allowed to use is pretraining on other datasets (e.g. ImageNet pretraining). A minimum balanced accuracy of 55% is required. Your submission will be evaluated on a random validation split (not your own split). The focus of this part is to measure your ability to think critically and come up with good ideas to attack a hard problem. Name this notebook “Challenge” in your submission.
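
As a rough illustration of the two requirements above (a 20% validation split and balanced accuracy), here is a minimal sketch in plain Python. The function names stratified_split and balanced_accuracy are hypothetical, and splitting per class (stratification) is an assumption; the task only asks for 20% of the data. Balanced accuracy is computed here as the mean of per-class recall, which is the usual definition.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices), holding out val_fraction of each class.

    Stratifying by class is an assumption; the brief only asks for a 20% split.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train_indices, val_indices = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one validation sample per class.
        n_val = max(1, round(len(indices) * val_fraction))
        val_indices.extend(indices[:n_val])
        train_indices.extend(indices[n_val:])
    return sorted(train_indices), sorted(val_indices)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that appear in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(y_true, y_pred):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)
```

Unlike plain accuracy, this metric does not reward a model for favoring the larger classes: predicting class 0 for the labels [0, 0, 0, 1] gives a plain accuracy of 0.75 but a balanced accuracy of 0.5. The returned index lists could be passed to a dataset wrapper such as torch.utils.data.Subset.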

Read this tutorial: https://pytorch.org/tutorials/beginner/transformer_tutorial.html
Change the task from autoregressive language modeling to predicting masked words. You can mask 10% of the words at random.
Define a reasonable baseline and compare model performance with that.
Try to improve the model performance without using more resources.
Improve the model without any restrictions. You can change the architecture, hyperparameters, etc.
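
The masking step described above could look roughly like the following sketch, written in plain Python so that it does not depend on any particular framework. The function name mask_tokens and the mask_id argument are hypothetical, and the choice to mask each position independently with probability 0.1 (rather than exactly 10% of positions) is an assumption. The value -100 is used for unmasked targets because it is the default ignore_index of torch.nn.CrossEntropyLoss, so only masked positions would contribute to the loss.

```python
import random

# Assumed convention: targets equal to this value are skipped by the loss.
IGNORE_INDEX = -100


def mask_tokens(token_ids, mask_id, mask_prob=0.1, seed=None):
    """Return (inputs, targets) for masked-word prediction.

    Each position is masked independently with probability mask_prob.
    Masked positions get mask_id in the inputs and the original token in the
    targets; every other position keeps its token and gets IGNORE_INDEX.
    """
    rng = random.Random(seed)
    inputs, targets = [], []
    for token in token_ids:
        if rng.random() < mask_prob:
            inputs.append(mask_id)
            targets.append(token)
        else:
            inputs.append(token)
            targets.append(IGNORE_INDEX)
    return inputs, targets
```

The two lists can be converted to tensors before being fed to the model. Since the tutorial's model attends only to earlier positions, its causal attention mask would presumably also need to be dropped so that each masked word can be predicted from context on both sides.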

Read https://github.com/Shen-Lab/TALE
Port the code to PyTorch.
Improve the performance with minor changes to the architecture and without using more resources.
Improve the performance by designing a totally different architecture. You can design your own auxiliary tasks and even bring additional datasets.

Please send an email to people@310.ai with the URL.