Posts

Func-tastic Python: Mastering Functools

There’s your life as a Python user before you knew about functools, and then there’s your life as a Python user after. The same can be said about itertools and collections, but that’s a story for another time. Some people even go as far as saying that functools is life-changing. Seeing that I’m writing this post, it’ll be no surprise that I agree. Functools is part of the Python standard library. The functools documentation itself is quite nice, but I thought I’d try presenting my own take on it....

Pre-commit for Data Scientists

There’s a nifty little Python library called pre-commit that’s doing the rounds on the internet. And it all starts with a little file called .pre-commit-config.yaml. But we’ll get into that later. First, let’s talk about pre-commit hooks in general. As data scientists, we version our code, data and models. While versioning our code, we might use tools like flake8 and black to make sure our code conforms to the PEP 8 guidelines. However, we usually have to run these manually and, more often than not, we forget to do so....

OOPs for Data Scientists

There are four principles of OOPs that everyone is expected to know. I used to be one of those people who thought it was enough to know how to read and write OOPs code without actually knowing what those principles were, since I considered them more of a software engineering concept. But it makes sense to be aware of the common terms and verbalizations of the concepts within OOPs....

Numpy Exercises 1-50

This is a set of NumPy exercises collected by Rougier. All credit to Rougier for curating this list. I am simply trying to solve them for practice and hoping this serves as a reference for others. I am surprised I didn’t come across it before. This is intended to serve as a stepping stone to becoming a better Data Scientist / Machine Learning Researcher....

Demystifying the mathematics behind PCA

We all know PCA and we all love PCA, our friend that helps us deal with the curse of dimensionality. All data scientists have probably used PCA. I thought I knew PCA, until I was asked to explain the mathematics behind it in an interview and all I could murmur was that it somehow maximizes the variance of the new features. The interviewer was even kind enough to throw me a hint about projections....

NHL Data Science Project

Implemented a complete data science pipeline: data collection, data tidying, creation of synthetic features, basic and advanced interactive visualizations using Plotly, model tracking through CometML, and model deployment through a REST API using Docker and Flask.

Crop Harvest Classification

Given meteorological and satellite data, classified land as either crop or non-crop. Used AutoML tools (LightAutoML and PyCaret) as well as blending and stacking to reduce bias and generalize better. Check out the doc link for a more detailed overview of the project.

Weather Events Classification

Classified events as standard background conditions, tropical cyclones, or atmospheric rivers. Used SMOTE and SMOTE Tomek to fix class imbalance, hyperparameter tuning using HalvingRandomGridSearchCV, manual feature engineering, and a plethora of sklearn classification algorithms. Check out the doc link for a more detailed overview of the project.

Anime Project

My current magnum opus. Scraped a ton of images for the top 100 anime, then scraped even more images of the top 10 characters in each anime. The goal is to identify the anime in an image using an ML model and, once that works, to identify the characters in the image as well. WIP.

Whitepaper: Ethically mitigating biases in DS / ML

The paper provides techniques and a checklist to prevent bias from creeping into Machine Learning models. (Co-author)