Hands-on Tutorials

Building CNN classifiers with the Apache MXNet Gluon framework to classify articles of clothing.

Authors: Andrea Yoss and Caroline Harrison


Since many of the best models use millions of training instances and take weeks to train on powerful hardware, it is difficult for the everyday deep learning enthusiast to train comparable models from scratch. Fortunately, we can incorporate parts of those models into a completely different, domain-specific model.

By using a pre-trained model, one can effectively transfer the learning from one model to another. This technique, known as transfer learning, is often used for domain adaptation and for improving the accuracy of a model that is going to be trained on…

How to bypass the limitations of the official Twitter API when scraping Twitter data.

GetOldTweets3 is a free Python 3 library that allows you to scrape data from Twitter without requiring any API keys. It also allows you to scrape historical tweets more than a week old, which you cannot do with the Twitter API.

Using GetOldTweets3, you can scrape tweets using a variety of search parameters, such as start and end dates, username(s), a text query, and a reference location. Additionally, you can specify which tweet attributes you would like to include. These attributes include username, tweet text, date, retweets, and hashtags.

Let’s run through some examples to illustrate the different ways we can use…

Conceptual deep dive with step-by-step implementation in numpy and sklearn.

“Finding patterns is easy in any kind of data-rich environment… the key is in determining whether the patterns represent noise or signal.”

— Nate Silver


Bias-Variance Tradeoff

A typical issue students run into when fitting a model is balancing the model’s bias with its variance, known as the bias-variance tradeoff.

Bias is essentially a measure of “badness”: the higher the bias, the worse your model performs on the very data it was trained on. Typically, a model has high bias and is considered “underfit” when it performs poorly on the training data because it does not have enough…
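To make the idea of high bias concrete, here is a minimal sketch rather than the article's own implementation. It uses only the Python standard library instead of numpy or sklearn, and the data, the noise level, and the helper names (`fit_constant`, `fit_line`, `mse`) are all assumptions made up for illustration. It fits a constant model and a straight line to noisy linear data, then compares their errors on the training data itself.

```python
import random


def fit_constant(xs, ys):
    """Fit the simplest possible model: always predict the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic training data (an assumption): a linear signal plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

constant_mse = mse(fit_constant(xs, ys), xs, ys)
line_mse = mse(fit_line(xs, ys), xs, ys)

print(f"training MSE, constant model: {constant_mse:.3f}")
print(f"training MSE, linear model:   {line_mse:.3f}")
```

The constant model cannot represent the slope in the data, so its training error stays well above that of the line. That gap on the training data is what a high-bias, underfit model looks like in practice.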

A brief look at COVID-19 infection rates in South Korea and Italy.

COVID-19 Situation Overview

According to the World Health Organization (WHO), COVID-19, or “coronavirus disease 2019,” is a respiratory illness caused by a newly discovered coronavirus that is believed to have originated in Wuhan City, China, in December 2019. The virus has since spread to most of the globe, with confirmed cases increasing every day.

Using Seasonal Decomposition to Inform SARIMA Model Selection for Soybean Prices in Python.

There are a variety of approaches you can use when working with time series data, such as linear models, ARIMA models, exponential smoothing methods, and recurrent neural networks (RNNs). In this post, I will focus on an extension of the ARIMA model that also accounts for seasonality in the data: the SARIMA model.

Brief Overview of SARIMA Model

A SARIMA model consists of four pieces:

  • Seasonal (S): Accounts for seasonality that occurs over a fixed time period.
  • Autoregressive (AR): Accounts for any long-term trends in the data by regressing future values on past values.
  • Integrated (I): Ensures stationarity in the data, which is an…
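The Integrated (I) piece is commonly implemented by differencing the series, and the Seasonal (S) piece by differencing at the seasonal lag. The following is a minimal standard-library sketch of that idea, not the post's own code: the helper name `difference` and the toy series (a trend of 2 per step plus a pattern repeating every 4 steps) are assumptions made up for illustration.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Toy data (made up): linear trend plus a pattern that repeats every 4 steps.
seasonal_pattern = [3, -1, 0, 2]
series = [2 * t + seasonal_pattern[t % 4] for t in range(12)]

# Seasonal differencing at lag 4 cancels the repeating pattern,
# leaving only the trend's contribution over one full season.
seasonal_diff = difference(series, lag=4)

# A further first difference removes what remains of the trend.
fully_diff = difference(seasonal_diff, lag=1)

print(seasonal_diff)  # [8, 8, 8, 8, 8, 8, 8, 8]
print(fully_diff)     # [0, 0, 0, 0, 0, 0, 0]
```

Real price data will not difference down to exact constants like this toy series, but the same two operations are what make a trending, seasonal series closer to stationary before the AR and MA terms are fitted.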

Andrea Yoss
