How to bypass the limitations of the official Twitter API when scraping Twitter data.

GetOldTweets3 is a free Python 3 library that lets you scrape data from Twitter without any API keys. It also lets you scrape historical tweets more than one week old, which the standard Twitter search API does not allow.

Using GetOldTweets3, you can scrape tweets using a variety of search parameters, such as start and end dates, username(s), a text query, and a reference location. Additionally, you can specify which tweet attributes you would like to include. Available attributes include username, tweet text, date, retweets, and hashtags.

Let’s run through some examples to illustrate the different ways we can use the GetOldTweets3 library to extract tweets.

Before you start, you will need to install GetOldTweets3.
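GetOldTweets3 is published on PyPI, so running `pip install GetOldTweets3` in a terminal is the usual route. If you would rather stay inside Python (in a notebook, for example), here is a minimal sketch that installs the package only when it is missing. The helper name `ensure_package` is made up for illustration and is not part of the library.

```python
import importlib.util
import subprocess
import sys


def ensure_package(name="GetOldTweets3"):
    """Install `name` with pip if it cannot be imported.

    Returns True if an install was run, False if the package was already there.
    `ensure_package` is a hypothetical helper, not part of GetOldTweets3.
    """
    if importlib.util.find_spec(name) is not None:
        return False
    # Use the current interpreter's pip so the package lands in this environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", name])
    return True
```

Call `ensure_package()` once at the top of your script or notebook, then `import GetOldTweets3 as got` as usual.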

Example 1

Top 100 Recent Tweets from Specified News Sources

Now we can run the function based on our desired criteria.
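Here is a minimal sketch of what such a function and call could look like, assuming the `TweetCriteria` and `TweetManager` interface described in the GetOldTweets3 README. The function names, the `main` wrapper, and the three news handles are illustrative placeholders, not the original code. The library is imported inside the scraping function so the small helper that flattens tweets can be used without it.

```python
def tweets_to_rows(tweets):
    """Flatten tweet objects into dictionaries with date, username and text."""
    return [
        {"date": t.date, "username": t.username, "text": t.text}
        for t in tweets
    ]


def username_tweets(usernames, max_tweets):
    """Fetch up to `max_tweets` recent tweets posted by the given usernames."""
    # Third-party library; imported here so tweets_to_rows works without it.
    import GetOldTweets3 as got

    criteria = (
        got.manager.TweetCriteria()
        .setUsername(usernames)    # a single name or a list of names
        .setMaxTweets(max_tweets)  # total across all usernames
    )
    tweets = got.manager.TweetManager.getTweets(criteria)
    return tweets_to_rows(tweets)


def main():
    # Placeholder handles; swap in the news sources you want to follow.
    news_sources = ["nytimes", "BBCWorld", "Reuters"]
    for row in username_tweets(news_sources, max_tweets=100):
        print(row["date"], row["username"], row["text"])
```

Calling `main()` sends live requests to Twitter, so the results depend on the site still serving the search pages the library relies on.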

Here is a screenshot of this output.

Let’s try something else: exporting this as an HTML file so we can read the full tweet text in our browser.
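One way to do the export with only the standard library is sketched here, under the assumption that the tweets are held as a list of dictionaries with `date`, `username`, and `text` keys. The function names and the file name `news_feed.html` are placeholders. If your tweets live in a pandas DataFrame instead, `DataFrame.to_html` does a similar job.

```python
import html
from pathlib import Path


def rows_to_html(rows, columns):
    """Render a list of dictionaries as a simple HTML table, one row per tweet."""
    header = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    body = "\n".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row[col]))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Tweets</title></head>\n'
        f'<body><table border="1">\n<tr>{header}</tr>\n{body}\n</table></body></html>\n'
    )


def save_html(rows, columns, path="news_feed.html"):
    """Write the table to `path` (a placeholder file name) as UTF-8."""
    Path(path).write_text(rows_to_html(rows, columns), encoding="utf-8")
```

Escaping each cell with `html.escape` keeps characters such as `<` and `&` in tweet text from being interpreted as markup. After `save_html(rows, ["date", "username", "text"])`, open the file in any browser.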

Here is a screenshot of what it looks like when opened in my browser:

Now we have a personal news feed. Pretty cool, right?!

Example 2

The first 1,000 tweets containing the word “flood” that were posted in Wisconsin and its surrounding area from 7/18/19 through 7/20/19.

(I used the following code in a recent project on using social media to map floods in the US. Using data from FEMA, I identified periods with and without confirmed flooding in a variety of states, and then scraped Twitter using GetOldTweets3. After cleaning the data and performing NLP, I converted the text to numeric features and trained a classifier to identify which “flood” tweets corresponded to an actual US flood.)

Now we can run the function based on our desired criteria.
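Below is a minimal sketch of a text-query search for this example, assuming the `setQuerySearch`, `setNear`, `setWithin`, `setSince`, and `setUntil` setters described in the GetOldTweets3 README. The function names and the `500mi` radius are assumptions for illustration, not the values from the original project. The README describes the upper date bound as not included, so the sketch adds one day to the last day we want.

```python
from datetime import date, timedelta


def exclusive_until(last_day):
    """Return the day after `last_day` ("YYYY-MM-DD") for use as an exclusive bound."""
    return (date.fromisoformat(last_day) + timedelta(days=1)).isoformat()


def text_query_tweets(query, place, radius, first_day, last_day, max_tweets):
    """Fetch up to `max_tweets` tweets matching `query` near `place` in a date range."""
    # Third-party library; imported here so exclusive_until works without it.
    import GetOldTweets3 as got

    criteria = (
        got.manager.TweetCriteria()
        .setQuerySearch(query)
        .setNear(place)
        .setWithin(radius)                    # e.g. "500mi" (assumed radius)
        .setSince(first_day)
        .setUntil(exclusive_until(last_day))  # upper bound assumed exclusive
        .setMaxTweets(max_tweets)
    )
    tweets = got.manager.TweetManager.getTweets(criteria)
    return [{"date": t.date, "text": t.text} for t in tweets]


def main():
    rows = text_query_tweets(
        "flood", "Wisconsin", "500mi", "2019-07-18", "2019-07-20", 1000
    )
    for row in rows:
        print(row["date"], row["text"])
```

Calling `main()` performs the live search. Widen or narrow the radius to control how much of the surrounding area is included.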

Here is a screenshot of the output:

Last update: April 13, 2020