GetOldTweets3

GetOldTweets3 is a free Python 3 library that lets you scrape data from Twitter without requiring any API keys. It also lets you scrape historical tweets more than one week old, which you cannot do with the standard Twitter API.

Using GetOldTweets3, you can scrape tweets using a variety of search parameters, such as start/end dates, username(s), a text query, and a reference location area. Additionally, you can specify which tweet attributes you would like to include, such as username, tweet text, date, retweets, and hashtags.

Let’s run through some examples to illustrate the different ways we can use the GetOldTweets3 library to extract tweets:

Before you start, you will need to install GetOldTweets3.

!pip install GetOldTweets3

Example 1

Top 100 Recent Tweets from Specified News Sources

# Importing GetOldTweets3
import GetOldTweets3 as got
# Importing pandas
import pandas as pd

def get_tweets(username, top_only, start_date, end_date, max_tweets):

    # specifying tweet search criteria
    tweetCriteria = got.manager.TweetCriteria().setUsername(username)\
                                               .setTopTweets(top_only)\
                                               .setSince(start_date)\
                                               .setUntil(end_date)\
                                               .setMaxTweets(max_tweets)

    # scraping tweets based on criteria
    tweet = got.manager.TweetManager.getTweets(tweetCriteria)

    # creating list of tweets with the tweet attributes
    # specified in the list comprehension
    text_tweets = [[tw.username,
                    tw.text,
                    tw.date,
                    tw.retweets,
                    tw.favorites,
                    tw.mentions,
                    tw.hashtags] for tw in tweet]

    # creating dataframe, with column names listed in the same
    # order as the tweet attributes above
    news_df = pd.DataFrame(text_tweets,
                           columns=['User', 'Text', 'Date', 'Retweets',
                                    'Favorites', 'Mentions', 'Hashtags'])

    return news_df

Now we can run the function based on our desired criteria.

# Defining news sources I want to include
news_sources = ['nytimes', 'bbcbreaking', 'bbcnews', 'bbcworld', 'theeconomist', 'reuters','wsj', 'financialtimes', 'guardian']
# getting tweets from the defined new sources,
# only including top tweets,
# looking at the past week with the end_date not inclusive,
# and specifying that we want a max number of tweets = 100.
# also sorting the tweets by date, descending.
news_df = get_tweets(news_sources,
                     top_only = True,
                     start_date = "2020-04-07",
                     end_date = "2020-04-14",
                     max_tweets = 100).sort_values('Date', ascending=False)
news_df.head()

Here is a screenshot of this output.
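Since get_tweets returns an ordinary pandas DataFrame, you can slice and summarize the feed with standard pandas operations. Here is a minimal, self-contained sketch; the dummy rows below stand in for the scraped news_df:

```python
import pandas as pd

# Dummy rows standing in for the scraped news_df
news_df = pd.DataFrame({
    'User': ['nytimes', 'reuters', 'nytimes'],
    'Text': ['headline a', 'headline b', 'headline c'],
    'Retweets': [120, 45, 300],
})

# How many tweets did we get from each source?
counts = news_df['User'].value_counts()

# Most-retweeted tweet per source
top = news_df.sort_values('Retweets', ascending=False).groupby('User').head(1)
print(counts)
print(top)
```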

Let’s try something: exporting this as an HTML file so we can view each tweet’s text in its entirety in our browser…

# exporting as an html file
news_df.to_html('news_twitter_summary.html')

Here is a screenshot of what it looks like when opened in my browser:

Now we have a personal news feed. Pretty cool, right?!
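to_html is handy for browsing, but if you plan to analyze the tweets later, a CSV is easier to reload with pandas. A quick sketch (the filename simply mirrors the HTML one above and is just an example):

```python
import pandas as pd

# Dummy one-row frame standing in for news_df
df = pd.DataFrame({'User': ['bbcnews'], 'Text': ['example headline']})

# exporting as a CSV file; index=False drops the row-number column
df.to_csv('news_twitter_summary.csv', index=False)

# reading it back to confirm the round trip
back = pd.read_csv('news_twitter_summary.csv')
print(back.shape)
```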

Example 2

First 1000 tweets that include the word “flood” in Wisconsin and its surrounding area, from 7/18/19 through 7/20/19.

(I used the following code in a recent project, in which I used social media to map floods in the US. Using data from FEMA, I identified periods with and without definite flooding in a variety of states, and then scraped Twitter using GetOldTweets3. After cleaning the data and performing NLP, I converted the text data to numeric features and trained a classifier to identify which “flood” tweets corresponded to an actual US flood.)
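The “text to numeric” step in a project like that can be done many ways; scikit-learn’s TfidfVectorizer plus a logistic regression is one common pattern. The library choice and the toy tweets below are my own illustration, not the exact pipeline from the project:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy examples: 1 = tweet from a confirmed flood period, 0 = figurative use
texts = ["flood waters rising downtown",
         "watching a flood of new movies tonight",
         "river flood warning issued for the county",
         "what a flood of emails this morning"]
labels = [1, 0, 1, 0]

# Convert tweet text to tf-idf features, then fit a simple classifier
vec = TfidfVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

# Classify an unseen tweet
pred = clf.predict(vec.transform(["flash flood closes highway"]))
print(pred)
```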

# Importing GetOldTweets3
import GetOldTweets3 as got
# Importing pandas
import pandas as pd
def get_tweets(state, startdate, enddate, maxtweet):

    # specifying tweet search criteria
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch("Flood")\
                                               .setSince(startdate)\
                                               .setUntil(enddate)\
                                               .setNear(state)\
                                               .setWithin("500mi")\
                                               .setMaxTweets(maxtweet)

    # scraping tweets based on criteria
    tweet = got.manager.TweetManager.getTweets(tweetCriteria)

    # creating list of tweets with the desired tweet attributes
    text_tweets = [[tw.username,
                    tw.text,
                    tw.date,
                    tw.retweets,
                    tw.favorites,
                    tw.mentions,
                    tw.hashtags,
                    tw.geo] for tw in tweet]

    # column names listed in the same order as the attributes above
    df_state = pd.DataFrame(text_tweets,
                            columns=['User', 'Text', 'Date', 'Retweets',
                                     'Favorites', 'Mentions', 'Hashtags',
                                     'Geolocation'])

    return df_state

Now we can run the function based on our desired criteria.

df_1 = get_tweets('Wisconsin', "2019-07-18", "2019-07-20", 1000)
df_1.head()

Here is a screenshot of the output:
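Before doing any NLP on a scrape like this, it is worth deduplicating: retweets and multi-posted tweets show up as identical Text values. A small self-contained sketch, with dummy rows in place of df_1:

```python
import pandas as pd

# Dummy rows standing in for the scraped df_1
df_1 = pd.DataFrame({
    'Text': ['Flood warning in Madison',
             'Flood warning in Madison',
             'Heavy rain tonight'],
    'Retweets': [10, 10, 2],
})

# drop exact-duplicate tweet texts and normalize case for NLP
deduped = df_1.drop_duplicates(subset='Text').reset_index(drop=True)
deduped['Text'] = deduped['Text'].str.lower()
print(len(deduped))  # 2
```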

Last update: April 13, 2020

Resources


LinkedIn: www.linkedin.com/in/andreayoss

Andrea Yoss
