pytrends Package Tutorial
This data tutorial explains how to install and use the pytrends package for Python, which can automatically download reports from Google Trends. It covers the basic functions of pytrends and shows how to use them to extract data from Google Trends. It also demonstrates how pytrends can be used to study the volume of Google searches for depression over the past 5 years, as a starting point for speculating whether the COVID-19 pandemic led people to make such searches on their own behalf.
Installation Instructions
To install the package, run the following command in the terminal:
pip install pytrends
After installation, we will import the necessary packages in Python:
# load pytrends and pandas
import pytrends
import pandas as pd
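To confirm that the installation succeeded before going further, one option is to query the installed package versions. The sketch below uses only the standard library; the helper name installed_version is made up for this illustration and is not part of pytrends.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the status of the two packages this tutorial relies on.
for name in ("pytrends", "pandas"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If either line reports "not installed", rerun the pip command above in the same environment that runs your Python code.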
Getting Started
Now that you have pytrends installed, let's begin with importing the necessary libraries:
# google_trends.py
import pandas as pd
from pytrends.request import TrendReq
Step 1) Setting up Pytrends
After importing the necessary libraries, including TrendReq from pytrends.request, we can now create a TrendReq object. The TrendReq class is the main interface through which we can access and use Google Trends data.
# creating a TrendReq Object
pytrends = TrendReq(hl='en-US', tz=360)
TrendReq(...) initializes a new instance of the TrendReq class, allowing us to make requests to Google Trends.
hl='en-US': this parameter specifies the language and region for the trends data. hl stands for host language, and en-US specifies English as used in the US region. If we wanted Korean for the Korean region, we would input ko-KR as the language-region code.
tz=360: this parameter specifies the timezone offset in minutes. The pytrends documentation gives 360 as the value for US Central Standard Time (UTC-6:00), so the sign is the reverse of the usual UTC offset notation. Adjusting this value changes the timezone used for the request.
With this object, we will now be able to search for trending keywords.
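The tz value is easy to get wrong because of its sign. The pytrends documentation lists US Central Standard Time (UTC-6) as 360, which suggests the value counts minutes behind UTC. The helper below is a minimal sketch under that assumption; trends_tz is a hypothetical name, not a pytrends function.

```python
def trends_tz(utc_offset_hours):
    """Convert a UTC offset in hours to the minutes-based tz value.

    Assumes the sign convention implied by the pytrends example
    (US CST, UTC-6, is given as 360), i.e. tz = -offset * 60.
    """
    return int(round(-utc_offset_hours * 60))


print(trends_tz(-6))  # US Central Standard Time (UTC-6)
print(trends_tz(0))   # UTC
print(trends_tz(9))   # Korea Standard Time (UTC+9), under the same assumption
```

The result can then be passed as the tz argument when creating the TrendReq object.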
Step 2) Searching for Keywords
To search for the trends data of a certain keyword, we begin by creating a variable and assigning a string value to it. This string will be our keyword. For the purpose of this tutorial, let us set our keyword to "depression" and assign it to a variable named keyword.
# Defining the Keyword
keyword = 'depression'
After creating our keyword variable, we must build a payload. Payloads are data sent in a request to a server to specify what information you want to retrieve, or what action you want to perform. The payload is a combined bundle of requests that are directed at the API.
In our payload, we want to specify that we are searching for a keyword, in a certain category (cat), over a select timeframe, in a specific geographical location (geo), optionally filtered to a particular Google property such as image search, news, or YouTube (gprop).
For our example, let us specify that we are searching for the keyword depression in no specific category, over the past 5 years, worldwide, using the default Google property (web search).
# Build the payload
pytrends.build_payload([keyword], cat=0, timeframe='today 5-y', geo='', gprop='')
The cat value is set to 0 so that the search is not restricted to a specific category, such as a particular field of research.
The timeframe value is set to today 5-y. This allows us to analyze trends for our keyword over the past 5 years.
By leaving the geo parameter empty, we are asking for worldwide data rather than data for a specific geographical region. To limit the search to one region, pass the corresponding region code (for instance, geo='FR' returns data for France).
The gprop parameter is also left empty, which according to the pytrends documentation defaults to web searches. To focus on YouTube trends instead, pass gprop='youtube'.
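To make the effect of these parameters concrete, the sketch below collects them into a dictionary and checks gprop against the values listed in the pytrends README. The names VALID_GPROPS, payload_args, and fetch_interest are hypothetical helpers written for this illustration. fetch_interest is defined but not called here, because it needs pytrends installed and network access.

```python
# gprop values listed in the pytrends README; '' is documented as web search.
VALID_GPROPS = {"", "images", "news", "youtube", "froogle"}


def payload_args(timeframe="today 5-y", geo="", gprop="", cat=0):
    """Return keyword arguments for build_payload, rejecting unknown gprop values."""
    if gprop not in VALID_GPROPS:
        raise ValueError(f"unknown gprop {gprop!r}; expected one of {sorted(VALID_GPROPS)}")
    return {"cat": cat, "timeframe": timeframe, "geo": geo, "gprop": gprop}


def fetch_interest(keyword, **overrides):
    """Hypothetical wrapper: build a payload and return interest over time."""
    from pytrends.request import TrendReq  # imported here so the sketch loads without pytrends

    client = TrendReq(hl="en-US", tz=360)
    client.build_payload([keyword], **payload_args(**overrides))
    return client.interest_over_time()


# Same search as in the tutorial, but limited to YouTube searches from France.
france_youtube = payload_args(geo="FR", gprop="youtube")
print(france_youtube)
```

With pytrends installed, fetch_interest('depression', geo='FR', gprop='youtube') would issue that request and return the resulting DataFrame.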
With the payload created, we will now retrieve the interest-over-time values from Google Trends for our keyword.
To do so, we will use the pytrends.interest_over_time() method, which returns interest-over-time data for the specified keyword. We store the result in a variable named data, which is a pandas DataFrame containing timestamps and the corresponding interest values for the keyword.
data = pytrends.interest_over_time()
Let's take a brief look at our DataFrame. The following line of code prints the first five rows of data.
print(data.head())
            depression  isPartial
date
2019-10-13          86      False
2019-10-20          87      False
2019-10-27          81      False
2019-11-03          85      False
2019-11-10          85      False
isPartial = True: This means that the data for the given keyword and time period is incomplete. This usually applies to the most recent time period, for which Google Trends does not yet have a full set of data. All five rows shown above are complete (isPartial is False).
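Before plotting or computing statistics, it can be useful to set incomplete rows aside. The sketch below shows one way to do this with pandas. It runs on a small hand-made DataFrame with the same column layout as the output above. The values are invented for illustration and are not real Google Trends data, and drop_partial is a hypothetical helper name.

```python
import pandas as pd

# Made-up sample with the same layout as the interest_over_time() result.
sample = pd.DataFrame(
    {"depression": [70, 72, 68], "isPartial": [False, False, True]},
    index=pd.to_datetime(["2024-01-07", "2024-01-14", "2024-01-21"]),
)
sample.index.name = "date"


def drop_partial(df):
    """Keep only complete rows and remove the isPartial column, if present."""
    if "isPartial" not in df.columns:
        return df.copy()
    complete = df[~df["isPartial"].astype(bool)]
    return complete.drop(columns="isPartial")


clean = drop_partial(sample)
print(clean)
```

Applied to the tutorial's data, this would be clean = drop_partial(data), leaving only the keyword column for rows that Google Trends reports as complete.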
Step 3) Analyzing our data
To observe the search trends for our keyword over the past 5 years, we can create a graph of the interest-over-time values stored in our DataFrame.
Import Matplotlib
The Matplotlib library is commonly used for creating visualizations in Python. To import the pyplot module of the library, which provides the plotting functions, run the following code.
# importing matplotlib
import matplotlib.pyplot as plt
By importing pyplot as plt, a common alias for pyplot, we are able to access its functions easily.
Creating a Figure
In order to create a figure for plotting, let's run the following code:
# creating the figure
plt.figure(figsize=(20,10))
<Figure size 2000x1000 with 0 Axes>
The plt.figure(...) line creates a new figure for plotting. The figsize=(20,10) argument specifies that the figure will be 20 inches wide and 10 inches tall. The values can be adjusted to improve visibility.
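The "2000x1000" in the figure output follows from the figure size in inches multiplied by the resolution in dots per inch (dpi). The sketch below reproduces that arithmetic. It sets dpi=100 explicitly, which is an assumption chosen to match the output above, and selects the non-interactive Agg backend so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# 20 x 10 inches at 100 dots per inch.
fig = plt.figure(figsize=(20, 10), dpi=100)

width_in, height_in = fig.get_size_inches()
width_px = int(width_in * fig.dpi)
height_px = int(height_in * fig.dpi)
print(f"{width_px}x{height_px} pixels")

plt.close(fig)  # release the figure once we are done with it
```

Halving figsize to (10, 5) at the same dpi would give a 1000x500 pixel figure.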
Plot the Data
Now that we have our figure, let's plot our data from our DataFrame.
plt.figure(figsize=(20,10))
plt.plot(data.index, data[keyword], marker='o')
[<matplotlib.lines.Line2D at 0x7f64eb57d100>]
plt.plot(...): This function creates a line plot.
data.index: The index of the DataFrame data contains the timestamps (as dates) for the interest values. data.index specifies the x-axis of our plot.
data[keyword]: Here we retrieve the interest values of our keyword 'depression' from the DataFrame. This defines the y-axis of our plot.
marker='o': This argument adds markers to our plot to improve the visibility of individual values along the line. The value 'o' makes the markers circles.
Adding a Title
Let's not forget to add a title. Run the following code:
plt.figure(figsize=(20,10))
plt.plot(data.index, data[keyword], marker='o')
plt.title(f'Interest Over Time for "{keyword}"')
Text(0.5, 1.0, 'Interest Over Time for "depression"')
plt.title(...) is a function that sets the title of our plot.
f'Interest Over Time for "{keyword}"': This is a formatted string literal (f-string) that automatically reflects any change made to our keyword variable. f-strings are generally more concise and readable than other formatting methods.
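For comparison, the sketch below builds the same title with an f-string and with two older formatting styles. All three produce identical text. The f-string simply keeps the variable name inline where it is used.

```python
keyword = "depression"

# f-string: the variable is referenced directly inside the literal.
title_fstring = f'Interest Over Time for "{keyword}"'

# str.format: placeholder in the text, value passed separately.
title_format = 'Interest Over Time for "{}"'.format(keyword)

# %-formatting: the oldest of the three styles.
title_percent = 'Interest Over Time for "%s"' % keyword

print(title_fstring)
print(title_fstring == title_format == title_percent)
```

Changing keyword to another search term updates the title without touching the plotting code.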
Label the Axes
Let's also remember to label our x and y axes! Run the following code:
plt.figure(figsize=(20,10))
plt.plot(data.index, data[keyword], marker='o')
plt.title(f'Interest Over Time for "{keyword}"')
plt.xlabel('Date')
plt.ylabel('Interest')
Text(0, 0.5, 'Interest')
plt.xlabel('Date'): This labels the x-axis as 'Date'.
plt.ylabel('Interest'): This labels the y-axis as 'Interest', which denotes the interest values for our keyword.
Adding a Grid
For improved visibility, let's add a grid to our plot. Run the following code:
plt.figure(figsize=(20,10))
plt.plot(data.index, data[keyword], marker='o')
plt.title(f'Interest Over Time for "{keyword}"')
plt.xlabel('Date')
plt.ylabel('Interest')
plt.grid()
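The plotting steps above can also be gathered into a single reusable function. The sketch below does this with Matplotlib's object-oriented interface (plt.subplots and Axes methods), which is a stylistic alternative to the plt.* calls used in this tutorial. plot_interest is a hypothetical helper, and the demo DataFrame contains invented values used only to exercise the function.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_interest(df, keyword, path=None):
    """Plot one keyword column against the DataFrame index; optionally save to a file."""
    fig, ax = plt.subplots(figsize=(20, 10))
    ax.plot(df.index, df[keyword], marker="o")
    ax.set_title(f'Interest Over Time for "{keyword}"')
    ax.set_xlabel("Date")
    ax.set_ylabel("Interest")
    ax.grid(True)
    if path is not None:
        fig.savefig(path)
    return fig, ax


# Invented weekly values, standing in for the real interest_over_time() result.
demo = pd.DataFrame(
    {"depression": [80, 85, 90, 88]},
    index=pd.date_range("2024-01-07", periods=4, freq="W"),
)

fig, ax = plot_interest(demo, "depression")
print(ax.get_title())
plt.close(fig)
```

With the tutorial's data, the call would be plot_interest(data, keyword), and passing a path such as 'depression.png' would also write the figure to disk.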
Conclusion
With this tutorial, you are now able to use the pytrends package to retrieve Google Trends data on different topics while controlling search parameters such as the timeframe, language, and region.