pytrends Package Tutorial

This tutorial explains how to install and use the pytrends package for Python, which can automatically download reports from Google Trends. It covers the basic functions of pytrends and shows how to use them to extract data from Google Trends. As a running example, it uses pytrends to examine the volume of Google searches for "depression" over the past 5 years, in order to consider whether the COVID-19 pandemic led people to make such searches for their own purposes.

Installation Instructions

To install the package, run the following command in the terminal:

pip install pytrends

After installation, we import the necessary packages in Python:

In [1]:
# load pytrends and pandas
import pytrends
import pandas as pd

Getting Started

Now that you have pytrends installed, let's begin by importing the necessary libraries:

In [2]:
# google_trends.py
import pandas as pd
from pytrends.request import TrendReq

Step 1) Setting up pytrends

After importing the necessary libraries, including TrendReq from pytrends.request, we can now create a TrendReq object. The TrendReq class is the main interface with which we can access and utilize Google Trends data.

In [3]:
# creating a TrendReq Object
pytrends = TrendReq(hl='en-US', tz=360)
  • TrendReq(...) initializes a new instance of the TrendReq class, allowing us to send requests to Google Trends through pytrends, which describes itself as an unofficial API.

  • hl='en-US': this parameter specifies the language and region for the trends data. hl stands for host language, and en-US specifies that we are requesting data in English for the US region. If we wanted data in Korean for the Korean region, we would input ko-KR as the language-region code.

  • tz=360: this specifies the timezone offset in minutes. pytrends follows Google's convention, in which the sign is inverted relative to the usual UTC notation, so a value of 360 corresponds to US Central Standard Time (UTC-6:00), not UTC+6:00. Adjusting this value changes the timezone in which timestamps are interpreted.

With this object, we will now be able to search for trending keywords.
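Because the sign convention for tz is easy to get backwards, a small helper can make the conversion explicit. The sketch below is only an illustration: the function name tz_for_utc_offset is hypothetical and not part of pytrends, and it assumes the convention described in the pytrends documentation, where US Central Standard Time (UTC-6) is written as 360.

```python
def tz_for_utc_offset(utc_offset_hours):
    """Convert a UTC offset in hours to the minutes value passed as tz.

    Hypothetical helper (not part of pytrends). Assumes the inverted-sign
    convention from the pytrends documentation: UTC-6 is written as 360.
    """
    return int(round(-utc_offset_hours * 60))


print(tz_for_utc_offset(-6))   # US Central Standard Time -> 360
print(tz_for_utc_offset(0))    # UTC -> 0
print(tz_for_utc_offset(5.5))  # UTC+5:30 -> -330
```

The result could then be passed as TrendReq(hl='en-US', tz=tz_for_utc_offset(-6)).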

Step 2) Searching for Keywords

To retrieve trends data for a keyword, we begin by creating a variable and assigning a string value to it. This string will be our keyword. For this tutorial, let us use "depression" and assign it to a variable named keyword.

In [4]:
# Defining the Keyword
keyword = 'depression'

After creating our keyword variable, we must build a payload. A payload is the data sent in a request to a server to specify what information you want to retrieve or what action you want to perform. Here, the payload bundles the search parameters that pytrends sends to Google Trends.

In our payload, we specify that we are searching for a keyword, in a certain category (cat), over a chosen timeframe, in a specific geographical location (geo), optionally filtered by a Google property such as image search or YouTube (gprop). For our example, let us search for the keyword depression in no specific category, over the past 5 years, worldwide, using the default Google property (web search).

In [5]:
# Build the payload
pytrends.build_payload([keyword], cat=0, timeframe='today 5-y', geo='', gprop='')
  • The cat value is set to 0 to specify that we are not restricting our keyword to a specific category, such as a particular field of research.

  • Our timeframe value is set to 'today 5-y'. This allows us to analyze trends for our keyword over the past 5 years.

  • By leaving the geo parameter as an empty string (geo=''), we are asking for worldwide data rather than data for a specific geographical region. To limit the search to a certain region, we would pass the corresponding region code. (For instance, geo='FR' would return data for France.)

  • The gprop parameter is also left as an empty string (gprop=''), which according to the pytrends documentation defaults to web searches. If we wanted to focus on YouTube trends instead, we could pass gprop='youtube'.
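Besides relative values such as 'today 5-y', the pytrends documentation also describes a timeframe made of two explicit dates in the form 'YYYY-MM-DD YYYY-MM-DD'. The sketch below shows one way to build such a string. The helper name date_range_timeframe and the 2020 window are assumptions chosen for illustration, not part of pytrends or of this tutorial's data.

```python
from datetime import date


def date_range_timeframe(start, end):
    """Format two dates as 'YYYY-MM-DD YYYY-MM-DD' for the timeframe argument.

    Hypothetical helper (not part of pytrends).
    """
    if start > end:
        raise ValueError('start must not be after end')
    return f'{start.isoformat()} {end.isoformat()}'


# Hypothetical window: the calendar year 2020.
timeframe_2020 = date_range_timeframe(date(2020, 1, 1), date(2020, 12, 31))
print(timeframe_2020)  # 2020-01-01 2020-12-31
```

The resulting string could be passed as timeframe=timeframe_2020 in place of 'today 5-y' when building the payload.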

With the payload built, we can now retrieve the interest-over-time values for our keyword from Google Trends. To do so, we call pytrends.interest_over_time(), which returns interest-over-time data for the specified keyword as a Pandas DataFrame. We store the result in a variable named data. This DataFrame contains timestamps and the corresponding interest values for the keyword.

In [6]:
data = pytrends.interest_over_time()
/home/bms2202/.local/lib/python3.12/site-packages/pytrends/request.py:260: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df = df.fillna(False)
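The two calls used so far (build_payload, then interest_over_time) can be bundled into one function, which is convenient when repeating the query for other regions or timeframes. This is a minimal sketch: fetch_interest is a hypothetical name, and the client argument is assumed to be a TrendReq object like the one created in Step 1.

```python
def fetch_interest(client, keyword, timeframe='today 5-y', geo='', gprop=''):
    """Build a single-keyword payload and return its interest-over-time table.

    Hypothetical wrapper (not part of pytrends). `client` is assumed to be
    a pytrends TrendReq instance.
    """
    client.build_payload([keyword], cat=0, timeframe=timeframe, geo=geo, gprop=gprop)
    return client.interest_over_time()


# Example call (requires network access, so it is left commented out):
# data_fr = fetch_interest(pytrends, 'depression', geo='FR')
```

Keeping the defaults identical to the payload above means fetch_interest(pytrends, keyword) should request the same data as the two separate calls.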

Let's take a brief look at our DataFrame - the following line of code will output the first five rows of our DataFrame data.

In [63]:
print(data.head())
            depression  isPartial
date                             
2019-10-13          86      False
2019-10-20          87      False
2019-10-27          81      False
2019-11-03          85      False
2019-11-10          85      False

isPartial: when this column is True, the data for the given keyword and time period is incomplete. Google Trends might not have enough data to provide a full picture, especially for less popular search terms or for very recent time periods. In the rows shown above, every value is False.
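If incomplete rows should be excluded before analysis, they can be filtered out with a boolean mask. The sketch below uses a small hand-made table shaped like the output above. Its last row is marked as partial purely for illustration and does not reflect the real data.

```python
import pandas as pd

# Hand-made sample shaped like the output above. The True flag on the last
# row is hypothetical and was added only to demonstrate the filter.
sample = pd.DataFrame(
    {'depression': [86, 87, 81], 'isPartial': [False, False, True]},
    index=pd.to_datetime(['2019-10-13', '2019-10-20', '2019-10-27']),
)
sample.index.name = 'date'

# Keep only complete rows, then drop the flag column.
complete = sample[~sample['isPartial'].astype(bool)].drop(columns='isPartial')
print(complete)
```

The astype(bool) call is a precaution in case the column is stored with an object dtype. The same expression could be applied to data in place of sample.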

Step 3) Analyzing our data

To observe the search trends for our keyword over the past 5 years, we can create a graph of the interest-over-time values stored in our DataFrame.

  1. Import Matplotlib

    The Matplotlib library is commonly used for creating visualizations in Python. To import its pyplot module, which we will use to create our plots, run the following code.

In [7]:
# importing matplotlib
import matplotlib.pyplot as plt

By importing pyplot as plt, a common alias for pyplot, we are able to access its functions easily.

  2. Creating a Figure

    In order to create a figure for plotting, let's run the following code:

In [8]:
#  creating the figure
plt.figure(figsize=(20,10))
Out[8]:
<Figure size 2000x1000 with 0 Axes>

The plt.figure(...) call creates a new figure for plotting. The figsize=(20,10) argument specifies that the figure will be 20 inches wide and 10 inches tall. These values can be adjusted to improve visibility.

  3. Plot the Data

    Now that we have our figure, let's plot our data from our DataFrame.

In [9]:
plt.figure(figsize=(20,10))

plt.plot(data.index, data[keyword], marker='o')
Out[9]:
[<matplotlib.lines.Line2D at 0x7f64eb57d100>]
  • plt.plot(...): This function creates a line plot

  • data.index: The index of the DataFrame data contains the timestamps (as dates) for the interest values. data.index supplies the x-axis values of our plot.

  • data[keyword]: Here we retrieve the interest values for our keyword “depression” from the DataFrame. This argument supplies the y-axis values of our plot.

  • marker='o': This argument adds markers to our plot to improve the visibility of individual values along the line. The value 'o' draws the markers as circles.

  4. Adding a Title

    Let’s not forget to add a title. Run the following code:

In [68]:
plt.figure(figsize=(20,10))
plt.plot(data.index, data[keyword], marker='o')

plt.title(f'Interest Over Time for “{keyword}”')
Out[68]:
Text(0.5, 1.0, 'Interest Over Time for “depression”')
  • plt.title(...) is a function setting the title of our plot

  • f'Interest Over Time for “{keyword}”': This is a formatted string (f-string), so the title automatically reflects any change made to our keyword variable. Compared with other formatting methods, f-string syntax is concise and easy to read.
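To see how the f-string adapts to the value it is given, the short sketch below builds the title for two terms. The second term, 'anxiety', is a hypothetical example and is not queried anywhere in this tutorial. The loop variable is named term so that the notebook's keyword variable is left unchanged, and plain double quotes are used in the title text.

```python
# Build one title per term; 'anxiety' is a hypothetical second keyword.
titles = [f'Interest Over Time for "{term}"' for term in ('depression', 'anxiety')]

for title in titles:
    print(title)
```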

  5. Label the Axes

    Let's also remember to label our x and y axes! Run the following code:

In [69]:
plt.figure(figsize=(20,10))
plt.plot(data.index, data[keyword], marker='o')
plt.title(f'Interest Over Time for “{keyword}”')

plt.xlabel('Date')
plt.ylabel('Interest')
Out[69]:
Text(0, 0.5, 'Interest')
  • plt.xlabel('Date'): This labels the x-axis as “Date”.
  • plt.ylabel('Interest'): This labels the y-axis as “Interest”, which denotes the interest values for our keyword.
  6. Adding a Grid

    For improved visibility, let's add a grid to our plot. Run the following code:

In [70]:
plt.figure(figsize=(20,10))
plt.plot(data.index, data[keyword], marker='o')
plt.title(f'Interest Over Time for “{keyword}”')
plt.xlabel('Date')
plt.ylabel('Interest')

plt.grid()
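The plotting steps above can also be collected into a single reusable function. The sketch below uses Matplotlib's Axes methods (set_title, set_xlabel, set_ylabel, grid) rather than the plt.* calls shown earlier. The function name plot_interest is hypothetical, and the title uses plain double quotes.

```python
import matplotlib.pyplot as plt


def plot_interest(data, keyword, figsize=(20, 10)):
    """Draw interest over time for one keyword and return (fig, ax).

    Hypothetical helper. `data` is assumed to be a DataFrame indexed by date
    with a column named after `keyword`, as returned in Step 2.
    """
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(data.index, data[keyword], marker='o')   # line with circle markers
    ax.set_title(f'Interest Over Time for "{keyword}"')
    ax.set_xlabel('Date')
    ax.set_ylabel('Interest')
    ax.grid()
    return fig, ax
```

Calling plot_interest(data, keyword) with the objects from this tutorial should produce a chart equivalent to the one built step by step above. Returning fig and ax allows further changes, such as saving the figure with fig.savefig.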

Conclusion

With this tutorial, you are now able to use the pytrends package to retrieve Google Trends data on different topics while controlling search parameters such as time frame, language, and region.