HOW TO INSTALL A REDDIT DATA MINER, a tutorial by Andre Williams
This is a tutorial on how to install PRAW, a software package for Reddit. PRAW is an acronym that stands for "Python Reddit API Wrapper," which is to say that it gives you access to Reddit's API.
To install PRAW, the first thing you need to do is open your terminal and run the command "pip install praw".
Now, to import praw into this console so we can use it, we need to run the following script in Python:
import praw
Just like we had to do in the practice tutorials with other software packages, we have to obtain developer credentials (a client ID and a client secret) that give us access to Reddit's API. To do that, we need to go to this link after creating a Reddit account: https://www.reddit.com/prefs/apps
Select the option that says "script," and put whatever placeholder text you want into the required fields.
Once the app is created, copy its client ID and client secret from that page and insert them into the following lines of code in Python (the user agent is just a short description of your script):
reddit = praw.Reddit(
    client_id="my client id",
    client_secret="my client secret",
    user_agent="my user agent",
)
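Hard-coding the client ID and secret makes them easy to leak if you ever share the script. One possible alternative is to read them from environment variables. In this sketch the variable names (REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_USER_AGENT) and the helper load_reddit_credentials are my own choices for illustration, not anything PRAW requires:

```python
import os


def load_reddit_credentials(env=None):
    """Return the keyword arguments for praw.Reddit, read from environment variables.

    The variable names below are hypothetical; pick whatever names you like.
    """
    env = os.environ if env is None else env
    variable_names = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "user_agent": "REDDIT_USER_AGENT",
    }
    # Fail early with a readable message if anything is unset or empty.
    missing = [name for name in variable_names.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in variable_names.items()}


# Usage, once the variables are set in your terminal:
# reddit = praw.Reddit(**load_reddit_credentials())
```

The helper only builds a dictionary, so it can be checked without a Reddit connection by passing in a plain dict instead of the real environment.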
After you run that script, we can start doing some simple things. For instance, if you want a list of the top 10 posts of all time from a given subreddit, you could do something like the following:
subreddit = reddit.subreddit('sports')
for submission in subreddit.top(limit=10):
    print(submission.title)
Weightlifter promised his wife to win an Olympic gold medal before she died in a car accident
The LA Rams have an assistant coach whose job is to make sure Head Coach Sean McVay doesn't run into the officials
A Pelicans fan snuck on to the court for warmups, stretched and put up a shot before the police escorted him off
Jon Rahm skips the ball across the pond for the hole-in-one!
"Just stay in there, you're done for tonight"
The Monterrey Stadium. Mexico.
Dwyane Wade was very pleased with this no-look pass from LeBron
Synced videos of the Eagles fan running into the pillar
Boxing referee Steve Willis really loves his job
Mario Balotelli absolutely filthy goal earlier today.
Now we have a list of the top 10 sweet stories and awesome moments from sports lore.
(This was a simple "print" command, like we've done previously in class.)
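If you would rather keep the results than just print them, you could gather them into a list first. This is only a sketch: collect_posts is a hypothetical helper, and the two stand-in posts are made up so the example runs without a Reddit connection.

```python
from types import SimpleNamespace


def collect_posts(submissions):
    """Turn an iterable of submissions into a list of (title, score) tuples."""
    return [(submission.title, submission.score) for submission in submissions]


# Made-up stand-in objects; with PRAW you would pass subreddit.top(limit=10) instead.
sample = [
    SimpleNamespace(title="First post", score=120),
    SimpleNamespace(title="Second post", score=95),
]
posts = collect_posts(sample)
print(posts)
```

Once the titles and scores are in a list, you can sort them, count them, or save them to a file later on.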
In addition to extracting top 10 lists, we can also extract comments from other users.
If we wanted to get the gossip from a high-ranking post on r/politics, we could enter the following script:
submission = reddit.submission(id='1fsv6df')
submission.comments.replace_more(limit=0)
for comment in submission.comments.list():
    print(comment.body)
To find a post's POST_ID, look for the short alphanumeric string (7 characters in this case) that comes right after "/comments/" in the post's link. For instance, in the post: https://www.reddit.com/r/politics/comments/1fsv6df/nyt_endorses_harris_as_the_only_choice_for/ the POST_ID is '1fsv6df'
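Picking the ID out of the link by hand gets tedious if you have many posts. Here is a minimal sketch that does it with Python's standard library; extract_post_id is a hypothetical helper name, and it assumes the link has the "/comments/<id>/" shape shown above.

```python
from urllib.parse import urlparse


def extract_post_id(url):
    """Return the path segment that follows 'comments' in a Reddit post link."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    if "comments" in parts:
        index = parts.index("comments")
        if index + 1 < len(parts):
            return parts[index + 1]
    raise ValueError("No post ID found in: " + url)


link = "https://www.reddit.com/r/politics/comments/1fsv6df/nyt_endorses_harris_as_the_only_choice_for/"
print(extract_post_id(link))  # prints 1fsv6df
```

As far as I know, PRAW can also take the link directly, as in reddit.submission(url=link), so the helper is mainly useful when you want the bare ID for your own records.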
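That script got us a bounty of political gossip, but if we wanted just the first ten comments as Reddit orders them (by default its "best" sort, which tends to favor highly rated comments rather than ranking strictly by score), we'd enter something like: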
submission = reddit.submission(id='1fsv6df')
submission.comments.replace_more(limit=0)
top_comments = submission.comments.list()[:10]
for comment in top_comments:
    print(comment.body)
As a reminder, this subreddit [is for civil discussion.](/r/politics/wiki/index#wiki_be_civil) In general, be courteous to others. Debate/discuss/argue the merits of ideas, don't attack people. Personal insults, shill or troll accusations, hate speech, any suggestion or support of harm, violence, or death, and other rule violations can result in a permanent ban. If you see comments in violation of our rules, please report them. For those who have questions regarding any media outlets being posted on this subreddit, please click [here](https://www.reddit.com/r/politics/wiki/approveddomainslist) to review our details as to our approved domains list and outlet criteria. We are actively looking for new moderators. If you have any interest in helping to make this subreddit a place for quality discussion, please fill out [this form](https://docs.google.com/forms/d/1y2swHD0KXFhStGFjW6k54r9iuMjzcFqDIVwuvdLBjSA). *** *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/politics) if you have any questions or concerns.*

The New York Times editorial board on Monday endorsed Vice President Kamala Harris, calling her “the only patriotic choice for president” while painting a grim picture of a second term for former President Donald Trump. Rather than praise for its preferred candidate, the board led its endorsement of Harris by listing off disqualifying arguments against Trump. “It is hard to imagine a candidate more unworthy to serve as president of the United States,” the Times editorial board wrote. “This unequivocal, dispiriting truth — Donald Trump is not fit to be president — should be enough for any voter who cares about the health of our country and the stability of our democracy to deny him re-election,” the board, made up of 14 opinion journalists, wrote. “For this reason, regardless of any political disagreements voters might have with her, Kamala Harris is the only patriotic choice for president.” The endorsement of Harris is unsurprising — the editorial board has not backed a Republican for president since Dwight Eisenhower in 1956 — though still important given the paper’s influence. In July, 10 days before President Joe Biden left the race (and after the board called on him to do so), the board published a five-part, scathing editorial against Trump that struck many of the same chords as Monday’s story.

NYT: “We’re endorsing Harris, why that’s bad for Biden”

"And while the board admitted some of Harris’ plans are not as detailed as voters would like." What are people expecting here? She has been detailing her plans at her rallies and interviews. She is no more vague than any other past candidate. Her website has even more details. Trump can have concepts of a plan but Harris has to be detailed down to the letter.

but trump is the only choice for evil tyrant maniac, so there's that each is the only choice for what they really are

Yes. the amount of embarrassment the US has suffered by having 46% of the country take him seriously has been a devastating hit on our global credibility.

Great, now maybe they can stop with their need to sanewash Trump and Vance while nitpicking Harris and Walz.

A little late, NYT… but thanks for finally joining the rest of us in reality

Also NYT: Trump calls for [pillow fight] for just one really rough hour, which has historical precedent ~~by the Nazis~~.

Wow thats surprising given how hard they blow Trump and say how bad the Dems are for the country
Another cool thing you can do with PRAW is livestream the comments on a post with the following script. (Be careful with the indentation of the lines of code; Python is touchy about it.) The script keeps running until you stop it, for example with Ctrl+C:
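import time

seen_comments = set()
while True:
    # Re-create the submission on every pass so PRAW fetches a fresh copy of the
    # comments; reusing the old object would keep returning the same cached ones.
    submission = reddit.submission(id='1fsv6df')
    submission.comments.replace_more(limit=None)
    # Iterating over submission.comments visits the top-level comments.
    for comment in submission.comments:
        if comment.id not in seen_comments:
            seen_comments.add(comment.id)
            print(f'New comment by {comment.author}: {comment.body}')
    time.sleep(5)  # wait five seconds before checking again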
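The heart of the polling loop above is the "have I seen this comment before?" check. Pulling that check into its own small function makes it easy to try out without connecting to Reddit. This is only a sketch: pick_new_comments is a hypothetical helper, and the stand-in comments are made up.

```python
from types import SimpleNamespace


def pick_new_comments(comments, seen_ids):
    """Return the comments whose id is not yet in seen_ids, and record them as seen."""
    fresh = []
    for comment in comments:
        if comment.id not in seen_ids:
            seen_ids.add(comment.id)
            fresh.append(comment)
    return fresh


# Made-up stand-ins for two polls of the same post.
first_poll = [SimpleNamespace(id="a1", body="hello")]
second_poll = first_poll + [SimpleNamespace(id="b2", body="a later reply")]

seen = set()
print([c.id for c in pick_new_comments(first_poll, seen)])
print([c.id for c in pick_new_comments(second_poll, seen)])
```

On the second call only the comment that was not in the first poll comes back, which is the behavior the loop relies on.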
If, for a research project, we wanted to look at the kinds of comments that tend to rank highest on a specific subreddit, we could use the following script to print the first ten comments on a given post (in Reddit's default order) along with the scores they get.
submission_id = '1fsv6df'
submission = reddit.submission(id=submission_id)
print(f"Post Title: {submission.title}")
print(f"Post Score: {submission.score}\n")
submission.comments.replace_more(limit=0)
top_comments = submission.comments.list()[:10]
for comment in top_comments:
    print(f"Comment: {comment.body}")
    print(f"Comment Score: {comment.score}\n")
Post Title: NYT endorses Harris as ‘the only choice’ for president.
Post Score: 33003

Comment: As a reminder, this subreddit [is for civil discussion.](/r/politics/wiki/index#wiki_be_civil) In general, be courteous to others. Debate/discuss/argue the merits of ideas, don't attack people. Personal insults, shill or troll accusations, hate speech, any suggestion or support of harm, violence, or death, and other rule violations can result in a permanent ban. If you see comments in violation of our rules, please report them. For those who have questions regarding any media outlets being posted on this subreddit, please click [here](https://www.reddit.com/r/politics/wiki/approveddomainslist) to review our details as to our approved domains list and outlet criteria. We are actively looking for new moderators. If you have any interest in helping to make this subreddit a place for quality discussion, please fill out [this form](https://docs.google.com/forms/d/1y2swHD0KXFhStGFjW6k54r9iuMjzcFqDIVwuvdLBjSA). *** *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/politics) if you have any questions or concerns.*
Comment Score: 1

Comment: The New York Times editorial board on Monday endorsed Vice President Kamala Harris, calling her “the only patriotic choice for president” while painting a grim picture of a second term for former President Donald Trump. Rather than praise for its preferred candidate, the board led its endorsement of Harris by listing off disqualifying arguments against Trump. “It is hard to imagine a candidate more unworthy to serve as president of the United States,” the Times editorial board wrote. “This unequivocal, dispiriting truth — Donald Trump is not fit to be president — should be enough for any voter who cares about the health of our country and the stability of our democracy to deny him re-election,” the board, made up of 14 opinion journalists, wrote. “For this reason, regardless of any political disagreements voters might have with her, Kamala Harris is the only patriotic choice for president.” The endorsement of Harris is unsurprising — the editorial board has not backed a Republican for president since Dwight Eisenhower in 1956 — though still important given the paper’s influence. In July, 10 days before President Joe Biden left the race (and after the board called on him to do so), the board published a five-part, scathing editorial against Trump that struck many of the same chords as Monday’s story.
Comment Score: 3520

Comment: NYT: “We’re endorsing Harris, why that’s bad for Biden”
Comment Score: 6330

Comment: "And while the board admitted some of Harris’ plans are not as detailed as voters would like." What are people expecting here? She has been detailing her plans at her rallies and interviews. She is no more vague than any other past candidate. Her website has even more details. Trump can have concepts of a plan but Harris has to be detailed down to the letter.
Comment Score: 1880

Comment: but trump is the only choice for evil tyrant maniac, so there's that each is the only choice for what they really are
Comment Score: 372

Comment: Yes. the amount of embarrassment the US has suffered by having 46% of the country take him seriously has been a devastating hit on our global credibility.
Comment Score: 207

Comment: Great, now maybe they can stop with their need to sanewash Trump and Vance while nitpicking Harris and Walz.
Comment Score: 139

Comment: A little late, NYT… but thanks for finally joining the rest of us in reality
Comment Score: 626

Comment: Also NYT: Trump calls for [pillow fight] for just one really rough hour, which has historical precedent ~~by the Nazis~~.
Comment Score: 178

Comment: Wow thats surprising given how hard they blow Trump and say how bad the Dems are for the country
Comment Score: 372
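Notice that the scores in the output above are not in descending order (1, 3520, 6330, 1880, ...), because the comments come back in Reddit's own ordering. If you want a strict ranking by score, one option is to sort the comments yourself. In this sketch top_by_score is a hypothetical helper, and the stand-in comments reuse the first five scores from the output above with placeholder bodies.

```python
from types import SimpleNamespace


def top_by_score(comments, n=10):
    """Return the n highest-scoring comments, highest score first."""
    return sorted(comments, key=lambda comment: comment.score, reverse=True)[:n]


# Stand-ins carrying the first five scores printed above; the bodies are placeholders.
sample_comments = [
    SimpleNamespace(body="comment A", score=1),
    SimpleNamespace(body="comment B", score=3520),
    SimpleNamespace(body="comment C", score=6330),
    SimpleNamespace(body="comment D", score=1880),
    SimpleNamespace(body="comment E", score=372),
]

for comment in top_by_score(sample_comments, n=3):
    print(comment.score, comment.body)
```

With a real post you would pass submission.comments.list() in place of sample_comments. PRAW also appears to let you ask Reddit for a different ordering by setting submission.comment_sort = "top" before the comments are first accessed, which may be simpler if you only need the top-level ranking.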
After running this, we could find other submissions in the same subreddit, use the same lines of code to retrieve the top comments on those posts, and then see if we find any linguistic similarities across all the data we've gathered.
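As a rough first pass at spotting linguistic similarities, you could count which words show up most often across the comment bodies you have gathered. The sketch below is only one simple way to do it: word_counts is a hypothetical helper, the minimum word length of 4 is an arbitrary choice to skip short filler words, and the two sample sentences are made up.

```python
import re
from collections import Counter


def word_counts(comment_bodies, min_length=4):
    """Count lowercase words of at least min_length letters across all comment bodies."""
    counts = Counter()
    for body in comment_bodies:
        words = re.findall(r"[a-z']+", body.lower())
        counts.update(word for word in words if len(word) >= min_length)
    return counts


# Made-up sample text; with PRAW you would pass [comment.body for comment in top_comments].
sample_bodies = ["Great plans, great details", "The plans are vague"]
counts = word_counts(sample_bodies)
print(counts.most_common(3))
```

Running the same count on several posts and comparing the most common words is a crude but quick way to see whether the highest-ranking comments share vocabulary.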