
Monday, 18 September 2017

Putting some of my Python knowledge to a good use: a Reddit reading bot!

One of the perks of knowing a programming language is that you can build your own tools and applications. Depending on what you need, it may even be a fast process, since you usually do not need to write production-grade code or detailed documentation (although documentation might still be helpful in the future).

I’ve gotten used to reading news on Reddit; however, it can sometimes be a bit time-consuming, since it tends to keep you wandering down each and every rabbit hole that pops up. This is fine if you are commuting and have some spare time to spend browsing the web, but sometimes I just need a quick glance at what’s new and relevant to my interests.

In order to automate this search process, I’ve written a bot that takes as input a list of subreddits, a list of keywords, and a few flags, and browses each subreddit looking for the given keywords.

If a keyword is found in either the body or the title of a post submitted in one of the selected subreddits, the post title and link are either printed to the console or saved to a file (in which case the file name must be supplied when starting the search).

The bot is written using praw.
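If you do not have praw installed already, it is available on PyPI and can be installed with pip:

pip install praw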


What do I need to use the bot?

In order to use the bot, you’ll need to set up an app using your Reddit account and save the client_id, client_secret, username and password in a file named config_data.py, which should be stored in the same folder as the reddit_browsing_bot_main.py script.
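For reference, a minimal config_data.py could look like the sketch below (the values are placeholders; use the credentials of your own Reddit app):

# config_data.py
# Placeholder credentials: replace with the values from your own Reddit app.
client_id = 'your_client_id'
client_secret = 'your_client_secret'
username = 'your_reddit_username'
password = 'your_reddit_password'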

How does the bot work?

The bot is designed to be a command line application and can be used either in a Linux terminal or in PowerShell if you are the Windows type ;)

Adopting a CLI would undoubtedly have been a bad choice if I wanted other people to use the application, but in my case I am the end user, and I like command line tools, a lot.

For each subreddit entered, the bot checks whether that subreddit exists; if it doesn’t, the subreddit is discarded. Then, within each subreddit, the bot searches the first -l posts and returns those that contain at least one keyword.

This is an example of use:

reddit_browsing_bot_main.py -s python -k pycon -l 80 -f new -o output.txt -v
In the example above I am searching the first 80 posts in the “new” section of the python subreddit for posts that mention pycon. The -o flag tells the program to output the results of the search to the output.txt file. The -v flag makes the program print the output to the console as well.

You can search more subreddits and/or use more keywords; just separate each subreddit/keyword with a comma, as in the example below. If you do not supply an output file, the program will just output the results to the console.
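For instance, a search over two subreddits with two keywords might look like this (the subreddits and keywords here are just illustrative):

reddit_browsing_bot_main.py -s python,programming -k pycon,django -l 100 -f new -o output.txt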
Type:

reddit_browsing_bot_main.py -h

for a help menu. Maybe in the future I’ll add some features, but for now this is pretty much it.

Is it ok to use the bot?

As far as I know, the bot does not violate any of the terms of Reddit’s API. Also, the API calls are already rate-limited by the praw module in order to comply with Reddit’s API limits. The bot does not downvote or upvote any post; it just reads what is online. Anyway, should you want to check the code yourself, it is available in its dedicated GitHub repository or on the project page.

I’ve also copied and pasted a gist below so that you can have a look at the code here:

#!/usr/bin/env python3
"""
Created on Mon Sep 4 15:06:35 2017
@author: Michy
"""

import os
import praw
import logging
import argparse
import config_data
from datetime import datetime
from prawcore import NotFound

VERSION = '1.0'
def find_relevant_posts(reddit_obj, subreddit_name, keyword, limit=50, flag='new'):
    # This function looks for relevant posts in a given subreddit using the
    # supplied keyword.
    #
    # Params:
    # @reddit_obj: a Reddit instance.
    # @subreddit_name: name of the subreddit to be searched (string).
    # @keyword: keyword to be used for the search (string).
    # @limit: maximum number of posts searched (integer).
    # @flag: Reddit's posts flag (string).
    #
    # Returns a tuple of two lists, titles and urls, containing the titles and
    # the urls of the relevant posts, respectively.
    #
    subreddit = reddit_obj.subreddit(subreddit_name)
    if flag == 'new':
        new_submissions = subreddit.new(limit=limit)
    elif flag == 'rising':
        new_submissions = subreddit.rising(limit=limit)
    elif flag == 'controversial':
        new_submissions = subreddit.controversial(limit=limit)
    elif flag == 'top':
        new_submissions = subreddit.top(limit=limit)
    else:
        new_submissions = subreddit.new(limit=limit)
    urls = []
    titles = []
    for submission in new_submissions:
        # Skip stickied posts; keep submissions mentioning the keyword in
        # either the title or the body.
        if not submission.stickied:
            if keyword in submission.title.lower() or keyword in submission.selftext.lower():
                urls.append(submission.url)
                titles.append(submission.title)
    return titles, urls
def find_relevant_posts_wider(reddit_obj, subreddit_names, keywords, limit=50, flag='new'):
    # This function looks for relevant posts in each subreddit supplied, using
    # the keywords supplied in the keywords argument.
    #
    # Params:
    # @reddit_obj: a Reddit instance.
    # @subreddit_names: names of the subreddits to be searched (list of strings).
    # @keywords: keywords to be used for the search (list of strings).
    # @limit: maximum number of posts searched (integer).
    # @flag: Reddit's posts flag (string).
    #
    # Returns a tuple of two lists, titles_wider and urls_wider, containing the
    # titles and the urls of the relevant posts, respectively.
    #
    titles_wider = []
    urls_wider = []
    for subreddit in subreddit_names:
        for keyword in keywords:
            titles, urls = find_relevant_posts(reddit_obj, subreddit, keyword, limit, flag)
            for t, u in zip(titles, urls):
                titles_wider.append(t)
                urls_wider.append(u)
    return titles_wider, urls_wider
def save_findings(titles, urls, filename):
    # This function saves the results of the search.
    #
    # Params:
    # @titles: titles of the posts (list of strings).
    # @urls: urls of the posts (list of strings).
    # @filename: name of the file to save (string).
    #
    # Returns void.
    #
    filename = os.path.join(os.getcwd(), filename)
    if os.path.exists(filename):
        mode = 'a'
    else:
        mode = 'w'
    with open(filename, mode) as f:
        for t, u in zip(titles, urls):
            f.write('\n'.join([t, u]))
            f.write('\n\n')
    print("Search results saved in {}".format(filename))
def check_subreddit_exists(reddit, subreddit):
    # This function checks if a subreddit exists.
    #
    # Params:
    # @reddit: a Reddit instance.
    # @subreddit: subreddit to be checked (string).
    #
    # Returns: True if the subreddit exists, False otherwise.
    #
    exists = True
    try:
        reddit.subreddits.search_by_name(subreddit, exact=True)
    except NotFound:
        exists = False
    return exists
def check_limit_range(limit):
    # This function checks that the limit parameter is in the 1-500 range.
    # If limit is not within the selected range, an ArgumentTypeError is raised.
    #
    # Params:
    # @limit: limit to be checked (integer).
    #
    # Returns: limit.
    #
    limit = int(limit)
    if limit <= 0 or limit > 500:
        raise argparse.ArgumentTypeError("{} is not a valid value".format(limit))
    return limit
def setup_argparser():
    # This function sets up the argument parser.
    #
    # Returns the parsed arguments.
    #
    parser = argparse.ArgumentParser(description='Reddit Browsing Bot version {}'.format(VERSION))
    parser.add_argument('-s', '--subreddits', type=str, required=True, help='Comma-separated subreddits to look into.')
    parser.add_argument('-k', '--keywords', type=str, required=True, help='Comma-separated keywords to search for.')
    parser.add_argument('-l', '--limit', type=check_limit_range, default=50, help='Maximum number of posts searched. Must be in the range 1 - 500.')
    parser.add_argument('-f', '--flag', type=str, default='new', choices=['new', 'rising', 'controversial', 'top'], help='Reddit posts flag.')
    parser.add_argument('-o', '--output', type=str, help='Output file name.')
    parser.add_argument('-v', '--verbose', action='store_true', help='Be verbose? Prints output if flag is set.')
    args = parser.parse_args()
    return args
def setup_logger():
    # This function sets up the logger.
    #
    # Returns logger.
    #
    logging.basicConfig(filename='reddit_bot_log.log', level=logging.DEBUG)
    logger = logging.getLogger(name='Reddit Browsing Bot V. {}'.format(VERSION))
    return logger
# Main
if __name__ == '__main__':
    # Setup argument parser
    args = setup_argparser()
    # Initialize logger
    logger = setup_logger()
    # Retrieve arguments (subreddits and keywords are comma-separated strings)
    subreddits = args.subreddits.split(',')
    keywords = args.keywords.split(',')
    limit = args.limit
    flag = args.flag
    filename = args.output
    verbose = args.verbose
    # Initialize reddit instance
    reddit = praw.Reddit(client_id=config_data.client_id,
                         client_secret=config_data.client_secret,
                         username=config_data.username,
                         password=config_data.password,
                         user_agent='Reading bot looking for hot topics')
    logger.log(logging.INFO, "Reddit instance initiated.")
    # Check that every subreddit exists. Ignore those that do not exist
    subreddits = [sub.lower() for sub in subreddits if check_subreddit_exists(reddit, sub.lower())]
    # Ignore keywords shorter than 2 characters
    keywords = [key.lower() for key in keywords if len(key) > 1]
    print("Subreddits searched: {}\nKeywords used: {}\n\n".format(subreddits, keywords))
    # Start search
    logger.log(logging.INFO,
               "Started search for {} in {} at {}".format(keywords,
                                                          subreddits,
                                                          datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
    titles, urls = find_relevant_posts_wider(reddit, subreddits, keywords, limit, flag)
    logger.log(logging.INFO, "Search ended.")
    # Save findings if a filename has been provided.
    if filename is not None:
        logger.log(logging.INFO, "Saving data.")
        save_findings(titles, urls, filename)
    # If the program needs to be verbose or if filename has not been provided,
    # print output to the console
    if verbose or filename is None:
        for t, u in zip(titles, urls):
            print(t, u, sep='\n', end='\n\n')
    # Main ended
    logger.log(logging.INFO, "Main execution ended successfully.")
    print("\n\nExiting....")
