This tutorial explains how to scrape Reddit topics from any subreddit using Bash. It walks through a short Bash script that you run from the terminal with the name of a subreddit. The script fetches the list of topics from that subreddit and saves each topic's title, URL, and permalink to a TXT file that it creates in the current working directory. Besides the script file, you need two external tools, curl and jq. Once they are installed, you just run the script and get the results. Simple as that.
If you have a PC running Linux, or the Linux Subsystem installed on Windows 10, you can easily run this script. You can give it any subreddit and it will take care of the rest. Reddit describes itself as the front page of the internet and provides API access, but if you want to get data from Reddit without setting up API access, this method is handy: the script only requests the subreddit's public JSON listing. You run a single command and all the topics in that listing are written to a text file.
How to Scrape Reddit Topics from any Subreddit Using Bash?
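Scraping Reddit using Bash is quite simple. Open a terminal and run the following command to install curl and jq, the two tools the script depends on. The command uses apt-get, so it applies to Debian-based systems such as Ubuntu; on other distributions, install the same two packages with your own package manager.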
sudo apt-get install curl jq
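Before moving on, it can help to confirm that both tools are actually on the PATH. The snippet below is a minimal sketch of such a check; the function name missing_deps is made up for this example and is not part of the original script.

```shell
#!/bin/bash
# missing_deps is a hypothetical helper: it prints each given command
# name that cannot be found on the PATH.
missing_deps() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Check the two tools the scraping script relies on.
MISSING=$(missing_deps curl jq)
if [ -n "$MISSING" ]; then
    # Join the missing names onto one line for readability.
    echo "Missing tools: ${MISSING//$'\n'/ }"
else
    echo "curl and jq are both installed"
fi
```

If the snippet reports a missing tool, run the install command again and check its output for errors.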
When the above command finishes, you can start using the following script. Copy the code, save it as “reddit.sh” at any location you want, and then open a terminal in that directory.
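#!/bin/bash
# Usage: ./reddit.sh <subreddit>
# Exit early if no subreddit name was given.
if [ -z "$1" ]
then
    echo "Please specify a subreddit"
    exit 1
fi
SUBREDDIT=$1
# Timestamp (month_day_year-hour_minute) keeps files from separate runs apart.
NOW=$(date +"%m_%d_%y-%H_%M")
OUTPUT_FILE="${SUBREDDIT}_${NOW}.txt"
# Fetch the subreddit listing as JSON; -A sets a custom User-Agent.
# jq -r prints the title, URL, and permalink of each post as raw text,
# one value per line, so no quote stripping is needed afterwards.
curl -s -A "bash-scrape-topics" "https://www.reddit.com/r/${SUBREDDIT}.json" | \
    jq -r '.data.children | .[] | .data.title, .data.url, .data.permalink' | \
    while read -r TITLE; do
        read -r URL
        read -r PERMALINK
        # Write one tab-separated record per post; printf leaves any
        # backslashes or quotes in the title untouched.
        printf '%s\t%s\t%s\n' "${TITLE}" "${URL}" "${PERMALINK}" >> "${OUTPUT_FILE}"
    done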
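The read loop in the script consumes the jq output three lines at a time: the first line is treated as the title, the second as the URL, and the third as the permalink. The sketch below isolates that grouping step so it can be tried without network access. The function name lines_to_tsv and the sample data are made up for illustration.

```shell
#!/bin/bash
# lines_to_tsv is a hypothetical name for the grouping loop: it reads
# stdin three lines at a time and prints one tab-separated record.
lines_to_tsv() {
    local title url permalink
    while read -r title; do
        read -r url
        read -r permalink
        printf '%s\t%s\t%s\n' "$title" "$url" "$permalink"
    done
}

# Made-up sample lines standing in for the jq output.
printf '%s\n' \
    "First post" "https://example.com/a" "/r/demo/comments/1/first_post/" \
    "Second post" "https://example.com/b" "/r/demo/comments/2/second_post/" |
    lines_to_tsv
```

Because the loop relies purely on line order, it assumes that every post contributes exactly three lines of output.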
Before running the script, you have to make it executable. To do that, run the following command. After that, the script is ready to run from the terminal.
chmod a+x ./reddit.sh
Now, run the script like this. Along with the name of the script, you have to specify the name of the subreddit from which you want to get the list of topics. It will take a couple of seconds to fetch and process the JSON data. When the command finishes, you will see a text file in the current working directory whose name starts with the subreddit name, followed by the date and time of the run.
./reddit.sh IndianPeopleQuora
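Since the output is a plain tab-separated file, standard tools such as cut can pull out a single column. The following is a small sketch under that assumption; print_column is a hypothetical helper name, and the sample file stands in for a real output file.

```shell
#!/bin/bash
# print_column is a hypothetical helper: it prints one tab-separated
# column of a file (1 = title, 2 = URL, 3 = permalink).
print_column() {
    local file=$1 column=$2
    cut -f "$column" "$file"
}

# Demo on a made-up sample file rather than real Reddit data.
SAMPLE=$(mktemp)
printf 'First post\thttps://example.com/a\t/r/demo/comments/1/first_post/\n' > "$SAMPLE"
print_column "$SAMPLE" 1
rm -f "$SAMPLE"
```

To use it on a real run, pass the name of the text file the script created instead of the sample file.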
That’s it. In this way, you can use this simple script to scrape topics from Reddit using Bash. You just run the script and it takes care of the rest. Each line of the text file it creates holds one topic: its title, its URL, and its permalink, separated by tabs. If you have some technical knowledge, you can also modify the script so that it saves the final file in CSV format.
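One possible way to get CSV is to convert the tab-separated file after the fact. The sketch below assumes the common CSV convention of wrapping every field in double quotes and doubling any embedded double quote. The helper names csv_field and tsv_to_csv are invented for this example.

```shell
#!/bin/bash
# csv_field is a hypothetical helper: it wraps a value in double quotes
# and doubles any double quote found inside the value.
csv_field() {
    local value=$1 q='"'
    printf '"%s"' "${value//$q/$q$q}"
}

# tsv_to_csv reads tab-separated title/URL/permalink lines from stdin
# and prints them as comma-separated, quoted fields.
tsv_to_csv() {
    local title url permalink
    while IFS=$'\t' read -r title url permalink; do
        printf '%s,%s,%s\n' \
            "$(csv_field "$title")" "$(csv_field "$url")" "$(csv_field "$permalink")"
    done
}

# Made-up sample record with a comma and quotes in the title.
printf 'Say "hi", world\thttps://example.com/a\t/r/demo/comments/1/say_hi/\n' | tsv_to_csv
```

Quoting every field keeps titles that contain commas from being split into extra columns when the file is opened in a spreadsheet.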
Final thoughts
If you want to scrape Reddit, the above script will help you. You can download the hot topics from any subreddit easily: just give the name of a subreddit to the script and it will scrape them for you. If you are comfortable with Linux, you can put this script on autopilot so that it keeps scraping topics automatically. You can use cron jobs for that.
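As a rough illustration of the cron idea, the snippet below only prints a crontab entry; it does not install anything. The hourly schedule, the helper name cron_line, and the directory $HOME/scripts are all assumptions chosen for the example.

```shell
#!/bin/bash
# cron_line is a hypothetical helper that builds a crontab entry which
# runs reddit.sh at minute 0 of every hour from the given directory.
cron_line() {
    local script_dir=$1 subreddit=$2
    printf '0 * * * * cd %s && ./reddit.sh %s\n' "$script_dir" "$subreddit"
}

# Placeholder directory and subreddit; adjust both to your setup.
cron_line "$HOME/scripts" IndianPeopleQuora
```

Pasting the printed line into the editor opened by crontab -e would schedule the job. Because the script's output file name includes the hour and minute, each hourly run would write to its own file.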