A Python utility for mirroring your personal reddit.com/saved feed, along with the content it links to. This project is in early development, and is being modified in wild, major, inconsistent ways.
Requires youtube-dl, requests, ffmpeg, and Python 3.6 or newer. bottle.py is also required, and is packaged with this repo.
It is in this author's interest to avoid forcing a non-trivial amount of effort on the user to run this program, to the point of forgoing official APIs or anything that requires a specially generated API key. If you can access it in your web browser, you should be able to mirror it locally without any extra effort.
(Hopefully)
Clone the repo, then create a user for your reddit account using the saved.json feed URL found on this page.
$ python3 create_user.py 'https://www.reddit.com/saved.json?feed=558862fc6069139f1b02bbb226a9cfcdaa0207cf&user=saucecode'
If done correctly, it will create some folders under your username in the user folder. Next, you need to make a local copy of all your saved posts.
$ python3 download_user.py [your reddit username]
This will start downloading the metadata for all your saved reddit posts (but not the content of those posts). It takes me around 15 seconds to pull close to 1000 of them. You should see some new files appearing in your user folder. Once this is done, you can run
$ python3 review_user.py [your reddit username]
This (for now) creates a file index_review.txt in your user folder. If it shows an approximate view of what your own reddit.com/saved page looks like, then you know it's done its job.
If that all worked, you're all set to start downloading the actual pictures/videos. Beware, this can take some time, especially if you save a lot of videos.
$ python3 scrape_for_user.py [your reddit username]
You can configure a few aspects of this process in the rsaved.json and config.json files created in your user's folder. Not everything is implemented.
You can now view your local mirror with a built-in web server! Just run
$ python3 server.py
to launch a bottle.py server on port 8080. You can then go to http://localhost:8080/ and start browsing!
Every user gets two configuration files: rsaved.json and config.json. rsaved.json controls what you end up downloading; config.json controls how you download it. In config.json you can set a custom User-Agent and specify a proxy (only SOCKS5 tested; HTTP/HTTPS will probably work).
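As an illustration only (the key names below are guesses, not a documented schema; check the config.json generated in your user folder for the real field names), a config.json enabling a SOCKS5 proxy might look something like:

```json
{
    "user_agent": "rsaved-mirror/0.1 (personal archive)",
    "proxy": "socks5://127.0.0.1:9050"
}
```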
The index file, index.pickle.gz, is where a lot of the magic happens: once updated, it contains the information about every saved post for this user. Let me tell you how to use it.
import rsaved

index = rsaved.load_index('your_username')  # returns the content of index.pickle.gz

# print the URL of all the saved posts from /r/aww
for item in index:
    if item['data']['subreddit'] == 'aww':
        print(item['data']['url'])

# if you're not yet familiar with the reddit object structure, familiarize yourself now
import json
print(json.dumps(index[0], indent=4))
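To give a feel for analysing the index, here is a self-contained sketch that counts saved posts per subreddit. The two sample items below merely imitate the nested 'data' layout shown above; in practice you would use the list returned by rsaved.load_index.

```python
from collections import Counter

# Hypothetical stand-in for the real index: each item mirrors
# reddit's listing structure, with post fields nested under 'data'.
index = [
    {'data': {'subreddit': 'aww', 'url': 'https://i.redd.it/a.jpg'}},
    {'data': {'subreddit': 'python', 'url': 'https://example.com/post'}},
    {'data': {'subreddit': 'aww', 'url': 'https://i.redd.it/b.gif'}},
]

# Tally saved posts by subreddit.
counts = Counter(item['data']['subreddit'] for item in index)
print(counts.most_common())  # [('aww', 2), ('python', 1)]
```

The same pattern works for any 'data' field, such as grouping by domain or author.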
- Because reddit won't display more than 1000 posts from your saved feed. Ever.
- So that it can be searched, filtered, and analysed.
- So that you can rip content which may one day be deleted.
- And most importantly, so that it can be searched.