Inspired by a Medium post by Maxim Piessen, I created this Instagram following network visualization project. This repository contains my own implementation of:
- an Instagram Selenium web scraper (located in /scraper) which iterates through a list of users and stores their followed accounts in a JSON file.
- a D3.js visualization (located in /presentation) which uses the data created by the scraper to visualize my Instagram network
This project was a lot of fun and I found out a lot of interesting facts about the connections between my different circles of friends.
Of course, it would be a lot cooler to have a website where you can just log in with Instagram and have the application visualize the network for you automatically. However, at the time of writing (September 2022), Instagram has not published any APIs that make something like this possible (and very probably never will).
That's why scraping the data from Instagram is a long and annoying process of repeatedly scraping, getting blocked by Instagram and scraping again over the course of several weeks (depending on how many people you follow on Instagram).
If you want to undergo the process and use this application yourself, here's how you do it.
Before we start, a word of warning: Instagram doesn't want you to do this. I don't know exactly why, since in my opinion scraping following data is not malicious in any way, but Instagram will (temporarily) block your account/IP from receiving following data once it detects automated behavior.
If you use this application, you use it AT YOUR OWN RISK. I am NOT RESPONSIBLE if you get permanently banned from Instagram or otherwise lose access to your account by using this application. I haven't had my own account banned by using this script and I doubt that it will happen to you, but it might very well be possible that Instagram changes this and starts actively locking accounts which show signs of automated behavior.
Clone the repository and install scraper and presentation:
git clone git@github.com:bemoty/instagram-following-network.git
cd instagram-following-network/scraper
yarn install
cd ../presentation
yarn install
Instagram changes its querySelector strings frequently, with every front-end redeploy. This is why you have to make sure that the queryStrings at the top of scraper/src/interact.ts are up to date.
You can copy a querySelector string from DevTools by right-clicking the element in the Elements tree and clicking Copy -> Copy selector.
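To make it easier to see which selectors have gone stale after a redeploy, a small check along the following lines could help. This is only a sketch: the selector names and strings below are hypothetical placeholders, not the actual values from scraper/src/interact.ts, and findStaleSelectors is an illustrative helper that does not exist in the repository.

```typescript
// Hypothetical selector table. The keys and selector strings are placeholders
// for illustration, NOT the real values from scraper/src/interact.ts.
const selectors: Record<string, string> = {
  followingLink: "header a.following-link",
  followingDialog: "div[role='dialog']",
  followingListItem: "div[role='dialog'] li a",
};

// Returns the names of all selectors for which `matches` reports no element.
// `matches` is injected so the same check works with any DOM-like lookup.
function findStaleSelectors(
  table: Record<string, string>,
  matches: (selector: string) => boolean,
): string[] {
  return Object.keys(table).filter((name) => !matches(table[name]));
}
```

With a DOM available, passing `(s) => document.querySelector(s) !== null` as `matches` would list the entries that need a fresh Copy selector from DevTools.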
Depending on your operating system, there may be different steps needed to install Chromedriver. See the official Chromedriver website for more information on this topic.
Before you can scrape the followings of others, you will need to scrape your own followings. For this, you can simply use the create command:
yarn start create <username>
The program will then scrape all your followings and save them in the import.json file. The first time you do this, the application will ask you to log in, as it otherwise cannot retrieve the following count. Accept essential cookies, log in and re-run the app.
You might also want to manually inspect the import.json file and remove any users that you do not want to scrape.
Now, simply run the scraper by opening a terminal in the /scraper directory and running yarn start.
A Chromium browser window will open and automatically navigate to the profiles scraped in step 2. The scraper will automatically shut down once it detects that Instagram has blocked you.
You then have to re-run the scraper the next day (you're usually unblocked by then) and continue until the import file is empty.
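The scrape, get blocked, resume cycle described above can be pictured as draining a queue until the first block. The sketch below is not the repository's implementation: the Outcome type, the runUntilBlocked function and the injected scrapeOne callback are all hypothetical, and the loop is synchronous for brevity, whereas a real browser-driven scraper would be asynchronous.

```typescript
// Hypothetical result of scraping one profile: either the accounts it
// follows, or a signal that Instagram has blocked further requests.
type Outcome = { kind: "ok"; following: string[] } | { kind: "blocked" };

interface RunResult {
  scraped: Record<string, string[]>; // username -> followed accounts
  remaining: string[]; // usernames still to scrape on the next run
}

// Works through `pending` in order and stops at the first block, so that
// the blocked user and everyone after it stay queued for the next run.
function runUntilBlocked(
  pending: string[],
  scrapeOne: (username: string) => Outcome,
): RunResult {
  const scraped: Record<string, string[]> = {};
  for (let i = 0; i < pending.length; i++) {
    const outcome = scrapeOne(pending[i]);
    if (outcome.kind === "blocked") {
      return { scraped, remaining: pending.slice(i) };
    }
    scraped[pending[i]] = outcome.following;
  }
  return { scraped, remaining: [] };
}
```

Writing `remaining` back to the import file after each run is what would let the next day's run pick up where the previous one stopped. An empty `remaining` corresponds to the import file being empty.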
Run the visualization app in the /presentation directory by running yarn dev there. Open the app in your web browser and wait for it to load. (This can take a while, be patient!)
After that you're done. You've successfully visualized your Instagram network. 🎉
You can click through the network manually or tinker with the presentation code to filter specific connections. Nodes closer together (clusters) usually mean that there is a mutual connection (e.g. same high school class, members of your football club, etc.).
Nodes further away from the clicked node are usually the most interesting ones as they indicate that a person knows a person from a different cluster.
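For anyone tinkering with the presentation code, it may help to picture the network as a list of nodes and directed links. The sketch below assumes the scraped data is a map from username to followed usernames, which is a guess at the format and not confirmed by this README. The buildGraph function and the type names are hypothetical. The node/link shape matches what a D3 force layout's link force accepts when it is configured with an id accessor.

```typescript
interface GraphNode { id: string }
interface GraphLink { source: string; target: string }
interface Graph { nodes: GraphNode[]; links: GraphLink[] }

// Turns "user -> accounts they follow" into one node per account and one
// directed link per follow relationship.
function buildGraph(following: Record<string, string[]>): Graph {
  const nodes: GraphNode[] = [];
  const links: GraphLink[] = [];
  // Prototype-free lookup object, so usernames such as "constructor"
  // are not mistaken for already-seen entries.
  const seen: Record<string, boolean> = Object.create(null);
  const addNode = (id: string): void => {
    if (!seen[id]) {
      seen[id] = true;
      nodes.push({ id });
    }
  };
  Object.keys(following).forEach((user) => {
    addNode(user);
    following[user].forEach((followed) => {
      addNode(followed);
      links.push({ source: user, target: followed });
    });
  });
  return { nodes, links };
}
```

In a force-directed layout, accounts that share many links pull each other closer together, which is why tightly connected groups show up as clusters.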
Since I don't care about celebrity / corporate accounts and meme pages in my visualization, I created an ignored.json file in /presentation/public which contains a non-exhaustive list of accounts which I don't want to appear in my visualization.
Feel free to add more accounts to this list (you can use the Copy active node button to copy usernames) to further filter your network.
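As a rough illustration of what such an ignore list does to the network, the sketch below removes ignored accounts and every link that touches them. It assumes that ignored.json holds a plain JSON array of usernames and that the graph uses a node/link shape. Both are assumptions, and applyIgnoreList is a hypothetical helper, not the presentation app's actual code.

```typescript
interface GraphNode { id: string }
interface GraphLink { source: string; target: string }
interface Graph { nodes: GraphNode[]; links: GraphLink[] }

// Returns a new graph without the ignored accounts. Links are dropped when
// either end is ignored, so no link points at a removed node.
function applyIgnoreList(graph: Graph, ignored: string[]): Graph {
  const isIgnored = (id: string): boolean => ignored.indexOf(id) !== -1;
  return {
    nodes: graph.nodes.filter((node) => !isIgnored(node.id)),
    links: graph.links.filter(
      (link) => !isIgnored(link.source) && !isIgnored(link.target),
    ),
  };
}
```

The input graph is left untouched, so the unfiltered network stays available if the ignore list changes.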
This project is licensed under the terms of the MIT License.