Can't scrape Instagram.com?
Subject of the issue
Using a selector copied from Chrome’s dev tools to scrape data from Instagram’s website yields nothing.
Your environment
- version of node: v9.11.2
- version of npm: 5.6.0
Steps to reproduce
Copy the selector of an element in Chrome’s dev tools.
const Xray = require('x-ray');
const x = Xray();

function getInstagramFollowers(username) {
  const url = `https://www.instagram.com/${username}/`;
  // Selector copied from Chrome's dev tools.
  const selector = '#react-root > section > main > div > ul > li:nth-child(2) > span > span';
  x(url, selector)((err, count) => {
    console.log('COUNT', count);
  });
}

getInstagramFollowers('facebook');
Expected behaviour
Get the number of followers back
Actual behaviour
count is an empty string
If you set the selector to just ‘html’, you get back what appears to be the raw js used before React.
Issue Analytics
- State:
- Created 5 years ago
- Comments: 5
Top GitHub Comments
@MiLeung instagram.com is a client-side React app, which means the HTML isn’t present in the response that comes from the server; instead, the HTML is constructed using JavaScript. If you take a look at the HTML source of the site you’ll see the only plain HTML inside the <body /> is <span id="react-root"></span>.
This means you’ll need to use a driver that understands JavaScript. You could try x-ray-phantom, which requires phantomjs to be installed on your computer/server.
However, the data you’re looking for (follower count) is still available elsewhere in the HTML source of the page.
For example, in the header you’ll find:
Which could be selected and parsed.
However, an even better source of data would be the <script> tag in the <body>, which contains the initial data object used by their React app to render. We can use x-ray or just about anything to reach in and grab that data.
You can see a working online example here: https://repl.it/@levibuzolic/x-ray-instagram-followers
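The script-tag approach described above can be sketched roughly as follows, using only plain Node.js. This is not the code from the linked repl. The variable name `window._sharedData`, the helper `extractInitialData`, and the shape of the sample object are all assumptions made for illustration; the real page may use different names and nesting.

```javascript
// Sketch: pull an embedded JSON object out of an inline <script> tag.
// ASSUMPTION: the page assigns its initial state as `window._sharedData = {...};`
// and nests the count as in SAMPLE_HTML. Neither detail is confirmed by the thread.

function extractInitialData(html, variableName) {
  // Walk every <script> body and look for the one that starts with the assignment.
  const scriptPattern = /<script[^>]*>([\s\S]*?)<\/script>/gi;
  const prefix = `${variableName} =`;
  let match;
  while ((match = scriptPattern.exec(html)) !== null) {
    const body = match[1].trim();
    if (!body.startsWith(prefix)) continue;
    // Drop the assignment prefix and the trailing semicolon, leaving JSON.
    const json = body.slice(prefix.length).trim().replace(/;$/, '');
    try {
      return JSON.parse(json);
    } catch (err) {
      return null; // Not plain JSON, so give up rather than guess.
    }
  }
  return null;
}

// Hypothetical stand-in for the HTML the server returns.
const SAMPLE_HTML = `
<html><body>
  <span id="react-root"></span>
  <script type="text/javascript">window._sharedData = {"user":{"followers":{"count":1234}}};</script>
</body></html>`;

const data = extractInitialData(SAMPLE_HTML, 'window._sharedData');
const followerCount = data ? data.user.followers.count : null;
console.log('COUNT', followerCount);
```

In practice the HTML string would come from an HTTP request rather than a constant, and the path into the parsed object would need to be checked against the live page source.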
This whole approach is of course pretty brittle and, like scraping any website, relies on Instagram not changing their HTML or JS data structure. Instagram has an API you could use instead, or there are third-party sites/tools that will get this data for you and take on the burden of keeping their service working.
@MiLeung while not specific to React, you should be able to tell you’re dealing with a client-side app by looking at the difference between the HTML that comes back in the response (view source) and the HTML that’s present after JS has run (inspect elements).
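That check can also be approximated in code. The sketch below assumes the app mounts into a container with a known id (the thread mentions `react-root`). `isEmptyMountPoint` is a hypothetical helper, and both HTML strings are made-up stand-ins for "view source" and "inspect elements" output.

```javascript
// Sketch: guess whether a page is client-rendered from its raw HTML alone.
// ASSUMPTION: the mount container id is known and contains no regex metacharacters.

function isEmptyMountPoint(rawHtml, mountId) {
  // Capture whatever sits between the container's opening and closing tag.
  const pattern = new RegExp(
    `<(\\w+)[^>]*\\bid=["']${mountId}["'][^>]*>([\\s\\S]*?)</\\1>`,
    'i'
  );
  const match = pattern.exec(rawHtml);
  if (!match) return false; // No such container in the server response.
  return match[2].trim() === '';
}

// Hypothetical server response: the container is present but empty.
const serverHtml = '<html><body><span id="react-root"></span></body></html>';
// Hypothetical DOM after JavaScript has run: the container has children.
const renderedHtml =
  '<html><body><span id="react-root"><section>...</section></span></body></html>';

console.log(isEmptyMountPoint(serverHtml, 'react-root'));   // true
console.log(isEmptyMountPoint(renderedHtml, 'react-root')); // false
```

An empty container in the server response is only a hint. A regex is a rough tool for HTML, so a real check would be better served by a proper parser.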