Can't scrape Instagram.com?

Subject of the issue

Using a selector copied from Chrome’s dev tools to scrape data on Instagram’s website yields nothing.

Your environment

  • version of node: v9.11.2
  • version of npm: 5.6.0

Steps to reproduce

Copy the selector of an element in Chrome’s dev tools and use it with x-ray:

const Xray = require('x-ray');
const x = Xray();

function getInstagramFollowers(username) {
  let url = `https://www.instagram.com/${username}/`;
  // Selector copied from Chrome's dev tools (matches the rendered DOM)
  let selector = '#react-root > section > main > div > ul > li:nth-child(2) > span > span';

  x(url, selector)((err, count) => {
    console.log('COUNT', count)
  })
}

getInstagramFollowers('facebook')

Expected behaviour

Get the number of followers back

Actual behaviour

count is an empty string

If you set the selector to just ‘html’, you get back what appears to be the raw JavaScript that is served before React renders the page.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5

Top GitHub Comments

levibuzolic commented, Jun 28, 2018 (2 reactions)

@MiLeung instagram.com is a client-side React app, which means the HTML isn’t present in the response that comes from the server; instead, the HTML is constructed using JavaScript. If you take a look at the HTML source of the site you’ll see the only plain HTML inside the <body> is <span id="react-root"></span>.

This means you’ll need to use a driver that understands JavaScript. You could try x-ray-phantom, which requires PhantomJS to be installed on your computer/server.

However the data you’re looking for (follower count) is still available elsewhere in the HTML source of the page.

For example, in the header you’ll find:

<meta property="og:description" content="3m Followers, 9 Following, 315 Posts - See Instagram photos and videos from Facebook (@facebook)" />

Which could be selected and parsed.
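
As a rough sketch of that parsing step, assuming the content attribute keeps the "N Followers, N Following, N Posts" shape shown above. The helper name parseOgDescription is hypothetical, and the string itself could be fetched with an x-ray attribute selector such as meta[property="og:description"]@content:

```javascript
// Hypothetical helper: pulls the three counts out of an og:description string.
// Assumes the "<n> Followers, <n> Following, <n> Posts" format shown above.
function parseOgDescription(content) {
  const pattern = /^([\d.,]+[km]?) Followers, ([\d.,]+[km]?) Following, ([\d.,]+[km]?) Posts/i;
  const match = pattern.exec(content);
  if (!match) return null; // format changed or content missing

  const [, followers, following, posts] = match;
  // Counts stay as strings because values like "3m" are abbreviated.
  return { followers, following, posts };
}

const sample = '3m Followers, 9 Following, 315 Posts - See Instagram photos and videos from Facebook (@facebook)';
console.log(parseOgDescription(sample));
```

Note that the follower count here is rounded ("3m"), so this route only suits cases where an approximate number is enough.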

However an even better source of data would be the <script> tag in the <body> which contains the initial data object used by their React app to render. We can use x-ray or just about anything to reach in and grab that data.

const XRay = require('x-ray');
const x = XRay();

const url = 'https://instagram.com/facebook';

x(url, 'body script@html').then(res => {
  // First strip variable declaration
  res = res.replace('window._sharedData = ', '');

  // Next strip the trailing semi-colon as that's not valid JSON
  res = res.replace(/;$/, '');

  // Now we parse the string as JSON
  const data = JSON.parse(res);

  // Now we deeply select the user object from the data
  const user = data.entry_data.ProfilePage[0].graphql.user;

  // And console log just the follower count
  // however there's heaps of useful data in the user object
  console.log(user.edge_followed_by.count);
});

You can see a working online example here: https://repl.it/@levibuzolic/x-ray-instagram-followers

This whole approach is of course pretty brittle: like scraping any website, it relies on Instagram not changing their HTML or JS data structure to keep working. Instagram has an API you could use instead, or there are 3rd party sites/tools that will get this data for you and take on the burden of keeping their service working.
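
One way to soften that brittleness is to fail quietly when the markup or data shape changes. The following is only a sketch: extractSharedData and getFollowerCount are hypothetical helper names, and the sample string is made up to mirror the structure used in the snippet above, not real Instagram output.

```javascript
// Hypothetical helper: returns the parsed _sharedData object, or null when the
// script text no longer looks like "window._sharedData = {...};".
function extractSharedData(scriptText) {
  const match = /window\._sharedData\s*=\s*(\{[\s\S]*\})\s*;?\s*$/.exec(scriptText || '');
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (err) {
    return null; // the payload is no longer valid JSON
  }
}

// Hypothetical helper: walks the same path as the snippet above, but returns
// null instead of throwing when any level is missing.
function getFollowerCount(data) {
  const path = ['entry_data', 'ProfilePage', 0, 'graphql', 'user', 'edge_followed_by', 'count'];
  const value = path.reduce((node, key) => (node == null ? undefined : node[key]), data);
  return typeof value === 'number' ? value : null;
}

// Made-up sample input for illustration only.
const sampleScript = 'window._sharedData = {"entry_data":{"ProfilePage":[{"graphql":{"user":{"edge_followed_by":{"count":123}}}}]}};';
console.log(getFollowerCount(extractSharedData(sampleScript)));
```

A null result then signals that the scraper needs updating, rather than surfacing as an uncaught TypeError.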

levibuzolic commented, Jun 29, 2018 (1 reaction)

@MiLeung while not specific to React, you should be able to tell you’re dealing with a client-side app by looking at the difference between the HTML that comes back in the response (view source) vs the HTML that’s present after JS has run (inspect elements).
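
That comparison can be approximated in code. This is a heuristic sketch only: hasEmptyMountPoint is a hypothetical helper, both HTML strings are made-up stand-ins for "view source" and "inspect elements", and regex matching is a rough check rather than a real HTML parser (it also assumes the id contains no regex special characters).

```javascript
// Hypothetical heuristic: true when the HTML contains an element with the given
// id that has no content, e.g. <span id="react-root"></span>.
function hasEmptyMountPoint(html, id) {
  const pattern = new RegExp(
    '<(\\w+)[^>]*\\bid=["\']' + id + '["\'][^>]*>\\s*</\\1>',
    'i'
  );
  return pattern.test(html);
}

// Stand-in for the server response (view source).
const serverHtml = '<body><span id="react-root"></span><script>/* app code */</script></body>';
// Stand-in for the DOM after JavaScript has run (inspect elements).
const renderedHtml = '<body><span id="react-root"><section><main>...</main></section></span></body>';

console.log(hasEmptyMountPoint(serverHtml, 'react-root'));   // true
console.log(hasEmptyMountPoint(renderedHtml, 'react-root')); // false
```

An empty mount element in the server response suggests that selectors copied from dev tools will not match what a plain HTTP scraper receives.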

