
How To Scrape Web Page That Doesn't Show Its Data?

I want to scrape the following web page: https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/16/2019 As you can see, there is a lot

Solution 1:

The website https://charlotte.realforeclose.com uses AJAX to load its data, so you need to do some reverse engineering to find out how it works.

Open Chrome and press F12 (or use the browser menu) to open Developer Tools.

[Screenshot: opening Chrome Developer Tools]

Open the Network tab and choose the XHR filter. Then paste the URL https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/16/2019 into the browser address bar and press Enter. Watch the XHRs logged in the Network tab while the page loads, and inspect the ones with the largest response size first.

[Screenshot: XHRs logged in the Network tab]
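If you would rather rank the requests programmatically than eyeball them, Chrome's Network tab can export its log as a HAR file (a JSON format). The sketch below assumes such an export and relies on Chrome's non-standard `_resourceType` field to pick out XHR/fetch requests, which other browsers may not write. `load_har` and `largest_xhrs` are hypothetical helpers for illustration, not part of any tool.

```python
import json
from typing import Any


def load_har(path: str) -> dict[str, Any]:
    """Read a HAR file exported from the DevTools Network tab."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def largest_xhrs(har: dict[str, Any], limit: int = 5) -> list[tuple[int, str]]:
    """Return (response size in bytes, URL) for XHR/fetch entries, largest first."""
    xhrs = []
    for entry in har["log"]["entries"]:
        # "_resourceType" is Chrome-specific; other browsers may omit it.
        if entry.get("_resourceType") not in {"xhr", "fetch"}:
            continue
        size = entry["response"]["content"].get("size", 0)
        xhrs.append((size, entry["request"]["url"]))
    # Biggest responses first, mirroring "inspect the larger ones first".
    xhrs.sort(key=lambda pair: pair[0], reverse=True)
    return xhrs[:limit]
```

For example, `largest_xhrs(load_har("auction.har"))` would list the candidates, where `auction.har` is a hypothetical file name for your export.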

Click a request in the list to see its details: the URL, headers and parameters of the request.

[Screenshot: XHR request details]

And here is the response content.

[Screenshot: XHR response]

Since the request method is GET, you can simply paste these URLs into the address bar to retrieve the content. In my case the URLs were:

https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563171184890&bypassPage=1&test=1&_=1563171184890
https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=C&PageDir=0&doR=1&tx=1563171185129&bypassPage=0&test=1&_=1563171185129
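The scripted equivalent of pasting one of these URLs into the address bar is a plain GET request. This is a minimal sketch using only Python's standard library; the browser-like `User-Agent` value is an assumption, and without the cookies your browser already holds, the server may answer differently than it does in DevTools. `build_request` and `fetch_text` are illustrative names.

```python
import urllib.request

# Assumption: a browser-like User-Agent; the site may not require it.
USER_AGENT = "Mozilla/5.0"


def build_request(url: str) -> urllib.request.Request:
    """Build a plain GET request, like typing the URL into the address bar."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Send the GET request and return the response body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `print(fetch_text(url)[:500])` with one of the URLs above prints the start of the response for a quick look.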

After experimenting a bit, you can see that the parameter AREA=W corresponds to the "Auctions Waiting" section and AREA=C to the "Auctions Closed or Canceled" section. The parameters tx, bypassPage, test and _ appear to be unnecessary.
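One way to double-check this is to strip those parameters from a captured URL and compare the result with a URL that still works. In both captured URLs, `tx` and `_` carry the same value, which looks like a millisecond Unix timestamp, the kind of cache-busting value jQuery appends as `_` when caching is disabled; that is an educated guess, not something the site documents. `strip_params`, `area_of` and `AREAS` are hypothetical helpers.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters that turned out not to be needed.
UNNEEDED = frozenset({"tx", "bypassPage", "test", "_"})

# AREA codes and the page sections they correspond to.
AREAS = {"W": "Auctions Waiting", "C": "Auctions Closed or Canceled"}


def strip_params(url: str, drop: frozenset[str] = UNNEEDED) -> str:
    """Return the URL without the given query parameters, keeping the rest in order."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in drop]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def area_of(url: str) -> str:
    """Name the section an XHR URL loads, based on its AREA parameter."""
    params = dict(parse_qsl(urlsplit(url).query))
    return AREAS.get(params.get("AREA", ""), "unknown")
```

Stripping the first captured URL yields exactly the shorter first-page URL used below.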

To paginate, load the first page with PageDir=0 and doR=1. After that, request the next page with PageDir=1 and doR=0, or the previous page with PageDir=-1 and doR=0.

The first page: https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1

[Screenshot: response for the first page]

And the next page: https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=1&doR=0

[Screenshot: response for the next page]
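The pagination rule can be captured in a small URL builder. This is a sketch that assumes only the parameters shown in these URLs matter; `page_url` and `DIRECTIONS` are illustrative names. Because PageDir is relative (next/previous) rather than an absolute page number, the server presumably remembers the current page per session, so these URLs are likely only meaningful when requested in order over the same session.

```python
from urllib.parse import urlencode

BASE_URL = "https://charlotte.realforeclose.com/index.cfm"

# (PageDir, doR) for each kind of navigation.
DIRECTIONS = {"first": (0, 1), "next": (1, 0), "previous": (-1, 0)}


def page_url(area: str, direction: str) -> str:
    """Build the XHR URL for the first/next/previous page of a section.

    area is "W" (Auctions Waiting) or "C" (Auctions Closed or Canceled).
    """
    page_dir, do_r = DIRECTIONS[direction]
    query = urlencode({
        "zaction": "AUCTION",
        "Zmethod": "UPDATE",
        "FNC": "LOAD",
        "AREA": area,
        "PageDir": page_dir,
        "doR": do_r,
    })
    return f"{BASE_URL}?{query}"
```

`page_url("W", "first")` and `page_url("W", "next")` reproduce the two URLs above.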

Finally, you just need to reproduce these XHRs from your application and parse the responses. Depending on how your HTTP client is implemented, you may also need to send the necessary headers and handle cookies.
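Putting it together, the sketch below keeps cookies across requests with `http.cookiejar`, so the preview page and the XHRs share one session the way a browser tab does. Several parts are assumptions rather than confirmed requirements: that loading the preview page first is what tells the server which auction date to serve (the UPDATE URLs carry no date), the `User-Agent` value, and the `X-Requested-With: XMLHttpRequest` header that jQuery-based pages typically send with same-origin AJAX calls. Parsing is left out because the response format should be checked in the DevTools Response tab first. `make_session` and `fetch_pages` are hypothetical names.

```python
import http.cookiejar
import urllib.request

# URLs from the answer; only PageDir/doR change between pages.
PREVIEW_URL = ("https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION"
               "&Zmethod=PREVIEW&AUCTIONDATE=07/16/2019")
PAGE_URL = ("https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION"
            "&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir={page_dir}&doR={do_r}")


def make_session() -> urllib.request.OpenerDirector:
    """Build an opener that keeps cookies between requests, like a browser tab."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Assumption: a browser-like User-Agent is sent with every request.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener


def fetch_pages(max_pages: int = 3, timeout: float = 30.0) -> list[str]:
    """Load the preview page, then the first and following AREA=W pages."""
    opener = make_session()
    # Assumption: this request sets the session cookie and the auction date.
    with opener.open(PREVIEW_URL, timeout=timeout) as resp:
        resp.read()
    pages = []
    for i in range(max_pages):
        # First page: PageDir=0, doR=1; every later page: PageDir=1, doR=0.
        page_dir, do_r = (0, 1) if i == 0 else (1, 0)
        req = urllib.request.Request(
            PAGE_URL.format(page_dir=page_dir, do_r=do_r),
            # Assumption: mimic the header jQuery-based pages send with AJAX calls.
            headers={"X-Requested-With": "XMLHttpRequest"},
        )
        with opener.open(req, timeout=timeout) as resp:
            pages.append(resp.read().decode("utf-8", errors="replace"))
    return pages
```

`fetch_pages()` returns the raw response bodies. How the server signals the last page is not known, so check for repeated or empty responses before relying on `max_pages`.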
