Skip to content Skip to sidebar Skip to footer

Excel-vba: Buttons And Drop-down Lists Before Scraping Data On .aspx Web Pages

I'd like to use VBA to get a table from a webpage, but I cannot access it directly because I need to switch from a tab to another. The problem relies on the .aspx end of the web pa

Solution 1:

__doPostBack and onclick event:

When you inspect the HTML associated with selecting Main Content div bar e.g. Long Terme, you can see that there is a java script __doPostBack function associated with the onclick event for the individual bar items.

Observe the HTML in question:

__doPostBack

To quote from my link above:

The function takes the following two arguments:

eventTarget - This contains the ID of the control that caused the post back. eventArgument - This contains any additional data associated with the control.

In any ASP.NET page the two hidden fields: __EVENTTARGET and __EVENTARGUMENT are automatically declared. When a page is posted back to the server ASP.NET inspects __EVENTTARGET and __EVENTARGUMENT values and this way it can decide which of the controls caused the page to be posted back and what is the event that has to be handled.

tldr;

In the "olden" days of ASP there often had to be a form to capture user inputs and then further pages created to accept those inputs (GET or POST), validate, perform actions and the like. With ASP.NET you can declare controls on the server which accept the above arguments and post back to the same page after inspecting the values of

The first argument tells you which control was fired and the second argument provides additional information, which in this case determines what tab information is returned.

We can see from the above that TabAction is the control and that the number following it corresponds to the tab of interest e.g. 2 for Long Terme (as 0 - indexed).

In VBA we can execute this JS function in a number of ways but I will use:

.document.parentWindow.execScript "__doPostBack('EVENTTARGET', 'EVENTARGUMENT')"

This becomes:

.document.parentWindow.execScript "__doPostBack('TabAction', '2')"

Which I re-write to accept the EVENTARGUMENT as a constant, OPTION_CHOSEN, so different tabs can be retrieved by changing the its value at the top.

After executing the function a little time is left to refresh the page and then the table is grabbed by its id:

SethTable= .document.getElementById("ctl00_ctl00_MainContent_Layout_1MainContent_gridResult")

Then the table is looped along its rows and columns (columns being the table cells along the length of each row).


Examples from page:

Page


Examples from code output:

Example code output


Full code:

Option Explicit
Public Sub GetTable()
    Dim IE AsNew InternetExplorer
    Const OPTION_CHOSEN As Long = 2'0 Aperçu; 1 Court terme; 2 Long terme; 3 Portefeuille; 4 Frais & Détails

    Application.ScreenUpdating = True
    With IE
        .Visible = True
        .navigate "http://www.morningstar.fr/fr/fundquickrank/default.aspx"
        While .readyState < 4: DoEvents: Wend

        .document.parentWindow.execScript "__doPostBack('TabAction', '" & OPTION_CHOSEN & "')"

        Do While .Busy = True Or .readyState <> 4: DoEvents: Loop

        Dim hTable As HTMLTable, tRow As HTMLTableRow, tCell As HTMLTableCell

        Set hTable = .document.getElementById("ctl00_ctl00_MainContent_Layout_1MainContent_gridResult")
        Dim c As Long, r As Long
        With ActiveSheet
            For Each tRow In hTable.Rows
                For Each tCell In tRow.Cells
                    c = c + 1: .Cells(r + 1, c) = tCell.innerText
                Next tCell
                c = 0: r = r + 1
            Next tRow
            .Columns("A:A").Delete
            .UsedRange.Columns.AutoFit
        End With
        .Quit
    End With
    Application.ScreenUpdating = True
End Sub

References (VBE > Tools > References):

  1. Microsoft Internet Controls

Solution 2:

Sub Get_Info()

Dim Elems, e As Variant

Const READYSTATE_COMPLETE&=4&
Dim ie As Object
Set ie = Nothing
DoEvents

Set ie = CreateObject("InternetExplorer.Application")
DoEvents

With ie
    .Visible =True
    .Navigate "http://www.morningstar.fr/fr/fundquickrank/default.aspx"
    While Not .readyState = READYSTATE_COMPLETE
        DoEvents
    Wend
EndWithWith ie.Document
    Set Elems = .getElementsByTagName("span")
    DoEvents
    ForEach e In Elems
        If e.getAttribute("onclick") = "__doPostBack('TabAction', '2')" Then
            e.Click
            'try to insert your table export code here
            Exit For
        End If
    Next e

End With

Set Elems = Nothing
Set e = Nothing

'ie.Quit Quit Internet Explorer once the exporting is done
Set ie = Nothing

MsgBox "Done"
End Sub

The code above navigates directly to the tab you want. Try to combine it with your code to export the table and maybe it will work. ie.Visible=True just to make sure you navigate to the right URL, but make it false once you see it works. Hope it helps!

Solution 3:

If you wish to get the tabular data from that target page using IE then this is one such way to achieve that.

Sub Fetch_Data()
    Dim IE AsNew InternetExplorer, html As HTMLDocument
    Dim posts AsObject, post AsObject, elem AsObject, trow AsObject

    With IE
        .Visible = True
        .navigate "http://www.morningstar.fr/fr/fundquickrank/default.aspx"While .readyState < 4: DoEvents: Wend
        Set html = .document
    End With

    For Each post In html.getElementsByClassName("ms_tab_inactivetext")
        If InStr(post.innerText, "Long terme") > 0 Then post.ParentNode.Click: ExitFor
    Next post

    DoWhile IE.Busy = TrueOr IE.readyState <> 4: DoEvents: Loop

    Set posts = html.getElementById("ctl00_ctl00_MainContent_Layout_1MainContent_gridResult")

    For Each elem In posts.Rows
        For Each trow In elem.Cells
            c = c + 1: Cells(r + 1, c) = trow.innerText
        Next trow
        c = 0: r = r + 1
    Next elem
    IE.Quit
End Sub

Reference to add to the library:

1. Microsoft HTMLObject Library
2. Microsoft Internet Controls

Post a Comment for "Excel-vba: Buttons And Drop-down Lists Before Scraping Data On .aspx Web Pages"