Beautifulsoup Unable To Extract Data Using Attrs=class

February 28, 2024

I am extracting data for a research project, and I have successfully used findAll('div', attrs={'class':'someClassName'}) on many websites, but this particular website, WebSite Link

Solution 1:

The following code works fine for me when the page is fetched with requests:

import requests
from BeautifulSoup import BeautifulSoup as bs

# grab the HTML
r = requests.get(r'http://www.amazon.com/s/ref=sr_pg_1?rh=n:172282,k%3adigital%20camera&keywords=digital%20camera&ie=UTF8&qid=1343600585')
html = r.text

# parse the HTML
soup = bs(html)

# collect every <div> whose class attribute is "data"
results = soup.findAll('div', attrs={'class': 'data'})

print results
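
For readers on Python 3, the same lookup can be sketched with the bs4 package, the successor to the BeautifulSoup module imported above. This is a minimal sketch, not the answerer's code: the inline sample markup and the helper name find_divs_by_class are hypothetical, and it parses a string rather than fetching the Amazon page so that it runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
  <div class="data">first</div>
  <div class="data extra">second</div>
  <div class="other">third</div>
</body></html>
"""


def find_divs_by_class(html, class_name):
    """Return the text of every <div> whose class attribute includes class_name."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all is the bs4 spelling of findAll; the attrs argument works the same way.
    divs = soup.find_all("div", attrs={"class": class_name})
    return [div.get_text(strip=True) for div in divs]


if __name__ == "__main__":
    print(find_divs_by_class(SAMPLE_HTML, "data"))
```

In bs4, a class filter matches any one of an element's CSS classes, so the div with class "data extra" is returned alongside the div with class "data".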

Solution 2:

For anyone reading this question who would like to know why the code you've given (copied below) wasn't able to find the elements matching the attrs value:

soup = bs(urlopen(url))
for div in soup.findAll('div', attrs={'class':'data'}):
    print div

The issue is in how the BeautifulSoup object is created: in soup = bs(urlopen(url)), the value of urlopen(url) is a response object, not the HTML of the page.

Any issues you encountered could probably have been resolved more easily by using bs(urlopen(url).read()) instead, which passes the page's markup to the parser.
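
The suggestion of reading the response body before handing it to the parser can be sketched as follows in Python 3 with bs4. This is a sketch under stated assumptions: io.BytesIO stands in for the object returned by urlopen(url) so that the example runs offline, and the sample markup and the name parse_response are hypothetical.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical page body; BytesIO mimics a response object's read() method.
SAMPLE_BODY = b'<html><body><div class="data">camera</div></body></html>'


def parse_response(response):
    """Read the raw markup from a response-like object and parse it."""
    html = response.read()  # the bytes of the page, not the response object itself
    return BeautifulSoup(html, "html.parser")


if __name__ == "__main__":
    fake_response = io.BytesIO(SAMPLE_BODY)  # stand-in for urlopen(url)
    soup = parse_response(fake_response)
    for div in soup.find_all("div", attrs={"class": "data"}):
        print(div.get_text())
```

With a real request, the argument would be the object returned by urllib.request.urlopen(url). The bs4 documentation describes the constructor as accepting either a string or an open filehandle, so in bs4 the explicit read() mainly makes it unambiguous what is being parsed.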
