
Unable To Scrape The Text From A Certain Li Element

I am scraping this URL. I need to scrape the main content of the page, such as the Room Features and Internet Access sections. Here is my code:

for h3s in Column:  # Suppose this is div.RightColum

Solution 1:

A BeautifulSoup element only has a .string value if that string is its only child. Your <li> tag contains a <span> element as well as text, so its .string is None.
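A small sketch of that rule, using made-up <li> elements (the ids `plain`, `nested` and `mixed` are hypothetical, chosen only for illustration). It also shows the one case the rule above glosses over: when a tag's only child is another tag, .string is taken from that child.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: text only, a single nested tag, and text mixed with a tag.
html = """
<ul>
  <li id="plain">Non-Smoking Room</li>
  <li id="nested"><span>Sea View Room</span></li>
  <li id="mixed">Wired High Speed Internet Access<span class="fee">25 USD per day</span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

plain = soup.find("li", id="plain")
nested = soup.find("li", id="nested")
mixed = soup.find("li", id="mixed")

# Exactly one child, and it is a string: .string returns that string.
print(plain.string)   # Non-Smoking Room
# Exactly one child, and it is a tag: .string is taken from that child.
print(nested.string)  # Sea View Room
# Two children (a string and a <span>): .string is None.
print(mixed.string)   # None
```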

Use the .text attribute instead to extract all strings as one:

print(v.text.strip())

or use the element.get_text() method:

print(v.get_text().strip())

which also takes a handy strip flag that removes leading and trailing whitespace from each string and drops any string left empty:

print(v.get_text(' ', strip=True))

The first argument is the separator used to join the various strings together; I used a space here.
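To see why the separator matters, here is a minimal comparison on a single hypothetical <li> shaped like the one on the page. With strip=True but no separator, the stripped pieces are glued together with nothing in between.

```python
from bs4 import BeautifulSoup

# Hypothetical <li> mirroring the page structure: text followed by a fee <span>.
html = ('<li>Wired High Speed Internet Access in All Guest Rooms\n'
        '  <span class="fee">\n    25 USD per day\n  </span>\n</li>')
li = BeautifulSoup(html, "html.parser").li

glued = li.get_text(strip=True)        # default separator is ""
spaced = li.get_text(" ", strip=True)  # space between the pieces
piped = li.get_text(" | ", strip=True) # any string can be the separator

print(glued)   # Wired High Speed Internet Access in All Guest Rooms25 USD per day
print(spaced)  # Wired High Speed Internet Access in All Guest Rooms 25 USD per day
print(piped)   # Wired High Speed Internet Access in All Guest Rooms | 25 USD per day
```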

Demo:

>>> from bs4 import BeautifulSoup
>>> sample = '''\
... <h3>Internet Access</h3>
... <ul>
...     <li>Wired High Speed Internet Access in All Guest Rooms
...      <span class="fee">
...         25 USD per day
...      </span>
...   </li>
... </ul>
... '''
>>> soup = BeautifulSoup(sample)
>>> soup.li
<li>Wired High Speed Internet Access in All Guest Rooms
     <span class="fee">
        25 USD per day
     </span>
</li>
>>> soup.li.string
>>> soup.li.text
u'Wired High Speed Internet Access in All Guest Rooms\n     \n        25 USD per day\n     \n'
>>> soup.li.get_text(' ', strip=True)
u'Wired High Speed Internet Access in All Guest Rooms 25 USD per day'
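If you would rather keep the feature name and the fee apart instead of joining them into one string, the stripped_strings generator yields the same pieces that get_text(strip=True) joins. A sketch on hypothetical markup shaped like the demo above:

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the demo above.
html = """<ul>
  <li>Wired High Speed Internet Access in All Guest Rooms
    <span class="fee">
      25 USD per day
    </span>
  </li>
</ul>"""
li = BeautifulSoup(html, "html.parser").li

# Each text fragment with surrounding whitespace removed;
# whitespace-only fragments are skipped.
parts = list(li.stripped_strings)
print(parts)

# Or pick the pieces individually: the first stripped string is the feature,
# and the fee is read from its own <span class="fee"> tag.
feature = next(li.stripped_strings)
fee = li.find("span", class_="fee").get_text(strip=True)
print(feature, "->", fee)
```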

Do make sure you call it on the element:

for index, test in enumerate(h3s.select("h3")):
    print("Feature title: ", test.text)
    ul = h3s.select("ul")[index]
    print(ul.get_text(' ', strip=True))
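A self-contained variant of that loop, run against hypothetical markup standing in for the page's div.RightColum (the HTML and the `features` dict are illustrative, not taken from the real page). It pairs headers and lists with zip() so each select() call runs once rather than on every iteration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page's right-hand column.
HTML = """
<div class="RightColum">
  <h3>Room Features</h3>
  <ul>
    <li>Non-Smoking Room</li>
    <li>Sea View Room</li>
  </ul>
  <h3>Internet Access</h3>
  <ul>
    <li>Wired High Speed Internet Access in All Guest Rooms
      <span class="fee">25 USD per day</span>
    </li>
  </ul>
</div>
"""

column = BeautifulSoup(HTML, "html.parser").select_one("div.RightColum")

# Pair the n-th <h3> with the n-th <ul>, as the indexing version does.
features = {}
for header, ul in zip(column.select("h3"), column.select("ul")):
    features[header.get_text(strip=True)] = ul.get_text(" ", strip=True)

for title, text in features.items():
    print("Feature title:", title)
    print(text)
```

zip() still pairs purely by position, so it relies on the same assumption as the indexing: every <h3> is followed by exactly one <ul>.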

You could use the find_next_sibling() method here instead of indexing into the result of .select():

for header in h3s.select("h3"):
    print("Feature title: ", header.text)
    ul = header.find_next_sibling("ul")
    print(ul.get_text(' ', strip=True))
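One caveat: find_next_sibling("ul") returns None when no later sibling matches, and it also keeps searching past the next <h3>. So a section without a list of its own would borrow the following section's list. A guarded sketch, assuming hypothetical markup in which a "Parking" header has no list (the `scrape_features` helper is illustrative, not part of BeautifulSoup):

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the "Parking" section deliberately has no <ul>.
HTML = """
<div class="RightColum">
  <h3>Room Features</h3>
  <ul>
    <li>Non-Smoking Room</li>
    <li>Sea View Room</li>
  </ul>
  <h3>Parking</h3>
  <h3>Internet Access</h3>
  <ul>
    <li>Wired High Speed Internet Access in All Guest Rooms
      <span class="fee">25 USD per day</span>
    </li>
  </ul>
</div>
"""

def scrape_features(column):
    """Map each <h3> title to the text of the <ul> that belongs to it."""
    features = {}
    for header in column.select("h3"):
        ul = header.find_next_sibling("ul")
        # Accept the list only if it exists and this header is the nearest
        # <h3> before it; otherwise the list belongs to a later section.
        if ul is not None and ul.find_previous_sibling("h3") is header:
            features[header.get_text(strip=True)] = ul.get_text(" ", strip=True)
        else:
            features[header.get_text(strip=True)] = ""
    return features

column = BeautifulSoup(HTML, "html.parser").select_one("div.RightColum")
result = scrape_features(column)
for title, text in result.items():
    print("Feature title:", title)
    print(text)
```

With positional pairing, the same markup would attach the Internet Access list to "Parking"; the sibling check keeps each list with its own header.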

Demo:

>>> for header in h3s.select("h3"):
...     print("Feature title: ", header.text)
...     ul = header.find_next_sibling("ul")
...     print(ul.get_text(' ', strip=True))
...
Feature title: Room Features
Non-Smoking Room Connecting Rooms Available Private Terrace Sea View Room Suites Available Private Balcony Bay View Room Honeymoon Suite Starwood Preferred Guest Room Room with Sitting Area
Feature title: Internet Access
Wired High Speed Internet Access in All Guest Rooms 25 USD per day
