
Querying an HTML Page with XPath in Java

Can anyone recommend a Java library that lets me run an XPath query over an HTML page? I tried using JAXP, but it keeps giving me a strange error that I cannot seem to

Solution 1:

Setting the parser to "non-validating" only turns off validation; it does not stop the parser from fetching DTDs. As far as I recall, the DTD is needed not just for validation but also for entity expansion.

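If you want to suppress fetching of DTDs, you need to register an EntityResolver on the DocumentBuilder (via setEntityResolver; the DocumentBuilderFactory has no such method). Implement the EntityResolver's resolveEntity method so that it always returns an InputSource wrapping an empty string. The method's return type is InputSource, not String, and returning null makes the parser fall back to fetching the DTD as usual.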
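A minimal sketch of that approach, assuming the page is well-formed XHTML (JAXP cannot parse arbitrary tag-soup HTML). The class name XhtmlXPathDemo, the helper methods, and the sample page are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names here are not from the original question.
class XhtmlXPathDemo {

    // Parses well-formed XHTML without fetching the DTD named in its DOCTYPE.
    static Document parseWithoutDtd(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // turns off validation only, not DTD fetching
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Answer every external entity request (including the DTD) with empty content.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Evaluates an XPath expression and returns its string value.
    static String evaluate(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String page =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><head><title>Hello</title></head>"
                + "<body><p>Hi</p></body></html>";
        Document doc = parseWithoutDtd(page);
        System.out.println(evaluate(doc, "/html/head/title"));
    }
}
```

One caveat: because the DTD is replaced with empty content, entities it would have declared (such as `&nbsp;`) are no longer defined, so pages that use them may still fail to parse. The sample page also omits the XHTML namespace declaration; with a namespace-aware parser the XPath expression would need a namespace context.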

Solution 2:

Take a look at this:

http://www.w3.org/2005/06/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

You probably have the parser set to perform DTD validation, and it is trying to retrieve the DTD. JAXP should have a way to disable DTD validation and just run XPath against a document that is assumed to be valid. I haven't used JAXP in many years, so I'm sorry I can't be more helpful.
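One possible way to do this, as a sketch: the feature URI below is specific to Xerces-based parsers and is not part of the JAXP specification, so treating it as supported is an assumption (it is generally recognized by the JDK's bundled parser). The class name NoDtdFeatureDemo and the sample page are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not from the original answer.
class NoDtdFeatureDemo {

    // Xerces-specific feature; setFeature throws ParserConfigurationException
    // if the underlying parser does not recognize it.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Returns the text of /html/head/title without loading the external DTD.
    static String titleOf(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);                 // no DTD validation
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // and no DTD download
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        return XPathFactory.newInstance().newXPath()
                .evaluate("/html/head/title", doc);
    }

    public static void main(String[] args) throws Exception {
        String page =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html><head><title>Hello</title></head><body/></html>";
        System.out.println(titleOf(page));
    }
}
```

If the parser in use rejects the feature, registering an EntityResolver on the DocumentBuilder is the portable alternative.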
