yysun's space

Data Extraction / Screen Scraping Using XPath

Posted on: May 16, 2007

Google AJAXSLT is an implementation of XSL-T in JavaScript, intended for use in fat web pages, which are nowadays referred to as AJAX applications. Because XSL-T uses XPath, it is also an implementation of XPath that can be used independently of XSL-T.

Selenium Core uses AJAXSLT's XPath functions to locate elements in plain HTML pages. Selenium IDE can generate XPath expressions in much the same way as Solvent.

Here is a sample that extracts data from a web page with XPath, using AJAXSLT.

<html>
<head>
<title>XPath-Test</title>
<!-- AJAXSLT scripts -->
<script language="JavaScript" type="text/javascript" src="xpath/misc.js"></script>
<script language="JavaScript" type="text/javascript" src="xpath/dom.js"></script>
<script language="JavaScript" type="text/javascript" src="xpath/xpath.js"></script>
<script language="JavaScript" type="text/javascript">
// Evaluate an XPath expression against a document and return
// the first matching node, or null if nothing matches.
function findElementUsingFullXPath(xpath, inDocument) {
  var context = new ExprContext(inDocument);
  var xpathObj = xpathParse(xpath);
  var xpathResult = xpathObj.evaluate(context);
  if (xpathResult && xpathResult.value) {
    return xpathResult.value[0] || null;
  }
  return null;
}

function start() {
  // Select the element whose text is exactly 'b', here <a>b</a>.
  var element = findElementUsingFullXPath("//*[.='b']", window.document);
  alert(element ? element.innerHTML : "No match");
}
</script>
</head>
<body onload="start()">
<div><p>aaaaa</p>test<div>bb</div><a>b</a></div>
</body>
</html>
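
To see why the expression //*[.='b'] picks only the <a> element, it helps to recall that in XPath the string-value of an element is the concatenation of all text inside it, descendants included. The following is a minimal sketch of that rule on plain JavaScript objects rather than real DOM nodes. The helpers el, stringValue and elementsWithStringValue are hypothetical names made up for this illustration and are not part of AJAXSLT.

```javascript
// Plain-object stand-in for a DOM element (hypothetical helper, not AJAXSLT API).
// Children are either nested elements or strings (text nodes).
function el(tag, ...children) {
  return { tag, children };
}

// XPath string-value: a text node is its own text; an element is the
// concatenation of all descendant text in document order.
function stringValue(node) {
  if (typeof node === 'string') return node;
  return node.children.map(stringValue).join('');
}

// Emulates //*[.='target']: every element, in document order,
// whose string-value equals target exactly.
function elementsWithStringValue(root, target) {
  const matches = [];
  (function visit(node) {
    if (typeof node === 'string') return; // text nodes are not elements
    if (stringValue(node) === target) matches.push(node);
    node.children.forEach(visit);
  })(root);
  return matches;
}

// Same structure as the content of the sample page's <body>.
const sample = el('div', el('p', 'aaaaa'), 'test', el('div', 'bb'), el('a', 'b'));

const tags = elementsWithStringValue(sample, 'b').map((n) => n.tag);
console.log(stringValue(sample)); // aaaaatestbbb
console.log(tags);                // [ 'a' ]
```

The outer <div> has the string-value "aaaaatestbbb" and the inner <div> has "bb", so neither equals 'b'. Only <a> does, which is why the sample page alerts "b".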

 
