PHP5: Screen scraping with DOM and XPath
This tutorial continues the previous tutorial on screen-scraping Yahoo with PHP4. This time we will try a different method, using DOM and XPath, which is only supported in PHP5.

First, a bit of XPath knowledge is required. More about XPath can be read at: http://www.zvon.org/xxl/XPathTutorial/General/examples.html

There is also a small concern that XPath is a bit slower than pure DOM traversal. Read "Speed: DOM traversal vs. XPath in PHP 5". But I personally think XPath is neater and easier.

Let's start. First we inspect the document structure using Firebug in Mozilla Firefox.

Try a very easy case, which is to grab the title "Top Movies". Copy the XPath using Firebug and get this query:

/html/body/center/table[8]/tbody/tr/td[5]/table[4]/tbody/tr/td/font/b

Firebug reports the tbody elements that the browser inserts into its own DOM. They are not in the raw HTML that PHP parses, so they are dropped from the query. Now we get our first XPath query:

/html/body/center/table[8]/tr/td[5]/table[4]/tr[1]/td/font

The next, harder case is to grab the contents. The XPath query from Firebug is:

/html/body/center/table[8]/tbody/tr/td[5]/table[4]/tbody/tr[2]/td[2]/a/font/b

Again the tbody steps are removed, and the row index is dropped so that the query matches every row. The final XPath query for the content is:

/html/body/center/table[8]/tr/td[5]/table[4]/tr/td[2]/a/font/b

The final step is to put the two XPath queries into a few lines of code, and we're done.
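As a minimal sketch of how the two final queries might be used with PHP5's DOMDocument and DOMXPath classes: the original listing is not reproduced here, so the example below runs against a small hand-built page instead of the live Yahoo page. The $html fixture, the filler tables, and the movie names "Movie A" and "Movie B" are all made up for illustration; only the two XPath queries come from the steps above.

```php
<?php
// Hypothetical stand-in for the downloaded page (NOT the real Yahoo markup).
// It only reproduces enough nesting for the two queries to match:
// the 8th table in <center>, its 5th cell, and the 4th table inside that cell.
$filler = '<table><tr><td>filler</td></tr></table>';
$html = '<html><body><center>'
      . str_repeat($filler, 7)
      . '<table><tr><td>1</td><td>2</td><td>3</td><td>4</td><td>'
      . str_repeat($filler, 3)
      . '<table>'
      . '<tr><td><font><b>Top Movies</b></font></td></tr>'
      . '<tr><td>1.</td><td><a href="#"><font><b>Movie A</b></font></a></td></tr>'
      . '<tr><td>2.</td><td><a href="#"><font><b>Movie B</b></font></a></td></tr>'
      . '</table></td></tr></table></center></body></html>';

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// First query: the title cell.
$titleNodes = $xpath->query('/html/body/center/table[8]/tr/td[5]/table[4]/tr[1]/td/font');
$title = $titleNodes->length > 0 ? trim($titleNodes->item(0)->textContent) : '';

// Second query: one <b> node per content row.
$movies = array();
foreach ($xpath->query('/html/body/center/table[8]/tr/td[5]/table[4]/tr/td[2]/a/font/b') as $node) {
    $movies[] = trim($node->textContent);
}

echo $title, "\n";
foreach ($movies as $movie) {
    echo $movie, "\n";
}
```

To scrape a live page, the loadHTML() call would be replaced with $doc->loadHTMLFile() and the page's address. Queries with this many positional steps are brittle, so expect to adjust them whenever the page layout changes.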