Querying HTML and XML Documents

Zend\Dom\Query provides mechanisms for querying XML and HTML documents using either XPath or CSS selectors. It was developed to aid with functional testing of MVC applications, but can also be used to build screen scrapers.

CSS selector notation is provided as a simpler and more familiar notation for web developers to use when querying documents with XML structures. The notation should be familiar to anybody who has written Cascading Style Sheets or who uses JavaScript toolkits that select nodes with CSS selectors. Prototype's $$(), Dojo's dojo.query, and jQuery were all inspirations for the component.

Theory of Operation

To use Zend\Dom\Query, you instantiate a Zend\Dom\Query object, optionally passing the document to query as a string. Once you have a document, you can use either the execute() method (for CSS selectors) or the queryXpath() method (for XPath expressions); each returns a Zend\Dom\NodeList object containing any matching nodes.
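The two ways of supplying a document can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes zend-dom is installed through Composer, the $html markup is a made-up sample, and setDocumentHtml() is taken from the zend-dom 2.x API rather than from the text above.

```php
<?php
// Assumption: zend-dom is installed via Composer; adjust the autoload path as needed.
require 'vendor/autoload.php';

use Zend\Dom\Query;

// Hypothetical sample markup, used only for illustration.
$html = '<div id="nav"><a href="/home">Home</a><a href="/about">About</a></div>';

// Option 1: pass the document string to the constructor.
$dom = new Query($html);

// Option 2: create an empty Query and attach the document afterwards.
$lazy = new Query();
$lazy->setDocumentHtml($html);

// Either object can now be queried with a CSS selector or an XPath expression.
$byCss   = $dom->execute('#nav a');
$byXpath = $lazy->queryXpath('//div[@id="nav"]/a');

printf("CSS matches: %d, XPath matches: %d\n", count($byCss), count($byXpath));
```

Both calls return a Zend\Dom\NodeList, so the results can be counted and iterated in the same way regardless of which query style produced them.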

The primary difference between Zend\Dom\Query and using DOMDocument with DOMXPath directly is the ability to select against CSS selectors. You can use any of the following, in any combination:

Once you've performed your query, you can then work with the result object to determine information about the nodes, as well as to pull them and/or their content directly for examination and manipulation. Zend\Dom\NodeList implements Countable and Iterator, and stores the results internally as a DOMDocument and DOMNodeList.

As an example, consider the following call, which selects against an HTML document held in $html:

use Zend\Dom\Query;

$dom = new Query($html);
$results = $dom->execute('.foo .bar a');

$count = count($results); // get number of matches: 4
foreach ($results as $result) {
    // $result is a DOMElement
}
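Building on the loop above, the following sketch shows one way to pull attributes and text out of each match. It assumes zend-dom is installed through Composer; the markup in $html and the $links array are hypothetical and exist only for this illustration.

```php
<?php
// Assumption: zend-dom is installed via Composer; adjust the autoload path as needed.
require 'vendor/autoload.php';

use Zend\Dom\Query;

// Hypothetical markup: two links inside .foo .bar, one link outside it.
$html = <<<HTML
<div class="foo">
  <div class="bar">
    <a href="/one">One</a>
    <a href="/two">Two</a>
  </div>
</div>
<p><a href="/other">Other</a></p>
HTML;

$dom     = new Query($html);
$results = $dom->execute('.foo .bar a');

// Collect href => link text for every matched anchor.
$links = [];
foreach ($results as $result) {
    // Each $result is a DOMElement, so the standard DOM API applies.
    $links[$result->getAttribute('href')] = trim($result->textContent);
}

printf("%d matches\n", count($results));
print_r($links);
```

Because each item is a plain DOMElement, anything the PHP DOM extension offers (getAttribute(), textContent, parentNode, and so on) is available inside the loop.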

Zend\Dom\Query also allows straight XPath queries through the queryXpath() method; you can pass any valid XPath query to this method, and it will return a Zend\Dom\NodeList object.
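As a rough sketch of queryXpath() against an XML document, the example below uses an XPath predicate that has no direct CSS selector equivalent. It assumes zend-dom is installed through Composer; the catalog XML and the $titles array are invented for the illustration.

```php
<?php
// Assumption: zend-dom is installed via Composer; adjust the autoload path as needed.
require 'vendor/autoload.php';

use Zend\Dom\Query;

// Hypothetical XML catalog, used only for illustration.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="b1"><title>First</title><price>10</price></book>
  <book id="b2"><title>Second</title><price>25</price></book>
  <book id="b3"><title>Third</title><price>40</price></book>
</catalog>
XML;

$dom = new Query($xml);

// Select the titles of all books whose price is greater than 20.
$results = $dom->queryXpath('//book[price > 20]/title');

$titles = [];
foreach ($results as $node) {
    // Each match is a DOMElement for a <title> node.
    $titles[] = $node->nodeValue;
}

printf("%d titles: %s\n", count($results), implode(', ', $titles));
```

Numeric comparisons such as the one in this predicate are a typical reason to reach for XPath instead of a CSS selector.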

Methods Available

Below is a listing of methods available in the various classes exposed by zend-dom.

Zend\Dom\Query

The following methods are available to Zend\Dom\Query:

Zend\Dom\NodeList

As mentioned previously, Zend\Dom\NodeList implements both Iterator and Countable, and as such can be used in a foreach() loop as well as with the count() function. Additionally, it exposes the following methods: