Method: PHPCrawler::handleDocumentInfo()

Override this method to get access to all information about a page or file the crawler found and received.

public handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)


Parameter: $PageInfo (PHPCrawlerDocumentInfo). A PHPCrawlerDocumentInfo object containing all information about the currently received document.
See the PHPCrawlerDocumentInfo class reference for details.


Returns: int. The crawling process stops immediately if this method returns any negative value.
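The abort behaviour described above could be used, for instance, to stop a crawl after a fixed number of documents. The following is only a minimal sketch: the class name LimitedCrawler and the properties $documents_seen and $max_documents are hypothetical, and the two stub classes at the top are stand-ins that exist only so the snippet runs without the library installed.

```php
<?php
// Hypothetical stand-ins so this sketch runs on its own. In real use the
// PHPCrawl library provides PHPCrawler and PHPCrawlerDocumentInfo.
if (!class_exists('PHPCrawlerDocumentInfo')) {
    class PHPCrawlerDocumentInfo
    {
        public $url = '';
        public $http_status_code = 0;
    }
}
if (!class_exists('PHPCrawler')) {
    class PHPCrawler
    {
        public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo) {}
    }
}

class LimitedCrawler extends PHPCrawler
{
    // Number of documents handed to handleDocumentInfo() so far (hypothetical name).
    public $documents_seen = 0;

    // Stop once this many documents have been handled (hypothetical name).
    public $max_documents = 3;

    public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
    {
        $this->documents_seen++;

        // Any negative return value asks the crawler to stop immediately.
        if ($this->documents_seen >= $this->max_documents) {
            return -1;
        }
        // No return value here: the crawl simply continues.
    }
}
```

With these assumed settings, the third call to handleDocumentInfo() is the first to return a negative value.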


This method is called every time the crawler finds and receives a document on its way.
The crawler passes all information about the currently received page or file to this method
in a PHPCrawlerDocumentInfo object.

Please see the PHPCrawlerDocumentInfo documentation for a list of all properties describing the received document.

Example:

class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo($PageInfo)
  {
    // Print the URL of the document
    echo "URL: ".$PageInfo->url."<br />";

    // Print the HTTP status code
    echo "HTTP-statuscode: ".$PageInfo->http_status_code."<br />";

    // Print the number of links found in this document
    echo "Links found: ".count($PageInfo->links_found_url_descriptors)."<br />";

    // ..
  }
}
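
Instead of printing, an overriding method can also collect information for later use. The sketch below is an assumption-laden illustration, not part of the library: the class name ReportingCrawler and the property $failed_urls are hypothetical, and the stub classes are stand-ins so the snippet runs by itself. It uses only the url and http_status_code properties shown in the example above.

```php
<?php
// Hypothetical stand-ins so this sketch runs on its own. In real use the
// PHPCrawl library provides PHPCrawler and PHPCrawlerDocumentInfo.
if (!class_exists('PHPCrawlerDocumentInfo')) {
    class PHPCrawlerDocumentInfo
    {
        public $url = '';
        public $http_status_code = 0;
    }
}
if (!class_exists('PHPCrawler')) {
    class PHPCrawler
    {
        public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo) {}
    }
}

class ReportingCrawler extends PHPCrawler
{
    // Status codes of documents that answered with 400 or higher,
    // keyed by URL (hypothetical property name).
    public $failed_urls = array();

    public function handleDocumentInfo(PHPCrawlerDocumentInfo $PageInfo)
    {
        // Remember client and server errors instead of echoing them.
        if ($PageInfo->http_status_code >= 400) {
            $this->failed_urls[$PageInfo->url] = $PageInfo->http_status_code;
        }
    }
}
```

After the crawl has finished, $failed_urls can be read from the crawler object to build a report.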