Documentation for method: PHPCrawler::resume()

Resumes the crawling-process with the given crawler-ID
Signature:

public resume($crawler_id)

Parameters:

$crawler_id int The crawler-ID of the crawling-process that should be resumed.
(see getCrawlerId())

Returns:

Nothing (void).

Description:

If a crawling-process was aborted (for whatever reason), it can be resumed
by calling the resume()-method before calling the go() or goMultiProcessed() method
and passing it the crawler-ID of the aborted process (as returned by getCrawlerId()).

In order to be able to resume a process, resumption must have been enabled
when the process was first started (by calling the enableResumption() method).

This method throws an exception if resuming the crawling-process fails.
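Since resume() throws on failure, a caller may want to trap the exception and fall back to a fresh crawl. A minimal sketch (the catch type and the fallback behavior here are illustrative assumptions, not prescribed by the library):

// Sketch: attempt to resume a previous run; on failure, start fresh.
// $crawler_id was stored earlier via getCrawlerId(). The concrete
// exception class thrown may differ depending on the PHPCrawler version.
try
{
  $crawler->resume($crawler_id);
}
catch (Exception $e)
{
  // Resumption failed (e.g. no stored state exists for this ID) --
  // log the problem and continue with a fresh crawling-process instead.
  error_log("Could not resume crawl ".$crawler_id.": ".$e->getMessage());
}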


Example of a resumeable crawler-script:

// ...
$crawler = new MyCrawler();
$crawler->enableResumption();
$crawler->setURL("www.url123.com");

// If process was started the first time:
// Get the crawler-ID and store it somewhere in order to be able to resume the process later on
if (!file_exists("/tmp/crawlerid_for_url123.tmp"))
{
  $crawler_id = $crawler->getCrawlerId();
  file_put_contents("/tmp/crawlerid_for_url123.tmp", $crawler_id);
}

// If process was restarted again (after a termination):
// Read the crawler-id and resume the process
else
{
  $crawler_id = file_get_contents("/tmp/crawlerid_for_url123.tmp");
  $crawler->resume($crawler_id);
}

// ...

// Start your crawling process
$crawler->goMultiProcessed(5);

// After the process is finished completely: Delete the crawler-ID
unlink("/tmp/crawlerid_for_url123.tmp");
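The script above assumes a MyCrawler class. A minimal sketch of such a subclass follows; overriding handleDocumentInfo() is the standard PHPCrawler extension point, but the body shown here is purely illustrative:

// Minimal illustrative crawler subclass. PHPCrawler invokes
// handleDocumentInfo() once for every document it receives.
class MyCrawler extends PHPCrawler
{
  function handleDocumentInfo(PHPCrawlerDocumentInfo $DocInfo)
  {
    // Example handling: print the URL and HTTP status of each page.
    echo $DocInfo->url." (".$DocInfo->http_status_code.")\n";
  }
}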