Welcome to Abdul Malik Ikhsan's Blog

Zend Framework 2 : Grab web content using Zend\Http\Client and Zend\Dom\Query

Posted in Tutorial PHP, Zend Framework 2 by samsonasik on October 17, 2012

Sometimes we need to grab web content for our application and then extract the content of selected elements (by tag name, for example). For each matched DOMElement, we need the nodeValue of one of its attribute nodes and the element's textContent. Zend Framework 2 has components that cover this need: Zend\Http and Zend\Dom.
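
To make those two values concrete, here is a minimal sketch that uses only PHP's built-in DOM extension, which provides the DOMElement class mentioned above. The HTML fragment and the helper name describeLink are made up for illustration; they are not part of Zend Framework.

```php
<?php
// Hypothetical helper: returns the href and visible text of the first <a> in a fragment.
function describeLink(string $html): ?array
{
    $doc = new DOMDocument();
    // Suppress the warnings libxml raises for imperfect real-world HTML.
    @$doc->loadHTML($html);

    $a = $doc->getElementsByTagName('a')->item(0);
    if ($a === null) {
        return null; // no link in the fragment
    }

    return [
        // nodeValue of the attribute node is the attribute's string value
        'href' => $a->hasAttribute('href') ? $a->getAttributeNode('href')->nodeValue : null,
        // textContent concatenates the text of the element and all its descendants
        'text' => $a->textContent,
    ];
}

print_r(describeLink('<h2><a href="/post-1">First <em>post</em></a></h2>'));
```

Note that textContent flattens nested markup, so the inner em tag above contributes only its text.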


For example, I want to grab the titles and links of my latest blog posts without using RSS, and show them in my application as a list of linked titles.

Here is the code:

namespace SampleModule;

use Zend\Mvc\Controller\AbstractActionController;
use Zend\Http\Client as HttpClient;
use Zend\Dom\Query;

class GrabsampleController extends AbstractActionController
{
    public function grabAction()
    {
        $client = new HttpClient();
        $client->setAdapter('Zend\Http\Client\Adapter\Curl');
        
        $response = $this->getResponse();
        //set content-type
        $response->getHeaders()->addHeaderLine('content-type', 'text/html; charset=utf-8'); 
        
        $client->setUri('https://samsonasik.wordpress.com/');
        $result                 = $client->send();
        //content of the web
        $body                   = $result->getBody();
        
        $dom = new Query($body);
        //select every h2 inside the element with id="content"; execute() returns a NodeList
        $title = $dom->execute('#content h2');
        
        $content = '';
        foreach ($title as $r)
        {
            //each matched h2 is a DOMElement; take its first <a> descendant, if any
            $aelement = $r->getElementsByTagName('a')->item(0);

            //item(0) returns null when the h2 contains no link, so check before using it
            if ($aelement !== null && $aelement->hasAttribute('href')) {
                $content .= '* ';
                //quote and escape the href so unusual URLs cannot break the markup
                $content .= '<a href="' . htmlspecialchars($aelement->getAttributeNode('href')->nodeValue) . '">';
                $content .= htmlspecialchars($aelement->textContent);
                $content .= '</a>';

                $content .= "<br />";
            }
        }
        
        $response->setContent($content);
        
        return $response;
    }
}
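
The extraction step can be tried without any network access. The sketch below is a stand-in under stated assumptions, not the Zend\Dom\Query implementation: it uses PHP's built-in DOMXPath with the expression //*[@id="content"]//h2, which is roughly what the CSS selector #content h2 means, and runs it on a made-up HTML string. The function name extractPostLinks and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: collects href/title pairs from h2 links inside #content.
function extractPostLinks(string $html): array
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // silence libxml warnings about sloppy markup

    $xpath = new DOMXPath($doc);
    $links = [];
    // Rough XPath equivalent of the CSS selector "#content h2"
    foreach ($xpath->query('//*[@id="content"]//h2') as $h2) {
        $a = $h2->getElementsByTagName('a')->item(0);
        // Headings without a link are skipped instead of causing an error
        if ($a !== null && $a->hasAttribute('href')) {
            $links[] = [
                'href'  => $a->getAttribute('href'),
                'title' => trim($a->textContent),
            ];
        }
    }
    return $links;
}

// Made-up sample page: two linked headings, one plain heading, one heading outside #content.
$sample = '<div id="content">'
    . '<h2><a href="/a">Post A</a></h2>'
    . '<h2>No link here</h2>'
    . '<h2><a href="/b">Post B</a></h2>'
    . '</div><div id="sidebar"><h2><a href="/c">Ignored</a></h2></div>';

foreach (extractPostLinks($sample) as $link) {
    echo '* ', $link['title'], ' (', $link['href'], ")\n";
}
```

Returning plain arrays rather than an HTML string keeps the parsing separate from the rendering, so the same result could be passed to a view script.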

15 Responses

  1. jsamos said, on October 26, 2012 at 5:34 am

    Hi, I have really been trying to do this, but for most of the endpoints on our API I get a “headers already sent” error once the view loads. Any idea why that would be?

    • jsamos said, on October 26, 2012 at 6:08 am

      Actually, it turns out to be image URLs that are causing the problem. Still no idea why.

      • samsonasik said, on October 26, 2012 at 12:54 pm

        Have you enabled curl in your PHP configuration? If you get “headers already sent”, you can add

        ob_start();

        as the first line of the code.

  2. iostream said, on January 27, 2013 at 10:46 pm

    I’m trying to log in to a website and save the cookies or the HttpClient for later use. The objective is that the application does not need to log in again when requesting pages that require login. However, saving the HttpClient and/or the cookies as a class attribute doesn’t seem to work: the application has to post credentials to the target website every time it requests a page. Is there a way to save the login session for later use? Thank you.

  3. samsonasik said, on January 28, 2013 at 5:45 am

    In general, websites do not grant permission to post data from a local application to the remote site through indirect access, for security reasons. If this is your site, you need to create a web service to handle this, or enable Cross-Origin Resource Sharing to post data. If not, you should contact the site's webmaster and ask for an “API” that handles login.

  4. Pananagiotis said, on November 2, 2013 at 9:07 pm

    Hi there, I find your article very interesting, but since I am new to Zend Framework, can you please guide me step by step through how to grab content and then display it using Zend_Dom_Query? I mean: take this piece of ….. and put it in a controller, then this piece of code and put it in the Model, then take this piece of code and put it in the view. It would really help me!!!

    PA

  5. anshu said, on March 21, 2014 at 7:58 pm

    I got a “404 bad request” response back from the URL when using the HTTP client. If I hit the URL directly or via AJAX, it works fine. Why isn’t the HTTP client working??

  6. foysal said, on April 22, 2014 at 1:18 am

    What do I have to do if I would like to grab content from HTTPS URLs?? Thanks

    • samsonasik said, on April 22, 2014 at 9:55 am

      I remember you already asked in the ZF2 group. Please ask there, and do make an effort to learn.

