Zend Framework 2: Grab web content using Zend\Http\Client and Zend\Dom\Query
Sometimes we need to grab web content for our application and then get the content of selected elements (by tag name, for example). For each matched DOMElement we need an attribute's value (the nodeValue of its attribute node) and the element's textContent. Zend Framework has components for exactly this: Zend\Http, to fetch the page, and Zend\Dom, to query its HTML.
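To see what those DOM properties actually return, here is a minimal sketch using only PHP's built-in DOM extension (DOMDocument), with no Zend components and no network access. The HTML string is a made-up sample and the function name readLink() is hypothetical, for illustration only.

```php
<?php
// Minimal sketch: read an <a> element's href attribute node and its text.
// The HTML passed in below is a hypothetical sample, not real blog markup.

function readLink(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML often triggers them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // First element with tagName 'a' anywhere in the document, or null.
    $a = $doc->getElementsByTagName('a')->item(0);
    if ($a === null || !$a->hasAttribute('href')) {
        return [];
    }

    return [
        // getAttributeNode() returns a DOMAttr; its nodeValue is the attribute value.
        'href' => $a->getAttributeNode('href')->nodeValue,
        // textContent joins all text inside the element, including nested tags.
        'text' => $a->textContent,
    ];
}

$link = readLink('<h2><a href="https://example.com/post">My <em>first</em> post</a></h2>');
echo $link['href'], ' | ', $link['text'], PHP_EOL;
// prints: https://example.com/post | My first post
```

Note that textContent flattens nested markup (the <em> disappears), which is usually what you want for a title.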
For example, I want to grab the titles and links of my latest blog posts, instead of using RSS, and show them in my application as a simple list of links.
Here is the code, which is short and simple:
<?php
namespace SampleModule;

use Zend\Mvc\Controller\AbstractActionController;
use Zend\Http\Client as HttpClient;
use Zend\Dom\Query;

class GrabsampleController extends AbstractActionController
{
    public function grabAction()
    {
        $client = new HttpClient();
        // the cURL adapter requires the PHP curl extension to be enabled
        $client->setAdapter('Zend\Http\Client\Adapter\Curl');

        $response = $this->getResponse();
        // set content-type
        $response->getHeaders()->addHeaderLine('content-type', 'text/html; charset=utf-8');

        $client->setUri('https://samsonasik.wordpress.com/');
        $result = $client->send();

        // HTML content of the fetched page
        $body = $result->getBody();

        $dom = new Query($body);
        // CSS selector: every h2 inside the element with id="content"
        $titles = $dom->execute('#content h2');

        $content = '';
        foreach ($titles as $h2) {
            // each h2 is a DOMElement; take its first descendant with tagName 'a'
            $aelement = $h2->getElementsByTagName('a')->item(0);
            // skip h2 elements without a link, or whose link has no href
            if ($aelement === null || !$aelement->hasAttribute('href')) {
                continue;
            }
            $href = $aelement->getAttributeNode('href')->nodeValue;
            $content .= '* ';
            // quote and escape href and title so they cannot break the markup
            $content .= '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">';
            $content .= htmlspecialchars($aelement->textContent, ENT_QUOTES, 'UTF-8');
            $content .= '</a>';
            $content .= '<br />';
        }

        $response->setContent($content);
        return $response;
    }
}
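Because the controller fetches and parses in one action, the parsing step is hard to try out without network access. Below is a sketch of the same extraction as a plain function over an HTML string, using PHP's DOMXPath directly instead of Zend\Dom\Query. The XPath expression //*[@id='content']//h2 is a hand-written approximation of the CSS selector '#content h2' (Zend\Dom\Query does a similar CSS-to-XPath translation internally); the function name grabLinks() and the sample HTML are hypothetical.

```php
<?php
// Sketch: build "* <a href=...>title</a><br />" lines from an HTML string.
// Uses only PHP's DOM extension; no Zend components, no HTTP request.

function grabLinks(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Hand-written XPath approximating the CSS selector '#content h2'.
    $titles = $xpath->query("//*[@id='content']//h2");

    $content = '';
    foreach ($titles as $h2) {
        // First descendant element with tagName 'a', or null if there is none.
        $a = $h2->getElementsByTagName('a')->item(0);
        if ($a === null || !$a->hasAttribute('href')) {
            continue;
        }
        $href = $a->getAttributeNode('href')->nodeValue;
        // Escape both values so they are safe to embed in the output HTML.
        $content .= '* <a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
            . htmlspecialchars($a->textContent, ENT_QUOTES, 'UTF-8')
            . '</a><br />';
    }
    return $content;
}

// Hypothetical sample page: two linked titles, one unlinked, one outside #content.
$html = '<div id="content">'
    . '<h2><a href="https://example.com/one">Post one</a></h2>'
    . '<h2>No link here</h2>'
    . '<h2><a href="https://example.com/two?a=1&amp;b=2">Post &amp; two</a></h2>'
    . '</div>'
    . '<div id="sidebar"><h2><a href="https://example.com/ignored">Ignored</a></h2></div>';

echo grabLinks($html), PHP_EOL;
```

With this split, the controller would only fetch $body with Zend\Http\Client and pass it to a function like this, while the extraction can be checked against saved HTML.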
Hi, I have really been trying to do this, but for most of the endpoints on our API I get a “headers already sent” error once the view loads. Any idea why that would be?
Actually, it turns out that image URLs are causing the problem. Still no idea why.
Have you enabled the curl extension in your PHP configuration? If you get “headers already sent”, you can add
in the first line of the code.
[…] https://samsonasik.wordpress.com/2012/10/17/zend-framework-2-grab-web-content-using-zend-http-client-… […]
I’m trying to log in to a website and save the cookies or the HttpClient for later use, so that the application doesn’t need to log in again when requesting pages that require login. However, saving the HttpClient and/or the cookies as a class attribute doesn’t seem to work: the application has to post credentials to the target website every time it requests a page. Is there a way to save the login session for later use? Thank you.
In general, websites do not allow data to be posted to them from a remote application (rather than directly through the site) for security reasons. If this is your site, you need to create a web service to handle this, or enable Cross-Origin Resource Sharing to post data. If not, you should contact the webmaster of the site and ask for an “API” that handles login.
ooo, thank you. 🙂
You’re welcome 😉
Hi there, I find your article very interesting, but since I am new to Zend Framework, could you please guide me step by step through grabbing content and then displaying it using Zend_Dom_Query? I mean: take this piece of ….. and put it in a controller, then take this piece of code and put it in the model, then take this piece of code and put it in the view. It would really help me!!!
PA
I get a “404 bad request” response from the URL when using the HTTP client. If I hit the URL directly or via AJAX, it works fine. Why isn’t the HTTP client working?
What do I have to do if I would like to grab content from HTTPS URLs? Thanks.
I remember you already asked in the ZF2 group; please ask there, and make an effort to learn.