
Parsing HTML Tags with PHP Simple HTML DOM Parser

Created: 2012-12-17

 


I tried PHP Simple HTML DOM Parser for parsing HTML pages and it works quite well: it builds a DOM tree that makes it easy to pick apart the contents of an HTML document, which is handy for scraping. An example follows; you can also download the archive from SourceForge and look at the examples it ships with.

PHP Simple HTML DOM Parser, written in PHP5+, allows you to manipulate HTML in a very easy way. Because it tolerates invalid HTML, it is more robust than PHP scripts that rely on complicated regexes to extract information from web pages.

Before extracting any information, a DOM has to be created from either a URL or a file. The following script extracts links and images from a website:
    // Create DOM from URL or file
    $html = file_get_html("http://www.microsoft.com/");

    // Extract links
    foreach ($html->find("a") as $element)
        echo $element->href . "<br>";

    // Extract images
    foreach ($html->find("img") as $element)
        echo $element->src . "<br>";
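To try the same extraction without fetching a live page, the DOM can be built from a string instead. The following is only a sketch: the sample markup is made up for illustration, and it assumes simple_html_dom.php from the package sits next to the script. It collects the attribute values into arrays rather than echoing them directly, which makes the result easier to inspect.

```php
<?php
// Assumes simple_html_dom.php from the package sits next to this script.
include "simple_html_dom.php";

// Made-up sample markup, used here instead of a live URL.
$markup = '<p><a href="/docs">Docs</a> <a href="/faq">FAQ</a></p>'
        . '<img src="logo.png" alt="Logo">';

// Create DOM from string
$html = str_get_html($markup);

// Collect the href of every link
$links = [];
foreach ($html->find("a") as $element) {
    $links[] = $element->href;
}

// Collect the src of every image
$images = [];
foreach ($html->find("img") as $element) {
    $images[] = $element->src;
}

echo implode("\n", $links), "\n";
echo implode("\n", $images), "\n";
```

For this input the script prints /docs, /faq and logo.png, one per line.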
The parser can also be used to modify HTML elements:
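    // Create DOM from string (single quotes outside, so the double quotes
    // inside the markup do not end the PHP string)
    $html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');

    // Add a class attribute to the second div (index 1)
    $html->find("div", 1)->class = "bar";

    // Replace the inner text of the div whose id is "simple"
    $html->find("div[id=simple]", 0)->innertext = "Foo";

    // Output: <div id="simple">Foo</div><div id="parser" class="bar">Parser</div>
    echo $html;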
Do you wish to retrieve content without any tags? Use the plaintext property:
    echo file_get_html("http://www.yahoo.com/")->plaintext;
In the package files of this parser (http://simplehtmldom.sourceforge.net/) you can find some scraping examples from digg, imdb, slashdot. Let's create one that extracts the first 10 results (titles only) for the keyword "php" from Google:
    $url = "http://www.google.com/search?hl=en&q=php&btnG=Search";

    // Create DOM from URL
    $html = file_get_html($url);

    // Match all "A" tags whose class attribute is equal to "l"
    // (this reflects Google's markup at the time of writing; it may differ today)
    foreach ($html->find("a[class=l]") as $key => $info)
    {
        echo ($key + 1) . ". " . $info->plaintext . "<br /> ";
    }
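The loop above calls find() directly on whatever file_get_html returns, so a failed download would stop the script with an error. A minimal sketch of a guard follows, assuming a release of the parser in which file_get_html returns false when nothing could be loaded (older releases may behave differently). The helper result_titles and the sample markup are hypothetical names introduced here for illustration; they are not part of the parser.

```php
<?php
// Assumes simple_html_dom.php from the package sits next to this script.
include "simple_html_dom.php";

// Hypothetical helper (not part of the parser): collects the plain text of
// up to $limit elements matching $selector, or returns an empty array when
// $html is false, i.e. the page could not be loaded.
function result_titles($html, $selector, $limit = 10)
{
    $titles = [];
    if ($html === false) {
        return $titles;
    }
    foreach ($html->find($selector) as $info) {
        if (count($titles) >= $limit) {
            break;
        }
        $titles[] = trim($info->plaintext);
    }
    return $titles;
}

// Made-up sample markup standing in for a fetched results page.
$sample = '<a class="l" href="/one">First result</a>'
        . '<a class="other" href="/skip">Not a result</a>'
        . '<a class="l" href="/two">Second result</a>';

$titles = result_titles(str_get_html($sample), "a[class=l]");
foreach ($titles as $key => $title) {
    echo ($key + 1) . ". " . $title . "\n";
}

// A failed load is passed in as false and simply yields no titles.
$none = result_titles(false, "a[class=l]");
```

With a live page, the first argument would be file_get_html($url) instead of the sample string.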
NOTE: Make sure to include the parser before using any of its functions:
    include "simple_html_dom.php";
For more information on using the parser, consult the 'PHP Simple HTML DOM Parser' manual. The package files can be downloaded from the project's SourceForge page.