I have a silver bullet: a list of HTML parser tools, with example links

Posted 2019-04-15 13:22

[img]http://bd7lx.iteye.com/upload/attachment/pic/2420/58acaa6a-bd8a-4661-bb32-46932f1f4e4b-thumb.jpg[/img]

:D

[img]http://bd7lx.iteye.com/upload/attachment/pic/2421/d7f401d5-5a9a-48a3-beb8-d37d901f89dc-thumb.jpg[/img]

What I actually want to talk about is soup: the beautiful Rubyful Soup, and Hpricot, an HTML parser for Ruby

http://www.crummy.com/software/BeautifulSoup/

Rubyful Soup 1.0.4 released February 1, 2006

http://www.crummy.com/software/RubyfulSoup/

http://code.whytheluckystiff.net/hpricot/

Next I will explain how to use an HTML parsing tool to scrape the content you want off a website. Stay tuned.
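As a rough preview of the scraping idea, here is a minimal sketch that pulls every link out of a page. It uses Python's standard-library `html.parser` as a stand-in rather than Hpricot or Rubyful Soup, so it shows the general technique and not the API of either library. The `LinkScraper` class, the `scrape_links` helper, the sample HTML, and the `forum.example.com` base URL are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkScraper(HTMLParser):
    """Collect (link text, absolute URL) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative hrefs
        self.links = []           # results: list of (text, url) tuples
        self._href = None         # URL of the <a> currently open, if any
        self._text = []           # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors that have no href attribute
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def scrape_links(html, base_url):
    """Parse an HTML string and return its links as (text, url) pairs."""
    parser = LinkScraper(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content, standing in for a downloaded forum index.
SAMPLE_HTML = """
<html><body>
  <h1>Topics</h1>
  <a href="/viewtopic.php?t=473">First topic</a>
  <a href="http://example.com/other">Other <b>site</b></a>
  <a name="anchor-only">no href</a>
</body></html>
"""

if __name__ == "__main__":
    for text, url in scrape_links(SAMPLE_HTML, "http://forum.example.com/"):
        print(text, "->", url)
```

To scrape a live page, the HTML string would come from an HTTP fetch (for example `urllib.request.urlopen`) instead of `SAMPLE_HTML`. The parsers listed in this post wrap the same loop in a friendlier, selector-style interface.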

In the meantime, have a look at these related discussions:

http://www.railscn.com/viewtopic.php?t=473

http://www.railscn.com/viewtopic.php?t=1038

http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails/

WWW::Mechanize, a handy web-browsing Ruby object, is also used for HTML parsing.
http://rubyforge.org/projects/mechanize/


[img]http://code.whytheluckystiff.net/hpricot/chrome/site/images/hpricot-small.png[/img]

Hpricot is fast at processing HTML, and it parses XML quite fast too:
http://www.rubyinside.com/parse-xml-quickly-and-easily-with-hpricot-166.html

Stealing gets addictive because it is so easy. Today's latest news post:

Preliminary verdict:
Technical depth: one star. Amount of code: five stars. Article length: six stars.

THE Unbelievably Easy Way to Steal Other Web Sites: Addictively Amazing!

http://web2withrubyonrails.gauldong.net/2006/11/02/the-unbelievably-easy-way-to-steal-other-web-sites-addictively-amazing/