Saturday 25 August 2012



Text Extraction / Web Page Cleaning

CrawlingIndia provides easy-to-use mechanisms for extracting the text and title of any web page.

An HTML page-cleaning facility is also provided. It normalizes and cleans HTML content by removing ads, navigation links, and other unimportant content, so that only the important article text is extracted.
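
To make the idea concrete, here is a rough sketch of what title extraction and page cleaning can look like, written with Python's standard `html.parser` module. It is not CrawlingIndia's own API or algorithm: the names `PageCleaner`, `extract`, and `SKIP_TAGS` are made up for illustration, and the rule "drop everything inside script, style, nav, aside, header, footer, and form tags" is a simple assumption. A real cleaner would likely use richer heuristics to detect ads and boilerplate.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as boilerplate.
# This list is an illustrative assumption, not CrawlingIndia's rule set.
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer", "form"}


class PageCleaner(HTMLParser):
    """Collects the page title and the visible text outside SKIP_TAGS."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.text_parts = []
        self._skip_depth = 0    # > 0 while inside any boilerplate tag
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0:
            # Collapse runs of whitespace and ignore empty text nodes.
            text = " ".join(data.split())
            if text:
                self.text_parts.append(text)


def extract(html):
    """Return (title, cleaned_text) for an HTML string."""
    parser = PageCleaner()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    text = "\n".join(parser.text_parts)
    return title, text
```

Calling `extract(page_html)` on a page's HTML string returns the title and the remaining text, one line per text block. Tag-based filtering like this is only a starting point; it will miss ads and link lists that sit in plain `div` elements.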




