You need to convert HTML to readable, formatted plain text.
Use the html2text class. Example 13-11 shows it in action.
Example 13-11. Converting HTML to plain text
require_once 'class.html2text.inc';
/* Give file_get_contents() the path or URL of the HTML you want to process */
$html = file_get_contents(__DIR__ . '/article.html');
$converter = new html2text($html);
$plain_text = $converter->get_text();
The html2text class has many formatting rules built in, so the generated plain text keeps some visual layout for headings, paragraphs, and so on.
It also appends a list of all the links found in the HTML to the bottom of the text it generates.
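To make the idea of formatting rules plus a trailing link list concrete, here is a minimal sketch of the general technique. It is not the html2text class's own code or rule set: the function name sketch_html_to_text() is hypothetical, and the choices to uppercase h1 headings, mark links as [n], and label the list "Links:" are assumptions made only for illustration.

```php
<?php
// Hypothetical helper, NOT part of html2text: a simplified illustration of
// heading formatting and a link list appended to the end of the text.
function sketch_html_to_text(string $html): string
{
    $links = [];

    // Replace each <a href="...">label</a> with "label [n]" and remember the URL.
    $text = preg_replace_callback(
        '~<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>~is',
        function (array $m) use (&$links): string {
            $links[] = $m[1];
            return $m[2] . ' [' . count($links) . ']';
        },
        $html
    );

    // Assumed rule: uppercase <h1> headings and surround them with blank lines.
    $text = preg_replace_callback(
        '~<h1[^>]*>(.*?)</h1>~is',
        function (array $m): string {
            return "\n\n" . strtoupper(strip_tags($m[1])) . "\n\n";
        },
        $text
    );

    // End each paragraph with a blank line, then drop the remaining tags
    // and decode entities such as &amp;.
    $text = preg_replace('~</p>~i', "\n\n", $text);
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');

    // Collapse runs of three or more newlines and trim both ends.
    $text = trim(preg_replace("~\n{3,}~", "\n\n", $text));

    // Append the collected links as a numbered list.
    if ($links) {
        $text .= "\n\nLinks:\n";
        foreach ($links as $i => $url) {
            $text .= '[' . ($i + 1) . '] ' . $url . "\n";
        }
    }

    return $text;
}

echo sketch_html_to_text(
    '<h1>News</h1><p>Read <a href="http://example.com/">more</a>.</p>'
);
```

Regular expressions are enough for a toy like this, but they break on nested or malformed markup, which is one reason to rely on a class with a tested set of rules instead.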