Overview
Summarizer is a standalone PHP class that quickly creates a summary of a given text or HTML page. It ranks each sentence by its relevance and builds the summary from the best-ranked sentences.
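This page does not say how "relevance" is computed. A common approach is to score each sentence by how often its words occur across the whole text. The sketch below shows that approach purely as an assumption. splitSentences(), tokenize() and rankSentences() are hypothetical helpers, not part of the Summarizer API. The sketch needs PHP 7 or later.

```php
<?php
// Hypothetical sketch of frequency-based sentence ranking. These helpers
// are NOT the Summarizer API, and the scoring rule is an assumption.

// Split on whitespace that follows '.', '!' or '?'.
function splitSentences(string $text): array
{
    return preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// Lower-cased word tokens (ASCII letters and apostrophes only).
function tokenize(string $sentence): array
{
    preg_match_all("/[a-z']+/", strtolower($sentence), $matches);
    return $matches[0];
}

// Score = average document-wide frequency of the sentence's words.
// Returns [sentence => score], best first. Identical sentences share a key.
function rankSentences(string $text): array
{
    $sentences = splitSentences($text);

    // Count every word across the whole text.
    $freq = [];
    foreach ($sentences as $sentence) {
        foreach (tokenize($sentence) as $word) {
            $freq[$word] = ($freq[$word] ?? 0) + 1;
        }
    }

    // Average the counts of each sentence's words.
    $scores = [];
    foreach ($sentences as $sentence) {
        $words = tokenize($sentence);
        if ($words === []) {
            continue;
        }
        $total = 0;
        foreach ($words as $word) {
            $total += $freq[$word];
        }
        $scores[$sentence] = $total / count($words);
    }

    arsort($scores);
    return $scores;
}

print_r(rankSentences('PHP is a scripting language. PHP runs on the server. Cats sleep a lot.'));
// "PHP is a scripting language." => 1.4 (7/5), "Cats sleep a lot." => 1.25 (5/4),
// "PHP runs on the server." => 1.2 (6/5)
```

The common word "a" lifts "Cats sleep a lot." above "PHP runs on the server." If "a" were ignored, the cat sentence would drop to 1.0. That is the kind of noise the common-words dictionary mentioned under Features is meant to remove.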
Basic Usage
Here is the basic usage:
Text summary
<?php
require_once(dirname(__FILE__) . '/Summarizer.php');
$text = file_get_contents(dirname(__FILE__) . '/test_files/cap1.txt');
$summarizer = new Summarizer();
$summarizer->loadText($text);
$summary = $summarizer->run();
print_r($summary);
?>
URL summary
<?php
require_once(dirname(__FILE__) . '/Summarizer.php');
$url = 'http://edition.cnn.com/2011/LIVING/02/07/russell.simmons.super.rich/index.html?hpt=C2';
$summarizer = new Summarizer();
$summarizer->loadUrl($url);
$summary = $summarizer->run();
print_r($summary);
?>
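Before loadUrl() can rank any sentences, it must reduce the web page to plain text. This page does not describe how the class does that, or what OPTION_HTML changes. The following is only a minimal sketch of one common approach that uses PHP built-ins (preg_replace(), strip_tags(), html_entity_decode()). htmlToText() is a hypothetical helper. The sketch leaves out fetching the page (for example with file_get_contents() on a URL when allow_url_fopen is enabled) so that it runs offline. It needs PHP 7 or later.

```php
<?php
// Hypothetical helper, not the Summarizer API: turn an HTML page into
// plain text that a sentence splitter can work on.
function htmlToText(string $html): string
{
    // Remove <script> and <style> blocks together with their contents.
    $html = (string) preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Put a space before every tag so adjacent blocks such as </h1><p>
    // do not glue words together, then drop the tags themselves.
    $text = strip_tags(str_replace('<', ' <', $html));
    // Decode entities such as &amp; and collapse runs of whitespace.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim((string) preg_replace('/\s+/', ' ', $text));
}

$html = '<html><head><style>p { color: red; }</style></head><body>'
      . '<h1>Title</h1><p>Fish &amp; chips are popular.</p>'
      . '<script>alert(1);</script></body></html>';
echo htmlToText($html), PHP_EOL; // Title Fish & chips are popular.
```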
Complex usage
<?php
require_once(dirname(dirname(__FILE__)) . '/library/Summarizer.php');
// options for the summarizer
$options = array(
    // minimum sentence length
    Summarizer::OPTION_MIN_SENTENCE_LENGTH => 50,
    // minimum word length
    Summarizer::OPTION_MIN_WORD_LENGTH => 4,
    // relevance threshold
    Summarizer::OPTION_TRESHOLD => 0.7,
    // number of best lines to return
    Summarizer::OPTION_FIRST_BEST => 10,
    // document is in HTML format
    Summarizer::OPTION_HTML => true,
    // split text into sentences
    Summarizer::OPTION_SPLIT_SENTENCES => true
);
$text = file_get_contents(dirname(__FILE__) . '/test_files/cap1.txt');
try {
    $summarizer = new Summarizer($options);
    $summarizer->loadText($text);
    $summary = $summarizer->run();
}
catch (Exception $ex) {
    echo 'Failed to summarize: ' . $ex->getMessage() . PHP_EOL;
    exit(1);
}
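// get the cleaned text
$cleanedText = $summarizer->getText();
// the 10 most relevant words
$bestWords = $summarizer->getBestWords(10);
// the 10 best-ranked sentences
$bestSentences = $summarizer->getBestSentences(10);
// all sentences extracted from the text
$sentences = $summarizer->getSentences();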
echo 'Summary: ' . PHP_EOL;
print_r($summary);
echo 'Best words:' . PHP_EOL;
print_r($bestWords);
echo 'Extracted sentences:' . PHP_EOL;
print_r($sentences);
echo 'Best sentences:' . PHP_EOL;
print_r($bestSentences);
?>
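This page does not define exactly how the minimum sentence length, threshold and first-best options interact. The sketch below shows one plausible way to combine them when picking sentences from a ranked list. It rests on two guesses, not on documented class behavior: sentence length is counted in characters, and the threshold is compared with each score divided by the top score. selectSentences() is a hypothetical helper, and the sketch needs PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of the Summarizer API. Assumptions:
// sentence length is counted in characters, and the threshold (0..1)
// is compared with each score divided by the best score.
function selectSentences(array $ranked, int $minLength, float $threshold, int $firstBest): array
{
    // Keep only sentences long enough to carry some information.
    $candidates = [];
    foreach ($ranked as $sentence => $score) {
        if (strlen((string) $sentence) >= $minLength) {
            $candidates[(string) $sentence] = (float) $score;
        }
    }
    if ($candidates === []) {
        return [];
    }

    // Sort best-first and use the top score as the normalization base.
    arsort($candidates);
    $best = reset($candidates);
    if ($best <= 0.0) {
        return [];
    }

    $selected = [];
    foreach ($candidates as $sentence => $score) {
        // Stop at the first sentence below the relative threshold
        // (all later ones score lower) or once the cap is reached.
        if (count($selected) >= $firstBest || $score / $best < $threshold) {
            break;
        }
        $selected[] = (string) $sentence;
    }
    return $selected;
}

$ranked = [
    'A long and relevant sentence about the topic.' => 9.0,
    'Short one.' => 8.5,
    'Another fairly relevant sentence on the topic.' => 7.0,
    'A long but mostly unrelated sentence.' => 2.0,
];

// Same threshold (0.7) and cap (10) as the complex example; the minimum
// length is lowered to 20 characters for these short strings.
print_r(selectSentences($ranked, 20, 0.7, 10));
// Prints the first and third sentences: 7.0 / 9.0 ≈ 0.78 passes the threshold,
// 2.0 / 9.0 ≈ 0.22 does not, and "Short one." is dropped for length.
```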
Requirements
- PHP 5
Features
- Configurable threshold - only the sentences whose score is above the threshold are returned
- Best words extraction - the most relevant keywords are extracted, ordered by relevance
- Sentence splitter - the input text is automatically split into sentences
- Common words skip - to improve results, common words are skipped using a dictionary (only an English dictionary is provided)
- Minimal dependencies - PHP 5 is all you need to run it
- Fast - in most cases the summary is returned in less than 0.1 seconds
- Low memory usage - typical articles need less than 1 MB of memory
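The features above do not show how keywords are chosen. Here is a minimal sketch of best-words extraction with a common-words skip list. It assumes, without confirmation from this page, that words are ranked by how often they occur. bestWords() and COMMON_WORDS are hypothetical. The short skip list is illustrative only and is not the class's English dictionary. The default minimum word length of 4 mirrors OPTION_MIN_WORD_LENGTH in the complex example. The sketch needs PHP 7 or later.

```php
<?php
// Hypothetical sketch, not the Summarizer API. The tiny skip list below
// is illustrative only; the real class uses its own English dictionary.
const COMMON_WORDS = ['the', 'and', 'with', 'that', 'this', 'from', 'have'];

function bestWords(string $text, int $limit, int $minWordLength = 4): array
{
    preg_match_all("/[a-z']+/", strtolower($text), $matches);

    $counts = [];
    foreach ($matches[0] as $word) {
        // Skip short tokens and common words before counting.
        if (strlen($word) < $minWordLength || in_array($word, COMMON_WORDS, true)) {
            continue;
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    // Most frequent first; keep only the requested number of words.
    arsort($counts);
    return array_slice($counts, 0, $limit, true);
}

$text = 'The summary ranks sentences. The summary keeps relevant sentences '
      . 'and skips the rest. Sentences with relevant words score higher.';
print_r(bestWords($text, 3));
// sentences => 3, then summary and relevant => 2 each ("the", "and", "with" are skipped)
```

Before PHP 8.0, sorting was not stable, so the relative order of tied words such as "summary" and "relevant" can differ between PHP versions.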
Documentation
For a list of all available class methods, see the API reference.
Buy it
You can buy it now from binpress.com.
WordPress Widget
You can add a Summarizer widget to your WordPress blog. It is easy to set up and free.
Help me!
Having problems with the Summarizer tool, or want to get the most out of it?
Read this quick guide to see how you can improve your results.
Report a bug
We don't like bugs either, so if you spot one, please let us know and we'll do our best to fix it.
Buy script
If you want to buy this script, see the Summarizer script page for documentation and pricing.
