Details ...
Create relevancy from disparate sources of information. Integrate the Extractor Engine into your software applications enabling the summarization of documents into lists of keywords and key phrases with contextual links back to the originating document(s).
What is text summarization?
By definition, text summarization is: to comprise in, or reduce to, a summary; to present briefly; quickly executed.
In computer-automated text summarization there are many definitions and implementations, including Bayesian, heuristic, and linguistic approaches. Extractor uses a genetic approach, which in itself provides a learning process. This is important because it lets the summarization utility move from one domain to another, whereas other approaches are traditionally domain specific and therefore require greater human intervention to adjust from one domain to another.
For a detailed discussion, please see "Learning Algorithms for Keyphrase Extraction".
The Extractor APIs have been designed for maximum flexibility, allowing a wide variety of applications to take advantage of this technology. Supported development languages include:
¤ C (C, C++, VC++)
¤ Java
¤ Visual Basic
¤ Python
¤ Perl
There are 26 primary API function calls that give the development team full control of the Extractor DLL and of the presentation of the extracted results. For more information, see the Extractor DLL application programming interface (API) documentation.
Out of the box, Extractor supports the Windows, Solaris and Linux computing platforms. Other computing platforms, such as HP-UX, AIX or Mac OS, can be custom compiled. Once the computing platform is confirmed and the custom compilation is engaged, the process takes one to two weeks, including final testing and release.
Multiple Threads with the Extractor API
The API for Extractor allows several documents to be processed simultaneously, using a separate thread for each document. This is useful, for example, when processing web pages. A major bottleneck when downloading web pages is waiting for web servers to respond to requests for pages. One way around this bottleneck is to download several pages simultaneously, using a separate thread to process each page.
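The thread-per-page idea can be illustrated with a minimal Python sketch. It does not call Extractor or the network: `fetch_page` and `process_page` are hypothetical stand-ins, and the 0.2 second sleep only simulates a slow web server. The point is that the waiting periods overlap when each page is handled on its own thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_page(url):
    """Hypothetical stand-in for a slow web request (no real network access)."""
    time.sleep(0.2)  # simulated wait for the web server to respond
    return f"contents of {url}"


def process_page(url):
    """Fetch one page and do some per-page work on the calling thread."""
    text = fetch_page(url)
    return url, len(text.split())  # word count stands in for real processing


urls = [f"http://example.com/page{i}" for i in range(5)]

start = time.perf_counter()
# One worker thread per page, so the five simulated waits run concurrently.
with ThreadPoolExecutor(max_workers=len(urls)) as pool:
    results = list(pool.map(process_page, urls))
elapsed = time.perf_counter() - start

print(f"processed {len(results)} pages in {elapsed:.2f} s")
```

Processed one after another, the five simulated requests would take about one second in total; with one thread per page the total is closer to the duration of a single request.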
Extractor is fully reentrant, to allow multithreading without the use of Win32 services such as semaphores and the EnterCriticalSection and LeaveCriticalSection functions. There should be a one-to-one relationship between threads and DocumentMemory values, so only one thread reads or writes to a given DocumentMemory. On the other hand, there may be a many-to-one relationship between threads and StopMemory values. That is, many threads may simultaneously read one StopMemory.
Most functions that take StopMemory as an argument only read StopMemory; they do not write to it. This is why many threads can safely access the same StopMemory. However, the functions ExtrAddStopWord and ExtrAddStopPhrase write to StopMemory. These two functions should be called (one after the other; not at the same time) before any other threads access StopMemory. If one thread calls ExtrAddStopWord or ExtrAddStopPhrase with a given value of StopMemory while a second thread calls any function with the same value of StopMemory, the memory may become corrupted.
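The ownership rules above can be sketched in Python as a two-phase pattern: all writes to the shared stop list happen on one thread first, then worker threads only read it while each keeps its own per-document state. This is an illustration of the discipline only, not the Extractor API. The `StopMemory` class, its methods and `process_document` are hypothetical stand-ins, and the `freeze` guard is an added assumption that makes a late write fail loudly instead of silently corrupting data.

```python
import threading


class StopMemory:
    """Hypothetical stand-in for a shared stop list (not the Extractor type)."""

    def __init__(self):
        self._words = set()
        self._frozen = False

    def add_stop_word(self, word):
        # Writes are allowed only during single-threaded setup.
        if self._frozen:
            raise RuntimeError("stop list is read-only once workers start")
        self._words.add(word.lower())

    def freeze(self):
        self._frozen = True

    def is_stop_word(self, word):
        return word.lower() in self._words  # read-only access


def process_document(text, stop_memory, results, index):
    # This list plays the role of one DocumentMemory: owned by one thread only.
    document_memory = []
    for word in text.split():
        if not stop_memory.is_stop_word(word):
            document_memory.append(word)
    results[index] = document_memory  # each thread writes to its own slot


# Phase 1: a single thread performs all writes, one after the other.
stops = StopMemory()
for w in ("the", "a", "of"):
    stops.add_stop_word(w)
stops.freeze()

# Phase 2: many threads read the same stop list; none of them writes to it.
docs = ["the cat sat", "a history of Canada"]
results = [None] * len(docs)
threads = [
    threading.Thread(target=process_document, args=(doc, stops, results, i))
    for i, doc in enumerate(docs)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Because the shared object is never modified after `freeze()`, the workers need no locks, which mirrors the many-to-one relationship between threads and StopMemory described above.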
Applications of text summarization concepts: Text summarization is used in many applications, most notably for:
¤ Content review: defining document suitability.
¤ Pre-sorting document summaries for cataloging.
¤ Creating document indexes.
¤ Providing interactive query refinement.
¤ Defining document trends and performing document trend analysis.
¤ Assisting in web page content analysis and determining web page content accuracy.
¤ Enhancing document management systems.
Version History: The Extractor technology started as a machine learning and artificial intelligence research project at the National Research Council of Canada in the mid-1990s. In January 1997, the initial result of that R&D effort was the release of the first version of Extractor. Research and development continues to this day through the efforts of Dr. Peter Turney at the Interactive Information Technology Group at the National Research Council of Canada and DBI Technologies Inc. A full product version history is also available.
Supporting Publications: The following publications provide technical discussion of the concepts employed in the Extractor technology.
Adaptation of a Keyphrase Extractor for Japanese Text
Extraction of Keyphrases from Text: Evaluation of Four Algorithms
Learning to Extract Keyphrases from Text
Answering Subcognitive Turing Test Questions: A Reply to French
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL