Extractor API
This document presents the API (Application Program Interface) for the Extractor DLL (Dynamic-Link Library). This API is designed to allow Extractor to be easily embedded in experimental or commercial software products.
The following table lists the functions that developers can call in their code. The functions are listed in (approximately) the order in which they would usually be called. The demo package contains some sample code, test_api.c, that illustrates how the API can be used. The API is designed for flexibility; it can be used in many different ways, depending on the intended applications.
This API allows several documents to be processed simultaneously, using separate threads for each document. This is useful, for example, when processing web pages.
Number | API Function | Type | Dependencies |
1 | ExtrCreateDocumentMemory | Required | None |
2 | ExtrCreateStopMemory | Required | None |
3 | ExtrActivateHighlights | Required for Highlights | 1 |
4 | ExtrActivateHTMLFilter | Optional | 1, 2 |
5 | ExtrActivateEmailFilter | Optional | 1, 2 |
6 (New) | ExtrDeactivateTextFilter | Optional | 1 |
7 | ExtrSetInputCode | Required for Japanese and Korean | 1 |
8 | ExtrSetOutputCode | Required for Japanese and Korean | 1 |
9 | ExtrSetDocumentLanguage | Required for Japanese and Korean | 1 |
10 | ExtrSetNumberPhrases | Optional | 1 |
11 | ExtrSetHighlightType | Optional | 1 |
12 | ExtrAddStopWord | Optional | 2 |
13 | ExtrAddStopPhrase | Optional | 2 |
14 | ExtrAddGoPhrase | Optional | 2 |
15 | ExtrReadDocumentBuffer | Required | 1, 2, 3, ..., 14 |
16 | ExtrSignalDocumentEnd | Required | 1, 2, 3, ..., 14, 15 |
17 | ExtrGetPhraseListSize | Required for Keyphrases | 1, 2, 3, ..., 14, 15, 16 |
18 | ExtrGetPhraseByIndex | Required for Keyphrases | 1, 2, 3, ..., 14, 15, 16, 17 |
19 | ExtrGetScoreByIndex | Optional | 1, 2, 3, ..., 14, 15, 16, 17 |
20 | ExtrGetDocumentLanguage | Optional | 1, 2, 3, ..., 14, 15, 16 |
21 | ExtrGetHighlightListSize | Required for Highlights | 1, 2, 3, ..., 14, 15, 16 |
22 | ExtrGetHighlightByIndex | Required for Highlights | 1, 2, 3, ..., 14, 15, 16, 21 |
23 | ExtrGetDocumentProperties | Optional | 1, 2, 3, ..., 14, 15, 16 |
24 | ExtrGetErrorMessage | Optional | None |
25 | ExtrClearDocumentMemory | Required | 1 |
26 | ExtrClearStopMemory | Required | 2 |
API Function: The name of the function. The usage of each function is described in its own section below.
Type: Required functions must be called for Extractor to work properly. Optional functions can be called to override Extractor's default settings or to get additional information from Extractor. Functions marked Required for Japanese and Korean are required for processing Japanese or Korean text, but optional for other languages. Functions marked Required for Keyphrases are required if you wish to extract keyphrases from the text, and optional otherwise. Functions marked Required for Highlights are required if you wish to extract key sentences (highlights) from the text, and optional otherwise.
Dependencies: This column shows the order in which functions must be called. For example, function number 12, ExtrAddStopWord, should only be called after function number 2, ExtrCreateStopMemory, has been called. As another example, function number 18, ExtrGetPhraseByIndex, depends on the optional function number 8, ExtrSetOutputCode. This means that you need not call function number 8 at all, but if you do call it, you must call it before function number 18. For efficiency reasons, Extractor does not verify that the functions are called in the correct order. The programmer is responsible for ensuring that the dependencies are observed in the program that invokes Extractor.
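Because Extractor does not check the calling order, you may want to check it in your own test code. The sketch below is not part of the Extractor API: it transcribes the Dependencies column into bit masks (the names `deps`, `BIT`, `UPTO` and `first_order_violation` are invented for this illustration) and reports the first call in a recorded sequence that should have been made earlier. It checks ordering only; it does not check that the required functions were called at all.

```c
#include <stddef.h>
#include <stdint.h>

#define BIT(n)  (UINT32_C(1) << (n))
/* Bits 1..n, i.e. "1, 2, 3, ..., n" in the Dependencies column. */
#define UPTO(n) ((BIT((n) + 1) - 1u) & ~BIT(0))

/* deps[k]: functions that, if called at all, must be called before function k.
   Transcribed from the table above; index 0 is unused, and functions 1, 2
   and 24 have no dependencies. */
static const uint32_t deps[27] = {
    [3] = BIT(1),    [4] = BIT(1) | BIT(2), [5] = BIT(1) | BIT(2),
    [6] = BIT(1),    [7] = BIT(1),          [8] = BIT(1),
    [9] = BIT(1),    [10] = BIT(1),         [11] = BIT(1),
    [12] = BIT(2),   [13] = BIT(2),         [14] = BIT(2),
    [15] = UPTO(14), [16] = UPTO(15),       [17] = UPTO(16),
    [18] = UPTO(17), [19] = UPTO(17),       [20] = UPTO(16),
    [21] = UPTO(16), [22] = UPTO(16) | BIT(21), [23] = UPTO(16),
    [25] = BIT(1),   [26] = BIT(2),
};

/* calls[] holds function numbers (1..26) in the order they were made.
   Returns the position of the first call that comes too late, that is, a
   function on which some earlier call depends, or -1 if the order is valid. */
int first_order_violation(const int *calls, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (deps[calls[i]] & BIT(calls[j]))
                return (int)j;
    return -1;
}
```

For example, the sequence 1, 2, 15, 8, 16 is flagged at position 3 (counting from zero), because ExtrSetOutputCode (8) is called after ExtrReadDocumentBuffer (15), which depends on it.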
ExtrCreateDocumentMemory
Function header declaration:
int ExtrCreateDocumentMemory(void **DocumentMemory);
Input and output function arguments:
DocumentMemory: output
Example of usage:
void *DocumentMemory;
int ErrorCode;
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
Description:
This function creates a block of memory for storing data about a single document. It stores in DocumentMemory a pointer that uniquely identifies this block of memory. This pointer is later passed to every other function that processes the given document.
A document is passed to Extractor as a sequence of buffers, one per call to ExtrReadDocumentBuffer. A typical document involves multiple calls to ExtrReadDocumentBuffer. Each call updates the state of the memory reserved for processing the given document, DocumentMemory.
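The feeding loop can be sketched as follows. The signature of ExtrReadDocumentBuffer is documented in its own section, so this sketch hides the call behind a hypothetical callback type, `buffer_fn`. The names `feed_in_blocks`, `tally_block` and `struct tally` are illustrative only; in a real program the callback would call ExtrReadDocumentBuffer with the document's DocumentMemory.

```c
#include <stddef.h>

/* Hypothetical callback type standing in for a wrapper around
   ExtrReadDocumentBuffer. Returns 0 on success, non-zero on error. */
typedef int (*buffer_fn)(void *ctx, const char *buf, size_t len);

/* Hands text to fn in consecutive blocks of at most block_size bytes.
   Stops at the first non-zero error code and returns it; returns -1 if
   block_size is zero, and 0 once the whole text has been passed on. */
int feed_in_blocks(const char *text, size_t len, size_t block_size,
                   buffer_fn fn, void *ctx)
{
    if (block_size == 0)
        return -1;
    for (size_t off = 0; off < len; off += block_size) {
        size_t n = (len - off < block_size) ? len - off : block_size;
        int err = fn(ctx, text + off, n);
        if (err != 0)
            return err;
    }
    return 0;
}

/* Example callback that only records how much it was given. */
struct tally {
    size_t calls;
    size_t bytes;
};

int tally_block(void *ctx, const char *buf, size_t len)
{
    struct tally *t = ctx;
    (void)buf;
    t->calls++;
    t->bytes += len;
    return 0;
}
```

Splitting at arbitrary byte offsets is harmless for single-byte encodings. For multi-byte input it may be safer to end each block on a character boundary (for UCS2, an even block size is enough); this section does not say whether Extractor handles a character that is split across two buffers.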
In a typical application with multiple threads, there will be a one-to-one relationship between threads and DocumentMemory values, and also between DocumentMemory values and individual documents. On the other hand, threads may share StopMemory values, depending on whether it makes sense to use the same stop words and stop phrases for all of the documents that are currently being processed.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrCreateStopMemory
Function header declaration:
int ExtrCreateStopMemory(void **StopMemory);
Input and output function arguments:
StopMemory: output
Example of usage:
void *StopMemory;
int ErrorCode;
ErrorCode = ExtrCreateStopMemory(&StopMemory);
Description:
This function creates a block of memory for storing stop words and stop phrases. It returns a pointer value in StopMemory that is a unique identifier for this block of memory. This pointer is later passed to any other functions that use the stop words or stop phrases.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
A stop word is a word that is not allowed in a keyphrase. For example, "the" is a stop word. A stop phrase is a phrase that is not allowed as a keyphrase. The distinction between a stop word and a single-word stop phrase is that a keyphrase will be rejected if it contains a given stop word, but it will only be rejected if it exactly matches a given stop phrase. For example, if "access" is a stop word, then the phrase "information access" will be rejected. If "access" is a stop phrase, then the phrase "information access" is acceptable, although the single-word phrase "access" will be rejected.
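The difference between the two lists can be expressed as a small predicate. This illustrates the rule as stated above; it is not Extractor's internal implementation. The function name `is_rejected` and the assumption that candidate words are separated by single spaces belong to this sketch only.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if candidate keyphrase `phrase` would be rejected: either it
   exactly matches a stop phrase, or one of its words is a stop word.
   Assumes lower-case words separated by single spaces. */
bool is_rejected(const char *phrase,
                 const char *const *stop_words, size_t n_words,
                 const char *const *stop_phrases, size_t n_phrases)
{
    /* Stop phrases: only a whole-phrase match rejects the candidate. */
    for (size_t i = 0; i < n_phrases; i++)
        if (strcmp(phrase, stop_phrases[i]) == 0)
            return true;

    /* Stop words: any single word of the candidate rejects it. */
    const char *word = phrase;
    for (;;) {
        const char *space = strchr(word, ' ');
        size_t len = space ? (size_t)(space - word) : strlen(word);
        for (size_t i = 0; i < n_words; i++)
            if (strlen(stop_words[i]) == len &&
                strncmp(word, stop_words[i], len) == 0)
                return true;
        if (space == NULL)
            return false;
        word = space + 1;
    }
}
```

With "access" as a stop word, "information access" is rejected; with "access" only as a stop phrase, "information access" passes and "access" alone is rejected, matching the examples in the text.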
Calling ExtrCreateStopMemory will initialize the stop word list with some standard stop words (including "the", for example). The standard list may be extended by calling ExtrAddStopWord or ExtrAddStopPhrase.
ExtrActivateHighlights
Function header declaration:
int ExtrActivateHighlights(void *DocumentMemory);
Input and output function arguments:
DocumentMemory: input
Example of usage:
void *DocumentMemory;
int ErrorCode;
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrActivateHighlights(DocumentMemory);
Description:
A highlight is a key sentence. This function activates the highlight extraction feature for DocumentMemory. By default, it is assumed that the user does not want highlight extraction. ExtrActivateHighlights should be called before any calls to ExtrReadDocumentBuffer, since it will affect how the document is read. The main result of calling ExtrActivateHighlights is that the functions ExtrGetHighlightListSize and ExtrGetHighlightByIndex will return some highlights selected by Extractor.
Extractor attempts to find one key sentence for each keyphrase that it finds. For a given keyphrase, it is possible that Extractor may not be able to find a good example of a sentence that contains the keyphrase. The function ExtrGetHighlightListSize will return the number of highlights that were generated. This number is always less than or equal to the number of keyphrases that were generated, as given by ExtrGetPhraseListSize.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrActivateHTMLFilter
Function header declaration:
int ExtrActivateHTMLFilter(void *DocumentMemory, void *StopMemory);
Input and output function arguments:
DocumentMemory: input
StopMemory: input
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
ErrorCode = ExtrCreateStopMemory(&StopMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrActivateHTMLFilter(DocumentMemory, StopMemory);
Description:
This function signals that the document DocumentMemory contains HTML tags. By default, it is assumed that the document does not contain HTML tags. ExtrActivateHTMLFilter should be called before any calls to ExtrReadDocumentBuffer, since it will affect how the document is read. The main result of calling ExtrActivateHTMLFilter is that HTML tags will be parsed. Most tags are ignored, but some tags are used to identify sentence boundaries.
The HTML filter will also convert special symbol codes to the symbols that they represent. For example, "&eacute;" will be converted to "é".
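A minimal sketch of this kind of conversion, assuming ISO-8859-1 output: it decodes a handful of named entities plus decimal references such as "&#233;". The entity table and the function name `decode_entities` are illustrative; Extractor's own filter is not shown here and presumably recognizes many more symbols.

```c
#include <stdlib.h>
#include <string.h>

/* A few named entities and their ISO-8859-1 codes (illustrative subset). */
static const struct {
    const char *name;
    unsigned char code;
} entities[] = {
    {"amp", '&'}, {"lt", '<'}, {"gt", '>'}, {"quot", '"'},
    {"nbsp", 0xA0}, {"eacute", 0xE9}, {"egrave", 0xE8},
};

/* Copies src to dst, replacing known named entities and decimal references
   &#1; to &#255; with single ISO-8859-1 bytes. Anything else is copied
   unchanged. dst must hold at least strlen(src) + 1 bytes; the output is
   never longer than the input. */
void decode_entities(const char *src, char *dst)
{
    while (*src != '\0') {
        const char *semi = (*src == '&') ? strchr(src, ';') : NULL;
        if (semi != NULL && semi - src <= 10) {
            const char *name = src + 1;
            size_t len = (size_t)(semi - name);
            int code = -1;
            if (len > 1 && name[0] == '#') {
                char *end;
                long v = strtol(name + 1, &end, 10);
                if (end == semi && v >= 1 && v <= 255)
                    code = (int)v;
            } else {
                for (size_t i = 0; i < sizeof entities / sizeof entities[0]; i++) {
                    if (strlen(entities[i].name) == len &&
                        strncmp(name, entities[i].name, len) == 0) {
                        code = entities[i].code;
                        break;
                    }
                }
            }
            if (code >= 0) {
                *dst++ = (char)code;
                src = semi + 1;
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```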
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrActivateEmailFilter
Function header declaration:
int ExtrActivateEmailFilter(void *DocumentMemory, void *StopMemory);
Input and output function arguments:
DocumentMemory: input
StopMemory: input
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
ErrorCode = ExtrCreateStopMemory(&StopMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrActivateEmailFilter(DocumentMemory, StopMemory);
Description:
This function signals that the document DocumentMemory contains an e-mail header. By default, it is assumed that the document does not contain an e-mail header. ExtrActivateEmailFilter should be called before any calls to ExtrReadDocumentBuffer, since it will affect how the document is read. The main result of calling ExtrActivateEmailFilter is that the e-mail header will be ignored, except for the "Subject" field.
Many e-mail gateways cannot handle 8-bit character codes, so 8-bit characters are often encoded as 7-bit sequences for safe transmission. The e-mail filter converts MIME quoted-printable 7-bit sequences back to 8-bit codes.
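The quoted-printable decoding step can be sketched as follows, assuming the MIME rules from RFC 2045: "=XX" stands for the byte with hexadecimal value XX, and "=" at the end of a line is a soft line break that is removed. `qp_decode` is an illustrative name; this is not Extractor's code, and it ignores details such as trailing white space before a line break.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = toupper(c);
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Decodes quoted-printable text in place and returns the new length.
   "=XX" becomes one byte; "=\r\n" and "=\n" (soft line breaks) are removed;
   an "=" not followed by two hex digits is kept as it is. */
size_t qp_decode(char *s)
{
    char *r = s, *w = s;
    while (*r != '\0') {
        if (r[0] == '=' && r[1] == '\r' && r[2] == '\n') { r += 3; continue; }
        if (r[0] == '=' && r[1] == '\n') { r += 2; continue; }
        if (r[0] == '=') {
            int hi = hexval((unsigned char)r[1]);
            int lo = (hi >= 0) ? hexval((unsigned char)r[2]) : -1;
            if (hi >= 0 && lo >= 0) {
                *w++ = (char)(hi * 16 + lo);
                r += 3;
                continue;
            }
        }
        *w++ = *r++;
    }
    *w = '\0';
    return (size_t)(w - s);
}
```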
The e-mail filter understands MIME types. E-mail attachments will be treated according to their MIME types. Keyphrases will be extracted from plain text and HTML attachments. Other types of attachments will be ignored. The HTML filter will be automatically activated if the MIME type indicates that the attachment is HTML. Therefore ExtrActivateHTMLFilter should not be called by the user when processing e-mail.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Note: Activating the e-mail filter with Japanese or Korean text will have no effect. It is not yet supported for Japanese or Korean.
ExtrDeactivateTextFilter
Function header declaration:
int ExtrDeactivateTextFilter(void *DocumentMemory);
Input and output function arguments:
DocumentMemory: input
Example of usage:
void *DocumentMemory;
int ErrorCode;
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrDeactivateTextFilter(DocumentMemory);
Description:
This function deactivates the plain text filter for DocumentMemory. By default, when the following conditions are met, the input document is assumed to be plain text:
When these conditions are met, the plain text filter is activated. The plain text filter will attempt to remove non-textual items from the input document, such as tables and addresses. It will also attempt to use white space to determine the boundaries between titles, section headings, and regular paragraphs. If you do not want the plain text filter to process the input document in these ways, then call ExtrDeactivateTextFilter. Since calling ExtrDeactivateTextFilter will affect how the document is read, it should be called before any calls to ExtrReadDocumentBuffer.
If the input document contains tabs, the text filter may interpret the lines with tabs as table rows. These lines may be skipped. If you suspect that the text filter is skipping lines that should be processed, then try calling ExtrDeactivateTextFilter.
Internally, Extractor uses the characters 1D (hex) to mark a phrase boundary and 1E (hex) to mark a sentence boundary. The text filter automatically inserts these characters in a plain text document, by analyzing the white space in the document (i.e., line feeds, blanks, tabs, and carriage returns). For example, if two lines are separated by several line feeds (significant vertical white space), then the text filter will remove the white space and insert a sentence boundary marker. This automatic process works well for most plain text documents, but you may wish to write your own filter for a certain type of input document (e.g., a certain type of word processor file). You can run the document through your own filter program, and then send the resulting plain text to Extractor. In this case, you should call ExtrDeactivateTextFilter, but do not call ExtrActivateHTMLFilter or ExtrActivateEmailFilter. Your filter program can help Extractor by inserting markers for phrase boundaries (1D) and sentence boundaries (1E) in the appropriate places.
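A minimal sketch of such a pre-filter, under assumptions chosen for this illustration rather than taken from Extractor: a blank line ends a sentence (1E), a single line break is just a space, and a tab separates phrases (1D), as it might between table cells. The function name `mark_boundaries` is hypothetical.

```c
#include <stddef.h>

/* Marker values from the paragraph above. */
#define EXTR_PHRASE_BOUNDARY   '\x1D'
#define EXTR_SENTENCE_BOUNDARY '\x1E'

/* Rewrites plain text in place and returns its new length:
   - a run of line breaks (with any '\r' or spaces inside it) that contains
     two or more '\n' becomes one sentence boundary marker;
   - any other run of line breaks becomes a single space;
   - a tab becomes a phrase boundary marker. */
size_t mark_boundaries(char *s)
{
    char *r = s, *w = s;
    while (*r != '\0') {
        if (*r == '\n' || *r == '\r') {
            int newlines = 0;
            while (*r == '\n' || *r == '\r' || *r == ' ') {
                if (*r == '\n')
                    newlines++;
                r++;
            }
            *w++ = (newlines >= 2) ? EXTR_SENTENCE_BOUNDARY : ' ';
        } else if (*r == '\t') {
            *w++ = EXTR_PHRASE_BOUNDARY;
            r++;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return (size_t)(w - s);
}
```

Text prepared this way would then be sent with the plain text filter deactivated (ExtrDeactivateTextFilter), so that Extractor does not reinterpret the white space a second time.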
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrSetInputCode
Function header declaration:
int ExtrSetInputCode(void *DocumentMemory, int CharCodeID);
Input and output function arguments:
DocumentMemory: input
CharCodeID: input
Example of usage:
void *DocumentMemory;
int ErrorCode;
int CharCodeID;
CharCodeID = 1;
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrSetInputCode(DocumentMemory, CharCodeID);
Description:
A call to ExtrSetInputCode sets the document character code that Extractor uses to process the input text buffer. The character code is given by CharCodeID. ExtrCreateDocumentMemory must be called before ExtrSetInputCode.
Character Code | Compatible languages | Description |
ISO-8859-1 | English, French, German, Spanish | ISO-8859-1 is also known as ISO Latin-1. |
MS-DOS | English, French, German, Spanish | MS-DOS is also known as MS-DOS Code Page 437. |
Unicode UCS2 | All | Unicode UCS2 double-byte characters, in native byte order. |
Shift-JIS | Japanese only | SJIS, MS-Kanji, Code Page 932. |
JIS | Japanese only | New, Old, NEC, ISO-2022-JP. |
EUC-JP | Japanese only | Extended UNIX Code, Packed Format for Japanese. |
EUC-KR | Korean only | KS C 5601-1987, KSC5601, Extended UNIX Code, Packed Format for Korean, Code Page 949. |
Johap | Korean only | Johab, KS X 1001:1992 alternate encoding. |
The supported Japanese character sets for all the Japanese encodings are:
The supported Korean character sets for all the Korean encodings are:
ISO-8859-1 and MS-DOS Code Page 437 agree on the coding of non-accented alphabetical characters. If there are no accents in the input text, and the text is in single-byte characters, then the choice between the two should not matter.
Unicode UCS2 uses double-byte characters. UCS2 is sensitive to the byte ordering of the hardware platform (big endian versus little endian). Extractor handles UCS2 characters using the byte ordering of the hardware for which it is compiled (native byte ordering).
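When preparing UCS2 input, the buffer must therefore use the host's byte order. A minimal sketch, assuming the source text is ISO-8859-1, whose 256 byte values coincide with Unicode code points U+0000 to U+00FF, so each byte widens directly to one UCS2 unit. `latin1_to_ucs2` and `host_is_little_endian` are illustrative helpers, not Extractor functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Widens a zero-terminated ISO-8859-1 string to UCS2 in native byte order
   (a plain uint16_t store is native by definition). dst must hold
   strlen(src) + 1 units; the result is zero-terminated. Returns the number
   of characters converted. */
size_t latin1_to_ucs2(const char *src, uint16_t *dst)
{
    size_t n = 0;
    for (; src[n] != '\0'; n++)
        dst[n] = (uint16_t)(unsigned char)src[n];
    dst[n] = 0;
    return n;
}
```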
This function is optional for English, French, German, and Spanish, but required for Japanese and Korean. The default value of CharCodeID is zero.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrSetOutputCode
Function header declaration:
int ExtrSetOutputCode(void *DocumentMemory, int CharCodeID);
Input and output function arguments:
DocumentMemory: input
CharCodeID: input
Example of usage:
void *DocumentMemory;
int ErrorCode;
int CharCodeID;
CharCodeID = 1;
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrSetOutputCode(DocumentMemory, CharCodeID);
Description:
A call to ExtrSetOutputCode sets the document character code that Extractor uses for the output list of keyphrases. The character code is given by CharCodeID. ExtrCreateDocumentMemory must be called before ExtrSetOutputCode.
Character Code | Compatible languages | Description |
ISO-8859-1 | English, French, German, Spanish | ISO-8859-1 is also known as ISO Latin-1. |
MS-DOS | English, French, German, Spanish | MS-DOS is also known as MS-DOS Code Page 437. |
Unicode UCS2 | All | Unicode UCS2 double-byte characters, in native byte order. |
Shift-JIS | Japanese only | SJIS, MS-Kanji, Code Page 932. |
JIS | Japanese only | New, Old, NEC, ISO-2022-JP. |
EUC-JP | Japanese only | Extended UNIX Code, Packed Format for Japanese. |
EUC-KR | Korean only | KS C 5601-1987, KSC5601, Extended UNIX Code, Packed Format for Korean, Code Page 949. |
Johap | Korean only | Johab, KS X 1001:1992 alternate encoding. |
The supported Japanese character sets for all the Japanese encodings are:
The supported Korean character sets for all the Korean encodings are:
ISO-8859-1 and MS-DOS Code Page 437 agree on the coding of non-accented alphabetical characters. If there are no accents in the input text, and the text is in single-byte characters, then the choice between the two should not matter.
Unicode UCS2 uses double-byte characters. UCS2 is sensitive to the byte ordering of the hardware platform (big endian versus little endian). Extractor handles UCS2 characters using the byte ordering of the hardware for which it is compiled (native byte ordering).
This function is optional for English, French, German, and Spanish, but required for Japanese and Korean. The default value of CharCodeID is zero.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrSetDocumentLanguage
Function header declaration:
int ExtrSetDocumentLanguage(void *DocumentMemory, int LanguageID);
Input and output function arguments:
DocumentMemory: input
LanguageID: input
Example of usage:
void *DocumentMemory;
int ErrorCode;
int LanguageID;
LanguageID = 1;
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrSetDocumentLanguage(DocumentMemory, LanguageID);
Description:
A call to ExtrSetDocumentLanguage sets the language that Extractor uses to process the input text buffer. The language is given by LanguageID. ExtrCreateDocumentMemory must be called before ExtrSetDocumentLanguage.
Language | Description |
Automatic | Let Extractor automatically detect the language (for English, French, German, Spanish). |
English | Force Extractor to interpret the document as English. |
French | Force Extractor to interpret the document as French. |
Japanese | Force Extractor to interpret the document as Japanese. |
German | Force Extractor to interpret the document as German. |
Spanish | Force Extractor to interpret the document as Spanish. |
Korean | Force Extractor to interpret the document as Korean. |
This function is optional for English, French, German, and Spanish, but required for Japanese and Korean text. The default value of LanguageID is zero.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrSetNumberPhrases
Function header declaration:
int ExtrSetNumberPhrases(void *DocumentMemory, double DesiredNumber);
Input and output function arguments:
DocumentMemory: input
DesiredNumber: input
Example of usage:
void *DocumentMemory;
int ErrorCode;
double DesiredNumber;
DesiredNumber = 9;
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrSetNumberPhrases(DocumentMemory, DesiredNumber);
Description:
This function sets the desired number of output phrases. The default number is seven. This is the number that will be generated on average; the actual number of phrases that are output for a given document may be slightly less or slightly more than the number specified by DesiredNumber. Note that DesiredNumber is only set for the given document DocumentMemory. This is so that several documents may be processed simultaneously, each with a different desired number of keyphrases.
The DesiredNumber must be between 3 and 30. Values outside of this range will be converted to the closest value inside the range. No error message will be generated when values are out of range.
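Because no error is reported for out-of-range values, a caller that logs or displays the requested number may want to apply the same rule itself. A minimal sketch of the documented clamping; `effective_number_phrases` is a hypothetical helper, not an Extractor function, and a NaN input is simply passed through unchanged.

```c
/* Returns the DesiredNumber that Extractor will actually use, following the
   documented rule: values below 3 become 3, values above 30 become 30,
   and no error is reported. */
double effective_number_phrases(double desired)
{
    if (desired < 3.0)
        return 3.0;
    if (desired > 30.0)
        return 30.0;
    return desired;
}
```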
This function is optional. There is no need to call it unless you wish to override the default value of seven phrases.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrSetHighlightType
Function header declaration:
int ExtrSetHighlightType(void *DocumentMemory, int HighlightType);
Input and output function arguments:
DocumentMemory: input
HighlightType: input
Example of usage:
void *DocumentMemory;
int ErrorCode;
int HighlightType;
HighlightType = 1 + 2 + 8;
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrActivateHighlights(DocumentMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrSetHighlightType(DocumentMemory, HighlightType);
Description:
A highlight is a key sentence. If ExtrActivateHighlights has been called, then Extractor attempts to find one key sentence for each keyphrase that it finds. The ExtrSetHighlightType function sets the type (i.e., style) of highlight that is generated. The following types of highlights are supported:
HighlightType as integer | HighlightType as bit string | Description of Type of Highlight |
These types are bit flags and can be combined by adding them; for example, type 5 is the combination of types 1 and 4 (duplicates removed, full sentences).
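Because each type is a distinct power of two, adding types is the same as combining bits, and a combined value can be taken apart again. The sketch below is illustrative only: `split_highlight_type` is a hypothetical helper, and the constant names for types 1 and 4 are inferred from the sentence above rather than taken from the Extractor headers.

```c
#include <stddef.h>

/* Hypothetical names for the two highlight types identified in the text. */
#define HIGHLIGHT_NO_DUPLICATES  1
#define HIGHLIGHT_FULL_SENTENCES 4

/* Splits a combined type into the power-of-two types it is made of,
   smallest first. Writes at most max_parts values to parts[] and returns
   how many were written. */
size_t split_highlight_type(unsigned combined, unsigned parts[], size_t max_parts)
{
    size_t n = 0;
    for (unsigned bit = 1; combined != 0 && n < max_parts; bit <<= 1) {
        if (combined & bit) {
            parts[n++] = bit;
            combined &= ~bit;
        }
    }
    return n;
}
```

Addition only behaves as a combination of flags when each type is used once; adding the same type twice (for example, 1 + 1) would produce a different type.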
This function is optional. There is no need to call it unless you wish to override the default value of zero.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrAddStopWord
Function header declaration:
int ExtrAddStopWord(void *StopMemory, int LanguageID, int CharCodeID, void *Word);
Input and output function arguments:
StopMemory: input
LanguageID: input
CharCodeID: input
Word: input
Example of usage:
void *StopMemory;
int ErrorCode;
int LanguageID = 1;
int CharCodeID = 1;
char *Word = "the";
ErrorCode = ExtrCreateStopMemory(&StopMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrAddStopWord(StopMemory, LanguageID, CharCodeID, (void *) Word);
Description:
This function adds the string Word to the list of stop words stored in the memory at StopMemory. The stop words are stored in a hash table. It does no harm to try to store the same word twice. It is assumed that Word is in lower case and that Word is a single word (containing no white space).
Stop words are stored separately for each language. The language is given by LanguageID. ExtrAddStopWord will return a non-zero error code if LanguageID is invalid or if Word contains anything other than lower case characters.
Language | Description |
English | Add the given stop word to the English stop words. |
French | Add the given stop word to the French stop words. |
German | Add the given stop word to the German stop words. |
Spanish | Add the given stop word to the Spanish stop words. |
Korean | Add the given stop word to the Korean stop words. |
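When stop words come from user input, it can help to normalize them before calling ExtrAddStopWord, since the function expects a single lower-case word and returns an error code otherwise. A minimal sketch, assuming single-byte ASCII text; `prepare_stop_word` is a hypothetical helper, not part of the API.

```c
#include <ctype.h>
#include <stdbool.h>

/* Lower-cases `word` in place and returns true if the result is usable as a
   stop word: non-empty and made only of letters (so no white space).
   isalpha/tolower follow the C locale, so only ASCII letters are accepted;
   accented or double-byte words would need a different conversion.
   On failure the word may be left partly lower-cased. */
bool prepare_stop_word(char *word)
{
    if (*word == '\0')
        return false;
    for (char *p = word; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalpha(c))
            return false;
        *p = (char)tolower(c);
    }
    return true;
}
```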
The character code is given by CharCodeID. Word is of type void * so that either single-byte or double-byte character strings can be passed to this function.
Character Code | Language | Description |
ISO-8859-1 | English, French, German, Spanish | ISO-8859-1 is also known as ISO Latin-1. |
MS-DOS | English, French, German, Spanish | MS-DOS is also known as MS-DOS Code Page 437. |
Unicode UCS2 | All | Unicode UCS2 double-byte characters, in native byte order. |
EUC-KR | Korean only | KS C 5601-1987, KSC5601, Extended UNIX Code, Packed Format for Korean, Code Page 949. |
Johap | Korean only | Johab, KS X 1001:1992 alternate encoding. |
ExtrAddStopWord should be called before any calls to ExtrReadDocumentBuffer, since it will affect how the document is read.
When the stop word list is first created, by ExtrCreateStopMemory, it is initialized with a list of common stop words. It may not be necessary to add any extra stop words. That is, it may not be necessary to call ExtrAddStopWord.
A stop word is a word that is not allowed in a keyphrase. For example, "the" is a stop word. A stop phrase is a phrase that is not allowed as a keyphrase. The distinction between a stop word and a single-word stop phrase is that a keyphrase will be rejected if it contains a given stop word, but it will only be rejected if it exactly matches a given stop phrase. For example, if "access" is a stop word, then the phrase "information access" will be rejected. If "access" is a stop phrase, then the phrase "information access" is acceptable, although the single-word phrase "access" will be rejected.
Note: At this time, you cannot add new stop words for Japanese text. However, you can add new Japanese stop phrases.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrAddStopPhrase
Function header declaration:
int ExtrAddStopPhrase(void *StopMemory, int LanguageID, int CharCodeID, void *Phrase);
Input and output function arguments:
StopMemory: input
LanguageID: input
CharCodeID: input
Phrase: input
Example of usage:
void *StopMemory;
int ErrorCode;
char *Phrase = "access";
int LanguageID = 1;
int CharCodeID = 1;
ErrorCode = ExtrCreateStopMemory(&StopMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrAddStopPhrase(StopMemory, LanguageID, CharCodeID, (void *) Phrase);
Description:
This function adds the string Phrase to the list of stop phrases stored in the memory at StopMemory. The stop phrases are stored in a hash table. It does no harm to try to store the same phrase twice. It is assumed that Phrase is in lower case. Phrase may be one, two, or three words, separated by a single space.
Stop phrases are stored separately for each language. The language is given by LanguageID. ExtrAddStopPhrase will return a non-zero error code if LanguageID is invalid or if Phrase contains anything other than lower case characters and spaces.
Language | Description |
English | Add the given stop phrase to the English stop phrases. |
French | Add the given stop phrase to the French stop phrases. |
Japanese | Add the given stop phrase to the Japanese stop phrases. |
German | Add the given stop phrase to the German stop phrases. |
Spanish | Add the given stop phrase to the Spanish stop phrases. |
Korean | Add the given stop phrase to the Korean stop phrases. |
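Since ExtrAddStopPhrase rejects malformed input with an error code, it can be convenient to validate phrases before adding them, for example when reading them from a user-supplied file. A minimal sketch of the documented shape (one to three lower-case words separated by single spaces); `valid_stop_phrase` is a hypothetical helper, and it accepts only ASCII letters, which is stricter than Extractor for accented or double-byte text.

```c
#include <stdbool.h>

/* Returns true if s is one to three words of lower-case ASCII letters,
   separated by single spaces, with no leading or trailing space. */
bool valid_stop_phrase(const char *s)
{
    int words = 0;
    bool in_word = false;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p >= 'a' && *p <= 'z') {
            if (!in_word) {
                words++;
                in_word = true;
            }
        } else if (*p == ' ' && in_word && p[1] != '\0') {
            in_word = false;   /* exactly one space between two words */
        } else {
            return false;      /* upper case, digit, extra or edge space */
        }
    }
    return words >= 1 && words <= 3;
}
```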
The character code is given by CharCodeID. Phrase is of type void * so that either single-byte or double-byte character strings can be passed to this function.
Character Code | Language | Description |
ISO-8859-1 | English, French, German, Spanish | ISO-8859-1 is also known as ISO Latin-1. |
MS-DOS | English, French, German, Spanish | MS-DOS is also known as MS-DOS Code Page 437. |
Unicode UCS2 | All | Unicode UCS2 double-byte characters, in native byte order. |
Shift-JIS | Japanese only | SJIS, MS-Kanji, Code Page 932. |
JIS | Japanese only | New, Old, NEC, ISO-2022-JP. |
EUC-JP | Japanese only | Extended UNIX Code, Packed Format for Japanese. |
EUC-KR | Korean only | KS C 5601-1987, KSC5601, Extended UNIX Code, Packed Format for Korean, Code Page 949. |
Johap | Korean only | Johab, KS X 1001:1992 alternate encoding. |
The supported Japanese character sets for all the Japanese encodings are:
The supported Korean character sets for all the Korean encodings are:
When the stop phrase list is first created, by ExtrCreateStopMemory, it is initialized with a list of common stop phrases. It may not be necessary to add any extra stop phrases. That is, it may not be necessary to call ExtrAddStopPhrase.
A stop word is a word that is not allowed in a keyphrase. For example, "the" is a stop word. A stop phrase is a phrase that is not allowed as a keyphrase. The distinction between a stop word and a single-word stop phrase is that a keyphrase will be rejected if it contains a given stop word, but it will only be rejected if it exactly matches a given stop phrase. For example, if "access" is a stop word, then the phrase "information access" will be rejected. If "access" is a stop phrase, then the phrase "information access" is acceptable, although the single-word phrase "access" will be rejected.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
ExtrAddGoPhrase
Function header declaration:
int ExtrAddGoPhrase(void *StopMemory, int LanguageID, int CharCodeID, void *Phrase, int MatchType);
Input and output function arguments:
StopMemory: input
LanguageID: input
CharCodeID: input
Phrase: input
MatchType: input
Example of usage:
void *StopMemory;
int ErrorCode;
char *Phrase = "National Research Council";
int LanguageID = 1;
int CharCodeID = 1;
int MatchType = 3;
ErrorCode = ExtrCreateStopMemory(&StopMemory);
if (ErrorCode == 0)
    ErrorCode = ExtrAddGoPhrase(StopMemory, LanguageID, CharCodeID, (void *) Phrase, MatchType);
Description:
If the input document was found by issuing a query to a search engine, the user may have a special interest in whether the query terms appear in the document, and the context in which the query terms appear. This can be achieved by calling the function ExtrAddGoPhrase with each of the terms in the query.
This function adds the string Phrase to the list of go phrases stored in the memory at StopMemory. A go phrase is a phrase that will be treated as if it were a keyphrase, if it appears in the input document. Go phrases are stored in a list and each sentence in the input document is scanned for each go phrase in the list. This has two important implications: (1) A large list of go phrases may slow the execution of Extractor. (2) A go phrase in the input document will not be detected if it spans a sentence boundary.
A go phrase may consist of one or more words or fragments of words. Any character sequence is permitted, except for an empty string. The letters may be in upper or lower case. A go phrase may range from a single character to a full sentence. A go phrase may contain punctuation.
Go phrases are stored separately for each language. The language is given by LanguageID. ExtrAddGoPhrase will return a non-zero error code if LanguageID is invalid or if CharCodeID is not compatible with LanguageID.
The following types of matches are supported:
MatchType as integer | MatchType as bit string | Description of MatchType |
These types can be added; for example, type 5 is the combination of types 1 and 4 (exact case and exact width). The strictest matching is type 15 (1 + 2 + 4 + 8). The most liberal matching is type 112 (16 + 32 + 64). Type 0 is relatively liberal, but avoids some of the more computationally intensive matching operations. It strikes a balance between liberalness and efficiency. If a given type of matching does not make sense with the given character set (e.g., exact width does not make sense with ISO-8859-1), then it will be ignored. (It won't cause any harm.)
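The arithmetic above can be checked with a small helper that prints a MatchType in the same form as the bit string column. The constant names are hypothetical (only the values 1 = exact case, 4 = exact width, 15 and 112 come from the text), and the seven-bit width is an assumption chosen to cover values up to 64.

```c
/* Hypothetical names; the values are the ones given in the text. */
#define MATCH_EXACT_CASE    1
#define MATCH_EXACT_WIDTH   4
#define MATCH_STRICTEST     (1 + 2 + 4 + 8)   /* 15 */
#define MATCH_MOST_LIBERAL  (16 + 32 + 64)    /* 112 */

/* Writes the low `width` bits of value (width <= 32) as '0'/'1'
   characters, most significant bit first, followed by '\0'.
   buf must hold width + 1 chars. */
void match_type_bits(unsigned value, unsigned width, char *buf)
{
    for (unsigned i = 0; i < width; i++)
        buf[i] = ((value >> (width - 1 - i)) & 1u) ? '1' : '0';
    buf[width] = '\0';
}
```

For instance, type 5 (exact case plus exact width) prints as 0000101 and type 112 as 1110000.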
When go phrases are found in the input document, they will be inserted at the top of the keyphrase list. They will take priority over the regular keyphrases. The length of the keyphrase list will be kept at the value set by ExtrSetNumberPhrases. For each go phrase that is added to the top of the keyphrase list, a regular keyphrase will be deleted from the bottom of the keyphrase list. (Note that Extractor ranks the keyphrases in order of decreasing estimated importance.) A go phrase can be distinguished from a regular keyphrase (a keyphrase generated automatically by Extractor) by its score. All go phrases are given a score of zero, but a regular keyphrase never has a score of zero.
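The list-building rule above can be modelled as a merge: go phrases first, in order of appearance, then regular keyphrases in decreasing importance, cut at the desired length. `merge_phrases` is a hypothetical helper that only models the documented behaviour; Extractor builds this list internally, and the sketch ignores the fact that the actual number of keyphrases may vary slightly around DesiredNumber.

```c
#include <stddef.h>

/* Writes at most `desired` phrases to out[]: all go phrases found (earlier
   ones first), then regular keyphrases until the list is full. Returns the
   number of entries written. The strings themselves are not copied. */
size_t merge_phrases(const char *const *go, size_t n_go,
                     const char *const *regular, size_t n_regular,
                     size_t desired, const char **out)
{
    size_t n = 0;
    for (size_t i = 0; i < n_go && n < desired; i++)
        out[n++] = go[i];
    for (size_t i = 0; i < n_regular && n < desired; i++)
        out[n++] = regular[i];
    return n;
}
```

When reading the final list back, a score of exactly zero is what identifies a go phrase, since regular keyphrases never score zero.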
When a go phrase is found, it is inserted into the keyphrase list in exactly the same form as it was given to ExtrAddGoPhrase. This may be different from the form it has in the input document, depending on MatchType.
If highlights have been activated (by ExtrActivateHighlights), then each go phrase that is found in the input document will have a corresponding highlight. Extractor attempts to find a good sentence to illustrate each go phrase. If bold markup is set (by ExtrSetHighlightType), then the go phrases will be marked in bold within the corresponding highlights. Neighbouring words and characters may also be marked in bold if they appear to be closely connected to the go phrase.
A go phrase might appear in the document, and yet not be found by Extractor. If the go phrase spans a sentence boundary, it will not be detected. For example, "home cooking" will not be found in the text "Pasta is popular in our home. Cooking pasta is easy." Also, if the input document is very long, Extractor may not read the full document, since it should be possible to make a good summary without reading the full text. Therefore, if the go phrase only appears at the end of a very long document, it might not be detected by Extractor. Finally, the number of go phrases that will be found is limited by the desired number of keyphrases, set by ExtrSetNumberPhrases. If the number of go phrases in the input document is greater than the desired number of keyphrases, then the go phrases that appear earlier in the text will be given priority.
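The sentence-boundary limitation can be illustrated with a deliberately simplified model of the search. In the C sketch below, which is not Extractor's actual algorithm, a go phrase is matched case-insensitively only inside a single sentence, and sentences are assumed to end at '.', '!' or '?'. Extractor's real sentence detection is not described here, and in this toy model a go phrase that itself contains one of those characters could never match. The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compares n bytes, ignoring the case of single-byte letters. */
static int equal_ignoring_case(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Simplified model: returns 1 only if phrase occurs (ignoring case)
   entirely inside one sentence of text. */
int found_within_sentence(const char *text, const char *phrase)
{
    size_t phrase_length;

    assert(text != NULL && phrase != NULL);
    phrase_length = strlen(phrase);
    if (phrase_length == 0)
        return 0; /* empty go phrases are not permitted */

    while (*text != '\0') {
        size_t sentence_length = strcspn(text, ".!?");

        /* Only start positions that keep the whole phrase in this sentence. */
        for (size_t i = 0; i + phrase_length <= sentence_length; i++) {
            if (equal_ignoring_case(text + i, phrase, phrase_length))
                return 1;
        }
        text += sentence_length;
        if (*text != '\0')
            text++; /* step over the sentence terminator */
    }
    return 0;
}
```

With the example above, found_within_sentence("Pasta is popular in our home. Cooking pasta is easy.", "home cooking") returns 0, because "home" and "Cooking" fall in different sentences.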
The following languages are supported:
Language | Description |
English | Add the given go phrase to the English go phrases. |
French | Add the given go phrase to the French go phrases. |
Japanese | Add the given go phrase to the Japanese go phrases. |
German | Add the given go phrase to the German go phrases. |
Spanish | Add the given go phrase to the Spanish go phrases. |
Korean | Add the given go phrase to the Korean go phrases. |
The character code is given by CharCodeID. Phrase is of type void * so that either single-byte or double-byte character strings can be passed to this function.
Character Code | Language | Description |
ISO-8859-1 | English, French, German, Spanish | ISO-8859-1 is also known as ISO Latin-1. |
MS-DOS | English, French, German, Spanish | MS-DOS is also known as MS-DOS Code Page 437. |
Unicode UCS2 | All | Unicode UCS2 double-byte characters, in native byte order. |
Shift-JIS | Japanese only | SJIS, MS-Kanji, Code Page 932. |
JIS | Japanese only | New, Old, NEC, ISO-2022-JP. |
EUC-JP | Japanese only | Extended UNIX Code, Packed Format for Japanese. |
EUC-KR | Korean only | KS C 5601-1987, KSC5601, Extended UNIX Code, Packed Format for Korean, Code Page 949. |
Johap | Korean only | Johab, KS X 1001:1992 alternate encoding. |
The supported Japanese character sets for all the Japanese encodings are:
The supported Korean character sets for all the Korean encodings are:
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
int ExtrReadDocumentBuffer(void *DocumentMemory, void *StopMemory, void *DocumentBuffer, int BufferLength);
Input and output function arguments:
DocumentMemory: input
StopMemory: input
DocumentBuffer: input
BufferLength: input
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
int BufferLength;
char DocumentBuffer[300];
strcpy(DocumentBuffer, "This is an example of some text.");
BufferLength = strlen(DocumentBuffer);
ErrorCode = ExtrCreateStopMemory(&StopMemory);
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
ErrorCode = ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) DocumentBuffer, BufferLength);
Description:
This function reads the text in the buffer DocumentBuffer and updates the memory at DocumentMemory. The processing of the buffer is affected by StopMemory.
In a typical application, there will be a series of calls to ExtrReadDocumentBuffer for a given document DocumentMemory. The idea is that the document is read in chunks. A call to ExtrSignalDocumentEnd signals that the last chunk has been sent (the end of the given document has been reached).
A call to ExtrReadDocumentBuffer will change the memory at DocumentMemory, but the memory at StopMemory will not be modified. If there are multiple threads, each thread will have a unique value for DocumentMemory, but several threads may share StopMemory.
The buffer DocumentBuffer may contain single-byte or double-byte characters (see ExtrSetInputCode). This is why it is of type void *. The buffer length BufferLength specifies the number of bytes in the buffer, not the number of characters. When the character code (set by ExtrSetInputCode) indicates double-byte characters, BufferLength must be an even number. That is, the end of the buffer is not allowed to divide a double-byte character into two parts.
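A minimal C sketch of this chunked reading, assuming the application chooses its own maximum chunk size (no limit is stated here). To keep the sketch self-contained, the call to ExtrReadDocumentBuffer is represented by a hypothetical callback type, chunk_reader; the helper names next_chunk_length and read_in_chunks are also illustrative only.

```c
#include <assert.h>

/* Hypothetical callback standing in for ExtrReadDocumentBuffer; it should
   return 0 on success or a non-zero error code. */
typedef int (*chunk_reader)(void *context, const void *chunk, int length);

/* Number of bytes to send next. For double-byte input (e.g. Unicode UCS2)
   the result is kept even, so no character is split across two calls. */
int next_chunk_length(int remaining, int max_chunk, int double_byte)
{
    assert(remaining >= 0 && max_chunk > 0);
    if (double_byte) {
        max_chunk -= max_chunk % 2; /* round down to an even byte count */
        assert(max_chunk >= 2);
    }
    return remaining < max_chunk ? remaining : max_chunk;
}

/* Sends the whole document in chunks; returns the first non-zero error
   code, or 0 once every byte has been passed to reader. */
int read_in_chunks(const void *document, int length, int max_chunk,
                   int double_byte, chunk_reader reader, void *context)
{
    const unsigned char *bytes = document;
    int offset = 0;

    if (double_byte)
        assert(length % 2 == 0); /* the document must hold whole characters */
    while (offset < length) {
        int n = next_chunk_length(length - offset, max_chunk, double_byte);
        int error = reader(context, bytes + offset, n);
        if (error != 0)
            return error;
        offset += n;
    }
    return 0;
}
```

In a real program, the callback would call ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) chunk, length) and return its error code; once read_in_chunks returns 0, the application would call ExtrSignalDocumentEnd once for the document.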
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
int ExtrSignalDocumentEnd(void *DocumentMemory, void *StopMemory);
Input and output function arguments:
DocumentMemory: input
StopMemory: input
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
int BufferLength;
char DocumentBuffer[300];
strcpy(DocumentBuffer, "This is an example of some text.");
BufferLength = strlen(DocumentBuffer);
ErrorCode = ExtrCreateStopMemory(&StopMemory);
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
ErrorCode = ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) DocumentBuffer, BufferLength);
strcpy(DocumentBuffer, "Here is some more text.");
BufferLength = strlen(DocumentBuffer);
ErrorCode = ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) DocumentBuffer, BufferLength);
ErrorCode = ExtrSignalDocumentEnd(DocumentMemory, StopMemory);
Description:
A call to ExtrSignalDocumentEnd signals that the end of the document has been reached; there will be no further calls to ExtrReadDocumentBuffer with this particular DocumentMemory. This signal triggers the generation of the final list of keyphrases.
The phrases in the final list of keyphrases are compared with the list of stop phrases in StopMemory and any matching phrases are deleted from the final list of keyphrases. Case is ignored for matching, but otherwise an exact match is required.
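The matching rule can be sketched as follows. This is a simplified C model, not Extractor's implementation: case folding uses tolower and therefore covers only single-byte characters, and the function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Exact comparison that ignores case (single-byte letters only). */
static int same_ignoring_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* both strings must end at the same point */
}

/* Removes every phrase that matches a stop phrase, keeping the remaining
   phrases in their original order. Returns the new number of phrases. */
int remove_stop_phrases(const char **phrases, int count,
                        const char *const *stop_phrases, int stop_count)
{
    int kept = 0;

    assert(count >= 0 && stop_count >= 0);
    for (int i = 0; i < count; i++) {
        int is_stop = 0;
        for (int j = 0; j < stop_count && !is_stop; j++)
            is_stop = same_ignoring_case(phrases[i], stop_phrases[j]);
        if (!is_stop)
            phrases[kept++] = phrases[i];
    }
    return kept;
}
```

With the stop phrase "text", the keyphrase "Text" would be removed, but "some text" and "Text Mining" would be kept, since apart from case an exact match is required.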
ExtrSignalDocumentEnd should only be called once for a given document DocumentMemory. After ExtrSignalDocumentEnd has been called for a given document, that document has no further need for the stop words and stop phrases stored in StopMemory. Unless there are other documents that will need StopMemory, the memory used by StopMemory may be released after ExtrSignalDocumentEnd has been called.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
int ExtrGetPhraseListSize(void *DocumentMemory, int *PhraseListSize);
Input and output function arguments:
DocumentMemory: input
PhraseListSize: output
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
int BufferLength;
char DocumentBuffer[300];
int PhraseListSize;
strcpy(DocumentBuffer, "This is an example of some text.");
BufferLength = strlen(DocumentBuffer);
ErrorCode = ExtrCreateStopMemory(&StopMemory);
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
ErrorCode = ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) DocumentBuffer, BufferLength);
ErrorCode = ExtrSignalDocumentEnd(DocumentMemory, StopMemory);
ErrorCode = ExtrGetPhraseListSize(DocumentMemory, &PhraseListSize);
Description:
The function ExtrGetPhraseListSize stores the number of keyphrases that were generated in PhraseListSize. If there is an error, PhraseListSize is set to zero.
ExtrGetPhraseListSize may be called repeatedly for a given document. It does not modify the memory at DocumentMemory. ExtrGetPhraseListSize should not be called until after ExtrSignalDocumentEnd has been called.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
int ExtrGetPhraseByIndex(void *DocumentMemory, int PhraseIndex, void **Phrase);
Input and output function arguments:
DocumentMemory: input
PhraseIndex: input
Phrase: output
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
int BufferLength;
char DocumentBuffer[300];
int PhraseIndex;
char *Phrase;
strcpy(DocumentBuffer, "This is an example of some text.");
BufferLength = strlen(DocumentBuffer);
ErrorCode = ExtrCreateStopMemory(&StopMemory);
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
ErrorCode = ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) DocumentBuffer, BufferLength);
ErrorCode = ExtrSignalDocumentEnd(DocumentMemory, StopMemory);
PhraseIndex = 3;
ErrorCode = ExtrGetPhraseByIndex(DocumentMemory, PhraseIndex, (void **) &Phrase);
Description:
A call to ExtrGetPhraseByIndex sets Phrase to point to a string: phrase number PhraseIndex. PhraseIndex ranges from zero to PhraseListSize minus one. Phrases are approximately in order of decreasing quality. ExtrSignalDocumentEnd must be called before ExtrGetPhraseByIndex.
The string Phrase may contain single-byte or double-byte characters (see ExtrSetOutputCode). This is why it is of type void **.
The memory where Phrase is stored will be cleared when ExtrClearDocumentMemory is called. The application should copy Phrase into a more permanent location.
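A minimal C sketch of such a copy, assuming that single-byte output strings end with a zero byte and that Unicode UCS2 output strings end with a 16-bit zero (the UCS2 terminator is an assumption; it is not stated here). The helper name copy_returned_string is hypothetical; the caller owns the copy and must free it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies a zero-terminated string returned by the library into memory owned
   by the caller. char_size is 1 for single-byte output codes and 2 for
   Unicode UCS2. Returns NULL if allocation fails; free() the result. */
void *copy_returned_string(const void *source, size_t char_size)
{
    const unsigned char *bytes = source;
    size_t length = 0; /* length in bytes, excluding the terminator */
    void *copy;

    assert(source != NULL && (char_size == 1 || char_size == 2));
    if (char_size == 1) {
        length = strlen(source);
    } else {
        /* A 16-bit zero is two zero bytes in either byte order. */
        while (bytes[length] != 0 || bytes[length + 1] != 0)
            length += 2;
    }
    copy = malloc(length + char_size);
    if (copy != NULL)
        memcpy(copy, source, length + char_size); /* includes terminator */
    return copy;
}
```

For example, after ExtrGetPhraseByIndex(DocumentMemory, PhraseIndex, (void **) &Phrase) succeeds, copy_returned_string(Phrase, 1) yields a copy that remains valid after ExtrClearDocumentMemory(DocumentMemory). The same approach applies to highlights returned by ExtrGetHighlightByIndex.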
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
int ExtrGetScoreByIndex(void *DocumentMemory, int PhraseIndex, double *Score);
Input and output function arguments:
DocumentMemory: input
PhraseIndex: input
Score: output
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
int BufferLength;
char DocumentBuffer[300];
int PhraseIndex;
double Score;
strcpy(DocumentBuffer, "This is an example of some text.");
BufferLength = strlen(DocumentBuffer);
ErrorCode = ExtrCreateStopMemory(&StopMemory);
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
ErrorCode = ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) DocumentBuffer, BufferLength);
ErrorCode = ExtrSignalDocumentEnd(DocumentMemory, StopMemory);
PhraseIndex = 3;
ErrorCode = ExtrGetScoreByIndex(DocumentMemory, PhraseIndex, &Score);
Description:
A call to ExtrGetScoreByIndex copies into Score the score assigned to phrase number PhraseIndex. PhraseIndex ranges from zero to PhraseListSize minus one. The score of a phrase is an estimate of its value as a keyphrase. Regular keyphrases are ranked in order of descending score; go phrases, which always have a score of zero, are placed above them (see ExtrAddGoPhrase). ExtrSignalDocumentEnd must be called before ExtrGetScoreByIndex.
This function is optional. There is no need to call it unless you need the score assigned to a phrase (for example, to distinguish go phrases from regular keyphrases).
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
int ExtrGetDocumentLanguage(void *DocumentMemory, int *LanguageID);
Input and output function arguments:
DocumentMemory: input
LanguageID: output
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
int BufferLength;
char DocumentBuffer[300];
int LanguageID;
strcpy(DocumentBuffer, "This is an example of some text.");
BufferLength = strlen(DocumentBuffer);
ErrorCode = ExtrCreateStopMemory(&StopMemory);
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
ErrorCode = ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) DocumentBuffer, BufferLength);
ErrorCode = ExtrSignalDocumentEnd(DocumentMemory, StopMemory);
ErrorCode = ExtrGetDocumentLanguage(DocumentMemory, &LanguageID);
Description:
A call to ExtrGetDocumentLanguage gets the language of the document. If the language was set by a call to ExtrSetDocumentLanguage, then ExtrGetDocumentLanguage returns the same value that was specified with ExtrSetDocumentLanguage. If Extractor was allowed to guess the language, then ExtrGetDocumentLanguage returns the best guess. LanguageID is passed by reference and is modified in the function.
Language | Description |
Unknown | Extractor was not able to guess, or the language is not English, French, German, or Spanish. |
English | Extractor guessed English, or English was specified by ExtrSetDocumentLanguage. |
French | Extractor guessed French, or French was specified by ExtrSetDocumentLanguage. |
Japanese | Japanese was specified by ExtrSetDocumentLanguage. |
German | Extractor guessed German, or German was specified by ExtrSetDocumentLanguage. |
Spanish | Extractor guessed Spanish, or Spanish was specified by ExtrSetDocumentLanguage. |
Korean | Korean was specified by ExtrSetDocumentLanguage. |
This function is optional. There is no need to call it unless you wish to know which language Extractor guessed (English, French, German, or Spanish). Note that language guessing is currently not available for Japanese or Korean.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
int ExtrGetHighlightListSize(void *DocumentMemory, int *HighlightListSize);
Input and output function arguments:
DocumentMemory: input
HighlightListSize: output
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
int BufferLength;
char DocumentBuffer[300];
int HighlightListSize;
strcpy(DocumentBuffer, "This is an example of some text.");
BufferLength = strlen(DocumentBuffer);
ErrorCode = ExtrCreateStopMemory(&StopMemory);
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
ErrorCode = ExtrActivateHighlights(DocumentMemory);
ErrorCode = ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) DocumentBuffer, BufferLength);
ErrorCode = ExtrSignalDocumentEnd(DocumentMemory, StopMemory);
ErrorCode = ExtrGetHighlightListSize(DocumentMemory, &HighlightListSize);
Description:
The function ExtrGetHighlightListSize stores the number of highlights that were generated in HighlightListSize. If there is an error, HighlightListSize is set to zero.
The number of highlights will be less than or equal to the number of keyphrases. There are two reasons why the number of highlights might be smaller. First, when HighlightType is an odd number, Extractor removes any duplicate highlights. Second, there may be keyphrases for which no acceptable highlight was found. Therefore, whatever the value of HighlightType, do not assume that the highlight list size equals the keyphrase list size.
ExtrGetHighlightListSize may be called repeatedly for a given document. It does not modify the memory at DocumentMemory. ExtrGetHighlightListSize should not be called until after ExtrSignalDocumentEnd has been called.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
int ExtrGetHighlightByIndex(void *DocumentMemory, int HighlightIndex, void **Highlight);
Input and output function arguments:
DocumentMemory: input
HighlightIndex: input
Highlight: output
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
int BufferLength;
char DocumentBuffer[300];
int HighlightIndex;
char *Highlight;
strcpy(DocumentBuffer, "This is an example of some text.");
BufferLength = strlen(DocumentBuffer);
ErrorCode = ExtrCreateStopMemory(&StopMemory);
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
ErrorCode = ExtrActivateHighlights(DocumentMemory);
ErrorCode = ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) DocumentBuffer, BufferLength);
ErrorCode = ExtrSignalDocumentEnd(DocumentMemory, StopMemory);
HighlightIndex = 0;
ErrorCode = ExtrGetHighlightByIndex(DocumentMemory, HighlightIndex, (void **) &Highlight);
Description:
A call to ExtrGetHighlightByIndex sets Highlight to point to a string: highlight number HighlightIndex. HighlightIndex ranges from zero to HighlightListSize minus one. ExtrSignalDocumentEnd must be called before ExtrGetHighlightByIndex.
The string Highlight may contain single-byte or double-byte characters (see ExtrSetOutputCode). This is why it is of type void **.
The memory where Highlight is stored will be cleared when ExtrClearDocumentMemory is called. The application should copy Highlight into a more permanent location.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
int ExtrGetDocumentProperties(void *DocumentMemory, int PropID, int *PropValue);
Input and output function arguments:
DocumentMemory: input
PropID: input
PropValue: output
Example of usage:
void *DocumentMemory;
void *StopMemory;
int ErrorCode;
int BufferLength;
char DocumentBuffer[300];
int PropID;
int PropValue;
strcpy(DocumentBuffer, "This is an example of some text.");
BufferLength = strlen(DocumentBuffer);
ErrorCode = ExtrCreateStopMemory(&StopMemory);
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
ErrorCode = ExtrReadDocumentBuffer(DocumentMemory, StopMemory, (void *) DocumentBuffer, BufferLength);
ErrorCode = ExtrSignalDocumentEnd(DocumentMemory, StopMemory);
PropID = 1;
ErrorCode = ExtrGetDocumentProperties(DocumentMemory, PropID, &PropValue);
Description:
A call to ExtrGetDocumentProperties gets various properties of the document. The following properties are currently defined:
PropID | Description |
1 | get the number of words that were read |
2 | get the number of non-stop words (content words) that were read |
3 | see whether the whole document was read (0 = only the beginning of the document was read; 1 = the whole document was read) |
The desired property is specified by setting PropID. The property value is returned in PropValue.
The values returned for PropID 1 and 2 depend on the language. For example, a word with an apostrophe counts as two words in French (e.g., "j'ai"), but as one word in English (e.g., "don't"). There are no spaces between words in Japanese, so the values returned for PropID 1 and 2 are rough approximations when the document is in Japanese. If ExtrGetDocumentProperties is called before the language has been determined, the values returned for PropID 1 and 2 will be zero.
If the document is exceptionally long, Extractor will only read as much of the document as it needs to generate a summary. In this case, PropID 3 will return a value of 0 and PropID 1 and 2 will return values that are less than the actual values for the whole document.
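A small C sketch of how an application might interpret these values together. The enumerator names, the status type and classify_word_counts are hypothetical; the numbering of PropID 1 and 2 follows the order of the rows in the property table, and a word count of zero may also simply mean an empty document.

```c
#include <assert.h>

/* Hypothetical names for the PropID values 1, 2 and 3. */
enum {
    PROP_WORDS_READ = 1,         /* number of words read */
    PROP_CONTENT_WORDS_READ = 2, /* number of non-stop (content) words read */
    PROP_WHOLE_DOCUMENT_READ = 3 /* 1 = whole document read, 0 = beginning only */
};

typedef enum {
    COUNTS_UNAVAILABLE, /* zero words: e.g. language not yet determined */
    COUNTS_PARTIAL,     /* only the beginning of the document was read */
    COUNTS_COMPLETE     /* counts cover the whole document */
} word_count_status;

/* Interprets the values returned for PropID 1 and PropID 3. */
word_count_status classify_word_counts(int words_read, int whole_document_read)
{
    assert(whole_document_read == 0 || whole_document_read == 1);
    if (words_read == 0)
        return COUNTS_UNAVAILABLE;
    return whole_document_read ? COUNTS_COMPLETE : COUNTS_PARTIAL;
}
```

The two inputs would come from one call to ExtrGetDocumentProperties with PropID set to PROP_WORDS_READ and a second call with PROP_WHOLE_DOCUMENT_READ, both made after ExtrSignalDocumentEnd.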
This function is optional. There is no need to call it unless you wish to know one or more of the above properties. The function may be called multiple times, in order to get multiple properties.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
void ExtrGetErrorMessage(int ErrorCode, char **ErrorMessage);
Input and output function arguments:
ErrorCode: input
ErrorMessage: output
Example of usage:
void *StopMemory;
int ErrorCode;
char *ErrorMessage;
ErrorCode = ExtrCreateStopMemory(&StopMemory);
if (ErrorCode != 0) {
    ExtrGetErrorMessage(ErrorCode, &ErrorMessage);
    printf("Error %d = %s\n", ErrorCode, ErrorMessage);
}
Description:
A call to ExtrGetErrorMessage sets ErrorMessage to point to a character string. The string contains a short description of the problem, such as "ERROR: Memory allocation error. Out of RAM."
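The usage examples in this document assign each return value to ErrorCode, overwriting the previous one, so a failure early in a sequence of calls can be hidden by a later successful call. One way to keep the first failure is sketched below in C. EXTR_TRY is a hypothetical helper macro, not part of the Extractor API.

```c
/* Evaluates `call` only if no earlier call has failed, so `error` keeps the
   first non-zero error code instead of being overwritten by later calls. */
#define EXTR_TRY(error, call)     \
    do {                          \
        if ((error) == 0)         \
            (error) = (call);     \
    } while (0)

/* Typical use, assuming the Extractor declarations are available:
     int ErrorCode = 0;
     EXTR_TRY(ErrorCode, ExtrCreateStopMemory(&StopMemory));
     EXTR_TRY(ErrorCode, ExtrCreateDocumentMemory(&DocumentMemory));
     if (ErrorCode != 0) {
         ExtrGetErrorMessage(ErrorCode, &ErrorMessage);
         printf("Error %d = %s\n", ErrorCode, ErrorMessage);
     }
*/
```

Calls that must run regardless of earlier failures, such as cleanup, should not be wrapped in the macro.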
Function header declaration:
int ExtrClearDocumentMemory(void *DocumentMemory);
Input and output function arguments:
DocumentMemory: input
Example of usage:
void *DocumentMemory;
int ErrorCode;
ErrorCode = ExtrCreateDocumentMemory(&DocumentMemory);
ErrorCode = ExtrClearDocumentMemory(DocumentMemory);
Description:
A call to ExtrClearDocumentMemory will free the memory that was allocated for processing a given document.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.
Function header declaration:
int ExtrClearStopMemory(void *StopMemory);
Input and output function arguments:
StopMemory: input
Example of usage:
void *StopMemory;
int ErrorCode;
ErrorCode = ExtrCreateStopMemory(&StopMemory);
ErrorCode = ExtrClearStopMemory(StopMemory);
Description:
A call to ExtrClearStopMemory will free the memory that was allocated for stop words and stop phrases.
The function returns an error code in ErrorCode. If ErrorCode is zero, there are no problems. Otherwise, a call to ExtrGetErrorMessage will return an explanation for the given error code.