extractKeywords

PREMIUM TRIAL / ADVANCED

Extract the top keywords from a DOCX document.

Description
public extractKeywords ( $source [, array $options = array()] )

Extracts the top keywords from the contents of a DOCX document.

Supported AI integrations:

  • GPT OpenAI
  • phpdocx AI
Parameters GPT OpenAI

source

DOCX document.

options

An array with the available options.

The possible keys and values are:

key Type Description
frequency_penalty float Defaults to 0.8.
max_tokens int Defaults to 1000. Each OpenAI model has its own maximum token limit.
model string Defaults to 'text-davinci-003'.
presence_penalty float Defaults to 0.0.
prompt string Defaults to 'Extract keywords from this text:'.
referenceNode array Defaults to all paragraphs. DOCXPath options for custom queries:
  • 'type' (string) paragraph (default)
  • 'contains' (string)
  • 'occurrence' (int)
  • 'attributes' (array)
  • 'parent' (string) '/' (any parent, default), w:body or any other specific parent (/w:tbl/, /w:tc/, /w:r/...)
  • 'customQuery' (string) if set, overrides all the previous options. It must be a valid XPath query
returnFullResponse bool If true, returns the whole GPT response. Defaults to false.
target array Document parts to extract keywords from:
  • document
  • headers
  • footers
  • footnotes
  • endnotes
  • comments
temperature float Defaults to 0 (set it to 0.5 to generate related keywords).
top_p float Defaults to 1.0.
url string Defaults to 'https://api.openai.com/v1/completions'.
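
As a rough illustration of how the keys above fit together, the following sketch builds an options array for the GPT OpenAI integration. Only the keys and default values come from the table; the 'contains' value, the $ai variable and the $source variable in the commented call are hypothetical placeholders, and this section does not describe how the API credentials are supplied.

```php
<?php
// Options for the GPT OpenAI integration. Keys and defaults are taken from
// the table above; the overridden values are illustrative choices only.
$options = [
    'model' => 'text-davinci-003',       // documented default
    'max_tokens' => 1000,                // documented default
    'temperature' => 0.5,                // 0.5 to also generate related keywords
    'returnFullResponse' => false,       // documented default
    // DOCXPath query: only paragraphs that contain a given text.
    'referenceNode' => [
        'type' => 'paragraph',
        'contains' => 'annual report',   // hypothetical search text
    ],
];

// Hypothetical call: $ai stands for an instance of the class that exposes
// extractKeywords, and $source for the DOCX document to analyze.
// $keywords = $ai->extractKeywords($source, $options);
```

Keys left out of the array (frequency_penalty, presence_penalty, prompt, top_p, url) would keep the defaults listed in the table.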
Parameters phpdocx AI

source

DOCX document.

options

An array with the available options.

The possible keys and values are:

key Type Description
maxKeywords int Maximum number of keywords to return. Defaults to unlimited.
minLength int Minimum length of keywords. Defaults to null.
referenceNode array Defaults to all paragraphs. DOCXPath options for custom queries:
  • 'type' (string) paragraph (default)
  • 'contains' (string)
  • 'occurrence' (int)
  • 'attributes' (array)
  • 'parent' (string) '/' (any parent, default), w:body or any other specific parent (/w:tbl/, /w:tc/, /w:r/...)
  • 'customQuery' (string) if set, overrides all the previous options. It must be a valid XPath query
regExprCleanWords string Regular expression used to strip extra symbols from the contents. Defaults to '/[^\p{L}\p{N}\s]/u'.
stopWords array Words to ignore. Defaults to empty. Stop word lists for many languages are available at https://github.com/stopwords-iso.
target array Document parts to extract keywords from:
  • document
  • headers
  • footers
  • footnotes
  • endnotes
  • comments
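
To make the effect of maxKeywords, minLength, regExprCleanWords and stopWords more concrete, here is a minimal sketch of a frequency-based keyword extractor over plain text. It is not phpdocx's implementation: sketchExtractKeywords is a hypothetical helper, and both the lowercasing and the ranking by word frequency are assumptions made for illustration.

```php
<?php
// Illustrative sketch only, NOT the phpdocx AI implementation.
// sketchExtractKeywords() is a hypothetical helper that ranks words by frequency.
function sketchExtractKeywords(string $text, array $options = []): array
{
    // Fill in the defaults listed in the table above for any missing key.
    $options += [
        'maxKeywords' => null,                       // null = unlimited
        'minLength' => null,                         // null = no minimum length
        'regExprCleanWords' => '/[^\p{L}\p{N}\s]/u', // documented default
        'stopWords' => [],
    ];

    // Use multibyte string functions when the mbstring extension is loaded.
    $lower = function_exists('mb_strtolower') ? 'mb_strtolower' : 'strtolower';
    $length = function_exists('mb_strlen') ? 'mb_strlen' : 'strlen';

    // Remove everything that is not a letter, a number or whitespace.
    $clean = preg_replace($options['regExprCleanWords'], '', $text);
    // Lowercase (an assumption of this sketch) and split on whitespace.
    $words = preg_split('/\s+/u', $lower($clean), -1, PREG_SPLIT_NO_EMPTY);

    $stopWords = array_map($lower, $options['stopWords']);
    $counts = [];
    foreach ($words as $word) {
        if (in_array($word, $stopWords, true)) {
            continue; // ignored word
        }
        if ($options['minLength'] !== null && $length($word) < $options['minLength']) {
            continue; // shorter than the minimum length
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    arsort($counts); // most frequent words first
    // Numeric words become integer array keys, so cast them back to strings.
    $keywords = array_map('strval', array_keys($counts));

    if ($options['maxKeywords'] !== null) {
        $keywords = array_slice($keywords, 0, $options['maxKeywords']);
    }
    return $keywords;
}

$keywords = sketchExtractKeywords(
    'The quick brown fox. The fox jumps; the fox sleeps!',
    ['stopWords' => ['the'], 'minLength' => 3, 'maxKeywords' => 1]
);
```

With these options the punctuation is stripped first, 'the' is dropped as a stop word, and only the single most frequent remaining word is returned.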
Return values

A string or an array with the keywords.

Exceptions

Not valid DOCX source.

Error connecting to GPT.

GPT error.

Release notes
  • phpdocx 14.0:
    • new method.