NLUToken

An individual token — i.e. a word, punctuation symbol, whitespace, etc.

Implements

Fields

has_vector (Boolean)

A boolean value indicating whether a word vector is associated with the token.

text (String)

Verbatim text content.

text_with_ws (String)

Text content, with trailing space character if present.
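`text_with_ws` keeps each token's trailing space. Concatenating it over all tokens in order should therefore reproduce the input string. Joining the bare `text` values with spaces does not, because it loses the original spacing. The minimal Python sketch below assumes tokens arrive as dicts keyed by the field names, for example after parsing a JSON response. That representation is a hypothetical client-side choice, and the token values are hand-written.

```python
# Hypothetical client-side representation: one dict per NLUToken,
# keyed by the schema's field names. Values are hand-written for illustration.
tokens = [
    {"text": "Hello", "text_with_ws": "Hello"},
    {"text": ",", "text_with_ws": ", "},
    {"text": "world", "text_with_ws": "world"},
    {"text": "!", "text_with_ws": "!"},
]

def reconstruct(tokens):
    """Rebuild the original string by joining each token's text_with_ws."""
    return "".join(t["text_with_ws"] for t in tokens)

def naive_join(tokens):
    """Joining bare `text` with single spaces loses the original spacing."""
    return " ".join(t["text"] for t in tokens)

print(reconstruct(tokens))  # Hello, world!
print(naive_join(tokens))   # Hello , world !
```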

vector ([Float])

The token's word vector: a real-valued meaning representation.

vector_norm (Float)

The L2 norm of the token's vector representation.
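The L2 (Euclidean) norm is the square root of the sum of the squared vector components. The sketch below recomputes it from `vector` and checks it against `vector_norm`. The 3-dimensional vector is a made-up example, since real word vectors are much longer, and the dict representation of a token is a hypothetical assumption.

```python
import math

def l2_norm(vector):
    """Euclidean (L2) norm: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vector))

# Hypothetical token with a tiny, hand-written vector.
token = {"vector": [3.0, 0.0, 4.0], "vector_norm": 5.0}

# The stored norm should match the recomputed one (up to float rounding).
assert math.isclose(l2_norm(token["vector"]), token["vector_norm"])
print(l2_norm(token["vector"]))  # 5.0
```

Dividing a vector by its norm yields a unit vector. This is the usual preparation step before comparing vectors by cosine similarity.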

dependency (String)

Syntactic dependency relation between the token and its syntactic head.

entity_type (String)

Named entity type.

is_alpha (Boolean)

Does the token consist of alphabetic characters?

is_ascii (Boolean)

Does the token consist of ASCII characters?

is_bracket (Boolean)

Is the token a bracket?

is_currency (Boolean)

Is the token a currency symbol?

is_digit (Boolean)

Does the token consist of digits?

is_left_punct (Boolean)

Is the token a left punctuation mark, e.g. "("?

is_lower (Boolean)

Is the token in lowercase?

is_oov (Boolean)

Is the token out-of-vocabulary (i.e. does it not have a word vector)?

is_punct (Boolean)

Is the token punctuation?

is_quote (Boolean)

Is the token a quotation mark?

is_right_punct (Boolean)

Is the token a right punctuation mark, e.g. ")"?

is_sent_start (Boolean)

A boolean value indicating whether the token starts a sentence.
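The `is_sent_start` flag is enough to split a token sequence into sentences. The sketch below opens a new group at every token where the flag is set. It assumes tokens are dicts keyed by field name, which is a hypothetical representation. It also treats a missing (`None`) value as "not a sentence start", which is an assumption about how a nullable Boolean should be read.

```python
def split_sentences(tokens):
    """Group tokens into sentences, starting a new group at each sentence start."""
    sentences = []
    for token in tokens:
        # Start a new sentence at a flagged token, or for the very first token.
        # A None flag is falsy, so it is treated as "not a sentence start".
        if token.get("is_sent_start") or not sentences:
            sentences.append([])
        sentences[-1].append(token)
    return sentences

# Hand-written example tokens.
tokens = [
    {"text": "Hi", "is_sent_start": True},
    {"text": ".", "is_sent_start": False},
    {"text": "Bye", "is_sent_start": True},
    {"text": ".", "is_sent_start": None},
]
print([[t["text"] for t in s] for s in split_sentences(tokens)])
# [['Hi', '.'], ['Bye', '.']]
```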

is_space (Boolean)

Does the token consist of whitespace characters?

is_stop (Boolean)

Is the token part of a “stop list”?
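A common use of `is_stop` is to keep only content words. Punctuation and whitespace tokens are usually dropped at the same time via `is_punct` and `is_space`. In the sketch below, the dict representation and the flag values are hand-written assumptions. Which words count as stop words depends on the language and the stop list in use.

```python
def content_words(tokens):
    """Keep the text of tokens that are not stop words, punctuation, or whitespace."""
    return [
        t["text"]
        for t in tokens
        if not (t["is_stop"] or t["is_punct"] or t["is_space"])
    ]

# Hand-written flags for "The cat sat ."
tokens = [
    {"text": "The", "is_stop": True,  "is_punct": False, "is_space": False},
    {"text": "cat", "is_stop": False, "is_punct": False, "is_space": False},
    {"text": "sat", "is_stop": False, "is_punct": False, "is_space": False},
    {"text": ".",   "is_stop": False, "is_punct": True,  "is_space": False},
]
print(content_words(tokens))  # ['cat', 'sat']
```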

is_title (Boolean)

Is the token in titlecase?

is_upper (Boolean)

Is the token in uppercase?
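Several of the character-class flags above read like Python's built-in string predicates. The sketch below approximates them from the token text. This mapping is an assumption, not something this reference confirms. It matches how spaCy computes its lexical attributes, but a given service may differ, so treat the results as approximate.

```python
def shape_flags(text):
    """Approximate the character-class flags from the token text (assumed mapping)."""
    return {
        "is_alpha": text.isalpha(),
        "is_digit": text.isdigit(),
        "is_ascii": all(ord(c) < 128 for c in text),
        "is_space": text.isspace(),
        "is_lower": text.islower(),
        "is_upper": text.isupper(),
        "is_title": text.istitle(),
    }

print(shape_flags("Apple"))
print(shape_flags("NASA")["is_title"])  # False: "NASA" is uppercase, not titlecase
```

Python's `str.islower()` and `str.isupper()` return `False` for strings with no cased characters, so a token such as "42" is neither lowercase nor uppercase under this mapping.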

lemma (String)

Base form of the token, with no inflectional suffixes.

like_email (Boolean)

Does the token resemble an email address?

like_num (Boolean)

Does the token represent a number (e.g. "10.9", "10", "ten")?
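The sketch below shows the kind of heuristic such a flag can rely on. It accepts digit strings with an optional sign and `,`/`.` separators, simple fractions, and spelled-out number words. It is a simplified, hypothetical heuristic, not the service's actual rule. `NUMBER_WORDS` is an illustrative, deliberately incomplete English list, and the permissive separator handling also accepts strings such as "1.2.3".

```python
# Hypothetical, incomplete list of English number words (illustration only).
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "hundred", "thousand", "million",
}

def like_num(text):
    """Rough check: signed digits with separators, simple fractions, or number words."""
    t = text.lstrip("+-")
    # Drop thousands and decimal separators before the digit check.
    if t.replace(",", "").replace(".", "").isdigit():
        return True
    # Simple fractions such as "1/2".
    if t.count("/") == 1:
        num, denom = t.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    return t.lower() in NUMBER_WORDS

print([like_num(s) for s in ["10.9", "10", "ten", "1/2", "apple"]])
# [True, True, True, True, False]
```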

like_url (Boolean)

Does the token resemble a URL?

log_probability (Float)

Smoothed log probability estimate of the token's word type (its context-independent entry in the vocabulary).

normalized (String)

The token's norm, i.e. a normalized form of the token text.

part_of_speech (String)

Coarse-grained part-of-speech.

subtree ([NLUToken])

A sequence containing the token and all the token’s syntactic descendants.
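To see what `subtree` contains, the sketch below recomputes it from a dependency parse. It collects a token together with every token reachable through child arcs and returns them in document order. The parse encoding is hypothetical and is not an `NLUToken` field: `heads[i]` is the index of token *i*'s head, and a root points to itself. The example parse is hand-written.

```python
def subtree(heads, root):
    """Return indices of `root` and all its syntactic descendants, in document order."""
    # Build a child list for every token from the head indices.
    children = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h != i:  # a root points to itself and has no parent arc
            children[h].append(i)
    # Depth-first traversal from the root token.
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children[node])
    return sorted(found)

# Hand-written parse of "The quick fox jumped":
# The -> fox, quick -> fox, fox -> jumped, jumped is the root.
words = ["The", "quick", "fox", "jumped"]
heads = [2, 2, 3, 3]
print([words[i] for i in subtree(heads, 2)])  # ['The', 'quick', 'fox']
print([words[i] for i in subtree(heads, 3)])  # ['The', 'quick', 'fox', 'jumped']
```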

tag (String)

Fine-grained part-of-speech.