Token
An individual token — i.e. a word, punctuation symbol, whitespace, etc.
Implements
Fields
Boolean
)
A boolean value indicating whether a word vector is associated with the object.
String
)
Verbatim text content.
String
)
Text content, with trailing space character if present.
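As a minimal sketch of how the verbatim text and the text with trailing space relate (plain Python, not this API; the `SimpleToken` class and the `tokenize` helper are hypothetical names introduced only for illustration), concatenating each token's text-with-whitespace value should reproduce the original string:

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleToken:
    text: str        # verbatim text content
    whitespace: str  # trailing space character, or "" if absent

    @property
    def text_with_ws(self) -> str:
        # Text content followed by the trailing space, if there is one.
        return self.text + self.whitespace


def tokenize(doc: str) -> List[SimpleToken]:
    # Toy tokenizer (an assumption, far simpler than a real one):
    # a word or a single punctuation mark, plus at most one trailing space.
    return [SimpleToken(m.group(1), m.group(2))
            for m in re.finditer(r"(\w+|[^\w\s])( ?)", doc)]


doc = "Hello, world! Bye."
tokens = tokenize(doc)
rebuilt = "".join(t.text_with_ws for t in tokens)
```

Keeping the trailing space on the token, rather than discarding it, is what makes this lossless round trip possible.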
[Float]
)
A real-valued meaning representation.
Float
)
The L2 norm of the token’s vector representation.
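The L2 norm is the square root of the sum of the squared vector components. A small sketch in plain Python follows; the `l2_norm` helper and the sample vector are illustrative assumptions, not part of this schema:

```python
import math


def l2_norm(vector):
    # Square root of the sum of squared components.
    return math.sqrt(sum(x * x for x in vector))


vector = [3.0, 4.0]
norm = l2_norm(vector)
```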
[Token]
)
A sequence of the token’s syntactic ancestors (parents, grandparents, etc.).
[Token]
)
A sequence of the token’s immediate syntactic children.
Int
)
Brown cluster ID.
[Token]
)
A tuple of coordinated tokens, not including the token itself.
String
)
Syntactic dependency relation.
Int
)
The ending character offset of the token within the parent document.
String
)
IOB code of the named entity tag. 3 means the token begins an entity, 2 means it is outside an entity, 1 means it is inside an entity, and 0 means no entity tag is set.
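The numeric codes listed above can be decoded as in the following sketch. The letter labels and the `entity_spans` helper are assumptions made for illustration; they are not fields or functions of this schema.

```python
# Codes as described above: 3 = begins, 2 = outside, 1 = inside, 0 = no tag set.
IOB_LABELS = {3: "B", 2: "O", 1: "I", 0: ""}


def entity_spans(codes):
    """Group token positions into (start, end) entity spans, end exclusive."""
    spans = []
    start = None
    for i, code in enumerate(codes):
        if code == 3:  # a new entity begins here
            if start is not None:
                spans.append((start, i))
            start = i
        elif code == 1:  # continuation of the current entity
            if start is None:
                start = i  # tolerate a missing "begin" code
        else:  # 2 (outside) or 0 (no tag) closes any open entity
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(codes)))
    return spans


codes = [2, 3, 1, 2, 3, 3, 0]
labels = [IOB_LABELS[c] for c in codes]
spans = entity_spans(codes)
```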
String
)
Named entity type.
TokenExtension
)
Token
)
The syntactic parent, or "governor", of this token.
Int
)
The index of the token within the parent document.
Boolean
)
Does the token consist of alphabetic characters?
Boolean
)
Does the token consist of ASCII characters?
Boolean
)
Is the token a bracket?
Boolean
)
Is the token a currency symbol?
Boolean
)
Does the token consist of digits?
Boolean
)
Is the token a left punctuation mark, e.g. "("?
Boolean
)
Is the token in lowercase?
Boolean
)
Is the token out-of-vocabulary (i.e. does it not have a word vector)?
Boolean
)
Is the token punctuation?
Boolean
)
Is the token a quotation mark?
Boolean
)
Is the token a right punctuation mark, e.g. ")"?
Boolean
)
A boolean value indicating whether the token starts a sentence.
Boolean
)
Does the token consist of whitespace characters?
Boolean
)
Is the token part of a “stop list”?
Boolean
)
Is the token in titlecase?
Boolean
)
Is the token in uppercase?
String
)
Language of the parent document’s vocabulary.
Token
)
The leftmost token of this token’s syntactic descendants.
[Token]
)
The leftward immediate children of the word in the syntactic dependency parse.
String
)
Base form of the token, with no inflectional suffixes.
Boolean
)
Does the token resemble an email address?
Boolean
)
Does the token represent a number, e.g. "10.9", "10", or "ten"?
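A check of this kind might look like the sketch below. It is a guess at the general idea, not the actual rule behind this field: the `like_num` function, the handling of signs, commas, and fractions, and the deliberately tiny `NUMBER_WORDS` set are all assumptions.

```python
# Hypothetical, intentionally incomplete list of spelled-out numbers.
NUMBER_WORDS = {"zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}


def like_num(text):
    stripped = text.lstrip("+-")
    # Plain digits, optionally with thousands separators and one decimal point.
    if stripped.replace(",", "").replace(".", "", 1).isdigit():
        return True
    # Simple fractions such as "1/2".
    if stripped.count("/") == 1:
        numerator, denominator = stripped.split("/")
        if numerator.isdigit() and denominator.isdigit():
            return True
    # Spelled-out numbers, compared case-insensitively.
    return stripped.lower() in NUMBER_WORDS


results = {word: like_num(word) for word in ["10.9", "10", "ten", "1/2", "tenth"]}
```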
Boolean
)
Does the token resemble a URL?
String
)
Lowercase form of the token.
String
)
The token's norm, i.e. a normalized form of the token text.
String
)
Verbatim text content (identical to Token.text). Exists mostly for consistency with the other attributes.
String
)
Coarse-grained part-of-speech.
String
)
Hash value of a length-N substring from the start of the token.
Float
)
Smoothed log probability estimate of the token's word type (context-independent entry in the vocabulary).
Token
)
The rightmost token of this token’s syntactic descendants.
[Token]
)
The rightward immediate children of the word in the syntactic dependency parse.
String
)
Transform of the token’s string to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd".
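The transform described above can be sketched in a few lines of Python. The `word_shape` function is a hypothetical helper written from the description, under the assumption that "truncated after length 4" means at most four consecutive copies of the same shape character are kept; it is not taken from the implementation behind this field.

```python
def word_shape(text):
    shape = []
    last = ""
    run = 0
    for ch in text:
        if ch.isalpha():
            mapped = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            mapped = "d"
        else:
            mapped = ch  # punctuation and other characters are kept as-is
        if mapped == last:
            run += 1
        else:
            last = mapped
            run = 1
        if run <= 4:  # keep at most four repeats of the same shape character
            shape.append(mapped)
    return "".join(shape)


shapes = [word_shape(w) for w in ["Mary", "42", "internationalization", "U.K."]]
```

Because long runs collapse, words of very different lengths can share a shape, which is what makes the shape useful as a coarse orthographic feature.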
Int
)
The starting character offset of the token within the parent document.
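To illustrate how starting and ending character offsets relate to token text, here is a small sketch. It assumes the ending offset is exclusive, so that slicing the document text with the two offsets yields the token text; the `char_offsets` helper and the `(text, trailing_whitespace)` pair layout are hypothetical.

```python
def char_offsets(pieces):
    """pieces: (text, trailing_whitespace) pairs in document order."""
    offsets = []
    position = 0
    for text, whitespace in pieces:
        # Start is the running position; end (exclusive) excludes trailing space.
        offsets.append((position, position + len(text)))
        position += len(text) + len(whitespace)
    return offsets


pieces = [("Hello", ""), (",", " "), ("world", "")]
doc = "".join(text + whitespace for text, whitespace in pieces)
offsets = char_offsets(pieces)
sliced = [doc[start:end] for start, end in offsets]
```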
[Token]
)
A sequence containing the token and all the token’s syntactic descendants.
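The tree-navigation fields (children, ancestors, subtree, leftmost and rightmost descendants) can be pictured with a toy dependency parse stored as a list of head indices. This is only a sketch: the helper functions, the sample parse, and the convention that the root token is its own head are assumptions made for illustration.

```python
words = ["The", "quick", "fox", "jumped", "over", "the", "fence"]
# heads[i] is the index of token i's syntactic parent; the root points to itself.
heads = [2, 2, 3, 3, 3, 6, 4]


def children(heads, i):
    # Immediate syntactic children of token i, in document order.
    return [j for j, h in enumerate(heads) if h == i and j != i]


def ancestors(heads, i):
    # Parent, grandparent, and so on, up to the root.
    chain = []
    while heads[i] != i:
        i = heads[i]
        chain.append(i)
    return chain


def subtree(heads, i):
    # Token i together with all of its descendants, in document order.
    nodes = [i]
    for child in children(heads, i):
        nodes.extend(subtree(heads, child))
    return sorted(nodes)


def left_edge(heads, i):
    return subtree(heads, i)[0]


def right_edge(heads, i):
    return subtree(heads, i)[-1]


fox_subtree = [words[j] for j in subtree(heads, 2)]
over_subtree = [words[j] for j in subtree(heads, 4)]
```

In this sketch the leftward and rightward children of a token are simply the entries of `children(heads, i)` that fall before or after index `i`.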
String
)
Hash value of a length-N substring from the end of the token.
String
)
Fine-grained part-of-speech.
String
)
Trailing space character, if present.