A class for tree normalization. The default implementation does no normalization. Other tree normalizers change node labels, or even the geometry of the whole tree (for example, by deleting functional tags or empty elements). A TreeNormalizer may also intern the Strings passed to it. An instance can be reused as a singleton. The class is designed to be extended.
The TreeNormalizer methods fall into two groups. The contract for this class is that normalizeTerminal or normalizeNonterminal is first called on each String that will be put into a Tree, as the Strings are read from files or otherwise created. Then normalizeWholeTree is called on the Tree. It normally walks the Tree, making whatever modifications it wishes. A TreeNormalizer need not make a deep copy of a Tree: it may work destructively, because only the normalized Tree is used afterwards.
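The two-phase contract described above can be illustrated with a minimal, self-contained sketch. It does not use the library's own Tree class or method signatures: Node, SketchNormalizer, FunctionalTagStripper and build are hypothetical stand-ins invented for this example, and treating "-NONE-" as the label of an empty element follows the Penn Treebank convention.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeNormalizerSketch {

    /** Hypothetical stand-in for a tree node; not the library's Tree class. */
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                children.add(k);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        @Override
        public String toString() {
            if (isLeaf()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) {
                sb.append(' ').append(c);
            }
            return sb.append(')').toString();
        }
    }

    /** Default behaviour: no normalization. Subclasses override what they need. */
    static class SketchNormalizer {
        String normalizeTerminal(String leaf) { return leaf; }
        String normalizeNonterminal(String category) { return category; }
        Node normalizeWholeTree(Node tree) { return tree; }
    }

    /** Strips functional tags, interns Strings, and deletes empty elements. */
    static class FunctionalTagStripper extends SketchNormalizer {
        @Override
        String normalizeTerminal(String leaf) {
            return leaf.intern();
        }

        @Override
        String normalizeNonterminal(String category) {
            // "NP-SBJ" becomes "NP"; labels starting with '-' (e.g. "-NONE-") are kept.
            int dash = category.indexOf('-');
            String base = dash > 0 ? category.substring(0, dash) : category;
            return base.intern();
        }

        @Override
        Node normalizeWholeTree(Node tree) {
            prune(tree); // works destructively: no deep copy is made
            return tree;
        }

        /** Returns true if the node is empty and should be removed by its parent. */
        private boolean prune(Node node) {
            if (node.isLeaf()) {
                return false;
            }
            if (node.label.equals("-NONE-")) {
                return true;
            }
            node.children.removeIf(this::prune);
            return node.children.isEmpty();
        }
    }

    /** Builds a small tree the way a reader might, following the contract. */
    static Node build(SketchNormalizer tn) {
        // Phase 1: every String is normalized before it is put into the tree.
        Node trace = new Node(tn.normalizeNonterminal("-NONE-"),
                new Node(tn.normalizeTerminal("*T*")));
        Node subject = new Node(tn.normalizeNonterminal("NP-SBJ"), trace);
        Node verb = new Node(tn.normalizeNonterminal("VBZ"),
                new Node(tn.normalizeTerminal("sleeps")));
        Node vp = new Node(tn.normalizeNonterminal("VP"), verb);
        Node s = new Node(tn.normalizeNonterminal("S"), subject, vp);
        // Phase 2: the finished tree is normalized as a whole.
        return tn.normalizeWholeTree(s);
    }

    public static void main(String[] args) {
        System.out.println(build(new SketchNormalizer()));
        System.out.println(build(new FunctionalTagStripper()));
    }
}
```

With the default normalizer the sketch prints (S (NP-SBJ (-NONE- *T*)) (VP (VBZ sleeps))); with the stripping normalizer it prints (S (VP (VBZ sleeps))), because the functional tag is removed in the first phase and the empty subject is deleted in the second.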
Implementation note: This is a very old legacy class used in conjunction with PennTreeReader. It now seems better to move the String normalization into the tokenizer, which would leave just a (possibly destructive) TreeTransformer.
@author Christopher Manning