Examples of IncrementalSemanticAnalysis

ISA is notable in that it builds semantics incrementally, using both the co-occurrence of a word and the semantics of the co-occurring word. Similar to Random Indexing (RI), ISA uses index vectors to reduce the number of dimensions needed to represent the full co-occurrence matrix. In contrast to other semantic space algorithms such as RI, HAL and BEAGLE, ISA uses the semantics of the co-occurring words to update the semantics of their neighbors. Formally, the semantics of a word wi are updated for the co-occurrence of another word wj as:

sem(wi) += i · (mc · sem(wj) + (1 - mc) · IV(wj))

where sem is the semantics for a word, and IV is the index vector for a word. i defines the impact rate, which is how much the co-occurrence affects the semantics. mc defines the degree to which the semantics of wj affect the co-occurring word's semantics. This weighting factor is based on the frequency of occurrence; the semantics of frequently occurring words cause less impact. mc is formally defined as 1 ÷ e^(freq(word) ÷ km), where km is a weighting factor for determining how quickly the semantics of a word diminish in their effect on co-occurring words.
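The update rule above can be sketched in a few lines of Java. This is a minimal illustration and not the S-Space implementation: the class name IsaUpdateSketch, the method names, the dense double[] and int[] vector types, and the numeric values in main are all assumptions chosen for the example.

```java
import java.util.Arrays;

// Minimal sketch of the ISA update rule; names and types are hypothetical.
class IsaUpdateSketch {

    // mc = 1 / e^(freq / km), the frequency-based weighting factor.
    static double mixingCoefficient(int freq, double km) {
        return 1.0 / Math.exp(freq / km);
    }

    // Applies sem(wi) += i * (mc * sem(wj) + (1 - mc) * IV(wj)) in place to semI.
    static void update(double[] semI, double[] semJ, int[] indexJ,
                       double impactRate, double mc) {
        for (int d = 0; d < semI.length; d++) {
            semI[d] += impactRate * (mc * semJ[d] + (1 - mc) * indexJ[d]);
        }
    }

    public static void main(String[] args) {
        double[] semI = {0.0, 0.0, 0.0, 0.0};   // semantics of wi
        double[] semJ = {1.0, 0.0, 2.0, 0.0};   // semantics of wj
        int[] indexJ = {0, 1, 0, -1};           // index vector of wj
        // Illustrative values only: wj seen 50 times, km = 100, i = 0.003.
        double mc = mixingCoefficient(50, 100.0);
        update(semI, semJ, indexJ, 0.003, mc);
        System.out.println(Arrays.toString(semI));
    }
}
```

For a word that has never been seen before (frequency 0), mc is 1, so only the semantics of wj contribute; as the frequency grows, mc shrinks toward 0 and the index vector dominates the update.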

This class defines the following configurable properties, which may be set using either the System properties or the {@link IncrementalSemanticAnalysis#IncrementalSemanticAnalysis(Properties)} constructor. The two most important properties for configuring ISA are {@value #IMPACT_RATE_PROPERTY} and {@value #HISTORY_DECAY_RATE_PROPERTY}. Both default to the values specified in Baroni et al.

Property: {@value #IMPACT_RATE_PROPERTY}
Default: {@value #DEFAULT_IMPACT_RATE}
This property specifies the impact rate of co-occurrence, that is, the degree to which the co-occurrence of one word affects the semantics of the other. This rate scales both the contribution of the co-occurring word's index vector and the contribution of its semantics.

Property: {@value #HISTORY_DECAY_RATE_PROPERTY}
Default: {@value #DEFAULT_HISTORY_DECAY_RATE}
This property specifies the decay rate at which the semantics of co-occurring words lessen their impact. A word's frequency of occurrence is combined with the history decay rate to indicate the degree to which the word's semantics will influence (i.e. be added to) the semantics of a co-occurring word. High values will cause the semantics of frequently occurring words to have minimal impact on other words' semantics.

Property: {@value #WINDOW_SIZE_PROPERTY}
Default: {@value #DEFAULT_WINDOW_SIZE}
This property sets the number of words before and after a word that are counted as co-occurring. With the default value, {@code 5} words are counted before and {@code 5} words are counted after. This class always uses a symmetric window.
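As an illustration of what a symmetric window means in practice, the following sketch collects the neighbors of a focus word. The class SymmetricWindowSketch and its neighbors method are hypothetical helpers written for this example; they are not part of the S-Space API, and the library's own windowing code may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing a symmetric co-occurrence window.
class SymmetricWindowSketch {

    // Returns the tokens up to `window` positions before and after `focus`,
    // clipped at the document boundaries and excluding the focus word itself.
    static List<String> neighbors(List<String> tokens, int focus, int window) {
        List<String> result = new ArrayList<>();
        int start = Math.max(0, focus - window);
        int end = Math.min(tokens.size() - 1, focus + window);
        for (int k = start; k <= end; k++) {
            if (k != focus) {
                result.add(tokens.get(k));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        // Window of 1 around "brown" (position 2).
        System.out.println(neighbors(tokens, 2, 1));
    }
}
```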

Property: {@value #VECTOR_LENGTH_PROPERTY}
Default: {@value #DEFAULT_VECTOR_LENGTH}
This property sets the number of dimensions to be used for the index and semantic vectors.

Property: {@value #USE_PERMUTATIONS_PROPERTY}
Default: {@code false}
This property specifies whether to enable permuting the index vectors of co-occurring words. Enabling this option will cause the word semantics to include word-ordering information. However, this option is best used with a larger corpus.

Property: {@value #PERMUTATION_FUNCTION_PROPERTY}
Default: {@link edu.ucla.sspace.index.DefaultPermutationFunction DefaultPermutationFunction}
This property specifies the fully qualified class name of a {@link PermutationFunction} instance that will be used to permute index vectors. If the {@value #USE_PERMUTATIONS_PROPERTY} is set to {@code false}, the value of this property has no effect.
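To show how permuting an index vector can encode word order, here is a small sketch that rotates the vector by the neighbor's position relative to the focus word. Rotation is used only as a simple stand-in; it is an assumption for illustration and not a description of what DefaultPermutationFunction actually does. The class name PermutationSketch and the rotate method are hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in for a permutation function: a plain rotation.
class PermutationSketch {

    // Rotates v right by `shift` positions; negative shifts rotate left.
    static int[] rotate(int[] v, int shift) {
        int n = v.length;
        int[] out = new int[n];
        for (int d = 0; d < n; d++) {
            out[Math.floorMod(d + shift, n)] = v[d];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] index = {1, 0, 0, -1};
        // The same index vector yields different permuted vectors depending on
        // whether the word appears one position after (+1) or before (-1).
        System.out.println(Arrays.toString(rotate(index, 1)));
        System.out.println(Arrays.toString(rotate(index, -1)));
    }
}
```

Because the word before and the word after the focus word contribute differently permuted vectors, the accumulated semantics can distinguish "A B" from "B A".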

Property: {@value #USE_SPARSE_SEMANTICS_PROPERTY}
Default: {@code false}
This property specifies whether to use a sparse encoding for each word's semantics. Using a sparse encoding can result in a large saving in memory, while requiring more time to process each document.
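The memory and time trade-off can be illustrated with a map-backed vector that stores only non-zero dimensions. This is a sketch under assumptions: SparseSemanticsSketch is a hypothetical class written for this example and does not reflect the sparse vector classes that S-Space itself uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse vector: only non-zero dimensions are stored.
class SparseSemanticsSketch {
    private final Map<Integer, Double> values = new HashMap<>();

    // Adds delta to one dimension, dropping the entry if it becomes zero.
    void add(int dim, double delta) {
        double updated = values.getOrDefault(dim, 0.0) + delta;
        if (updated == 0.0) {
            values.remove(dim);
        } else {
            values.put(dim, updated);
        }
    }

    // Dimensions that were never set read as zero.
    double get(int dim) {
        return values.getOrDefault(dim, 0.0);
    }

    // Number of dimensions that actually occupy memory.
    int storedEntries() {
        return values.size();
    }

    public static void main(String[] args) {
        SparseSemanticsSketch sem = new SparseSemanticsSketch();
        sem.add(7, 0.5);
        sem.add(7, 0.25);
        System.out.println(sem.get(7) + " stored in " + sem.storedEntries() + " entry");
    }
}
```

Each update pays for a hash lookup instead of a direct array access, which is one way a sparse encoding can cost more time per document while saving memory.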

Due to the incremental nature of ISA, instances of this class are not designed to be multi-threaded. Documents must be processed sequentially to properly model how the semantics of co-occurring words affect each other. Multi-threading would make the ordering of co-occurrences ambiguous.

@author David Jurgens


Examples of edu.ucla.sspace.isa.IncrementalSemanticAnalysis

            t.printStackTrace();
        }
    }
   
    protected SemanticSpace getSpace() {
        isa = new IncrementalSemanticAnalysis();

        // note that getSpace() is called after the arg options have been
        // parsed, so this call is safe.
        if (argOptions.hasOption("loadVectors")) {
            String fileName = argOptions.getStringOption("loadVectors");