Examples of IncrementalSemanticAnalysis

edu.ucla.sspace.isa.IncrementalSemanticAnalysis
Reference paper (Baroni et al.): cimec.unitn.it/marco/publications/acl2007/coglearningacl07.pdf

ISA is notable in that it builds semantics incrementally, using both the co-occurrence of a word and the semantics of the co-occurring word. Similar to Random Indexing (RI), ISA uses index vectors to reduce the number of dimensions needed to represent the full co-occurrence matrix. In contrast to other semantic space algorithms such as RI, HAL and BEAGLE, ISA uses the semantics of the co-occurring words to update the semantics of their neighbors. Formally, the semantics of a word w_i are updated for the co-occurrence of another word w_j as:

sem(w_i) += i · (m_c · sem(w_j) + (1 - m_c) · IV(w_j))

where sem is the semantics for a word, and IV is the index vector for a word. i defines the impact rate, which is how much the co-occurrence affects the semantics. m_c defines the degree to which the semantics of the co-occurring word, rather than its index vector, affect the semantics of w_i. This weighting factor is based on the frequency of occurrence; the semantics of frequently occurring words have less impact. m_c is formally defined as 1 ÷ e^{freq(word) ÷ k_m}, where k_m is a weighting factor for determining how quickly the semantics of a word diminish in their effect on co-occurring words.
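The update rule above can be sketched in a few lines of plain Java. This is only an illustration of the formula, not the library's implementation: the class name IsaUpdateSketch, the method names, the dense array representation, and the numeric values in main are all assumptions chosen for readability.

```java
import java.util.Arrays;

public class IsaUpdateSketch {

    /** m_c = 1 / e^(freq / k_m): weight given to the co-occurring word's semantics. */
    static double semanticWeight(long freq, double historyDecayRate) {
        return 1d / Math.exp(freq / historyDecayRate);
    }

    /** Applies sem(w_i) += i * (m_c * sem(w_j) + (1 - m_c) * IV(w_j)) in place. */
    static void update(double[] semI, double[] semJ, int[] indexJ,
                       double impactRate, double mc) {
        for (int d = 0; d < semI.length; d++) {
            semI[d] += impactRate * (mc * semJ[d] + (1 - mc) * indexJ[d]);
        }
    }

    public static void main(String[] args) {
        // Hypothetical 4-dimensional vectors; real vectors are much longer.
        double[] semI = {0, 0, 0, 0};
        double[] semJ = {1, 0, 0, 0};
        int[] indexJ = {0, 1, 0, -1};

        // Illustrative parameter values, not the library defaults.
        double impactRate = 0.5;
        double historyDecayRate = 100;

        // A word seen 100 times with k_m = 100 gets m_c = 1/e.
        double mc = semanticWeight(100, historyDecayRate);
        update(semI, semJ, indexJ, impactRate, mc);
        System.out.println("m_c = " + mc);
        System.out.println(Arrays.toString(semI));
    }
}
```

As freq(word) grows, m_c shrinks toward 0, so the update relies increasingly on the index vector IV(w_j) and less on sem(w_j).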

This class defines the following configurable properties, which may be set using either the System properties or the {@link IncrementalSemanticAnalysis#IncrementalSemanticAnalysis(Properties)} constructor. The two most important properties for configuring ISA are {@value #IMPACT_RATE_PROPERTY} and {@value #HISTORY_DECAY_RATE_PROPERTY}. The defaults for these properties are the values specified in Baroni et al.

Property: {@value #IMPACT_RATE_PROPERTY} Default: {@value #DEFAULT_IMPACT_RATE}: This property specifies the impact rate of co-occurrence, that is, the degree to which the co-occurrence of one word affects the semantics of the other. This rate affects both the impact of the index vector for a co-occurring word and the impact of its semantics.
Property: {@value #HISTORY_DECAY_RATE_PROPERTY} Default: {@value #DEFAULT_HISTORY_DECAY_RATE}: This property specifies the decay rate at which the semantics of co-occurring words lessen their impact. A word's frequency of occurrence is combined with the history decay rate to indicate the degree to which the word's semantics will influence (i.e. be added to) the semantics of a co-occurring word. Because the weight is 1 ÷ e^{freq(word) ÷ k_m}, low values will cause the semantics of frequently occurring words to have minimal impact on other words' semantics.
Property: {@value #WINDOW_SIZE_PROPERTY} Default: {@value #DEFAULT_WINDOW_SIZE}: This property sets the number of words before and after the focus word that are counted as co-occurring. With the default value, {@code 5} words are counted before and {@code 5} words are counted after. This class always uses a symmetric window.
Property: {@value #VECTOR_LENGTH_PROPERTY} Default: {@value #DEFAULT_VECTOR_LENGTH}: This property sets the number of dimensions to be used for the index and semantic vectors.
Property: {@value #USE_PERMUTATIONS_PROPERTY} Default: {@code false}: This property specifies whether to enable permuting the index vectors of co-occurring words. Enabling this option will cause the word semantics to include word-ordering information. However, this option is best used with a larger corpus.
Property: {@value #PERMUTATION_FUNCTION_PROPERTY} Default: {@link edu.ucla.sspace.index.DefaultPermutationFunction DefaultPermutationFunction}: This property specifies the fully qualified class name of a {@link PermutationFunction} instance that will be used to permute index vectors. If {@value #USE_PERMUTATIONS_PROPERTY} is set to {@code false}, the value of this property has no effect.
Property: {@value #USE_SPARSE_SEMANTICS_PROPERTY} Default: {@code false}: This property specifies whether to use a sparse encoding for each word's semantics. Using a sparse encoding can result in a large saving in memory, while requiring more time to process each document.
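To make the symmetric window concrete, here is a minimal sketch of how the co-occurring words for one focus position might be collected. It is not taken from the library: the class SymmetricWindowSketch and the method neighbors are hypothetical names, and the sketch simply truncates the window at document boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SymmetricWindowSketch {

    /** Returns the tokens within windowSize positions before and after the focus index. */
    static List<String> neighbors(List<String> tokens, int focus, int windowSize) {
        List<String> result = new ArrayList<>();
        // Clamp the window to the bounds of the token list.
        int start = Math.max(0, focus - windowSize);
        int end = Math.min(tokens.size() - 1, focus + windowSize);
        for (int i = start; i <= end; i++) {
            if (i != focus) {
                result.add(tokens.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        // Window of 1 around "brown" (index 2).
        System.out.println(neighbors(tokens, 2, 1)); // prints [quick, fox]
    }
}
```

With a window size of 5, each focus word would be paired with up to 10 neighbors, and each pair would trigger one semantics update.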

Due to the incremental nature of ISA, instances of this class are not designed to be multi-threaded. Documents must be processed sequentially to properly model how the semantics of co-occurring words affect each other. Multi-threading would make the ordering of co-occurrences ambiguous.

@author David Jurgens

            t.printStackTrace();
        }
    }
    
    protected SemanticSpace getSpace() {
        isa = new IncrementalSemanticAnalysis();


        // note that getSpace() is called after the arg options have been
        // parsed, so this call is safe.
        if (argOptions.hasOption("loadVectors")) {
            String fileName = argOptions.getStringOption("loadVectors");
