Package edu.ucla.sspace.lra

Examples of edu.ucla.sspace.lra.LatentRelationalAnalysis

LRA uses three main components (a search engine, a thesaurus of synonyms, and the Singular Value Decomposition) to analyze a large corpus of text and measure the relational similarity between pairs of words (i.e. analogies). LRA uses the search engine to find patterns based on the input set of word pairs as well as their corresponding alternates (see loadAnalogiesFromFile(String)). A sparse matrix is then generated, where each value is the number of times the row's word pair occurs with the column's pattern between its two words.
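The pair-pattern counting described above can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions, not the library's implementation: the class name PairPatternCounts and its count method are hypothetical, sentences are tokenized on whitespace, only the first occurrence of each word in a sentence is considered, and a pattern is simply the literal words found between the pair.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of edu.ucla.sspace.lra.
class PairPatternCounts {

    // For each word pair "a:b", counts how often each intervening word
    // sequence (the pattern) appears between a and b in the sentences.
    // The outer map plays the role of matrix rows, the inner map the
    // role of the sparse columns.
    static Map<String, Map<String, Integer>> count(
            List<String> sentences, List<String[]> pairs) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            Map<String, Integer> row = new LinkedHashMap<>();
            counts.put(pair[0] + ":" + pair[1], row);
            for (String sentence : sentences) {
                List<String> tokens =
                    Arrays.asList(sentence.toLowerCase().split("\\s+"));
                int i = tokens.indexOf(pair[0]);
                int j = tokens.indexOf(pair[1]);
                // Require both words, in order, with at least one word
                // between them.
                if (i < 0 || j <= i + 1) {
                    continue;
                }
                String pattern = String.join(" ", tokens.subList(i + 1, j));
                row.merge(pattern, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "the mason cuts stone",
            "a mason cuts stone daily",
            "the carpenter cuts wood");
        List<String[]> pairs = Arrays.asList(
            new String[] {"mason", "stone"},
            new String[] {"carpenter", "wood"});
        System.out.println(count(sentences, pairs));
    }
}
```

Nested maps keep the sketch sparse by construction: a cell that never occurs is never stored.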

After the matrix has been built, the Singular Value Decomposition (SVD) is used to reduce the dimensionality of the original pair-pattern matrix, denoted as A. The SVD factors any matrix A into three matrices, A = U Σ V^T, such that Σ is a diagonal matrix containing the singular values of A. The singular values in Σ are sorted in decreasing order, so the first ones account for the most variance in the values of A. The original matrix may be approximated by recomputing it with only the first k singular values and setting the rest to 0. The approximated matrix Â = U_k Σ_k V_k^T is the least-squares best-fit rank-k approximation of A. LRA reduces the dimensionality by keeping only the first k columns of U and the corresponding k singular values of Σ, which gives the projection matrix U_k Σ_k. The projection matrix is then used to calculate the relational similarity between pairs, using the row vectors that correspond to the word pairs.
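To make the last step concrete, the following sketch builds the projection U_k Σ_k from an already computed U and list of singular values, then compares two row vectors. It is an illustration only: the class ProjectionSketch and its methods are hypothetical, the numbers in main are made up, and cosine similarity is assumed as the row-vector comparison (the text above does not name the measure).

```java
// Hypothetical helper, not part of edu.ucla.sspace.lra.
class ProjectionSketch {

    // Keeps the first k columns of U and scales column c by the c-th
    // singular value, which is the product U_k * Sigma_k.
    static double[][] project(double[][] u, double[] sigma, int k) {
        double[][] p = new double[u.length][k];
        for (int r = 0; r < u.length; r++) {
            for (int c = 0; c < k; c++) {
                p[r][c] = u[r][c] * sigma[c];
            }
        }
        return p;
    }

    // Cosine of the angle between two row vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Made-up values: two word pairs (rows), two latent dimensions.
        double[][] u = { {1.0, 0.0}, {0.6, 0.8} };
        double[] sigma = {2.0, 1.0};
        double[][] projection = project(u, sigma, 2);
        System.out.println(cosine(projection[0], projection[1]));
    }
}
```

Because Σ is diagonal, the matrix product reduces to scaling each retained column of U, so no general matrix multiplication is needed.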

This class uses the Apache Lucene search engine to index and filter word pairs in any given corpus. It also uses WordNet, through the JAWS interface, to find alternate word pairs for the given input pairs.
@author Sky Lin


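            // Skip indexing only when the property is explicitly "true".
            // Comparing from the constant avoids a NullPointerException
            // when the property is not set.
            String skipIndexProp = props.getProperty(
                    LatentRelationalAnalysis.LRA_SKIP_INDEX);
            if ("true".equals(skipIndexProp)) {
                doIndex = false; //set as option later
            }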
            LatentRelationalAnalysis lra = new LatentRelationalAnalysis(
                    corpusDir, indexDir, doIndex,
                    SVD.getFastestAvailableFactorization());

            //Steps 1-2. Load analogy input
            lra.loadAnalogiesFromFile(analogyFile);

            Matrix projection;

            // if we load a projection matrix from file, we can skip all the
            // preprocessing
            String readProjectionFile = props.getProperty(
                    LatentRelationalAnalysis.LRA_READ_MATRIX_FILE);
            if (readProjectionFile != null) {
                File readFile = new File(readProjectionFile);
                if (readFile.exists()) {
                    projection =
                        MatrixIO.readMatrix(readFile,
                                            MatrixIO.Format.SVDLIBC_SPARSE_TEXT,
                                            Matrix.Type.SPARSE_IN_MEMORY);
                } else {
                    throw new IllegalArgumentException(
                        "specified projection file does not exist");
                }
            } else { //do normal LRA preprocessing...


                //Step 3. Get patterns
                //Step 4. Filter top NUM_PATTERNS
                lra.findPatterns();

                //Step 5. Map phrases to rows
                lra.mapRows();
                //Step 6. Map patterns to columns
                lra.mapColumns();

                //Step 7. Create sparse matrix
                Matrix sparse_matrix = lra.createSparseMatrix();

                //Step 8. Calculate entropy
                sparse_matrix = lra.applyEntropyTransformations(sparse_matrix);

                //Step 9. Compute SVD on the pre-processed matrix.
                // Default dimensionality; overridden by the property below.
                int dimensions = 300;
                String userSpecifiedDims = props.getProperty(
                        LatentRelationalAnalysis.LRA_DIMENSIONS_PROPERTY);
                if (userSpecifiedDims != null) {
                    try {
                        dimensions = Integer.parseInt(userSpecifiedDims);
                    } catch (NumberFormatException nfe) {
                        throw new IllegalArgumentException(
                            LatentRelationalAnalysis.LRA_DIMENSIONS_PROPERTY +
                            " is not an integer: " + userSpecifiedDims);
                    }
                }
                projection = lra.computeSVD(sparse_matrix, dimensions);

                //Step 10. Compute projection matrix from U and S.  This is
                //already returned by the matrix factorization.
            }

            String writeProjectionFile = props.getProperty(
                    LatentRelationalAnalysis.LRA_WRITE_MATRIX_FILE);
            if (writeProjectionFile != null) {
                MatrixIO.writeMatrix(projection,
                                     new File(writeProjectionFile),
                                     MatrixIO.Format.SVDLIBC_SPARSE_TEXT);
            }

            //Step 11. Get analogy input and Evaluate Alternatives
            lra.evaluateAnalogies(projection, testAnalogies, outputFile);
        } catch (Throwable t)  {
            t.printStackTrace();
        }
    }
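
Step 8 in the listing applies an entropy transformation to the count matrix, but the excerpt does not show what applyEntropyTransformations does internally. The sketch below assumes the log-entropy weighting commonly described for LRA: each cell becomes log(x + 1), scaled by a column weight of 1 minus the column's normalized entropy. The class LogEntropySketch is hypothetical and works on a small dense array rather than the library's Matrix type.

```java
// Hypothetical helper, not part of edu.ucla.sspace.lra. Assumes the
// log-entropy weighting; the library's transformation may differ.
class LogEntropySketch {

    static double[][] transform(double[][] x) {
        int m = x.length;       // rows: word pairs
        int n = x[0].length;    // columns: patterns
        double[][] out = new double[m][n];
        for (int j = 0; j < n; j++) {
            double colSum = 0;
            for (int i = 0; i < m; i++) {
                colSum += x[i][j];
            }
            // Shannon entropy of the column's distribution over rows.
            double entropy = 0;
            for (int i = 0; i < m; i++) {
                if (x[i][j] > 0) {
                    double p = x[i][j] / colSum;
                    entropy -= p * Math.log(p);
                }
            }
            // Weight is 1 for a pattern seen with a single pair and 0 for
            // a pattern spread evenly over all pairs.
            double weight = (m > 1) ? 1.0 - entropy / Math.log(m) : 1.0;
            for (int i = 0; i < m; i++) {
                out[i][j] = weight * Math.log(x[i][j] + 1.0);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] counts = { {2, 3}, {2, 0} };
        double[][] weighted = transform(counts);
        System.out.println(java.util.Arrays.deepToString(weighted));
    }
}
```

Under this weighting, patterns that occur equally often with every word pair contribute nothing, while patterns specific to a few pairs keep most of their log-scaled count.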