Examples of Automaton

dk.brics.automaton.Automaton
Finite-state automaton with regular expression operations.
Class invariants:
- An automaton is either represented explicitly (with {@link State} and {@link Transition} objects) or with a singleton string (see {@link #getSingleton()} and {@link #expandSingleton()}) in case the automaton is known to accept exactly one string. (Implicitly, all states and transitions of an automaton are reachable from its initial state.)
- Automata are always reduced (see {@link #reduce()}) and have no transitions to dead states (see {@link #removeDeadTransitions()}).
- If an automaton is nondeterministic, then {@link #isDeterministic()} returns false (but the converse is not required).
- Automata provided as input to operations are generally assumed to be disjoint.
If the states or transitions are manipulated manually, the {@link #restoreInvariant()} and {@link #setDeterministic(boolean)} methods should be used afterwards to restore representation invariants that are assumed by the built-in automata operations.
@author Anders Møller <amoeller@cs.au.dk>
net.sourceforge.chaperon.build.Automaton
This class contains an automaton of states.
@author Stephan Michels
@version CVS $Id: Automaton.java,v 1.8 2003/12/09 19:55:53 benedikta Exp $
org.apache.lucene.util.automaton.Automaton
Finite-state automaton with regular expression operations.
Class invariants:
- An automaton is either represented explicitly (with {@link State} and {@link Transition} objects) or with a singleton string (see {@link #getSingleton()} and {@link #expandSingleton()}) in case the automaton is known to accept exactly one string. (Implicitly, all states and transitions of an automaton are reachable from its initial state.)
- Automata are always reduced (see {@link #reduce()}) and have no transitions to dead states (see {@link #removeDeadTransitions()}).
- If an automaton is nondeterministic, then {@link #isDeterministic()} returns false (but the converse is not required).
- Automata provided as input to operations are generally assumed to be disjoint.
If the states or transitions are manipulated manually, the {@link #restoreInvariant()} and {@link #setDeterministic(boolean)} methods should be used afterwards to restore representation invariants that are assumed by the built-in automata operations.

Note: This class has internal mutable state and is not thread safe. It is the caller's responsibility to ensure any necessary synchronization if you wish to use the same Automaton from multiple threads. In general it is instead recommended to use a {@link RunAutomaton} for multithreaded matching: it is immutable, thread safe, and much faster.
@lucene.experimental
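
The note above recommends an immutable run automaton for multithreaded matching. As a rough illustration of why a table-driven matcher is safe to share between threads, here is a minimal sketch. `TableMatcher` and its methods are hypothetical names, not Lucene's `RunAutomaton` API, and the sketch assumes ASCII input.

```java
import java.util.Arrays;

/** Hypothetical immutable, table-driven matcher; not Lucene's RunAutomaton. */
final class TableMatcher {
    // transitions[state * 256 + unsignedByte] holds the next state, or -1 if none.
    private final int[] transitions;
    private final boolean[] accept;

    /** Builds a matcher that accepts exactly the bytes of {@code word} (ASCII assumed). */
    TableMatcher(String word) {
        int states = word.length() + 1;
        transitions = new int[states * 256];
        Arrays.fill(transitions, -1);
        accept = new boolean[states];
        for (int i = 0; i < word.length(); i++) {
            transitions[i * 256 + (word.charAt(i) & 0xff)] = i + 1;
        }
        accept[states - 1] = true;
    }

    /** Reads only final fields, so concurrent calls need no synchronization. */
    boolean run(byte[] input) {
        int state = 0;
        for (byte b : input) {
            // Mask to 0..255 because Java bytes are signed.
            state = transitions[state * 256 + (b & 0xff)];
            if (state < 0) {
                return false;
            }
        }
        return accept[state];
    }
}
```

All matching state lives in the local variable `state`, so one instance can be shared freely once constructed.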
rationals.Automaton
statechum.analysis.learning.experiments.PaperUAS.TracesForSeed.Automaton
wyautl.core.Automaton
  term Bool                // bool type
  term Int                 // int type
  term Not(Type)           // negation type
  term Or{Type...}         // union of zero or more types
  term And{Type...}        // intersection of zero or more types
  define Type as Void | Bool | Int | Not | Or | And
In this simple language, we can express types such as the following:
- Not(Or{Int,Bool}) --- the set of values excluding the integers and booleans
- And{Int,Bool} --- the set of values in both the integers and booleans (i.e. the empty set).
We can also see how the various components correspond to states in an automaton. Consider the following examples:
- Not(Void) --- this corresponds to an automaton with two states: 1) a term with no child representing Void; 2) a term representing Not which has a single child referring to state 1.
- Or{Int,Bool} --- corresponds to an automaton with four states: 1) a term with no child representing Int; 2) a term with no child representing Bool; 3) a set with two children referring to states 1 and 2; 4) a term representing Or with a single child referring to state 3.
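
The two examples above can be sketched as a flat list of states whose children are indices into that list. This is an illustrative miniature only: `TermAutomatonSketch`, `TermState`, the string-valued kinds and the `"Set"` kind name are all assumptions, not the wyautl API, and states are numbered from 0 here rather than from 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature of a term automaton; names are illustrative only. */
final class TermAutomatonSketch {
    /** A state is a kind ("Int", "Set", "Or", ...) plus indices of its child states. */
    static final class TermState {
        final String kind;
        final int[] children;

        TermState(String kind, int... children) {
            this.kind = kind;
            this.children = children;
        }
    }

    private final List<TermState> states = new ArrayList<>();

    /** Appends a state and returns its index. */
    int add(String kind, int... children) {
        states.add(new TermState(kind, children));
        return states.size() - 1;
    }

    int size() {
        return states.size();
    }

    /** Renders the state at {@code index} back into term syntax. */
    String render(int index) {
        TermState s = states.get(index);
        if (s.kind.equals("Set")) {
            StringBuilder sb = new StringBuilder("{");
            for (int i = 0; i < s.children.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(render(s.children[i]));
            }
            return sb.append('}').toString();
        }
        if (s.children.length == 0) {
            return s.kind; // leaf term such as Int or Void
        }
        // Single-child term: a set child prints as Or{...}, any other child as Not(...).
        String child = render(s.children[0]);
        return child.startsWith("{") ? s.kind + child : s.kind + "(" + child + ")";
    }

    /** Builds Or{Int,Bool} from the four states described above. */
    static TermAutomatonSketch orIntBool() {
        TermAutomatonSketch a = new TermAutomatonSketch();
        int intState = a.add("Int");
        int boolState = a.add("Bool");
        int set = a.add("Set", intState, boolState);
        a.add("Or", set);
        return a;
    }
}
```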
Notes
- Roots. States can be explicitly marked as roots to provide a way to track them through the various operations that might be performed on an automaton. In particular, as states are rewritten, the roots will be updated accordingly.
- Minimisation. An automaton which has the strong equivalence property is said to be minimised. Automata are generally kept in the minimised form, and only use of the set() method can break this. The strong equivalence property guarantees that there are no two distinct, but equivalent states. In order to restore this property, the minimise() function must be called explicitly.
- Compaction. An automaton which does not contain garbage states is said to be compacted. Automata are generally kept in compacted form, and only use of the set() method can break this. Garbage states are those not reachable from any marked root state. In order to restore this property, the compact() function must be called explicitly.
- Canonical Form. An automaton which is minimised is not guaranteed to be canonical. This means we can have automata which are effectively equivalent, but which are not considered identical (i.e., where equals() returns false). In some circumstances, it is desirable to move an automaton into canonical form, and this can be achieved with the canonicalise() function.
- Virtual States. In the internal representation of automata, leaf states may not be represented as actual states. This occurs if the leaf node does not include any supplementary data, and is primarily a space and performance optimisation. In such cases, the node is represented as a child node using a negative index.
@author David J. Pearce
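
To make the compaction note concrete, the following sketch finds the states reachable from the marked roots and assigns the survivors new, dense indices. It is a hedged illustration under assumed data structures (`children[i]` as an array of child indices), not the actual compact() implementation; `CompactionSketch` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of compaction; not the wyautl implementation. */
final class CompactionSketch {
    /**
     * children[i] lists the child indices of state i. Returns, for each state,
     * its new index after compaction, or -1 if the state is garbage
     * (not reachable from any root).
     */
    static int[] compact(int[][] children, int... roots) {
        boolean[] reachable = new boolean[children.length];
        Deque<Integer> work = new ArrayDeque<>();
        for (int r : roots) {
            if (!reachable[r]) {
                reachable[r] = true;
                work.push(r);
            }
        }
        // Depth-first traversal over child references.
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int c : children[s]) {
                if (!reachable[c]) {
                    reachable[c] = true;
                    work.push(c);
                }
            }
        }
        // Surviving states keep their relative order and get dense indices.
        int[] remap = new int[children.length];
        int next = 0;
        for (int i = 0; i < children.length; i++) {
            remap[i] = reachable[i] ? next++ : -1;
        }
        return remap;
    }
}
```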
wyautl_old.lang.Automaton

A finite-state automaton for representing Whiley types. This is a machine for accepting matching inputs of a given language. An automaton is a directed graph whose nodes and edges are referred to as states and transitions. Each state has a "kind" which determines how the state behaves on given inputs. For example, a state of "OR" kind might accept an input if any of its children does; in contrast, a state of "AND" kind might accept an input only if all its children do.
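
The "OR" and "AND" behaviour described above can be sketched with a small recursive evaluator. Everything here is an assumption made for illustration: the `KindSketch` class, the `LEAF` kind and the use of integer inputs do not come from the Whiley sources.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of state kinds; not the wyautl_old implementation. */
final class KindSketch {
    enum Kind { LEAF, OR, AND }

    static final class Node {
        final Kind kind;
        final IntPredicate test; // used only by LEAF nodes
        final Node[] children;   // used only by OR and AND nodes

        private Node(Kind kind, IntPredicate test, Node... children) {
            this.kind = kind;
            this.test = test;
            this.children = children;
        }

        static Node leaf(IntPredicate test) {
            return new Node(Kind.LEAF, test);
        }

        static Node or(Node... children) {
            return new Node(Kind.OR, null, children);
        }

        static Node and(Node... children) {
            return new Node(Kind.AND, null, children);
        }

        /** OR accepts if any child accepts; AND accepts only if every child accepts. */
        boolean accepts(int input) {
            switch (kind) {
                case LEAF:
                    return test.test(input);
                case OR:
                    for (Node c : children) {
                        if (c.accepts(input)) {
                            return true;
                        }
                    }
                    return false;
                default: // AND
                    for (Node c : children) {
                        if (!c.accepts(input)) {
                            return false;
                        }
                    }
                    return true;
            }
        }
    }
}
```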

The organisation of children is done according to two approaches: deterministic and non-deterministic. In the deterministic approach, the ordering of children is important; in the non-deterministic approach, the ordering of children is not important. A flag is used to indicate whether a state is deterministic or not.

Aside from having a particular kind, each state may also have supplementary material. This can be used, for example, to effectively provide labelled transitions. Another use of this might be to store a given string which must be matched.

NOTE: In the internal representation of automata, leaf states may not be represented as actual nodes. This occurs if the leaf node does not include any supplementary data, and is primarily a space and performance optimisation. In such cases, the node is represented as a child node using a negative index.
@author David J. Pearce
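
One way a negative child index could stand in for a data-free leaf is sketched below. The exact encoding used by the library is not stated in the text above, so the mapping `-(kind + 1)` and the names `VirtualLeafSketch`, `virtualRef` and `kindOf` are purely assumptions chosen for illustration.

```java
/** Hypothetical encoding of virtual leaf states; the real scheme may differ. */
final class VirtualLeafSketch {
    /** Assumed leaf kinds, numbered from 0. */
    static final String[] KINDS = {"Void", "Bool", "Int"};

    /** Encodes a data-free leaf of the given kind as a negative child reference. */
    static int virtualRef(int kind) {
        return -(kind + 1);
    }

    /** Non-negative references index real states; negative ones are virtual leaves. */
    static boolean isVirtual(int ref) {
        return ref < 0;
    }

    /** Recovers the leaf kind from a virtual reference. */
    static int kindOf(int ref) {
        return -ref - 1;
    }
}
```

With such an encoding, a parent state can refer to a leaf like Int without the automaton allocating a state object for it.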

Examples of org.apache.lucene.util.automaton.Automaton


      // NOTE: not great that we ask the suggester to give
      // us the "answer key" (ie maybe we have a bug in
      // suggester.toLevA ...) ... but testRandom2() fixes
      // this:
      Automaton automaton = suggester.convertAutomaton(suggester.toLevenshteinAutomata(suggester.toLookupAutomaton(analyzedKey)));
      assertTrue(automaton.isDeterministic());
      // TODO: could be faster... but it's slowCompletor for a reason
      BytesRef spare = new BytesRef();
      for (TermFreqPayload2 e : slowCompletor) {
        spare.copyChars(e.analyzedForm);
        Set<IntsRef> finiteStrings = suggester.toFiniteStrings(spare, tokenStreamToAutomaton);
        for (IntsRef intsRef : finiteStrings) {
          State p = automaton.getInitialState();
          BytesRef ref = Util.toBytesRef(intsRef, spare);
          boolean added = false;
          for (int i = ref.offset; i < ref.length; i++) {
            State q = p.step(ref.bytes[i] & 0xff);
            if (q == null) {


Examples of org.apache.lucene.util.automaton.Automaton

        System.out.println("TEST: got termsEnum=" + termsEnum);
      }
      BytesRef term;
      int ord = 0;


      Automaton automaton = new RegExp(".*", RegExp.NONE).toAutomaton();    
      final TermsEnum termsEnum2 = terms.intersect(new CompiledAutomaton(automaton, false, false), null);


      while((term = termsEnum.next()) != null) {
        BytesRef term2 = termsEnum2.next();
        assertNotNull(term2);


Examples of org.apache.lucene.util.automaton.Automaton

        maxDistance <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE) {
      LevenshteinAutomata builder = 
        new LevenshteinAutomata(UnicodeUtil.newString(termText, realPrefixLength, termText.length - realPrefixLength), transpositions);


      for (int i = runAutomata.size(); i <= maxDistance; i++) {
        Automaton a = builder.toAutomaton(i);
        //System.out.println("compute automaton n=" + i);
        // constant prefix
        if (realPrefixLength > 0) {
          Automaton prefix = BasicAutomata.makeString(
            UnicodeUtil.newString(termText, 0, realPrefixLength));
          a = BasicOperations.concatenate(prefix, a);
        }
        runAutomata.add(new CompiledAutomaton(a, true, false));
      }
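
The excerpt above concatenates a constant prefix with a Levenshtein automaton for the rest of the term. The language this describes (ignoring transpositions) can be checked directly with a dynamic-programming edit distance, as in the sketch below. This swaps in plain DP for the automaton construction, works on Java chars rather than code points, and uses the hypothetical names `PrefixFuzzySketch`, `editDistance` and `matches`.

```java
/** Hypothetical DP check of "exact prefix, fuzzy suffix"; not Lucene code. */
final class PrefixFuzzySketch {
    /** Classic Levenshtein distance (insert, delete, substitute) using two rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int subst = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(subst, Math.min(prev[j], cur[j - 1]) + 1);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /**
     * True if candidate starts with the first prefixLength chars of term and
     * the remainders are within maxDistance edits of each other.
     */
    static boolean matches(String term, String candidate, int prefixLength, int maxDistance) {
        String prefix = term.substring(0, prefixLength);
        if (!candidate.startsWith(prefix)) {
            return false;
        }
        String termRest = term.substring(prefixLength);
        String candidateRest = candidate.substring(prefixLength);
        return editDistance(termRest, candidateRest) <= maxDistance;
    }
}
```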


Examples of org.apache.lucene.util.automaton.Automaton

   * determinized)
   */
  public void testNFA() throws IOException {
    // accept this or three, the union is an NFA (two transitions for 't' from
    // initial state)
    Automaton nfa = BasicOperations.union(BasicAutomata.makeString("this"),
        BasicAutomata.makeString("three"));
    assertAutomatonHits(2, nfa);
  }
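
The comment in the test above notes that the union of "this" and "three" has two transitions on 't' from the initial state. A minimal, self-contained sketch of that situation and of the textbook subset construction follows. It is not Lucene's determinize implementation; `SubsetConstructionSketch` and all of its members are hypothetical.

```java
import java.util.*;

/** Hypothetical union-of-words NFA with a textbook subset construction. */
final class SubsetConstructionSketch {
    // nfa.get(state) maps a symbol to the set of next states; state 0 is initial.
    private final List<Map<Character, Set<Integer>>> nfa = new ArrayList<>();
    private final Set<Integer> nfaAccept = new HashSet<>();

    /** Builds the naive union: one separate path per word from the shared initial state. */
    SubsetConstructionSketch(String... words) {
        nfa.add(new HashMap<>());
        for (String w : words) {
            int state = 0;
            for (char c : w.toCharArray()) {
                int next = nfa.size();
                nfa.add(new HashMap<>());
                nfa.get(state).computeIfAbsent(c, k -> new TreeSet<>()).add(next);
                state = next;
            }
            nfaAccept.add(state);
        }
    }

    /** True if some state has more than one transition on the same symbol. */
    boolean isNondeterministic() {
        for (Map<Character, Set<Integer>> out : nfa) {
            for (Set<Integer> targets : out.values()) {
                if (targets.size() > 1) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Subset construction: each DFA state is a set of NFA states. */
    Map<Set<Integer>, Map<Character, Set<Integer>>> determinize() {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new HashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> start = new TreeSet<>(Collections.singleton(0));
        dfa.put(start, new HashMap<>());
        work.push(start);
        while (!work.isEmpty()) {
            Set<Integer> current = work.pop();
            Map<Character, Set<Integer>> out = dfa.get(current);
            // Merge the outgoing transitions of every NFA state in this subset.
            for (int s : current) {
                for (Map.Entry<Character, Set<Integer>> e : nfa.get(s).entrySet()) {
                    out.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
                }
            }
            for (Set<Integer> target : out.values()) {
                if (!dfa.containsKey(target)) {
                    dfa.put(target, new HashMap<>());
                    work.push(target);
                }
            }
        }
        return dfa;
    }

    /** Runs the determinized automaton over the input. */
    boolean accepts(String input) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = determinize();
        Set<Integer> state = new TreeSet<>(Collections.singleton(0));
        for (char c : input.toCharArray()) {
            state = dfa.get(state).get(c);
            if (state == null) {
                return false;
            }
        }
        for (int s : state) {
            if (nfaAccept.contains(s)) {
                return true;
            }
        }
        return false;
    }
}
```

In the determinized form the shared prefix "th" is walked once, and the paths split only at 'i' versus 'r'.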


Examples of org.apache.lucene.util.automaton.Automaton

  /**
   * Test that rewriting to a prefix query works as expected, preserves
   * MultiTermQuery semantics.
   */
  public void testRewritePrefix() throws IOException {
    Automaton pfx = BasicAutomata.makeString("do");
    pfx.expandSingleton(); // expand singleton representation for testing
    Automaton prefixAutomaton = BasicOperations.concatenate(pfx, BasicAutomata
        .makeAnyString());
    AutomatonQuery aq = new AutomatonQuery(newTerm("bogus"), prefixAutomaton);
    Terms terms = MultiFields.getTerms(searcher.getIndexReader(), FN);
    assertTrue(aq.getTermsEnum(terms) instanceof PrefixTermsEnum);
    assertEquals(3, automatonQueryNrHits(aq));


Examples of org.apache.lucene.util.automaton.Automaton

    // factor is appropriate (eg, say a fuzzy match must be at
    // least 2X better weight than the non-fuzzy match to
    // "compete") ... in which case I think the wFST needs
    // to be log weights or something ...


    Automaton levA = convertAutomaton(toLevenshteinAutomata(lookupAutomaton));
    /*
      Writer w = new OutputStreamWriter(new FileOutputStream("out.dot"), "UTF-8");
      w.write(levA.toDot());
      w.close();
      System.out.println("Wrote LevA to out.dot");


Examples of org.apache.lucene.util.automaton.Automaton

  }


  @Override
  protected Automaton convertAutomaton(Automaton a) {
    if (unicodeAware) {
      Automaton utf8automaton = new UTF32ToUTF8().convert(a);
      BasicOperations.determinize(utf8automaton);
      return utf8automaton;
    } else {
      return a;
    }
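
The conversion above moves from code-point labels to UTF-8 byte labels. As a small aside on what that implies, one code point can occupy between one and four bytes in UTF-8, so a single code-point transition may become a chain of byte transitions. The helper below only measures that length with the standard library; `Utf8LengthSketch` is a hypothetical name and the code does not reproduce what `UTF32ToUTF8` does internally.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing how many UTF-8 bytes one code point needs. */
final class Utf8LengthSketch {
    /** Returns the number of bytes in the UTF-8 encoding of a valid, non-surrogate code point. */
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```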


Examples of org.apache.lucene.util.automaton.Automaton

    return tsta;
  }


  Automaton toLevenshteinAutomata(Automaton automaton) {
    final Set<IntsRef> ref = SpecialOperations.getFiniteStrings(automaton, -1);
    Automaton subs[] = new Automaton[ref.size()];
    int upto = 0;
    for (IntsRef path : ref) {
      if (path.length <= nonFuzzyPrefix || path.length < minFuzzyLength) {
        subs[upto] = BasicAutomata.makeString(path.ints, path.offset, path.length);
        upto++;
      } else {
        Automaton prefix = BasicAutomata.makeString(path.ints, path.offset, nonFuzzyPrefix);
        int ints[] = new int[path.length-nonFuzzyPrefix];
        System.arraycopy(path.ints, path.offset+nonFuzzyPrefix, ints, 0, ints.length);
        // TODO: maybe add alphaMin to LevenshteinAutomata,
        // and pass 1 instead of 0?  We probably don't want
        // to allow the trailing dedup bytes to be
        // edited... but then 0 byte is "in general" allowed
        // on input (but not in UTF8).
        LevenshteinAutomata lev = new LevenshteinAutomata(ints, unicodeAware ? Character.MAX_CODE_POINT : 255, transpositions);
        Automaton levAutomaton = lev.toAutomaton(maxEdits);
        Automaton combined = BasicOperations.concatenate(Arrays.asList(prefix, levAutomaton));
        combined.setDeterministic(true); // it's like the special case in concatenate itself, except we cloneExpanded already
        subs[upto] = combined;
        upto++;
      }
    }


    if (subs.length == 0) {
      // automaton is empty, there are no accepted paths through it
      return BasicAutomata.makeEmpty(); // matches nothing
    } else if (subs.length == 1) {
      // no synonyms or anything: just a single path through the tokenstream
      return subs[0];
    } else {
      // multiple paths: this is really scary! is it slow?
      // maybe we should not do this and throw UOE?
      Automaton a = BasicOperations.union(Arrays.asList(subs));
      // TODO: we could call toLevenshteinAutomata() before det? 
      // this only happens if you have multiple paths anyway (e.g. synonyms)
      BasicOperations.determinize(a);


      return a;


Examples of org.apache.lucene.util.automaton.Automaton

    for (int i = 0; i <= 0x10FFFF; i++) {
      if (Character.isLetter(i)) {
        initial.addTransition(new Transition(i, i, accept));
      }
    }
    Automaton single = new Automaton(initial);
    single.reduce();
    Automaton repeat = BasicOperations.repeat(single);
    jvmLetter = new CharacterRunAutomaton(repeat);
  }
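
The excerpt above builds an automaton with one transition per letter code point and then repeats it. Assuming the repeat operation is a Kleene star (zero or more repetitions), the accepted language is "every code point is a letter", which can be cross-checked with plain JDK calls as below. `LetterRunSketch` is a hypothetical name and this is not how `CharacterRunAutomaton` is implemented.

```java
/** Hypothetical reference check for a "zero or more letters" automaton. */
final class LetterRunSketch {
    /** True if every code point is a letter; the empty string also matches. */
    static boolean run(String s) {
        return s.codePoints().allMatch(Character::isLetter);
    }
}
```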


Examples of org.apache.lucene.util.automaton.Automaton

      }
    }
    final BytesRef utf8Key = new BytesRef(key);
    try {


      Automaton lookupAutomaton = toLookupAutomaton(key);


      final CharsRef spare = new CharsRef();


      //System.out.println("  now intersect exactFirst=" + exactFirst);


