By using a Lucene index to store the information on disk, rather than some specialized file format, we get Lucene's correctness guarantees (especially regarding multi-process concurrency) for free, along with the ability to write to any implementation of Directory, not just the file system.
In addition to the permanently stored Lucene index, efficiency dictates that we also keep an in-memory cache of either the recently seen categories or all of them, so that we do not need to go back to disk on every category addition to find out which ordinal, if any, the category already has. A {@link TaxonomyWriterCache} object determines the specific caching algorithm used.
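To illustrate the cache-then-disk lookup described above, here is a minimal sketch of one possible caching algorithm, least-recently-used eviction, built on `LinkedHashMap` in access order. It assumes category paths are plain strings and that a caller-supplied `diskLookup` function stands in for the query against the Lucene index. `CategoryOrdinalCache`, `getOrLookup` and `diskLookup` are hypothetical names for illustration only; this is not the `TaxonomyWriterCache` interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration, not Lucene's TaxonomyWriterCache API:
// an LRU cache from category path to ordinal, consulted before the index.
class CategoryOrdinalCache {
    private final Map<String, Integer> lru;
    private int diskLookups = 0;

    CategoryOrdinalCache(final int maxEntries) {
        // accessOrder=true keeps iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once the cap is exceeded.
        this.lru = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached ordinal, or -1 if the cache does not know the category.
    int get(String categoryPath) {
        Integer ord = lru.get(categoryPath);
        return ord == null ? -1 : ord;
    }

    void put(String categoryPath, int ordinal) {
        lru.put(categoryPath, ordinal);
    }

    // Checks the cache first; only on a miss does it call the (hypothetical)
    // disk lookup, caching the ordinal if the category was found there.
    int getOrLookup(String categoryPath, Function<String, Integer> diskLookup) {
        int ord = get(categoryPath);
        if (ord >= 0) {
            return ord;
        }
        diskLookups++;
        Integer fromDisk = diskLookup.apply(categoryPath);
        if (fromDisk != null && fromDisk >= 0) {
            put(categoryPath, fromDisk);
            return fromDisk;
        }
        return -1; // category not present in the index either
    }

    // Number of times the disk lookup had to be consulted.
    int diskLookups() {
        return diskLookups;
    }
}
```

A bounded LRU cache trades memory for occasional extra disk lookups; a cache that holds all categories, the other option mentioned above, would instead never evict and only miss on genuinely new categories.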
This class offers some hooks for extending classes to control the {@link IndexWriter} instance that is used. See {@link #openLuceneIndex} and {@link #closeLuceneIndex()}.
@lucene.experimental