Apache Lucene is a high-performance, open source, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications. Lucene achieves fast search responses because it searches a prebuilt index rather than scanning the text directly. It has its own query syntax for performing searches.
Accurate and Efficient Search Algorithms
- - ranked or score based searching -- best results returned first
- - many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
- - fielded searching (e.g. title, author, contents)
- - sorting by any field
- - multiple-index searching with merged results
- - allows simultaneous update and searching
- - flexible faceting, highlighting, joins and result grouping
- - fast, memory-efficient and typo-tolerant suggesters
- - pluggable ranking models, including the Vector Space Model and Okapi BM25
- - configurable storage engine (codecs)
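To make the idea of a ranking model concrete, here is a minimal sketch of the textbook Okapi BM25 formula in Python. It is an illustration only and not Lucene's own implementation: the function name `bm25_scores`, the whitespace tokenization, and the three sample documents are assumptions made for this example. The parameter values k1 = 1.2 and b = 0.75 are commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each query term.
    doc_freq = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue  # a term that is absent contributes nothing
            df = doc_freq[term]
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: longer documents are penalized.
            norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores


# Hypothetical sample documents, already split into terms.
docs = [
    "lucene is a search library".split(),
    "an inverted index maps terms to documents".split(),
    "lucene searches an inverted index".split(),
]
scores = bm25_scores(["lucene", "index"], docs)
# Best results first: document positions ordered by descending score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this sample the third document matches both query terms and ranks first. Of the two documents that each match one term, the shorter one ranks higher because of length normalization.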
Scalable, High-Performance Indexing
- - more than 150GB/hour on modern hardware
- - small RAM requirements -- only 1MB heap
- - incremental indexing as fast as batch indexing
- - index size roughly 20-30% the size of text indexed
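As a rough back-of-the-envelope illustration of the figures above, the sketch below estimates indexing time and index size for a given amount of text. The function name `estimate_indexing` and the 600 GB corpus are hypothetical. The calculation simply assumes that the quoted 150GB/hour throughput and 20-30% size ratio hold for the data in question, which depends on hardware and content.

```python
def estimate_indexing(text_gb, throughput_gb_per_hour=150.0, size_ratio=(0.20, 0.30)):
    """Estimate indexing time in hours and the index size range in GB."""
    hours = text_gb / throughput_gb_per_hour
    low, high = (text_gb * r for r in size_ratio)
    return hours, (low, high)


# Hypothetical corpus: 600 GB of text.
hours, (low_gb, high_gb) = estimate_indexing(600)
print(f"{hours:.1f} h, index between {low_gb:.0f} and {high_gb:.0f} GB")
```

Under these assumptions, 600 GB of text would take about 4 hours to index and produce an index of roughly 120 to 180 GB.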
An inverted index is a data structure used to support full-text search. It consists of a list of all the unique words that appear in any document and, for each word, a list of the documents in which it appears.
- - Lucene stores its input data in an inverted index.
- - It is designed to allow very fast full-text searches.
- - In an inverted index, each indexed term points to a list of the documents that contain the term.
- - It is much faster to find a term in an index than to scan all the documents.
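The structure described above can be sketched in a few lines of Python. This is a simplified illustration, not how Lucene stores its index on disk: the helper names (`tokenize`, `build_inverted_index`, `search`), the lowercase alphanumeric tokenization, and the sample documents are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each unique term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents keyed by document id.
docs = {
    1: "Lucene is a search library",
    2: "An inverted index maps terms to documents",
    3: "Lucene searches an inverted index",
}
index = build_inverted_index(docs)
```

With this index, `search(index, "inverted index")` only looks up two postings lists and intersects them; it never rereads the document text. That is why a term lookup is much cheaper than scanning every document.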