
Apache Solr Tutorial


Solr Terminology

This section explains some of the terms and concepts that come up frequently when working with Apache Solr.

Solr Instance
A Solr instance is a running Solr service: an application server process that runs inside a JVM. One or more cores can be configured to run in each instance.

Solr Core
A Solr core is a single index running inside a Solr instance, together with its own configuration. Each core has its own set of configuration files and schema definitions, separate from those of other cores. Multiple cores can run on a single node.

Solr Collection
A collection is a group of cores that together form a single logical index. Each collection has its own set of configuration files and schema definitions, separate from those of other collections.
More precisely, a collection is one or more documents grouped together in a single logical index using a single configuration and schema. In a single-node Solr installation, a collection may be a single core. In SolrCloud, a collection may be divided into multiple logical shards, which may in turn be distributed across many nodes.

Difference between Solr Core and Solr Collection?
A Solr core and a Solr collection both refer to a Lucene index and a set of related Solr configuration files. A core is always one concrete index: it can be a whole index or one small part of a larger index. A collection is a logical index that, in a cluster, may be distributed across many machines.
In distributed search and indexing, each machine stores one part of the whole index. That part needs all the features of a traditional Solr core, so it is still called a core, while the whole index is referred to by a newer name, collection.

Cores, Collections and Clusters in Apache Solr?
In standalone mode, Apache Solr has a single core for each index. You can have multiple cores, but they are all separate indexes. In SolrCloud mode, you have a core on each node of your cluster, and together those cores make up a collection. You can have multiple collections for separate indexes.
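The relationship between cores and collections in the two modes can be sketched as a small data model. This is only an illustration, not Solr's API: the class names `Core` and `Collection`, the collection name `products`, the core names, and the node addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Core:
    """One concrete index hosted by a Solr instance (hypothetical model)."""
    name: str
    node: str  # address of the instance hosting this core (made up)


@dataclass
class Collection:
    """A logical index made up of one or more cores (hypothetical model)."""
    name: str
    cores: List[Core] = field(default_factory=list)

    def nodes(self) -> Set[str]:
        # The set of nodes that hold some part of this logical index.
        return {core.node for core in self.cores}


# Standalone mode: the index is exactly one core on one node.
standalone = Collection("products", [Core("products", node="localhost:8983")])

# SolrCloud mode: the same logical index is spread over cores on several nodes.
cloud = Collection(
    "products",
    [
        Core("products_shard1", node="node1:8983"),
        Core("products_shard2", node="node2:8983"),
    ],
)

print(len(standalone.cores), sorted(standalone.nodes()))
print(len(cloud.cores), sorted(cloud.nodes()))
```

In both cases a client addresses one logical index named `products`; only the number of cores behind it differs.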

Solr Shard
A shard is a logical partition of a single collection in SolrCloud or other distributed environments. The data is partitioned between multiple Solr instances, and each chunk, which contains a subset of the whole index, is called a shard. Every shard consists of at least one physical replica, but there may be multiple replicas distributed across multiple nodes for fault tolerance.

Solr Document
Solr is a document storage and retrieval engine, and every piece of data submitted to Solr for processing is a document. A document is a group of fields and their values, and documents are the basic unit of data in a collection. Documents are assigned to shards using standard hashing, or by specifically assigning a shard within the document ID. Documents are versioned after each write operation. By analogy, if a core is a table in a DBMS, then documents are its records.
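The idea of assigning documents to shards by hashing can be sketched as follows. This is a simplified illustration under stated assumptions, not Solr's actual router: the function name `route` is hypothetical, CRC32 stands in for whatever hash Solr uses internally, and a plain modulo replaces Solr's own assignment scheme. The `prefix!id` convention shown here is one way to model "assigning a shard within the document ID": documents that share a prefix land on the same shard.

```python
import zlib


def route(doc_id: str, num_shards: int) -> int:
    """Return the index of the shard that should hold doc_id.

    Simplified stand-in: CRC32 of the routing key, modulo the shard count.
    """
    # If the ID has the form "prefix!rest", hash only the prefix so that
    # all documents sharing that prefix are placed on the same shard.
    key = doc_id.split("!", 1)[0] if "!" in doc_id else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Plain IDs are spread over the shards by their hash.
for doc_id in ["book-1", "book-2", "book-3"]:
    print(doc_id, "-> shard", route(doc_id, num_shards=4))

# IDs with a shared prefix are co-located.
print(route("tenantA!doc1", 4) == route("tenantA!doc2", 4))
```

Because the shard is a pure function of the ID, any node can work out where a document belongs without consulting a lookup table.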

Solr Field
A field is the content to be indexed or searched, along with metadata defining how that content should be processed by Solr.
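As a rough illustration of content plus processing metadata, the sketch below models a document as a mapping of field names to values and keeps the per-field metadata in a separate structure. The class `FieldDef`, the helper `unknown_fields`, the field names, and the sample values are hypothetical; the attributes `type`, `indexed`, and `stored` are assumed here to mirror the kind of properties a Solr schema attaches to a field.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldDef:
    """Metadata describing how one field should be processed (hypothetical)."""
    name: str
    type: str             # assumed type name, e.g. "string" or "text_general"
    indexed: bool = True  # whether the field's content is searchable
    stored: bool = True   # whether the original value can be returned


# A tiny schema: field name -> metadata.
schema: Dict[str, FieldDef] = {
    f.name: f
    for f in [
        FieldDef("id", "string"),
        FieldDef("title", "text_general"),
        FieldDef("price", "pfloat", indexed=False),
    ]
}

# A document is a group of fields and their values.
document = {"id": "book-1", "title": "Solr in Practice", "price": 29.5}


def unknown_fields(doc: dict, schema: Dict[str, FieldDef]) -> List[str]:
    """Return the names of fields in doc that the schema does not define."""
    return sorted(set(doc) - set(schema))


print(unknown_fields(document, schema))
print(unknown_fields({"id": "book-2", "color": "red"}, schema))
```

Keeping the values in the document and the processing rules in the schema means the same rules apply to every document in the index.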