Apache Solr Tutorial

SolrCloud Mode

Solr Standalone and SolrCloud are not separate products: Solr is the application, and SolrCloud is one mode of running it. Apache Solr can run in either of two modes:
- Solr Standalone or Legacy Mode
- Solr Distributed or SolrCloud Mode

Solr Standalone or Legacy Mode

In legacy mode, the Solr application runs on a single server. Standalone Solr still offers index replication and distributed queries in a master/slave model, but these activities are not coordinated by ZooKeeper; they are managed manually.
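To give a feel for what "managed manually" means for distributed queries, here is a minimal sketch that builds a query URL using Solr's `shards` request parameter, where the caller has to list every shard address itself. The host names, the core name `core1`, and the helper `build_distributed_query_url` are assumptions made up for this illustration; only the `q`, `rows`, and `shards` parameters come from Solr.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_distributed_query_url(base_url, shard_addresses, query, rows=10):
    """Build a /select URL that asks one Solr node to query the listed shards.

    Hypothetical helper: in legacy mode nothing discovers the shards for us,
    so the caller must pass every shard address explicitly.
    """
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated list of shard addresses, maintained by hand.
        "shards": ",".join(shard_addresses),
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical hosts and core name, for illustration only.
SHARDS = [
    "solr1.example.com:8983/solr/core1",
    "solr2.example.com:8983/solr/core1",
]

url = build_distributed_query_url(
    "http://solr1.example.com:8983/solr/core1", SHARDS, "title:lucene"
)
print(url)
```

If a shard is added or moved, every client that builds such a URL has to be updated by hand, which is the bookkeeping that SolrCloud hands over to ZooKeeper.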

What is Solr replication?
In Solr's master/slave model, replication is a method of copying a master index from one server to one or more "slave" or "child" servers. A slave is usually a secondary physical machine that imports data from the main publisher, the master. In this setup the master holds the original index, and each slave receives a replicated copy of it.

Why Solr replication?
Replicating an index is useful when:
- A backup of the index is required.
- Search volume is too large for one machine to handle, so searches must be distributed across multiple read-only copies of the index.
- Search performance degrades on the indexing machine because a high volume or high rate of indexing consumes its resources.
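The master/slave idea can be sketched as a toy model in plain Python. This is not Solr code and does not use any Solr API: the classes `MasterIndex` and `SlaveIndex`, the version counter, and the round-robin dispatch are all assumptions chosen to illustrate the concept. Writes go only to the master, each slave copies the index when it notices a newer version, and searches are spread across the read-only copies.

```python
import itertools


class MasterIndex:
    """Toy master: the only place where documents are indexed."""

    def __init__(self):
        self.version = 0
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        self.version += 1  # every change produces a new index version


class SlaveIndex:
    """Toy slave: a read-only copy that pulls from the master."""

    def __init__(self, master):
        self.master = master
        self.version = 0
        self.docs = {}

    def poll(self):
        """Copy the master's index if it is newer; return True if copied."""
        if self.master.version > self.version:
            self.docs = dict(self.master.docs)
            self.version = self.master.version
            return True
        return False

    def search(self, term):
        term = term.lower()
        return sorted(d for d, text in self.docs.items() if term in text.lower())


master = MasterIndex()
master.add("1", "Apache Solr replication")
master.add("2", "Apache Lucene index")

slaves = [SlaveIndex(master), SlaveIndex(master)]
for slave in slaves:
    slave.poll()

# Spread search load across the read-only copies in round-robin order.
round_robin = itertools.cycle(slaves)
results = [next(round_robin).search("apache") for _ in range(4)]
```

Because indexing happens only on the master, heavy indexing does not slow down the slaves that answer searches, and each slave also serves as a copy of the index.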

Solr Distributed or SolrCloud Mode

SolrCloud is the name for Solr's distributed search and indexing mode. When we need to index huge amounts of data, we have to think about scalability and performance. This is where SolrCloud comes into the picture.

What Problem Does Distribution Solve?
- If searches are taking too long or the index is approaching the physical limitations of its machine, you should consider distributing the index across two or more Solr servers.
- To distribute an index, you divide it into partitions called shards, each of which runs on a separate machine. Solr then splits each search into sub-searches, runs them on the individual shards, and combines their results into a single response.
- The architectural details underlying index sharding are invisible to end users, who simply experience faster performance on queries against very large indexes.
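The sharding idea described above can be sketched in a few lines of plain Python. This is a toy model, not Solr's actual routing or scoring: the CRC32-based `shard_for`, the term-count "score", and all function names are assumptions for illustration. Documents are partitioned across shards by a hash of their id, a search runs as one sub-search per shard, and the partial results are merged into one ranked list.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard from a stable hash of the document id (toy routing)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_shards(docs, num_shards):
    """Partition the documents so that each one lives in exactly one shard."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards


def search_shard(shard, term):
    """Sub-search on one shard: return (score, doc_id) pairs for matches."""
    term = term.lower()
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term)  # toy score: term frequency
        if score > 0:
            hits.append((score, doc_id))
    return hits


def distributed_search(shards, term, rows=10):
    """Run the sub-search on every shard and merge the partial results."""
    partial = []
    for shard in shards:
        partial.extend(search_shard(shard, term))
    # Highest score first; ties broken by document id for a stable order.
    partial.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in partial[:rows]]


DOCS = {
    "a": "solr solr search",
    "b": "lucene index",
    "c": "solr cloud",
    "d": "zookeeper solr solr solr",
}

shards = build_shards(DOCS, 3)
print(distributed_search(shards, "solr"))
```

The caller of `distributed_search` sees a single ranked list and never needs to know how many shards exist, which mirrors the point that sharding is invisible to end users.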