Welcome!

Enterprise DevOps, Log Management and Analytics

Sematext Blog

Subscribe to Sematext Blog: eMailAlertsEmail Alerts
Get Sematext Blog via: homepageHomepage mobileMobile rssRSS facebookFacebook twitterTwitter linkedinLinkedIn


Related Topics: Intel XML, XML Magazine

Blog Feed Post

Migrating to SolrCloud from Solr Master-Slave

Nowadays there are more and more organizations searching for fault-tolerant and highly available solutions for various parts of their infrastructure, including search, which evolved from merely a “nice to have” feature to the first class citizen and a “must have” element.

Apache Solr is a mature search solution that has been available for over a decade now.  Its traditional master-slave deployment has been available since 2006, while the fully distributed deployment known as SolrCloud has been available for only a few years now. Thus, naturally, many organizations are in the process of migrating from Solr master-slave to SolrCloud, or are at least thinking about the move. In this article, we will give you an overview of what’s needed to be done for the migration to SolrCloud to be as smooth as it can be.

Step One: Hardware

The first thing that you need to think about when migrating from old master-slave environment is the hardware. You may wonder why the hardware should be different from what you have right now. There are a few reasons:

  • You want to keep the current production while migrating to new solution to avoid service interruption. Doing that will also let you test the new SolrCloud cluster and rollback to the old Solr master-slave if something goes awry. It is also a good idea to run various performance tests before going to production, so having a separate hardware for the SolrCloud cluster may be a very good idea if you can afford it.
  • You will want to prepare the infrastructure for your new search setup so that it can handle the load for the next N months or even a few years, so you don’t have to change the architecture or infrastructure again in the near future.
  • If your data is changing rapidly, with SolrCloud you will index the data to all nodes at the same time, not only to the master servers. This may require more resources on the nodes. More disk I/O will be needed. Because of constant data indexing, data will also be indexed on the replicas, so you need to account for that.
  • If you plan to have replicas, keep in mind they will do the same operations as the leader shards, so having more replicas means not only more storage requirements, but also more resources like CPU and memory.
  • Finally, you will need ZooKeeper ensemble to be working. SolrCloud uses ZooKeeper to make itself fully configurable. We talk about this in the next point.

Keep all of the above in mind when choosing your final hardware. If you don’t, you’ll run into situations where you will need to adjust your provisioned machines in the near future, and that requires the thing we tend to have the least of – time.

Step Two: Solr Setup

Setting up SolrCloud is a bit different than setting up Solr master-slave architecture. First of all, you will need a working Apache ZooKeeper (http://zookeeper.apache.org/) ensemble. SolrCloud uses ZooKeeper to store collections configuration, collections state, to keep track of nodes, for leader election, and so on. In general ZooKeeper is critical for SolrCloud – it’s like a heart of a SolrCloud cluster. When Zookeeper is not available no indexing operation will be successful, some queries may be – up to a point, where something happens to the cluster.

solrcloud-architecturehttps://sematext.com/wp-content/uploads/2016/12/SolrCloud-architecture-3... 300w, https://sematext.com/wp-content/uploads/2016/12/SolrCloud-architecture-7... 768w, https://sematext.com/wp-content/uploads/2016/12/SolrCloud-architecture.png 1116w" sizes="(max-width: 648px) 100vw, 648px" />

In order to setup a highly available and fault-tolerant ZooKeeper ensemble you need at least 3 instances. That allows for a single instance of ZooKeeper to go down and have the ensemble running. The basic idea is that you need to have at least 50% + 1 nodes to be operational in ZooKeeper ensemble for it to be running properly. So when you have 3 nodes, you need at least 2, when you have 5 nodes you need at least 3 and so on.

It is also a very good idea to point all SolrCloud instances to all  ZooKeeper nodes, not just one of them.. That will mean that the ZK_HOST property in your solr.in.sh will look something like this:

ZK_HOST=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181

Of course, you may wonder why we need a standalone ZooKeeper instance when SolrCloud provides embedded ZooKeeper version when run with -c switch without ZK_HOST specified (or without -z switch). At the time of this writing the embedded ZooKeeper has not been designed for production deployments in mind. One of the reasons is that it can’t be used in a distributed mode and having a single Zookeeper instance running inside the same JVM as SolrCloud node is asking for trouble. Imagine JVM going out of memory or SolrCloud node being restarted – the embedded ZooKeeper would go down as well, which means that the cluster would loose its heart for some time. This is something we want to avoid.

Step Three: Migrating the Configuration

The next step in your migration from Solr master-slave to SolrCloud will be preparation of the configuration files. There are at least two files – the schema.xml and solrconfig.xml that you need to take care of.

Note that there can also be additional files that might be required or may need to be removed, depending on the configuration changes. We think that removing all the unneeded configuration files is a good idea because that avoids confusion in the future.

One big thing to remember – if you are migrating to Solr 6, Java 8 is a must. Since support for Java 7 has ended, you should use a later version of Java everywhere. 

Read the original blog entry...

More Stories By Sematext Blog

Sematext is a globally distributed organization that builds innovative Cloud and On Premises solutions for performance monitoring, alerting and anomaly detection (SPM), log management and analytics (Logsene), and search analytics (SSA). We also provide Search and Big Data consulting services and offer 24/7 production support for Solr and Elasticsearch.