Implement Sitecore's switch Solr indexes strategy on SearchStax
Originally, Sitecore used Lucene as an indexing service. A disadvantage of Lucene is that its indexes are stored on disk. Each web instance, such as the cm and cd roles, had its own index on disk. Because each index was managed by the running web instance, the indexes occasionally got out of sync. To avoid this, Sitecore introduced Solr as the default indexing service to replace Lucene. Solr is a separate, scalable web service that manages indexes from a central location.
While rebuilding an index, Lucene and Solr both remove the existing index and start building the new index from scratch. In environments with a lot of data, rebuilding the index can take a long time. During the rebuild process, pages that depend on data from this index, such as search or news lists, cannot access the data. As a result, content appears to be missing from the environment. This is unwanted behavior in some cases and can lead to many concerned questions from content authors. To ensure that content is available during indexing processes, it is possible to implement the switch Solr indexes strategy as described on Sitecore’s documentation website.
For a client of uxbee, we created several APIs that rely on data from Solr indexes. To ensure that the data is always available, we decided to implement the switch Solr indexes strategy. Since this strategy had already been described by Sitecore, it should be obvious how to implement, right? Wrong! This particular client is running Sitecore on Managed Cloud based on containers. The Solr part of Managed Cloud is based on the Sitecore Solr Managed Service of SearchStax, which in turn is based on SolrCloud. I got to work implementing the strategy by reading many different blogs and articles. Some were outdated, and others did not fit Sitecore or SearchStax. Every now and then I did find a piece of the puzzle. Based on my research, I dare say that implementing this strategy on SearchStax was undocumented. In this post I bring it all together in a how-to for implementing Sitecore’s switch Solr indexes strategy on SearchStax. It includes everything from my research and all the steps to set up automatic deployment for both local development and Sitecore Managed Cloud on Containers with SearchStax.
Switch on Rebuild strategy
With the switch on rebuild strategy, you set Solr to rebuild an index in a separate core so that the rebuild does not affect the search index currently in use. Rebuilding an index is not often necessary. Typical reasons are new fields being added to the index, changed logic in a calculated field, or an index that is out of sync with the database. These issues can often be resolved by reindexing only the affected items, provided you know which items those are. That is not always clear, so it is much easier to rebuild the index as a whole. With SwitchOnRebuild, you can rebuild an index without worry, but it costs some extra disk space on the Solr server for each index. To keep disk space within acceptable limits, I chose to apply this strategy only to the web indexes, because that is where it makes the most sense. To get started, I created an example patch file for containerized development that applies the SwitchOnRebuild strategy to the web indexes (including the SXA web index).
Access to Solr
Later in this blog post I will create the new collections needed for the switch on rebuild strategy. To verify that everything is created correctly, I want access to my Solr instance. There is documentation on how to do this, but it is a bit scattered. Below I briefly describe how to access your Solr instance in the different environments.
Sitecore’s documentation describes how to access Sitecore containers. The Solr container is exposed on port 8984, as configured in the standard Sitecore docker-compose.yml. Therefore, you can easily access your local Solr container via http://localhost:8984.
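To check from a script which indexes exist, you can call the Solr admin API instead of opening the UI. The following is a minimal sketch, not part of the original setup. It assumes that Solr is served under the /solr path and that the local container runs standalone Solr, which exposes cores through the CoreAdmin API. For a SolrCloud-based instance you would pass solrcloud=True to use the Collections API instead. The function names admin_url and list_indexes are hypothetical helpers introduced for illustration.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def admin_url(base_url: str, solrcloud: bool) -> str:
    """Build the admin API URL that lists collections (SolrCloud) or cores (standalone)."""
    base = base_url.rstrip("/")
    if solrcloud:
        # Collections API: returns {"collections": [...]}
        return f"{base}/admin/collections?" + urlencode({"action": "LIST", "wt": "json"})
    # CoreAdmin API: returns {"status": {"<core name>": {...}, ...}}
    return f"{base}/admin/cores?" + urlencode({"action": "STATUS", "wt": "json"})


def list_indexes(base_url, solrcloud, username=None, password=None):
    """Return the sorted names of all collections or cores on the Solr instance."""
    request = Request(admin_url(base_url, solrcloud))
    if username is not None:
        # Send the credentials as HTTP basic authentication.
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    with urlopen(request, timeout=10) as response:
        body = json.load(response)
    if solrcloud:
        return sorted(body["collections"])
    return sorted(body["status"].keys())


if __name__ == "__main__":
    # Assumed base URL of the local Solr container.
    for name in list_indexes("http://localhost:8984/solr", solrcloud=False):
        print(name)
```

Running the script against the local container should print one index name per line, which makes it easy to see later whether the rebuild indexes were created.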
Accessing a Solr instance on Sitecore’s Managed Cloud is not as easy as accessing your local Solr container. There is an additional layer of security to consider when accessing a SearchStax Solr instance.
First, you need to locate the credentials and URL in Azure Key Vault. You can do this by copying the value of the solr-connection-string secret, or by taking the “Sitecore_ConnectionStrings_Solr.Search” environment variable from your running cm pod. Either way, you will find a URL that looks something like this: https://username:password@ss12345-ab12abcd-westeurope-azure.searchstax.com/solr;solrcloud=true. Copy the username and password and save them for later use. Then remove the username, the password, and the @-sign from the URL and replace “;solrcloud=true” with “/”. This gives you a URL that can be used to access your Solr instance on Managed Cloud. When you open it, you must first provide the credentials you saved earlier.
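The manual steps above can also be scripted. The following is a minimal sketch that splits a connection string of the shape shown above into the browsable URL and the credentials. The function name split_solr_connection_string is a hypothetical helper, and the sketch assumes that the credentials contain no “;” character and that any special characters in them are percent-encoded.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def split_solr_connection_string(conn: str):
    """Split a Sitecore Solr connection string into (base_url, username, password)."""
    # Drop Sitecore-specific options such as ";solrcloud=true".
    url_part = conn.split(";", 1)[0]
    parts = urlsplit(url_part)

    # Rebuild the host without the embedded credentials.
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"

    # End the path with a single "/" as in the manual steps.
    path = parts.path.rstrip("/") + "/"
    base_url = urlunsplit((parts.scheme, host, path, "", ""))

    username = unquote(parts.username) if parts.username else None
    password = unquote(parts.password) if parts.password else None
    return base_url, username, password


if __name__ == "__main__":
    example = (
        "https://username:password@"
        "ss12345-ab12abcd-westeurope-azure.searchstax.com/solr;solrcloud=true"
    )
    url, user, _password = split_solr_connection_string(example)
    print(url)   # the URL to open in the browser
    print(user)  # the username to enter when asked for credentials
```

Keep the real connection string out of source control. In practice you would read it from the Key Vault secret or the environment variable rather than hard-coding it as in this example.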
Creating Collections by using the Solr-init container
You do NOT want to create the collections manually in your Solr UI as described on SearchStax’s documentation site, because that does not fit an Infrastructure as Code strategy. The collections must therefore be created automatically. The Solr-init container is the best place to do this, because this is where Sitecore creates all of its indexes.
Creating custom collections is just as easy: place a JSON file in the data folder of the solr-init container. You can do this by creating or modifying docker\build\solr-init\Dockerfile. When you install SXA, you also need to add a file to this folder, as explained in “Add Sitecore modules to a container”. I have created a JSON file called cores-SwitchOnRebuild.json and placed it on GitHub. You can also copy the contents below: