Programming And Stuff, You Know The Thing…

How to migrate a non-cloud Solr 7 instance to an HDFS 2.8 cluster

Posted at — Oct 13, 2017

The following commands show how to move the data directory of the ‘gettingstarted’ collection to an HDFS cluster named ‘mycluster’. Note that we won’t switch to SolrCloud, and that the collection config is copied to the HDFS cluster for backup purposes only. After the migration, everything except the ‘../data’ directory is still read from the local disk.

The following commands should work on any Linux box after running:

git clone https://github.com/jjYBdx4IL/example-maven-project-setups.git
cd example-maven-project-setups/solr-example

Prerequisites are Java 1.8, Maven, and some commonly used Linux command-line tools.
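
A quick way to check the prerequisites before starting is sketched below. The helper name `missing_tools` and the exact tool list (java, mvn, git) are assumptions made for illustration; extend the list with whatever else your setup needs.

```shell
# Print the names of the given commands that are not found on the PATH.
# Prints an empty line if every command is available.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Drop the leading space before printing.
    echo "${missing# }"
}

# Assumed tool list, adjust as needed.
m=$(missing_tools java mvn git)
[ -z "$m" ] || echo "missing tools: $m"
```

This only checks that the commands exist, not their versions; `java -version` will tell you whether you are actually on 1.8.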

# set up a single-node Solr instance:
mvn clean install

# start the single-node instance:
./target/solr-*/bin/solr start

# open SpamMain.java and run it from your IDE (e.g. Eclipse) to feed the index with randomly generated documents

# now set up and start a full HDFS cluster:
cd ../hdfs-example
mvn clean integration-test
export HADOOP_HOME="$(pwd)/target/dfsnode1"

# migrate the data (make sure no writes are happening, e.g. by disabling your index feeder task):
./hdfs.sh 1 dfs -mkdir -p /solr
./hdfs.sh 1 dfs -copyFromLocal ../solr-example/target/solr-*/server/solr/gettingstarted /solr/gettingstarted
./hdfs.sh 1 dfs -rm /solr/gettingstarted/data/index/write.lock
./hdfs.sh 1 dfs -ls -R /

# restart Solr against the cluster:
cd ../solr-example
./target/solr-*/bin/solr stop
rm -rf target/solr-*/server/solr/gettingstarted/data
./target/solr-*/bin/solr start -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs \
    -Dsolr.data.dir=hdfs://mycluster/solr/gettingstarted/data -Dsolr.hdfs.confdir="$HADOOP_HOME/etc/hadoop"
# done.
# Now go to http://localhost:8983/solr/#/gettingstarted/query , click on Execute Query and verify
# that your data is still there.

# clean up:
mvn clean
mvn clean -f ../hdfs-example
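
The "verify that your data is still there" step can also be done from the command line (before running the clean-up, of course). The following is a minimal sketch, assuming curl is installed and Solr answers on the default port; the helper names `num_found` and `solr_count` and the `SOLR_URL` variable are made up for this example. It runs a match-all query with zero rows and pulls the `numFound` value out of the JSON response.

```shell
# Extract the numFound value from a Solr JSON response read on stdin.
num_found() {
    sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the total document count of the given core or collection.
# SOLR_URL is a hypothetical override; the default matches the URL used above.
solr_count() {
    curl -s "${SOLR_URL:-http://localhost:8983/solr}/$1/select?q=*:*&rows=0&wt=json" | num_found
}

# Usage:
#   before=$(solr_count gettingstarted)   # run this before the migration
#   ... migrate and restart as described above ...
#   after=$(solr_count gettingstarted)
#   [ "$before" = "$after" ] && echo "document count unchanged: $after"
```

Comparing the count before and after the migration is only a rough sanity check; it does not prove that every document survived intact.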

Sadly, I didn’t find a way to make a non-cloud Solr instance store its collection configuration on HDFS. SolrCloud, by contrast, keeps its configuration files in ZooKeeper’s shared storage.

Scripts and README can be found at: https://github.com/jjYBdx4IL/example-maven-project-setups/tree/6db984ef724618333c4995ef719d1a1db9f1ce06/solr-example