
5.1.1 Migration Plan

This procedure is deprecated. We will instead employ the usual migration plan for 5.1.1, deploying a completely rebuilt index with some fixes to the country entries.

This document covers the 5.1.1 GNC migration. Although this version does not include a reprocessing, it does include the migration from the CLI version of Voyager to the Storm version of that application, as well as the new BLN ingestion plumbing. The transition to Storm is the easy part; the BLN ingestion, on the other hand, has many moving pieces. New MongoDB collections must be copied and restored to support the BLN system. Duplicate BLN GUIDs must also be cleaned up, otherwise ingesting BLN content will produce errors.

Step-by-step guide

Before you begin, give all Archer and Speed users sufficient notice of the upcoming migration. Let them know that there may be issues and downtime, although these systems should continue to operate normally.

Question: Do we want to rename any machines?

To migrate to GNC-5.1.1 from 4.0, do the following.

<HTML><ol></HTML> <HTML><li></HTML><HTML><p></HTML>First, shut down the Voyager CLI web crawler.<HTML></p></HTML>

<HTML><p></HTML>The current Voyager runs as an Ubuntu service on voyager.clinecenter.illinois.edu. Shut down this service, and watch the logs to make sure it finishes.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>Make sure the processBLN.sh script and any test Voyager topology are stopped.<HTML></p></HTML>

<HTML><p></HTML>For testing I have been running the BLN processing tool on nexus-5-prod. Ensure it is no longer running; otherwise it would keep changing the MongoDB collections that we dump and restore elsewhere in the next steps. Use the process list (for example, with ps) to identify the running application and issue a kill command. Also make sure the Voyager topology is not running on the Storm cluster, as it might also affect this datastore. The Storm command line interface is typically run from tornado.clinecenter.illinois.edu. The command storm list shows all running topologies, and storm kill followed by the topology name stops a running topology.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>Rename the MongoDB collection bln to BulkLexisNexis.<HTML></p></HTML>

<HTML><p></HTML>Running db.bln.renameCollection("BulkLexisNexis") in the mongo shell should do it.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>Run the tool to remove duplicate BLN GUIDs (stored in the original_docid field) in the old BLN MongoDB collection (BulkLexisNexis).<HTML></p></HTML>

<HTML><p></HTML>The class edu.illinois.cline.storm.bln.OldBLNCopy in the archive project has a method to perform this cleanup. Pass the -dedup flag on the command line to invoke this action. This may take some time to run, as it has to remove over 500k duplicate articles. You can run this process in the background and then move on to the next step.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>Reset the done flag on the MongoDB blnbuffer entries.<HTML></p></HTML>

<HTML><p></HTML>As blnbuffer entries are processed, the done field is set to true to avoid reprocessing. During testing, this field may have been set on some records to ensure they were not reprocessed. All of these need to be reset with this command:<HTML></p></HTML> <HTML><p></HTML>
db.blnbuffer.updateMany({},{$set:{done:false}});<HTML></p></HTML> <HTML><p></HTML>This must be run from the cline database.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>Get a dump of the blnsources collection (in case anything goes wrong), then add any tier 1 BLN sources not already included to the list of sources already in place. Make sure you don't replace the ones we already have, as they include the bookkeeping for the epochs that have already been processed.<HTML></p></HTML>

<HTML><p></HTML>For testing I have only been using a handful of the BLN sources we intend to ingest in the first tier. Before we dump the blnsources collection, ensure all the tier 1 source sets are inserted. The main method of edu.illinois.cline.bulklexisnexis.BLNMongoInterface has a feature that will do this for you.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>Dump the blnbuffer and blnsources MongoDB collections.<HTML></p></HTML>

<HTML><p></HTML>These two MongoDB collections contain all the data already loaded from Lexis Nexis. We want to preserve it so we don't have to load it again, so we will dump these collections and restore them into the production database. They will go into the cline database on the staging-prod.clinecenter.illinois.edu MongoDB.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>Add the blnbuffer and blnsources collections to the production MongoDB used for staging.<HTML></p></HTML>

<HTML><p></HTML>These collections will now be used in production. Add these collections to the cline database on the MongoDB on staging-prod.clinecenter.illinois.edu.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>Add the ingestion record to the ingestlog MongoDB collection in the production MongoDB.<HTML></p></HTML>

<HTML><p></HTML>We need an ingestion record to make insertions into the staging area, and this is required to process BLN entries. A very simple ingestion log entry must be inserted in the ingestlog MongoDB collection on staging-prod.clinecenter.illinois.edu. It should look like this:<HTML></p></HTML> <HTML><p></HTML>{ name : "Bulk Lexis Nexis", owner : "cline", steward : "cline", source : "Lexis Nexis", description : "This is the ingestion log entry for all bulk Lexis Nexis ingestion" }<HTML></p></HTML> <HTML><p></HTML>A manual insert is required here.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>Update the Lodestar web application, Scout and Archer.<HTML></p></HTML>

<HTML><p></HTML>Lodestar has been updated to a newer version of the Vaadin API and has received a facelift. Replace the existing version of Lodestar with this new one on lodestar.clinecenter.illinois.edu. Scout and Archer will also have to be updated to get access to the new MongoDB collections and name changes.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>When the deduplication process started in step 4 completes, restart the BLN processing apparatus.<HTML></p></HTML>

<HTML><p></HTML>There is a script to run this process. I would suggest we not set up a service for this, since we may need to clean up the staging area before it can restart. The script is in the archive project and is called processBLN.sh. It actually runs two tools: one downloads new BLN operations from Lexis Nexis, and the other pulls the buffered operations and either stages the resulting documents for processing or deletes them from the GNI.<HTML></p></HTML>

<HTML></li></HTML> <HTML><li></HTML><HTML><p></HTML>Start the 5.1.1 version of Voyager on the production Storm cluster.<HTML></p></HTML>

<HTML><p></HTML>Make sure the tagged release is checked out from the master branch in git, then recompile using gradle clean shadowJar. Then run the Storm application and check the logs to make sure all is well.<HTML></p></HTML>

<HTML></li></HTML><HTML></ol></HTML>
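
The rename in step 3 and a follow-up check for step 4 could be sketched for mongosh roughly as below. The helper names renameBlnCollection and findDuplicateGuids are hypothetical and not part of the archive project. The aggregation only reports original_docid values that still occur more than once; the removal itself remains the job of OldBLNCopy with the -dedup flag. The sketch assumes it is loaded in mongosh while connected to the database that holds the bln collection.

```javascript
// Sketch for mongosh; helper names are illustrative, not part of the archive project.

// Step 3: rename bln to BulkLexisNexis, skipping the rename if it was already done.
function renameBlnCollection(db) {
  const names = db.getCollectionNames();
  if (names.includes("BulkLexisNexis")) {
    return "already renamed";
  }
  if (!names.includes("bln")) {
    throw new Error("neither bln nor BulkLexisNexis exists in this database");
  }
  db.getCollection("bln").renameCollection("BulkLexisNexis");
  return "renamed";
}

// Step 4 check: group by original_docid and keep groups with more than one copy.
// The $limit keeps the report short; it is a sample, not a full list.
const duplicateGuidPipeline = [
  { $group: { _id: "$original_docid", copies: { $sum: 1 } } },
  { $match: { copies: { $gt: 1 } } },
  { $limit: 20 },
];

function findDuplicateGuids(db) {
  return db
    .getCollection("BulkLexisNexis")
    .aggregate(duplicateGuidPipeline, { allowDiskUse: true })
    .toArray();
}

// Usage in mongosh, where `db` is the current database handle:
//   renameBlnCollection(db)
//   findDuplicateGuids(db)   // an empty array suggests the dedup run has finished
```

An empty result from findDuplicateGuids is a reasonable signal that the background dedup run has completed before the BLN processing is restarted.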

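
The done-flag reset in step 5 can be wrapped with a before and after count so the result is easy to confirm. This is a minimal mongosh sketch; resetDoneFlags is a hypothetical helper name, and the only operation taken from the procedure itself is the updateMany call against blnbuffer in the cline database.

```javascript
// Sketch for mongosh; resetDoneFlags is an illustrative helper name.
function resetDoneFlags(db) {
  // The step has to run against the cline database.
  const buffer = db.getSiblingDB("cline").getCollection("blnbuffer");

  // Count entries that testing left marked as done.
  const flaggedBefore = buffer.countDocuments({ done: true });

  // Same update as in step 5: clear the flag on every entry.
  const result = buffer.updateMany({}, { $set: { done: false } });

  // Nothing should remain flagged afterwards.
  const flaggedAfter = buffer.countDocuments({ done: true });
  if (flaggedAfter !== 0) {
    throw new Error(flaggedAfter + " blnbuffer entries are still marked done");
  }
  return {
    flaggedBefore: flaggedBefore,
    modified: result.modifiedCount,
    flaggedAfter: flaggedAfter,
  };
}

// Usage in mongosh:
//   printjson(resetDoneFlags(db))
```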

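
The manual ingestlog insert from step 9 might look like the following in mongosh. The record fields are the ones listed in that step; the ensureBlnIngestRecord helper and its check for an existing entry by name are assumptions added so that running it twice does not create a second record. The procedure does not say which database holds ingestlog, so the sketch simply uses whatever database handle it is given.

```javascript
// Sketch for mongosh; ensureBlnIngestRecord is an illustrative helper name.
const blnIngestRecord = {
  name: "Bulk Lexis Nexis",
  owner: "cline",
  steward: "cline",
  source: "Lexis Nexis",
  description: "This is the ingestion log entry for all bulk Lexis Nexis ingestion",
};

// Insert the record unless one with the same name already exists.
// `db` must be the database on staging-prod that contains ingestlog
// (the procedure does not name it, so no database is hard-coded here).
function ensureBlnIngestRecord(db) {
  const ingestlog = db.getCollection("ingestlog");
  const existing = ingestlog.findOne({ name: blnIngestRecord.name });
  if (existing) {
    return existing._id;
  }
  // Insert a copy so the template object above is left untouched.
  return ingestlog.insertOne(Object.assign({}, blnIngestRecord)).insertedId;
}

// Usage in mongosh, connected to staging-prod.clinecenter.illinois.edu:
//   ensureBlnIngestRecord(db)
```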


clinecenterforadvancedsocialresearch/cyberinfrastructureteam/511migrationplan.txt · Last modified: 2023/06/14 12:58 by 127.0.0.1
