Consolidating glFusion Sites - Part II

In Part I of this tale we moved content from one glFusion site to another by hand, copying the articles one at a time and doing minor edits. That took about an hour from start to finish.
In this part we take a fairly large site and move its content to another site using a database backup and a few other tricks. We're talking about 2000+ articles this time so doing manual copying is not an option.
The original site is www.centa.com - the Taxman's main web site where he has been accumulating answers to tax and immigration questions since about 1998. In 2007 I put together a Geeklog-powered site, tax.centa.com, as a test of the tools and techniques for moving from the original FrontPage-powered site to a database-backed content management system. There were two reasons for this:
- Microsoft was at that time rumored to be dropping FrontPage as a supported product on Unix/Linux servers
- Updates to the multi-thousand page site were taking half an hour or more - FrontPage just didn't scale well to that level.
So for a time I was putting updates into both the www.centa.com site using the scripts I'd written many years earlier - and into tax.centa.com as well using some new scripts. This resulted in some duplication but not as much as you might suspect since I was only doing some of the postings - splitting them across the two systems.
The tests were successful - so it was now time to consolidate the two sites back into the www.centa.com domain.
The road to this consolidation included the complete replacement of the www.centa.com site's individual file content with pages stored in glFusion. Preparation for this event included the work done in conversion of the e-mail newsletter/answer postings done to the tax.centa.com site. This entailed creating processes that pulled the original e-mails apart and built story records from them.
It turned out that the easiest way to do the conversion was to go back to the original email archives and simply run the script for all of them - with one major change - the story IDs created for the tax.centa.com site were pieces of the subject of each email. This was done because at the time there was no easy mechanism to get the <title> HTML construct to show the story name, so putting "real" key words into the URL in the form of the changed story ID gave the search engines something other than the artificially constructed default date string to key on. The new glFusion system puts the story title into the <title> construct so this was no longer necessary.
So the story ID was constructed in the same manner as the file name had been in the FrontPage system - which meant it was a straightforward mapping of old file to new story ID that could at some point be automated in a redirect. As it is, we simply have a massive number of individual redirects, one for each original file - and all in the Apache configuration file, not the .htaccess file, so they are read only once at startup. Each line is like the following:
RedirectPermanent /CEN-TAPEDE/archive/2002-July/000007.html /article.php/UsCa2002-July000007.html
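A list of redirects like this does not have to be typed by hand. The following is only a sketch of how such lines could be generated, not the script actually used for the site. It assumes that every story ID is built as "UsCa" plus the month directory plus the file name, which is all the single example above shows; other archives may well have used a different prefix. The function name make_redirects is made up for this illustration.

```shell
#!/bin/sh
# Sketch: read old archive paths (one per line) on stdin and print one
# RedirectPermanent line for each path that matches the expected layout.
# ASSUMPTION: story ID = "UsCa" + month directory + file name, inferred
# from the one example line in the article.
make_redirects() {
    # \1 is the month directory (e.g. 2002-July), \2 is the file name,
    # and & is the whole original path. -n plus the p flag prints only
    # lines that matched, so unexpected paths are silently skipped.
    sed -nE 's#^/CEN-TAPEDE/archive/([^/]+)/([0-9]+\.html)$#RedirectPermanent & /article.php/UsCa\1\2#p'
}
```

Feeding it a file of old paths and redirecting the output to a file would give a block of lines ready to include in the Apache configuration.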
So this got the www.centa.com site to the point where we could bring in the articles from the tax.centa.com site and all would live happily ever after.
So consolidation day (yesterday as it turns out) came and the whole process took about 3 hours elapsed - mostly waiting for the database to digest each of the 14 pieces of the story table from the tax site. The whole process was as follows:
- ensure both sites had identical versions (1.1.5) of glFusion just to ensure record compatibility
- create backups of both sites - full backups, not just the databases - just in case. The sites are backed up daily off site with a 10 day archive kept plus periodic pulls "forever" to DVD but in this case I copied to local disk storage as well
- create database backups of both sites - the tax site's file was about 15 Megabytes uncompressed
- using VI, I edited the TAX site database backup to extract just the story records. None of the other parts of the backup should be used - they drop tables and recreate them and such - not a good thing in this case
- at the same time I replaced the TOPIC ID for each record with the corresponding one for the new site. Any that didn't map directly (jokes and thoughts in this case) I added to the destination system exactly as they were in the source system.
1,$s/'us_can'/'UsCanada'/g
This looked through all the records in the file and changed every 'us_can' to 'UsCanada', which is the name of the corresponding topic in the destination system. I did it in VI but it could have been done with SED (the stream editor) or any other pattern matching tool. Note that this version of VI can handle files of almost any size that your computer can pull into RAM - but if your system was smaller or the files a lot larger, you could use other tools to accomplish the same thing.
- The backup system puts a number of records into each INSERT statement - but with over 2000 stories there were a number of INSERTS, so, using "csplit" I split the file from the previous step into pieces, each beginning with an INSERT
csplit ../stories "/INSERT/" "{*}"
- zip the resulting files (xx01 to xx15) and copy them from the server back to my workstation
- open phpMyAdmin to the remote server and open the CENTA database
- using the "import" function I uploaded each of the files created in the csplit step, one after the other. I had used bzip2 to compress the files and phpMyAdmin admitted that it knew how to uncompress them prior to inserting them - a saving in bandwidth and time. Each file took about 5 minutes to upload and process.
- Once the files were in the new system I simply put the redirect statement into the configuration file for the old one and restarted the web server. There were no static pages in this case so I didn't have to move them or do specific redirects for them - but in most cases this would have been minimal if it was necessary.
- Again, I changed the DNS to use CNAME for the old site to follow the new one and will go through the various SEO systems at Google and other places to ensure I don't confuse the issue by watching the old site's stats, etc.
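The extract, remap and split steps in the list above were done by hand in VI, but they could also be scripted. What follows is a rough sketch under some assumptions that are not from the steps above: the story table is called gl_stories, the backup is a plain SQL dump with each INSERT statement on a single line, and the function names are invented for the example. The csplit "{*}" repeat count is a GNU extension, the same one used in the command above.

```shell
#!/bin/sh
# Sketch of the extract / remap / split / compress steps as shell functions.
# ASSUMPTIONS: table name gl_stories, one INSERT statement per line,
# GNU csplit available. Adjust to match your own backup file.

extract_stories() {
    # Keep only the INSERT statements for the story table; this leaves
    # behind the DROP TABLE / CREATE TABLE parts and all other tables.
    grep '^INSERT INTO `gl_stories`' "$1"
}

remap_topic() {
    # Same substitution as the VI command, applied to stdin with sed.
    sed "s/'us_can'/'UsCanada'/g"
}

split_inserts() {
    # Write one piece per INSERT into the current directory as xx00, xx01, ...
    # xx00 holds whatever precedes the first INSERT (empty when the file
    # starts with one). -s suppresses the byte counts csplit normally prints.
    csplit -s "$1" '/^INSERT/' '{*}'
}

compress_pieces() {
    # bzip2 each non-empty piece; phpMyAdmin can import .bz2 files directly.
    for f in xx[0-9]*; do
        if [ -s "$f" ]; then
            bzip2 "$f"
        fi
    done
}
```

A run might look like: extract_stories backup.sql | remap_topic > stories, then split_inserts stories, then compress_pieces, where backup.sql stands in for whatever the backup file is called. Topics that need a different mapping would each need their own sed expression.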
Done! - as I said, about 3 hours from start to finish which included stops for coffee and a couple of phone chats. Not a huge project but then I'd done my homework.
Tag: glfusion centa.com redirectpermanent redirect mysql backup csplit frontpage microsoft apache l.a.m.p


