DoctorMyhill:Migration/Nov 09 Resync with live site
- 1 Introduction
- 2 Overview
- 3 Site Snapshot
- 4 Service Cut-over
- 5 Page Name Mapping
- 6 New Page Import
- 7 Pages In Conflict
- 8 Other Activities
Introduction
This page is the control page for the activities that we need to carry out to prepare for the cut-over of the live site to the wiki. I will list the agreed task areas and their status on this page, and use the discussion pages for discussion and comment between myself (Terry aka WikiTerry), Hania and any other members of Sarah's team who wish to comment. As task areas are completed they will be moved to:
- Other Notes for miscellaneous notes relating to the migration.
Overview
The test site http://www.docsarah.co.uk/ was primed by taking a snapshot import of the live site on 30th Jan this year. At the time I (Terry) wrote a set of conversion utilities to migrate the snapshot content into wiki format. The aim was to use this as a test site for Hania et al. to familiarise themselves with the MediaWiki engine facilities and decide how to improve formats etc., with the ultimate intent of junking this test migration and doing a complete reconversion to a new production wiki.
However, one of the major issues that we've had is limited resources: Sarah's team has had other priorities, and I have had other commitments as well as fighting CFS itself. So what we have actually done is run both sites in parallel, with important changes being made to the publicly advertised site as well as clean-up and style changes to this wiki site. Merging these back into a clean wiki with a fresh import of the live site would be somewhat problematic, so Hania and I have decided to:
- Do a merge of content rather than a junk and replace
- Use the existing service provider to provide the LAMP shared service for the new live site. Whilst I can always migrate it to a Virtual Private Server if site usage requires this in the future, I would prefer to avoid this if possible, because running a private server, even a virtual one, still represents an increased future maintenance cost for Sarah and her team.
The remainder of this article is divided into the major task areas for this migration. Note that the bulk of this work has now been completed, and the corresponding content has been moved to the Completed Tasks page.
Site Snapshot
I used my existing Perl scripts to carry out the first conversion pull on 29 Jan 2009 and the current export on 17 Nov 2009.
However, we do need to ask Bob to take a final safety snapshot of the legacy site database after cut-over and before decommissioning.
Service Cut-over
The current (ColdFusion Markup Language-based) website was developed and is operated by goHolidays, a webhosting business based in Builth Wells, Powys, near Dr. Myhill's practice. GoHolidays specialises in maintaining and operating holiday accommodation websites. The current website is hosted on the server www.goholidays.net (18.104.22.168), managed by Midwales dot com, which in turn houses its servers in a colocation service provided by RapidSwitch at its Maidenhead data centre. The current doctormyhill.co.uk mail service is also provided by Midwales dot com through another RapidSwitch-hosted server, with the Mail eXchange (MX) record currently pointing to 22.214.171.124.
Our intent is to replace the current (IIS-based) site with a migrated MediaWiki-based one hosted on a Midwales dot com webhosting service. Staying with an existing service provider will enable us to simplify the live cut-over and minimise any risks to service continuity.
- The management of the DNS record will be retained by Midwales dot com.
- Midwales dot com will provision the new shared service.
- I (Terry) will provision the application stack on this new shared service and resync the live content from existing website.
- Midwales dot com will then (at our request) update the DNS A record to point to the new wiki instance and, as this change propagates over the next 24 hours, visitors to the www.DoctorMyhill.co.uk and www.DrMyhill.co.uk sites will connect to the new site.
- Once cut-over has been completed, we can notify goHolidays to close down the legacy site.
- The current pre-production site can either be left in place for this purpose, or the service terminated at its renewal date and the docsarah.co.uk domain allowed to lapse.
The mail service will continue to be provided by Midwales dot com, as at present.
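Because the A record change propagates gradually, it may help to check from time to time which address each hostname currently resolves to. The sketch below assumes Python's standard `socket.gethostbyname` as the resolver; the new wiki's IP address is not recorded on this page, so the placeholder `192.0.2.10` (a reserved documentation address) stands in for it. A result only reflects what the local resolver sees; other resolvers may keep serving the old record until their cached entries expire.

```python
import socket
from typing import Callable, Dict, Iterable


def cutover_status(
    hostnames: Iterable[str],
    new_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Report, per hostname, whether its A record already points at new_ip.

    new_ip is a placeholder: substitute the address of the new wiki host.
    resolve defaults to the system resolver, so answers may be cached.
    """
    status = {}
    for host in hostnames:
        try:
            status[host] = resolve(host) == new_ip
        except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
            status[host] = False
    return status
```

For example, `cutover_status(["www.doctormyhill.co.uk", "www.drmyhill.co.uk"], "192.0.2.10")`, with the real address substituted, returns a True/False flag per hostname; once both are True locally, it is worth allowing the rest of the 24-hour window before asking goHolidays to close the legacy site.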
Page Name Mapping
The titles on the current website (a) do not use a consistent style or capitalisation convention; and (b) often have modification dates etc. appended. We will be adopting a standard titling convention for the new site, so the migration process will map titles as part of the migration. This whole area needs further work because of the lack of consistency in titles. I have divided this into two phases: (a) problems that we will fix now, so that the wiki at go-live reflects the changes (these are now completed); and (b) problems that we will defer to a post-go-live clean-up.
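As a starting point for the title mapping, trailing modification dates could be stripped mechanically before any style rules are applied. The sketch below is hypothetical: the date formats it recognises (e.g. `12/03/2008` or `30th Jan 2009`, optionally preceded by "updated", "revised" or "modified") are assumptions about how the legacy titles are written, and it deliberately leaves capitalisation alone, since that convention is still to be decided.

```python
import re

# Assumed shapes of a trailing modification date, e.g. " - updated 12/03/2008"
# or " (revised 30th Jan 2009)". The real legacy titles may use other forms.
_TRAILING_DATE = re.compile(
    r"""[\s\-:,(]*                                  # separator before the date
        (?:updated|revised|modified)?\s*             # optional wording
        (?:\d{1,2}[/.\-]\d{1,2}[/.\-]\d{2,4}         # numeric date: 12/03/2008
          |\d{1,2}(?:st|nd|rd|th)?\s+[a-z]+\s+\d{4}) # textual date: 30th Jan 2009
        \)?\s*$                                      # optional closing bracket
    """,
    re.IGNORECASE | re.VERBOSE,
)


def normalise_title(raw: str) -> str:
    """Collapse whitespace and drop a trailing date; capitalisation is left alone."""
    title = " ".join(raw.split())
    title = _TRAILING_DATE.sub("", title)
    return title.strip(" -:,")
```

Titles without a recognisable trailing date, such as "Allergy and addiction", pass through unchanged apart from whitespace clean-up.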
To be fixed as part of Migration
To be deferred for later clean-up
- Articles use an inconsistent mix of title case and sentence case conventions (for example "Allergy to Foods, Inhalants & Chemicals - signs and symptoms of" vs. "Allergy and addiction"). Sarah and Hania need to decide on the correct style here.
- Some articles have just a minimal descriptive title (e.g. "Acidity and ulcer disease") and others include a byline (e.g. "Anaemia - not enough blood - symptoms and diagnosis of"). Again the approach here is inconsistent, and this is another policy decision for Sarah and Hania. A related question is how we refer to bylined article titles in other pages: do we use the full title (e.g. "see Anaemia - not enough blood - symptoms and diagnosis of for further details"), or do we drop the byline ("see Anaemia for further details")? I have now added redirections for most of these, but some intra-wiki article references still use the long form. Renaming articles and replacing links to them is an extremely tedious process. I can create an automated process that takes as its input a simple mapping spreadsheet with two columns, "current title" and "new title", and automatically applies the corresponding wiki changes.
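The automated rename process proposed above could be sketched as follows. It assumes the spreadsheet is saved as CSV with the two column headings "current title" and "new title", and it only shows the link-rewriting half: `load_mapping` and `relink` are hypothetical helpers, and the page moves themselves (for example through the MediaWiki API's `action=move`) are not shown. Existing link labels and `#section` anchors are preserved; an unlabelled link simply takes on the new title as its visible text.

```python
import csv
import io
import re
from typing import Dict

# [[Target]], [[Target#Section]], [[Target|label]] or [[Target#Section|label]]
_LINK = re.compile(r"\[\[([^\]|#]+)(#[^\]|]*)?(\|[^\]]*)?\]\]")


def load_mapping(csv_text: str) -> Dict[str, str]:
    """Read the two-column sheet ("current title", "new title") into a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["current title"].strip(): row["new title"].strip() for row in reader}


def relink(wikitext: str, mapping: Dict[str, str]) -> str:
    """Point internal links at their new titles, keeping anchors and labels."""

    def swap(match) -> str:
        # MediaWiki treats underscores and spaces in titles as equivalent.
        target = match.group(1).strip().replace("_", " ")
        new_title = mapping.get(target)
        if new_title is None:
            return match.group(0)  # not being renamed: leave the link untouched
        anchor = match.group(2) or ""
        label = match.group(3) or ""
        return f"[[{new_title}{anchor}{label}]]"

    return _LINK.sub(swap, wikitext)
```

This sketch does not handle MediaWiki's case-insensitive first letter, so the mapping sheet should use titles exactly as they appear in links.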
New Page Import
Thanks to the wonders of some nasty Perl Scripting:
Note that roughly half the imported pages had gremlins in their formatting, due not so much to the conversion process as to limitations in the original HTML. The easiest fix was a quick eyeball review of each batch of pages after upload. I ended up spending perhaps 2-4 minutes on each of about half the pages to clean up the worst formatting errors. I didn't do this for the pages that weren't updated.
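To give a flavour of what the conversion involves, here is a deliberately minimal HTML-to-wikitext converter built on Python's standard `html.parser`. It is an illustrative sketch, not the Perl scripts actually used: it handles only bold, italic, `h2`/`h3` headings, paragraphs, line breaks and absolute external links, and for any other tag it drops the markup but keeps the text. Relative links to other legacy pages would need mapping to internal `[[...]]` links instead, and tables, lists and malformed legacy HTML are exactly the cases where a manual eyeball review is still needed.

```python
import re
from html.parser import HTMLParser

# Wikitext emitted when each supported HTML tag opens / closes.
START = {"b": "'''", "strong": "'''", "i": "''", "em": "''",
         "h2": "\n== ", "h3": "\n=== ", "p": "\n\n", "br": "\n"}
END = {"b": "'''", "strong": "'''", "i": "''", "em": "''",
       "h2": " ==\n", "h3": " ===\n", "p": "\n"}


class WikiConverter(HTMLParser):
    """Collects wikitext while parsing; unsupported tags are dropped, their text kept."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.out = []        # finished wikitext fragments
        self.href = None     # href of the <a> currently open, if any
        self.link_text = []  # label fragments for that link

    def _emit(self, text: str) -> None:
        # Inside an open <a href>, text is buffered to become the link label.
        (self.link_text if self.href is not None else self.out).append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []
        elif tag in START:
            self._emit(START[tag])

    def handle_endtag(self, tag):
        if tag == "a":
            if self.href is not None:
                label = "".join(self.link_text).strip()
                href, self.href = self.href, None
                self.out.append(f"[{href} {label}]" if label else f"[{href}]")
        elif tag in END:
            self._emit(END[tag])

    def handle_data(self, data):
        self._emit(data)


def html_to_wikitext(html: str) -> str:
    parser = WikiConverter()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    return re.sub(r"\n{3,}", "\n\n", text).strip()  # collapse runs of blank lines
```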
Pages In Conflict
Other Activities
New Test Template
I want to introduce a standard transclusion form for all tests so that we can standardise the pro-forma boilerplate on every test page. I had to get my head around how wiki templates work with tables, but I've done this now. It's not perfect, but at least any changes we need to make here apply to the single template.
- Second iteration completed. Awaiting Hania's feedback
Legacy Site URL Mapping
If you Google site:drmyhill.co.uk you will see that some 500 pages of the legacy site are indexed by Google. Likewise, other sites link to articles on this site. We don't want these links to break, so we will need to map the index.cfm and test.cfm URIs to the appropriate wiki articles.
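One way to drive this mapping is a lookup table from legacy script and page ID to wiki title, from which redirect targets can be generated. The sketch assumes the legacy pages are identified by an `id` query parameter and that wiki articles live under a `/wiki/` path; both the parameter name and the base URL are hypothetical and need checking against the real URLs. The resulting targets could then be served as permanent (301) redirects, so that search engines update their indexes rather than dropping the pages.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical values: check both against the real legacy and wiki URLs.
WIKI_BASE = "http://www.doctormyhill.co.uk/wiki/"  # assumed article path
ID_PARAM = "id"                                    # assumed legacy query parameter


def wiki_target(
    legacy_url: str,
    mapping: Dict[Tuple[str, str], str],
    base: str = WIKI_BASE,
) -> Optional[str]:
    """Return the wiki URL for a legacy index.cfm/test.cfm URL, or None if unmapped.

    mapping keys are (script name, page id), e.g. ("index.cfm", "123").
    """
    parts = urlsplit(legacy_url)
    script = parts.path.rsplit("/", 1)[-1].lower()
    # Match the parameter name case-insensitively (ID=, Id=, id= ...).
    query = {k.lower(): v for k, v in parse_qs(parts.query).items()}
    ids = query.get(ID_PARAM)
    if not ids:
        return None
    title = mapping.get((script, ids[0]))
    if title is None:
        return None
    # MediaWiki URLs use underscores for spaces; percent-encode the rest.
    return base + quote(title.replace(" ", "_"))
```

Unmapped URLs return None, which makes it easy to list the legacy pages that still need a target.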
Again, this is a standard set of inputs that must be filled in at least once for any patient and where necessary updated following any material change before ordering further tests / supplements, etc.
- Second iteration completed. Awaiting Hania's feedback
Access Control to Wiki
I will implement the following rules:
- There are four user groups: Sysops (currently Terry and Hania), Bureaucrats (Sysops + any editors designated by Hania), Registered Users and Guests
- Only Sysops can create accounts for Registered Users and elevate Registered Users to Bureaucrat or Sysop. This can be initiated by a user emailing Hania with a request, but approval is up to Hania / Sarah.
- Guests have no edit rights, nor can they or Bots see the Project pages.
- Registered users can only edit Talk pages and User pages, once they have confirmed their email addresses.
- Only Bureaucrats can edit all pages.
- Almost completed. I haven't quite got this right, as guests can still see Project pages, but the edit restrictions seem to work OK. We'll just need to monitor this through Recent Changes for now.
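The rules above can be written down as a small permission model, which may be useful as a checklist when testing the configuration with an account in each group. This is a hypothetical Python sketch, not MediaWiki configuration: the group names follow this page rather than MediaWiki's built-in groups, the project namespace name `DoctorMyhill` is assumed from this page's title, and bots are treated as guests. As far as I know, MediaWiki core restricts reading wiki-wide rather than per namespace, which may be why guests can still see Project pages; per-namespace read restrictions usually need an extension such as Lockdown.

```python
from enum import Enum


class Group(Enum):
    GUEST = "guest"            # anonymous visitors; bots are treated the same way
    REGISTERED = "registered"
    BUREAUCRAT = "bureaucrat"  # editors designated by Hania
    SYSOP = "sysop"            # Terry and Hania; also counted as Bureaucrats


PROJECT_NS = "DoctorMyhill"  # assumed project namespace, taken from this page's title
EDITORS = {Group.BUREAUCRAT, Group.SYSOP}


def is_talk(namespace: str) -> bool:
    """Talk namespaces are 'Talk' or end in ' talk' (e.g. 'User talk')."""
    return namespace == "Talk" or namespace.endswith(" talk")


def can_view(group: Group, namespace: str) -> bool:
    # Guests (and bots) may not read Project pages.
    return not (group is Group.GUEST and namespace == PROJECT_NS)


def can_edit(group: Group, namespace: str, email_confirmed: bool = False) -> bool:
    if group in EDITORS:
        return True  # Bureaucrats (which include Sysops) can edit all pages
    if group is Group.REGISTERED:
        # Talk and User pages only, and only after confirming an email address
        return email_confirmed and (is_talk(namespace) or namespace == "User")
    return False  # Guests have no edit rights


def can_manage_accounts(group: Group) -> bool:
    # Only Sysops create accounts and promote users to Bureaucrat or Sysop.
    return group is Group.SYSOP
```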