DoctorMyhill:Migration/Nov 09 Resync with live site/Completed Tasks
I used my existing Perl scripts to carry out the first conversion pull on 29 Jan 2009 and the current export on 17 Nov 2009. This is completed. Hania and the team will keep any further changes to the live site to a minimum during this next week while we do the migration, and we will handle the resynchronisation of this small number of pages as a separate sweep-up task.
This is based on integrating two sets of data:
- Outputs from ad hoc Perl scripts which perform contextual deltas of the Jan and Nov snapshots of the production site to determine what has changed on this site.
- Outputs from the wiki's Special page queries and from SQL queries run directly against the live www.docsarah.co.uk wiki database, used for batch analysis to determine what has changed and where.
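As a rough illustration only (the actual work was done with ad hoc Perl scripts that are not reproduced here), a snapshot comparison of this kind can be sketched in Python. The sketch assumes each snapshot is available as a mapping of page ID to page text; the function names, the whitespace-insensitive hashing and the sample data are all assumptions made for the example, and a hash comparison is a simpler technique than a contextual delta.

```python
import hashlib


def digest(text: str) -> str:
    """Hash page text after collapsing whitespace, so pure layout changes are ignored."""
    normalised = " ".join(text.split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def snapshot_delta(old: dict, new: dict) -> dict:
    """Compare two {page_id: text} snapshots and list what changed between them."""
    old_ids, new_ids = set(old), set(new)
    common = old_ids & new_ids
    return {
        "new": sorted(new_ids - old_ids),
        "deleted": sorted(old_ids - new_ids),
        "updated": sorted(p for p in common if digest(old[p]) != digest(new[p])),
        "unchanged": sorted(p for p in common if digest(old[p]) == digest(new[p])),
    }


# Made-up sample data, not taken from the real site.
jan = {"p1": "Anxiety text", "p2": "Some text", "p3": "Old duplicate"}
nov = {"p1": "Anxiety  text\n", "p2": "Some text, revised", "p4": "New page"}

delta = snapshot_delta(jan, nov)
for kind, pages in delta.items():
    print(kind, pages)
```

This only says which pages differ, not how; deciding whether a difference is trivial still needs a proper diff or a manual look.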
I am maintaining a summary spreadsheet of this analysis, and we can broadly sentence the pages into four categories:
- The page on the live site is new or updated but the wiki hasn't been changed. In this case I just import the article/test from the snapshot that I've taken, overwriting any existing wiki page. Here I have identified 52 updated articles (though 29 of these have had their titles tweaked), 17 new articles and 19 deleted articles. The new and updated articles need to be loaded into the wiki using the import function and the deleted articles removed from the same.
- The page on the old site hasn't been changed. This could be regarded as two categories as the wiki may or may not have changed. However, this doesn't really matter since in both these cases the wiki is fine as it is, so no change is required. There are 241 articles in this category.
- Both have changed, in which case we need to do an intelligent comparison of the changes. In some cases, the changes on one will be trivial enough to sentence the page into one of the two preceding bullets, but there will almost certainly be some that require a manual merge of changes. It's this last group that will be the pain. See Pages In Conflict.
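The sentencing rule in the bullets above amounts to a small decision table on two flags. A minimal sketch, assuming the two flags have already been worked out per page (the function name and the returned labels are illustrative, not from the spreadsheet):

```python
def sentence_page(live_changed: bool, wiki_changed: bool) -> str:
    """Map the two 'has it changed?' flags onto the action described in the bullets."""
    if not live_changed:
        # Old site untouched: the wiki is fine as it is, whether or not it was edited.
        return "no change required"
    if not wiki_changed:
        # Only the live site moved on: overwrite the wiki page from the snapshot.
        return "import from snapshot"
    # Both sides changed: needs a human comparison and possibly a manual merge.
    return "review for manual merge"


for live in (False, True):
    for wiki in (False, True):
        print(live, wiki, "->", sentence_page(live, wiki))
```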
In terms of test pages, there are 2 deleted tests and 8 new tests. All other tests (56 in all) have changed, mostly as a result of adding an extra charging caveat for the issuing of a GP's letter.
- Completed — 04 Dec 09
Page Name Mapping
The titles on the current website (a) do not use a consistent style or capitalisation convention; and (b) often append modification dates, etc. We will be adopting a standard titling convention for the new site, so the migration process will map old titles to new ones.
This whole area needs further work because of the lack of consistency in titles. However, I want to divide this into two phases: those problems which we will fix now, so that the Wiki at go-live will reflect such changes; and those problems which we will postpone to a post go-live cleanup.
To be fixed as part of Migration
- Some (random) titles use CAPITAL CASE as a means of emphasis (for example "FIBROMYALGIA - Possible Causes and Implications for Treatment"). All capitalised words will be replaced by title case (e.g. "Fibromyalgia" in this case) or lower case as appropriate. Capital case will be retained for accepted abbreviations such as CFS and DHEA.
- Some articles have date stamps embedded in their titles (e.g. "updated April 2008"). All such dates will be removed.
- Some tests have the Lab embedded in their titles (e.g. "ACUMEN"). All such qualifications will be removed. New test categories will be introduced (e.g. "Acumen Test") so that tests are correctly labelled and can be indexed by Lab.
- All titles will be space-trimmed.
- Articles embed other page references in their content. These will be converted to standard Wiki [[reference]] format.
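The case, date-stamp and space-trimming rules above can be sketched as a single title-mapping function. This is an illustration under stated assumptions, not the mapping actually used: the date-stamp pattern is a guess based on the examples on this page, the abbreviation list contains only the two abbreviations named above, and the sketch always picks title case where a human might prefer lower case. Lab names and embedded page references are not handled.

```python
import re

# Abbreviations kept in capitals. Only CFS and DHEA are named in the text;
# a real list would need more entries (EPD, for instance).
ABBREVIATIONS = {"CFS", "DHEA"}

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Assumed shape of a trailing date stamp: "- updated April 2008" or "- December 2008".
DATE_STAMP = re.compile(
    rf"\s*-?\s*(?:updated\s+)?(?:{MONTHS})\s+\d{{4}}\s*$", re.IGNORECASE
)


def fix_case(word: str) -> str:
    """Turn an ALL-CAPS word into Title case unless it is a known abbreviation."""
    letters = re.sub(r"[^A-Za-z]", "", word)
    if len(letters) > 1 and letters.isupper() and letters not in ABBREVIATIONS:
        return word.capitalize()
    return word


def normalise_title(title: str) -> str:
    """Strip a trailing date stamp, fix capitalised words and trim spaces."""
    title = DATE_STAMP.sub("", title)
    return " ".join(fix_case(w) for w in title.split())


print(normalise_title("FIBROMYALGIA - Possible Causes and Implications for Treatment"))
print(normalise_title("Arteriosclerosis - the symptoms of - updated November 2008"))
print(normalise_title("  CFS - The Central Cause: Mitochondrial Failure "))
```

Any abbreviation missing from the list is lowercased by this sketch, so the output would still need checking by eye before it is used for renames.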
Subjects and Categories
Articles in the old site were labelled with subjects and categories. These two label sets map onto Wiki categories with the same names. In the period since the last import, the following categories have been added:
- Lifestyle for Health and Fitness
- Your very good health
The following pages were deleted from the production site and this has been reflected in the Wiki:
- Allergy and Elimination Dieting - when the diet fails
- Angina - symptoms and causes of
- Arteriosclerosis - the symptoms of - updated November 2008
- Autoimmune disorders - diagnosis
- Autoimmunity - an introduction
- DHEA - an important adrenal hormone
- Dybiosis (bacterial) - gut sterilisation routine
- Dysbiosis - the herbal and drug treatments to do with the diet
- Dysbiosis - think of this if elimination dieting fails
- Hypoglycaemia - low blood sugar - a major problem for many people - updated September 2007
- HYPOGLYCAEMIA - Not just about diet! - December 2008
- Infections - when medical treatment is required
- Lyme Disease (borreliosis) and CFS - the practical aspects
- Practitioner list - how to find a doctor who will apply the environmental approach to your problem
- Prescribing cortisone if the ASP is abnormal - how to do this safely
- StoneAge Diet - more reasons why we ALL should eat it
- Stoneage Diet - principles of - updated November 2007
- The Methylation Cycle
- Timetable of supplements for oral chelation of heavy metals
This was tedious!!
In resolving changed names I also came across and deleted a duplicated article with a slight variant name:
- Viral Infections - how to avoid and treat them
Subsequent to this, I've extended my Perl scripts to use the LWP interface plus HTML form parsing to automate:
- Page Deletion
- Page Renaming
- Page Updating
I have also semi-automated most of these reconciliation / renaming / deletion tasks by doing most of the analysis in the spreadsheet, then pasting the relevant columns into a Perl script __DATA__ block to action. I then also used this to delete all dead tests.
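The "paste spreadsheet columns into the script and action them" pattern can be sketched as follows. The original used Perl with a __DATA__ block and LWP form submission; this Python version is only an illustration of the same idea. The column layout (action, current title, new title), the sample rows and the handler functions are all hypothetical, and the handlers here are stubs that make no wiki requests.

```python
# Stand-in for the Perl __DATA__ block: tab-separated rows pasted from the spreadsheet.
# Assumed columns: action, current title, new title (renames only). Sample rows are made up.
ACTIONS = """\
rename\tOld Title - updated April 2008\tOld Title
delete\tSome Dead Test\t
update\tAnother Article\t
"""


def parse_actions(block):
    """Turn the pasted block into (action, title, new_title) tuples."""
    rows = []
    for line in block.splitlines():
        if not line.strip():
            continue
        fields = [f.strip() for f in line.split("\t")]
        fields += [""] * (3 - len(fields))  # pad rows with missing columns
        action, title, new_title = fields[:3]
        rows.append((action, title, new_title or None))
    return rows


def run(rows, handlers, dry_run=True):
    """Dispatch each row to its handler; in dry-run mode only report what would happen."""
    log = []
    for action, title, new_title in rows:
        if action not in handlers:
            raise ValueError(f"unknown action: {action!r}")
        log.append(f"{action}: {title}" + (f" -> {new_title}" if new_title else ""))
        if not dry_run:
            handlers[action](title, new_title)
    return log


def stub(title, new_title):
    """Hypothetical placeholder; a real handler would submit the wiki's form."""


rows = parse_actions(ACTIONS)
log = run(rows, {"rename": stub, "delete": stub, "update": stub})
print("\n".join(log))
```

Defaulting to a dry run means the list of planned deletions and renames can be checked against the spreadsheet before anything is changed on the wiki.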
Pages In Conflict
We have identified fifteen articles in this category. Based on Hania's initial analysis these can be categorised as follows:
- Use the current website version. I will basically treat this case as standard imports.
  - Anxiety - diagnosis and treatment (ID 173)
  - Autoimmune diseases - the environmental approach to treating (ID 219)
  - Blood clotting problems (ID 168)
  - Cancer - the principles of prevention and treatment (ID 380)
  - CFS - The Central Cause: Mitochondrial Failure (ID 381)
  - Detoxing - Far Infra-Red Sauna (FIRS) (ID 368)
  - Stone Age Diet - this is a diet which we all should follow (ID 412 -- Note 257 is an obsolete duplicate)
- Use the wiki version. I will basically treat this case as no update needed.
  - Cow's milk allergy - a common cause of problems in children and adults
  - Dizzy spells - a common complaint with many possible causes (already updated to be in sync by Hania)
  - EPD - the practical details of what to do for each dose
  - Natural Family Planning
  - Osteoporosis - Practical Nutritional Considerations
  - MEDICAL QUESTIONNAIRE (I've updated this manually)
  - Mitochondrial Function Profile test - practical information for non-UK residents (I've updated this manually)
  - Enzyme Potentiated Desensitisation (EPD) - how it works (had the "(EPD)" added for consistency with the rest of the site)
Where existing pages had stale titles, I renamed them to the new title before doing any upload of the updated pages, so that article histories work correctly.
All pages renamed as per spreadsheet analysis. I've got a copy of the spreadsheet, but I can't be bothered to re-summarise it here.
All articles and tests in the legacy production site have now been uploaded into the new wiki. I ended up using semi-automated scripts as above, but doing these uploads in batches of 10 and using the Recent Changes view to index into and review the uploaded pages. I then tweaked the wiki formatting in perhaps 50% of cases to clean up bad layout carried over from the old HTML. Use Article Histories to review where necessary.
- As of ~ 17:00 on Thurs 03 Dec, the new wiki is now aligned (content-wise) to the legacy website as at my extract on 17th Nov.
Migration to New Website
I dumped the database and wiki directory and copied them to the new arwystli.net server, then unpacked the tarball into the wiki directory and loaded up the database. I then had to tweak the .htaccess file and the site-specific logon credentials before the website was live at ~ 21:00.