Migrating from WordPress.com

For a variety of technical reasons (more control over the domain, ability to run entirely custom PHP, potential future tech projects), I recently moved this blog from WordPress.com to its current home, a “self-hosted” WordPress installation on a Bluehost server. While the process was fairly straightforward and instructions are available, there are some “gotchas” with workarounds that bear explaining. This article gets fairly technical, so read on if you’re so inclined. I’d like to emphasize that I’m not a WordPress guru by any stretch, so I’m just describing what worked for me.

Splitting WordPress Items

As explained in the standard tutorials, the easiest way to transfer your posts, pages, etc. to a new WordPress installation is to use the built-in exporter. The problem I ran across is that the exported file was a bit too large. When I imported it into my new blog (selecting “Download & import file attachments”), the import process would time out. At first, I attempted to “solve” this by repeatedly running the import process, as it seemed to resume where it had been cut off. However, I soon realized that there were odd errors involving the attachments (i.e. images) that were being processed at the time each intermediate import aborted.

My solution was to split the exported .xml file and then upload it piecemeal. There are other sites that describe this process, but they mostly focus on the 2MB upload limit, not the PHP execution time limit that was frustrating my attachment download attempts. For that reason, I wrote my own quick Python script to break the file into manageable pieces (for me, 90 items per file).

The exported .xml file has three major sections. At the top is what I’ve come to think of as the preamble (or header), the bit before the first <item> tag. It contains blog title information, categories, etc. At the very end of the file is the footer, a few closing tags. Between are the <item> blocks that describe each post, page, image, etc. in your blog. The key to splitting up the .xml file is that each sub-file must contain the preamble, some items, and the footer. My script, below, requires Python 2.6 or later and takes one argument, the path to the .xml file to be split. Change the stride variable to alter the number of items per output file.

import sys
import os
import re

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        data = f.read()

    # Preamble: everything before the first <item>.
    (header, _, rest) = data.partition('<item>')
    # Footer: everything after the last </item>.
    (body, _, footer) = rest.rpartition('</item>')
    # Allow any whitespace between one item and the next.
    items = re.split(r'</item>\s*<item>', body)

    # Number of items per output file.
    stride = 90

    (filename, ext) = os.path.splitext(sys.argv[1])

    for offset in range(0, len(items), stride):
        with open(filename + '-' + str(offset) + ext, 'w') as f:
            f.write(header)
            for item in items[offset:offset+stride]:
                f.write('<item>' + item + '</item>\n')
            f.write(footer)
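
The preamble/items/footer idea can also be tried on a small in-memory sample before touching a real export. The following is only a sketch: the function name split_wxr and the sample XML are invented for illustration, and it assumes that nothing but whitespace sits between one item and the next.

```python
import re


def split_wxr(data, stride):
    """Split export XML text into chunks of at most `stride` items.

    Each chunk repeats the preamble and footer so it can be imported alone.
    """
    header, _, rest = data.partition('<item>')
    body, _, footer = rest.rpartition('</item>')
    # Tolerate any whitespace between one item's end and the next one's start.
    items = re.split(r'</item>\s*<item>', body)
    chunks = []
    for offset in range(0, len(items), stride):
        group = items[offset:offset + stride]
        middle = '\n'.join('<item>' + item + '</item>' for item in group)
        chunks.append(header + middle + footer)
    return chunks


# A made-up, heavily simplified export with three items.
sample = (
    '<rss><channel><title>Blog</title>\n'
    '<item><title>A</title></item>\n'
    '<item><title>B</title></item>\n'
    '<item><title>C</title></item>\n'
    '</channel></rss>\n'
)

chunks = split_wxr(sample, 2)
for number, chunk in enumerate(chunks):
    print(number, chunk.count('<item>'))
```

With a stride of 2, the three sample items end up in two chunks, each of which starts with the preamble and ends with the footer.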

Post Parents

In WordPress, posts (i.e. all objects) can have “parents”. This is particularly useful for image attachments, which are often “children” of an individual post. If that post contains a [gallery] shortcode, WordPress will render a lovely clickable array of thumbnail images. For whatever reason, although post parent data is in the .xml export file, WordPress didn’t seem to import it. The following Python script parses the .xml file and generates a file containing a list of SQL commands. When run in the context of your WordPress site’s database (for instance, using phpMyAdmin, which is built into the Bluehost cPanel), these commands will reestablish the correct parental relationships between your posts, and hopefully fix any broken galleries. The first argument is the path to the .xml file; the second is the path to the output SQL file.

import sys
import re

# Posts table and columns in the standard WordPress schema; change the
# wp_ prefix if your installation uses a different one.
tablename = '`wp_posts`'

if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        data = f.read()

    (header, _, rest) = data.partition('<item>')
    (body, _, footer) = rest.rpartition('</item>')
    # Allow any whitespace between one item and the next.
    items = re.split(r'</item>\s*<item>', body)

    with open(sys.argv[2], 'w') as f:

        for item in items:
            post_id = re.search(r'<wp:post_id>([0-9]+)', item)
            parent = re.search(r'<wp:post_parent>([0-9]+)', item)
            # Skip items that lack either tag instead of crashing.
            if post_id is None or parent is None:
                continue

            f.write('UPDATE ' + tablename + ' SET `post_parent`=' + parent.group(1) +
                    ' WHERE `ID`=' + post_id.group(1) + ';\n')
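
To preview the kind of SQL this produces without running it against a real export, here is a small sketch. It assumes the default wp_posts table with its ID and post_parent columns. The function name parent_updates and the sample items are invented, and skipping items whose parent is 0 (WordPress’s value for “no parent”) is my own simplification rather than something the script above does.

```python
import re


def parent_updates(items, tablename='wp_posts'):
    """Return one UPDATE statement per item that has a non-zero parent."""
    statements = []
    for item in items:
        post_id = re.search(r'<wp:post_id>([0-9]+)', item)
        parent = re.search(r'<wp:post_parent>([0-9]+)', item)
        if post_id is None or parent is None:
            continue  # not a complete item
        if int(parent.group(1)) == 0:
            continue  # assumption: 0 means "no parent", so nothing to restore
        statements.append(
            'UPDATE `%s` SET `post_parent`=%s WHERE `ID`=%s;'
            % (tablename, parent.group(1), post_id.group(1))
        )
    return statements


# Invented sample: post 10 is top-level, attachment 11 belongs to post 10.
sample_items = [
    '<wp:post_id>10</wp:post_id><wp:post_parent>0</wp:post_parent>',
    '<wp:post_id>11</wp:post_id><wp:post_parent>10</wp:post_parent>',
]

statements = parent_updates(sample_items)
for statement in statements:
    print(statement)
```

Only the attachment yields a statement here. As with the full script, this relies on the imported posts keeping the same IDs they had in the export.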

Image Sizes

Of course, you’re going to need to use some kind of tool (I recommend John Godley’s “Search Regex” plugin) to fix links and <img> tags throughout your posts. There are plenty of pages that describe how regexes work, and my goal isn’t to duplicate that here. But one “gotcha” when moving away from WordPress.com has to do with image size. WordPress.com has a nice image server which interprets URLs like http://blogname.files.wordpress.com/2014/01/file.name?w=300. It seems the files.wordpress.com server efficiently serves up the file, resized to the width specified, saving your users from downloading a huge version of an image that will appear smaller on their screen. While WordPress will by default make a few different standard-size images from each upload, those sizes are not easily changed on a per-image basis, don’t allow modifying the size from within the post, and don’t have immediately obvious filenames.
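
As a rough illustration of the kind of substitution involved, the Python sketch below rewrites an old image URL to a new host while keeping the width parameter. The hostnames are placeholders, the wp-content/uploads path assumes WordPress’s default uploads layout, and the function name rewrite_image_urls is invented; it is not what the Search Regex plugin does internally.

```python
import re

# Hypothetical hosts for illustration; substitute your own.
OLD_HOST = 'blogname.files.wordpress.com'
NEW_BASE = 'www.example.com/wp-content/uploads'

# Matches an old image URL and captures its path and optional ?w= width.
OLD_IMAGE = re.compile(
    r'https?://' + re.escape(OLD_HOST) + r'(/[^\s"?]+)(\?w=[0-9]+)?'
)


def rewrite_image_urls(html):
    """Point old files.wordpress.com URLs at the new host, keeping ?w=."""
    def replace(match):
        path, width = match.group(1), match.group(2) or ''
        return 'http://' + NEW_BASE + path + width
    return OLD_IMAGE.sub(replace, html)


old = '<img src="http://blogname.files.wordpress.com/2014/01/file.jpg?w=300" />'
new = rewrite_image_urls(old)
print(new)
```

The w parameter survives the rewrite, but a stock self-hosted server simply ignores it and sends the full-size file.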

To get around this, I created a quick PHP script which serves up a resized version of any image given the desired width. The script should be straightforward, but a few things bear noting. First, I have this in a subdirectory of my webserver, so make sure to modify the require_once line with the correct relative path to wp-load.php. Because it uses WordPress’s internal image resizing technique, it doesn’t need any additional libraries. Second, the Expires header sent by dump_file ensures that the resized images are cached by clients and proxies (for 30 days, as written). This is to prevent the default behavior of interpreting the width parameter as a form submit and therefore not caching. Finally, notice that this script only resizes an image the first time it is requested at a given width; after that, the resized image is saved for future use. If you play around requesting lots of different sizes, it might be worth deleting the extras when you’re done.

<?php

function mc_generate_filename( $oldfname, $width ) {
	$suffix = 'mc'.$width.'w';

	$info = pathinfo( $oldfname );
	$dir  = $info['dirname'];
	$ext  = $info['extension'];
	$name = $info['filename'];

	return $dir . "/{$name}-{$suffix}.{$ext}";
}

function dump_file( $fname ) {

    if( !file_exists( $fname ) ) {
        header('Content-type: text/plain');
        echo 'Error - file does not exist: '.$fname."\n";
        die();
    }

    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    $mime = finfo_file($finfo, $fname);
    finfo_close($finfo);

    header('Content-type: '.$mime);
    header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', time() + 2592000));
    readfile( $fname );
}

$w = isset($_GET['w']) ? (int) $_GET['w'] : 0; // force an integer width
$origname = $_SERVER['DOCUMENT_ROOT'].explode('?',$_SERVER['REQUEST_URI'])[0];
$reszname = mc_generate_filename( $origname, $w );

if( $w > 0 && !file_exists($reszname) )
{
    require_once('../wp-load.php');

    $img = wp_get_image_editor( $origname );
    if ( ! is_wp_error( $img ) ) {
        // resize() returns true or a WP_Error, never FALSE.
        $resize = $img->resize( $w, null );
        if ( ! is_wp_error( $resize ) ) {
            $img->save($reszname);
        }
    }
}

// Fall back to the original image if no resized copy could be made.
dump_file( file_exists($reszname) ? $reszname : $origname );
?>

In order for this script to function, you will want to modify your .htaccess file to route image requests that include the w GET parameter to it. In the image RewriteRule (line 4 below), replace mcwp/mcimage.php with the path to your copy of the PHP above.

. . .
RewriteRule ^index\.php$ - [L]
RewriteCond %{QUERY_STRING} ^w=[0-9]*$
RewriteRule (\.jpg|\.jpeg|\.gif|\.png)$ mcwp/mcimage.php [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
. . .
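
To get a feel for which requests the RewriteCond/RewriteRule pair above would catch, the two patterns can be tried out in Python. This is only an approximation of mod_rewrite (it assumes the default case-sensitive matching and ignores everything else in the rule set); the function name goes_to_resizer and the example URLs are invented.

```python
import re
from urllib.parse import urlsplit

# The same two patterns used in the .htaccess excerpt above.
QUERY_COND = re.compile(r'^w=[0-9]*$')
IMAGE_RULE = re.compile(r'(\.jpg|\.jpeg|\.gif|\.png)$')


def goes_to_resizer(url):
    """Approximate whether the RewriteCond/RewriteRule pair would fire."""
    parts = urlsplit(url)
    return bool(QUERY_COND.search(parts.query) and IMAGE_RULE.search(parts.path))


examples = [
    'http://www.example.com/wp-content/uploads/2014/01/file.jpg?w=300',
    'http://www.example.com/wp-content/uploads/2014/01/file.jpg',
    'http://www.example.com/wp-content/uploads/2014/01/file.JPG?w=300',
    'http://www.example.com/wp-content/uploads/2014/01/file.jpg?w=300&h=200',
]
results = [goes_to_resizer(url) for url in examples]
for url, result in zip(examples, results):
    print(result, url)
```

Under these assumptions only the first URL reaches the resizing script. Uppercase extensions and URLs with extra query parameters fall through to the normal rules and are served at full size.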

If, after implementing this, you don’t want all of the resized images that WordPress automatically generates bloating your server, you can remove them.
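
Before deleting anything, it may help to list the candidates first. The sketch below walks a directory and reports files whose names look like WordPress’s automatic thumbnails (name-150x150.jpg) or like the copies written by mc_generate_filename above (name-mc300w.jpg). The function name find_generated_images is invented, and the pattern is a guess based on filenames alone, so an original upload that happens to end in something like -800x600 would also be listed. Review the output before removing files.

```python
import os
import re
import tempfile

# WordPress-style thumbnails (name-150x150.jpg) and the resizer's copies
# (name-mc300w.jpg, following mc_generate_filename above).
GENERATED = re.compile(r'-(?:[0-9]+x[0-9]+|mc[0-9]+w)\.(?:jpe?g|gif|png)$')


def find_generated_images(root):
    """Return sorted paths under root whose names look auto-generated."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if GENERATED.search(name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Demonstration on a throwaway directory with made-up filenames.
demo_root = tempfile.mkdtemp()
for name in ['photo.jpg', 'photo-150x150.jpg', 'photo-mc300w.jpg']:
    open(os.path.join(demo_root, name), 'w').close()

found = [os.path.basename(path) for path in find_generated_images(demo_root)]
print(found)
```

On a real site, you would point find_generated_images at your uploads directory instead of the temporary one used here.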

Redirect Detection

Once your new blog is set up, you’ll probably want to redirect visitors from your old blog. WordPress.com offers a Site Redirect feature which redirects requests for your old blog (including the target URI) to your new blog (or a URL of your choice). The redirection is accomplished using a (search engine friendly) HTTP 301 “Moved Permanently” response, which in theory means that intelligent search engines will update their records and viewers will be seamlessly redirected.

However, there’s not an easy way for your new site to determine whether a given viewer was redirected, and thereby tell them to update their bookmarks, etc. One option is to show an “update your bookmarks” notice to everyone for a while, but that might grate on the eyes. My solution was to create an intermediate redirection page, a simple PHP script which creates a cookie and then, using a second 301 redirect, passes the user to the actual blog. This also allows you to modify the URL between redirections if, for example, you are taking this opportunity to change your permalink structure.

First, create a new subdomain. I called mine redirect.oxfordechoes.com. On Bluehost, this is done through the cPanel’s aptly named “Subdomains” tool. To make life simpler, point the subdomain to your main public_html directory, not a new subdirectory (as is the default).

Second, you need the PHP script to create the cookie and generate the redirect. Notice that the cookie is valid on '.oxfordechoes.com', so the main blog can see and react to it. The preg_replace call updates the permalink structure (from date-based to title-only); if you’re not making any changes, remove it and use $request_uri instead of $redir_uri in the Location header. Obviously, change the hostname in the setcookie and Location lines as appropriate.

<?php

$request_uri = $_SERVER['REQUEST_URI'];
$redir_uri = preg_replace('!^/[0-9]{4}/[0-9]{2}/[0-9]{2}/([^/]+)/!',
                          '/$1/', $request_uri);

setcookie('MCREDIRECT', 'true', 0, '/', '.oxfordechoes.com');
header('HTTP/1.1 301 Moved Permanently');
header('Location: http://www.oxfordechoes.com'.$redir_uri);
?>
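
If you adapt the permalink pattern, it is worth trying it on a few sample URIs before going live. Here is the same substitution sketched in Python; the function name new_uri and the sample URIs are made up for illustration.

```python
import re

# Same pattern as the preg_replace call above, minus the '!' delimiters.
DATE_PERMALINK = re.compile(r'^/[0-9]{4}/[0-9]{2}/[0-9]{2}/([^/]+)/')


def new_uri(request_uri):
    """Drop a leading /YYYY/MM/DD/ from a permalink; leave other URIs alone."""
    return DATE_PERMALINK.sub(r'/\1/', request_uri)


samples = ['/2014/02/12/migrating-from-wordpress-com/', '/about/', '/2014/02/']
converted = [new_uri(uri) for uri in samples]
for before, after in zip(samples, converted):
    print(before, '->', after)
```

Only the first sample is rewritten; pages and date archives that don’t match the full /YYYY/MM/DD/title/ shape pass through unchanged.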

Third, you need to ensure requests to redirect.yourhostname.com arrive at that PHP script. Add an equivalent of the RewriteCond/RewriteRule pair for the redirect host (lines 4-5 below) to your .htaccess file:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^redirect\.oxfordechoes\.com$
RewriteRule !^redirect\.php$ /redirect.php [L]
RewriteRule ^index\.php$ - [L]
RewriteCond %{QUERY_STRING} ^w=[0-9]*$
RewriteRule (\.jpg|\.jpeg|\.gif|\.png)$ mcwp/mcimage.php [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

Finally, you need to take advantage of the newly set cookie to display an appropriate message on your website. I used this somewhat hectic code within a PHP Code widget, which works well. You can visit this blog’s old address to see it in action. Line 2 ensures the message is only displayed if the cookie is set, while lines 3-4 fade the message away and un-set the cookie when the user acknowledges the message.

<div class="mcredirect-notification"
style="display: <?php echo !empty($_COOKIE['MCREDIRECT']) ? 'block' : 'none'; ?>;"
onclick="jQuery(this).fadeOut();
document.cookie = 'MCREDIRECT=;path=/;domain=.oxfordechoes.com;expires=Thu, 01 Jan 1970 00:00:01 GMT;';">
    <table>
      <tr><td>Update Your Bookmarks!</td></tr>
      <tr><td>This blog has moved to:<br />
      <strong>www.oxfordechoes.com</strong><br />
      Please update your bookmarks.</td></tr>
      <tr><td>(Click to dismiss.)</td></tr>
    </table>
</div>

Posted 12 Feb 2014 by John McManigle in Technical
