Remote Synthesis
Jul 08, 2007

Fixing Case-related Issues When Moving Off Windows

I was recently tasked with moving a rather extensive site off a legacy Windows server and onto one of our standard Unix servers (the steps below should work for Linux as well). The person previously assigned to this had been manually correcting the casing issues in each file: the case of the file name itself, the case of any links within the file, and the case of links within content entries in the custom CMS the site used. That sounded like a tedious job indeed, and when I inherited it, my first thought was how I could possibly avoid doing it.

The answer turned out to be a combination of a recursive function that renames all the files on the server to lowercase and some help from Apache's mod_rewrite module. This lets us leave all the links in the documents and in the CMS alone (and not worry about anyone's old favorites), and everything should just work. We moved the site several days ago and are still testing, but so far everything is looking good, and it was only an hour or so of work. Here's how it was done.

Using mod_rewrite
My first thought was that I could use Apache's mod_rewrite module to lowercase all the URLs before they ever hit the file system (make sure the module is enabled first). There was quite a bit of help available on this topic, and my code is a modified version of someone's post somewhere (sorry, I lost the post). Basically, it creates a rewrite map that uses a built-in function called "tolower" and then uses a regular expression to match the entire URL path and pass it through that map to convert it to lowercase.

<VirtualHost *:80>
    DocumentRoot "C:/Inetpub/wwwroot/xbox"
    ServerName xboxsampleapp
    RewriteEngine on
    RewriteMap upper2lower int:tolower
    RewriteRule ^/(.*)$ /${upper2lower:$1}
</VirtualHost>
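To make the effect of that rule concrete, here is a minimal Python sketch of the same transformation. It is only an illustration of what the regular expression and the tolower map do to a request path, not how Apache implements it; the function name lowercase_path is made up for this example.

```python
import re


def lowercase_path(url_path: str) -> str:
    """Mimic RewriteRule ^/(.*)$ /${upper2lower:$1} on a URL path.

    Everything after the leading slash is captured and lowercased,
    which is what the upper2lower map does with $1.
    """
    return re.sub(r"^/(.*)$", lambda m: "/" + m.group(1).lower(), url_path)


print(lowercase_path("/Products/XBox/Index.CFM"))  # prints /products/xbox/index.cfm
```

Note that the sketch only handles the path. As far as I know, a RewriteRule pattern is matched against the URL path alone, so the query string is not lowercased by this rule.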

If you were to test your site now and look in the Apache logs, you should see that it is looking for lowercased versions of your folders and files.

Changing the File System Files and Directories
The next step was to make all the files lowercase. This turned out to be a simple recursive CF function that traverses and renames every file and directory on the server. I used the Java File object for speed (we had at least hundreds of files and directories, and this ran in under 30 seconds). I also initially made the mistake of not building in a special case for Application.cfm, which needs the capital A on Unix/Linux. Lastly, I was actually running this on a Windows machine (yes, I know...but it is my work dev machine), which wouldn't allow a rename that only changes the case, since it saw the new name as the same file. That is why the function renames everything twice: first to a temporary name with a "pre_" prefix, then to the final lowercase name.

<cffunction name="lowerAllFiles" access="public" output="true" returntype="void">
    <cfargument name="baseDirectory" type="string" required="true" />
    <cfset var objFile = createObject("java","java.io.File").Init(arguments.baseDirectory) />
    <cfset var allFiles = objFile.listFiles() />
    <cfset var i = 0 />
    <cfset var newFileWithPre = "" />
    <cfset var newFile = false />
    <cfset var objSystem = createObject("java","java.lang.System") />
    <cfset var separator = objSystem.getProperty("file.separator") />
    <!--- trap for null returned by allFiles --->
    <cfif not isDefined("allFiles")>
        <cfset allFiles = arrayNew(1) />
    </cfif>
    <cfloop from="1" to="#arrayLen(allFiles)#" index="i">
        <cfif allFiles[i].isDirectory()>
            <cfset lowerAllFiles(allFiles[i].getAbsolutePath()) />
            <!--- the file is renamed twice because on Windows the destination is viewed as already existing --->
            <cfdirectory action="rename" directory="#allFiles[i].getAbsolutePath()#" newdirectory="pre_#lcase(allFiles[i].getName())#" />
            <cfset newFileWithPre = allFiles[i].getParent() & separator & "pre_" & lcase(allFiles[i].getName()) />
            <cfset newFile = allFiles[i].getParent() & separator & lcase(allFiles[i].getName()) />
            <cfdirectory action="rename" directory="#newFileWithPre#" newdirectory="#newFile#" />
        <cfelseif lcase(allFiles[i].getName()) eq "application.cfm">
            <!--- the file is renamed twice because on Windows the destination is viewed as already existing --->
            <cffile action="rename" source="#allFiles[i].getAbsolutePath()#" destination="pre_Application.cfm" />
            <cfset newFileWithPre = allFiles[i].getParent() & separator & "pre_Application.cfm" />
            <cfset newFile = allFiles[i].getParent() & separator & "Application.cfm" />
            <cffile action="rename" source="#newFileWithPre#" destination="#newFile#" />
        <cfelse>
            <!--- the file is renamed twice because on Windows the destination is viewed as already existing --->
            <cffile action="rename" source="#allFiles[i].getAbsolutePath()#" destination="pre_#lcase(allFiles[i].getName())#" />
            <cfset newFileWithPre = allFiles[i].getParent() & separator & "pre_" & lcase(allFiles[i].getName()) />
            <cfset newFile = allFiles[i].getParent() & separator & lcase(allFiles[i].getName()) />
            <cffile action="rename" source="#newFileWithPre#" destination="#newFile#" />
        </cfif>
    </cfloop>
</cffunction>

<cfsetting requesttimeout="100000">
<cfset lowerAllFiles(expandPath("/DirectoryRewrite")) />
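For readers who don't have ColdFusion handy, the same approach can be sketched in Python. This is a rough equivalent under a few assumptions, not a port of the function above: the names PRESERVED, target_name and lower_all_files are invented for the example, the temporary name uses a random suffix instead of a plain "pre_" prefix, and the list of case-sensitive file names is extended with Application.cfc and OnRequestEnd.cfm, which the original function does not handle.

```python
import os
import uuid

# File names that must keep their exact case. Assumption: the post only
# special-cases Application.cfm; the other two are added for illustration.
PRESERVED = {
    name.lower(): name
    for name in ("Application.cfm", "Application.cfc", "OnRequestEnd.cfm")
}


def target_name(name: str) -> str:
    """Return the name an entry should end up with."""
    return PRESERVED.get(name.lower(), name.lower())


def lower_all_files(base_directory: str) -> int:
    """Rename every file and directory below base_directory.

    Returns the number of entries that were renamed.
    """
    renamed = 0
    # topdown=False yields a directory only after everything beneath it,
    # so a directory is renamed after its contents have been handled.
    for parent, dirs, files in os.walk(base_directory, topdown=False):
        for name in dirs + files:
            new_name = target_name(name)
            if new_name == name:
                continue
            source = os.path.join(parent, name)
            dest = os.path.join(parent, new_name)
            # Two-step rename: a case-insensitive file system treats
            # "Foo" and "foo" as the same entry, so go via a temp name.
            temp = os.path.join(parent, f"pre_{uuid.uuid4().hex}_{new_name}")
            os.rename(source, temp)
            if os.path.exists(dest):
                # A different entry already owns the target name
                # (possible on case-sensitive systems): undo and skip.
                os.rename(temp, source)
                continue
            os.rename(temp, dest)
            renamed += 1
    return renamed
```

Calling lower_all_files on the site root would play the role of the lowerAllFiles(expandPath(...)) call above. The collision check is an addition of the sketch: it only matters if the tree already contains two names that differ by case alone, which cannot happen on a tree that has only ever lived on Windows.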

Conclusion
As you can see, I set the RequestTimeout really high in anticipation that this would take a good deal of time to run, but, to my surprise, it actually ran quite fast (on a Winblows laptop no less). Anyway, I will post updates if we run into any issues I didn't cover here, but I went through a long list of random pages and everything seemed to be working nicely. Hopefully this will help someone else when making this transition.

Comments
Scott P
You can also turn CheckSpelling on for Apache.

What mod_speling actually does is it attempts to locate files that were spelled improperly, and in doing so, checks for upper- and lower-case equivalents of files.


Also seems relevant - I put a post up about parsing through code and changing cfincludes to lowercase:

http://www.scottpinkston.org/blog/index.cfm/2007/4/3/Converting-cfincludes-to-lowercase-filenames-using-CF


Sean Corfield
If only those people had followed a coding standard :)

http://livedocs.adobe.com/wtg/public/coding_standards/style.html

"In general, follow our existing file naming conventions for files: all URL-accessible filenames shall be lowercase, with words optionally separated by underscores (determined by readability). Filenames must never contain spaces! Files whose names are not URL-accessible should generally be lowercase for consistency but we allow more leeway in this situation.

Note: Application.cfc, Application.cfm and OnRequestEnd.cfm are the only exceptions to the lowercase filename rule for URL-accessible files and must have exactly the case shown! The Mach II framework files are mixed case (and are not URL-accessible) - when referencing those files (as type names), you must use the same exact case as the filename."
[soapbox]
This is one of the biggest beefs I have with Windows developers: they're generally very lax about referring to components and files using the correct case. My other big beef is when they have lots of code that assumes all the world uses "\" as a path separator! Instead, they should use "/" because it works on all platforms (even Windows!) and it's trivial to make filepaths canonical with a quick replace() call.
[/soapbox]

Note that CF automatically looks for lowercase filenames (as well as the exact case match) so converting your CFCs and include files to lowercase should get most of your code working (which I think is what you're implying here). The remaining problem is case sensitive URLs - and your solution with tolower is really elegant! Thank you for sharing that!

@Scott, do you have a sense of how much overhead mod_speling adds to requests?


Brian Rinaldi
@Scott - thanks for the comment. I had not heard of mod_speling; I will look into it.

@Sean - well, I failed to mention why this particular site was on a legacy Windows box in the first place: it is still running on CF5 (though on Unix it will run on 7). I don't remember offhand what platforms CF ran on then, but I do remember that it was predominantly a Windows solution at the time - so I forgive them for not considering the cross-platform issues (and yes, the site desperately needs a rewrite). However, I would agree that generally your criticism stands - I know I was lazy about this when I was a Windows-only developer.

You do bring up another good point, and that is that the function should probably be written to accommodate OnRequestEnd.cfm and Application.cfc. This wasn't really an applicable issue in this case, so I didn't do it. However, it is obviously an easy change to the function.


Scott P
@Sean - not really but I'm going to check it. I'm guessing from your comment that it is a serious impact.


Sean Corfield
@Scott, I don't know how much overhead it is but I *suspect* it is non-trivial since Apache would have to perform a directory listing and perform case insensitive comparisons against each file. From the mod_speling documentation page:

"It does its work by comparing each document name in the requested directory against the requested document name without regard to case, and allowing up to one misspelling (character insertion / omission / transposition or wrong character). A list is built with all document names which were matched using this strategy."

Note that if more than one document with a close match was found, then the list of the matches is returned to the client, and the client can select the correct candidate (so the client gets a partial directory listing!).

According to the docs, you can limit it to CheckCaseOnly but it's not clear how much that reduces the overhead. Under CheckSpelling, it says this:

"The directory scan which is necessary for the spelling correction will have an impact on the server's performance when many spelling corrections have to be performed at the same time."

