Add Character Sets to CFPOP With JCharset

This is a follow-up post to Fixing CFPOP.

If you’re using the CFPOP tag to handle any volume of email, you will eventually come across character encoding issues. It’s only a matter of time! These limitations tend to be with the underlying JVM and not with ColdFusion itself (which is unusual). The good news is that the fix is quite easy.

Grab JCharset and place the extracted .jar file in one of the following spots in your CF install path:

CF8: /runtime/lib/
CF10: /jre/lib/ext/

Restart CF.

So… is it working?

In order to find out if you have JCharset in the right place, download the “CharsetTB” script from
http://www.sustainablegis.com/projects/i18n/charsetTB.cfm.

Once the CF service is restarted this file will list all of the character sets ColdFusion has access to. If you can find “UTF-7” in this listing, good news: you placed JCharset in the correct directory! If it isn’t in the list, try another directory where CF will pick up the JAR when restarted.
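
If you would rather check from the JVM side, here is a minimal sketch that asks Java directly whether a charset is available. The class name CharsetCheck and its isAvailable helper are made up for illustration; Charset.isSupported and Charset.availableCharsets are standard java.nio.charset calls. It only tells you something useful when it runs on the same JVM and classpath that ColdFusion uses, so treat it as a quick sanity check rather than a replacement for the CharsetTB listing.

```java
import java.nio.charset.Charset;

// Hypothetical helper class; the name is not from JCharset or ColdFusion.
final class CharsetCheck {

    // Returns true when the running JVM can resolve the named charset.
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown when the name is null or not a legal charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to UTF-7, the charset this post uses as its indicator.
        String name = args.length > 0 ? args[0] : "UTF-7";
        System.out.println(name + " available: " + isAvailable(name));

        // Print every charset this JVM can see, one per line.
        for (String charsetName : Charset.availableCharsets().keySet()) {
            System.out.println(charsetName);
        }
    }
}
```

From CFML the equivalent one-liner should be CreateObject("java", "java.nio.charset.Charset").isSupported("UTF-7"), which has the advantage of running inside ColdFusion’s own JVM.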

My ColdFusion 11 Wishlist

I’ve been trying to put together a “ColdFusion 11 Wishlist” post for quite a while. The time-consuming part has been that it is difficult to think of things that I want to see added to CF. Over the years I’ve become quite accustomed to its shortcomings, pitfalls and long-standing bugs. I don’t pretend CF is something that it is not. There are a lot of things that it’s just not good at (there are also many things that it IS good at). However, tacked-on features like <cfmap> don’t appeal to me because I know how to implement a Google map – and I can do it better, faster and can keep it up to date.

I mention the <cfmap> tag (and the feared <cffacebook> tag that we hope to not see in the upcoming version of CF) because these additions don’t sell the product to developers – they sell it to managers. Developers realize that these incomplete features are ultimately glued on to a somewhat antiquated language that only recently got the ability to understand null. I think videos like this (produced by The Onion?) will demonstrate the point I’m trying to make.

Adobe has done well to publish a road map for its next versions of CF with some general bullet points regarding upcoming features. The good news? They are listening to the community to some extent because the install process and PDF features are to be improved. Huzzah! The bad news? The other upcoming features appear to be “add-ons” for developing mobile sites (apps?) and social media integration. No thanks…

I’m all for, as the CF team puts it, “embracing futuristic technologies” – but the lack of these integrated “futuristic technologies” is not what prevents me from creating great web applications in ColdFusion. What does prevent me from doing that are the long-standing bugs in very important functions like SerializeJSON(). These are significant holdups that chew up development time and are not addressed quickly (or even acknowledged).

So, what’s ultimately in my ColdFusion 11 wishlist? Just two things:

  1. Address the backlog of bugs.
  2. Listen to CF developers. We care about the future of the product – it is our livelihood, after all. We won’t steer CF in the wrong direction!

Spidering / Link Checking With wget

I use XENU for link checking sites and finding missing assets but I couldn’t figure out how to make sure that it was following the redirects it encountered. For example, if an inline image source is “/images/sitelogo.jpg” but that 301 redirects to “/images/sitelogo-new.jpg”, XENU will report the redirect (as an error if you prefer), but what I really want to know is whether the destination of that redirect was a 200 OK (or a 404, or something else unintended). It wasn’t clear to me if XENU was ensuring that the file existed after being redirected.

I tried out a few other free tools but none seemed even as good as XENU. It was then that I stumbled upon the “spider” option in wget. You can set it free on a URL like so:

wget --spider -l 2 -r -p -o wgetOutput.log http://somesite.net

This will spider the URL up to 2 levels deep and ensure that any inline assets on the pages within those levels are also requested. The “-p” option ensures that inline assets like images or CSS are checked for a page even when the maximum number of levels in the “-l” option is reached. The output is logged to wgetOutput.log.

At the very end of wgetOutput.log you’ll find a list of broken links. You will also get a ton of other useful information about every request that it made – so you know exactly what it’s doing! A single request in the log looks something like this:

Spider mode enabled. Check if remote file exists.
--2013-08-06 20:10:40--  http://somesite.net/images/sitelogo-new.png
Reusing existing connection to somesite.net:80.
HTTP request sent, awaiting response... 200 OK
Length: 4153 (4.1K) [image/png]
Remote file exists but does not contain any link -- not retrieving.
 
Removing somesite.net/images/sitelogo-new.png.
unlink: No such file or directory
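
Going by the log format in the excerpt above, the log can also be scanned for every request that did not come back as 200 OK, which covers the redirect question that started all this. The following is only a sketch: WgetLogScan and nonOk are hypothetical names, and the two patterns assume request lines beginning with a "--date time--" stamp and response lines containing "awaiting response... <code>", as in the sample. Other wget versions or locales may log differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of wget.
final class WgetLogScan {

    // Request line, e.g.: --2013-08-06 20:10:40--  http://somesite.net/x.png
    private static final Pattern REQUEST =
        Pattern.compile("^--\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}--\\s+(\\S+)");

    // Response line, e.g.: HTTP request sent, awaiting response... 200 OK
    private static final Pattern RESPONSE =
        Pattern.compile("awaiting response\\.\\.\\. (\\d{3})");

    // Returns "<status> <url>" for every response whose status is not 200.
    static List<String> nonOk(List<String> lines) {
        List<String> hits = new ArrayList<>();
        String currentUrl = null;
        for (String line : lines) {
            Matcher request = REQUEST.matcher(line);
            if (request.find()) {
                // Remember the URL so the next response can be paired with it.
                currentUrl = request.group(1);
                continue;
            }
            Matcher response = RESPONSE.matcher(line);
            if (response.find() && currentUrl != null
                    && !"200".equals(response.group(1))) {
                hits.add(response.group(1) + " " + currentUrl);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "wgetOutput.log";
        // ISO-8859-1 accepts any byte sequence, so odd characters in URLs
        // do not abort the read.
        List<String> lines =
            Files.readAllLines(Paths.get(file), StandardCharsets.ISO_8859_1);
        for (String hit : nonOk(lines)) {
            System.out.println(hit);
        }
    }
}
```

A 301 shows up in the result alongside whatever status the redirect target returned, so a redirect that lands on a 404 appears as two consecutive entries.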

Other Useful Options

Specify a user agent:

-U "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

Spider a site that forces you to log in:

  1. Get the Cookie Exporter Add-on for Firefox.
  2. Log into the site you want to spider.
  3. From Firefox, run Tools -> Export Cookies -> cookiesFile.txt
  4. Use the “--load-cookies” option:
    --load-cookies cookiesFile.txt

Complete Example:

wget --spider -l 2 -r -p -o wgetOutput.log -U "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" --load-cookies cookiesFile.txt http://somesite.net

Time Management – The Pomodoro Technique

This isn’t a programming post but it is something that is important to developers: time management.

It can be difficult to “get in the zone” and stay there for a length of time because of general distractions in the office – phone/email/IM/etc. Some things are out of your control but I have found a time management method that works quite well when I am tackling larger tasks and not putting out fires: the Pomodoro Technique.

Basically, the Pomodoro Technique is breaking up your work into 25-minute coding blocks that are separated by 5-minute breaks. After 4 of those 25-minute “pomodori” blocks of work you take a longer 15-minute break. I use the term “break” pretty loosely because I generally take that time to check email and IM and respond to anything that needs my attention.

This doesn’t work EVERY day – some days you are jumping around to many small tasks and can’t take advantage of it, or there’s a lot of email or IM activity that you need to be a part of. But, when you are working on tasks and can temporarily limit communication distractions I have found this regular cycle of uninterrupted work followed by dedicated time for email to be a surprisingly productive idea (considering how simple it is).

There are a variety of timers available for absolutely everything (you can give it a go right from your browser with a site like Tomato Timer). However, I’m rather taken by the desktop app Tomighty as I can set it in the system tray and forget about it until it notifies me that the pomodoro (or break) is over.


Another ColdFusion SerializeJSON Bug

Adobe recently released ColdFusion 10 Hotfix 11 and it fixed Bug #3338825! I am positively ecstatic because invalid JSON causes me a lot of trouble. This hotfix even addresses two other JSON serialization issues and so it appears to be good news all around.

However, it did not address Bug #3337394, in which the string “No” is turned into a boolean false (“Yes” also returns a boolean true for good measure). This bug is still considered “Unverified” although it was filed in September 2012 (test case below).

A colleague came across another SerializeJSON() bug today that I thought I’d share because I seem to spend a lot of my workday Regexing the input to or output from this function to clean up what it incorrectly handles. [It can't handle the truth!]

It’s filed as Bug #3596207 and this is its test case showing a numeric string with a trailing period being returned as an integer with a trailing decimal point:

SerializeJSON({a: "1."});

Output (not valid JSON)

{"A":1.}

Expected Output

{"A":"1."}

Test Case Showing Both Issues

SerializeJSON({a: "1.", b: "no", c: "yes"});

Output (not valid JSON)

{"A":1.,"B":false,"C":true}

Expected Output

{"A":"1.","B":"no","C":"yes"}
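
As an illustration of the kind of regex clean-up mentioned above, here is a rough sketch that re-quotes a bare number with a trailing period in already-serialized output. JsonNumberPatch and quoteTrailingDot are hypothetical names, and this is not an official workaround. It is a heuristic: it assumes compact output like the examples above, it can misfire if a string value happens to contain a sequence such as :1., and it cannot restore “no” or “yes” because the original string is already gone once it has become a boolean.

```java
// Hypothetical helper; a heuristic patch, not a JSON parser.
final class JsonNumberPatch {

    // Finds an integer followed by a bare "." in a value position
    // (right after ':', '[' or ',' and right before ',', '}' or ']')
    // and wraps it in quotes so the result is valid JSON again.
    static String quoteTrailingDot(String json) {
        return json.replaceAll(
            "(?<=[:\\[,])(-?\\d+)\\.(?=[,}\\]])",
            "\"$1.\"");
    }

    public static void main(String[] args) {
        // Prints the patched form of the broken output from the test case.
        System.out.println(
            quoteTrailingDot("{\"A\":1.,\"B\":false,\"C\":true}"));
    }
}
```

Ordinary decimals such as 1.5 are left alone because the period must be followed directly by a comma or a closing bracket or brace.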

Adobe has made some good progress with the latest hotfix but there are some serious bugs left in functions that have been neglected for some time now. SerializeJSON(), for instance, debuted in CF8 almost six years ago and after that amount of time these very basic serialization test cases should not fail.

Fixing CFPOP

Some of my ColdFusion projects involve receiving A LOT of email via CFPOP. It’s perfect for the job about 95% of the time. However, that remaining 5% failure rate reveals glaring inadequacies that I have spent significant amounts of time trying to work around.

I generally use CFPOP in a try/catch and fall back to either of the following solutions for troublesome messages.

CFX_POP3
If you’re running CF on a 32-bit Windows server… $40 makes all your CFPOP troubles go away in an instant. The CFX_POP3 custom tag is an absolute bargain because it works with rare character sets, poorly named attachments, special characters, etc. Not once have I seen it fail for attachment filename or character encoding issues.

Limitations
1. Only runs on 32-bit Windows servers
2. Seems to choke on large attachments (10MB+)

POP CFC
The POP CFC project is run by the creator of the CFX_POP3 tag and it contains some handy functions that use underlying Java methods for getting mail from a server and parsing through it. These functions are great for processing messages with oddly-named file attachments that will break CFPOP.

Notes
1. Supports the same character sets as CFPOP.

If you’re using CFPOP or POP CFC, I highly recommend setting up JCharset to allow processing of some of the more “unique” character sets out there. I’ll go over that in more detail in a future post.

Corrupted Queries in ColdFusion 7

For some time now I’ve had an application running on ColdFusion 7 that will randomly throw exceptions because of poorly formed SQL in seemingly random queries. I could not explain the malformed SQL from looking at the queries – they were always fine. Also, the horribly mangled SQL I would see in the exception logs could not possibly have been generated by any conditional logic in the query. Here is an example:

<cffunction name="getUsers" access="public" output="false" returntype="query">
  <cfargument name="username" type="string" required="false" default="">
 
  <cfset var local = StructNew()>
 
  <cfquery name="local.qryUsers" datasource="dsn">
    SELECT usr.username
           ,usr.email
           ,usr.name
    FROM users usr
 
    WHERE 1 = 1
    <cfif Len(arguments.username)>
      AND usr.username = <cfqueryparam value="#arguments.username#" cfsqltype="cf_sql_varchar">
    </cfif>
 
    ORDER BY usr.username ASC
  </cfquery>
 
  <cfreturn local.qryUsers>
</cffunction>

The above function would run fine 99% of the time, and then the generated SQL would cause an exception. The SQL from the cfquery above that caused the exception would end up resembling something like this:

ELECT usr.username
      ,usr.email
      ,usr.name
FROM users usr
1 = 1
ASC

Needless to say it has become completely and utterly mangled – and with no way to account for it! Where has half the query run off to?! It was likely only a matter of time until a completely malformed query executed and resulted in data loss or corruption. So… what was the solution to this craziness?

My initial hypothesis was that SQL statements sometimes band together and run away at the thought of being transported to the database server for execution. This may very well have been true! Adobe could not be reached for questioning on this topic but they did release a hotfix that erected a 12 foot high fence around the edges of cfquery in order to prevent any bits from falling out or otherwise escaping at inopportune times.

Handy!

The hotfix is available here.