Wednesday, December 20, 2006

Outsource Spam Filtering to Google

A friend of mine was bending my ear the other day about the amount of spam he has to deal with. Said he, "I have three mail accounts, two of which are widely available on the Web, and I am averaging more than 800 pieces of spam a day. My ISP's spam filter is OK, but I still have to check it periodically for false positives."

He added, "I am thinking of subscribing to one of those spam services, like the one IBM offers, where I can route my mail to them. They clear out the spam and leave the rest in my inbox."

Why pay for what you can get for free? I suggested he create an account at Google Mail, which has excellent spam detection capabilities and has never had a false negative in my experience. He could then forward all his accounts to the Gmail account and set up his mail agent to poll just that one account. Given the 2.8GB of space Google gives you, even huge amounts of spam are not going to overflow your mailbox.

Sure enough, my friend tried it, and in an email full of gratitude he says he is free from spam.

Additional note: This solution could be done with other 'free mail' services, such as Yahoo, but I recommend doing it with Google, because:

  • I find Google's spam filtering superior;
  • Google resisted subpoenas to reveal user data, which makes me feel comfy routing my mail to them;
  • and Google's webmail interface is superior, which facilitates checking mail from the road.
Give it a try.

Thursday, December 14, 2006

Porting C/C++ Development from Visual Studio to Eclipse: Does that make any sense?

This link is a long step-by-step tutorial on IBM developerWorks about porting C/C++ projects from Visual Studio .NET to Eclipse. You might wonder, as I did, why anyone using Visual Studio .NET would be tempted to move their development to a Java-hosted environment to compile Windows code. I believe there is no good reason.

The article actually confirms my view, pointing out on several occasions that the Eclipse platform is indeed the wrong place for Windows development: "Neither Eclipse nor GDB understand the debugging information generated by Microsoft compilers. As a result, it is a challenge to select CDT as a full-time development environment for Windows development. However, you can use Debugging Tools for Windows for debugging side by side with Eclipse as a development environment." Ugh!

Eclipse CDT, as the article also points out, knows nothing about Windows resource files.

In other words, you'd be unlikely to ever migrate from VS.NET to Eclipse.

The article misses the opportunity to explain what Windows coding you would use the CDT for: porting code from other platforms to Windows when you don't have access to Visual Studio. That makes sense.

Friday, December 08, 2006

The new features of Java 7

With Java 6 set to be released, here is a peek into Java 7. It's a PDF of a slide deck by Danny Coward, the Java SE platform lead.

Among the interesting features are BigDecimal overloads of the arithmetic operators, strings in switch/case statements, better XML and XPath support, and the invokedynamic instruction, which facilitates the use of dynamic languages on the JVM. (Note: in my September 1 post, I mistakenly described this instruction as a feature of Java 6. It is actually slated for Java 7, as a commenter thoughtfully pointed out in response to that post and as this presentation confirms.)

Thursday, December 07, 2006

How much of a nerd are you, really?

Some of my fans might be distressed to see that I'm a mere 59 on a scale of 100. Jeez, I barely missed a passing grade. So, what's your rating?


Tuesday, December 05, 2006

Convert file formats on-line

This site makes it simple to convert file formats if you don't have the converter software within easy reach, provided you're willing to send your file to an outside service.

Thursday, November 16, 2006

Live Data Visualization at its Best

This live visualization enables you to see the daily flights of FedEx planes throughout the US. Very cool and, I believe, increasingly the way data will be presented in the future, now that broadband is so widely available in the first world.

Thursday, November 09, 2006

Mozilla Foundation Finds its Mojo

Holy Toledo! The Mozilla guys are on fire! First, they released a good upgrade to the Firefox browser (even though the browser itself can't recognize it as an upgrade). Then, in the course of two days, Mozilla became the recipient of two major product codebases: Qualcomm's beloved Eudora mail agent and Adobe's ECMAScript execution engine. This latter project, called Tamarin, is the virtual machine at the heart of Flash Player 9.

Cool stuff! Mozilla used to be the boneyard for failed Netscape projects, but now it is suddenly at the heart of the open-source world--a pinnacle until recently occupied by the Apache Software Foundation.

As for my beloved Eudora, I need to see what happens with the project. I have been a paying customer for years. If Mozilla keeps it going, I'll stick with it. If not, I'll probably switch to Thunderbird.

Friday, November 03, 2006

Isn't Firefox 2 an Upgrade?

So, isn't Firefox 2.0 an upgrade to 1.x releases? Apparently not, according to this notice when I tried to get Firefox to download the new version.

Monday, October 30, 2006

Ralph Griswold dies

Ralph Griswold died earlier this month of cancer. Griswold was an important, but unsung, pioneer of programming language design. He was a co-creator of SNOBOL and the creator of Icon (and its descendant, Unicon), languages distinguished by unsurpassed string-processing capabilities. And I do mean unsurpassed. No language since has had such extensive native support for string operations.

Another outstanding characteristic of SNOBOL and Icon was that they were very high-level languages. This color picker was written in 31 lines of easily readable code. It might be a simple tool, but I doubt many languages could come close to that brevity for the same functionality (much less maintain such readability). Languages today could use a bit more of Griswold's vision, I believe.

Useful links:

Icon (which is still actively supported)

Unicon (Icon with objects, advanced network and file functions, plus ODBC)

Griswold obituary

Thursday, October 05, 2006

Solving the problem of embedded comments

One problem language designers face is handling embedded comments. The basic problem is that if you comment out a large block, your intent might be thwarted by comments within the block. For example, in C, the /* symbol marks the beginning of a comment block and */ the end. To comment out a block of code, it's not sufficient to place these markers at the beginning and end of the block, because if a */ occurs inside the block, it will close the comment prematurely. So most languages simply don't support nested comments. (As a result, various workarounds are used. In C, for example, the #if 0 / #endif combo is used.)

Lua, which is a nifty dynamic language, is the first language I've seen to come up with a solution to this problem. Here goes:

In Lua, block comments start with --[[ and end with ]] (often written --]]). If you want to comment out a chunk that might contain block comments, you can add one or more = signs between the brackets: open with --[=[ and close with ]=]. You can use any number of = signs, and Lua will comment out everything until it finds a pair of closing brackets with a matching number of equal signs. So --[==[ and ]==] or --[===[ and ]===] are valid comment block markers, as long as the beginning and ending counts match. As a result, you can always block out your code regardless of how many embedded comments it might contain.
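The level-matching rule is easy to see in a small Python sketch. This is a simplified stand-in, not Lua's actual lexer: the function name strip_lua_block_comments is my own, and the sketch ignores string literals and ordinary -- line comments. It finds an opening --[ plus any run of = signs plus [, then skips ahead to the first closing bracket carrying the same number of = signs.

```python
import re

# Opening long bracket of a block comment: "--[" + zero or more "=" + "["
_OPEN = re.compile(r"--\[(=*)\[")


def strip_lua_block_comments(src: str) -> str:
    """Remove Lua-style block comments, honoring bracket levels.

    Simplified sketch: string literals and "--" line comments are not handled.
    """
    out = []
    pos = 0
    while True:
        m = _OPEN.search(src, pos)
        if m is None:
            out.append(src[pos:])          # no more comments: keep the rest
            return "".join(out)
        out.append(src[pos:m.start()])     # keep code before the comment
        level = m.group(1)                 # the run of "=" signs (may be empty)
        closer = "]" + level + "]"         # only a closer with the same count ends it
        end = src.find(closer, m.end())
        if end == -1:
            raise ValueError("unfinished block comment")
        pos = end + len(closer)            # resume scanning after the closer
```

With plain brackets, stripping "--[[ x --[[ y ]] z ]]" stops at the first ]] and leaves " z ]]" behind, the same trap the C example falls into. Writing the same text as "--[=[ x --[[ y ]] z ]] ]=]" removes it cleanly, because the embedded ]] has no = sign and so cannot close the outer comment.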

I think that's pretty clever. Anyone know of another language with a similar mechanism?

Sunday, October 01, 2006

Interesting Uses Of Virtualization

Last week, I presented two sessions at InfoWorld's Virtualization Executive Forum 2006. One was a panel on best practices for virtual test-lab automation, with representatives from VMware/Akimbi and Surgient.

The other was a solo presentation on interesting uses of virtualization--that is, compelling use cases outside of server consolidation. While hardware consolidation is still the leading driver for adoption of virtualization by IT, it will be overtaken by a slew of very interesting applications that are as yet little known. To see what I mean, have a look at the slides from my presentation. I have kept the deck short and to the point.

Wednesday, September 20, 2006

Amdahl's Law Revised for Hyper-Threading

In the book that Rich Gerber and I wrote on Intel's Hyper-Threading Technology, we present a revised version of Amdahl's Law that reflects the constraints of Hyper-Threading Technology. There was an error in that equation that was recently reported by Gang Chen of Shanghai Jiao Tong University. The corrected formula is:

Speedup = 1 / (S + (1-S)/(0.67n) + Hn)

where S = the fraction of execution time spent in sequential code, n = the number of logical processors, and H = overhead.

Clay Breshears of Intel was kind enough to double-check the revision. Ignoring overhead, if a program is 99% parallelized on a Hyper-Threading chip with two logical processors, the speedup is:

Speedup = 1 / (.01 + .99 / (.67*2)) = 1 / 0.749 = 1.34

Intel has long suggested that a 30% overall speedup was the most improvement that could realistically be achieved. The 34% improvement shown here is more theoretical than real because no overhead is included and the code is 99% parallelized. So, Intel's projections are consistent with this formula.
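To make the arithmetic easy to check, here is a short Python sketch of the corrected formula next to classic Amdahl's Law (which has neither the 0.67 factor nor the overhead term). The function names are my own, and h defaults to zero to match the overhead-free example above.

```python
def amdahl_speedup(s: float, n: int) -> float:
    """Classic Amdahl's Law: s = sequential fraction, n = processors."""
    return 1.0 / (s + (1.0 - s) / n)


def ht_speedup(s: float, n: int, h: float = 0.0) -> float:
    """Revised formula for Hyper-Threading.

    s = fraction of execution time spent in sequential code
    n = number of logical processors
    h = overhead H (the formula adds the term H * n)
    """
    return 1.0 / (s + (1.0 - s) / (0.67 * n) + h * n)


if __name__ == "__main__":
    s, n = 0.01, 2
    print(f"classic Amdahl:  {amdahl_speedup(s, n):.2f}")   # 1.98
    print(f"Hyper-Threading: {ht_speedup(s, n):.2f}")       # 1.34
```

For the 99%-parallel, two-logical-processor case, the classic law gives 1.98 while the revised formula gives 1.34; any nonzero h pulls the revised figure lower still, toward Intel's 30% guidance.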

Thanks to Clay and to Gang Chen.

Saturday, September 09, 2006

A Really Useful Ruby Book

In my latest column in SD Times, I try to point out that not everything about Ruby is super-cool. One of the things I point to is the lack of good books. Sure, the Pickaxe book is a good place to start, but you quickly need a practical reference that illustrates the idioms for doing standard things.

O'Reilly and Associates has just released such a book, Ruby Cookbook (ISBN 0-596-52369-6), which contains more than 800 pages of helpful routines accompanied by thoughtful explanations. It shows how things are idiomatically coded in Ruby. It's an impressive work, especially because of its tremendous sweep: whether you need string handling, file I/O, database programming, network programming, multitasking, or accessing BitTorrent, it's all covered in this book. At a $33 street price, the book is a steal, and most Ruby developers are likely to keep a copy beside their workstation.

Other Ruby books are on the way. Most specialize in Rails, but some focus on the language itself. (Surely there will be a bumper crop of "Ruby in 7 Days" titles and other popularizing volumes as well.) So my complaint about the lack of good books might be resolved by year-end. We'll see. Meanwhile, this volume is a huge step forward.

Saturday, September 02, 2006

Joel Spolsky on Writing Your Own Language

Here is a neat post about why Joel's outfit wrote their own language, as well as the benefits and drawbacks of this approach.

Friday, September 01, 2006

The new bytecode in Java 6 explained

For a long time, I've been looking for a cogent explanation of the new bytecode, called invokedynamic, that is being introduced in Java 6 (Mustang). This bytecode facilitates calls to scripting-language methods in a context where data items are not statically typed. Beyond that brief description, though, it's hard to find much info. Here, however, is a link that gives a good overview of the role of this bytecode and where things stand now regarding its implementation.

Tuesday, August 22, 2006

Review of Enerjy CQ2 (extra info)

In this week's edition of InfoWorld, I review Enerjy CQ2. This is a development dashboard that integrates with existing SCM and defect tracking tools to quantify the status of a project and the productivity of developers and developer teams. I give the product a big thumbs-up, especially for Java projects, where it adds a powerful defect checker.

Unfortunately, the online version of the review does not include a screenshot, which is fairly important for seeing what the product is like. So, here is the starting page, from which you can drill down into CQ2's other dashboard windows. As you can see, it's a clean, uncluttered interface.

Monday, August 21, 2006

More on Writing Your Own Language

Martin Fowler has a great series of articles on writing your own language. The advice is primarily oriented towards little languages--or what are now called domain-specific languages (DSLs). However, much of the information applies to more ambitious language projects as well.

Sunday, August 20, 2006

Writing your own language -- How to choose a VM

Let's say you want to write your own language, possibly even your own domain-specific language (DSL) and you want to run it on a VM. Which VM do you target?

You might instinctively think of the JVM (especially because there are so many tools to help you target it) or the .NET CLR. These choices have the benefit of being highly optimized platforms on which numerous substantial libraries already run. In many cases, however, they will not be the best choice, for several reasons: they're big and difficult for nontechnical users to install (and you can't bundle them with your app); they're not easily embedded into other applications; and they might not support features you need.

There are many interesting alternatives that are small, reasonably fast, and have active communities supporting them.

There are, of course, the Perl and Python VMs. As you'll quickly see upon careful examination, however, both of those VMs are intimately tied to the languages they run. (No surprise there.) In addition, neither VM was really designed to be targeted by other languages, so information on developing languages for them is not widely or conveniently available.

Other VMs do provide extensive support for new language developers, because they know this is the only way for them to build community. Here are three among the many you could choose from:

Lua, which is consistently the fastest-performing VM outside of the JVM. It's small, open source, widely used, easy to embed, very well documented, and supported by an active community.

Neko VM, small, easy to embed, very actively supported by its developer. Particularly amenable to embedding in C applications.

Pawn, designed primarily for embedding. Its own default language is a lot like C.

These VMs are all open source. Other VM candidates are listed here, although this list is far from complete.

Most of these VMs encourage your compiler to output not bytecode but source code in the VM's native language. In most cases, this is the right approach because 1) the VM's own compiler can generate better bytecode than you are likely to write by hand, and 2) it saves you a whole lot of aggravation by sparing you from learning the bytecode and the minutiae of VM internals. (Some knowledge of VM internals will, of course, still be necessary.)
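To illustrate the source-to-source approach, here is a Python sketch of a compiler for a tiny, entirely hypothetical DSL with two statements (set and show) that emits Lua source instead of bytecode. The DSL, its syntax, and the function name compile_to_lua are invented for illustration; a real compiler would parse expressions into a tree rather than pass them through after a character check.

```python
import re

# Hypothetical mini-DSL, one statement per line:
#   set NAME = EXPR    (EXPR limited to names, numbers, + - * / and parentheses)
#   show NAME
_SET = re.compile(r"set\s+([A-Za-z_]\w*)\s*=\s*(.+)$")
_SHOW = re.compile(r"show\s+([A-Za-z_]\w*)$")
_EXPR = re.compile(r"[\w\s.+\-*/()]+$")


def compile_to_lua(dsl: str) -> str:
    """Translate the toy DSL into Lua source text (not bytecode)."""
    lua_lines = []
    for lineno, raw in enumerate(dsl.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue                                  # skip blank lines
        m = _SET.match(line)
        if m:
            name, expr = m.groups()
            if not _EXPR.match(expr):
                raise SyntaxError(f"line {lineno}: unsupported expression {expr!r}")
            lua_lines.append(f"local {name} = {expr}")  # Lua does the arithmetic
            continue
        m = _SHOW.match(line)
        if m:
            lua_lines.append(f"print({m.group(1)})")
            continue
        raise SyntaxError(f"line {lineno}: cannot parse {line!r}")
    return "\n".join(lua_lines) + "\n"
```

Compiling "set price = 4", "set qty = 3", "set total = price * qty", "show total" yields four lines of plain Lua ending in print(total). When Lua is embedded in a C application, that text can be handed to the VM through the standard luaL_loadstring or luaL_dostring calls, and Lua compiles it to bytecode itself.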

That being said, how do you decide which VM best fits your needs? Some principal considerations include (more in the link below):

  • Does the VM have native support for the data items you need?
  • Does the VM support the language features you need (garbage collection, multithreading, tail recursion, etc.)?
  • Does the VM support the performance features you need (optimization, JIT compiler, etc.)?
Once you have found a virtual machine that suits your needs, you need only check out the community behind it to make sure you won't be largely or completely on your own.

Note: Lambda the Ultimate is a terrific site for aficionados of programming language development. Here is a link to a post on this topic that might shed further light. If this area interests you, definitely tag or bookmark the Lambda site.

Final note: JetBrains, the makers of the popular IntelliJ IDE, have a mechanism that can be modified to provide an IDE extension for your language.

Saturday, August 19, 2006

Dijkstra's Three Rules for Project Selection

Edsger Dijkstra promulgated these rules in the context of scientific research, but I think they work well for selection of programming projects or, really, most intellectual endeavors that might break new ground.

The Three Golden Rules for Successful Scientific Research.

This note is devoted to three rules, the following of which is necessary if you want to be successful in scientific research. (If you manage to follow them, they will prove close to sufficient, but that is another story.) They are recorded for the benefit of those who would like to be successful in their scientific research, but fail to be so because, being unaware of these rules, they violate them. In order to avoid any misunderstanding I would like to stress, right in its first paragraph, that this note is purely pragmatic: no moral judgments are implied, and it is completely up to you to decide whether you wish to regard trying to be successful in scientific research as a noble goal in life or not. I even leave you the option of not making that decision at all.

The first rule is an "internal" one: it has nothing to do with your relation with others, it concerns you yourself in isolation. It is as follows:

"Raise your quality standards as high as you can live with, avoid wasting your time on routine problems, and always try to work as closely as possible at the boundary of your abilities. Do this, because it is the only way of discovering how that boundary should be moved forward."

This rule tells us that the obviously possible should be shunned as well as the obviously impossible: the first would not be instructive, the second would be hopeless, and both in their own way are barren.

The second rule is an "external" one: it deals with the relation between "the scientific world" and "the real world". It is as follows:

"We all like our work to be socially relevant and scientifically sound. If we can find a topic satisfying both desires, we are lucky; if the two targets are in conflict with each other, let the requirement of scientific soundness prevail."

The reason for this rule is obvious. If you do a piece of "perfect" work in which no one is interested, no harm is done, on the contrary: at least something "perfect"—be it irrelevant—has been added to our culture. If, however, you offer a shaky, would-be solution to an urgent problem, you do indeed harm to the world which, in view of the urgency of the problem, will only be too willing to apply your ineffective remedy. It is no wonder that charlatanry always flourishes in connection with incurable diseases. (Our second rule is traditionally violated by social sciences to such an extent that one can now question if they deserve the name "sciences" at all.)

The third rule is on the scale "internal/external" somewhere in between: it deals with the relation between you and your scientific colleagues. It is as follows:

"Never tackle a problem of which you can be pretty sure that (now or in the near future) it will be tackled by others who are, in relation to that problem, at least as competent and well-equipped as you."

Again the reason is obvious. If others will come up with as good a solution as you could obtain, the world doesn't loose a thing if you leave the problem alone. A corollary of the third rule is that one should never compete with one's colleagues. If you are pretty sure that in a certain area you will do a better job than anyone else, please do it in complete devotion, but when in doubt, abstain. The third rule ensures that your contributions --if any!-- will be unique.


I have checked the Three Golden Rules with a number of my colleagues from very different parts of the world, living and working under very different circumstances. They all agreed. And were not shocked either. The rules may strike you as a bit cruel... If so, they should, for the sooner you have discovered that the scientific world is not a soft place but--like most other worlds, for that matter--a fairly ruthless one, the better. My blessings are with you.

Friday, August 18, 2006

10 People Who Don't Matter in Tech

Interesting choices (Linus Torvalds, Jonathan Schwartz, Steve Ballmer--wow) made by Business 2.0. After reading their reasons, I can't say I disagree entirely with their selections.

Friday, January 20, 2006

VMware Workstation 5.5 Impresses

For the last year or so, I have been using virtual machines (not the Java/CLR type, but the ones that enable you to run a guest operating system on a host machine) for two purposes: software reviews and long Web searches.

The use in reviews saves me from installing and uninstalling software, slowly corrupting my Windows registry, and dotting my system32 directory with leftover DLLs. The use in long Web surfs protects me from malware. In the latter case, if your virtual machine is corrupted by malware, the damage is contained to the small set of files that constitute that virtual machine. You blow them away, clone an existing virtual machine, and you’re back in business. Nothing else is affected: nothing on the host system, nor any other virtual machine. Pretty cool.

Anyway, for the last year I have been using Virtual PC from Microsoft, which does a creditable job. This month, I switched to VMware Workstation 5.5. What a difference! VMware's most conspicuous advantage is performance. When you’re running VMware at full screen, it’s hard to tell that you’re in anything but your regular, native desktop. I haven’t benchmarked it, but it’s very, very close. In contrast, with Microsoft Virtual PC 2004, you always know you’re in a virtual machine because performance is very slow. Opening dialog boxes, moving windows, starting up applications: all these actions are visibly slow. So not only does everything take a long time, but it is also harder to assess the performance characteristics of software.

So a recommendation and a tip of the hat to VMware for a great implementation.