I recently went to Amazon’s Web Services Startup Tour (see some nice pics over at Ian Irving’s blog). I’ve been using S3 (via JungleDisk) for a while, but all I really knew was that EC2 meant “VMs in the Cloud”; I didn’t have much of an idea what their other acronyms (SQS, EBS, etc.) meant, nor any technical details. Here’s some of what I learned…

AWS is paid on a per-use basis with one charge for bandwidth and then another charge for any persistent resources used (HD space, VMs, etc.). With only one exception, whenever you transfer data between AWS services it’s free – a big incentive to use multiple services together.

S3 is their oldest service and offers nearly unlimited storage. What’s interesting is that many other services can automatically back up to an S3 account. S3 organizes the stored data into buckets (analogous to directories, except without nesting), objects (analogous to files), and keys (like in a hash, used to retrieve objects).
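To make the bucket/key/object idea concrete, here’s a toy sketch in Python. It’s just a dict of dicts, not the real S3 API; the ToyStore name, its methods, and the bucket and key names are all made up for illustration.

```python
# Toy in-memory model of the bucket/key/object layout.
# This is NOT the S3 API; every name here is hypothetical.

class ToyStore:
    def __init__(self):
        # bucket name -> {key -> object bytes}
        self._buckets = {}

    def create_bucket(self, name):
        # Buckets sit side by side; a bucket never contains another bucket.
        self._buckets.setdefault(name, {})

    def put(self, bucket, key, data):
        # Storing under an existing key replaces the old object.
        self._buckets[bucket][key] = data

    def get(self, bucket, key):
        # The key is the only handle used to retrieve an object.
        return self._buckets[bucket][key]

    def keys(self, bucket):
        return sorted(self._buckets[bucket])


store = ToyStore()
store.create_bucket("backups")
store.put("backups", "notes.txt", b"hello")
print(store.get("backups", "notes.txt"))
```

The point is only that lookups go bucket first, then key, the same way you’d index a hash.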

Simple Queue Service is a message service which provides a way for apps to store messages in a queue (*shock*); the messages can then be pulled out again by another part of the system. These queues can be chained together to provide a simple way of controlling the flow of messages along different parts of a system.
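Here’s a rough local sketch of that chaining idea, using Python’s standard queue module as a stand-in for SQS. Nothing below talks to Amazon; the queue names, the file names, and the run_stage helper are my own inventions.

```python
import queue


def drain(q):
    """Pull every message currently waiting in the queue, in FIFO order."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def run_stage(inbox, outbox, work):
    """Take each message from inbox, process it, and pass the result on."""
    for message in drain(inbox):
        outbox.put(work(message))


uploads = queue.Queue()  # filled by one part of the system
resized = queue.Queue()  # filled by the first worker, read by the next one

for name in ["a.png", "b.png"]:
    uploads.put(name)

# One stage empties the first queue and feeds the second.
run_stage(uploads, resized, lambda name: "thumb-" + name)
print(drain(resized))
```

Each stage only knows about the queue it reads and the queue it writes, which is what makes the flow easy to rearrange.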

Elastic Block Store provides a network-mounted drive for VMs inside of Amazon’s Cloud. It’s nice for adding more space to your EC2 instances, and its lifetime is not limited to the instance it gets mounted on. A couple of people pointed out that you can store most of your VM’s data on EBS; then, when you want to bring up more VMs, you shut down your VM, clone a couple of copies, then clone the matching EBS blocks. Sounds sweet.

EC2 is the best known part of the cloud: the VMs themselves. EC2 lets you clone existing VMs on demand so that you can grow your capacity. One of the presenters showed how he sets up his VMs with a shell script on his desktop. The VMs can run either Linux or OpenSolaris (with Windows VMs coming soon – wonder what the cost for those will be?). There are about a dozen different types of VM instances which you can run.

The only thing that keeps AWS from being completely automatic is the fact that load balancing is something that you have to do by yourself. However, this is not a huge problem, as you can just set up some VMs to do load balancing, and certainly SQS can help out there too.

During the Q&A an audience member asked why people entrusted their architecture to Amazon’s virtual cloud. The general consensus was that, even ignoring cost and added flexibility, no one was really sure they could do it better, and the cost of trying is so high that, given the cloud as an option, it seems like the best idea.

Amazon has some other neat offerings like SimpleDB and the Mechanical Turk which weren’t really explored but sound pretty useful. I haven’t looked at other providers of similar services, but it would be interesting to see how they try to differentiate themselves – especially since it should be possible to mix and match the services between providers.

September 18, 2008, 12:40 am o'clock

I love books about mistakes. It’s not just a matter of liking seeing people fail; there’s real value in learning from other people’s mistakes. The bigger the screwup, the bigger the lesson. In Fumbling the Future the authors show how Xerox basically invented the personal computer years before IBM or Apple, then failed to make any money off of it.

A bit of background: Xerox was the first company to make a photocopier. They then made a not uncommon move: they didn’t sell their equipment; they leased it and charged per copy. A few years after they released their first copier, they had more money than they knew what to do with.

But they knew that wouldn’t last. Technology would soon make photocopiers obsolete (right?); the future would belong to computers, not dead trees (also, robots were clearly promised but remain undelivered). Xerox would commit itself to developing the “Architecture of Information”, which would make it a leader not just today but for decades to come. So Xerox created the most prestigious and well-funded computer think tank to come up with the technologies of the future: PARC.

And they did.

Xerox’s PARC invented the first Personal Computer (the Alto) years before IBM and Apple. They invented the WYSIWYG word processor (two of them in fact: Bravo and Gypsy), developed the mouse, the concepts of windows and GUIs, the idea of distributing personal computing across a network and sharing resources, Ethernet, and Smalltalk.

So why does Xerox still mean copies instead of computers or technology or vision?

Xerox failed to monetize these technologies for a few reasons which have now become common IT counter-patterns:

  • In 1969, Xerox acquired Scientific Data Systems to get a foothold in the computer market. The idea was that PARC would develop the technology and SDS would make it into products. Makes sense, right?

  • Unfortunately, PARC’s engineers had two main attributes: 1) they were the best and 2) they knew it. PARC’s teams didn’t have much regard for SDS, who made a living by not competing with the elephant in the room. SDS never developed anything that PARC created.

  • Xerox required so many signoffs on even the most minor change that in the end no one was responsible for anything. It’s important to consult experts but eventually responsibility has to fall on someone.

  • Xerox managers, largely numbers-driven managers from GM, weren’t really equipped to develop new technologies (having come from GM’s “optimize the factory” environment). They tried to cut costs and set aggressive deadlines which they weren’t qualified to set, and they were rewarded for it. This created the inevitable environment where engineers cut quality to meet deadlines.

  • The above two points wouldn’t have been such a problem, but Xerox eventually lost its monopoly. The disconnect between management, reality, and engineers meant that when competitors entered the market with “acceptably worse” quality but pleasantly lower prices, Xerox didn’t even know how to compete. Nor could it learn – the company wasn’t able to conceive of a world where they weren’t the kings of copies.

  • The crunch caused by competitors made an increasingly defensive and conservative company more so. No executive or upper manager wanted to risk their corporate neck on personal networked computing in an age where timeshared computing was still new (also, in part, pioneered by PARC). By the time Xerox finally got an executive with enough guts to push into PCs it was too late: IBM and Apple had beaten them to it.

September 16, 2008, 12:05 am o'clock

I think I’m ready to start blogging again. I’m on a social web kick right now, so I figured I’d both update the backend (MAN, was this out of date) and post again.

I also think that during my previous blogging attempts I was doing it wrong. I used to spend a lot of time trying to write my blog post and that, of course, means that many of my posts failed.

So what have I learned from the past and why do I think that this blogging attempt will be win where others are an embarrassment?

  1. Easy linking – I like links in blog posts. They can often be used to turn inside baseball into an introduction to a new subject. But links are a pain in most online WYSIWYG editors (something like 4 clicks + a copy and paste) and HTML fails at being a decent markup language for humans. So I went looking for a better markup language.
    Initially I thought I’d go with a wiki. However, that didn’t really feel right (and TikiWiki overwhelmed me with its feature set!). At the same time I was learning RoR and I came across Textile. Man, did I get excited. I can do all my formatting and linking easily without a bunch of clicks. Thank you, Textile, for giving me my blog back.
  2. Spell Check – I hate spelling mistakes. However, they seem to have a peculiar affinity for me. Today both Firefox and my blog play nicely together so my mistakes don’t break my reader out of the text wondering “What’s that word?”. Also, spelling mistakes are embarrassing.
  3. This is a blog. Not an A paper for a prof. Not a ballad for a lover. It can have problems. It can contradict itself. Really, life’s too short to worry about my blog.
August 19, 2008, 9:56 pm o'clock

The distcc with MSVC project started last summer and as of today (and the fixing of bug 374563) we are glad to announce that we are able to distribute (non-debug) Windows builds of Mozilla. This has been the only thing that has kept us from releasing the beta 01. Since it turns out that the bug is really in Mozilla (well… really in MSVC, but it’s simpler to change Moz’s code than MSVC’s) there is no real difference between

This project started in early August ‘06 when Vlad said he wanted distcc to work with MSVC. I had heard of distcc being used for Gentoo but didn’t think of applying it to Windows and Mozilla. Cesar and I started with some pretty optimistic ideas about how long and hard we’d be working!

But, with the release of beta 01, this leg of the journey is coming to a close. We’ve been able to compile moz with distcc and msvc (information is going up on the wiki as I type this).

As always, you can get the source at svn://cdot.senecac.on.ca/distcc/branches/beta01. However, this time we’re also offering Windows binaries!

March 20, 2007, 12:39 pm o'clock

Last Friday, I was invited to participate in a panel at a conference on applied research in Colleges held at OISE/UT. I was very glad to participate and give them the “student perspective”, having worked on a number of applied research projects. However, it was a little awkward: I seemed to be the only one there with any kind of technical background! It was really weird trying to talk about what I learned and my experience without really being able to touch on the wonderfully exciting technical things I learned! After all, that’s the cool part.

Speaking to people from the humanities forced me both to speak their language (human?) and think in different terms (!technical). The number one thing that I’ve gained from working on these projects is the confidence of knowing that I can develop software. Working on these projects has also helped me to apply the knowledge I learned in school to actual user needs. Not to mention that it gave me a chance to work with professional developers.

Seneca’s work in open source also means that many of their applied research projects give students a chance to be a part of the Open Source community. I can honestly say that I know of (having either participated in them or known people who have) almost half a dozen different applied research projects – all of them have used some kind of open source product (an open source framework, language, or major application) as part of their work. While not all projects throw students into open source culture, the opportunity is there for anyone who wants to be a part of it.

While not everyone can always be a part of applied research projects, every student can benefit from that kind of experience. So what can they do?

Open Source is a wonderful opportunity to be part of a community of professional developers while at the same time developing technical skills. At Seneca, we run Club Moz, a great club where students can get together to work on Open Source projects!

March 14, 2007, 9:18 am o'clock

I haven’t been blogging for a while. This may make it seem like distcc has been stagnant, but that couldn’t be further from the truth.

A major thing worth mentioning is that we’re now at alpha 2. You can get this code at svn://cdot.senecac.on.ca/branches/alpha02. We also have a branch that is working on mingw compatibility (available at svn://cdot.senecac.on.ca/branches/mingwcompatible).

I think alpha 2 is probably the most important accomplishment. It has dealt with a whole bunch of issues, mainly centred around unix/windows path issues. It also added support for cl’s -I option, which previously didn’t work properly with unix paths. However, we’ve hit some issues trying to build Mozilla. We are able to distribute the compilation but the linking fails. Let me reiterate that: We are able to distribute the compiles just fine! We have some weird misconfiguration that prevents linking. Cesar (with help) is now testing other apps to check that distcc definitely works.
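For anyone wondering what a “unix path” problem looks like with -I, here’s a small Python sketch of the kind of translation involved. To be clear, this is not the code in alpha 2: I’m assuming Cygwin-style /cygdrive/c/... paths, and the function names and the example directory are made up.

```python
def to_windows_path(path):
    """Turn /cygdrive/<drive>/rest into <DRIVE>:\\rest; leave other paths alone.

    Assumption: paths use Cygwin's /cygdrive/<letter>/ convention.
    """
    prefix = "/cygdrive/"
    if path.startswith(prefix) and len(path) > len(prefix):
        drive, _, rest = path[len(prefix):].partition("/")
        if len(drive) == 1 and drive.isalpha():
            return drive.upper() + ":\\" + rest.replace("/", "\\")
    return path


def fix_include_flag(arg):
    """Rewrite the path in a -I<dir> argument; pass anything else through."""
    if arg.startswith("-I") and len(arg) > 2:
        return "-I" + to_windows_path(arg[2:])
    return arg


# Hypothetical command line, for illustration only.
args = ["cl", "/c", "-I/cygdrive/c/moz/include", "file.c"]
print([fix_include_flag(a) for a in args])
```

Only the -I argument changes; the compiler name, /c, and the source file pass through untouched.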

Our next steps will be:

  1. Ensuring that our code definitely works (if we don’t find any real issues with alpha 2 then we’ll move it into beta, make a big push to find testers, and then release it).

  2. MinGW port. MinGW offers a great alternative to Cygwin and we’re going to try to take advantage of that by making the distcc code build under mingw. Right now the mingwcompatible branch is really unstable (won’t build). The big change in this branch will be the use of libtool for linking with winsock.

February 27, 2007, 9:43 pm o'clock

Changed my Binary blue scheme after some “criticism” of it. I’m torn between this theme and the black scheme. Hmm… maybe I’ll make my own scheme? Nah… no time.

January 23, 2007, 3:37 pm o'clock

I rarely use the word orgy (sigh) but I feel confident that I was correct in calling Club Moz’s first hack day an orgy of success. With over 20 people attending we were a little overwhelmed. Luckily Phil, Vanessa, and Ben did a great job getting people introduced to Moz (while I ran around trying to make sure that people were moving in more or less the same direction). We invite everyone to attend our Hack Days and get involved.

One of the issues that we faced is making it possible for everyone to work on code on a relatively large scale. The problem is that, despite everything David has done, our lab machines aren’t a great Moz build environment. (Not that everyone has to work on the Mozilla code directly; there’s lots of great stuff to do in extensions, python, and web dev – these links are just one example of each, so check out the full project list. But I digress…) Step in Ben Smedberg and MozillaBuild, which creates a perfectly set up Windows build environment without having to install anything else. While Ben distributes MozillaBuild as an exe (which might have caused problems considering the Byzantine setups that the desktops here have), Dave gave us a zipped copy of it using “the ultimate networking tool”.

This sounds, and is, great. However, when we tried to check out the moz tree and build it we hit an error. The error basically said we can’t copy mozilla/config/nsStaticComponents.h to objdir/dist/include (if I remember correctly). Cesar and I poked around the net a bit. We fooled around with umask. We checked whether we definitely owned the file/parent dir. We were stumped. Then I suggested we try an idiotic, er, innovative solution: running chmod -R u+r on the source root and objdir. It worked. I don’t know why (I still think that chmod should have failed) but the build seems to be going fine.
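For the curious, here’s roughly what chmod -R u+r does, written out in Python. It’s a sketch of the idea on a POSIX-style filesystem, not a claim about how Cygwin’s chmod is implemented, and the add_user_read name is mine.

```python
import os
import stat


def _add_read(path):
    if os.path.islink(path):
        return  # leave symlinks found during the walk alone
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRUSR)  # u+r: turn on the owner-read bit


def add_user_read(root):
    """Add the owner-read bit to root and everything beneath it."""
    _add_read(root)
    # Top-down walk: each directory gets its read bit before os.walk
    # tries to list its contents.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            _add_read(os.path.join(dirpath, name))


# Usage, with a hypothetical path:
# add_user_read("objdir")
```

All it touches is one permission bit per file, which is why it only helps when a missing owner-read bit was the problem in the first place.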

Looks like Hack Days will be all right. [Edit: the build did work]

January 23, 2007, 3:16 pm o'clock

Naturally, one of the top goals for Cesar and myself has been doing a distributed Windows MSVC build with distcc. Our alpha release which was so good at compiling “simple” apps seems to die on Moz.

The build dies on mozilla/nsprpub/config/now.c. Distcc claims that the compiler died with exit code 2. However, distccd’s logs say that cl exited with exit code 0. We’ll be looking into this more closely over the next few days.

For those interested, here is the distcc alpha01 failed moz build log, distccd alpha01 failed moz build log, and the Mozilla Build log for failed build (distcc alpha01) log.

January 10, 2007, 1:56 pm o'clock

I’m incredibly proud to announce that as of now there is a version of distcc which works with Microsoft’s Visual C (and C++) Compiler cl. We’re calling this an alpha release and hoping that people will let us know if there’s anything they need fixed.

To get the alpha all you need to do is download it from svn://cdot.senecac.on.ca/distcc/branches/alpha01. Run ./configure, make, and make install. Remember that cygwin is a prerequisite.

After you have distcc installed (on your client(s) and server(s)) you need to start your daemons. Just run distccd --daemon --allow 1.2.3.4 (or whatever your client’s address is). Alternatively you can run distccd with an IP mask to allow multiple clients to connect: distccd --daemon --allow 192.168.0.0/24. This needs to be done on every daemon, or slave, box.
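If the /24 mask notation is new to you, this little Python sketch shows which client addresses a mask like that covers. It uses the standard ipaddress module and only illustrates the matching rule; it isn’t distccd’s own code, and is_allowed is a name I made up.

```python
import ipaddress


def is_allowed(client, allow):
    """Return True if the client address falls inside the allowed address or mask."""
    network = ipaddress.ip_network(allow, strict=False)
    return ipaddress.ip_address(client) in network


print(is_allowed("192.168.0.42", "192.168.0.0/24"))  # inside the mask
print(is_allowed("192.168.1.42", "192.168.0.0/24"))  # outside the mask
print(is_allowed("1.2.3.4", "1.2.3.4"))              # single-address form
```

A single address is just the narrowest possible mask, so both forms of the allow option boil down to the same check.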

Finally you need to compile something! For distcc to know where the daemons are you need to export an environment variable called “DISTCC_CL_HOSTS” and give it the IP addresses of the hosts which can run MSVC jobs, separated by whitespace. When using MSVC, distcc does not consider the .distcc file at all. From the command line distcc is run like this:

distcc cl /c /Fofileone.obj file.c
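If you drive your build from a script, setting the variable might look something like this Python sketch. The two host addresses are placeholders and the cl_hosts helper is my own; the actual compile is left as a comment since it needs a working distcc install.

```python
import os


def cl_hosts(environ=os.environ):
    """Split DISTCC_CL_HOSTS on whitespace; an unset variable gives an empty list."""
    return environ.get("DISTCC_CL_HOSTS", "").split()


# Copy the current environment and add the (placeholder) daemon addresses.
env = dict(os.environ, DISTCC_CL_HOSTS="192.168.0.10 192.168.0.11")
print(cl_hosts(env))

# The compile would then be launched with that environment, e.g.:
# import subprocess
# subprocess.run(["distcc", "cl", "/c", "/Fofileone.obj", "file.c"], env=env)
```

Passing the environment explicitly keeps the host list out of your login shell, which is handy if only some builds should be distributed.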

Masquerade mode defaults to gcc settings, and if you are using an executable name other than “cl” or “cl.exe” then distcc will default to gcc. This is done to ensure that no functionality is lost for current users.
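In other words, the choice hangs on the executable name. Here’s that rule as a few lines of Python. It’s a paraphrase of the behaviour described above, not a port of the distcc source, and stripping the directory part before comparing is my assumption.

```python
import ntpath


def compiler_family(argv0):
    """Pick msvc only for the exact names cl or cl.exe; everything else is gcc."""
    # Assumption: any directory part is ignored before the name is compared.
    name = ntpath.basename(argv0)
    return "msvc" if name in ("cl", "cl.exe") else "gcc"


for exe in ["cl", "cl.exe", "gcc", "cc"]:
    print(exe, "->", compiler_family(exe))
```

Anything that isn’t one of those two names falls through to gcc, so existing gcc setups keep behaving exactly as before.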

Please report any issues to distccbugs@foobartastic.com.

Cesar and I are now going to work on making sure that distcc with msvc works for Mozilla and porting unit tests to ensure that cl users have all the distcc functionality that gcc users do. For details on our plans check http://zenit.senecac.on.ca/wiki/index.php/Distcc_with_MSVC

January 9, 2007, 8:00 pm o'clock