Monday, January 02, 2012

Guice and Systems Programming - Reflections on Singleton and RequestScope

Summary: For Systems Programming (i.e. backend services) and even some WebApps, I recommend using @Singleton or non-scoped instances only. @RequestScope may be OK for WebApps serving HttpRequests directly from users in the thread-per-request model.

Context: I've done 10+ years of development mostly in Java and have seen frameworks come and go. Of those, the Guice dependency injection framework has succeeded and grown in adoption. I've seen Guice used in several medium to very large scale projects and have coached multiple teams in adopting Guice. In three projects, I've also seen production emergencies caused by incorrect usage of Guice (and other frameworks).

Production Emergencies using Guice:
  • Too many out-of-scope bugs to recall, most impacting production users. These only surface at runtime and are caused by a (often low-level) component gaining a new dependency on a @RequestScope binding, even though that component is not always used within a @RequestScope. These have happened O(10+) times on each of two separate projects, with completely different teams and styles (one web server, one RPC server).
    • Just to repeat - it's one class of bug, but over 20 production emergencies in two systems.
    • There is no known way to prevent these bugs other than 100% code and branch coverage at the black-box level; testing with unit tests or even mocked-out services easily masks these runtime issues.
    • These are 100% due to @RequestScope; they have never been seen with @Singleton or non-scoped bindings.
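A minimal sketch of this failure class, assuming Guice's servlet extension (the class names are hypothetical): a component quietly gains a @RequestScoped dependency and is then provisioned outside any request, e.g. on a background or queue thread, so the failure appears only at runtime on that code path.

```java
import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.ProvisionException;
import com.google.inject.servlet.RequestScoped;
import com.google.inject.servlet.ServletModule;

public class OutOfScopeSketch {
  @RequestScoped
  static class RequestMetadata {}

  // Looks like an innocent low-level helper, but it now depends on request scope.
  static class AuditLogger {
    @Inject AuditLogger(RequestMetadata metadata) {}
  }

  public static void main(String[] args) {
    Injector injector = Guice.createInjector(new ServletModule());
    try {
      injector.getInstance(AuditLogger.class); // no HTTP request is active here
    } catch (ProvisionException e) {
      // Compiles and unit-tests fine; blows up only on the request-less code path.
      System.out.println("Provision failed: " + e.getMessage());
    }
  }
}
```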
Guice Singletons are Not Evil:
  • The traditional GoF Singleton Pattern:
    • A class will have exactly one instance in any running system.
      • E.g. SomeClass.getInstance() ==> return INSTANCE; // static member field
    • Does not allow for tests to have isolated instances or run in parallel.
    • Calling code points easily numbering 100-1000+ in a large system are hardcoded to a static method.
  • Guice Singletons:
    • Can have an unlimited number of instances based on:
      • Explicit binding, e.g. @Annotated instances or MapBinder
      • Explicit construction, for tests etc
    • Calling code points are fully decoupled and declare only the class or interface they depend on
    • Tests can easily use new-real, mock, or stub instances
    • Allows testing using isolated instances in parallel
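A minimal sketch of the contrast above (the Clock interface and SystemClock class are hypothetical): the binding is a Guice @Singleton, one instance per Injector rather than one per JVM, and callers declare only the interface.

```java
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Singleton;

public class SingletonSketch {
  interface Clock {
    long now();
  }

  // A Guice singleton: one instance per Injector, not one per JVM.
  @Singleton
  static class SystemClock implements Clock {
    @Override public long now() { return System.currentTimeMillis(); }
  }

  static class AppModule extends AbstractModule {
    @Override protected void configure() {
      bind(Clock.class).to(SystemClock.class);
    }
  }

  public static void main(String[] args) {
    Injector injector = Guice.createInjector(new AppModule());
    Clock clock = injector.getInstance(Clock.class);
    System.out.println(clock.now());
    // A test can build its own Injector, call new SystemClock() directly, or bind
    // Clock to a stub; nothing is hardcoded to a static getInstance() call.
  }
}
```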
If you think you need RequestScope, think again:
  • I recently worked on a system from scratch with 5 engineers over a year, and we banned @RequestScope from the outset. The results?
    • Zero issues with scoping
    • Only once did we miss @RequestScope. Less than an hour of refactoring made the code simpler and eliminated the need or desire to use @RequestScope.
    • This system had > 20 major features, O(100k) LOC, multiple backends and storage systems, and supported 4+ architecturally different clients (test harness, Android, iOS, Web, ...). We did all of the things a normal WebApp does, and did not miss @RequestScope other than the note above:
      • Parameter parsing
      • Input Validation
      • Authentication
      • XSRF protection
      • Caching
    • The system was generally easy to refactor.
    • It was very easy to change our approach to various tasks using queues or other async processing, as there was no thread-affinity of any processing.
How did we get here?
  • Java is used primarily in WebApps. 
    • Successful Storage Systems are mostly in C/C++. Examples:
      • MySQL
      • Oracle
      • Redis
      • Memcache
      • SqlServer
      • MongoDB
    • Notable counter-example written in Java (and not using Guice): HBase
  • WebApps often have lots of Servlets handling HttpRequests.
    • In ~2005-2006 Struts was well in decline and Guice came along.
    • The Action / Handler style of frameworks started to emerge*
    • Parameter parsing and input validation moved from servlets to frameworks
    • Many people repeated the "Singletons are Evil" mantra
      • The driving force here was testability and avoiding the real evil of supersized static registries (these had different names in many projects, often "[Project]Registry").
    • Someone realized they could inject parameters rather than pass them as method arguments
      • For thread-per-request WebApps, and for apps without complex threading or queuing needs such as the oft-mentioned "CRUD" apps, this worked well.
        • Note that the core ActionInputs* framework approach has ~95%+ of the benefit with zero dependency on RequestScope.
      • Systems Programming efforts that mistakenly used this style started to regress in terms of:
        • Production stability
          • See outages above.
        • Latency & Performance
          • A recent profiling effort found 1/3 of a request's latency spent by Guice creating objects that had no request state, and for the most part no state at all. This happens when the dependency tree grows tentacles throughout the code-base.
        • Code factoring / coupling to the wrong model (thread-per-request)
        • Very limited processing model outside of "thread-per-request"
          • i.e. can't do SEDA, pipelining, batching across requests, etc.
        • Difficult debugging
          • Stacktraces through Guice-provided instances are difficult to follow
          • Can't view source of generated runtime classes (AssistedInject, ...)
          • Control flow is easily abstracted away from procedural code, e.g. a FilterChain that is dynamically generated per request.
Production Emergencies using MapMaker:
  • Cache memory (weak references and keys) causing extremely long GC pauses with very low cache hit rates. This is due to the weak-eviction implementation needing (1) two runs of the garbage collector, first to mark and then to finally remove the references, and (2) expiring based on the very hard-to-internalize concept of "time to OOM" rather than a sane LRU policy. Switching to a simple size-1000 LRU cache (well under 1% of allocated RAM) resolved this entirely.
  • Weak and soft keys causing a zero percent cache hit rate with MapMaker. An attempt to use a MapMaker Map with long primitives as the keys had a zero percent cache hit rate due to (a) Java primitive auto-boxing and (b) MapMaker with weak or soft keys "uses identity (==) comparisons instead for keys". Every cache set or get operation created a new Long object with a new identity, resulting in a zero hit rate.
  • Why MapMaker in this post? Both of the serious problems we ran into with MapMaker occurred with Guice-injected (non-RequestScoped) implementations of a Map interface. Guice encourages developers to rely on the "magic" of their environment and makes it easy to overlook implementation details and API limitations.
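A minimal sketch of the auto-boxing pitfall, assuming Guava's MapMaker: weakKeys() switches the map to identity (==) comparison, and each boxed Long (outside the small-value cache) is a distinct object, so lookups never hit.

```java
import com.google.common.collect.MapMaker;
import java.util.concurrent.ConcurrentMap;

public class WeakKeyPitfall {
  public static void main(String[] args) {
    // weakKeys() makes the map compare keys by identity (==).
    ConcurrentMap<Long, String> cache = new MapMaker().weakKeys().makeMap();
    long id = 1_000_000L;              // large enough to avoid the boxed-Long cache
    cache.put(id, "value");            // auto-boxes to one Long instance
    String hit = cache.get(id);        // auto-boxes to a *different* Long instance
    System.out.println(hit);           // prints null: every lookup misses
  }
}
```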
*What are Action / Handler / ActionInput Frameworks: 
A full description is outside the scope of this post, but these are very common and easy to code framework approaches to handling requests, typically HttpRequests, in WebApps.
  • Actions are request-scoped (even if not by Guice) processors for a specific request. I.e. a WebApp might have a LoginAction object that maintains state about a request as well as containing the code to validate, process, and return results for that request. Its state may include the user's login and password, as well as request metadata (a timestamp, security tokens, etc). These commonly have a constructor taking injected parameters (extracted and converted from the HttpRequest), along with roughly two methods to process the request and generate results.
  • Handlers with ActionInput frameworks use singleton (no per-request state) Handler classes along with ActionInputs per request. This keeps input processing per-request and typically uses field reflection to populate the ActionInputs object. Each Handler has its own SpecificActionInputs class extending a common ActionInputs class, with the framework using reflection to populate SpecificActionInputs per request. The Handler has very similar methods to the "Actions" above, but now takes SpecificActionInputs as a method param, avoiding the need for request state to live in the Handler.
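A minimal sketch of the Handler + ActionInputs style, with all class names hypothetical: the Handler is stateless and every piece of per-request state arrives as a method parameter, so no @RequestScope is needed.

```java
public class HandlerSketch {
  // Common base for per-request inputs; a framework would populate fields via reflection.
  abstract static class ActionInputs {
    long timestampMillis;
    String securityToken;
  }

  static class LoginInputs extends ActionInputs {
    String username;
    String password;
  }

  // Would be bound as a Guice @Singleton (or left non-scoped): it holds no request state.
  static class LoginHandler {
    String handle(LoginInputs inputs) {
      if (inputs.username == null || inputs.username.isEmpty()) {
        return "error: missing username";
      }
      // ... validate password, create a session, etc.
      return "ok: " + inputs.username;
    }
  }

  public static void main(String[] args) {
    LoginInputs inputs = new LoginInputs();
    inputs.username = "alice";
    inputs.password = "secret";
    System.out.println(new LoginHandler().handle(inputs));
  }
}
```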

Sunday, September 12, 2010

My first dose of Git nirvana.

After using Perforce and CVS for years, I've spent the past two weeks using Git for our startup.

It only took a few days to get semi-comfortable using Git in the centralized model. I.e. I'd do some work and submit it to our central repo, rinse and repeat.

Until today.

I wanted to add a third party library to our system, which entails adding src & binary jars to /lib, and adding references to it in our ANT build.xml and shared IntelliJ project. I.e. just housekeeping work. So I did that locally and committed it to my local Git repo. I thought nothing of it. Now my working dir was clean, with no modifications.

Next I started using the third party library, adding some snazzy functionality to a demo. A little code here, a little there, and I was ready to commit again. Now I committed a second atomic changeset containing just this functionality and nothing more.

What's funny is that the whole time I was on VPN and could have submitted to our central repo, but why? Aside from the risk of my hard-drive melting down (unlikely and not a concern for 30 minutes of work), Git already had my commits tracked perfectly. So after completing these two bits of work, I pulled from the remote repo (with zero merge conflicts), and then pushed back my repo. That's it. Now everything is in sync, and my changes show up separately in our central repo with their independent commit messages etc.

So in summary: Even without doing wild distributed development, branching, etc, Git is extremely useful for isolating and tracking changes that complement each other. Even without an explicitly named branch, you can complete several small chunks of work and push them together, atomically, to a central repo. Great stuff!

Sunday, June 27, 2010

Definitive guide to disabling the touchpad / trackpad on Ubuntu 9.10 Karmic Koala (UI based, easy)

After reading several guides and forums about disabling the trackpad on Ubuntu 9.10, I finally figured out the magic combination. And there are no command line steps whatsoever - just System->Preferences choices.

The quick background:
Ubuntu has two System -> Preferences apps that affect the trackpad: Mouse and Touchpad. The Mouse app has a Touchpad tab with a "Disable touchpad while typing" option. The Touchpad app has an "Enable Touchpad" option on its main tab. Apparently, if you check both, the Mouse app will detect when you are done typing and re-enable the touchpad!

Instructions:
1. Open System -> Preferences -> Mouse
2. Select the Touchpad Tab
3. Deselect (turn off) the "Disable touchpad while typing" option
4. Open System -> Preferences -> Touchpad
5. Deselect (turn off) the "Enable Touchpad" option

That's it!

Enjoy your laptop :)

Saturday, December 19, 2009

Ubuntu and a Thinkpad. A Beautiful Duo!

After another hour getting my dev tools setup on the Thinkpad I'm even happier with the hardware and software.

The Ubuntu community has high quality docs for all of the (few) topics I've wanted help on. For example, I now have middle-mouse-button scrolling working; something I missed from my last Thinkpad over three years ago.

Compared to Windows and even OSX, having a truly free OS and free software is liberating. Without question, both Synaptic and the new Ubuntu Software Center make finding and installing software incredibly easy. There are no restarts or system slowdowns. You can queue up several packages and they'll seamlessly download and install in the background, even over low bandwidth at a web cafe.

As a developer this freedom and availability is key. If you're on the fence I say go for it!

Thinkpad T400 First Impressions and Ubuntu Linux Install

Just two days ago my new Thinkpad T400 arrived. I chose the base config with the WXGA+ LED screen option and webcam. In short, this thing is great.

After three years of a Macbook Pro (current and previous generation), it's good to be back to a thinkpad for development. The keyboard is amazing. Personally I really like the trackpoint, mouse button feel, and having 3 dedicated mouse buttons. This will be far better for programming and IDE environments etc.

Now to installing Ubuntu for dual-boot. Microsoft Windows 7 was preinstalled and I didn't want to remove it entirely. The Windows partition manager is a joke; skip right over it. It would only shrink the Windows partition to ~80G even though only ~20G were used. So I installed Ubuntu via USB stick (more on that in a sec) and then used GParted. Note: you will also have to install ntfstools for GParted to be able to read and shrink the NTFS partition. After shrinking Windows to ~50G I actually installed Ubuntu a second time to make repartitioning easier.

For the Ubuntu install I had a 4G USB drive by OCZ. I used the Windows-compatible UNetbootin program to copy the standard Ubuntu 9.10 ISO over to the drive and make it bootable. That took all of five minutes and was really easy. Simply plug the drive into the Lenovo and on startup, hold down F12. Alternatively you can use the blue ThinkVantage button for a list of options. F12 lets you do a one-time boot from a different medium. Thanks Lenovo folks for doing this - so much more convenient than changing permanent BIOS options back and forth.

Live boot and install was amazingly smooth. It was the easiest and fastest OS install I've ever done. Out of the box, Ubuntu recognized all hardware on the T400 that I thought of testing. Webcam, High res screen, Trackpad with scroll areas, Wifi, Sound, Volume and Brightness controls, etc. Amazing. I connected to our home Wifi in a matter of seconds with great signal. The Installer runs from a shortcut on the live boot desktop and is very easy. It takes about 2 minutes of manual input (keyboard layout, timezone, etc), and then installs in about 10 minutes.

Later on I tried hibernation. It works!

In short:
Thinkpad T400 I give a rating of 9/10. The one-point ding is for a few creaky noises on opening the screen and carrying it. I hear this is normal so am not worried, but, for example, the battery can jiggle a bit.

Ubuntu 9.10 I give 10/10. Amazing live install experience. I would recommend this to anyone. IMHO if you have a friend or family member who doesn't need MS Office or IE, I would now for the first time recommend Linux (Ubuntu) on the desktop.

Sunday, August 23, 2009

Open Hardware: Getting started with Arduino

Ever since attending the Maker Faire and subscribing to Make Magazine, I've been reading and seeing more and more prototypes using Arduino. Arduino is a collaborative open-source hardware prototyping platform that includes a free software integrated development environment (IDE).

Speaking metaphorically, Arduino is to electronics hardware prototyping as Ubuntu is to computers. Just as there are several other Linux distributions, there are other open hardware prototyping efforts. None, however, are as friendly, clear, and obvious as Arduino. There is a vibrant forum community hosted at http://www.arduino.cc/, in addition to tons of high quality reference material on the IDE / language, hardware interfacing, and low level details.

The toughest part of getting started was deciding what to purchase and from where. Since Arduino is open source, there are several third-party fully compatible "clones" meeting the same specs. If you're just getting into hardware (it's been 10 years since I touched a resistor or an LED), you'll also have to buy some necessities depending on what you want to start with. I split my first purchases across www.sparkfun.com, ebay, and www.digikey.com. If I had to do it all over again, I would probably get everything from SparkFun and DigiKey. While ebay looked cheaper, after shipping on each item I ended up paying either the same, or $0.05 less than DigiKey/SparkFun, and had to deal with more packages, payments, etc. It's no fun to have your Arduino arrive while you're missing a breadboard or anything else to interface with :)

Here are the components I bought, with comments:
SparkFun:

  • COM-00682 LED Matrix - Dual Color - Medium x1         $6.95
    • I should have gotten the small. Bigger is not better. The medium doesn't fit on a single breadboard, and the pins are spaced so that you have to customize full-size or get two small breadboards. Thus, I ended up getting two lower quality / appearance full-size breadboards just to detach one of the power rails to get this to fit. Also, it's pretty big compared to (a) the number of pins it uses and (b) other small electronic components.
  • DEV-00666 Arduino USB Board   x1         $29.95
    • Perfect! Nice packaging, high quality, I should have ordered sooner :)
  • COM-09288 Potentiometer - Linear x1         $0.95
    • Very useful for getting started. Makes it really easy to adjust timing without having to change code and re-upload / wait for Arduino to reset.
DigiKey: (format: part-num, manufacturer part-num, unit-cost, extended price (unit x quantity))


  • ULN2003APG-ND ULN2003APG(5,M) 0.55000 $0.55
    • Not everyone needs this. It's a darlington array which might be helpful working with a 595 shift register to power/drain current from an led array or matrix.
  • 150QBK-ND CFR-25JB-150R 150O RESISTORS 0.06000 $0.60
    • I wasn't sure exactly on the resistor sizes I needed, and 150Ohm seemed like a common value for working with LEDs.
  • P833-ND ECE-A1CKA101 100UF CAP 0.15000 $0.45
    • Looks like I'll need this for a Max7221 matrix led driver which I ordered, to buffer the power supply against noise. Not 100% sure this is correct, but it's the correct farad value (100uF).
  • 493-1095-ND UVR1H0R1MDD 100NF CAP 0.18000 $0.54
    • Second power buffering cap. Same, not sure, but it's 100nF.
  • P3K1103-ND EVU-F2LFL3B14 VERTICAL POT 10K OHM 1.19000 $2.38
    • I got two more vertical potentiometers, which are basically identical to the one from SparkFun at 10kOhm. This is a great size for tuning values. I got three so that I could build an Audino synthesizer which uses pots to control pitch and other features of the generated sound. See http://nomeist.com/audino/134 for a great video.
  • 160-1066-ND LTA-1000HR RED 10SEG BAR 1.05000 $1.05
    • I figured a nice looking, pre-built led array would be nicer and cleaner than individual leds. Verdict: Yes! It's very nice looking and easy to use.
  • 160-1067-ND LTA-1000G GREEN 10SEG BAR 1.05000 $1.05
    • Same as above. These are also quite bright, even at a ~10% PWM duty cycle. Very fun for making sweepers.
  • 24KH-ND CFR-50JB-24K 0.05000 $0.25
  • 30KH-ND CFR-50JB-30K 0.05000 $0.25
    • 24k and 30k Ohm resistors purchased to control the Max7221 LED driver. The resistor value is used to control the current used to drive the array. I found some decent documentation on this and it looks like either value will work, but I wanted the ability to fine-tune the brightness a bit.
  • 240H-ND CFR-50JB-240R 0.05000 $0.50
    • I got 10x 240 Ohm resistors for LED control. These turned out to work great with the above LED arrays. Perhaps it's because I was working with them at night, but at 5V (from Arduino) and a 10% duty cycle, the 240 Ohm resistors still let plenty of current through for the LED array to be bright.
  • 4608X-2-151LF-ND 4608X-102-151LF ISOLATED RESISTOR ARRAY 8SIP 150O 0.32000 $0.64
    • These looked convenient for working with a large LED array. They are basically 8 resistors in a single array that insert together into the breadboard. Note: I should have read this more carefully - there are fully isolated modules (what I got two of) and bussed modules. Bussed gives you more resistors in a smaller space, but they have only one common sink pin. I.e. if all of your inputs need just Input -> Resistor -> GND, you can use bussed. This is the common LED use case. No big deal / I will still use these, but bussed may have saved a bit of breadboard space.
  • 296-1600-5-ND SN74HC595N 0.66000 $1.98
    • A 595 shift register. These allow you to use 4 Arduino pins to drive 8 digital outputs, and can be chained together to conserve pins. As I'm interested in driving up to an 8x8 RGB LED array, but only purchased an 8x8 dual-color to start with, it seemed like 3x of these would be plenty. I.e. I can use two to drive the inputs to the R/G side, while using the third to control the ULN2003A to sink columns of the matrix. Side note: Reading the spec-sheet now, I see that these can only drive ~6mA of current, which is only ~20% of the current spec on the LEDs. Thus, I may need to purchase more ULN2003As, but will see.

Ebay:


  • Pre-cut prototyping wires x75 of various sizes. Very useful and cheap, and well worth the money.
  • A 830 pin breadboard for $6 shipped. Also useful, but next time I'd probably buy one of the larger boards so it's more solid and easier to add a bunch of components (i.e. the large led matrix). Note: larger also means easier to spread out and possibly protect your components.
    -------------------------------------------------------------------

    Connecting / Starting Up:
      Since I tried saving just a few $ by ordering from separate suppliers, my Arduino arrived first and I didn't have any other components. Fortunately, Pin 13 has a built-in SMD (tiny, surface-mount) LED which you can drive like any other digital output. I installed the IDE, which was trivial and worked seamlessly on Vista x64 (64-bit Vista). Vista recognized the USB->UART (USB to serial) chip on the Arduino as soon as I connected it, auto-downloaded a driver, and it worked without a restart. Within the IDE you can easily load many samples, and I took one of the basic LED control samples and changed it to work on pin 13. One click and ... wait for it ... my little Arduino had a blinking LED. It really is that easy, far easier and faster than ordering the components. Really.

      Two days later, all of the goodies arrived and I started unpacking / identifying / organizing the gear. Note: I bought a small, $8 adjustable storage bin, and will probably return it for a larger one. It's really nice to put things away cleanly, and you probably won't use each of the dozen+ components all the time. The one I got is about 1.5in x 5in x 11in, and is much too small.

      On to building the LED array sweeper, the fun part!
    1. Insert LED array into breadboard, with pins on both sides of the center channel. Note the pin-zero / the first pin. The front of the array has a notch identifying pin zero. On the one I purchased, this is an Anode which will get connected to positive.
    2. Bridge each cathode pin (all on the same side, opposite pin-zero) to Ground (GND) with a 240 Ohm resistor. A bussed SIP resistor would make this much cleaner next time.
    3. Connect pins 3-12 of your Arduino to pins 0-7 (Anodes) of your LED array. Note: I first used pins 0-9, but pins 0 and 1 can't be connected to power during code upload (they are serial comm pins).
    4. Connect positive power (5v pin) and GND to the breadboard red/+ and black/- channel/strip where you connected the resistors in step 2.
    5. Upload code and watch your sweeper!
    Next post: Extending to include a pot for time control, and code for the sweeper including basic PWM brightness/dimmer control.

    Friday, January 25, 2008

    Response to MapReduce II.

    This blog post is in reply to http://www.databasecolumn.com/2008/01/mapreduce-continued.html, and specifically points out fundamental flaws in David DeWitt & Michael Stonebraker's understanding of the MapReduce paradigm. My intent is purely to inform the public on what is really possible with MapReduce, particularly when paired with suitable highly scalable data storage such as DHTs (distributed hash tables).

    I'm an experienced developer and have used Oracle, MySQL, and SqlServer professionally. Each filled a relational niche well. This post is not in any way against RDBMSs.

    In response to the blog titled "MapReduce II", section No. 1, DeWitt and Stonebraker claim that a chain of three MapReduces would be required to answer the following question from two tables:
    Table 1: Rankings (pageUrl, pageRank)
    Table 2: UserVisits (sourceIpAddr, destinationUrl, date, adRevenue)
    Question: What IP addr generated the most ad revenue during some specific week, and what was the average rank of the pages visited.

    Aside from this being two questions (something typically answered together in RDBMSs because the cost has already been amortized over the large join), it is also an ambiguous question. I'll assume that the request for the average rank of pages visited means "pages visited by that IP addr during that week". This is also a (likely) useless query, as many IP addresses are shared by large proxies such as those run by AOL or other ISPs.

    Assuming for the moment (big assumption, not the entire answer) that instead of IP addresses in Table 2 we had some form of UserId, I'll say we can do this in a highly scalable way in a single MapReduce job. How:

    Phase 1 (the only MapReduce; it builds a useful data structure)
    Map: scan all UserVisits records and map them to a key of [week-id/begin-date]:UserId
    This will cause all user visits to be bucketed by week (begin date), then by user.

    Reduce: for each set of input, sum the revenue and build a list of destinationUrls. Store the result in a highly scalable datastore that is indexed by key. There are several of these, but basically any large scale DHT with key indexing will do. The output key will be of the form: [week-id]:RevenueDescending:UserId. Storing this result in a way addressable by this nicely constructed key allows a single limited scan to find the max-revenue user with one query to this auxiliary data structure.
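
    A minimal, framework-free sketch in Java of the job described above (all class and field names are hypothetical): the map step buckets every UserVisits record under a "[weekBeginDate]:[userId]" key, and the reduce step sums revenue and collects the destination URLs for each bucket. A real job would emit each reduced record to a DHT under a "[week-id]:RevenueDescending:UserId" key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeeklyRevenueSketch {
  record UserVisit(String userId, String destinationUrl, String weekBeginDate, double adRevenue) {}
  record WeeklySummary(double totalRevenue, List<String> urls) {}

  // Map phase: bucket every visit by week begin date, then by user.
  static Map<String, List<UserVisit>> map(List<UserVisit> visits) {
    Map<String, List<UserVisit>> buckets = new HashMap<>();
    for (UserVisit v : visits) {
      String key = v.weekBeginDate() + ":" + v.userId();
      buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(v);
    }
    return buckets;
  }

  // Reduce phase: sum ad revenue and collect URLs per (week, user) bucket.
  static Map<String, WeeklySummary> reduce(Map<String, List<UserVisit>> buckets) {
    Map<String, WeeklySummary> out = new HashMap<>();
    buckets.forEach((key, bucket) -> {
      double revenue = bucket.stream().mapToDouble(UserVisit::adRevenue).sum();
      List<String> urls = bucket.stream().map(UserVisit::destinationUrl).toList();
      out.put(key, new WeeklySummary(revenue, urls));
    });
    return out;
  }
}
```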

    After the Map Reduce, you can now find out all the information you need with extremely cheap computation:

    Now, read the list of URLs found via the query enabled by the Reduce: i.e. do a limited (one record only) scan over the keys starting with "[week-you-want-id]:", and the first and only record returned will contain both the UserId of interest (answering the first question) and the entire list of destinationUrls for this user. The Aha! moment for some is to realize that a typical UserId, and even a spammer/bot UserId, will almost certainly have a very small number of UserVisits in any given week (say under 100,000). With this assumption, you can now simply iterate over the UserVisits list for this week-user bucket, and visit the Rankings table for each Url. Assuming that table is in an efficient-lookup DHT, you are now essentially done (most users of anything, even Facebook or MySpace, probably have way, way under 100,000 page views per week).

    If for some reason you don't agree with my 100,000 upper bound on page views per week, consider that with a two phase Map Reduce you can eliminate that step and make the final lookups O(1). This would add a phase before my Phase #1 above, that joins the two tables on URL. With the two phase approach, you can also completely reconcile having to support IP addresses (that may belong to AOL proxies and the like) in addition to UserIds. The reason, is that you now have the pageRank, and can compute the average pageRank per IP in the second phase Reducer easily. Now you simply store the average pageRank instead of the URL list, and the results become constant size per-record.

    Finally, this Map Reduce has a major benefit (and cost!) above and beyond a single RDBMS query. It creates an auxiliary data result. This can be extremely useful. What if you wanted to chart the max revenue per week over time? With an RDBMS, assuming indexes were used to speed up the join and constrain the dataset by week, you would be looking at one (or many) costly queries. With the Map Reduce, the expensive computation is done once and is now available for future use. This approach of pre-computation isn't for everyone, but for high scalability it can be critical. You don't want to have to scan over these large tables often, if you can avoid it.

    Sunday, June 04, 2006

    Buzzwords Kill. How terms corrupt decision making in software projects.

    I recently watched two major software development projects fail. Fortunately, I was not directly involved with either of them. Over frequent conversations with the software developers, managers, and architects involved, I saw a common thread: buzzwords misused to strengthen an agenda or for empire building.

    The two projects failed due to the misapplication of two technical terms: J2EE and RUP. I'll start with the first, J2EE.

    A multi-year project for a large piece of enterprise software started with a wide scope and few technical rules imposed by the client. The main technical directive issued by the client was: be J2EE compliant. Early in the process, Software Architects with little expertise in Java/J2EE defined a core set of architectural goals. Ignoring the "common wisdom" gained from 1000s of projects and companies, that EJB is more often inappropriate than not, they defined the core architecture as EJB on J2EE. They declared that everything in the system should be represented by EJBs, and that all state information and message passing be handled by EJBs.

    The result was easy to predict, although I can't credit the failure entirely to the abuse of EJBs. Change requests took days or weeks for even the simplest of changes. Large parts of the system were never even started, as the environment was so cumbersome for developers. While I can't disclose the project's current state, you can probably guess...

    Before moving on to RUP, let me say that the failure here was not primarily the EJB specification. The real failure was that "J2EE compliance" was interpreted to mean "everything must be an EJB". While not referring to "empire building", this certainly does refer to "agenda setting" and an abuse of power by the software architects. Had they cared to visit the few groups actually doing Java and J2EE development, they would have quickly understood that EJB is simply one part of a greater framework for software development in Java.

    Now for RUP. Having just taken a course in software development lifecycles (SDLCs), I understand that RUP is a framework for modelling SDLCs. You can tailor RUP to be very much like Agile, or you can make an ancient Waterfall SDLC, or anything else for that matter. The project I mention here actually refers to several failing projects.

    Perhaps for "empire building" or perhaps just for lack of understanding, the "process staff" decided to use absolutely everything they could find in the RUP SDLC. Even where it didn't make sense. Where they could have multiple iterations, they had many. What they didn't ever do was read up on how projects can succeed or fail. Here are some key factors from the CHAOS report by the Standish Group, describing the most common reasons software projects fail:
    • Lack of user input.
    • Lack of complete requirements.
    • Changing requirements.
    Now, what have we as a community of Software Engineers learned over the past several decades? That waterfall doesn't work except for a very few extremely expensive, impossible-to-patch applications such as space vehicle software, bank software, etc. For most other applications, where patching is possible and even likely, a more iterative/evolutionary/Agile approach is called for. Well, at work they chose extremely long RUP iterations (averaging 6-12 months) and phases lasting 1-2 years. Many of these projects are already several years old without a single line of code written or any customer/end-user feedback. Most major business leaders have lost all confidence that these projects could ever succeed, and most assume they will be discarded.

    In summary: Be careful when using buzzwords, and do not make decisions based on buzzwords and acronyms alone. This may seem like trivial advice, but perhaps it will save you or someone else a failed or delayed project.

    Saturday, April 22, 2006

    JavaServer Faces - JSF - Powerful and Underestimated.

    Project History and Experience.

    Starting one year ago, I began researching UI frameworks for developing web applications in Java. Keep in mind, I'm a Senior Software Engineer with 7+ years of experience, so I do bring some perspective to this research. My goal was to support a fairly complex user interface comprised of thousands of fields. These fields would include most of the standard user interface components such as text, select boxes, check boxes, buttons, tabs, various panels, and rich graphical mapping components. Each user will see distinct subsets of the full system, making the choice of UI framework particularly important.

    In early 2005 there were at least a dozen Java web app frameworks, but I quickly narrowed the selection down to a few. There were highly praised but "non-standard" frameworks such as Tapestry, Spring and the now aging Struts. While each of these addressed page flow and validation, each seemed to focus on a JSP or HTML page as the core of the UI layer. If I recall correctly, Tapestry focused on plain HTML as the view layer with special tags embedded. Struts had the largest user base but, speaking from personal experience, can quickly become a cumbersome liability in a large project. Spring promised simplicity and elegance, with a focus on integration with hibernate.

    None of these projects addressed the UI View layer in the classic Model View Controller (MVC) Architectural Style. Except JSF. JSF's core focus was a Component based View layer. If I were building a small application with, say, 50-100 fields, standard JSP would do just fine with almost any framework or none at all. For a complex application with 1000s of fields, with rule driven inter-dependencies, the full MVC style was needed. JSF components could fire events. They could also be programmatically composed, manipulated, and decorated.

    I briefly entertained the idea of simply creating my own HTML RenderKit rather than take the potentially steep learning curve of JSF. I'm incredibly glad I stuck through JSF's initial hurdles. While most of the JSF tutorials out there are for standard JSPs, what I wanted was JSF without the JSP. As the learning continued I thought JSF with JSP was fine, but thought I would need custom components. After months of effort, I have yet to find a need for a component that does not exist in either the core JSF API (per the Sun 1.1 Spec) or the MyFaces Tomahawk implementation.

    Four months into implementing this major project, I am ready to fully endorse JSF for anyone who will listen. The concept and style of JSF is incredibly powerful, providing a very rich present and future path for those who are willing to try it. My coworkers are openly impressed with the project, and many did not think a web application was capable of this type of interaction. Just a few weeks away from production deployment, I've confidently load tested the application and it should far outperform our current legacy app.

    Please forgive the intentionally vague lack of mention of my company or the project specifics. Let's just say that it's a pretty cool project, for a Fortune 100 company, and will be used by quite a few people.


    Technical Details. Why you should think about JSF.

    1. Composite Components.
    Just like Swing, .NET Windows Forms, or most other client side UI Frameworks, JSF components are true components. Due to their object hierarchy, any JSF UIComponent can contain other UIComponents as children. This grants incredible flexibility. It means that you can generate and render from an arbitrarily complex domain Model to an Object Graph of the UI View.
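
    As a minimal sketch of this composability (assuming the standard javax.faces HTML components; the method and field names are hypothetical), here is a panel generated programmatically from a list of domain field names:

```java
import javax.faces.component.UIComponent;
import javax.faces.component.html.HtmlOutputText;
import javax.faces.component.html.HtmlPanelGroup;
import java.util.List;

public class CompositeSketch {
  /** Builds a panel containing one label per domain field name. */
  public static UIComponent buildPanel(List<String> fieldNames) {
    HtmlPanelGroup panel = new HtmlPanelGroup();
    for (String name : fieldNames) {
      HtmlOutputText label = new HtmlOutputText();
      label.setValue(name);
      panel.getChildren().add(label);   // any UIComponent can hold child components
    }
    return panel;
  }
}
```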

    2. Decorators.
    Due again to the elegant component hierarchy of JSF, all components can easily be decorated. In this case, I built my own small hierarchy of Decorators that could decorate any UIComponent. There are several domain-specific Decorators I can't mention, but an obvious one that I can is for debugging. Simply pass the Decorator list to the root Model -> Object Graph, and each major section of functionality will be wrapped with a debugging panel showing important state information for the developer.
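
    A minimal sketch of the decorator idea (the ComponentDecorator interface and DebugDecorator class are hypothetical; the domain-specific decorators are omitted), wrapping any UIComponent in a small debugging panel:

```java
import javax.faces.component.UIComponent;
import javax.faces.component.html.HtmlOutputText;
import javax.faces.component.html.HtmlPanelGroup;

public class DecoratorSketch {
  interface ComponentDecorator {
    UIComponent decorate(UIComponent component);
  }

  /** Wraps the component in a panel that also shows its id and child count. */
  static class DebugDecorator implements ComponentDecorator {
    @Override
    public UIComponent decorate(UIComponent component) {
      HtmlPanelGroup wrapper = new HtmlPanelGroup();
      HtmlOutputText info = new HtmlOutputText();
      info.setValue("[debug id=" + component.getId()
          + " children=" + component.getChildCount() + "]");
      wrapper.getChildren().add(info);
      wrapper.getChildren().add(component);
      return wrapper;
    }
  }
}
```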

    3. Component Attributes.
    Gone are the days of coming up with a naming convention for HTML form parameters and parsing those same parameters back in. With JSF Components, you can simply add as many Java Objects as you wish under named keys in the JSF Component. This is similar to "Object Tags" in some thick-client UI frameworks. In this way, when an event is fired for a Component, the Event handler can simply look up the Object you attached directly to the Component earlier on. This also removes the potential (definite for my application) burden of using HTML form parameters to subsequently look up objects from another data-store or Model.
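
    A minimal sketch of this attribute technique (the Order class and the "order" key are hypothetical): attach a domain object when the component is built, then read it back in the event handler.

```java
import javax.faces.component.UIComponent;
import javax.faces.component.html.HtmlCommandButton;
import javax.faces.event.ActionEvent;

public class AttributeSketch {
  static class Order { String id = "A-123"; }

  public static UIComponent buildButton(Order order) {
    HtmlCommandButton button = new HtmlCommandButton();
    button.setValue("Ship");
    button.getAttributes().put("order", order);  // no naming-convention form params
    return button;
  }

  public static void onAction(ActionEvent event) {
    Order order = (Order) event.getComponent().getAttributes().get("order");
    // ... process the order directly; no lookup from another data store is needed
    System.out.println("Shipping " + order.id);
  }
}
```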

    4. Event Listeners.
    If you use Eclipse, you know about its code completion and automatic import suggestion. The first few times I implemented JSF Event Listeners, I almost selected the Java Swing ActionEvent import (suggestion) rather than the JSF ActionEvent import. I haven't bothered to compare the API specs of each class, but I can tell you that there is nothing lacking in the JSF ActionEvent API. You can completely free yourself from the HTTP cycle and simply concentrate on event sources and handlers.

    For static applications without a dynamically generated component hierarchy, such as most of the JSF-with-JSP tutorials show, Event Listeners can be bound directly to the UIComponent itself. For my application, which currently has ~30 different Listeners/Commands, I realized it was not worth binding the proper Listener to each control when, on average, fewer than 1 in 10 would be clicked per render. Thus, all Command components are bound to a single Command Listener. This Listener, upon receiving an event, looks up the appropriate listener and invokes it with the event.
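
    A minimal sketch of that single dispatching listener (the Command interface and the "commandKey" attribute are hypothetical):

```java
import java.util.Map;
import javax.faces.event.AbortProcessingException;
import javax.faces.event.ActionEvent;
import javax.faces.event.ActionListener;

public class DispatchingListener implements ActionListener {
  interface Command {
    void execute(ActionEvent event);
  }

  private final Map<String, Command> commands;

  public DispatchingListener(Map<String, Command> commands) {
    this.commands = commands;
  }

  @Override
  public void processAction(ActionEvent event) throws AbortProcessingException {
    // The command key was attached to the component when the tree was built.
    String key = (String) event.getComponent().getAttributes().get("commandKey");
    Command command = commands.get(key);
    if (command != null) {
      command.execute(event);
    }
  }
}
```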

    5. Extensibility
    It's pretty easy to develop custom components for JSF. Due to the specific nature of the JSF specification, the MyFaces components can even be used with the Sun Reference Implementation (RI). For those obviously needed components missing from the RI, MyFaces easily separated their core RI components from their extended JSF components (Tomahawk). Other groups both commercial and non-profit/open source are regularly releasing sets of JSF components. As JSF adoption increases, there will be a stream of more and better components.


    Conclusion.
    Of the frameworks mentioned in the first section, only JSF has two completely separate implementations. Only JSF has the full endorsement of the Java Community Process (JCP). We can be quite sure that JSF will be around for a long time. After such a positive experience with JSF, I am confident that others will have one too. I'm sure that in the near term, more blog entries like this one will show up and build the momentum, respect, and ultimately adoption of JSF. With the excellent extensibility in terms of 3rd party Components, JSF will keep improving even without a specification update.

    I'm not saying everyone should jump to JSF now, but if you are building a complex application in Java for the web, it's worth serious consideration.

    Wednesday, January 26, 2005

    Tomcat, ANT, CVS, Success in the Enterprise - Part 1

    After two long days of mostly business-side meetings about my group's focus at work, our choice to use open source tools has been validated by a landslide.

    First, ANT. When we were asked who else in the company uses ANT (which I did not know), someone thankfully chimed in and stated that both of the other two main Java-based development groups use ANT with great success.

    Second, CVS. The senior manager present (long since promoted into management) had used CVS in the past. Not only was the choice of CVS validated, but there was much relief and satisfaction that we had moved from a flat-file tar.gz archive method of backing up source to a full-blown concurrent versioning system (CVS).

    Most importantly, Tomcat. This is not simply a tool enabling our development - it is a platform that we will use to develop on, deploy to, and run our production systems on. I say systems because we use Apache-AXIS for SOAP serving, and vanilla Tomcat for our web apps.

    I found out that one of the major Java groups (which has been flailing about recently) has used (drum-roll) BEA WebLogic for their failing platform. They are riddled with both organizational issues as well as development issues. In short: they move like molasses. This other group has failed to embrace the new cross-enterprise focus on SOAP for our middleware, and takes eons to deploy new and simple features.

    The Sr. Management present congratulated us on our success migrating to Tomcat, and hopes that our documentation and technical leadership will be a foundation for other groups to build on.

    I always thought that using open source in a large enterprise would be a struggle. Today I know, it is not.

    Saturday, January 08, 2005

    About MySQL, Java, and good Prototypes

    Over the holiday period at work most of my team was on vacation. Thus, I was left to "special projects". My group and I decided that a good project-prototype would be a SQL/RDBMS backed complete re-working of an existing form on our website. By form - imagine a form with about 20 complex fields for working with a command-line based back-end legacy system.

    The goals of the prototype were simple:
    Allow users of a complex app to save their form inputs in named templates.
    Allow users to retrieve templates at any time, with an easy/natural UI.
    Get this done - including installing MySQL & learning its quirks - in less than 4 days.

    I have to say: the OSS community rocked. So many people out there using MySQL posted tutorials & FAQs on MySQL <-> Java usage that rarely was I stuck on a problem. The only thing that got me - more of a Java issue - was that I had an old javax(?)-naming.jar in our web application lib dir (from when it was on JServ) that broke the more recent JNDI naming in Tomcat. Once I removed the outdated ...-naming.jar file, everything fell into place.

    Having used JDBC (the Java API for database connectivity), I was no stranger to the issues of using an RDBMS concurrently for the many simultaneous users of a web application. I started looking for a database/resource connection pool and it was immediately clear: Apache-Tomcat includes a very robust connection pool! That was one of the only things I was about to miss from WebLogic. So - after setting up the Jakarta-Commons based database connection pool (DBCP), I was ready to begin on my code.
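
    A minimal sketch of the resulting data-access code (the JNDI name "jdbc/TemplatesDB" and the templates table are hypothetical), borrowing connections from Tomcat's DBCP pool via a JNDI DataSource lookup:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class TemplateDao {
  public void printTemplateNames(String userId) throws Exception {
    // Tomcat exposes the DBCP pool configured in context.xml under java:comp/env.
    DataSource ds = (DataSource) new InitialContext().lookup("java:comp/env/jdbc/TemplatesDB");
    try (Connection conn = ds.getConnection();                       // borrowed from the pool
         PreparedStatement stmt =
             conn.prepareStatement("SELECT name FROM templates WHERE user_id = ?")) {
      stmt.setString(1, userId);
      try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
          System.out.println(rs.getString("name"));
        }
      }
    } // closing the Connection returns it to the pool
  }
}
```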

    In just a few hours, I had a basic schema, users, and a FREE tool for schema mgmt, backups, and general MySQL Admin. The tool I use is SQLYog. These guys realized that they'd never compete with the free tools (php-MySQL) if they didn't have a free 'lite' version. So, with this great windows-UI based tool installed and a few hours work, I was up and running - even with nice printouts of my schema (and html to email around).

    To wrap it up - this article is intended to give confidence to anyone considering MySQL as a back-end for any Java project, not necessarily a web based one. The driver works well and the documentation is fantastic. In less time than it would have taken purchasing and our IT-Unix Support group to install any of the non-free RDBMS systems, I had a fully working prototype ready to show off. It was good enough that it has now been commissioned into a full project that will include upgrading a few other features and polishing the UI to work in both standalone (IE/FF) browsers as well as PDA browsers such as those in RIM devices.

    MySQL + Java + Rapid Development = Success.

    Thursday, December 23, 2004

    Apache, Jakarta, Tomcat, AXIS - more than the sum of their parts

    For months now I've read constant articles on slashdot about the merits of .NET vs J2EE. I'm getting sick of it.

    I work for a large (1500 people) subsidiary of one of the largest companies in the US. Previous to this company, I worked at several dot-coms in the SF-Bay Area. The dot-coms, with the least money, all spent vast sums of it on proprietary closed-source systems. In the past 5 months at my new job, using ONLY open-source tools in addition to our in-house legacy systems, I've accomplished more than I had in the past several years of dot-com work.

    I credit my new found productivity to two things:
    1. Good Management.
    2. The Open Source Community.

    The management issue basically goes like this:
    Sometimes you have to make a decision on a project. If you constantly change your goal when you're 60-80% of the way to a release, you're never going to get anything done. At my current job my boss gives us tasks, and lets us run with them until completion. If something much more important comes along, we'll interrupt as needed, but get right back on track with our project goals.

    Open Source Community. Wow - I'm seriously, honestly, blown away by the Open Source Community. There are shared, always updated FAQs, Wikis, email lists, and usenet. There is nothing preventing people in the Open Source Community from sharing ideas, code, FAQs, Wiki entries, email, usenet, and more. Have a bug? Fear not - search the public bug database, email the developer list directly, get an answer for free within hours. What's the cost? Well, as a matter of principle, contribute something back. You don't have to be an elite hacker - even your email to a list & its replies will serve to bolster the online documentation of a project long after you're done with it.

    When we have a new project and it requires a piece of software, we don't need to deal with any other department. Yup - not purchasing, budgeting, project management, or any other group. It amazes me that the dot-coms, which even in their dreams of expansion handled less data than my current company, insisted on expensive closed-source solutions when viable standards-based alternatives existed. Twice in three years at a former company, our entire development team was shut down for a day following WebLogic license expiration. We were too small for our account rep at BEA to feel it worthy of sending us a warning email, so we simply found out one day that our servers were hosed. Can you imagine if that happened in production?

    I'll say it now, proudly: I never want to use WebLogic again. I use Tomcat now. It's a reference implementation, is actively developed, has great performance, and oh - by the way, almost every Java webapp-related utility/code/package/component/free software is tested on Tomcat first! Find a bug? Fix it yourself, recompile, and submit the code back to Apache. WebLogic had all kinds of interdependencies on the specific JVM it ran on. These weren't obscure things, but things like core JPDA (debugger) support - which suddenly stopped working if you ran WL on a machine not blessed by their specific JVM.

    And that's another thing: do you really want developers who only think/work at their desks? I've got Tomcat on (1) my Windows 2000 machine, (2) my Linux machine. At work, I've got it on a 4-cpu linux machine and a dual HP-UX machine. Guess what, it works everywhere. Oh yeah - one last thing: want to install on another machine, "just because"? Don't worry about sending your "budget man to the congress", simply download and install. No silly licenses. No explaining to corporate accounting why you need fail-over on this machine, dual-cpu license on that machine, cold-backup licensing on that machine... Just install and go.

    This is why, in the last 5 months, I've achieved well over a year's worth of dot-com closed-source-platform work. I don't have to wait for a budget; I was hired to do a job, and these are the standard tools. Apache - the vast majority of the web. Tomcat - I'm guessing the largest percentage of Java webapps are deployed on it. MySQL - oh yeah, InnoDB is ACID compliant as best I can tell, supports well beyond our minimal backup/restore needs, and yeah - it's GPL.

    And now, back to the start: .NET vs J2EE. Wow - do the lame media at zdnet and the like get this one wrong. .NET is a fledgling competitor to J2EE. I used it for 6 months at a dot-com. Yes: it has a very slick and nice IDE, with lots of useful wizards. Problem: even to build our most basic of apps, the only components we could find were closed-source, expensive components (i.e. a DataTable-Binding GUI for $500, etc). The equivalent in Java would be at least a half-dozen components that get weeded out to one or two very successful and mature open source components. While there are a great number of good choices out there, simply drop on over to jakarta.apache.org to see some of the Java projects at the Apache Foundation. Java is mature in a way that .NET is hopelessly behind - it has the support of *millions of people around the world refining, sharing, testing and releasing great software.

    *Note: by millions, don't think I'm crazy. It's just that I recently read "The Cathedral and the Bazaar". A major point of that book is that by releasing software as Open Source, your users become more valuable beta-testers than paid in-house QA testers are on average. Why? Because they will test in more ways, in original ways, and as long as you have an honest information flow, will give you more valuable feedback than someone blinded by in-house politics and legacy ideas.