Monday, January 02, 2012

Guice and Systems Programming - Reflections on Singleton and RequestScope

Summary: For Systems Programming (i.e. backend services) and even some WebApps, I recommend using @Singleton or non-scoped instances only. @RequestScope may be ok for WebApps serving HttpRequests directly from users in the thread-per-request model.

Context: I've done 10+ years of development mostly in Java and have seen frameworks come and go. Of those, the Guice dependency injection framework has succeeded and grown in adoption. I've seen Guice used in several medium to very large scale projects and have coached multiple teams in adopting Guice. In three projects, I've also seen production emergencies caused by incorrect usage of Guice (and other frameworks).

Production Emergencies using Guice:
  • Too many out-of-scope bugs to recall, most impacting production users. These surface only at runtime and occur when a component, often a low-level one, gains a new dependency on a @RequestScope object even though the component is not always used within a request scope. These have happened O(10+) times on each of two separate projects, with completely different teams and styles (one web server, one RPC server).
    • Just to repeat - it's one class of bug, but over 20 production emergencies in two systems.
    • There is no known way to prevent these bugs other than 100% code and branch testing coverage at the black-box level. That is, unit tests, and even tests against mocked-out services, easily mask these runtime issues.
    • These are 100% due to RequestScope. Never seen for @Singleton or non-scoped.
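To make this class of bug concrete, here is a minimal sketch in plain Java. It does not use Guice: FakeRequestScope is a hypothetical ThreadLocal-based stand-in for a request scope, and AuditLogger and OutOfScopeDemo are made-up names. It only illustrates the shape of the failure: the same component works on the request thread and fails on a code path that runs outside a request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a request scope: state is visible only on the request thread. */
final class FakeRequestScope {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    static void enter(String user) { CURRENT_USER.set(user); }
    static void exit() { CURRENT_USER.remove(); }

    static String currentUser() {
        String user = CURRENT_USER.get();
        if (user == null) {
            throw new IllegalStateException("Cannot access scoped object outside of a request");
        }
        return user;
    }
}

/** Low-level component that quietly gained a dependency on request state. */
final class AuditLogger {
    String describe(String event) {
        return FakeRequestScope.currentUser() + ": " + event;
    }
}

final class OutOfScopeDemo {
    /** Calls the logger from a worker thread; returns the failure message, or null if none. */
    static String runInBackground(AuditLogger logger) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture.supplyAsync(() -> {
                try {
                    logger.describe("cleanup");
                    return null;
                } catch (IllegalStateException e) {
                    return e.getMessage();
                }
            }, pool).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AuditLogger logger = new AuditLogger();
        FakeRequestScope.enter("alice");
        try {
            System.out.println(logger.describe("login")); // works on the request thread
        } finally {
            FakeRequestScope.exit();
        }
        System.out.println(runInBackground(logger)); // fails only on this code path
    }
}
```

A unit test that only exercises describe() inside a request, or that mocks AuditLogger, would pass. Only a test that drives the background path end to end would catch the failure.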
Guice Singletons are Not Evil:
  • The traditional GoF Singleton Pattern:
    • A class will have exactly one instance in any running system.
      • E.g. SomeClass.getInstance() ==> return INSTANCE; // static member field
    • Does not allow for tests to have isolated instances or run in parallel.
    • Calling code points, easily numbering 100-1000+ in a large system, are hardcoded to a static method.
  • Guice Singletons:
    • Can have an unlimited number of instances based on:
      • Explicit binding, e.g. @Annotated instances or MapBinder
      • Explicit construction, for tests etc
    • Calling code points are fully decoupled and simply declare the class or interface they depend on
    • Tests can easily use new-real, mock, or stub instances
    • Allows testing using isolated instances in parallel
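The contrast above can be sketched in plain Java, without Guice on the classpath. All names here (GofClock, TimeSource, SystemTimeSource, SessionService, SingletonDemo) are hypothetical. With Guice, the SessionService constructor would carry @Inject and the binding would decide which TimeSource is shared. Here the wiring is done by hand to keep the sketch self-contained.

```java
/** GoF style: one static instance, callers hardcoded to getInstance(). */
final class GofClock {
    private static final GofClock INSTANCE = new GofClock();
    private GofClock() {}
    static GofClock getInstance() { return INSTANCE; }
    long now() { return System.currentTimeMillis(); }
}

/** Injection style: callers depend on an interface declared in their constructor. */
interface TimeSource {
    long now();
}

final class SystemTimeSource implements TimeSource {
    @Override
    public long now() { return System.currentTimeMillis(); }
}

final class SessionService {
    private final TimeSource timeSource;

    // With Guice this constructor would be annotated with @Inject (assumption for illustration).
    SessionService(TimeSource timeSource) { this.timeSource = timeSource; }

    boolean isExpired(long expiresAtMillis) { return timeSource.now() >= expiresAtMillis; }
}

final class SingletonDemo {
    public static void main(String[] args) {
        // Production wiring: one shared instance, as a container-managed singleton would be.
        SessionService prod = new SessionService(new SystemTimeSource());
        System.out.println(prod.isExpired(Long.MAX_VALUE)); // prints false

        // Tests: each gets its own stub, nothing static is shared, so they can run in parallel.
        SessionService early = new SessionService(() -> 1_000L);
        SessionService late = new SessionService(() -> 9_000L);
        System.out.println(early.isExpired(5_000L)); // prints false
        System.out.println(late.isExpired(5_000L));  // prints true
    }
}
```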
If you think you need RequestScope, think again:
  • I recently worked on a system from scratch with 5 engineers over a year, and we banned @RequestScope from the outset. The results?
    • Zero issues with scoping
    • Only once did we miss @RequestScope. Less than an hour of refactoring made the code simpler and eliminated the need or desire to use @RequestScope.
    • This system had > 20 major features, O(100k) LOC, multiple backends and storage systems, and supported 4+ architecturally different clients (test harness, Android, iOS, Web, ...). We did all of the things a normal WebApp does, and did not miss @RequestScope other than the note above:
      • Parameter parsing
      • Input Validation
      • Authentication
      • XSRF protection
      • Caching
    • The system was generally easy to refactor.
    • It was very easy to change our approach to various tasks using queues or other async processing, as there was no thread-affinity of any processing.
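As a rough illustration of why the lack of thread-affinity helps, the sketch below passes per-request data as an explicit argument to a stateless component. RequestContext, Greeter and ExplicitContextDemo are hypothetical names, not classes from the system described above. Because nothing is looked up from thread-bound state, the same call can be moved onto a worker thread or a queue consumer without changes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Immutable per-request data, passed explicitly instead of being looked up from a scope. */
final class RequestContext {
    final String user;
    RequestContext(String user) { this.user = user; }
}

/** Stateless, shareable component: everything it needs arrives as an argument. */
final class Greeter {
    String greet(RequestContext ctx) { return "hello " + ctx.user; }
}

final class ExplicitContextDemo {
    /** Runs the same call on a worker thread; no thread-bound state has to be propagated. */
    static String greetAsync(Greeter greeter, RequestContext ctx) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture.supplyAsync(() -> greeter.greet(ctx), pool).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Greeter greeter = new Greeter();
        RequestContext ctx = new RequestContext("alice");
        System.out.println(greeter.greet(ctx));      // on the calling thread
        System.out.println(greetAsync(greeter, ctx)); // same result from a worker thread
    }
}
```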
How did we get here?
  • Java is used primarily in WebApps. 
    • Successful Storage Systems are mostly in C/C++. Examples:
      • MySQL
      • Oracle
      • Redis
      • Memcache
      • SqlServer
      • MongoDB
    • Notable counter-example that does not use Guice: HBase
  • WebApps often have lots of Servlets handling HttpRequests.
    • In ~2005-2006 Struts was well in decline and Guice came along.
    • The Action / Handler style of frameworks started to emerge*
    • Parameter parsing and input validation moved from servlets to frameworks
    • Many people repeated the "Singletons are Evil" mantra
      • The driving force here was testability and avoiding the real evil of supersized static registries (these had different names in many projects, often "[Project]Registry").
    • Someone realized they could inject parameters rather than pass them as method arguments
      • For thread-per-request WebApps and those without crazy threading or queuing issues such as the oft mentioned "CRUD" apps, this worked well.
        • Note that the core ActionInputs* framework approach has ~95%+ of the benefit with zero dependency on RequestScope.
      • Systems Programming efforts that mistakenly used this style started to regress in terms of:
        • Production stability
          • See outages above.
        • Latency & Performance
          • A recent profiling effort saw 1/3 of a request's latency spent by Guice creating objects that had no request state, and for the most part no state at all. This happens when the dependency tree grows tentacles in the code-base.
        • Code factoring / coupling to the wrong model (thread-per-request)
        • Very limited processing model outside of "thread-per-request"
          • i.e. can't do SEDA, pipelining, batching across requests, etc.
        • Difficult debugging
          • Stacktraces with Guice provided instances are difficult to follow
          • Can't view source of generated runtime classes (AssistedInject, ...)
          • Control flow is easily abstracted away from procedural code, e.g. a FilterChain being dynamically generated per request.
Production Emergencies using MapMaker:
  • Cache memory (weak references and keys) causing extremely long GC pauses with very low cache hit rates. This is due to the weak eviction implementation needing (1) two runs of the garbage collector to mark first, then finally remove the references and (2) expiring based on a very hard to internalize concept of "time to OOM" rather than a sane LRU policy. Switching to a simple size-1000 LRU cache (well under 1% of allocated RAM) resolved this entirely.
  • Weak and soft keys causing zero percent cache hit rate with MapMaker. An attempt to use a MapMaker Map with long primitives as the keys had a zero percent cache hit rate due to (a) Java primitive autoboxing and (b) MapMaker with weak or soft keys "uses identity (==) comparisons instead for keys". This caused every cache set or get operation to create a new Long object with a new identity, resulting in a zero hit rate.
  • Why MapMaker in this post? Both of the serious problems we ran into with MapMaker occurred with Guice-injected (non-RequestScoped) implementations of a Map interface. Guice encourages developers to rely on the "magic" of their environment and makes it easy to overlook implementation details and API limitations.
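The identity-comparison problem can be reproduced without MapMaker. The sketch below uses java.util.IdentityHashMap purely as a stand-in for a map that compares keys with ==. It is not MapMaker, and IdentityKeyDemo is a made-up name. The key value is deliberately large, because Long.valueOf always caches values from -128 to 127, so small keys would hide the problem.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityKeyDemo {
    /** Puts then gets with a primitive long key; each call autoboxes the key separately. */
    static boolean hit(Map<Long, String> cache, long id) {
        cache.put(id, "value");        // boxes id into one Long
        return cache.get(id) != null;  // boxes id again into a different Long
    }

    public static void main(String[] args) {
        long id = 1_000_000L; // outside the range that Long.valueOf always caches
        System.out.println(hit(new HashMap<>(), id));         // equals()-based lookup: true
        System.out.println(hit(new IdentityHashMap<>(), id)); // ==-based lookup: false
    }
}
```

When the caller only sees an injected Map<Long, String>, nothing at the call site hints at which of the two behaviours applies.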
*What are Action / Handler / ActionInput Frameworks: 
A full description is outside the scope of this post, but these are very common and easy to code framework approaches to handling requests, typically HttpRequests, in WebApps.
  • Actions are request-scoped (even if not by Guice) processors for a specific request. E.g. a WebApp might have a LoginAction object that maintains state about a request as well as containing the code to validate, process, and return results for a request. Its state may include the user's login and password, as well as request metadata (a timestamp, security tokens, etc). These commonly have a constructor taking injected parameters (extracted and converted from the HttpRequest), along with roughly two methods to process and generate results for a request.
  • Handlers with ActionInput frameworks use singleton (no per-request state) Handler classes along with ActionInputs per request. This separates input processing to be per-request and typically uses field reflection to populate the ActionInputs object. Each Handler has its own SpecificActionInputs class extending a common ActionInputs class, with the framework using reflection to populate SpecificActionInputs per request. The Handler has very similar methods to "Actions" above, but now takes SpecificActionInputs as a method param, avoiding the need for request state to live in the Handler.
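A minimal sketch of the Handler plus ActionInputs approach, assuming request parameters arrive as a Map<String, String> and only String fields need populating. LoginInputs, LoginHandler, InputBinder and HandlerDemo are hypothetical names. A real framework would also handle type conversion and validation.

```java
import java.lang.reflect.Field;
import java.util.Map;

/** Common base class for per-request inputs, populated by the framework via reflection. */
abstract class ActionInputs {}

/** The "SpecificActionInputs" for the login handler. */
final class LoginInputs extends ActionInputs {
    String user;
    String password;
}

/** Stateless handler: safe to share as a singleton because request state arrives as an argument. */
final class LoginHandler {
    String handle(LoginInputs in) {
        if (in.user == null || in.password == null) {
            return "400 missing credentials";
        }
        return "200 welcome " + in.user;
    }
}

final class InputBinder {
    /** Copies matching request parameters into the String fields of a fresh inputs object. */
    static <T extends ActionInputs> T bind(Class<T> type, Map<String, String> params) {
        try {
            T inputs = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.getType() == String.class && params.containsKey(field.getName())) {
                    field.setAccessible(true);
                    field.set(inputs, params.get(field.getName()));
                }
            }
            return inputs;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot bind " + type.getName(), e);
        }
    }
}

final class HandlerDemo {
    public static void main(String[] args) {
        LoginHandler handler = new LoginHandler(); // one instance serves every request
        LoginInputs ok = InputBinder.bind(LoginInputs.class, Map.of("user", "alice", "password", "pw"));
        LoginInputs bad = InputBinder.bind(LoginInputs.class, Map.of("user", "bob"));
        System.out.println(handler.handle(ok));  // prints "200 welcome alice"
        System.out.println(handler.handle(bad)); // prints "400 missing credentials"
    }
}
```

The per-request object here is plain data created by the framework code, so no request scope is needed to get it into the Handler.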