The Taming of WIM: How we made our wireless instant messenger more maintainable

ACL-Wireless is an Indian company whose main product, a mobile instant messenger, had become hard to maintain. This article describes how we got our source code under control, implemented an integrated build, wrote acceptance tests and refactored the system to have a more modular design.

1 INTRODUCTION

ACL-Wireless is a private Indian company founded in 2000, with just over 200 employees. It provides mobile value added services to operators, enterprises and consumers. It has featured prominently in the Deloitte Technology rankings for India and Asia Pacific. It pioneered instant messaging over mobile devices in India. [1]

Our company’s main product is an instant messenger accessible over a variety of wireless device interfaces such as SMS, WAP and Java devices. This program is interoperable with four popular instant messaging (IM) networks. We wanted to leverage our knowledge of wireless IM, but we repeatedly found this hard to do because our code base was difficult to work with. New programmers would often quit because they couldn’t understand the system well enough to modify it, add to it, or use technical infrastructure from it in new applications. This article describes how we got our source code under control, implemented a single build which we then ran on every commit, wrote acceptance tests and refactored the system to have a more modular design.

2 EXERCISE ONE

I was asked to implement a Wireless Village interface to our system so that we would meet the new standard for wireless instant messaging. I expected to move functionality from the existing servlet down into a common application class which the new interface could share. I wanted to be able to run the system on my own computer in order to learn about it. Unfortunately it would only run on a very carefully configured servlet container which few people remembered how to set up. Each programmer owned a set of modules, some of which had been inherited from others. Some source code had been lost because the programmers who owned it had left, and our system was running off their binaries.

I started out by tracking down the current owner of each module and importing all available source code into CVS. We removed dependencies on modules whose sources were no longer traceable.

Programmers would email their jar files to each other, so there was an implicit dependency graph in their heads, which I was able to learn by finding out who emailed whom. I set up an Ant build for each module, with an FTP upload target to replace the emailing. Each module knew its dependencies and would download them via FTP before building itself. A top level build would delegate to the module build files and then collect all the module jars into one place.
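The ordering problem that the top level build has to solve can be sketched as a depth-first walk over the module dependency graph. This is only an illustration in Java, not our Ant build files; the class ModuleBuildOrder and the module names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: orders modules so each is built after its dependencies. */
class ModuleBuildOrder {
    // module name -> names of the modules it depends on
    private final Map<String, List<String>> dependencies = new LinkedHashMap<>();

    void addModule(String name, String... dependsOn) {
        dependencies.put(name, List.of(dependsOn));
    }

    /** Returns a valid build order; throws on cycles or undeclared modules. */
    List<String> buildOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String module : dependencies.keySet()) {
            visit(module, done, inProgress, order);
        }
        return order;
    }

    private void visit(String module, Set<String> done,
                       Set<String> inProgress, List<String> order) {
        if (done.contains(module)) {
            return; // already scheduled
        }
        List<String> deps = dependencies.get(module);
        if (deps == null) {
            throw new IllegalStateException("Unknown module: " + module);
        }
        if (!inProgress.add(module)) {
            throw new IllegalStateException("Dependency cycle at: " + module);
        }
        for (String dep : deps) {
            visit(dep, done, inProgress, order); // dependencies first
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module);
    }
}
```

Declaring, say, an "sms" module that depends on "domain", which in turn depends on "storage", yields an order in which "storage" is built first and "sms" last.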

This was the beginning of a single comprehensive view of our software which would later turn out to be an enabling factor in refactoring WIM.

3 EXERCISE TWO

A year later I was asked to simplify our database interface module, a class called StorageMediator. StorageMediator’s source code had earned itself a reputation for being extremely fragile. As a result, there had only been additions to it over the last few years and it was now 3000 lines long. The two main problems were duplicate code within the class and the existence of domain logic which ought to have been in higher layers. I first grouped domain methods by functionality, then I moved each group out into its own class, leaving delegating methods behind. This left us with a number of services for registration, profile management and searching for and inviting people. Domain specific data access code was separated into a number of Repositories [2] which used StorageMediator internally. Generic data access code was left in StorageMediator. I simplified these remaining methods one at a time until I could identify duplicate implementations which I then removed.

At the end of this, StorageMediator was about 500 lines long and still backward compatible with the original version. We had six or seven new domain layer classes extracted from it, conceptually above it, but still in the same module.
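The extract-and-delegate step can be sketched as follows. This is a simplified illustration rather than our actual code: RegistrationService, its methods and the in-memory set standing in for real data access are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Extracted domain service (hypothetical name and methods). */
class RegistrationService {
    // Stand-in for real data access; the actual services went through Repositories.
    private final Set<String> registeredUsers = new HashSet<>();

    /** Returns true if the user was newly registered. */
    boolean register(String userId) {
        return registeredUsers.add(userId);
    }

    boolean isRegistered(String userId) {
        return registeredUsers.contains(userId);
    }
}

/** The original class keeps its method signatures, but each method only delegates. */
class StorageMediator {
    private final RegistrationService registration;

    StorageMediator(RegistrationService registration) {
        this.registration = registration;
    }

    // Existing callers keep compiling against these methods.
    boolean registerUser(String userId) {
        return registration.register(userId);
    }

    boolean isUserRegistered(String userId) {
        return registration.isRegistered(userId);
    }
}
```

Because the old methods survive as thin delegates, callers do not need to change at the same time as the extraction, which is what keeps the class backward compatible.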

4 EXERCISE THREE

Seeing that even supposedly untouchable code could be simplified safely on a live system, my project manager asked if I would do the same for a domain layer monolith called BackEndInterface. He realized that our domain layer needed simplification if we wanted to keep new programmers interested in our code base. BackEndInterface was 13000 lines long and I wondered if I needed a fundamentally different approach to tackle this one than the method by method refactoring that I had previously applied to StorageMediator.

As the two of us worked on simplifying BackEndInterface we slowly increased our team size. With a larger team I could generate various reports and post them as tasks on the wiki. We increased our scope to simplifying the entire WIM domain layer, with our special focus on BackEndInterface keeping things in perspective.

4.1 Some Requirements

Our refactoring team’s customer was the feature development team. We had one member from the development team on our team full time. She gave us direction for our refactoring. Early on we listed the points of most pain for higher layer programmers who had to work against the domain layer:

4.2 What we did

CVS had been abandoned since my previous work. While I had shown developers how to use the new build system to do what they already did, I hadn’t demonstrated to them the advantages of a comprehensive view of the project source code. Since their existing system of dependency management worked, they weren’t motivated to learn a new technique. The first thing we did was to bring everything back into CVS. Then we created build scripts for modules which didn’t have them already and updated the top level build script. We connected developers to the CVS versions of their modules. Now we could load each module into Eclipse [8] as a separate project, tell Eclipse about the inter-project dependencies and then perform system wide refactorings. Previously, it had been impractical to change APIs in lower layer modules. Now we could do this easily. For instance, we could inline the StorageMediator delegating domain methods and move the service classes extracted from it up into the domain layer. Many higher level modules no longer depended on the StorageMediator module directly, and StorageMediator no longer depended on domain layer modules, so we could simplify the inter-module dependency graph.

Though we were free to change APIs, we only committed such changes at night, and we emailed the feature development team asking them to update their local copies of the modules when they arrived in the morning. It would have been too disruptive to their process if we had changed published interfaces several times a day and expected them to download their modules’ dependencies again each time.

Previously it was common for programmers to develop features for weeks at a time before they got even their own modules compiling again. We organized a continuous build system with CruiseControl [3]. This was a concrete way to have a buildable system every day.

I looked at CodeCrawler [4] and OptimalAdvisor [5]. CodeCrawler told me about inter-method dependencies in BackEndInterface. This helped me group methods within the class and then, as with StorageMediator, extract them into their own domain services. We moved each service into its own package where it was further refactored into smaller objects, becoming a facade itself.

After some pair programming my project manager was comfortable doing this on his own and I got to experiment with PMD [6] and condenser [7] which can both detect duplicate code. I put the generated reports on the project wiki and a second pair from our team took on the task of eliminating the reported duplicates. I would rate each wiki task’s difficulty from one to five. Newcomers to the team could start with the easier tasks. This wiki task list was key to how we made improvements at so many levels during our exercise.

We used IntelliJ’s [9] static code analyser to detect code which was unreachable from any of our entry points. We also wrote an awk script to detect large bodies of commented out code so we could remove them. Often such removals would reveal more unused code. So these tools gave us starting points for simplifying the system.
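The idea behind the commented-out code detector can be sketched in Java. Our original was an awk script, which is not reproduced here; the heuristic below (a line comment whose text ends in `;`, `{` or `}` is treated as code) and the class name CommentedCodeFinder are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical detector for runs of commented-out code in a source file. */
class CommentedCodeFinder {

    /** Returns the 1-based line numbers at which a run of at least minLines code-like comments starts. */
    static List<Integer> findRuns(List<String> lines, int minLines) {
        List<Integer> starts = new ArrayList<>();
        int runStart = -1;
        // Iterate one past the end so that a run reaching the last line is still closed.
        for (int i = 0; i <= lines.size(); i++) {
            boolean codeLike = i < lines.size() && looksLikeCommentedCode(lines.get(i));
            if (codeLike) {
                if (runStart < 0) {
                    runStart = i;
                }
            } else if (runStart >= 0) {
                if (i - runStart >= minLines) {
                    starts.add(runStart + 1);
                }
                runStart = -1;
            }
        }
        return starts;
    }

    /** Crude heuristic: a '//' comment whose text ends like a Java statement or block. */
    static boolean looksLikeCommentedCode(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("//")) {
            return false;
        }
        String body = trimmed.substring(2).trim();
        return body.endsWith(";") || body.endsWith("{") || body.endsWith("}");
    }
}
```

A heuristic like this reports false positives and misses block comments, so its output is best treated as a list of places to inspect by hand rather than as a list of deletions.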

One pair, which included a person who knew the system’s functionality well, took on the task of writing tests using the program’s text message (SMS) interface. It was easier for our team to write comprehensive acceptance tests than comprehensive unit tests, because each acceptance test was similar in structure. Writing unit tests usually required first refactoring the unit to be testable.

We wrote a fake implementation of Versant ODBMS’s TransSession class and related classes so that we could run StorageMediator tests in memory and also on operating systems not supported by Versant.
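The shape of such a fake can be sketched as below. The ObjectSession interface and its four methods are hypothetical and much smaller than the real thing; they are not Versant’s TransSession API, which this sketch does not attempt to reproduce.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal session interface; NOT Versant's actual API. */
interface ObjectSession {
    void store(String key, Object value);
    Object load(String key);
    void commit();
    void rollback();
}

/** In-memory fake: no database, no native libraries, so tests run anywhere. */
class InMemorySession implements ObjectSession {
    private final Map<String, Object> committed = new HashMap<>();
    private final Map<String, Object> pending = new HashMap<>();

    @Override
    public void store(String key, Object value) {
        pending.put(key, value); // held until commit
    }

    @Override
    public Object load(String key) {
        // Uncommitted writes are visible inside the session that made them.
        return pending.containsKey(key) ? pending.get(key) : committed.get(key);
    }

    @Override
    public void commit() {
        committed.putAll(pending);
        pending.clear();
    }

    @Override
    public void rollback() {
        pending.clear(); // discard everything since the last commit
    }
}
```

Code under test that depends only on the interface can then be handed either the real session or the fake, which is the property that lets the tests run in memory.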

We found that an elaborate locking system implemented in StorageMediator was completely redundant because it wasn’t being used correctly. We removed it.

Different modules needed different versions of Java. We removed such limitations (in particular our dependency on the non-blocking I/O library which needed JDK1.3) and got everything to compile successfully with JDK1.4. Once we did this we could combine and split modules freely.

We removed various caches, thread pools and other unnecessary generality. Where appropriate we moved code onto persistent data objects which had been kept dumb so far.

Each module had its own copy of an in-house logger. We replaced all but two of these copies with a single adaptor for log4j’s logger. The remaining two loggers had been modified to print their output in a particular format parsed by an independent billing script. We left them alone.

We created a User class to encapsulate the various representations of the user identifier found in the code.
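A minimal sketch of the idea follows, assuming for illustration that the differing representations were phone numbers written with or without a leading plus sign, spaces or dashes. That assumption, the factory method fromPhoneNumber and the accessor id are hypothetical and are not taken from our code.

```java
/** Value object that hides how a user identifier was originally written. */
final class User {
    private final String canonicalId;

    private User(String canonicalId) {
        this.canonicalId = canonicalId;
    }

    /** Accepts a phone number with optional '+', spaces or dashes (an assumed format). */
    static User fromPhoneNumber(String raw) {
        String digits = raw.replaceAll("[^0-9]", "");
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("No digits in: " + raw);
        }
        return new User(digits);
    }

    String id() {
        return canonicalId;
    }

    // Equality is defined on the canonical form, so callers never compare raw strings.
    @Override
    public boolean equals(Object other) {
        return other instanceof User && ((User) other).canonicalId.equals(canonicalId);
    }

    @Override
    public int hashCode() {
        return canonicalId.hashCode();
    }

    @Override
    public String toString() {
        return "User(" + canonicalId + ")";
    }
}
```

With a class like this, normalization happens once at the boundary, and the rest of the code passes User objects around instead of strings in several formats.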

Case logic for the four external messaging networks we interoperate with was scattered all over the place. Most of it has been moved into a ForeignNetwork class hierarchy.
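The general technique, replacing scattered conditionals with polymorphism, can be sketched like this. The subclasses NetworkA and NetworkB, the network codes, the length limits and the address formats are invented for the example; only the ForeignNetwork name comes from our code.

```java
/** Base class: each external network supplies its own rules by overriding. */
abstract class ForeignNetwork {

    /** Maximum message length accepted by the network (values below are invented). */
    abstract int maxMessageLength();

    /** Formats a local user name as an address on this network. */
    abstract String addressFor(String localName);

    /** Shared behaviour lives here once and relies on the overrides. */
    String fitMessage(String message) {
        return message.length() <= maxMessageLength()
                ? message
                : message.substring(0, maxMessageLength());
    }

    /** Ideally the only place left that switches on the network code. */
    static ForeignNetwork forCode(String code) {
        switch (code) {
            case "A":
                return new NetworkA();
            case "B":
                return new NetworkB();
            default:
                throw new IllegalArgumentException("Unknown network: " + code);
        }
    }
}

/** Hypothetical network with a 10-character limit. */
class NetworkA extends ForeignNetwork {
    @Override
    int maxMessageLength() {
        return 10;
    }

    @Override
    String addressFor(String localName) {
        return localName + "@a.example";
    }
}

/** Hypothetical network with a 5-character limit. */
class NetworkB extends ForeignNetwork {
    @Override
    int maxMessageLength() {
        return 5;
    }

    @Override
    String addressFor(String localName) {
        return "b:" + localName;
    }
}
```

Callers obtain a ForeignNetwork once and then call its methods, so supporting another network means adding a subclass rather than editing every conditional.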

We explored Versant to learn what sort of class structure modifications it could automatically reconcile with the database schema. Once we knew what we could safely do, we were able to refactor our persistent classes as well.

We showed the development team how to use the remote debugger to connect Eclipse to the servlet container. This sped up our periodic stabilization procedures greatly.

4.3 Result

BackEndInterface is now about 200 lines long. The extracted services are independent of each other. They were formed from BackEndInterface and two other big modules, utilities and search utilities, which no longer exist. The domain layer’s package structure and the developers’ mental model of the system have converged. As a result, the core team agrees that the domain layer is much easier to understand.

4.4 Notes

We merged a month’s work on our CVS branch with the trunk successfully. We were aware that making large upgrades to running software is risky so we allowed a full two weeks for stabilizing the merged code and closed our doors to new feature requests during this time. A diff based merge makes it hard to see refactorings, so instead we actually merged the new features on the trunk into our refactored code, since those changes were easy to see with diff. After the first merge, we merged more frequently.

While we did write many unit tests, we didn’t write them for every refactoring. One reason we got away with the bugs we introduced was that an expert from the core team worked with us, and thanks to her ties to that team, its members would help us identify where we had broken the code. Everyone felt that we were onto something good, so the testing team was more tolerant of the bugs we introduced than other refactoring stories suggest people usually are.

5 LESSONS LEARNED

6 CONCLUSION

Even a commercially successful product sometimes needs its development process improved. We have been able to continue to be a successful company largely because we learned to manage our source code better. We recently forked WIM into a new product aimed at a different market. This was precisely the sort of leverage that management had desired of our code base, but had been unable to get before. In the fork, we discarded modules we didn’t want and adapted others. CVS made forking easy and the inter-module dependency graph showed us which modules could be removed without breaking the system.

In addition to the goals we originally set out to achieve, we continually see other benefits of our exercise. For instance, management had always wondered why development stopped during the weeks of testing leading up to a deployment, even though the testing team was different from the development team. Now we can test and stabilize a CVS branch, while development continues in parallel on the trunk.

In order to effect change, it’s essential to address both the technical and the social aspects of a problem. As an example, it was only on the second try that we got our team to use CVS. How the social aspect is addressed is not nearly as important as knowing that it does need to be addressed, and that it is at least as fundamental as the technical aspect.

References

[1] <http://www.acl-wireless.com/>.

[2] Eric Evans. Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley Professional. 2003.

[3] <http://cruisecontrol.sourceforge.net/>.

[4] <http://www.iam.unibe.ch/~scg/Research/CodeCrawler/>.

[5] <http://javacentral.compuware.com/pasta/>.

[6] <http://pmd.sourceforge.net/>.

[7] <http://condenser.sourceforge.net/>.

[8] <http://www.eclipse.org/>.

[9] <http://www.jetbrains.com/idea/>.