Do Not Get Deep In ... Mud

2012/03/01 12:32

General advice points for programmers summarizing practical experience and conclusions that are still surprisingly perceived as controversial. This post wants to help get rid of enterprise software development superstition and activate critical thinking abilities.

This post contains a lot of lies. However, those are lies in the sense that "Newton's theory of gravity is a lie". We know it is a lie, because we know it fails in some conditions, but we also know those cases and can act accordingly. So let's say this post contains a lot of simplifications, though useful ones. Some points are valid just for Java-like languages, others apply to a wider selection.

Do not use XML for anything... Ever

This point is big so I will start with it. Many will hate me and my family for this. The industry is still recovering from XML and not all people realize that. Introducing XML brings more problems (that are quite hidden at first) than it tries to (questionably) solve. XML was not designed to contain structured data, it was designed to decorate text files with some simple tags. (Arguably poorly — I recommend Douglas Crockford's presentation, where he gets to XML at minute 24)

An especially bad idea is to generate, bind, or modify code according to XML files. If you work in a statically typed language like Java, XML completely subverts its type system (and you spent so much time learning those badly designed restrictive generics to embrace it a bit more). Refactoring tools generally stop working and any change to the code can introduce mysterious run-time exceptions.

The worst practical example I saw was using XML to write input validators that automagically generated modules that hijacked the control flow and inserted the validation steps. There were many bad things about that in practice, the worst being a spreading multiplicity of validators that did the same (or a slightly different) thing in different places, and binding to classes by their names (which are not updated by any refactoring tool). A real maintainability nightmare. Similar reusability problems affect XAML for .NET and GUI for Android, although quite a lot of time was invested in designing them as well as possible to provide some benefits as compensation.

I hear you saying: "I use it for DI with XYZ framework and it makes my life so much easier." I am pretty sure that you mean Dependency Injection by DI here. You should first think Dependency Inversion instead, which is the indisputably essential principle behind that and it does not require dependency injection as a mechanism. If you use dependency injection, do others a favor and isolate it in one place and prevent it from leaking to other parts of the system.
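To make the distinction concrete, here is a minimal sketch of dependency inversion done by hand, with all wiring isolated in one place and no container or XML involved. Every name in it (`createReportService`, `createInMemoryStore`, `main`) is hypothetical and chosen only for illustration.

```javascript
// High-level logic depends only on the shape { load(id) }, not on a concrete store.
function createReportService(store) {
  return {
    titleOf(id) {
      const doc = store.load(id);
      return doc ? doc.title.toUpperCase() : '(missing)';
    }
  };
}

// One concrete implementation of that shape (a hypothetical in-memory store).
function createInMemoryStore(docs) {
  return { load: (id) => docs[id] };
}

// Composition root: the only place that knows which concrete parts are wired together.
function main() {
  const store = createInMemoryStore({ a1: { title: 'Quarterly numbers' } });
  const reports = createReportService(store);
  return reports.titleOf('a1');
}
```

Swapping the store for a database-backed or fake one touches only `main`; the report logic never learns how its dependency was created.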

For your custom data format you send or store, use JSON. It has some warts, but it is orders of magnitude easier to work with, and library APIs are significantly simpler and can map JSON to language data structures smoothly. (Heck, you can even write your JSON parser in one lazy afternoon if you do not like the available ones. Trust me, I did it.)
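As a small illustration of how directly JSON maps to language data structures, this sketch round-trips a plain object through text using the built-in `JSON.stringify` and `JSON.parse`. The `order` object is made up for the example.

```javascript
// A plain language data structure...
const order = { id: 17, items: ['pen', 'ink'], paid: false };

// ...becomes text that can be sent or stored...
const text = JSON.stringify(order);

// ...and comes back as an ordinary object, with no binding layer in between.
const restored = JSON.parse(text);
```

After the round trip, `restored.items[1]` is a normal string and `restored.paid` a normal boolean, ready to be used like any other value.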

Use the language in the way that fits the problem

There are many cases when programmers do or do not use some parts of a language because they read or heard about them and remember one or two related anecdotes. The problem is usually that the original context is not emphasized or is completely forgotten. There is always room for reasonable doubt, and you should use it to your advantage. In this point I try to address the weird habits I have met most often.

Do not be afraid to use new to create an object. There is nothing wrong with instantiating your objects in your code. You do not really need to let some magic create your objects according to some holy Egzemel scrolls. Just keep in mind design principles like dependency inversion, separate and/or isolate the object construction code if you need flexibility, and do not litter your code with new statements uncontrollably.

Beware of getters/setters. They usually point to a flawed code design when not used in very special cases like a context object with many internal states and not too limited transitions (e.g. canvas, external device object, ...), builder objects to construct a consistent object incrementally, and a few others. They too often lead to too tight coupling, difficulties with concurrency and keeping the object state consistent. If you need something with publicly exposed properties, it is usually much better to expose an immutable data structure instead.
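A minimal sketch of the "expose an immutable data structure" alternative, using the built-in `Object.freeze`. The `makePoint` and `moved` helpers are hypothetical names for this example.

```javascript
// Build a frozen value object; its fields can be read directly, no getters needed.
function makePoint(x, y) {
  return Object.freeze({ x, y });
}

// A "change" produces a new value instead of mutating the old one.
function moved(point, dx, dy) {
  return makePoint(point.x + dx, point.y + dy);
}

const start = makePoint(1, 2);
const end = moved(start, 3, 0);
```

Note that `Object.freeze` is shallow, so nested objects would need freezing too. Writing to a frozen property throws a TypeError in strict mode and is silently ignored otherwise.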

Do not design for future inheritance. I did not say future extension. That is a different thing. Inheritance is just one form of extending functionality, and in most aspects the inferior one. When you speculatively use inheritance for future extensions, you are essentially predicting the future, a skill at which we especially suck. Inheritance can be useful for code reuse locally, but even then there are limitations. A common way to avoid some traps (e.g. breaking transitivity of equivalence) is to make classes either abstract or final. However, even then there are too many hidden contracts that must be satisfied. For future extensions, it is easier to design one or more interfaces and put there just what is really needed (unless you are publishing to an unknown audience you care about, in which case you should think this through much more thoroughly). If it seems that the user would appreciate some template code behind the interface, the interface is either too big and should be split into several smaller ones, and/or you can provide the required functionality in separate library functions.

Use immutable state by default. The majority of programmers now agree with this to some extent; if you do not, then you are either very lucky or very careless. Having your objects (data) immutable simplifies so many difficult things: security, concurrent access, deciding where and how to defensively copy, caching, equality comparison and sorting, storing in maps and sets, ... Even just reasoning about the code becomes simpler and less context dependent. Actually, if you use mutable objects in maps and sets (particularly sorted ones, or when you have redefined the equals and hashCode functions), please stop, or find another job.

If you are in Java, for example, abuse final as much as appropriate (it is not necessary to use it for method arguments and all local variables). Fortunately, many new languages simplify defining immutable fields and variables so much that you do not have to think about it much anymore.
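The warning about mutable objects in maps and sets can be sketched in JavaScript too. A JavaScript `Map` compares object keys by identity, so the Java equals/hashCode hazard does not appear in the same form; the example below instead uses a key derived from a mutable field, which goes stale in a similar way. The `person` object and `byName` index are made up.

```javascript
// Hypothetical index keyed by a field of a mutable object.
const person = { name: 'Ann' };
const byName = new Map([[person.name, person]]);

// Mutating the object does not update the index...
person.name = 'Bob';

// ...so the index now disagrees with the data it points to.
const staleHit = byName.get('Ann');  // still finds the object, which is now named 'Bob'
const missing = byName.get('Bob');   // undefined: the new name was never indexed
```

Had `person` been frozen, the rename would have required a new object, and the question of updating the index would have been visible at that point.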

Do not wrap or extend collections. I cannot even count how many times I saw an object like People or Team that has a collection of Person objects as its only data. This may look like a nice encapsulation, especially when you add some generally useful functions to the object. However, if you then see a tenth class encapsulating the same list while replicating some of the functionality from other similar objects, or even worse, you start seeing a Team object where it does not even make sense, just because "Team is a kind of collection of people, and we need people-like object here, and we are reusing the code!" it is not a very positive feeling (the sound of your teeth grinding does not help much either).

Standard collections coming with the language are already a very good and flexible abstraction. You can use a set of Person objects where you want... well, a set of people. You should use a map of a string to a Person object where you expect name-to-person mappings. If you need special functions to process collections, you can write them as static functions that may even work on many more collections than you planned for.
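Here is a sketch of such a free-standing function. It is written against "any iterable of objects with an age", so it works on arrays, sets and map values alike without a Team or People wrapper. The function name and sample data are hypothetical.

```javascript
// Works on any iterable of { name, age } objects: Array, Set, Map values, generators.
function oldest(people) {
  let best;
  for (const p of people) {
    if (best === undefined || p.age > best.age) best = p;
  }
  return best; // undefined for an empty collection
}

const team = new Set([{ name: 'Ann', age: 41 }, { name: 'Bob', age: 29 }]);
const directory = new Map([['Eve', { name: 'Eve', age: 35 }]]);

const oldestInTeam = oldest(team);
const oldestInDirectory = oldest(directory.values());
```

The same function serves both collections, which is the kind of reuse the wrapper classes promise and rarely deliver.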

Do not use checked exceptions. The debate is over here. There are good reasons why no other languages (even those considered safer than Java) use them. Languages on JVM must even actively work around them (usually by declaring all functions as throwing Exception base class).

Do not become a slave of a framework

Here I do not mean .NET framework or java.* libraries, or any set of useful composable libraries, but frameworks like JSF, ASP.NET, or Rails. These frameworks try to do a lot on your behalf (including class names, project file structure and automatically generated code and configuration files) and let you hook your code in some predefined form to some predefined places. It works like magic until it starts doing something unexpected or not doing something required. At that point it is often already quite late to redesign the code and involuntary hacking takes place.

The problem is that these frameworks, unlike e.g. language libraries, are not made to be taken apart, with one part identified and customized, and then put back together. They are the finished program and your code is its plugin system. A plugin is not supposed to customize the program outside of the plugin's responsibility. The only viable way is to take the control back and isolate the framework as an implementation detail from most parts of the application. Usually, stripping the framework down to its smallest essential, well-understood core and separating that is a good start for very predictable behavior. Then it is possible to start incrementally enabling more of the original features only as needed, and replacing them with your own if the functionality does not match the expectations, which prevents getting stuck in a long investigation of what the hell is going on... again.

Do not create your own framework for others to use. There are several typical outcomes: your framework will be weak, preventing others from getting their work done; the framework will force others to deform and twist their code to the will of The Framework; or the framework will almost approximate the framework that came with the language in the first place, accomplishing less than nothing. Eventually, everyone will hate it, including you, because you will constantly be asked to put something in there that you do not want to. If you feel the temptation, build reusable libraries that anyone can use or ignore as they wish instead. That is already difficult enough.

ORM is for losers

A critique of Object-relational mapping is another big point that may cause you to want to dance on my face so I saved it until the very end and use something called a metaphor that I am particularly proud of.

ORM is like riding a bicycle with support wheels without noticing that it can be done without them ("what an insane idea, that cannot work, I know my physics"). You think you are riding a bicycle and cannot understand why people are laughing at you and everyone else is actually much faster and more elegant without even trying. If you do not take the "leap of faith", throw away your support wheels, and take a bit of effort to learn the skill properly you will never know. You may even invent best practices for using the support wheels, and write books about support wheel types and handling differences. You may even win every argument by pointing out that you will beat anyone riding without support wheels in a race across a frozen lake (even though everyone else would succeed faster just going around it). Other people will enjoy riding a bike freely, which you will never experience.

Conclusion: Be rational, skeptical, but not cynical

I wanted to put pragmatic in the subtitle, but it is being used more and more as an excuse for being sloppy. (If you hear a manager with a weak understanding of the code saying that you are taking a pragmatic approach, more often than not it means the wrong, short-sighted approach.) So I chose values that are generally useful even outside the world of programming.

To be skeptical does not mean dismissing possibilities just for the emotions they bring to you. Often you can be right, but I saw too many reactions like "You mean this is from XYZ? That must be expensive crap. Poor suckers that have to use it." Many times this comes without even realizing that it is, for instance, a strongly supported open source project that became popular just because it efficiently solves the exact problem you need to solve.

Always use the right tools for the right job. Learning what "right" is is the tricky part, but there are smart people in this area and sometimes it is "only" about putting aside prejudices and the "enterprise" superstitions for new possibilities to appear. Then mix in some critical evaluation of the real compromises in the available approaches, and just try more of them if you are not sure. Try to commit to decisions about fuzzy areas as late as possible. This experience becomes priceless and gives you a lot of credible leverage (hopefully) when dealing with someone with very narrow and skewed field of view later.

· · · · ·

Why In Programming Education Matters

2012/02/18 18:11

A story about a small team's success in a mission-impossible-like project. (And also how having a good team is a very important prerequisite.)

It started as nine-month contract work for a relatively large and growing global company quite some time ago. The team of contractors came through the same small local company and was quite gelled (we studied or worked with each other, or had common friends). The project we were assigned to was cancelled after a couple of months, and to honor the remaining months in the contract a new task was identified for us. There was no one willing to work on it and we suddenly turned out to be a free resource.

The project itself seemed to be quite tedious. It was about converting some dynamic pages customized per client from one obsolete ad hoc technology to another completely different. Data and a sort of template comes in, HTML comes out. All templates were designed by someone else, all quite poor quality, all using deprecated and misunderstood HTML constructs not working on newly appearing browsers (it was the time when IE6 ruled the world). We were supposed to not just convert them, but to test and fix them all to look exactly the same everywhere as the original in IE6. There seemed to be nothing reusable. An average template took about 2–4 hours to convert. There were thousands of them. There were 4 of us. Do the math.

We Are Not Monkeys (we are apes, well, technically monkeys as well)

The first thing that we noticed was that many of us replicated work of others as some templates were created using copy–paste–modify, but there was nothing to suggest which were similar to each other and how much. In addition there was a noticeable pattern of biological evolution (with a weak natural selection) in action; several simple designs blindly combined and mutated evolved to extreme complexity (table-based designs of large nesting depths, redundant code with no function, etc.).

Once we went to lunch together and started complaining and hypothesizing about what we would need to be able to deal with the problem effectively. Maybe an unfortunate person could be selected to just look at the templates and assign them to people appropriately. However, there was no clue even in the visual appearance as to what the code looked like (some visually similar templates could have been (re)coded by different people, some different-looking ones created by copy–paste and changing sizes and images). The code itself was a mess and the formatting was inconsistent or absent (I did not mention that the HTML was created by string concatenation in a very unsuitable language). So for a while we thought we would need an artificial intelligence to help us. But then I (studying more practical computer science at a university) and my colleague (studying a more theoretical one) started remembering some of the lessons we had gone through.

We could compare two files to get their distance, in terms of the number of keystrokes needed to convert one template into the other, using the Levenshtein distance. That seemed to be a good approximation expressed by one number and mostly ignoring formatting.
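For readers who have not met it, here is a minimal JavaScript sketch of the Levenshtein distance using the usual dynamic-programming table, plus a helper that builds the all-pairs matrix. This is not the original code (which was C++); the function names are my own.

```javascript
// Classic dynamic-programming edit distance, keeping only two rows of the table.
function levenshtein(a, b) {
  let prev = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);   // distance from '' to b[0..j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];                                 // distance from a[0..i) to ''
    for (let j = 1; j <= b.length; j++) {
      const substitution = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, substitution));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Distances between all pairs of texts, as a symmetric matrix.
function distanceMatrix(texts) {
  return texts.map((a) => texts.map((b) => levenshtein(a, b)));
}
```

For example, `levenshtein('kitten', 'sitting')` gives 3: two substitutions and one insertion.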

Then we would need a plan to order the templates, so that we could proceed from one to another while reusing the previous work as efficiently as possible. We remembered the travelling salesman problem and quickly dismissed it as not the right way. We did not need the closest template to the previously finished template; we needed the closest template to any of the previously finished templates. Our education jumped in to provide an immediate answer: a minimum spanning tree.

Since the produced tree is planar (it can be drawn nicely on paper), we wanted to visualize it as clearly as possible to see how we could tackle the problem further. We remembered that one of the Java demos did that kind of thing, so we knew it was possible.

After this lunch we were quite excited that we could try to solve our problem. Moreover, we would get to do what we enjoyed much more: programming.
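The "closest to any finished template" idea is exactly what Prim's algorithm does at each step. The sketch below builds a minimum spanning tree from a symmetric distance matrix. Which algorithm the original C# program used is not recorded here, so Prim's is my assumption, and the names are illustrative.

```javascript
// Prim's algorithm on a symmetric distance matrix; returns edges [from, to, distance].
function minimumSpanningTree(dist) {
  const n = dist.length;
  if (n === 0) return [];
  const inTree = new Array(n).fill(false);
  const best = new Array(n).fill(Infinity);  // cheapest known link into the tree
  const parent = new Array(n).fill(-1);      // which tree node provides that link
  const edges = [];
  best[0] = 0;                               // start from node 0
  for (let step = 0; step < n; step++) {
    // Pick the node outside the tree that is closest to ANY node already in it.
    let next = -1;
    for (let v = 0; v < n; v++) {
      if (!inTree[v] && (next === -1 || best[v] < best[next])) next = v;
    }
    inTree[next] = true;
    if (parent[next] !== -1) edges.push([parent[next], next, dist[parent[next]][next]]);
    // The new tree node may offer cheaper links to the remaining nodes.
    for (let v = 0; v < n; v++) {
      if (!inTree[v] && dist[next][v] < best[v]) {
        best[v] = dist[next][v];
        parent[v] = next;
      }
    }
  }
  return edges;
}
```

The order of the returned edges doubles as a work plan: every template is reached from one that is already finished.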

The Next Three Days

Levenshtein distance computation is usually solved using dynamic programming. It turned out to be short and simple, and the program was fast enough to compute a matrix of distances between all possible pairs of templates. For efficiency we used C++ (about a hundred lines). This matrix was fed to a program computing the optimal tree, which was very fast and took not much more than a hundred lines of C# (a fresh new language at that time). The result (a set of template ids, connections and their distances, i.e. the nodes and edges of the tree) was loaded into a heavily customized Java graph layout application. When the layout was done (with human assistance), it was exported to a vector format and printed on A1 format paper (an area of 0.5 square meters).

It Works!

After this work was done (and for some of us even during it) we were very positively surprised. It appeared to be even better than we had hoped and extrapolated from the small sample we had already finished converting. There were very visible clusters of templates, and we could effectively split the templates just by drawing a loop around a cluster to allocate it, and by coloring the finished templates when done. Every time we moved to a different cluster, we could go to its most typical template and thereby have most of the work on the other templates done.

These three days spent on getting the tools done sped us up by orders of magnitude most of the time. There was almost constantly someone at the paper battle plan allocating new work or marking the completed templates. Of course, the faster you worked, the easier the work you could allocate for yourself, so everyone could choose their own pace. Different colors made it obvious who was the least productive. Nobody wanted that displayed on the wall.

We finished 99% of the templates in a fraction of the estimated time, to everyone's surprise. Our approach could identify the 1% of templates that would get us stuck for a day each. Postponing the work on those enabled us to further reduce the useless work on templates that expired or were redesigned by a client while we worked on the easier ones.

We Are Hiring You – Gelled Shmelled

The work we did impressed a few people enough that one CxO took our battle plan as an inspiring souvenir and we were offered positions as regular employees (in fact, we were bought without anyone actually buying our company).

People from our team were dissolved into the existing teams and we started to work the way others did. Only very infrequently was one of us asked to automate something (if we came up with the idea first). In time the teams split and merged, people were transferred between projects and teams, and our initial success was forgotten, for it happened in those old times with no traces left. That is why I wrote this. The conclusion is that education sometimes matters big time, but the culture of many companies matters even more.

· · · · ·

How to Get TDD Under Your Skin

2012/02/06 01:15

It may be hard to force yourself to do proper TDD (test-driven development), particularly if you spend a lot of time developing in an environment hostile to it or to other agile practices. It appears that most of the time the biggest obstacle is yourself.

It is good to keep reminding yourself what TDD is, because it is quite easy to keep forgetting the basic steps you should follow (not just try to):

  • Before you write any code, you must write a failing test for it.
  • You must write just the smallest amount of code to make that test pass (often using ugly tricks).
  • You clean/refactor the code to remove duplication and improve design without changing behavior or adding functionality (also applies to tests), keeping all tests passing.
  • Repeat.
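To make the steps above tangible, here is a tiny sketch of where the code stands after a few turns of the cycle. The leap-year problem and both function names are my own choice, not part of any particular kata. The comments mark which step each piece belongs to.

```javascript
// Step 1 (red): this test was written first and failed while isLeapYear did not exist.
function testLeapYears() {
  if (isLeapYear(2012) !== true) throw new Error('2012 must be a leap year');
  if (isLeapYear(2011) !== false) throw new Error('2011 must not be a leap year');
  if (isLeapYear(1900) !== false) throw new Error('1900 must not be a leap year');
  if (isLeapYear(2000) !== true) throw new Error('2000 must be a leap year');
}

// Step 2 (green): the first passing version was just "return year % 4 === 0".
// Each new failing check above forced one more condition.
// Step 3 (refactor): the conditions were then folded into a single expression.
function isLeapYear(year) {
  return year % 4 === 0 && (year % 100 !== 0 || year % 400 === 0);
}

testLeapYears(); // throws on the first failing expectation, returns quietly otherwise
```

The point is the order of events rather than the code itself: each line of `isLeapYear` exists only because some check in `testLeapYears` demanded it.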

There are many positive effects of using TDD the right way. Among the most useful are meaningful unit tests covering almost all of the code and highly decoupled code. But you can find out about this all around the web (e.g. TDD on wiki); I want to focus on different aspects here.


The first thing to clear up is the persisting confusion about all the contradictory facts you can find about the TDD "controversy". Some critique is based on common fallacies such as the straw man fallacy, argument from ignorance, appeal to authority, appeal to majority, etc. Those are relatively easy to identify. However, there are substantiated claims that appear to diminish the usefulness of TDD. In these cases a careful analysis of what the critique is about is required. And it seems that the most important thing about these claims is the context in which they are applied.

There are very interesting talks about related issues that are worth seeing like Rich Hickey's Hammock Driven Development, or Dan North's Patterns of Effective Delivery that may seem to be partially TDD-discouraging. The pattern here is usually that those are indisputable experts (they mastered the subject above the level of common competence).

You can get to this level of expertise by different paths, but some are very difficult. You can decide not to use TDD at any time, but there seems to be enough evidence now that deciding that before you actually master it and can see all the implications would be very irresponsible at the very least. This decision does not usually affect only your future you but also other people that may want or need to use or touch your code. People are very bad at predicting the future. Moreover, people are very good at underestimating their inability to predict the future. That means that what you think now about the future is mostly irrelevant.

My Bag of Tricks

One small thing I found very helpful is to have something reminding you of the good intentions you started with. The famous green wristband works for me and there is a chance it may work for you as well. The important rule is: wear it as often as possible, and never write code that does not respect the best practices you know with the band on. It should feel shameful to take it off. To get it is easy; to deserve it, not so much.

As with any art or craft, the most important thing is practice. However, not all variants of practice are equal. The one that is confirmed by many studies (a very readable one) to be the most important is deliberate practice. One of the increasingly popular forms of deliberate practice is code katas.

A code kata (named after katas in karate) is a complete solution to a relatively simple problem, mastered to a certain degree of perfection. The main goal is to master some parts of the process so that they become automated by your subconscious and your brainpower is free to be used for the creative parts of the process (similarly to how touch typing can be automated to let you think about what you want to write). The important aspect is that you learn what the process should ideally look like, preventing bad habits from creeping in. One additional positive side effect is that you also improve your skills in using your text editor or IDE.

For TDD, code katas let you almost automate the cycle of test, code, refactor. Without this cycle being automated, it is quite difficult to follow it unless you keep focused on it all the time.

When Not To Use TDD

Short answer: You just should. And even if it seems you cannot, you should try to find a way.

Longer answer: When you are sure you know what you are doing.

There are common cases where TDD does not make much sense. For instance a GUI that does not do more than setting/reading some properties and calling methods may not be even possible to design using TDD. There are pieces of code (proofs of concepts) that you are certain will be thrown away soon, however, make sure they are thrown away. I already mentioned the "ability" to predict the future and there is quite a lot of code in use around that was not meant to survive.

Remember: Lack of good unit tests leads to potential chaos, chaos leads to anger, anger leads to hate. The hate of your future you can be quite frustrating, although just for your future you. The hate from your future colleagues can be even dangerous (the company/manager does not always pass as the target to blame)!

· · · · ·

Applying for a Distant Apprenticeship

2012/01/24 01:15

Solving a test problem of Tic-Tac-Toe as an apprenticeship application and a shift of my thinking about programming such tasks.

I spent many years developing software, but something still felt odd. People happily used the software, but I was the one who knew that under the cover the process of creating it resembled kids playing and experimenting with colored crayons more than a predictable process steadily increasing the products' value.

This may be a bit of an exaggeration, as I had looked into the abyss of desperation of the company's "global framework" and, in comparison, what I was doing was actually fun most of the time (except for the times when a tiny fraction of the framework was needed in our project and it was inseparable from the rest). Still, I felt the limits of what I could do the way I was doing it, especially when some architectural change of the system was involved.

I was seriously contemplating starting over in another company when I saw a tweet and a blog post from 8th Light about taking on new apprentices who would have the opportunity to learn from 8th Light's craftsmen, one of them being (OMG) Uncle Bob. As I live in Europe I had no illusions that I could become one, but there was a test problem to solve attached to the application. I really like solving problems like this. It was a TicTacToe program that has a computer player that never loses.

The first version

I jumped right into the problem. I decided to use Javascript since everything from the code to the GUI can be packed into one HTML file that anyone can run. In addition, I was discovering the Lisp roots of Javascript and the implications of that and that it indeed had good parts.

After one or two hours the first version was ready to be sent. It tackled the problem in a very straightforward way. The entire tree of possible moves was computed (as the problem space is reasonably small) and traversed in the manner of the Minimax algorithm (simplified, as the entire tree was known), and winning/losing branches were stored in the tree nodes.

Happy with myself, I sent the solution back. There was a bit of discomfort about the code having no tests and the design fitting just this limited version of the problem, but I had good reasons for that, didn't I? How could I develop Javascript in a webpage using TDD and still keep it in one file, right? Or so I thought.

The reply contained two questions: how does the code work (it was indeed clear to me, but on a second look it was rather tricky to decode), and how can I be sure that it never loses?
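For readers unfamiliar with the technique, here is a compact sketch of minimax for Tic-Tac-Toe in its negamax form. It is not the code I sent; it recomputes the tree on demand instead of storing results in nodes, and all names are chosen for this illustration.

```javascript
const LINES = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8],   // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8],   // columns
  [0, 4, 8], [2, 4, 6]               // diagonals
];

// board: array of 9 cells, each 'X', 'O' or null. Returns the winner or null.
function winner(board) {
  for (const [a, b, c] of LINES) {
    if (board[a] && board[a] === board[b] && board[a] === board[c]) return board[a];
  }
  return null;
}

// Outcome for `player`, who is about to move, assuming best play by both sides:
// +1 forced win, 0 draw, -1 forced loss.
function score(board, player) {
  const other = player === 'X' ? 'O' : 'X';
  if (winner(board) === other) return -1;              // the previous move already won
  if (board.every((cell) => cell !== null)) return 0;  // full board, nobody won
  let best = -1;
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    board[i] = player;                                 // try the move...
    best = Math.max(best, -score(board, other));       // ...what is bad for them is good for us
    board[i] = null;                                   // ...and undo it
  }
  return best;
}

// Index of the move with the best guaranteed outcome for `player`.
function bestMove(board, player) {
  const other = player === 'X' ? 'O' : 'X';
  let bestIndex = -1;
  let bestScore = -2;
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    board[i] = player;
    const s = -score(board, other);
    board[i] = null;
    if (s > bestScore) { bestScore = s; bestIndex = i; }
  }
  return bestIndex;
}
```

Calling `score` on an empty board returns 0, the familiar result that perfect play from both sides ends in a draw.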

I tried to explain as well as I could my solution and how it maps to the code... One thing became clear: Just by making small updates of the code and using a consistent terminology, it could have been much easier to explain. Self-confidence -10%. The next part seemed exciting. How can I be sure it never loses? I know, I did it for several years at the university: I will prove it. So I replied with a detailed and rather long formal (as far as I could tell) proof. Self-confidence +20%.

The second version

The reply had quite a surprising twist: "When I want to make sure it does what I want, I will write a test". Self-confidence -50%. It was such an easy and much more error-proof solution than what I had spent my time on, especially in this case, where I could just test all the possibilities. But there was a catch. To be able to test the algorithm, it needed some non-trivial changes. These changes would be better supported by tests, as the code was tricky. Thus: I needed a testing framework. After some searching and considering different options I came to the conclusion that I should add a minimal test framework for this purpose. I knew that making one is not terribly difficult from Kent Beck's TDD book, where he reconstructs one from scratch. The result surprised me. For illustration, this is what I came up with in a couple of minutes:

    var test = {
        // Throws a named error object when the checked condition is falsy.
        assert: function (x, message) {
            if (!x) {
                throw {name: 'TestAssert', message: message};
            }
        },
        // Returns a runner that executes every function in `tests`
        // and reports the totals in resultElem.
        setup: function (resultElem) {
            return function (tests) {
                var name, s = '', passed = 0, failed = 0;
                for (name in tests) {
                    try {
                        tests[name]();
                        passed += 1;
                    } catch (e) {
                        s += 'Test failed in ' + name + ': ' + e.message + '\n';
                        failed += 1;
                    }
                }
                // The left-hand side of this assignment is missing in the source text;
                // resultElem.style.backgroundColor is an assumption.
                resultElem.style.backgroundColor = s ? 'red' : 'green';
                resultElem.innerHTML = passed + ' passed, ' + failed + ' failed';
                if (s) {
                    alert(s);
                }
            };
        }
    };

Calling setup() with a DOM element for displaying test results returns a function to apply to an object of test functions. It was all I needed to proceed and it was so simple. The second version was created and tested using it, and I was actually much faster than when doing the formal proof, and the result was executable, repeatedly. The proof would become obsolete after certain code changes; the tests would not, or they would tell me so. I was so focused on rationalizing why I could not do certain things like unit testing that I forgot to focus on how I might make them work and what the implications would be.
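"Test all the possibilities" can be sketched as follows: let the opponent try every legal move in every reachable position and fail as soon as the computer loses. This is an illustration, not my original test suite. It runs without a browser, the boards are 9-character strings, and `computerMove` is a hypothetical stand-in for the player under test (a memoized negamax).

```javascript
const WIN_LINES = [[0,1,2],[3,4,5],[6,7,8],[0,3,6],[1,4,7],[2,5,8],[0,4,8],[2,4,6]];
const winnerOf = (b) => {
  for (const [x, y, z] of WIN_LINES) {
    if (b[x] !== ' ' && b[x] === b[y] && b[x] === b[z]) return b[x];
  }
  return null;
};
const place = (b, i, p) => b.slice(0, i) + p + b.slice(i + 1);
const emptyCells = (b) => [...b].map((c, i) => (c === ' ' ? i : -1)).filter((i) => i >= 0);
const opponent = (p) => (p === 'X' ? 'O' : 'X');

// Player under test (stand-in): value is +1 / 0 / -1 for p, who is about to move.
const memo = new Map();
function value(b, p) {
  const key = b + p;
  if (memo.has(key)) return memo.get(key);
  let v;
  if (winnerOf(b)) v = -1;                 // the opponent just completed a line
  else if (!b.includes(' ')) v = 0;        // full board, draw
  else v = Math.max(...emptyCells(b).map((i) => -value(place(b, i, p), opponent(p))));
  memo.set(key, v);
  return v;
}
function computerMove(b, p) {
  let best = -1;
  let bestValue = -2;
  for (const i of emptyCells(b)) {
    const v = -value(place(b, i, p), opponent(p));
    if (v > bestValue) { bestValue = v; best = i; }
  }
  return best;
}

// The test itself: counts finished games and throws if the computer ever loses.
function countGamesWithoutLoss(b, toMove, computer) {
  const w = winnerOf(b);
  if (w !== null && w !== computer) throw new Error('Computer lost in position "' + b + '"');
  if (w !== null || !b.includes(' ')) return 1;
  if (toMove === computer) {
    const next = place(b, computerMove(b, computer), computer);
    return countGamesWithoutLoss(next, opponent(toMove), computer);
  }
  let games = 0;
  for (const i of emptyCells(b)) {         // the opponent tries EVERY legal move
    games += countGamesWithoutLoss(place(b, i, toMove), computer, computer);
  }
  return games;
}

const EMPTY = ' '.repeat(9);
const gamesAsFirst = countGamesWithoutLoss(EMPTY, 'X', 'X');   // computer moves first
const gamesAsSecond = countGamesWithoutLoss(EMPTY, 'X', 'O');  // computer moves second
```

If either call returns instead of throwing, the player survived every possible line of play from that side, which is the guarantee the formal proof was trying to give.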

After this experience my informal distant version of apprenticeship had started. But in fact, it had already been running since I got my first feedback that challenged my experience in quite unexpected ways.

· · · · ·

The New Beginning

2012/01/23 19:45

My experiences with professional software development and realization of how the perceived connections with science, engineering, craft, and art are skewed in the process we use to learn and teach creating and maintaining software.

I have been programming professionally for almost a decade now. I spent the last seven years as a software engineer in a relatively big multinational company whose business is web based. I went through different teams with very different attitudes and intentions, and there were not many moments that were happy or satisfying from a programmer's professional point of view.

I read a lot of articles and watched many talks about agile practices. It seemed, and still seems, as if there are many practices and methodologies that worked for some and failed miserably for others. The reasoning behind that always seemed very logical, but explaining (or rationalizing) the reasons for success or failure afterwards is usually relatively easy and, as it appears, not as useful for predicting the outcome of other projects as it may seem. One reasonable hypothesis is that so many parameters change after a project starts that assigning it to a certain group of similar projects that succeeded may be harmful if the project turns out not to be so similar and further adjustments are disallowed.

When I started in the big company there was the pure "waterfall hell" about which every developer hears horror stories and which inspired many Dilbert comics. But I was quite lucky. I came from a university (after two years of unfinished PhD studies) and I was the programmer among many web developers. I was doing the same monkey job as others, but I was able to automate 90% of my work and soon many of my tools were used by others. I went through more teams with similar results. Then I ended up in a team where I was the senior software engineer (aka .NET programmer) among DB engineers. I was responsible for how everything was done, from the web GUI through the server to the border of the database. As you can imagine, if you are responsible for everything from estimates to development practices (as you are the only participant on the inside), you can learn a lot and save time to learn even more, or so I thought.

It is true that I managed to learn tons of new things theoretically and was able to explain the advantages of TDD, BDD, small cycles, design patterns, etc. But my job was not reviewed down to the level of code, and at the end of the day all the projects I did were still plugged into a 3-month release cycle with one month of QA. Obviously, I did not care about BDD: there was no direct client to deal with on a daily basis, and I knew what I needed to do from the project specification. Of course, when I needed to get back to the projects I had worked on before, it was much less fun than starting new ones, and the time to fix or add features was shorter when done directly without TDD (as the rich web apps needed much more care with TDD, the code being spread across many environments). And even if I had some tests from the start, they soon became quite obsolete and pointless. Too many times I promised myself that next time I would do it right.

The realization

Then it occurred to me that for me it is like learning martial arts from books. I had enough martial arts experience to recognize the pattern forming in front of my eyes. No one says that it is impossible to learn something like a martial art by other means, but without a master-teacher, you are seriously handicapped.

Humans usually tend to overestimate their knowledge in a field they learned a bit more about than people around them. This causes them to form opinions. Once they have an opinion (even worse if they are already expressing it publicly), it causes a bias in the following learning process as they tend to ignore what contradicts and emphasize what supports it. With a teacher or a master the situation changes (imagine an exam at school, where you say that you have not learned special relativity, because you think that it is not as good a description as string theory might be in the future; or imagine the pain (literally) if you tell your Kung Fu master that you think that this kick is better in this situation than that jab he is teaching you).

What I needed was discipline. But I already carried all the burden of the years I had been programming, which I needed to get rid of, and the solution is not as easy as telling yourself that from now on you will respect all the practices from this particular arbitrary book. The brain got wired somehow, and controlling this brain by the same brain may need quite a strong will (whatever that is). I needed motivation.

Not long after, I read a tweet from Uncle Bob about an apprenticeship call from 8th Light that involved solving a test problem. I like solving problems. That is how my new journey started. And more about that next time.