Friday, September 6, 2013

C# StringBuilder Quirks

As a software consultant, I see quite a bit of stuff that makes me wonder what the author was thinking when they wrote it.  Before you start yelling at me in the comments about how full-time employees are great developers, too, I agree completely.  Often, what I'm looking at is from another consultant who either didn't understand the domain he was in or didn't really care about things like performance because, after all, he didn't work there anyway.  Then sometimes I'm looking at my own code.  Yeah, that happens.

Anyway, I'm at a client right now where we make use of the StringBuilder class to build different output messages.  We use this feature a lot so we'd like to have it working as efficiently as possible.  When this stuff was originally written, the team just instantiated a StringBuilder object and started invoking .Append() with the various pieces of text they wanted to add.  Like this:

    var builder = new StringBuilder();
    builder.Append("Hi!");

Then a consultant came in and told them how terribly inefficient they were being with that implementation.  After all, when you just instantiate a StringBuilder object with the default constructor it only has enough room in memory to hold 16 characters.  16!  That's it!  Clearly that won't be enough.  So the consultant had the team create all of their string builders with enough capacity for 2048 characters.  Now, before I go on I should probably explain briefly exactly how the StringBuilder grows.
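
For reference, here's roughly what the two approaches look like side by side.  This is just a sketch: the variable names are made up for illustration, and the numbers printed are whatever your runtime reports.  The documentation lists 16 as the default for the current implementation, but treats it as implementation-specific.

```csharp
using System;
using System.Text;

// Default constructor: the runtime picks the starting capacity
// (documented as 16 characters for the current implementation).
var defaultBuilder = new StringBuilder();
Console.WriteLine($"Default capacity:  {defaultBuilder.Capacity}");

// What the consultant recommended: reserve room for 2048 characters up front.
var presizedBuilder = new StringBuilder(2048);
Console.WriteLine($"Presized capacity: {presizedBuilder.Capacity}");
```

Both builders start out empty (Length is 0).  The only difference is how much room has been set aside before the first Append().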

Basically, every time you try to append a string to the StringBuilder, the capacity of the StringBuilder is evaluated.  If the string you're trying to append would cause the length of the string in the StringBuilder object to grow beyond the current capacity, new memory is allocated, the capacity of the StringBuilder is doubled, and your string is added.  So if you do what we did above, the default capacity is fine (16 characters is obviously enough to hold the three characters in "Hi!").  But then if you added something else like "Welcome to my blog.  I hope you enjoy what I have to say." you won't have enough room.  You're trying to squeeze 60 characters into an object that can only hold 16.  So your StringBuilder's capacity is doubled (to 32, for those of you who aren't friends with Math) and that isn't big enough, so the capacity is doubled again (to 64).  The end result is that you have caused memory to be reallocated twice and the capacity of the object has been quadrupled.  (This doubling picture is a simplified model.  The exact growth policy is an implementation detail that has changed between .NET versions, so the numbers you actually observe may differ, but the principle holds: outgrowing the capacity costs you an allocation.)
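
If you want to watch this happen on your own machine, a small experiment like the one below prints the Length and Capacity after each append.  It's a sketch, not a benchmark, and I'm deliberately not promising specific capacity values because the growth policy depends on the runtime you're on.

```csharp
using System;
using System.Text;

var builder = new StringBuilder();
Console.WriteLine($"Start:         Length={builder.Length}, Capacity={builder.Capacity}");

// Fits comfortably inside the starting capacity.
builder.Append("Hi!");
Console.WriteLine($"After \"Hi!\":   Length={builder.Length}, Capacity={builder.Capacity}");

// Pushes the total length past the starting capacity, so the builder has to grow.
string welcome = "Welcome to my blog.  I hope you enjoy what I have to say.";
builder.Append(welcome);
Console.WriteLine($"After welcome: Length={builder.Length}, Capacity={builder.Capacity}");
```

Whatever numbers you get, Capacity will never be smaller than Length, and the jump between the second and third lines of output is the growth described above.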

Ok, so that wasn't very brief, but it was important.  Back to the situation at hand.  You're probably thinking that I'm being pretty nitpicky about 2048 characters, and maybe I am.  But it is a big deal because we're using (relatively) obnoxious amounts of memory to build strings... just strings.  .NET stores each character as a 2-byte UTF-16 code unit, so each 2048-character StringBuilder reserves 4096 bytes for its characters, even if we only put in 100 characters (which is pretty typical in our use).  We use this implementation hundreds of times throughout our enterprise application, which accommodates hundreds of users at a time.  Do you see the problem here?
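
Here's that arithmetic spelled out.  The constant names are made up for illustration, and this only counts the character buffer itself, not any per-object overhead the runtime adds on top.

```csharp
using System;

const int reservedChars = 2048; // capacity the consultant asked for
const int typicalChars = 100;   // what a typical message actually needs

// sizeof(char) is 2 in C#, because a char is one UTF-16 code unit.
int reservedBytes = reservedChars * sizeof(char);
int usedBytes = typicalChars * sizeof(char);
int idleBytes = reservedBytes - usedBytes;

Console.WriteLine($"Reserved: {reservedBytes} bytes");
Console.WriteLine($"Used:     {usedBytes} bytes");
Console.WriteLine($"Idle:     {idleBytes} bytes per builder");
```

That's 4096 bytes reserved, 200 used, and 3896 sitting idle in every one of those builders.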

To be sure, there is a tradeoff to be had between pre-allocating the capacity of the StringBuilder to save the time required to reallocate that memory and wasting all of that extra capacity when you can be fairly certain it will never be used.  I'd rather allow the runtime to do what it is designed to do and reallocate the capacity as needed.
