Max. bytes in a UTF-8 char?

4.

A single UTF-8-encoded Unicode character takes at most 4 bytes.

And this is how the encoding scheme works in a nutshell.

Bits of code point  First code point  Last code point  Bytes in sequence  Byte 1    Byte 2    Byte 3    Byte 4
7                   U+0000            U+007F           1                  0xxxxxxx
11                  U+0080            U+07FF           2                  110xxxxx  10xxxxxx
16                  U+0800            U+FFFF           3                  1110xxxx  10xxxxxx  10xxxxxx
21                  U+10000           U+10FFFF         4                  11110xxx  10xxxxxx  10xxxxxx  10xxxxxx

Source: Wikipedia (which, confusingly, also shows sequences of up to 6 bytes, when 4 is the true maximum)
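
To make the table concrete, here is a minimal sketch that encodes a code point by hand using the bit layouts above and compares the result with the JDK's own UTF-8 encoder. The class name Utf8Sketch and the method encode are made up for this illustration; they are not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Sketch {

    /** Encodes one code point by hand, following the bit layouts in the table (hypothetical helper). */
    public static byte[] encode(int cp) {
        // Unicode ends at U+10FFFF and excludes the surrogate range U+D800..U+DFFF.
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Not an encodable code point: " + cp);
        }
        if (cp <= 0x7F) {            // 7 bits:  0xxxxxxx
            return new byte[] { (byte) cp };
        }
        if (cp <= 0x7FF) {           // 11 bits: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        if (cp <= 0xFFFF) {          // 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F)) };
        }
        // 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F)) };
    }

    public static void main(String[] args) {
        // One sample per table row, plus the very last code point.
        int[] samples = { 0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF };
        for (int cp : samples) {
            byte[] manual = encode(cp);
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X: %d byte(s), same as JDK: %b%n",
                    cp, manual.length, Arrays.equals(manual, jdk));
        }
    }
}
```

Even the highest code point, U+10FFFF, lands in the 4-byte branch, so no valid character ever needs a fifth or sixth byte.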

Wait, I heard there could be 6?

No.
You heard wrong.


How Java’s text Formats can subtly break your code

Today at work we found a subtle issue that will sometimes break your code in ways that are very hard to track down. Read this if you don’t like spending days hunting for mysterious bugs that only occur on high-load production machines.

Ever written code like this?

private static final DateFormat FORMAT = new SimpleDateFormat("yyyy-MM-dd");

Most of us have. It’s the most intuitive way to use a DateFormat to format a Date object as a human-readable String.

Unfortunately, it’s wrong.

DateFormat instances are not synchronized. When a single shared instance is used from multiple threads (e.g. in a Servlet), this will end up breaking. Sometimes. When you least expect it.
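
The full post is not reproduced here, so the following is not necessarily the fix the post arrives at. It is a minimal sketch of two common ways to avoid sharing one mutable DateFormat between threads: giving each thread its own instance via ThreadLocal, or using the immutable DateTimeFormatter from java.time (Java 8 and later). The class and method names are invented for this example.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Date;

public class SafeDateFormatting {

    // Option 1: one SimpleDateFormat per thread, so no instance is ever shared.
    private static final ThreadLocal<DateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    // Option 2: DateTimeFormatter is immutable and documented as thread-safe,
    // so a single shared constant is fine.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    /** Formats a legacy java.util.Date using the calling thread's own formatter. */
    public static String formatLegacy(Date date) {
        return FORMAT.get().format(date);
    }

    /** Formats a java.time.LocalDate using the shared, immutable formatter. */
    public static String formatModern(LocalDate date) {
        return DAY.format(date);
    }

    public static void main(String[] args) {
        System.out.println(formatLegacy(new Date()));
        System.out.println(formatModern(LocalDate.now()));
    }
}
```

Creating a new SimpleDateFormat on every call would also work; the ThreadLocal variant just avoids the repeated construction cost.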


The Breaking Float Bug is back!

Back in September of 2010 I wrote a blog post about the Breaking Float rendering bug I found in WebKit, the rendering engine backing the Safari, Chrome and Opera browsers. In April of 2013 the WebKit team landed a patch which would eventually end up in the browsers relying on WebKit. Case closed, right? Well, not quite…

Blink brought it back


Coincidentally, in the same month that the Breaking Float bug was fixed, Google forked WebKit into a new project called ‘Blink’. And the fix for the Breaking Float bug was not in the fork. Over time, the Chrome browser switched over to using Blink instead of WebKit, bringing back the issue in all its breaking glory.


The Football Oracle’s Predictions

Just after midnight on Monday I published my first sports-related post. I’m not much into sports myself, but I happen to know someone who is and who often makes predictions about sports matches that come true. So I thought it would be fun if he shared some predictions about the 2014 World Cup, which is drawing to a close right now, and I wrote a post about his World Cup predictions.

The Oracle’s first prediction was on the money


My ‘Oracle’, as I will call him from now on, predicted that Germany would convincingly beat Brazil. Well, it does not come much more convincing than winning 7 to 1!

The Brazilian supporters were shocked at what they were seeing. Germany walked all over Brazil and is off to the final!


World Cup Predictions

I’m not into sports, but I do watch the European Championship and World Cup matches that my country takes part in. The Netherlands have reached the semi-finals, so it seems like a good moment for a post. But as I don’t know anything about sports, I consulted my brother-in-law Stefan, a sports oracle of sorts.

Germany to win convincingly against Brazil

With Brazil having lost its two most important players, Neymar, who is suffering from a back injury, and Silva, who is suspended, Germany is a huge favourite in the eyes of Stefan. They will win convincingly, he says. He points out that statistically the German team is right up there with the Brazilian team, but that the Brazilian team without its key players is just not up to that level. “Only if the German team loses its keeper in the first minutes of the game to a red card and a penalty that Brazil scores do they risk losing this game”, he says.


I like my code wet

Redundancy. Repeating yourself. Duplication. Copy-paste.

These have become like curse words in IT. “This data is completely redundant!” “Did you just copy-paste that code?”


We all learn early on in our programming ventures that code duplication is bad. It creates maintenance nightmares, we are told. Our code should be DRY. I believed it and lived by it. But as I mature as a programmer, I am starting to doubt it more and more.
