Basics of Javascript · String · codePointAt() (method)

Jakub Korch
Published in Nerd For Tech · 5 min read · May 23, 2021


This article is a transcript of my free YouTube series about the basics of web development. If you prefer watching over reading, feel free to visit my channel, “Dev Newbs”.

Hello again, fellow developers. What has this day prepared for us, you ask? Well, we have a treat today. We will be talking about a method that originates from JavaScript’s older, bigger sister. And that method is none other than codePointAt().

Okay, so what’s up with the remark about the big sister? And which language did I mean? The correct answer is Java. See, when I was preparing this episode on String methods, I obviously had to do some research. A funny thing happened when I was looking for information about this method on my favourite websites: two of them pointed me towards the Java method codePointAt() instead of the JavaScript version. To make it even more interesting, the description of the Java method is actually a spot-on definition of the JavaScript version. This led me to the conclusion that JavaScript “borrowed” the specification of this method from Java. All right, that’s it for the trivia section of today’s tutorial; now let’s get back to the explanation.

The codePointAt() method returns a non-negative integer that is the Unicode code point value at the given position. Emphasis on code point, as opposed to the UTF-16 code unit that charCodeAt() returns.

let mySoSoLongString = "Hi devs! 🙂";

mySoSoLongString.codePointAt(0)         // 72
mySoSoLongString.codePointAt(1)         // 105
mySoSoLongString.codePointAt(2)         // 32
mySoSoLongString.codePointAt(3)         // 100
mySoSoLongString.codePointAt(4)         // 101
mySoSoLongString.codePointAt(5)         // 118
mySoSoLongString.codePointAt(6)         // 115
mySoSoLongString.codePointAt(7)         // 33
mySoSoLongString.codePointAt(8)         // 32
mySoSoLongString.codePointAt(9)         // 128578

Of course, we have some edge cases to cover. If there is no character at the given index, the method returns undefined. We can trigger this behaviour either by passing a negative integer as the index or by passing an integer equal to or higher than the length of the string.

mySoSoLongString.length                 // 11
'🙂'.length                             // 2
mySoSoLongString.codePointAt(-1)        // undefined
mySoSoLongString.codePointAt(11)        // undefined

If the value provided as the index cannot be converted into an integer, the method defaults to using index zero.

mySoSoLongString.codePointAt("X")       // 72
mySoSoLongString.codePointAt(NaN)       // 72
mySoSoLongString.codePointAt(null)      // 72
mySoSoLongString.codePointAt(undefined) // 72
mySoSoLongString.codePointAt(false)     // 72
mySoSoLongString.codePointAt(true)      // 105

The first four examples have index values that cannot be converted into an integer, ergo the default index of zero is used instead.

The last two cases are a little different. Both boolean values can be converted into an integer: false is zero and true is one. In the case of false it does not really matter, because the result is the same as if false were not convertible at all: zero is used as the index.

The case of true is a bit more interesting, because the index used is one, so we get the code point at position one.

So far, so good. Seemingly there are no issues with this method. Finally. A String method that works with no strings attached. Well, I am sorry to burst your bubble, but this story goes downhill from here.

The string from the previous examples has an emoji character at the end. We got the actual Unicode code point value for the smiley face because I pointed the method at the correct place: the index of the first code unit of the emoji symbol. The emoji consists of two code units, starting at index 9 and ending at index 10.

But what would happen if I gave index 10 as an index of the character I want to get code point from? Well, let’s see together in example 4.

mySoSoLongString.length                 // 11
'🙂'.length                             // 2
mySoSoLongString.codePointAt(9)         // 128578
mySoSoLongString.codePointAt(10)        // 56898
mySoSoLongString.charCodeAt(9)          // 55357
mySoSoLongString.charCodeAt(10)         // 56898

I am outputting the length of the string just as a reminder. The next line shows that the emoji is indeed two code units long.

Then we have the code point value at index 9. We get what we expect. So far so good.

But when we ask for the code point value at position 10, the problems appear. What we get back is just the value of the single code unit at that position: the low surrogate of the pair.

Check the next two lines of output, where I print the values of the code units instead of the code points, using the charCodeAt() method from the previous article. At position 9 the two methods give different results, but at index 10 they give the exact same result.

The problem is that the index passed to codePointAt(), like the index of any other String method, is measured in code units, so the method has no way of knowing how “big” the character is unless you point it at the start of that character.

Most emojis and special symbols are one or two code units long, which in most cases translates into a single code point. For these characters we already have a tool to identify them: the spread syntax splits a string into an array of separate, individual symbols by code points, not code units. Then we can handle each character without any trouble. I will show you that in the next example.

let ordinaryEmojis = "🐊🤦👍🙂";

ordinaryEmojis.length                   // 8
[...ordinaryEmojis]                     // 🐊,🤦,👍,🙂
[...ordinaryEmojis].length              // 4
[...ordinaryEmojis][0].length           // 2
[...ordinaryEmojis][1].length           // 2
[...ordinaryEmojis][2].length           // 2
[...ordinaryEmojis][3].length           // 2
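Spread builds an intermediate array; when you only need to visit each symbol once, a for...of loop does the same code-point-by-code-point walk without building the array (a small sketch):

```javascript
const ordinaryEmojis = "🐊🤦👍🙂";

// String iteration walks code points, not code units,
// so each emoji arrives as one whole two-unit symbol.
for (const symbol of ordinaryEmojis) {
  console.log(symbol, symbol.length, symbol.codePointAt(0));
}
// Each symbol has length 2; the last code point is 128578 (🙂).
```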

But sometimes we have even more special characters, emojis, whatever you want to call them. These are combinations of other emojis and even some special “glue” characters that tell us to modify the default character in a certain way.

Let’s see that in the last example of the day.

let specialEmoji = "🤦🏼‍♂️";

specialEmoji.length                     // 7
[...specialEmoji].length                // 5
[...specialEmoji]                       // 🤦,🏼,‍,♂,️
[...specialEmoji][0].length             // 2
[...specialEmoji][1].length             // 2
[...specialEmoji][2].length             // 1
[...specialEmoji][3].length             // 1
[...specialEmoji][4].length             // 1

In 99% of situations you will encounter those ordinary emojis. However, you might be lucky to be that one in a hundred developers, who encounters “specialEmoji”.

The name of the default emoji is facepalm (officially “person facepalming”), and we can see the gender-neutral version of it in the ordinaryEmojis variable from the previous example.

But our special one carries additional information, which further modifies it. In the end, it is made up of the following five code points:

  • default gender neutral facepalm emoji (2 code units)
  • emoji skin tone modifier [medium-light] (2 code units)
  • zero width joiner (1 code unit)
  • gender sign [male] (1 code unit)
  • variation selector [multicolor] (1 code unit)

Fortunately, these emojis from hell are still quite rare, but their popularity grows by the day. Unfortunately for us, there is no simple built-in solution for reading strings that contain such symbols so far. Maybe when they become the de facto standard for emojis, this will change. Until then, all you can do is pray.

As usual thank you for your time and I will see you soon in the next article.
