User:Espeholt jr/sandbox

There are three types of strings used in programming languages:

- Fixed length strings

- Terminated strings

- Counted strings
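The three layouts listed above can be illustrated with a small sketch. The function names, the 8-byte field width, the space padding, the NUL terminator and the 1-byte count are all assumptions chosen for illustration; real languages differ in each of these details.

```python
def fixed_length(text: str, width: int = 8) -> bytes:
    """Fixed-length string: always `width` bytes, padded with spaces (assumed padding)."""
    data = text.encode("utf-8")
    if len(data) > width:
        raise ValueError("text does not fit in the fixed width")
    return data.ljust(width, b" ")


def terminated(text: str) -> bytes:
    """Terminated string: the data followed by a NUL byte (assumed terminator)."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("text must not contain the terminator byte")
    return data + b"\x00"


def counted(text: str) -> bytes:
    """Counted string: a 1-byte length (assumed prefix size) followed by the data."""
    data = text.encode("utf-8")
    if len(data) > 255:
        raise ValueError("length does not fit in a 1-byte prefix")
    return bytes([len(data)]) + data
```

For the text "abc", the fixed-length form occupies 8 bytes regardless of content, the terminated form occupies 4 bytes, and the counted form occupies 4 bytes with the length stored first.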

This article describes the Counted UTF-8 string, which, as the name says, is a counted string that uses the UTF-8 character encoding.

Length prefix

Pascal strings (also called P-strings) use a 1-byte length prefix. The major disadvantage is that a string cannot be more than 255 characters long. In Microsoft .NET MSIL there is a 2-byte length prefix, which allows a string to be up to 65535 (2^16 − 1) characters long. Strings longer than 64 K are very rare, but they can occur. Another disadvantage is that the prefix always adds 2 bytes, even if the string contains only a single character. The Counted UTF-8 string addresses both problems with a variable-length length prefix. If the string contains 127 characters or fewer, the length prefix is 1 byte. If the string contains 16383 characters or fewer, the length prefix is 2 bytes.
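The thresholds 127 (2^7 − 1) and 16383 (2^14 − 1) suggest that each prefix byte carries 7 bits of the length and reserves 1 bit as a flag. The text does not specify the exact bit layout, so the following is only a sketch under stated assumptions: the high bit of each byte means "another prefix byte follows", the low 7 bits come least-significant group first, and the length counts UTF-8 bytes rather than characters. All function names are hypothetical.

```python
from typing import Tuple


def encode_length(n: int) -> bytes:
    """Encode a non-negative length using 7 payload bits per byte (assumed layout)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more prefix bytes follow
        else:
            out.append(low7)         # high bit clear: last prefix byte
            return bytes(out)


def decode_length(buf: bytes) -> Tuple[int, int]:
    """Return (length, number of prefix bytes consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated length prefix")


def encode_string(text: str) -> bytes:
    """Prefix the UTF-8 data with its variable-length byte count."""
    data = text.encode("utf-8")
    return encode_length(len(data)) + data


def decode_string(buf: bytes) -> str:
    """Read the prefix, then decode exactly that many bytes as UTF-8."""
    length, offset = decode_length(buf)
    payload = buf[offset:offset + length]
    if len(payload) != length:
        raise ValueError("buffer is shorter than the encoded length")
    return payload.decode("utf-8")
```

Under these assumptions a one-character ASCII string costs a single byte of overhead, a length of 127 still fits in 1 prefix byte, lengths from 128 to 16383 take 2 prefix bytes, and longer strings simply grow the prefix by one byte per additional 7 bits.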