User:Espeholt jr/sandbox
There are three types of strings used in programming languages:
- Fixed length strings
- Terminated strings
- Counted strings
This article describes the Counted UTF-8 string, which, as the name says, is a counted string that uses the UTF-8 character encoding.
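To make the three categories concrete, the following is a minimal sketch of how the same short text could be laid out in memory under each scheme. The field width `FIELD_SIZE`, the space padding, and the NUL terminator are assumptions chosen for illustration; real languages differ in these details.

```python
text = "cat"
data = text.encode("utf-8")  # b"cat", 3 bytes

# Fixed length: the string always occupies FIELD_SIZE bytes.
# FIELD_SIZE and the space padding are hypothetical choices for this example.
FIELD_SIZE = 8
fixed = data.ljust(FIELD_SIZE, b" ")

# Terminated: the data is followed by a sentinel byte (NUL here, as in C).
terminated = data + b"\x00"

# Counted: the data is preceded by its length (a single byte here).
counted = bytes([len(data)]) + data

print(fixed)       # b'cat     '
print(terminated)  # b'cat\x00'
print(counted)     # b'\x03cat'
```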
Length prefix
Pascal strings (also called P-strings) use a 1-byte length prefix. The major disadvantage is that a string cannot be more than 255 characters long. In Microsoft .NET MSIL there is a 2-byte length prefix, which allows a string to be up to 65,535 (2^16 - 1) characters long. It is very rare for a string to exceed 64 K, but it can happen. Another disadvantage is that the prefix always adds 2 bytes of overhead, even if the string contains only a single character. Counted UTF-8 strings address both problems with a variable-length prefix. If the string contains 127 characters or fewer, the length prefix is 1 byte. If the string contains 16,383 characters or fewer, the length prefix is 2 bytes.
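The text does not spell out the byte layout of the variable-length prefix. The thresholds of 127 and 16,383 are consistent with 7 payload bits per prefix byte, so the sketch below assumes a LEB128-style layout in which the high bit of each byte signals that another prefix byte follows. That layout, the function names `encode_length`, `decode_length`, `pack`, and `unpack`, and the choice to count UTF-8 bytes rather than characters are all assumptions made for illustration, not part of any stated specification.

```python
from typing import Tuple


def encode_length(n: int) -> bytes:
    """Encode a non-negative length using 7 payload bits per byte (assumed layout)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more prefix bytes follow
        else:
            out.append(low7)         # high bit clear: last prefix byte
            return bytes(out)


def decode_length(buf: bytes) -> Tuple[int, int]:
    """Return (length, number of prefix bytes consumed)."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated length prefix")


def pack(text: str) -> bytes:
    """Prefix the UTF-8 bytes of text with their encoded byte count."""
    data = text.encode("utf-8")
    return encode_length(len(data)) + data


def unpack(buf: bytes) -> str:
    """Read one counted string from the start of buf."""
    length, used = decode_length(buf)
    payload = buf[used:used + length]
    if len(payload) != length:
        raise ValueError("buffer shorter than declared length")
    return payload.decode("utf-8")
```

Under these assumptions, `encode_length(127)` takes 1 byte and `encode_length(16383)` takes 2 bytes, matching the thresholds above, while `encode_length(128)` is the first length that needs a second byte.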