The text part of a message can be encoded using one of several text alphabets. The two text coding schemes available in SMS are the GSM 7-bit default alphabet, defined in [3GPP-23.038], and the Universal Character Set (UCS2), defined in [ISO-10646]. Each message segment must fit into 140 octets. Since the two text coding schemes use one septet and two octets, respectively, to encode a character or symbol, the amount of text that fits in a message segment is as shown in the table below:
Coding Scheme | Text length per message segment | TP-DCS Bit 3 | TP-DCS Bit 2 |
GSM alphabet, 7 bits | 160 characters | 0 | 0 |
8 bit data | 140 octets | 0 | 1 |
UCS2, 16 bit | 70 complex characters | 1 | 0 |
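The lengths in the table follow directly from the 140-octet segment size. A minimal sketch of that arithmetic is shown here; the names `SEGMENT_OCTETS`, `BITS_PER_CHAR`, and `max_units_per_segment` are made up for illustration and do not come from any specification or library.

```python
# Payload size of a single SMS segment, as stated in the text above.
SEGMENT_OCTETS = 140

# Bits consumed by one character (or data unit) in each coding scheme.
# The dictionary keys are illustrative labels, not standard identifiers.
BITS_PER_CHAR = {
    "gsm7": 7,   # GSM 7-bit default alphabet: one septet per character
    "8bit": 8,   # 8-bit data: one octet per unit
    "ucs2": 16,  # UCS2: two octets per character
}


def max_units_per_segment(scheme: str) -> int:
    """Return how many characters (or octets) fit in one segment."""
    total_bits = SEGMENT_OCTETS * 8
    return total_bits // BITS_PER_CHAR[scheme]


for scheme in BITS_PER_CHAR:
    print(scheme, max_units_per_segment(scheme))
```

The 160-character figure for the GSM alphabet comes from packing septets back to back: 140 × 8 = 1120 bits, and 1120 / 7 = 160.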
Text Compression: In theory, the text part of a message may be compressed [3GPP-23.042]. However, none of the handsets currently available on the market support text compression.
The interesting point is that, per the GSM specification, TP-DCS is a single octet (8 bits), and only two of those bits (bit 2 and bit 3) are reserved for the coding scheme. That allows at most four coding scheme combinations. In contrast, SMPP supports more than a dozen coding schemes.
How does this work in the A2P messaging ecosystem?
To understand this, let’s first look at the other bits in GSM TP-DCS.
Message Class: In addition to the coding scheme, the TP-DCS octet in the TPDU indicates the class to which the message belongs. Four classes are defined by the combination of bit 1 and bit 0, as shown below (values are given as bit 1, bit 0):
Class 0: Immediate display (Flash message) : 0,0
Class 1: Mobile equipment specific message : 0,1
Class 2: SIM specific message : 1,0
Class 3: Terminal equipment specific message : 1,1
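To make the bit layout concrete, here is a rough sketch of a TP-DCS decoder. It is a simplification under stated assumptions: it reads only bits 0 to 4, ignores bits 5 to 7, and treats the class bits as meaningful only when bit 4 is set. The function and type names are hypothetical, and the `0b11` alphabet value, which the table above does not list, is shown as reserved.

```python
from typing import NamedTuple, Optional

# Bits 3 and 2: coding scheme, as in the table earlier in this article.
ALPHABETS = {
    0b00: "GSM 7-bit default alphabet",
    0b01: "8-bit data",
    0b10: "UCS2 (16-bit)",
    0b11: "reserved",  # not listed in the table above
}

# Bits 1 and 0: message class.
MESSAGE_CLASSES = {
    0b00: "Class 0: immediate display (flash message)",
    0b01: "Class 1: mobile equipment specific",
    0b10: "Class 2: SIM specific",
    0b11: "Class 3: terminal equipment specific",
}


class DecodedDcs(NamedTuple):
    alphabet: str
    message_class: Optional[str]  # None when bit 4 is 0 (no class)


def decode_tp_dcs(dcs: int) -> DecodedDcs:
    """Decode bits 0..4 of a TP-DCS octet (bits 5..7 are ignored here)."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("TP-DCS is a single octet (0..255)")
    alphabet = ALPHABETS[(dcs >> 2) & 0b11]
    has_class = bool(dcs & 0b0001_0000)  # bit 4
    message_class = MESSAGE_CLASSES[dcs & 0b11] if has_class else None
    return DecodedDcs(alphabet, message_class)


for value in (0x00, 0x08, 0x10, 0x18):
    print(f"0x{value:02X} -> {decode_tp_dcs(value)}")
```

With this layout, `0x10` decodes as a GSM 7-bit flash message and `0x18` as a UCS2 flash message, while `0x00` and `0x08` carry no class.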
Note: If bit 4 of TP-DCS is set to 0, the message has no class. This is where SMPP reuses bit 0 and bit 1, in combination with bit 2 and bit 3, to represent a larger set of coding schemes. There is a trade-off, however: such messages cannot use Class 0 (flash message), which is quite popular in some countries. There is a workaround for this as well: you can use the TLV parameter dest_addr_subunit to indicate the message class. A list of coding schemes used by SMPP is shown below:
Bits 7 6 5 4 3 2 1 0 | Meaning |
0 0 0 0 0 0 0 0 | SMSC Default Alphabet |
0 0 0 0 0 0 0 1 | IA5 (CCITT T.50)/ASCII (ANSI X3.4) |
0 0 0 0 0 0 1 0 | Octet unspecified (8-bit binary) |
0 0 0 0 0 0 1 1 | Latin 1 (ISO-8859-1) |
0 0 0 0 0 1 0 0 | Octet unspecified (8-bit binary) |
0 0 0 0 0 1 0 1 | JIS (X 0208-1990) |
0 0 0 0 0 1 1 0 | Cyrillic (ISO-8859-5) |
0 0 0 0 0 1 1 1 | Latin/Hebrew (ISO-8859-8) |
0 0 0 0 1 0 0 0 | UCS2 (ISO/IEC-10646) |
0 0 0 0 1 0 0 1 | Pictogram Encoding |
0 0 0 0 1 0 1 0 | ISO-2022-JP (Music Codes) |
0 0 0 0 1 0 1 1 | Reserved |
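In application code, the table above usually ends up as a simple lookup keyed by the data_coding value. The sketch below transcribes only the rows listed here; `SMPP_DATA_CODING` and `describe_data_coding` are illustrative names, not part of any SMPP library.

```python
# data_coding value -> meaning, transcribed from the table above.
SMPP_DATA_CODING = {
    0b0000_0000: "SMSC Default Alphabet",
    0b0000_0001: "IA5 (CCITT T.50)/ASCII (ANSI X3.4)",
    0b0000_0010: "Octet unspecified (8-bit binary)",
    0b0000_0011: "Latin 1 (ISO-8859-1)",
    0b0000_0100: "Octet unspecified (8-bit binary)",
    0b0000_0101: "JIS (X 0208-1990)",
    0b0000_0110: "Cyrillic (ISO-8859-5)",
    0b0000_0111: "Latin/Hebrew (ISO-8859-8)",
    0b0000_1000: "UCS2 (ISO/IEC-10646)",
    0b0000_1001: "Pictogram Encoding",
    0b0000_1010: "ISO-2022-JP (Music Codes)",
}


def describe_data_coding(value: int) -> str:
    """Return the table entry for a data_coding value, with its bit pattern."""
    if not 0 <= value <= 0xFF:
        raise ValueError("data_coding is a single octet (0..255)")
    meaning = SMPP_DATA_CODING.get(value, "Reserved or not listed above")
    return f"{value:08b}: {meaning}"


print(describe_data_coding(0x08))
print(describe_data_coding(0x0B))
```

Note how values such as `0x01` or `0x03` occupy bit 0 and bit 1, which is exactly why those bits are no longer available to carry the message class.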
Before we end this article, let’s cover the remaining four bits: bit 4, bit 5, bit 6, and bit 7 are used to indicate coding groups.