Data Coding: SMPP vs. GSM 03.38

The text part of a message can be encoded using one of several text alphabets. The two text coding schemes that can be used in SMS are the GSM 7-bit default alphabet, defined in [3GPP-23.038], and the Universal Character Set (UCS2), defined in [ISO-10646]. Each message segment must fit into 140 octets. Since the two text coding schemes use one septet and two octets per character, respectively, the amount of text that fits in a message segment is as shown in the table below:

 

Coding Scheme           Text length per message segment   TP-DCS Bit 3   TP-DCS Bit 2
GSM alphabet, 7 bits    160 characters                    0              0
8-bit data              140 octets                        0              1
UCS2, 16 bits           70 complex characters             1              0
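
The figures in the table follow directly from the 140-octet limit. As a minimal sketch (the names SEGMENT_OCTETS, BITS_PER_CHAR and chars_per_segment are made up for illustration), the arithmetic can be checked like this:

```python
SEGMENT_OCTETS = 140  # payload size of one message segment, as stated above

# Bits needed per character for each coding scheme in the table.
BITS_PER_CHAR = {
    "gsm7": 7,   # GSM 7-bit default alphabet: one septet per character
    "8bit": 8,   # 8-bit data: one octet per unit
    "ucs2": 16,  # UCS2: two octets per character
}


def chars_per_segment(scheme: str) -> int:
    """Return how many characters of the given scheme fit in one segment."""
    return (SEGMENT_OCTETS * 8) // BITS_PER_CHAR[scheme]


for scheme in BITS_PER_CHAR:
    print(scheme, chars_per_segment(scheme))
```

140 octets are 1120 bits, which gives 160 septets, 140 octets, or 70 two-octet characters.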

 

Text Compression: In theory, the text part of a message may be compressed [3GPP-23.042]. However, none of the handsets currently available on the market support text compression.

 

Now the interesting point: in GSM, TP-DCS is a single octet (8 bits), and only two of those bits (bit 2 and bit 3) are reserved for the coding scheme, so at most four coding scheme combinations are possible. In contrast, SMPP supports more than a dozen coding schemes.

 

How does this work in the A2P messaging ecosystem?

 

To understand this, let’s first look at the other bits in GSM TP-DCS.

Message Class: In addition to the coding scheme, the TPDU indicates the class to which the message belongs. Four classes are defined by the combination of bit 1 and bit 0, as shown below:

 

Class 0: Immediate display (Flash message): bit 1 = 0, bit 0 = 0

Class 1: Mobile equipment specific message: bit 1 = 0, bit 0 = 1

Class 2: SIM specific message: bit 1 = 1, bit 0 = 0

Class 3: Terminal equipment specific message: bit 1 = 1, bit 0 = 1
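
The bit layout described so far can be sketched in a few lines of Python. This is an illustration only: decode_tp_dcs is a hypothetical helper, it assumes bits 7 and 6 are both 0 and ignores bit 5, and it treats the class bits as meaningful only when bit 4 is set. The coding scheme value 1,1, which the table does not list, is labelled "reserved" here as an assumption.

```python
# Alphabet indicated by bits 3 and 2 of TP-DCS (values from the table above).
ALPHABETS = {
    0b00: "GSM alphabet, 7 bits",
    0b01: "8 bit data",
    0b10: "UCS2, 16 bit",
    0b11: "reserved",  # assumption: not listed in the table above
}

# Message class indicated by bits 1 and 0, used only when bit 4 is set.
MESSAGE_CLASSES = {
    0b00: "Class 0: Immediate display (Flash message)",
    0b01: "Class 1: Mobile equipment specific message",
    0b10: "Class 2: SIM specific message",
    0b11: "Class 3: Terminal equipment specific message",
}


def decode_tp_dcs(dcs: int) -> dict:
    """Split a TP-DCS octet into alphabet and message class (hypothetical helper)."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("TP-DCS is a single octet")
    if dcs & 0b1100_0000:
        # Assumption: only the layout with bits 7 and 6 equal to 00 is handled.
        raise ValueError("unsupported coding group")
    has_class = bool(dcs & 0b0001_0000)      # bit 4: class bits are meaningful
    alphabet = ALPHABETS[(dcs >> 2) & 0b11]  # bits 3 and 2
    message_class = MESSAGE_CLASSES[dcs & 0b11] if has_class else None
    return {"alphabet": alphabet, "message_class": message_class}


print(decode_tp_dcs(0x10))  # GSM 7-bit alphabet, Class 0
print(decode_tp_dcs(0x08))  # UCS2, no message class
```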

 

Note: If bit 4 of TP-DCS is set to 0, the message has no class, so bits 0 and 1 carry no class information. This is where SMPP reuses bits 0 and 1, in combination with bits 2 and 3, to represent its dozen or so coding schemes. There is a trade-off: such messages cannot be marked as Class 0 (Flash message), which is quite popular in some countries. There is a workaround, though: you can use the TLV parameter dest_addr_subunit to indicate the message class. The coding schemes used by SMPP are listed below:

 

Bits 7 6 5 4 3 2 1 0    Meaning
0 0 0 0 0 0 0 0         SMSC Default Alphabet
0 0 0 0 0 0 0 1         IA5 (CCITT T.50)/ASCII (ANSI X3.4)
0 0 0 0 0 0 1 0         Octet unspecified (8-bit binary)
0 0 0 0 0 0 1 1         Latin 1 (ISO-8859-1)
0 0 0 0 0 1 0 0         Octet unspecified (8-bit binary)
0 0 0 0 0 1 0 1         JIS (X 0208-1990)
0 0 0 0 0 1 1 0         Cyrillic (ISO-8859-5)
0 0 0 0 0 1 1 1         Latin/Hebrew (ISO-8859-8)
0 0 0 0 1 0 0 0         UCS2 (ISO/IEC-10646)
0 0 0 0 1 0 0 1         Pictogram Encoding
0 0 0 0 1 0 1 0         ISO-2022-JP (Music Codes)
0 0 0 0 1 0 1 1         Reserved
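
To make the list and the dest_addr_subunit workaround more concrete, here is a minimal sketch. The lookup table simply mirrors the list above. The TLV tag 0x0005 and the value 0x01 (MS display, commonly used for Flash messages) are taken from the SMPP v3.4 specification as I understand it, so treat them as assumptions and verify them against the spec and your SMSC before relying on them. The function names are hypothetical.

```python
import struct

# SMPP data_coding values, mirroring the list above.
SMPP_DATA_CODING = {
    0x00: "SMSC Default Alphabet",
    0x01: "IA5 (CCITT T.50)/ASCII (ANSI X3.4)",
    0x02: "Octet unspecified (8-bit binary)",
    0x03: "Latin 1 (ISO-8859-1)",
    0x04: "Octet unspecified (8-bit binary)",
    0x05: "JIS (X 0208-1990)",
    0x06: "Cyrillic (ISO-8859-5)",
    0x07: "Latin/Hebrew (ISO-8859-8)",
    0x08: "UCS2 (ISO/IEC-10646)",
    0x09: "Pictogram Encoding",
    0x0A: "ISO-2022-JP (Music Codes)",
    0x0B: "Reserved",
}

DEST_ADDR_SUBUNIT_TAG = 0x0005  # assumption: tag value per SMPP v3.4
MS_DISPLAY = 0x01               # assumption: subunit value for immediate display


def describe_data_coding(value: int) -> str:
    """Return the meaning of a data_coding value, if it is in the list above."""
    return SMPP_DATA_CODING.get(value, "not in the list above")


def dest_addr_subunit_tlv(subunit: int = MS_DISPLAY) -> bytes:
    """Pack a TLV as 2-byte tag, 2-byte length, 1-byte value (big-endian)."""
    return struct.pack("!HHB", DEST_ADDR_SUBUNIT_TAG, 1, subunit)


print(describe_data_coding(0x08))
print(dest_addr_subunit_tlv().hex())
```

The packed TLV would be appended to the optional-parameter section of a submit_sm PDU. Whether the SMSC actually maps it to a Class 0 message depends on the operator's implementation.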

 

Before we end this article, let’s explain the remaining four bits: bits 4, 5, 6 and 7 are used to indicate the coding group.