Adrian Georgescu, 05/31/2012 04:21 pm


SIP-XMPP Instant Messaging (IM)

In XMPP there are several types of messages which lead to different semantics when exchanging XMPP message stanzas between 2 endpoints. This section focuses only on message types that allow 2 endpoints to send instant messages to each other.

XMPP IM types

In XMPP there are three ways of doing Instant Messaging:

  • Normal: the default message type. A reply is not expected from the recipient. This is referred to below as 'Single Message' mode.
  • Chat: this message type implies that both parties have engaged in a conversation. This is referred to below as 'Chat Session' mode.
  • Headline: an endpoint receiving this type of message should never reply, since it is meant to be used by servers or other entities to deliver announcements.

Normal messages are session-less and chat messages are session based.

SIP IM types

In SIP there are two ways of doing Instant Messaging:

  • SIP MESSAGE (RFC 3428). This is referred to below as 'Single Message' mode.
  • MSRP sessions (RFC 4975). This is referred to below as 'Chat Session' mode.

The first one is session-less and the latter is session based.

Single Message Translation

The mechanism described here follows the currently available specifications for SIP-XMPP interoperability:

XMPP single messages are mapped directly to SIP MESSAGE requests and vice versa.

Overview

The mechanism for translating between XMPP normal message stanzas and SIP MESSAGE requests is straightforward: they map one to one, as stated in http://xmpp.org/internet-drafts/draft-saintandre-sip-xmpp-im-01.html. However, since SIP is mainly used with UDP as a transport, an XMPP stanza bigger than 1500 bytes will be chunked into smaller pieces to avoid fragmentation-related issues on Ethernet.
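As a rough illustration of the chunking step, the sketch below splits a message body into pieces whose UTF-8 encoding stays within a byte limit, without cutting a multi-byte character in half. The function name chunk_body and the choice to measure only the body (rather than the whole stanza or SIP request) are assumptions made for the example, not the gateway's actual code.

```python
def chunk_body(body: str, max_bytes: int = 1500) -> list[str]:
    """Split body into pieces whose UTF-8 encoding is at most max_bytes.

    Hypothetical helper: the 1500 byte default mirrors the limit quoted
    in the text, applied here to the message body only.
    """
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    chunks = []
    current = []
    size = 0
    for ch in body:
        ch_len = len(ch.encode("utf-8"))
        if size + ch_len > max_bytes:
            # Adding this character would exceed the limit: close the chunk.
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += ch_len
    if current:
        chunks.append("".join(current))
    return chunks
```

Each returned piece would then be sent as its own SIP MESSAGE request; joining the pieces in order gives back the original body.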

Since SIP MESSAGE is a non-INVITE transaction, it must be answered immediately, because there is no other way to avoid retransmissions. This means that the SIP-XMPP gateway replies on the SIP side before knowing whether the message was actually delivered on the XMPP side. To express this, a "202 Accepted" response is sent to the SIP request instead of a "200 OK".

On the other hand, when an XMPP stanza is translated into a SIP MESSAGE request the SIP-XMPP gateway is able to report back the result (in case of error) by using a message stanza of type error. This is possible because of the asynchronous nature of stanza processing in the XMPP protocol.
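To make the XMPP-side error report concrete, here is a minimal sketch that builds a message stanza of type error with Python's standard xml.etree.ElementTree module. The function name build_error_stanza and its parameters are hypothetical, the error condition is supplied by the caller (this page does not define how SIP failures map to XMPP conditions), and the jabber:client namespace on the message element is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace of the stanza error conditions defined by the XMPP core spec.
STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


def build_error_stanza(sender, recipient, stanza_id, condition,
                       error_type="cancel"):
    """Return a serialized <message type="error"/> stanza.

    sender/recipient are the addresses of the error stanza itself, so
    they are swapped relative to the message that could not be delivered.
    condition is chosen by the caller (assumption: no mapping is implied).
    """
    message = ET.Element("message", {
        "from": sender,
        "to": recipient,
        "id": stanza_id,
        "type": "error",
    })
    error = ET.SubElement(message, "error", {"type": error_type})
    # The condition is an empty element in the stanza error namespace.
    ET.SubElement(error, "{%s}%s" % (STANZAS_NS, condition))
    return ET.tostring(message, encoding="unicode")
```

Because stanza processing is asynchronous, such a stanza can be sent whenever the final SIP response arrives, reusing the id of the original message so that the XMPP client can match the two.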

Error reporting

No error reporting mechanism can be used at the SIP level to signal SIP MESSAGE delivery success or failure, since the request must be answered immediately (it is a non-INVITE transaction).

Chat Session Translation

The mechanism described here follows the currently available specifications for SIP-XMPP interoperability:

In XMPP there are two different types of chat sessions:

  • Formal sessions: those negotiated with XEP-0155
  • Informal sessions: any exchange of message stanzas of type chat

Formal sessions map directly to SIP sessions, but since support for that XEP doesn't seem to be widely deployed, this mapping will not be implemented.

Informal sessions can be mapped to SIP sessions with MSRP media or to SIP MESSAGE requests. Both mechanisms will be implemented, and a configuration option will select which one is used.

The use of SIP MESSAGE is highly discouraged for the following reasons:

  • There is no unique message identification mechanism
  • The most used transport in SIP is UDP, which is unreliable, thus making delivery of SIP MESSAGE requests unreliable
  • Lack of an end to end delivery confirmation mechanism
  • Message order is not guaranteed if an unreliable transport like UDP is used
  • Messages could get duplicated due to retransmissions if an unreliable transport is used
  • The majority of deployed endpoints lack support for CPIM, which is required for conferencing scenarios

Defining a common chat session model

In SIP a session is started by creating a dialog with the INVITE method and it's ended by terminating the dialog with a BYE request. In XMPP there is no universal mechanism to indicate that a chat session has started or ended. Because of this, the SIP-XMPP gateway will try its best to correlate the state on the SIP side with the one on the XMPP side.

There are different mechanisms by which the start and end of an XMPP chat session can be signaled, but unfortunately none of them seems to be implemented in the most widely used XMPP clients, so relying on them would lead to trouble.

  • XEP-0155: Stanza Session Negotiation. This XEP has been in draft form since 2008 and, even though implementation is encouraged, none of the widely used XMPP clients implements it.
  • XEP-0201 (http://xmpp.org/extensions/xep-0201.html): Best Practices for Message Threads. This XEP is more recent and some clients implement it. Unfortunately, the concept of a "chat session" according to this XEP doesn't match the one in SIP, because message threads last far longer: they can be resumed even after being offline for a while.
  • XEP-0085 (http://xmpp.org/extensions/xep-0085.html): Chat State Notifications. This XEP defines a set of states in which a user can be during a chat session. Many clients implement it, and it can be used to signal the composing indication on the SIP side and also to decide when a session should be ended on the SIP side (the gone state).

Since no reliable way has been found to map SIP sessions to XMPP chat sessions and vice versa, the SIP-XMPP gateway will try to use all the available information to act as accurately as possible.

Addressing

The first thing that needs to be solved is addressing: XMPP JIDs have a resource, which uniquely identifies a given XMPP client instance, for example saul@ag-projects.com/foobar. A similar mechanism needs to be implemented on the SIP side so that individual devices and thus session endpoints are properly matched. This is solved by using GRUU (RFC 5627). With GRUU each device will have a unique identifier, like the XMPP JID resource. For example, these could be the 2 endpoints of a given session: user1 sip:saul@ag-projects.com;gr=89y89y4hr489j98jf4 <--> user2 ag@ag-projects.com/foobar.

If a SIP endpoint doesn't support GRUU, a single fixed identifier will be assigned. This fixed value MUST NOT change while the application is running. The lack of GRUU support imposes a limitation, though: only a single concurrent session can be carried out with the same destination XMPP JID, because otherwise it would be impossible to match the destination of the incoming XMPP stanzas (the recipient would always be the same).
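The address mapping described in this section can be sketched as a pair of small functions. The names sip_uri_to_jid and jid_to_sip_uri are hypothetical, and the sketch is deliberately simplified: it assumes a plain sip: URI, ignores URI headers and character escaping, and treats the gr parameter value directly as the JID resource.

```python
def sip_uri_to_jid(uri: str) -> str:
    """Translate a SIP URI into a JID, using the gr parameter as resource."""
    if not uri.startswith("sip:"):
        raise ValueError("not a sip: URI: %r" % uri)
    address, *params = uri[len("sip:"):].split(";")
    resource = None
    for param in params:
        name, _, value = param.partition("=")
        if name == "gr" and value:
            resource = value
    # Without a gr parameter the result is a bare JID.
    return "%s/%s" % (address, resource) if resource else address


def jid_to_sip_uri(jid: str) -> str:
    """Translate a JID into a SIP URI, mapping the resource to gr."""
    bare, _, resource = jid.partition("/")
    if resource:
        return "sip:%s;gr=%s" % (bare, resource)
    return "sip:%s" % bare
```

For instance, jid_to_sip_uri("saul@ag-projects.com/foobar") yields sip:saul@ag-projects.com;gr=foobar, and feeding that URI to sip_uri_to_jid gives back the original full JID.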

Starting a session (SIP)

In order to start a session from the SIP side, an INVITE will be used, as usual. When building the request URI, the caller may specify the callee instance to talk to by using the GRUU semantics, that is: sip:user@gmail.com;gr=foobar would be translated to user@gmail.com/foobar.

If there is no session established between the caller and the callee, the SIP-XMPP gateway will accept the session and start translating SIP chat messages to XMPP chat message stanzas. If there is already an ongoing session between the two given endpoints, the SIP-XMPP gateway will reject the session with a 488 (Not Acceptable Here) response.
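The accept-or-reject rule above amounts to a lookup keyed on the pair of endpoints. The following is a minimal sketch under that assumption; the SessionRegistry class and its method names are invented for illustration, and answering an accepted INVITE with 200 is the usual SIP behaviour rather than something this page specifies.

```python
class SessionRegistry:
    """Tracks ongoing chat sessions by (SIP endpoint, XMPP endpoint) pair."""

    def __init__(self):
        self._sessions = set()

    def handle_invite(self, sip_endpoint: str, xmpp_endpoint: str) -> int:
        """Return the SIP response code for an incoming INVITE."""
        key = (sip_endpoint, xmpp_endpoint)
        if key in self._sessions:
            # A session between these two endpoints already exists.
            return 488
        self._sessions.add(key)
        return 200

    def end_session(self, sip_endpoint: str, xmpp_endpoint: str) -> None:
        """Forget a session, for example after a BYE request."""
        self._sessions.discard((sip_endpoint, xmpp_endpoint))
```

Keying on the full endpoint pair is what makes the GRUU (or the fixed fallback identifier) matter: two devices of the same SIP user produce different keys and can therefore hold separate sessions with the same JID.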

Note that if the SIP request URI doesn't contain the resource identifier (gr parameter) the translated JID is a bare JID (a JID with no resource specified) so the real recipient is unknown until a response is received from any XMPP client with that JID.

Starting a session (XMPP)

As mentioned above, XMPP doesn't have a mechanism to indicate the start of a chat session, so the XMPP client will just send a message stanza. If there is no session whose endpoints match those specified in the stanza, a new outbound SIP session will be created.

The outbound SIP request will always have a GRUU in the From header, as a result of the translation from a full JID.

Note that if the recipient JID is a bare JID the real recipient is unknown until a reply is received on the SIP side (the request may fork and the session will be bound to the endpoint that answers).

Ending a session (SIP)

If a SIP endpoint sends a BYE request to the SIP-XMPP gateway, the SIP session will be terminated and a body-less chat message stanza will be sent to the XMPP endpoint with the gone chat state (XEP-0085).

Ending a session (XMPP)

If an XMPP endpoint sends a chat message stanza with the gone chat state, the SIP-XMPP gateway will terminate the session on the SIP side by sending a BYE request. Since not all XMPP clients send the gone chat state, the SIP-XMPP gateway will keep a timer that terminates the session on the SIP side if no chat messages have been exchanged within that interval. The default value (it is configurable) is 10 minutes, as recommended by XEP-0085.
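The two termination triggers (an explicit gone chat state, or the inactivity timer) could be combined roughly as follows. The ChatSessionTimer class, its method names and the injectable clock are assumptions made for this sketch; only the 10 minute default comes from the text above.

```python
import time


class ChatSessionTimer:
    """Decides when the SIP side of a chat session should be ended."""

    def __init__(self, timeout: float = 600.0, clock=time.monotonic):
        self._timeout = timeout      # seconds; 600 s = 10 minutes
        self._clock = clock          # injectable so it can be tested
        self._last_activity = clock()

    def touch(self) -> None:
        """Record that a chat message was exchanged in either direction."""
        self._last_activity = self._clock()

    def should_terminate(self, chat_state=None) -> bool:
        """True if a BYE should be sent to the SIP endpoint."""
        if chat_state == "gone":
            return True
        return self._clock() - self._last_activity >= self._timeout
```

A gateway would call touch() for every translated chat message and check should_terminate() both when a chat state notification arrives and periodically, sending the BYE request when it returns True.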

XMPP chat session <-> SIP MESSAGE

Error reporting

No error reporting mechanism can be used at the SIP level to signal SIP MESSAGE delivery success or failure, since the request must be answered immediately (it is a non-INVITE transaction).

XMPP chat session <-> MSRP

Error reporting

None of the SIP-XMPP interoperability specs mention how error reporting should be done for chat messages. Since XMPP supports delivery receipts (XEP-0184), the SIP-XMPP gateway correlates them with MSRP REPORT requests in order to provide message delivery assurance on both the SIP and XMPP sides.
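One way to picture the correlation is a small table that remembers which MSRP message each outgoing XMPP stanza came from, so that an incoming XEP-0184 receipt can be turned into an MSRP REPORT for the right message. This is only a sketch of the idea for the SIP to XMPP direction: the ReceiptCorrelator class and its methods are hypothetical, and actually building and sending the REPORT request is left out.

```python
class ReceiptCorrelator:
    """Maps XMPP stanza ids to the MSRP Message-ID they were translated from."""

    def __init__(self):
        self._pending = {}

    def track(self, stanza_id: str, msrp_message_id: str) -> None:
        """Remember a translated message that is awaiting a receipt."""
        self._pending[stanza_id] = msrp_message_id

    def on_receipt(self, stanza_id: str):
        """Handle an XMPP receipt for stanza_id.

        Returns the MSRP Message-ID to report as delivered, or None if
        the receipt does not match any pending message.
        """
        return self._pending.pop(stanza_id, None)
```

Removing the entry once the receipt arrives means that a duplicate receipt is ignored instead of producing a second REPORT.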

xmppgw_im_chat_msrp.png (92.7 kB) Tijmen de Mes, 05/07/2012 11:06 am

xmppgw_im_chat_msrp2.png (93.1 kB) Tijmen de Mes, 05/07/2012 11:06 am

xmppgw_im_chat_sipmessage.png (92.6 kB) Tijmen de Mes, 05/07/2012 11:06 am

xmppgw_im_normal.png (85.9 kB) Tijmen de Mes, 05/07/2012 11:06 am