SBE

What tool should be used to generate stubs from schema?

MarketFactory recommends using the Real Logic SBE Tool. Note that MarketFactory itself uses v1.25.1, to avoid a bug in the CME iLink 3 schema that later versions of the tool expose; clients are nevertheless free to use later versions.

The breaking change was introduced in SBE Tool v1.25.2, as described in Real Logic issues #889 and #917. Because a character value of 0x00 cannot be specified in a valid XML document, the proposed solution, such as it is, is to convince CME to remove the nullValue="0" attribute from the definition of charNULL in their schema. This may take a while.

Why does the wire representation of a message appear different to that defined in the schema?

When directly inspecting "on-the-wire" representations of char datatypes, including enumerations whose valid values look numeric, be aware that each value is encoded as a character code (for these characters UTF-8 and ASCII coincide). A hex dump or debugger may therefore show you the decimal code of the character rather than the character itself.

Refer to https://en.wikipedia.org/wiki/UTF-8#Codepage_layout for details.

<enum name="Side" description="Side" encodingType="char">
    <validValue name="Buy">1</validValue>
    <validValue name="Sell">2</validValue>
    <validValue name="TwoWay">7</validValue>
</enum>

You may observe the following values:

49 - '1' - Buy
50 - '2' - Sell
55 - '7' - TwoWay
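To check a capture by hand, you can map each raw byte back to its character and then to the enum name. A minimal Python sketch, assuming a single-byte char field carrying the Side enum above; SIDE_BY_CHAR and decode_side are illustrative names, not part of any generated stub:

```python
# Illustrative mapping of the Side enum from the schema above.
# SIDE_BY_CHAR and decode_side are hypothetical names, not SBE stub APIs.
SIDE_BY_CHAR = {"1": "Buy", "2": "Sell", "7": "TwoWay"}


def decode_side(raw: int) -> str:
    """Decode one on-the-wire byte of a char-encoded Side enum."""
    char = chr(raw)  # e.g. 49 -> '1'
    try:
        return SIDE_BY_CHAR[char]
    except KeyError:
        # An unsupported value is a schema compatibility problem.
        raise ValueError(f"unknown Side value {raw} ({char!r})") from None


if __name__ == "__main__":
    wire = bytes([49, 50, 55])  # a hex dump would show 0x31 0x32 0x37
    for raw in wire:
        print(raw, repr(chr(raw)), decode_side(raw))
```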

When I parse/write a string from/to a fixed length SBE field, how do I determine/signal where to truncate?

The string will either be null-terminated or will occupy the full length of the fixed-length field. If the string is shorter than the field, it ends with a \0; if it is exactly the field length, there is no terminator.
When writing, you do not have to fill the remainder of the array with \0 values, although doing so works. A shorter string only needs a single terminating \0; a string exactly as long as the field needs nothing extra.

The server expects any string shorter than its field to be terminated with a \0.
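The rules above can be sketched in Python as a pair of helpers. This assumes ASCII string content; encode_fixed and decode_fixed are hypothetical names, not SBE stub methods. The decoder stops at the first \0, so anything after the terminator is ignored, which is why zero-filling the rest of the field is optional.

```python
def encode_fixed(text: str, length: int) -> bytes:
    """Encode text into a fixed-length char field (assumes ASCII content).

    A shorter string is followed by \\0; here the whole remainder happens to
    be zeroed because bytearray(length) starts zeroed, which is allowed but
    not required. A string exactly `length` bytes long gets no terminator.
    """
    data = text.encode("ascii")
    if len(data) > length:
        raise ValueError(f"{text!r} does not fit in {length} bytes")
    field = bytearray(length)  # zero-initialised buffer
    field[: len(data)] = data
    return bytes(field)


def decode_fixed(field: bytes) -> str:
    """Read up to the first \\0, or the whole field if there is none."""
    end = field.find(b"\x00")
    if end == -1:
        end = len(field)  # full-length string, no terminator
    return field[:end].decode("ascii")
```

For example, encode_fixed("ABC", 5) yields b"ABC\x00\x00", while decode_fixed(b"AB\x00XY") still reads "AB".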

Why is there no checksum in the SBE messages?

It's worth remembering that the FIX protocol checksum operates at the application level. The underlying transport has its own integrity checks as well: the TCP checksum, plus CRCs at the link layer (e.g. the Ethernet frame check sequence).

The main reason this application checksum exists (in my opinion at least) is that the FIX protocol is a simple structured string and there is a real chance of it being mangled at the application level (not all FIX 'engines' are equal), which those transport-level checks would not protect against.

For SBE, code stubs are generated from the schema, and using them to read or write a given message or field ensures it is well-formed. There are still ways to break SBE messaging, e.g. receiving an enum value that your stubs do not support (a schema compatibility issue). Any such error when reading or writing is a disconnection event.

In addition, SBE is a performance-oriented wire protocol. If checksums were included as in the FIX protocol, computing them at the application level would become a significant part of the overall processing cost.
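For comparison, the FIX CheckSum(10) field is the sum of every byte preceding the checksum field, modulo 256, written as three digits, so every byte of every message has to be visited once more after encoding. A minimal Python sketch; fix_checksum is an illustrative name:

```python
SOH = b"\x01"  # FIX field delimiter


def fix_checksum(message_without_trailer: bytes) -> str:
    """Compute FIX CheckSum(10) over everything before the '10=' field.

    Each byte is read once, so the cost grows linearly with message size.
    """
    return f"{sum(message_without_trailer) % 256:03d}"


if __name__ == "__main__":
    # Heartbeat: BodyLength(9)=5 covers the five bytes "35=0<SOH>".
    body = b"8=FIX.4.4" + SOH + b"9=5" + SOH + b"35=0" + SOH
    message = body + b"10=" + fix_checksum(body).encode("ascii") + SOH
    print(message)
```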

Is there no way to define groups once rather than have them repeatedly defined for each message type that uses them in the SBE schema?

Unlike FIXML, SBE has no equivalent of an attributeGroup component that is defined once and referenced multiple times (e.g. in a complexType). In SBE each group must be an explicit component of its parent group or message. To minimise the amount of 'duplicate' code, there are two broad avenues to consider:

Schema change

There is a way to aggregate multiple fields together as a composite datatype (and a composite may itself reference other composite types), so using our NoHops group as an example, in theory we could move to something like:

<type name="UTCTimestamp" presence="optional" nullValue="0" primitiveType="uint64" semanticType="UTCTimestamp"/>

<type name="SeqNum" presence="optional" nullValue="0" primitiveType="uint32" semanticType="SeqNum"/>

<composite name="NoHopsFieldsType">
    <ref name="HopRefID" type="SeqNum"/>
    <ref name="HopSendingTime" type="UTCTimestamp"/>
    <ref name="HopNetworkTime" type="UTCTimestamp"/>
    <ref name="HopArrivalTime" type="UTCTimestamp"/>
</composite>

<sbe:message name="MyMessage" id="1">
    ...
    <group name="NoHops" id="627" dimensionType="groupSizeEncoding">
        <field name="NoHopsFields" id="20000" type="NoHopsFieldsType"/>
    </group>
</sbe:message>
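Since a composite is a fixed-size block, each NoHops entry in this scheme would encode as one contiguous record. The Python sketch below assumes the SBE default little-endian byte order and no padding between members, giving 4 + 3 * 8 = 28 bytes per entry; the helper names are hypothetical and this is not generated stub code.

```python
import struct

# Hypothetical layout of NoHopsFieldsType: HopRefID (uint32) followed by
# HopSendingTime, HopNetworkTime, HopArrivalTime (uint64 each).
# "<" = little-endian with no alignment padding.
NO_HOPS_FIELDS = struct.Struct("<IQQQ")
NULL_VALUE = 0  # nullValue="0" for both SeqNum and UTCTimestamp above


def pack_hop(hop_ref_id, sending, network, arrival) -> bytes:
    """Encode one NoHops entry; None is written as the schema's null value."""
    values = (hop_ref_id, sending, network, arrival)
    return NO_HOPS_FIELDS.pack(*(NULL_VALUE if v is None else v for v in values))


def unpack_hop(buf: bytes, offset: int = 0) -> tuple:
    """Decode one entry; members equal to the null value come back as None."""
    values = NO_HOPS_FIELDS.unpack_from(buf, offset)
    return tuple(None if v == NULL_VALUE else v for v in values)
```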

Extending this idea, we could specify every single field/type as a composite containing a single type definition, allowing us to create composites (with refs to the individual field type definitions, as above) for each unique body-level or group-level block. But moving from multiple fields to composite types is not free, and things become very awkward in other ways:

A composite cannot contain 'data' (variable-length) elements, e.g. VarText.
Future changes (e.g. additional fields, enum values etc.) are far more likely to be breaking changes.
We wind up with messages whose fields are all obfuscated, while groups and data definitions still remain in situ, i.e. a half-solution, really.
We may well wind up with multiple definitions for the same basic field/type, which must all be kept in sync...
SBE is a FIX standard, yet the FIX IDs and descriptions of the individual fields are lost: all the 'fields' in the schema become embedded components of custom types.

In short, this is not a fruitful path. The "SBE way" is for each message to be self-contained. If you don't want to rely on the generated stubs (with their duplicated types etc.), the typical alternative, at a slight performance cost, is an on-the-fly decoder that uses the SBE intermediate representation (IR) to process each message in a more generic manner.
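The idea behind on-the-fly decoding is that one generic routine walks a description of the message instead of calling compiled accessors. The sketch below uses a deliberately simplified, hypothetical token list in place of the real SBE IR (whose actual API is not shown here), with offsets matching the NoHopsFieldsType composite above under little-endian byte order:

```python
import struct
from typing import Dict, List, NamedTuple


class FieldToken(NamedTuple):
    """Hypothetical, simplified stand-in for an IR field token."""
    name: str
    offset: int  # byte offset within the block
    fmt: str     # struct format of the primitive, e.g. "I" for uint32


def decode_block(tokens: List[FieldToken], buf: bytes, base: int = 0) -> Dict[str, int]:
    """Decode a fixed-length block generically by walking its field tokens."""
    result = {}
    for token in tokens:
        (value,) = struct.unpack_from("<" + token.fmt, buf, base + token.offset)
        result[token.name] = value
    return result


# The same decode_block works for any block once its tokens are known.
HOP_TOKENS = [
    FieldToken("HopRefID", 0, "I"),
    FieldToken("HopSendingTime", 4, "Q"),
    FieldToken("HopNetworkTime", 12, "Q"),
    FieldToken("HopArrivalTime", 20, "Q"),
]
```

The per-field lookup and format dispatch is where the slight performance cost comes from, compared with generated stubs that have offsets and types baked in.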

Smarter stub generation

MarketFactory has considered creating a 'smarter' stub generator that would identify common groups, types etc. and generate a more optimised set of getter/setter primitives. This is conceptually doable and probably much more in line with what you're looking for, although this work is not currently planned.

Session Handling

If I wanted to keep a message store so I could resend messages, what technique would MF recommend?

A typical technique is to write every outgoing message to a memory-mapped file at the same time as sending it down the TCP socket, so that it can be retrieved later. The file can be flushed to disk periodically if the messages need to be kept longer term.
If you don't need to replay messages to the venue, then just storing the sequence number is good enough.
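A minimal Python sketch of such a store, assuming a fixed-capacity file and a simple record layout of [seq: uint32][length: uint32][payload]. MmapMessageStore and its methods are hypothetical names; a production store would also need file rolling and rebuilding the index on restart.

```python
import mmap
import struct


class MmapMessageStore:
    """Append-only store of outgoing messages in a memory-mapped file.

    Hypothetical layout: each record is [seq: uint32][length: uint32][payload],
    little-endian; an in-memory dict maps sequence numbers to file offsets.
    """

    HEADER = struct.Struct("<II")

    def __init__(self, path: str, capacity: int = 1 << 20):
        self._file = open(path, "w+b")
        self._file.truncate(capacity)  # mmap needs a file of the mapped size
        self._mm = mmap.mmap(self._file.fileno(), capacity)
        self._pos = 0
        self._index = {}

    def append(self, seq: int, payload: bytes) -> None:
        """Record the bytes that are being sent for this sequence number."""
        end = self._pos + self.HEADER.size + len(payload)
        if end > len(self._mm):
            raise ValueError("message store is full")  # real store: roll file
        self.HEADER.pack_into(self._mm, self._pos, seq, len(payload))
        self._mm[self._pos + self.HEADER.size:end] = payload
        self._index[seq] = self._pos
        self._pos = end

    def get(self, seq: int) -> bytes:
        """Fetch a previously sent message, e.g. to satisfy a resend request."""
        offset = self._index[seq]
        _, length = self.HEADER.unpack_from(self._mm, offset)
        start = offset + self.HEADER.size
        return bytes(self._mm[start:start + length])

    def flush(self) -> None:
        self._mm.flush()  # ask the OS to write dirty pages back to disk

    def close(self) -> None:
        self._mm.close()
        self._file.close()
```

In use, call append(seq, encoded_bytes) with the same bytes you write to the socket, get(seq) when a resend is needed, and flush() periodically if the file must be kept longer term.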