Faster FAST Decoder – Updated

After I posted the blog about the "faster FAST decoder", I made some further improvements. Here is the latest result, using the same CME data file.

Total time = 11.797371 seconds
Packets / entries / max entries per packet = 34,217,764 / 165,204,011 / 116
0.344773 microseconds per packet; 0.071411 microseconds per entry

Tested on the same 3.3 GHz Xeon.

Jeff made some improvements and can now process the same file within 9.5 seconds on a 3.3 GHz Core i7, so his result is still about 20% better.

After talking with him I realized that our decoders target different user requirements. I tried to make my decoder more general-purpose, at the cost of a little performance.

All I can say is that, given the same user requirements, I could make an implementation as fast as his. But I would rather stay with my current normalized classes.

My implementation fills the decoded fields into a predefined class, without making any assumptions about the message template. Every field has a bool presence flag that occupies 8 bytes in the structure. Messages are managed efficiently within an object pool, and every time a message is returned to the pool it has to be "reset", i.e. its presence flags have to be memset to false.
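The pool-and-reset scheme might be sketched roughly as follows; the class names and field layout here are hypothetical, since the post does not show the actual code:

```cpp
#include <cstring>
#include <vector>

// Hypothetical sketch of the pool/reset scheme described above.
struct Message {
    bool present[4];   // one presence flag per field, stored contiguously
    long fields[4];    // decoded field values, guarded by the flags

    // "Resetting" clears only the presence flags; stale field values are
    // harmless because readers must check the corresponding flag first.
    void reset() { std::memset(present, 0, sizeof(present)); }
};

class MessagePool {
public:
    Message* acquire() {
        if (free_.empty()) return new Message();  // value-init: flags start false
        Message* m = free_.back();
        free_.pop_back();
        return m;
    }
    void release(Message* m) {
        m->reset();             // memset the flags before the message is reused
        free_.push_back(m);
    }
private:
    std::vector<Message*> free_;
};
```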

Below is the definition of the two most used messages in CME market data, MDIncRefresh and MDEntries. Note that it was automatically generated by my system; in fact my system can generate clean code like this from any given FAST template.
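As an illustration of what such generated definitions could look like: the field names below follow common CME FAST template conventions (MDEntryPx, RptSeq, etc.) and are assumptions for the sketch, not the author's actual generated output:

```cpp
#include <cstdint>

// Illustrative reconstruction of generator output: each field is paired
// with a bool presence flag, and the flags are grouped contiguously so a
// single memset can clear them all on reset.
struct MDEntry {
    bool hasMDUpdateAction;
    bool hasMDEntryType;
    bool hasSecurityID;
    bool hasRptSeq;
    bool hasMDEntryPx;
    bool hasMDEntrySize;
    uint32_t MDUpdateAction;
    char     MDEntryType;
    uint32_t SecurityID;
    uint32_t RptSeq;
    int64_t  MDEntryPx;     // scaled decimal mantissa
    int32_t  MDEntryPxExp;  // decimal exponent
    int64_t  MDEntrySize;
};

struct MDIncRefresh {
    bool hasMsgSeqNum;
    bool hasSendingTime;
    uint32_t MsgSeqNum;
    uint64_t SendingTime;
    uint32_t noMDEntries;   // repeating-group count
    MDEntry  entries[256];  // fixed upper bound keeps the object pool-friendly
};
```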

The decoder knows how to efficiently fill in the structure. It can take many statically generated template handlers as well as handlers generated dynamically at runtime. Of course statically generated handlers perform better, but dynamically generated handlers guarantee flexibility and robustness. A unique "signature" is generated along with each different version of the FAST template, so that the system can detect a template version change at startup.
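One way the "signature" idea could work is to hash the template definition at startup and compare it against a value baked in alongside the statically generated handler. The hash choice (FNV-1a here), names, and the placeholder constant are all assumptions for illustration; the post does not describe the actual mechanism:

```cpp
#include <cstdint>
#include <string>

// FNV-1a over the template definition text, purely for illustration.
uint64_t templateSignature(const std::string& templateXml) {
    uint64_t h = 0xcbf29ce484222325ULL;        // FNV-1a 64-bit offset basis
    for (unsigned char c : templateXml) {
        h ^= c;
        h *= 0x100000001b3ULL;                 // FNV-1a 64-bit prime
    }
    return h;
}

// Emitted at code-generation time next to the static handler (placeholder value).
constexpr uint64_t kCompiledSignature = 0x9e3779b97f4a7c15ULL;

// On a mismatch the decoder would fall back to a dynamically generated
// handler instead of trusting the stale static one.
bool templateMatchesStaticHandler(const std::string& templateXml) {
    return templateSignature(templateXml) == kCompiledSignature;
}
```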

Another nice feature of this design is that the decoder can ignore fields one is not interested in, which yields a small performance improvement. For example, I never cared about the TradeDate field, because for the instruments I am interested in it is just the current session date.

Unfortunately this flexibility does come at some cost in terms of performance. Commenting out the memset message reset yields about 1.5 seconds of improvement on the same CME data, not to mention that the cost of setting the presence flags during decoding could be similar or even higher.

Overall I am quite happy with my FAST decoder. It can decode at a sustained throughput of over 200 MB/s.


About weiqj

High Frequency Trader; Hardcore Programmer; CME Individual Member

7 Responses to Faster FAST Decoder – Updated

  1. Mark says:

    This is very interesting stuff. Would it be possible to get a copy of your input dataset? I am trying a few ideas myself, but my input data (captured multicast packets) are obviously different from yours. Do you also strip out the preamble?

  2. Miguel says:

    I would be interested in the dataset you have tested against as well. I would like to test against a modified version of QuickFAST I have…

  3. Gaurav says:

    Hi,

    In the dataset that you are using, does each message come with an independent header?
    Please do share the dataset as well.

    Thanks

    Gaurav

  4. Johan says:

    Hi,
    Does the decoding time include maintaining a view of the order book?

    Thanks,
    Johan
