After I posted the blog about the “faster FAST decoder”, I made some further improvements. Here is the latest result using the same CME data file.
Total time = 11.797371 seconds;
Packets/entry/maxEntries = 34217764/165204011/116;
0.344773 microsecond per packet; 0.071411 microsecond per entry.
Tested on the same 3.3 GHz Xeon.
Jeff made some improvements too, and he can now process the same file in 9.5 seconds on a 3.3 GHz Core i7. So his result is still about 20% better.
After talking with him, I realized our decoders have different user requirements. I tried to make my decoder more general purpose at the cost of a little bit of performance.
All I can say is that, given the same user requirements, I could make an implementation as fast as his. But I would like to stay with my current normalized classes.
My implementation fills the decoded fields into a predefined class, without any assumption about the message template. Every field has a bool type presence flag that occupies 8 bytes in the structure. Messages are managed efficiently in an object pool, and every time before a message is returned to the pool it has to be “reset”, i.e. its presence flags have to be memset to false.
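To make the reset cost concrete, here is a minimal sketch of a message with presence flags and a pool that clears them on release. It is not the generated code: `SketchEntry`, `SketchPool` and the three fields are hypothetical names, and the flags are plain one-byte bools grouped together rather than the 8-byte layout described above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical message layout; names and fields are illustrative only.
struct SketchEntry {
    // Presence flags are grouped so that a single memset clears all of them.
    struct Presence {
        bool price;
        bool size;
        bool tradeDate;
    } has;

    std::int64_t  price;
    std::int32_t  size;
    std::uint32_t tradeDate;

    // Clear only the flags; stale field values are hidden by the false flags.
    void reset() { std::memset(&has, 0, sizeof has); }
};

// Minimal free-list pool: a message is reset before it goes back on the list.
class SketchPool {
public:
    SketchEntry* acquire() {
        if (free_.empty()) {
            // make_unique value-initializes, so new messages start cleared.
            storage_.push_back(std::make_unique<SketchEntry>());
            return storage_.back().get();
        }
        SketchEntry* e = free_.back();
        free_.pop_back();
        return e;
    }

    void release(SketchEntry* e) {
        assert(e != nullptr);
        e->reset();            // this is the memset paid on every message
        free_.push_back(e);
    }

private:
    std::vector<std::unique_ptr<SketchEntry>> storage_;  // owns the messages
    std::vector<SketchEntry*> free_;                     // messages ready for reuse
};
```

A decoder built this way would set `has.price = true` when it stores a price, and the consumer would check the flag before reading the field.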
Below are the definitions of the two most used messages in CME, MDIncRefresh and MDEntries. Note that they were automatically generated by my system. Actually my system can generate nice code like this from any given FAST template.
The decoder knows how to efficiently fill in the structure. It can take many statically generated template handlers as well as handlers generated dynamically at runtime. Of course statically generated handlers perform better, but dynamically generated handlers provide flexibility and robustness. A unique “signature” is generated along with each version of a FAST template, so that the system can detect a template version change at startup.
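The signature check could look roughly like the sketch below. Everything in it is an assumption made for illustration: the post does not say how the signature is computed, so the sketch simply hashes the template text with 64-bit FNV-1a, and `templateSignature`, `chooseHandler` and `HandlerKind` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a over the template text. The choice of hash is an assumption;
// any stable fingerprint of the template definition would serve.
inline std::uint64_t templateSignature(const std::string& templateXml) {
    std::uint64_t h = 14695981039346656037ULL;   // FNV offset basis
    for (unsigned char c : templateXml) {
        h ^= c;
        h *= 1099511628211ULL;                   // FNV prime
    }
    return h;
}

enum class HandlerKind { Static, Dynamic };

// At startup, compare the signature baked into a statically generated handler
// with the signature of the template actually in use. If they differ, the
// static handler is stale and the dynamic one is used instead.
inline HandlerKind chooseHandler(std::uint64_t compiledSignature,
                                 const std::string& liveTemplateXml) {
    assert(!liveTemplateXml.empty());  // an empty template means loading failed
    return templateSignature(liveTemplateXml) == compiledSignature
               ? HandlerKind::Static
               : HandlerKind::Dynamic;
}
```

The point of the design is that a template change never breaks decoding; it only costs the speed advantage of the static handler until the code is regenerated.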
Another nice feature of this design is that the decoder can ignore fields the user is not interested in, which gives a small performance improvement. For example, I never cared about the TradeDate field because for the instruments I am interested in it is just the current session date.
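A rough illustration of why ignoring a field is cheaper: FAST integers are stop-bit encoded (7 data bits per byte, high bit set on the last byte), so an ignored field only needs the scan for the stop bit, not the shifts, the store, or the presence flag. The function names below are hypothetical, and the sketch deliberately leaves out field operators and presence map handling, which a real handler also has to account for.

```cpp
#include <cassert>
#include <cstdint>

// Decode one stop-bit encoded unsigned integer and return the position
// just past it. No bounds checking: the caller must pass a complete field.
inline const std::uint8_t* decodeUInt(const std::uint8_t* p, std::uint32_t& out) {
    assert(p != nullptr);
    std::uint32_t v = 0;
    for (;;) {
        const std::uint8_t b = *p++;
        v = (v << 7) | (b & 0x7Fu);   // accumulate 7 data bits
        if (b & 0x80u) break;         // stop bit marks the last byte
    }
    out = v;
    return p;
}

// Skip one stop-bit encoded field without building its value.
inline const std::uint8_t* skipField(const std::uint8_t* p) {
    assert(p != nullptr);
    while ((*p++ & 0x80u) == 0) {
        // keep scanning until the byte that carries the stop bit
    }
    return p;
}
```

For the bytes `0x01 0x82 0x85`, `decodeUInt` reads the first field as 130 and stops after two bytes, while `skipField` advances over the same two bytes without computing anything.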
Unfortunately this flexibility does come at some cost in terms of performance. Commenting out the memset that resets each message saves about 1.5 seconds on the same CME data. And the cost of setting the “presence flags” during decoding could be similar or even higher.
Overall I am quite happy with my FAST decoder. It can decode at a sustained throughput of over 200MB/s.