Patterns and Best Practices for Building #Low-Latency #Trading Applications
Posted on August 18th, 2015
08/17/2015 @Barclays Capital, 745 7th Ave, NY
Buko Obele, software architect @Neeve Research, spoke about how to build low-latency trading systems in #Java.
Buko introduced the topic by noting the competitive importance of being the first to submit a trade to the market. In 2010, low latency meant less than 10 ms; now, to be competitive, trades need to be submitted in less than 250 microseconds. This means that the electronic decision to trade needs to be made in less than 90 microseconds.
He then argued that Java is the best language for developing trading systems because
- It is popular and is updated to stay at the forefront of computing
- Lots of programmers know Java
- It is good for coding business logic
However, special programming practices are needed to use Java successfully in a low-latency trading environment. These coding methods take the following into account
- Key metric = Time to First Slice (TTFS). One needs to worry about the tails of the latency distribution.
- In low latency, one needs to consider what lies outside the JVM, including the OS, BIOS, hardware, and network
As a result, the Java code may not look like the Java code you usually see. To show how the code needs to be written, Buko described 5 enemies of high-performance Java code and then explained how to overcome them through specialized coding methods.
Enemy #1: the Garbage Collector => need to do your own garbage collection.
Enemy #2: the language and the library. The following affect performance: Strings, BigDecimal, autoboxing, Java Collections (e.g. ArrayList grows, Maps rehash), Exceptions, the enhanced for-loop (which creates an Iterator over a collection), Java 8 Optional, lambdas…
Enemy #3: Threads: threads block and context switch, the scheduler will intervene, and it is difficult to reason about performance when there are many threads
Enemy #4: I/O: latency ranges from 5ns for L1 cache up to 10ms for disk; main memory is 100ns. To be fast enough, one needs to consider where data are stored
Enemy #5: Layers of abstraction: when data is passed from one layer to another, the data are copied. The scheduler de-prioritizes our process to give other processes their “fair share”
Buko then spoke about how to conquer these issues
#1. Zero garbage: pool objects, share objects, reuse objects, pass parameters. Pre-allocate everything you need to trade that day (3 million trades/day) => no garbage to collect. Since space is pre-created, there is no need to call constructors during the day.
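To make the pooling idea concrete, here is a minimal sketch of a pre-allocated object pool. The class names (PooledOrder, OrderPool) and the fields are hypothetical and not from the talk; the pool is deliberately not thread-safe, on the assumption that it is only touched by the single business-logic thread.

```java
/** Hypothetical order object; the fields are illustrative only. */
final class PooledOrder {
    long orderId;
    long priceTicks;
    int quantity;

    /** Clears all state so a recycled instance looks freshly created. */
    void reset() {
        orderId = 0;
        priceTicks = 0;
        quantity = 0;
    }
}

/** Fixed-size pool: every instance is created up front, none afterwards. */
final class OrderPool {
    private final PooledOrder[] free;
    private int top; // number of objects currently available

    OrderPool(int capacity) {
        free = new PooledOrder[capacity];
        for (int i = 0; i < capacity; i++) {
            free[i] = new PooledOrder(); // constructors run only at startup
        }
        top = capacity;
    }

    /** Takes an object from the pool, or returns null if the pool was sized too small. */
    PooledOrder acquire() {
        return top == 0 ? null : free[--top];
    }

    /** Clears the object and puts it back for reuse. */
    void release(PooledOrder order) {
        order.reset();
        free[top++] = order;
    }

    int available() {
        return top;
    }
}
```

Returning null on exhaustion (rather than throwing or allocating) is one possible choice here; it keeps the hot path free of allocation but pushes the check onto the caller.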
#2. No Strings, no Dates, no BigDecimal, no autoboxing. When they receive strings, they copy the message directly into their domain as a stream of bytes. All messages are cached at startup. For money values, treat them as fixed-precision numbers.
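As an illustration of fixed-precision money values, the sketch below parses an ASCII price straight from a byte array into a scaled long, without creating a String or a BigDecimal. The class name FixedPrice and the scale of four decimal places are assumptions made for this example, not details from the talk, and the parser assumes well-formed, non-negative input.

```java
/** Prices held as long values scaled by 10^4 (an assumed scale). */
final class FixedPrice {
    static final int SCALE_DIGITS = 4;

    private FixedPrice() {}

    /**
     * Parses ASCII text such as "101.25" from buf[offset, offset + length)
     * into a scaled long. Fractional digits beyond the scale are truncated.
     */
    static long parse(byte[] buf, int offset, int length) {
        long value = 0;
        int fractionDigits = -1; // -1 means the decimal point has not been seen yet
        for (int i = offset; i < offset + length; i++) {
            byte b = buf[i];
            if (b == '.') {
                fractionDigits = 0;
            } else if (fractionDigits < SCALE_DIGITS) {
                value = value * 10 + (b - '0');
                if (fractionDigits >= 0) {
                    fractionDigits++;
                }
            }
        }
        // Pad missing fractional digits so the result is always scaled by 10^4.
        for (int d = Math.max(fractionDigits, 0); d < SCALE_DIGITS; d++) {
            value *= 10;
        }
        return value;
    }

    /** Notional value of a fill, still in scaled units; plain long arithmetic, no boxing. */
    static long notional(long scaledPrice, int quantity) {
        return scaledPrice * quantity;
    }
}
```

With this representation, comparing or adding prices is ordinary long arithmetic; conversion to a human-readable form is only needed at the edges of the system.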
Do not use Exceptions. Instead, use a pattern of violations: always pass a list of potential violations, and if a routine has a problem it adds a violation to that list. Aggregate state: have an array of objects into which things are stored; to look up characteristics, pass an enum to a function. Replace Java Collections with HPPC, Koloboke, or other optimized collections. Be careful with lambdas, which can lead to an allocation. Replace foreach() with for(int i=0; i< …; i++) loops.
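The violations pattern described above might look roughly like the following. This is a sketch under assumptions: the violation codes, the Violations container, and the quantity limit are all invented for illustration. The container is allocated once and cleared between uses, and it is read with an indexed loop rather than an Iterator.

```java
/** Hypothetical violation codes; a real set would come from the business rules. */
enum Violation { NON_POSITIVE_QUANTITY, NON_POSITIVE_PRICE, QUANTITY_ABOVE_LIMIT }

/** Preallocated, reusable list of violations: no exceptions thrown, no garbage created. */
final class Violations {
    private final Violation[] items;
    private int size;

    Violations(int capacity) {
        items = new Violation[capacity];
    }

    void add(Violation v) {
        if (size < items.length) { // this sketch silently drops overflow
            items[size++] = v;
        }
    }

    boolean isEmpty() { return size == 0; }
    int size() { return size; }
    Violation get(int i) { return items[i]; }
    void clear() { size = 0; }
}

final class OrderValidator {
    static final int MAX_QUANTITY = 10_000; // assumed limit, for illustration

    private OrderValidator() {}

    /** Records every problem found in the caller-supplied list instead of throwing. */
    static void validate(int quantity, long priceTicks, Violations out) {
        if (quantity <= 0) out.add(Violation.NON_POSITIVE_QUANTITY);
        if (priceTicks <= 0) out.add(Violation.NON_POSITIVE_PRICE);
        if (quantity > MAX_QUANTITY) out.add(Violation.QUANTITY_ABOVE_LIMIT);
    }
}
```

A caller would keep one Violations instance, call clear() before each validation, and then walk the results with for (int i = 0; i < violations.size(); i++).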
#3. Business logic is single-threaded. Use separate threads for I/O and replicating data. Use lock-free queues that allow threads to pass information without synchronization. Threads are pinned to a core and busy-spin, so the core is always looking at the queue. Be mindful of NUMA, the physical structure of the board, so that each CPU is placed close to the data stream it needs to access.
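A lock-free queue between an I/O thread and the business-logic thread can be sketched as a single-producer, single-consumer ring buffer. The version below is a simplified illustration, not the queue used in the talk: the class name SpscLongQueue is made up, it carries plain long values, and it omits refinements such as cache-line padding. Pinning threads to cores is not shown because standard Java has no API for it; that is typically done with OS tools or third-party libraries.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Single-producer, single-consumer ring buffer of long values, with no locks. */
final class SpscLongQueue {
    private final long[] buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    SpscLongQueue(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Producer thread only. Returns false when the queue is full. */
    boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == buffer.length) {
            return false;
        }
        buffer[(int) (t & mask)] = value;
        tail.lazySet(t + 1); // ordered store: publishes the slot to the consumer
        return true;
    }

    /** Consumer thread only. Busy-spins until a value is available. */
    long take() {
        long h = head.get();
        while (tail.get() == h) {
            // Busy spin: the core keeps polling the queue instead of blocking.
            // On Java 9+, Thread.onSpinWait() could be called here as a CPU hint.
        }
        long value = buffer[(int) (h & mask)];
        head.lazySet(h + 1); // ordered store: frees the slot for the producer
        return value;
    }
}
```

The design relies on each index being written by exactly one thread, which is why no compare-and-swap or synchronized block is needed; it would not be safe with multiple producers or multiple consumers.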
#4. Use in-memory computing. All data is kept in memory as plain old Java objects. No data layer. “Memory is the new disk”. A database traditionally provides assurance of high availability, but they do everything in memory. Instead, use event sourcing: every message is processed twice, and the backup and the main are coordinated. If the main process fails, the backup can take over in less than a second. The primary will wait for the backup to indicate that the message has been processed. Disaster recovery is supported using a separate off-site replication: the primary does not wait for the DR site to complete its calculations, but other than that it is similar.
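The event-sourcing idea can be illustrated with a toy model in which the primary and the backup each apply the same sequenced events to their own in-memory state. Everything in this sketch is an assumption made for illustration: the class names, the position-keeping domain, and above all the "replication", which is a direct method call here where a real system would use a network round trip.

```java
/** Deterministic in-memory state: net position per instrument index. */
final class PositionBook {
    private final long[] positions;
    private long lastSequence; // sequence number of the last applied event

    PositionBook(int instruments) {
        positions = new long[instruments];
    }

    /** Applies one fill event in sequence order; returns false for gaps or duplicates. */
    boolean apply(long sequence, int instrument, long signedQuantity) {
        if (sequence != lastSequence + 1) {
            return false;
        }
        positions[instrument] += signedQuantity;
        lastSequence = sequence;
        return true;
    }

    long position(int instrument) { return positions[instrument]; }
    long lastSequence() { return lastSequence; }
}

/** Primary that treats an event as done only after the backup has applied it too. */
final class ReplicatedBook {
    final PositionBook primary;
    final PositionBook backup;

    ReplicatedBook(int instruments) {
        primary = new PositionBook(instruments);
        backup = new PositionBook(instruments);
    }

    /** Each event is processed twice: once by the primary, once by the backup. */
    boolean onEvent(long sequence, int instrument, long signedQuantity) {
        boolean appliedLocally = primary.apply(sequence, instrument, signedQuantity);
        // Stand-in for "wait for the backup's acknowledgement".
        return appliedLocally && backup.apply(sequence, instrument, signedQuantity);
    }
}
```

Because both copies apply the same events in the same order, the backup's state matches the primary's after every acknowledged event, which is what would let it take over without loading anything from a database.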
#5. Zero copy: use a special network card to bypass the operating system’s network stack. Use a DirectByteBuffer to refer to the off-heap data (so you don’t need to copy to the heap). Use framing to view the buffer without needing to decode the entire message. The input buffer can be copied with a bulk copy, possibly kept in the buffer, and then sent to the market with another bulk copy. Never copy data into the JVM heap. Try to make the messages look like the domain state, so that no translation is needed and they can be worked on directly in the domain.
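Framing over a direct buffer can be sketched with a small flyweight that reads fields at fixed offsets instead of decoding the message into objects. The message layout below (an 8-byte order id, an 8-byte price, a 4-byte quantity) and the class name OrderFrame are assumptions made for this example; only the ByteBuffer calls are standard java.nio API.

```java
import java.nio.ByteBuffer;

/**
 * Reusable view over a fixed-layout order message inside a buffer.
 * Assumed layout: orderId (long) at 0, priceTicks (long) at 8, quantity (int) at 16.
 */
final class OrderFrame {
    static final int ORDER_ID_OFFSET = 0;
    static final int PRICE_OFFSET = 8;
    static final int QUANTITY_OFFSET = 16;
    static final int LENGTH = 20;

    private ByteBuffer buffer;
    private int base; // start of the current message within the buffer

    /** Points this view at a message; nothing is copied or decoded. */
    OrderFrame wrap(ByteBuffer buffer, int base) {
        this.buffer = buffer;
        this.base = base;
        return this;
    }

    // Absolute gets read straight from the (possibly off-heap) buffer.
    long orderId() { return buffer.getLong(base + ORDER_ID_OFFSET); }
    long priceTicks() { return buffer.getLong(base + PRICE_OFFSET); }
    int quantity() { return buffer.getInt(base + QUANTITY_OFFSET); }

    /** Updates a field in place, so the same bytes can be forwarded later. */
    void quantity(int newQuantity) {
        buffer.putInt(base + QUANTITY_OFFSET, newQuantity);
    }
}
```

In use, a single OrderFrame would be wrapped around each incoming message in a buffer obtained from ByteBuffer.allocateDirect(...), so the message bytes stay off-heap and only the fields actually needed are read.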
Finally he talked about http://xplatform.com:
X Platform is a way forward so you can concentrate on the business logic. It creates the underlying structure that you can use in low-latency trading.