Patterns and Best Practices for Building #Low-Latency #Trading Applications
Posted on August 18th, 2015
08/17/2015 @Barclays Capital, 745 7th Ave, NY
Buko Obele, software architect @Neeve Research, spoke about how to build low latency trading systems in #Java.
Buko introduced the topic by noting the competitive importance of being first to submit a trade to the market: in 2010 low latency meant under 10 ms, but to be competitive today a trade must be submitted in less than 250 microseconds. This means the electronic decision to trade needs to be made in less than 90 microseconds.
He then argued that Java is the best language to use in developing trading systems since
- Popular and continually updated to stay at the forefront of computing
- Lots of programmers know Java
- Good for coding business logic
However, special programming practices are needed to successfully use Java in a low-latency trading environment. These coding methods consider
- Key metric = Time to First Slice (TTFS). Need to worry about the tails.
- In low latency, need to consider outside the JVM, including the OS, BIOS, Hardware, Network
As a result, low-latency Java may not look like the Java code you usually see. To show how such code needs to be written, Buko described five enemies of high-performance Java and then explained how to surmount each through specialized coding methods.
Enemy #1: the Garbage Collector => need to do your own garbage collection.
Enemy #2: the language and the library. The following hurt performance: Strings, BigDecimal, autoboxing, the Java Collections (e.g., ArrayList grows, Maps rehash), exceptions, the enhanced for-loop (it creates an Iterator), Java 8 Optional and lambdas…
Enemy #3: threads: threads block and context-switch, the scheduler will intervene, and it is difficult to reason about performance when there are many threads.
Enemy #4: I/O: latencies range from L1 cache at ~5 ns, through main memory at ~100 ns, up to disk at ~10 ms. To be fast enough one must consider where data are stored.
Enemy #5: layers of abstraction: when data is passed from one layer to another, the data are copied. The scheduler de-prioritizes our process to give other processes their “fair share”.
Buko then spoke about how to conquer these issues
#1. Zero garbage – pool objects, share objects, reuse objects, pass parameters. Pre-allocate everything needed to trade that day (~3 million trades/day) => no garbage to collect. Since the space is pre-created, there is no need to call constructors during the day.
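A minimal sketch of the pre-allocation idea: all objects are created up front and recycled, so the steady-state trading path allocates nothing and the garbage collector has nothing to do. The class and field names here are illustrative, not the speaker's actual API.

```java
import java.util.ArrayDeque;

public class OrderPool {
    public static final class Order {
        long price;
        long quantity;
        void clear() { price = 0; quantity = 0; }
    }

    private final ArrayDeque<Order> free = new ArrayDeque<>();

    public OrderPool(int capacity) {
        // Pre-allocate at startup: the only place constructors run.
        for (int i = 0; i < capacity; i++) free.push(new Order());
    }

    public Order acquire() { return free.pop(); }             // no allocation on the hot path
    public void release(Order o) { o.clear(); free.push(o); } // reset and reuse instead of GC
}
```

Because released objects are cleared and returned to the pool, the same instances circulate all day.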
#2. No Strings, no Dates, no BigDecimal, no autoboxing. When they receive a message, they copy it directly into their domain as a stream of bytes. All messages are cached at startup. Money values are treated as fixed-precision numbers.
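A sketch of the fixed-precision idea: represent money as a long count of "ticks" (here, 1/10,000 of a currency unit, an assumed scale) instead of BigDecimal, and parse prices straight from the inbound bytes without ever creating a String. Names are illustrative.

```java
public class FixedMoney {
    static final long SCALE = 10_000; // 4 implied decimal places (assumed)

    static long fromUnits(long units, long fraction) { return units * SCALE + fraction; }

    // Parse a decimal like "123.45" directly from the wire bytes: no String,
    // no BigDecimal, no allocation.
    static long parse(byte[] buf, int off, int len) {
        long value = 0;
        int decimals = -1;                      // -1 = no decimal point seen yet
        for (int i = off; i < off + len; i++) {
            byte b = buf[i];
            if (b == '.') { decimals = 0; continue; }
            value = value * 10 + (b - '0');
            if (decimals >= 0) decimals++;
        }
        if (decimals < 0) decimals = 0;
        while (decimals < 4) { value *= 10; decimals++; } // pad to the fixed scale
        return value;
    }
}
```

Arithmetic on the long values is then exact and garbage-free; only display code ever converts back to a decimal string.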
Do not use exceptions. Instead use a pattern of violations: always pass a list of potential violations, and if a routine has a problem it adds a violation to the list. Aggregate state: keep an array of objects into which things are stored; to look up a characteristic, pass an enum to a function. Replace the Java Collections with HPPC, Koloboke, or other optimized collections. Be careful with lambdas – they can lead to an allocation. Replace forEach() with for(int i = 0; i < …; i++) loops.
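A sketch of the violations pattern: validation routines append to a caller-supplied list rather than throwing, so no exception objects or stack traces are allocated on the hot path. The enum constants are pre-allocated singletons, so recording a violation costs nothing beyond the list append. Names (Violation, validate, …) are invented for illustration.

```java
import java.util.List;

public class OrderValidator {
    public enum Violation { NEGATIVE_PRICE, ZERO_QUANTITY }

    // Returns true if the order is valid; otherwise records why in 'violations'
    // instead of throwing an exception.
    public static boolean validate(long price, long qty, List<Violation> violations) {
        int before = violations.size();
        if (price < 0) violations.add(Violation.NEGATIVE_PRICE);
        if (qty == 0) violations.add(Violation.ZERO_QUANTITY);
        return violations.size() == before;
    }
}
```

The caller pre-sizes and reuses one violations list, inspecting and clearing it between messages.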
#3. Business logic is single-threaded. Use separate threads for I/O and for replicating data. Use lock-free queues that allow threads to pass information without synchronization. Threads are pinned to cores – use busy spinning so the core is always watching the queue. Be mindful of NUMA – the physical layout of the board – and place each CPU close to the data stream it needs to access.
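A sketch of the busy-spinning consumer idea: the business-logic thread never blocks or parks; it polls a lock-free queue in a tight loop so it reacts the instant a message arrives. Core pinning needs OS-level tooling (e.g. taskset or a JNI affinity library) and is not shown. The boxed Long and ConcurrentLinkedQueue are used for brevity; a real system would use a pre-allocated primitive ring buffer (e.g. the Disruptor).

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class BusySpinConsumer {
    // Consumes until a negative "poison pill" arrives; never blocks.
    static long consume(ConcurrentLinkedQueue<Long> queue) {
        long total = 0;
        while (true) {
            Long msg = queue.poll();       // lock-free dequeue, no synchronization
            if (msg == null) continue;     // busy spin: keep the core hot, never park
            if (msg < 0) return total;     // poison pill ends the loop
            total += msg;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<Long> queue = new ConcurrentLinkedQueue<>();
        long[] result = new long[1];
        Thread consumer = new Thread(() -> result[0] = consume(queue));
        consumer.start();
        for (long i = 1; i <= 100; i++) queue.add(i); // producer thread feeds the queue
        queue.add(-1L);                               // shut down
        consumer.join();
        System.out.println(result[0]);                // prints 5050
    }
}
```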
#4. Use in-memory computing. All data is kept in memory as plain old Java objects; there is no data layer – “memory is the new disk”. A database normally provides the assurance of high availability, but since they do everything in memory they use event sourcing instead: every message is processed twice, by a primary and a backup, and the two are coordinated. The primary waits for the backup to indicate that the message has been processed; if the primary process fails, the backup can take over in less than a second. Disaster recovery is supported by a separate off-site replica – similar, except that you don’t wait for the DR site to complete its calculations.
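A minimal sketch of the event-sourcing idea as described: state lives only in memory as plain Java objects, and a backup (or a restarted process) reaches the identical state by replaying the same message log in the same order, because processing is deterministic. The class is invented for illustration; the real system coordinates primary and backup over the network.

```java
import java.util.ArrayList;
import java.util.List;

public class EventSourcedPosition {
    private long position;                       // in-memory state, no data layer
    private final List<Long> log = new ArrayList<>();

    public void apply(long fill) {               // every message mutates state...
        log.add(fill);                           // ...and is appended to the log
        position += fill;
    }

    public long position() { return position; }

    public List<Long> log() { return log; }

    // A backup reaches the identical state by replaying the log deterministically.
    public static EventSourcedPosition replay(List<Long> fills) {
        EventSourcedPosition p = new EventSourcedPosition();
        for (long fill : fills) p.apply(fill);
        return p;
    }
}
```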
#5. Zero copy – use a special network card that bypasses the kernel’s network stack. Use a DirectByteBuffer to refer to the off-heap data (so you don’t need to copy it to the heap). Use framing to view fields in the buffer without decoding the entire message. The input buffer can be bulk-copied – or possibly kept in place – and then bulk-copied out to the market. Never copy data into the JVM heap. Try to make the messages look like the domain state, so there is no translation and they can be worked on directly.
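A sketch of framing over off-heap memory: a direct ByteBuffer holds the raw wire bytes, and a reusable flyweight reads individual fields at fixed offsets without decoding or copying the whole message onto the heap. The message layout (price at offset 0, quantity at offset 8) is invented for illustration.

```java
import java.nio.ByteBuffer;

public class OrderFrame {
    private static final int PRICE_OFFSET = 0;
    private static final int QTY_OFFSET = 8;

    private ByteBuffer buffer;   // off-heap storage, outside the GC's reach
    private int base;            // start of this message within the buffer

    // Point the reusable flyweight at a message; no bytes are copied.
    public void wrap(ByteBuffer buffer, int base) { this.buffer = buffer; this.base = base; }

    public long price()    { return buffer.getLong(base + PRICE_OFFSET); } // field read in place
    public long quantity() { return buffer.getLong(base + QTY_OFFSET); }

    public static void main(String[] args) {
        ByteBuffer wire = ByteBuffer.allocateDirect(64); // off-heap allocation
        wire.putLong(0, 1234500L).putLong(8, 100L);      // simulate an inbound message

        OrderFrame frame = new OrderFrame();             // one frame reused for every message
        frame.wrap(wire, 0);
        System.out.println(frame.price() + " " + frame.quantity()); // prints 1234500 100
    }
}
```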
Finally he talked about http://xplatform.com:
The X Platform is a way forward that lets you concentrate on the business logic: it provides the underlying low-latency infrastructure for trading applications.
#Hadoop/ #Java 8 Stream Debugging
Posted on July 27th, 2015
07/27/2015 @ LinkedIn, 350 5th Ave (Empire State Building), NY
Two speakers spoke.
Keith D’Souza @LinkedIn spoke about how LinkedIn integrates MapReduce into its workflow. Specifically, he talked about the systems and tools used to generate the people and job recommendations presented to users. Some of their main offline tools are
Project Takeout to handle user requests to download a data archive.
Gobblin takes data from multiple sources and sends it to the primary db.
Azkaban is a workflow project manager controlling Hadoop with commands to supervise job dependencies and track jobs.
Dr. Elephant – analyzes jobs that have run to assess whether processes can be made more efficient.
In the second talk Tim Fagan spoke about five techniques for debugging Java 8 streams, in which a source object is followed by a series of operators that manipulate the elements. He illustrated these debugging methods on a “one-line program” that eliminates words containing the letters x, y, or z from a string of text and produces an alphabetically sorted list without duplicate words.
Tim presented the following debugging methods:
- Collect – print a list of the outputs of each step within the operation. But this is verbose, and one needs to split the code and create intermediate variables.
- Peek & guess – insert peek(System.out::println), an operator that does nothing to affect the stream but prints to the console. The output can be confusing, though, due to the order in which the steps are processed.
- Peek & collect – use peek(debug1::add), where debug1 is an ArrayList. Much easier to read than peek & guess, since outputs are grouped by the location of the peek within the stream.
- Peek with labels – create a log method that allows a label to be added to each peek, e.g. peek(log(“step 1: “)).
- Decorate – wrap the initial object in a debug() call before starting the stream of operations. This creates tags that are printed to the console after each operation.
Tim recommended the decorate method and acknowledged the flexibility of the peek methods.
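The talk's running example, combined with the peek-with-labels technique, can be sketched as follows; the log() helper is a guess at the shape Tim described, not his actual code.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamDebug {
    // Labeled peek: prints "<label><element>" without altering the stream.
    static <T> Consumer<T> log(String label) {
        return value -> System.out.println(label + value);
    }

    // Drop words containing x, y, or z; de-duplicate; sort alphabetically.
    static List<String> cleanWords(List<String> words) {
        return words.stream()
                .filter(w -> w.chars().noneMatch(c -> c == 'x' || c == 'y' || c == 'z'))
                .peek(log("after filter: "))    // debugging only; no effect on elements
                .distinct()
                .peek(log("after distinct: "))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(cleanWords(List.of("fox", "dog", "cat", "dog", "ape")));
        // prints [ape, cat, dog]  ("fox" is filtered out, the duplicate "dog" removed)
    }
}
```

Because each peek carries its own label, the console output reads in pipeline order even though elements flow through the stages one at a time.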
Domain driven design for #noSQL and multi-model #databases
Posted on November 10th, 2014
11/10/2014 @ Grubhub 1065 6th Ave, NY
Max Neunhoffer @ArangoDB
Max talked about how different database structures suit different types of queries: #documents, #key/value stores, #graphs. Frequently a single set of data is queried in several different ways. For example, a “shortest path” query is handled inefficiently by a document or key/value NoSQL store but is exactly what a graph database (e.g., Neo4j) is built for.
The main argument for this approach: if you are starting a database without knowing the full extent of the queries that will eventually be run, the flexibility to handle the data under different data models may be useful in the future.
JavaOne 2014 Recap
Posted on October 28th, 2014
10/28/2014 @Pivotal, 625 6th Ave, NY
Tim Fagan @Lab49
Tim summarized some of the new items recently presented at the #JavaOne conference in San Francisco. There, the main emphasis was on describing the new features available in #Java8, which is the default download version of #Java and will replace Java 7 in 2015. Topics were
1. A JSON reader as part of the platform (JsonStructure); Tim talked about the new structures to read, write, and update JSON.
2. Async processing within an HttpServlet. This makes it easier to write code that minimizes the risk of blocking a thread when reading from or writing to other services.
4. Avoiding cross-site request forgery – a way to hack into sites by taking advantage of the jsessionid cookie, which is used so you don’t need to re-log on to each page. Using this cookie, however, opens a vulnerability in which an infected page posts a request for an unauthorized transaction using information it reads from the cookie. Solution: create a custom header that includes a secret token, originally delivered by the secure web site as a hidden input, and add the token to the header when you do a POST. You don’t need to protect all your pages, though – just those vulnerable to unauthorized transactions.
5. JavaFX – the replacement for Swing: 3D support, printing, new controls, CSS improvements, GPU acceleration on Linux, … Create shapes and wrap them with images to make an object; the object can also be translated in a direction, rotated around an axis, etc., with the object kept separate from the motion controller. Tim walked through code showing how easy it is to animate objects.
7. Internet of Things – e.g., Thalmic Myo, a sensor on your arm; Leap Motion, with sophisticated APIs (two hands and a pointer device); TheEyeTribe, which looks at your eyes and tells you what you are looking at; Parrot AR.Drone 2.0; the NAO robot – all run using Java.
For the videos see:
Apache Cassandra & Java
Posted on July 1st, 2014
Caroline George, a Solutions Architect at DataStax
Followed by a lightning talk by Kenneth Scher at BNY Mellon
6/30/2014 @ BNY Mellon, 101 Barclay Street, New York, NY
Caroline George presented the system architecture and query language for Cassandra, a massively scalable NoSQL database. Cassandra is callable from Java and other languages through the Cassandra Query Language (CQL), an open-source, SQL-like language. Much of the presentation was a review of NoSQL databases and why they differ from relational databases: no joins, fast writes, appends that replace record updates, etc. Caroline also emphasized that Cassandra’s architecture is based on nodes of equal status, which makes the system robust and easily scalable.
Caroline’s slides are available here.
Kenneth Scher presented a brief overview of methods that BNY Mellon uses to monitor the performance of their networks. Their tool, called Dynatrace, creates system statistics and network diagrams of the latencies between databases, servers and user interfaces.