This post shares some experiences from getting a simple Java app (the real app does a bit more; this is just a demo) that connects to Bitfinex to fetch funding rates to run as a GraalVM native image.

XChange is one of the best-known Java libraries for integrating with a large number of crypto exchanges. It provides a simple, consistent API across all the supported exchanges.

I have recently started using MongoDB for a very demanding task: storing all Forex pair ticks (i.e. every change in bid/ask prices). I know there are databases designed for this task (e.g. kdb+), but I wanted to avoid the learning curve. Besides, I already use Spring Data in my project, and it works with Mongo with a minimal number of changes.
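Since the ticks go through Spring Data, the documents are mapped with an entity along these lines (a minimal sketch; the class and field names here are illustrative, not my actual code):

import java.util.Date;

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

// Sketch of a tick document in the tick_data collection.
@Document(collection = "tick_data")
public class Tick {

    @Id
    private String id;

    private String pair;   // e.g. "EUR/USD"
    private Date dateTime; // time of the bid/ask change
    private double bid;
    private double ask;

    // getters/setters omitted for brevity
}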

In Mongo I have a collection with more than 3.5 billion records (and growing), and I want to find the latest date for each pair. I tried the MongoDB aggregation framework, but it doesn’t seem to use the indexes and takes ages (it hadn’t finished after one day).

Relational Structure

In a relational DB the table structure would look something like:

id  | pair    | dateTime                | bid     | ask
----+---------+-------------------------+---------+--------
1   | EUR/USD | 2015-04-03 21:32:31.456 | 1.14141 | 1.14142
... | ...     | ...                     | ...     | ...

To get the latest dateTime per pair, you would then run the following query:


SELECT t.pair, MAX(t.dateTime)
FROM tick_data t
GROUP BY t.pair;

MongoDB Aggregation Framework

In MongoDB the document structure is the same. I am very much a novice Mongo user, but I gathered we could use the aggregation framework for this query:

db.tick_data.aggregate([
    // note the "$" prefix: without it, "pair" and "dateTime" are treated
    // as literal strings rather than references to document fields
    { $group: { _id: "$pair", "maxValue": { $max: "$dateTime" } } }
]);

However, this takes ages; even though I have defined a composite index on pair and dateTime, the $group stage does not appear to use it.
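For reference, the composite index in question was created roughly like this (a sketch; the descending dateTime matches the sort direction used in the shell approach below):

db.tick_data.createIndex({ "pair": 1, "dateTime": -1 });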


Very Fast Result Using the Mongo Shell

I tried a sort of iterative approach using the Mongo shell instead:

db.tick_data.distinct("pair").forEach(function(per_pair) {
    // find the most recent tick for this pair via an index-backed sort
    var lastTickPerPair = db.tick_data.find({ "pair": per_pair }).sort({ "dateTime": -1 }).limit(1);
    var lastTickOfPair = lastTickPerPair.hasNext() ? lastTickPerPair.next() : null;
    if (lastTickOfPair !== null) {
        print("pair: " + lastTickOfPair.pair + ", dateTime: " + lastTickOfPair.dateTime);
    }
});

This approach does use the composite index on pair and dateTime I defined, and the results are lightning fast (for 3.5 billion records).

Maybe there are other ways, but after some digging around I couldn’t find any other method that would use indexes.

It’s been ages since I have posted any sample code, mainly because I don’t have time to collect and publish it anymore. This one was a bit more challenging, and googling wasn’t helping much, so now that I have some time I thought I would post some sample code that achieves batch inserts with Spring Data.

For example, this link:
http://forum.spring.io/forum/spring-projects/data/118203-bulk-insert-with-crudrepository indicated that I had to manually get the session and iterate/flush (which was true when using plain Spring/Hibernate/JPA). But when using Spring Data's CrudRepository it turns out to be much simpler.

FULL CODE

Full code sample (maven project) can be found on github: https://github.com/cyrus13/anastasakis-net-sample-code/tree/master/spring-data-batch

You basically need the following elements (a minimal sketch follows the list):

  • Add ?rewriteBatchedStatements=true to the end of the MySQL JDBC connection string.
  • Make sure your entity uses an ID generator that supports batching, e.g.:
@Id
@GeneratedValue(generator = "generator")
@GenericGenerator(name = "generator", strategy = "increment")
  • Use the save(Iterable<S> entities) method of the JpaRepository to save the data.
  • Set the hibernate.jdbc.batch_size configuration property.
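Putting those pieces together, here is a minimal sketch (the repository name and the loop are illustrative; the full working Maven project is in the GitHub link above):

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.hibernate.annotations.GenericGenerator;
import org.springframework.data.jpa.repository.JpaRepository;

@Entity
public class ExampleEntity {

    @Id
    @GeneratedValue(generator = "generator")
    // "increment" reads max(id) once and then increments in memory, which is
    // why a single "select max(id)" shows up in the query log below; identity
    // generation, by contrast, would disable JDBC batching
    @GenericGenerator(name = "generator", strategy = "increment")
    private Long id;

    private String exampleText;

    public void setExampleText(String exampleText) {
        this.exampleText = exampleText;
    }
}

// A plain Spring Data repository; save(Iterable<S>) is inherited.
interface ExampleEntityRepository extends JpaRepository<ExampleEntity, Long> {
}

Saving a whole collection then becomes a single repository call, e.g. (exampleEntityRepository being an injected ExampleEntityRepository):

List<ExampleEntity> entities = new ArrayList<>();
for (int i = 0; i < 100; i++) {
    ExampleEntity entity = new ExampleEntity();
    entity.setExampleText(UUID.randomUUID().toString());
    entities.add(entity);
}
// One call; with rewriteBatchedStatements=true and hibernate.jdbc.batch_size
// set, MySQL receives multi-row INSERT statements like the ones shown below.
exampleEntityRepository.save(entities);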

RESULT

So enabling the query log in MySQL:

SET global general_log = 1;
SET global log_output = 'table';

we can see the following MySQL statements being executed:

SET autocommit=0;
select max(id) from ExampleEntity;
SHOW WARNINGS;
select @@session.tx_read_only;
insert into ExampleEntity (exampleText, id) values 
('de32bec8-1cf9-4f14-b816-0ab7a00b1539', 4),
('c0c85b32-eb2d-4a69-ade4-ac70ea94241c', 5);
commit;
SET autocommit=1;

Note: Don’t forget to stop logging statements into the MySQL general log!

SET global general_log = 0;

Today I decided to update to OSX Yosemite. How much time would it take? 1 hour? 2 hours? It appeared to get stuck at “one minute to finish” the installation for quite some time. During the upgrade process, if you move the cursor to the top of the screen a menu appears. One of the options is “Show Log”, which pops up a window showing detailed information about the process.

In my case for around half an hour it was logging events like this:

 Nov 16 12:54:13 MacBook-Pro.lan OSInstaller[411]: (NodeOp) Move "/Volumes/Macintosh HD/Recovered Items/usr/local/texlive/2012/texmf-dist/source/latex/koma-script/doc/english/common-1.tex" -> "/Volumes/Macintosh HD/usr/local/texlive/2012/texmf-dist/source/latex/koma-script/doc/english" Final name: "common-1.tex" (Flags used: kFSFileOperationDefaultOptions,kFSFileOperationSkipSourcePermissionErrors,kFSFileOperationCopyExactPermissions,kFSFileOperationSkipPreflight,k_FSFileOperationSuppressConversionCopy)

It appears it was spending a lot of time processing the texlive files. I had installed texlive in the past, but I hadn’t used it in at least a year. So, before upgrading to Yosemite, make sure to delete any unwanted applications for a faster upgrade!

I installed a fresh Ubuntu desktop guest on VirtualBox 4.3.14. Trying to install the VBox Guest Additions, I was getting this error message:

The headers for the current running kernel were not found. If the following
module compilation fails then this could be the reason.


Apparently all I had to do was reinstall the headers using the following two commands:


sudo apt-get remove dkms build-essential linux-headers-*
sudo apt-get install dkms build-essential linux-headers-$(uname -r)

After that, reinstalling the Linux Guest Additions for VirtualBox works like a charm!
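For completeness, once the headers are back in place, the installer can be rerun along these lines (the mount point varies per system; insert the Guest Additions CD image from the VirtualBox “Devices” menu first):

sudo mount /dev/cdrom /mnt          # only needed if the ISO isn't auto-mounted
sudo sh /mnt/VBoxLinuxAdditions.run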