import_linkedin « KYRIAKOS ANASTASAKIS – ΚΥΡΙΑΚΟΣ ΑΝΑΣΤΑΣΑΚΗΣ

I have recently started using MongoDB for a very demanding task: storing all Forex pair ticks (i.e. every change in bid/ask prices). I know there are systems designed for this task (e.g. kdb+), but I wanted to avoid the learning curve. Besides, I already use Spring Data in my project, and it works with Mongo with a minimal number of changes.

In Mongo I have a collection with more than 3.5 billion records (and growing) and I want to find the latest date for each pair. I tried using the aggregation framework of Mongo, but it doesn’t seem to use the indexes and takes ages (didn’t finish after one day).

Relational Structure

In a relational DB the table structure would look something like this:

id	pair	dateTime	bid	ask
1	EUR/USD	2015-04-03 21:32:31.456	1.14141	1.14142
...		...	...	...

Then you would have to run the following query:

SELECT t.pair, MAX(t.dateTime)
FROM tick_data t
GROUP BY t.pair;

MongoDB Aggregation Framework

In MongoDB the document structure is the same. I am very much a novice Mongo user, but I gather we could use the aggregation framework for this query:

db.tick_data.aggregate([
    // Field paths need the "$" prefix; without it Mongo groups on the literal string "pair".
    {$group: {_id: "$pair", maxValue: {$max: "$dateTime"}}}
]);

However, this takes ages, even though I have used a composite index on pair and dateTime.

Very Fast Result Using MongoShell

I tried an iterative approach in the Mongo shell:

db.tick_data.distinct("pair").forEach(function (pair) {
  // Newest tick for this pair: sort by dateTime descending and keep one document.
  var cursor = db.tick_data.find({ "pair": pair }).sort({ "dateTime": -1 }).limit(1);
  var lastTick = cursor.hasNext() ? cursor.next() : null;
  // Guard against an empty cursor before reading fields.
  if (lastTick !== null) {
    print("pair: " + lastTick.pair + ", dateTime: " + lastTick.dateTime);
  }
});

This approach seems to use the composite index on pair and dateTime I defined and the results are lightning fast (for 3.5 billion records).
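To illustrate why one lookup per pair can be so cheap, here is a minimal in-memory sketch in Java. It is only an analogy: a TreeMap stands in for the index on pair and dateTime, and the class name, the Key type and the sample data are all hypothetical. Because the keys are ordered by pair and then by time, the newest tick of each pair can be reached with a single seek, and the older ticks are never visited.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class LatestTickSketch {

    /** Key ordered by pair, then timestamp, like an index on (pair, dateTime). */
    public static final class Key implements Comparable<Key> {
        final String pair;
        final long epochMillis;

        public Key(String pair, long epochMillis) {
            this.pair = pair;
            this.epochMillis = epochMillis;
        }

        @Override
        public int compareTo(Key other) {
            int byPair = pair.compareTo(other.pair);
            return byPair != 0 ? byPair : Long.compare(epochMillis, other.epochMillis);
        }
    }

    /** Returns the latest timestamp per pair using one seek per pair. */
    public static Map<String, Long> latestPerPair(NavigableMap<Key, Double> index) {
        Map<String, Long> latest = new TreeMap<>();
        Key cursor = index.isEmpty() ? null : index.lastKey();
        while (cursor != null) {
            // The cursor is the newest tick of its pair because keys sort by (pair, time).
            latest.put(cursor.pair, cursor.epochMillis);
            // Seek to the greatest key below this pair's smallest possible key,
            // i.e. the newest tick of the previous pair; older ticks are skipped.
            cursor = index.lowerKey(new Key(cursor.pair, Long.MIN_VALUE));
        }
        return latest;
    }

    /** Tiny hypothetical data set: (pair, time in millis) mapped to a bid price. */
    public static NavigableMap<Key, Double> sampleIndex() {
        NavigableMap<Key, Double> index = new TreeMap<>();
        index.put(new Key("EUR/USD", 100L), 1.14141);
        index.put(new Key("EUR/USD", 300L), 1.14150);
        index.put(new Key("GBP/USD", 200L), 1.48210);
        index.put(new Key("USD/JPY", 50L), 119.02);
        index.put(new Key("USD/JPY", 250L), 119.10);
        return index;
    }

    public static void main(String[] args) {
        latestPerPair(sampleIndex()).forEach(
            (pair, time) -> System.out.println("pair: " + pair + ", dateTime: " + time));
    }
}
```

The number of seeks grows with the number of distinct pairs rather than with the number of ticks, which would be consistent with the shell loop staying fast on billions of records.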

Maybe there are other ways, but after some digging around I couldn’t find any other method that would use indexes.

It’s been ages since I posted any sample code, mainly because I no longer have the time to collect and post it. This one was a bit more challenging and googling didn’t help much, so now that I have some time I thought I would post some sample code that achieves batch inserts with Spring Data.

For example, this thread:
http://forum.spring.io/forum/spring-projects/data/118203-bulk-insert-with-crudrepository indicated that I had to manually get the session and iterate/flush (which was true when using Spring/Hibernate/JPA). But when using a CrudRepository, it appears to be much simpler.

FULL CODE

Full code sample (maven project) can be found on github: https://github.com/cyrus13/anastasakis-net-sample-code/tree/master/spring-data-batch

You basically need to have the following elements:

Add: ?rewriteBatchedStatements=true to the end of the JDBC connection string.

Make sure you use a generator that supports batching in your entity. E.g.

@Id
@GeneratedValue(generator = "generator")
@GenericGenerator(name = "generator", strategy = "increment")

Use the: save(Iterable<S> paramIterable); method of the JpaRepository to save the data.
Use the: hibernate.jdbc.batch_size configuration.
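As far as I understand it, hibernate.jdbc.batch_size controls how many inserts Hibernate groups into one JDBC batch, and rewriteBatchedStatements=true lets the MySQL driver collapse each batch into a single multi-row INSERT. The sketch below models only the shape of the resulting statements; it is not the driver's code, and the class and method names are hypothetical. The table and column names are borrowed from the sample entity in this post.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchRewriteSketch {

    /**
     * Returns one multi-row INSERT template per batch of at most batchSize rows.
     * Illustration only: the real rewriting happens inside the JDBC driver.
     */
    public static List<String> multiRowInserts(int rowCount, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<String> statements = new ArrayList<>();
        for (int start = 0; start < rowCount; start += batchSize) {
            // The last batch may hold fewer rows than batchSize.
            int rowsInBatch = Math.min(batchSize, rowCount - start);
            StringBuilder sql = new StringBuilder(
                "insert into ExampleEntity (exampleText, id) values ");
            for (int i = 0; i < rowsInBatch; i++) {
                sql.append(i == 0 ? "(?, ?)" : ", (?, ?)");
            }
            statements.add(sql.toString());
        }
        return statements;
    }

    public static void main(String[] args) {
        // Five entities with a batch size of 2 give batches of 2, 2 and 1 rows.
        multiRowInserts(5, 2).forEach(System.out::println);
    }
}
```

Without the rewrite, the same five entities would reach the server as five single-row inserts, so the saving comes from fewer round trips and fewer statements to parse.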

RESULT

After enabling the general query log in MySQL:

SET global general_log = 1;
SET global log_output = 'table';

we can see that the following MySQL statements are executed:

SET autocommit=0;
select max(id) from ExampleEntity;
SHOW WARNINGS;
select @@session.tx_read_only;
insert into ExampleEntity (exampleText, id) values 
('de32bec8-1cf9-4f14-b816-0ab7a00b1539', 4),
('c0c85b32-eb2d-4a69-ade4-ac70ea94241c', 5);
commit;
SET autocommit=1;

Note: Don’t forget to stop logging statements into MySQL general log!

SET global general_log = 0;

Since Spring 3.2 it should be possible to use a qualifier in the “Async” annotation of a method, to indicate which specific executor to use. For example, I have the following class, that is supposed to collect the HTML from a website asynchronously:

HTMLFetcher Interface

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.Date;
import java.util.concurrent.Future;

import org.springframework.scheduling.annotation.Async;

public interface HTMLFetcher {

    Future<HTMLFetcher.HTMLFetcherResult> getHTML(String baseUrl, Date date);

    interface HTMLFetcherResult {

        String getHTMLResult();

        Date getDate();
    }
}

TestHTMLFetcher Class

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.Date;
import java.util.concurrent.Future;

import org.apache.log4j.Logger;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.AsyncResult;
import org.springframework.stereotype.Component;

import agonesgr.html.HTMLFetcher;

@Component
public class TestHTMLFetcher implements HTMLFetcher {

    @Async(value = "htmlFetcherExecutor")
    public Future<HTMLFetcher.HTMLFetcherResult> getHTML(String baseUrl, Date date) {
        try {
            System.out.println("Before execute!!");
            Thread.sleep(100000);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        HTMLFetcher.HTMLFetcherResult r = new TestHTMLFetcher.HTMLFetcherResultImpl("", new Date());

        return new AsyncResult<HTMLFetcherResult>(r);
    }

    private static class HTMLFetcherResultImpl implements HTMLFetcherResult {

        private String htmlResult;
        private Date date;

        public HTMLFetcherResultImpl(String htmlResult, Date date) {
            super();
            this.htmlResult = htmlResult;
            this.date = date;
        }

        public String getHTMLResult() {
            return htmlResult;
        }

        public Date getDate() {
            return date;
        }
    }
}

And the following excerpt from the context.xml file:

This didn’t work! I submitted 10 tasks to the executor and 10 threads were created, instead of the 1 I had configured. I dug into the Spring code a bit and found the AnnotationAsyncExecutionInterceptor class, which does the job of assigning calls to methods annotated as async to executors. After putting a breakpoint on its getExecutorQualifier method, it became evident why it doesn’t work.

You need to annotate the interface method, rather than the method of the implementing class, with the:

@Async(value="htmlFetcherExecutor")

annotation. In my example, that is the getHTML method of the HTMLFetcher interface. It now works. I am not sure whether this was done on purpose (i.e. whether it’s part of the specification); I don’t have time to read the related documentation or search the Spring Jira. However, I would assume that the right place for the annotation is the implementation: I may want two implementations of the same method, one decorated with “Async” that is executed asynchronously and another without any annotation that is executed synchronously.
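One possible reading of this behaviour, sketched in plain Java rather than taken from the Spring source: when an object is wrapped in a JDK dynamic proxy, the handler receives the interface's Method object, and an annotation placed only on the implementing method is not visible through it. The RunOn annotation and the classes below are hypothetical stand-ins for @Async and the fetcher; whether Spring's interceptor follows exactly this path in this setup is an assumption.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class QualifierLookupSketch {

    /** Hypothetical stand-in for @Async(value = "..."). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface RunOn {
        String value();
    }

    public interface Fetcher {
        String fetch(String url);
    }

    /** The annotation sits on the implementation only, as in the failing setup. */
    public static class AnnotatedImpl implements Fetcher {
        @RunOn("htmlFetcherExecutor")
        public String fetch(String url) {
            return "html of " + url;
        }
    }

    /** Reads the qualifier from one specific Method object, or null if absent. */
    static String qualifierOn(Method method) {
        RunOn annotation = method.getAnnotation(RunOn.class);
        return annotation == null ? null : annotation.value();
    }

    /** Returns {qualifier seen on the proxied method, qualifier on the implementation}. */
    public static String[] demo() {
        final Fetcher target = new AnnotatedImpl();
        final String[] seen = new String[2];
        InvocationHandler handler = (proxy, method, args) -> {
            // 'method' is Fetcher.fetch, the interface method, which carries no annotation.
            seen[0] = qualifierOn(method);
            // Resolving the same signature on the target class finds the annotated method.
            Method implMethod =
                target.getClass().getMethod(method.getName(), method.getParameterTypes());
            seen[1] = qualifierOn(implMethod);
            return method.invoke(target, args);
        };
        Fetcher proxied = (Fetcher) Proxy.newProxyInstance(
            Fetcher.class.getClassLoader(), new Class<?>[] {Fetcher.class}, handler);
        proxied.fetch("http://example.com");
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("via interface method: " + seen[0]);
        System.out.println("via implementation method: " + seen[1]);
    }
}
```

If the qualifier is looked up only on the method handed to the proxy, an annotation on the implementation would be missed, which would match what I observed.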

I wanted to add a new device to my wireless network, but I had forgotten the wireless key. My network uses WEP encryption. So, I used aircrack to recover the key. Basically what I had to do was:

Use airodump to save a large number of transmissions between the wireless router and a device that is already connected to the network.
Use aircrack to analyse the file produced by airodump and find the password.

In particular, I managed to recover the key in the following simple steps:

1. Download the BackTrack Linux distribution and burn it on a DVD.
2. Boot my laptop using the live DVD.
3. At the command prompt, type: ifconfig to see the available network interfaces in Linux. Doing this I was able to find my laptop’s wireless interface.
4. Type: airodump-ng --write afile.cap wlan0, where afile.cap is the file in which airodump will save all communication and wlan0 is the wireless network interface of my laptop as discovered in step 3. Let it run and collect packets for quite some time. It may take a few hours (in my case it took 6 hours) for this step to collect enough packets. The time required depends on the traffic of the network: the more traffic, the better. Once enough packets have been collected, press Ctrl+C to kill the process.
5. Type: ivstools --convert afile.cap afile.ivs to convert the captured packets to the ivs format compatible with aircrack.
6. Use aircrack-ng afile.ivs. Aircrack will pop up a menu asking which network you want to crack. Select the SSID of your network and, if enough packets were collected in step 4, you will have the key of your wireless network in no time!

I ran into an investment advice book printed in 2002 (i.e. before the recession) and the preface starts like this: “(The US people should be) positive about buying or refinancing a home at historically low interest rates or buying a new car under new no-interest offers”. Obviously the recession came later on, and everyone who invested in homes (and cars) lost a good deal of money. Moreover, many articles about the recession and its effect on the housing market mentioned that when interest rates are so low, there’s only one way for them to go: UP.

So... I was wondering whether I should spend my valuable time reading the remaining 280 pages... I guess not… 🙂

Tag Archives: import_linkedin

Finding last date of a record in MongoDB enforcing the use of indexes

Batch inserts with Spring Data and MySQL

Spring Async Method Qualifier to Specify Executor Issue

Using aircrack to crack your wireless network

Investment Advice Book…Should I go Ahead and Read it?