I have recently started using MongoDB for a very demanding task. Storing all Forex pair ticks (e.g. every change in bid/ask prices). I know there are frameworks designed for this task (e.g. kdb+), but since I wanted to avoid the learning curve. Besides I already use Spring Data in my project and it works with a minimal number of changes for Mongo.

In Mongo I have a collection with more than 3.5 billion records (and growing) and I want to find the latest date for each pair. I tried using the aggregation framework of Mongo, but it doesn’t seem to use the indexes and takes ages (didn’t finish after one day).

Relational Structure

In relational DB the table structure  would look something like:

idpairdateTimebidask
1EUR/USD2015-04-03 21:32:31.4561.141411.14142
............

Then you would have to run the following query:

 

MongoDB Aggregation Framework

In MongoDB the document structure is the same. I am a very very novice user of Mongo, but I gather we could use the aggregation framework for this query:

However, this takes ages, even though I have used a composite index on pair and dateTime.

 

Very Fast Result Using MongoShell

I tried using a sort of iterative approach using MongoShell:

This approach seems to use the composite index on pair and dateTime I defined and the results are lightning fast (for 3.5 billion records).

Maybe there are other ways, but after some digging around I couldn’t find any other method that would use indexes.

It’s been ages since I have posted some sample code. It’s mainly because I don’t have time to collect and post sample code anymore. This once was a bit more challenging and googling wasn’t help much, so now that I have some time I though I would post some sample code that achieves batch inserts with spring data.

For example this link:
http://forum.spring.io/forum/spring-projects/data/118203-bulk-insert-with-crudrepository indicated that I had to manually get the session and iterate/flush (which was true when using Spring/Hibernate/JPA). But when using the CRUDRepository it appears it’s much simpler.

FULL CODE

Full code sample (maven project) can be found on github: https://github.com/cyrus13/anastasakis-net-sample-code/tree/master/spring-data-batch

You basically need to have the following elements:

  • Add: ?rewriteBatchedStatements=true to the end of the connectionstring.
  • Make sure you use a generator that supports batching in your entity. E.g.

  • Use the: save(Iterable<S> paramIterable); method of the JpaRepository to save the data.
  • Use the: hibernate.jdbc.batch_size configuration.

RESULT

So enabling the query log in MySQL:

we can see the following mysql code is executed:

Note: Don’t forget to stop logging statements into MySQL general log!

SET global general_log = 0;

Today I decided to update to OSX Yosemite. How much time would it take? 1 hour, 2hours? It appeared to get stuck to “one minute to finish” the installation for quite some time. During the upgrade process if you move the cursor to the top of the screen a menu will appear. One of the options is: “Show Log”. This option pops up a window that shows detailed information about the process.

In my case for around half an hour it was logging events like this:

It appears it was taking a lot of time processing the texlive files. I had installed texlive in the past, but i haven’t used it at least in a year. So, before upgrading to Yosemite make sure to delete any unwanted applications for a faster upgrade!

I installed a fresh Ubuntu desktop guest on VirtualBox 4.3.14. Trying to install VBox additions I was getting an error message:

The headers for the current running kernel were not found. If the following
module compilation fails then this could be the reason.

 

Apparently all I had to do is reinstall the headers using the following two commands:

 

Trying to reinstall Linux Additions for VirtualBox works like a charm!

I am using Logback to do most of my logging. I am using a third party library for some computation. Luckily the library uses slf4j, meaning it’s agnostic of the logging implementation.

The library caches two collections of items and depending on the circumstances the item I am looking for may be in any of the two collections. This is not important. What is important, is that the library for some reason logs an error if an item cannot be found in the first collection.

I wanted to get rid of this message because it was cluttering my logs. But I didn’t want to disable all logging from this specific class, because it may print some useful logs. I just wanted to get rid of this specific log. There is a way to do it Continue reading

I was reading a blog post the other day that in order to upgrade from OSX Lion to OSX Mavericks you need to first upgrade to mountain lion (paying the $20 upgrade fee). Checking the description by apple as pointed out here: http://www.apple.com/osx/how-to-upgrade/
it is evident that there is no such requirement:

You can upgrade to OS X Mavericks from Snow Leopard (10.6.8), Lion (10.7), or Mountain Lion (10.8).

So, I have backed up everything and I am going to start the upgrade process now!

Since Spring 3.2 it should be possible to use a qualifier in the “Async” annotation of a method, to indicate which specific executor to use. For example, I have the following class, that is supposed to collect the HTML from a website asynchronously:

HTMLFetcher Interface

TestHTMLFetcher Class

And the following excerpt from the context.xml file:

This didn’t work! I submitted 10 tasks to the executor and 10 thread, instead of 1 as I had instructed it had been created. I digged into the Spring code a bit and found the AnnotationAsyncExecutionInterceptor class that was doing tbe job of assigning tasks from methods annotated as async to executors. Putting  a breakpoint on its getExecutorQualifier method it became evident why it doesn’t work.

You need to annotate the interface method rather than the class with the:

annotation. In my example that is the getHTML method of the HTMLFetcher interface. It now works. I am not sure if it was done on purpose (i.e. if it’s part of the specification). I don’t have time to read the related documentation or search the Spring Jira. However, I would assume that the right place to put the annotation is the implementation. I may want to have two implementations of the same method, one decorated with “Async” that will be asynchronously executed and another without any annotation that will be synchronously executed.

I wanted to add a new device to my wireless network, but I had forgotten the wireless key. My network uses WEP encryption. So, I used aircrack to recover the key. Basically what I had to do was:

    • use airodump to save a large number of  transmissions between the wireless router and a device that is already connected to the network.
    •  use aircrack to analyse the file produced by airodump and find the password

In particular, I managed to recover the key in the following simple steps:

  1. Download BackTrack Linux Distribution and burn it on a DVD.
  2. Boot my laptop using the live DVD
  3. On command prompt type:
    ifconfig to see the available network interfaces in Linux. Doing this I was able to find my laptop’s wireless interface.
  4. Type: airodump-ng -write afile.cap wlan0 , where afile.txt is the file that airodump will save all communication and wlan0 is the wireless network interface of my laptop as discovered in step 3. Let it run and collect packets for quite some time. It may a few hours (in my case it took 6 hours) for this step to collect enough packets. The time it will require depends on the traffic of the network. The more traffic the better. Once enough packets have been collected press Ctr+c to kill the process.
  5. Type: ivstools -convert afile.cap afile.ivs to convert the captured packets to ivs format compatible with aircrack
  6. Use aircrack-ng afile.ivs. Aircrack will pop up a menu to ask you which network you want to crack. Select the SSID of your network and if enough packets have been collected in step 3, you will have the key of your wireless network in no time!