I have recently started using MongoDB for a very demanding task: storing all Forex pair ticks (i.e. every change in bid/ask prices). I know there are databases designed for this task (e.g. kdb+), but I wanted to avoid the learning curve. Besides, I already use Spring Data in my project, and it works with Mongo with a minimal number of changes.

In Mongo I have a collection with more than 3.5 billion records (and growing) and I want to find the latest date for each pair. I tried using the aggregation framework of Mongo, but it doesn’t seem to use the indexes and takes ages (didn’t finish after one day).

Relational Structure

In a relational DB, the table structure would look something like:

id   pair     dateTime                 bid      ask
1    EUR/USD  2015-04-03 21:32:31.456  1.14141  1.14142
...  ...      ...                      ...      ...

Then you would have to run the following query:

 

MongoDB Aggregation Framework

In MongoDB the document structure is the same. I am very much a novice Mongo user, but I gather we could use the aggregation framework for this query:

However, this takes ages, even though I have used a composite index on pair and dateTime.

 

Very Fast Result Using MongoShell

I tried using a sort of iterative approach using MongoShell:

This approach seems to use the composite index on pair and dateTime I defined and the results are lightning fast (for 3.5 billion records).

Maybe there are other ways, but after some digging around I couldn’t find any other method that would use indexes.
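To illustrate why a per-pair lookup can be so much cheaper than a full scan, here is a minimal in-memory sketch in Java. It is not MongoDB code: a TreeSet stands in for the composite index on pair and dateTime, and the names LatestTickSketch, Key and latestPerPair are made up for this example. The idea is that the last entry for a pair in a sorted index is its latest tick, so one seek per pair is enough.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

final class LatestTickSketch {

    /** Hypothetical index entry, ordered like a composite index on (pair, dateTime). */
    static final class Key implements Comparable<Key> {
        final String pair;
        final long dateTime; // epoch milliseconds

        Key(String pair, long dateTime) {
            this.pair = pair;
            this.dateTime = dateTime;
        }

        @Override
        public int compareTo(Key other) {
            int byPair = pair.compareTo(other.pair);
            return byPair != 0 ? byPair : Long.compare(dateTime, other.dateTime);
        }
    }

    /**
     * Walks the sorted index backwards, doing one seek per distinct pair
     * instead of visiting every tick.
     */
    static Map<String, Long> latestPerPair(TreeSet<Key> index) {
        Map<String, Long> latest = new LinkedHashMap<>();
        // The very last entry is the latest tick of the greatest pair.
        Key current = index.isEmpty() ? null : index.last();
        while (current != null) {
            latest.put(current.pair, current.dateTime);
            // Jump past all older ticks of this pair: the greatest key that
            // sorts before (pair, MIN_VALUE) is the latest tick of the previous pair.
            current = index.lower(new Key(current.pair, Long.MIN_VALUE));
        }
        return latest;
    }
}
```

With N ticks and P pairs, this does roughly P seeks of O(log N) each, which is the kind of access pattern an index-backed "sort by dateTime descending, limit 1" query per pair would be expected to have.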

It’s been ages since I posted any sample code, mainly because I don’t have time to collect and post it anymore. This one was a bit more challenging and googling didn’t help much, so now that I have some time I thought I would post some sample code that achieves batch inserts with Spring Data.

For example this link:
http://forum.spring.io/forum/spring-projects/data/118203-bulk-insert-with-crudrepository indicated that I had to manually get the session and iterate/flush (which was true when using Spring/Hibernate/JPA). But when using a CrudRepository, it turns out to be much simpler.

FULL CODE

Full code sample (maven project) can be found on github: https://github.com/cyrus13/anastasakis-net-sample-code/tree/master/spring-data-batch

You basically need to have the following elements:

  • Add ?rewriteBatchedStatements=true to the end of the connection string.
  • Make sure you use a generator that supports batching in your entity. E.g.

  • Use the: save(Iterable<S> paramIterable); method of the JpaRepository to save the data.
  • Use the: hibernate.jdbc.batch_size configuration.
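One small caveat on the first bullet: appending "?rewriteBatchedStatements=true" only works if the connection string has no query parameters yet. If it already has some, the flag has to be joined with "&" instead. A minimal sketch of that rule follows; the class and method names (JdbcUrlSketch, withBatchRewrite) are hypothetical and not part of the sample project.

```java
final class JdbcUrlSketch {

    private static final String FLAG = "rewriteBatchedStatements=true";

    /** Appends the batching flag to a JDBC URL unless the property is already set. */
    static String withBatchRewrite(String jdbcUrl) {
        if (jdbcUrl.contains("rewriteBatchedStatements=")) {
            return jdbcUrl; // already configured, leave it untouched
        }
        String separator;
        if (jdbcUrl.endsWith("?") || jdbcUrl.endsWith("&")) {
            separator = "";            // URL already ends with a separator
        } else if (jdbcUrl.contains("?")) {
            separator = "&";           // other parameters are present
        } else {
            separator = "?";           // first parameter
        }
        return jdbcUrl + separator + FLAG;
    }
}
```

For example, withBatchRewrite("jdbc:mysql://localhost:3306/test") yields "jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true" (host and database name are placeholders).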

RESULT

So enabling the query log in MySQL:

we can see the following mysql code is executed:

Note: Don’t forget to stop logging statements into MySQL general log!

SET global general_log = 0;

Since Spring 3.2 it should be possible to use a qualifier in the “Async” annotation of a method, to indicate which specific executor to use. For example, I have the following class, that is supposed to collect the HTML from a website asynchronously:

HTMLFetcher Interface

TestHTMLFetcher Class

And the following excerpt from the context.xml file:

This didn’t work! I submitted 10 tasks to the executor and 10 threads had been created, instead of the 1 I had configured. I dug into the Spring code a bit and found the AnnotationAsyncExecutionInterceptor class, which does the job of assigning tasks from methods annotated as async to executors. Putting a breakpoint on its getExecutorQualifier method made it evident why it doesn’t work.

You need to annotate the interface method rather than the class with the:

annotation. In my example that is the getHTML method of the HTMLFetcher interface. It now works. I am not sure if it was done on purpose (i.e. if it’s part of the specification). I don’t have time to read the related documentation or search the Spring Jira. However, I would assume that the right place to put the annotation is the implementation. I may want to have two implementations of the same method, one decorated with “Async” that will be asynchronously executed and another without any annotation that will be synchronously executed.
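A plausible explanation for this behaviour (an assumption, not something verified against the Spring source or documentation) is that, with interface-based JDK dynamic proxies, the interceptor receives the interface's Method object, and annotations placed on the implementing method are not visible through it. The plain-Java sketch below demonstrates only that general proxy behaviour. It uses no Spring at all: RunOn is a made-up stand-in for the async annotation, and all class names are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

final class ProxyAnnotationSketch {

    /** Hypothetical stand-in for an annotation carrying an executor qualifier. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface RunOn {
        String value();
    }

    /** Variant 1: annotation on the interface method. */
    interface AnnotatedFetcher {
        @RunOn("singleThreadExecutor")
        String getHTML(String url);
    }

    static final class PlainFetcherImpl implements AnnotatedFetcher {
        @Override
        public String getHTML(String url) { return "<html/>"; }
    }

    /** Variant 2: annotation on the implementation only. */
    interface PlainFetcher {
        String getHTML(String url);
    }

    static final class AnnotatedFetcherImpl implements PlainFetcher {
        @Override
        @RunOn("singleThreadExecutor")
        public String getHTML(String url) { return "<html/>"; }
    }

    /** Returns the qualifier a JDK-proxy handler sees on the invoked Method, or null. */
    static <T> String qualifierSeenByProxy(Class<T> iface, T target, Consumer<T> call) {
        AtomicReference<String> seen = new AtomicReference<>();
        InvocationHandler handler = (self, method, args) -> {
            // 'method' is the interface's Method, not the implementation's.
            RunOn runOn = method.getAnnotation(RunOn.class);
            seen.set(runOn == null ? null : runOn.value());
            return method.invoke(target, args);
        };
        T proxied = iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
        call.accept(proxied);
        return seen.get();
    }
}
```

In this sketch the handler finds the qualifier for AnnotatedFetcher but gets null for PlainFetcher, even though AnnotatedFetcherImpl carries the annotation. An interceptor would have to look up the corresponding method on the target class to see it.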

I wanted to add a new device to my wireless network, but I had forgotten the wireless key. My network uses WEP encryption. So, I used aircrack to recover the key. Basically what I had to do was:

    • use airodump to save a large number of transmissions between the wireless router and a device that is already connected to the network.
    • use aircrack to analyse the file produced by airodump and find the password.

In particular, I managed to recover the key in the following simple steps:

  1. Download BackTrack Linux Distribution and burn it on a DVD.
  2. Boot my laptop using the live DVD
  3. On command prompt type:
    ifconfig to see the available network interfaces in Linux. Doing this I was able to find my laptop’s wireless interface.
  4. Type: airodump-ng --write afile.cap wlan0 , where afile.cap is the file to which airodump will save all communication and wlan0 is the wireless network interface of my laptop as discovered in step 3. Let it run and collect packets for quite some time. It may take a few hours (in my case it took 6 hours) for this step to collect enough packets. The time required depends on the traffic on the network. The more traffic the better. Once enough packets have been collected, press Ctrl+C to kill the process.
  5. Type: ivstools --convert afile.cap afile.ivs to convert the captured packets to the ivs format compatible with aircrack.
  6. Run aircrack-ng afile.ivs. Aircrack will pop up a menu asking which network you want to crack. Select the SSID of your network and, if enough packets were collected in step 4, you will have the key of your wireless network in no time!

I ran into an investment advice book printed in 2002 (i.e. before the recession) and the preface starts like this: “(The US people should be) positive about buying or refinancing a home at historically low interest rates or buying a new car under new no-interest offers”. Obviously the recession came later on and everyone who invested in homes (and cars) lost a good deal of money. Moreover, many articles about the recession and its effect on the housing market mentioned that when interest rates are so low, there’s only one way for them to go… UP.

So… I was wondering whether I should spend my valuable time reading the remaining 280 pages… I guess not… 🙂

Zenbe Lists has been down for the past few days and I am losing track of my things to do. I had a huge list of things to do on my days off, but… Anybody got any suggestions for a to-do list app that works with iOS 3.0.1 and will sync with a PC? I guess I could always try a pen and a piece of paper, but I am afraid it will take me some time to remember how to use this archaic user interface.

Recently I’ve been playing around with a WebBrowser control to automate my interaction with a website for testing purposes. I am using C# and .NET.

I found it a bit difficult to change the value of an HTML combobox (i.e. dropdown), but as it turns out it’s rather easy. I used the following code to change the value of a combobox named “test” to the value 12.

This code was used for the following HTML:

Finally the following code was required to press the “Submit” button.
 

I have a Windows XP installation on a VMWare hard disk. Today I tried to boot it, but… OOPS (no.. I don’t mean Object Oriented Programming and Systems… I mean..crap!). It seems I forgot the password of the installation. So a little adventure started…

1. After a bit of research I found out that there is ophcrack. I downloaded the live CD as an ISO image and set VMWare to load that CD.

2. When VMWare starts and before windows starts booting I clicked on the VMWare screen and pressed ESC. This gives me the menu to select the device I want to use to boot.

3. I chose to boot from the CD.

4. The ophcrack live CD starts loading, but when it finishes I get a:  “No partition containing hashes found” error.

5. The problem seems to be that the Windows installation is on a SCSI virtual disk that is not recognised by this distribution of Linux. I tried “fdisk -l” on a terminal from within the ophcrack live CD and it didn’t return any results. To be able to crack the password I need to have access to the “Windows/System32/config/” folder of my virtual hard disk. So…

6. I created a second virtual hard disk in the same VMWare virtual machine and downloaded an ISO image of Ubuntu.

7. Installed Ubuntu on the newly created hard disk.

8. Booted into Ubuntu. Ubuntu was able to access the virtual hard disk of the Windows installation. I copied the folder “Windows/System32/config/” to my local Windows 7 installation.

9. Downloaded ophcrack for Windows and installed it on my Windows 7. Also downloaded the XP Free Small Table.

10. Launched ophcrack and clicked on “Tables”->Install and selected the folder where I had downloaded the XP Free Small Table file (if it is a zip file you need to unzip it).

11. Selected Load->Encrypted SAM and chose the “config” folder I had copied from the VMWare installation (through Ubuntu).

12. Got my password in 45 seconds!!

I have developed a component in Java that requires an HTML parser. The component goes through around 2000 webpages and gets some data.

It was quite easy to implement using org.htmlparser (http://htmlparser.sourceforge.net/). However, the memory of the component seemed to grow inexplicably, leading to a Java heap out of memory error, and not only because some of the webpages are quite big (up to a few hundred MB). I spent a good deal of time trying to figure out the source of the leak, thinking it was my code. After a few attempts to identify the problem, I used the IBM Support Assistant workbench and took a heap dump using the command:

jmap -dump:format=b,file=heap.bin processID

I was able to identify a lot of Finalizer objects referencing org.htmlparser.lexer. Does this look like a memory leak, where the garbage collector can’t collect the objects properly?

Well… the fact of the matter is I haven’t spent an enormous amount of time reading the documentation and/or source code of the project. It seems there is a close() method that can be called on the Page reference of the lexer, and I hadn’t used it. So, at the end of my method that does the parsing I added:

parser.getLexer().getPage().close();
parser.setInputHTML("");

The first statement closes the Page object. I added the second statement just to be on the safe side, even though it’s probably redundant.

And the “Memory Leak” seems to have vanished…