I have recently started using MongoDB for a very demanding task. Storing all Forex pair ticks (e.g. every change in bid/ask prices). I know there are frameworks designed for this task (e.g. kdb+), but since I wanted to avoid the learning curve. Besides I already use Spring Data in my project and it works with a minimal number of changes for Mongo.
In Mongo I have a collection with more than 3.5 billion records (and growing) and I want to find the latest date for each pair. I tried using the aggregation framework of Mongo, but it doesn’t seem to use the indexes and takes ages (didn’t finish after one day).
Relational Structure
In relational DB the table structure would look something like:
id | pair | dateTime | bid | ask |
1 | EUR/USD | 2015-04-03 21:32:31.456 | 1.14141 | 1.14142 |
... | | ... | ... | ... |
Then you would have to run the following query:
SELECT t.pair, MAX(t.dateTime)
FROM tick_data t
GROUP by t.pair;
MongoDB Aggregation Framework
In MongoDB the document structure is the same. I am a very very novice user of Mongo, but I gather we could use the aggregation framework for this query:
db.tick_data.aggregate(
{$group:{_id:"pair", "maxValue": {$max:"dateTime"}}}
);
However, this takes ages, even though I have used a composite index on pair and dateTime.
Very Fast Result Using MongoShell
I tried using a sort of iterative approach using MongoShell:
db.tick_data.distinct( "pair" ).forEach(function(per_pair) {
var lastTickPerPair = db.tick_data.find({ "pair": per_pair }).sort({"dateTime": -1}).limit(1);
var lastTickOfPair = lastTickPerPair.hasNext() ? lastTickPerPair.next() : null;
print( "pair: " + lastTickOfPair.pair + ", dateTime:" + lastTickOfPair.dateTime);
}
);
This approach seems to use the composite index on pair and dateTime I defined and the results are lightning fast (for 3.5 billion records).
Maybe there are other ways, but after some digging around I couldn’t find any other method that would use indexes.