
org.htmlparser “Memory Leak”?

I have developed a component in Java that requires an HTML parser. The component goes through around 2000 webpages and gets some data.

It was quite easy to implement using org.htmlparser (http://htmlparser.sourceforge.net/). Some of the webpages are quite big (up to a few hundred MB), but even allowing for that, the memory used by the component kept growing inexplicably until it failed with a Java heap out-of-memory error. I spent a good deal of time trying to find the source of the leak, thinking it was in my code. After a few attempts to identify the problem, I used the IBM Support Assistant workbench and took a heap dump using the command:

jmap -dump:format=b,file=heap.bin processID

I was able to identify a lot of Finalizer objects referencing classes in org.htmlparser.lexer. Was this a memory leak, where the garbage collector can't collect the objects properly?

Well, the fact of the matter is that I hadn't spent much time reading the project's documentation or source code. It turns out there is a close() method that can be called on the lexer's Page object, and I wasn't calling it. So, at the end of the method that does the parsing I added:

parser.getLexer().getPage().close();
parser.setInputHTML("");

The first statement closes the Page object. I added the second statement just to be on the safe side, even though it’s probably redundant.

And the “Memory Leak” seems to have vanished…
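
If the parsing code can throw, the cleanup above is easier to guarantee from a finally block. The following is only a sketch of that pattern: StandInPage and parseOne are hypothetical names used so the example compiles without the library, and in the real component the finally block would hold the parser.getLexer().getPage().close() call instead.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class CloseAfterParse {
    // Counts stand-in pages that have been opened but not yet closed.
    static final AtomicInteger OPEN_PAGES = new AtomicInteger();

    // Hypothetical stand-in for the library's Page object (not the real class).
    static class StandInPage implements Closeable {
        StandInPage() { OPEN_PAGES.incrementAndGet(); }
        @Override public void close() { OPEN_PAGES.decrementAndGet(); }
    }

    // Processes one page and always releases it, even when parsing throws.
    static int parseOne(String html) {
        StandInPage page = new StandInPage();
        try {
            if (html == null) {
                throw new IllegalArgumentException("no input");
            }
            return html.length(); // placeholder for the real extraction work
        } finally {
            page.close(); // real code: parser.getLexer().getPage().close();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 2000; i++) {
            parseOne("<html>page " + i + "</html>");
        }
        System.out.println("pages still open: " + OPEN_PAGES.get());
    }
}
```

With this shape, a page that fails halfway through no longer leaves its resources behind for the finalizer.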


Installing Visual Editor on Eclipse Helios

I recently wanted to install Visual Editor on Eclipse Helios. It turns out there is a nice way to do it, which worked like a charm for me. For more details, have a look here: http://sourceforge.jp/projects/tmdmaker/wiki/VisualEditor1.4.0ForHelios


Must have Eclipse plugins

From time to time I develop something on Eclipse. Apart from the usual Eclipse operations (build and test) that I do with the Eclipse Ant and JUnit plugins (they ship with Eclipse for Java), I also wanted to carry out some dependency analysis on existing code. Of course JDepend is the obvious option, but I found a nice article from IBM that suggests some additional plugins that help the coder with day-to-day tasks such as complexity monitoring and coding standard analysis.

The following is a summary of the plugins proposed by the article. All of them are open source and free.

Tool        Purpose                      URL for Eclipse plugin
CheckStyle  Coding standard analysis     http://eclipse-cs.sourceforge.net/update/
Coverlipse  Test code coverage           http://coverlipse.sf.net/update
CPD         Copy/Paste detection         http://pmd.sourceforge.net/eclipse/
JDepend     Package dependency analysis  http://andrei.gmxhome.de/eclipse/
Metrics     Complexity monitoring        http://metrics.sourceforge.net/update

The full IBM article can be found here: http://www.ibm.com/developerworks/java/library/j-ap01117/


Exception using XMLConfiguration

I tried using the Apache Commons Configuration XMLConfiguration class to save the settings of an application I am developing in XML format. With Java 1.5 it all worked perfectly; with Java 1.6, however, I got the following exception:


Exception in thread "main" java.lang.AbstractMethodError: org.apache.xerces.dom.DocumentImpl.getXmlStandalone()Z
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.setDocumentInfo(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO.parse(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
at org.apache.commons.configuration.XMLConfiguration.save(XMLConfiguration.java:880)
at org.apache.commons.configuration.AbstractHierarchicalFileConfiguration$FileConfigurationDelegate.save(AbstractHierarchicalFileConfiguration.java:454)
at org.apache.commons.configuration.AbstractFileConfiguration.save(AbstractFileConfiguration.java:546)
at org.apache.commons.configuration.AbstractFileConfiguration.save(AbstractFileConfiguration.java:513)
at org.apache.commons.configuration.AbstractFileConfiguration.save(AbstractFileConfiguration.java:491)
at org.apache.commons.configuration.AbstractFileConfiguration.save(AbstractFileConfiguration.java:403)
at org.apache.commons.configuration.AbstractHierarchicalFileConfiguration.save(AbstractHierarchicalFileConfiguration.java:199)

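After some searching on Google I found that this can be a problem with the Xerces XML library (xercesImpl.jar). The stack trace fits that explanation: the JDK's built-in transformer calls getXmlStandalone(), a DOM Level 3 method that an old Xerces DocumentImpl does not implement. Checking the 3rd party libraries I am using in my project, I found one that was using and distributing an earlier version of xercesImpl.jar.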
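
One quick way to confirm which jar is supplying the DOM implementation is to ask the class where it was loaded from. This is a minimal sketch that uses only JDK classes; WhichXmlParser and locationOf are names made up for the example. When an old xercesImpl.jar is on the classpath, its path should show up in the output; when the JDK's own parser is used, there is usually no code source at all.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class WhichXmlParser {
    // Returns the location a class was loaded from, or a note when none is recorded.
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "no code source (typically a class bundled with the JDK)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // Build an empty document with whatever parser the classpath selects.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        System.out.println(doc.getClass().getName() + " <- " + locationOf(doc.getClass()));
    }
}
```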

I downloaded the latest version (2.9.1) of xercesImpl.jar from the Apache Xerces project (http://xerces.apache.org/mirrors.cgi#binary) and used it to replace the xercesImpl.jar file shipped with the 3rd party library. Now everything seems to work fine, even in Java 1.6.  :-)


Log4j result in JTextPane

I recently wanted to redirect the output of Log4j to a JTextPane, so as to output messages to the JTextPane using a different color depending on the severity.

I spent some time searching for information and I quickly put together a small example. The following screenshot shows the output.

screenshot.jpg

The source code is available here.
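
For readers who only want the gist, the coloring part can be sketched with plain Swing classes. This is not the example linked above, just a minimal illustration: SeverityColors, colorFor and appendLine are hypothetical names, the level-to-color mapping is an arbitrary choice, and the Log4j wiring is left out because it needs the Log4j jar. In a Log4j 1.x setup, a custom appender extending AppenderSkeleton could call appendLine from its append(LoggingEvent) method.

```java
import java.awt.Color;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

public class SeverityColors {
    // Maps a severity name to a text color (the mapping is an assumption).
    static Color colorFor(String level) {
        switch (level) {
            case "FATAL":
            case "ERROR": return Color.RED;
            case "WARN":  return Color.ORANGE;
            case "DEBUG": return Color.GRAY;
            default:      return Color.BLACK;
        }
    }

    // Appends one message to the end of the document in the color of its severity.
    static void appendLine(StyledDocument doc, String level, String message)
            throws BadLocationException {
        SimpleAttributeSet attrs = new SimpleAttributeSet();
        StyleConstants.setForeground(attrs, colorFor(level));
        doc.insertString(doc.getLength(), level + " " + message + "\n", attrs);
    }

    public static void main(String[] args) throws BadLocationException {
        // In a GUI this would be textPane.getStyledDocument() on a JTextPane.
        StyledDocument doc = new DefaultStyledDocument();
        appendLine(doc, "INFO", "application started");
        appendLine(doc, "ERROR", "could not save settings");
        System.out.print(doc.getText(0, doc.getLength()));
    }
}
```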