Wednesday, April 06, 2005

mystery NullPointerException from Hibernate

So, I spent the entire weekend on what was expected to be a fairly simple application bug. See, there is this page that displays a list of summary information about Thingies. Each Thingie has several other associated objects such as Thing-a-ma-bobs, etc. If you went to the page and hit refresh in your browser several times, you would eventually get a NullPointerException (that went unhandled, by the way). Hit refresh again, the listing of Thingies would display. Hit refresh a few more times and you'd probably get the Exception again.

The thought was that Hibernate (or this application's implementation of Hibernate) had a leak somewhere or was just doing too much dang work to display this screen. The application owners wanted to reduce the amount of work for this page, likely by not loading all the associated objects (what the geeks like to call 'lazy-loading').

Well, I tried that. And the Exception still happened. With the need to understand what the real problem was so I could actually fix it, I spent plenty of time getting the code setup in my workspace. I mean a _L_O_T_ of time. Cyclic dependencies and duplicate classes abound. Ick. Multiple properties files with some correct database information and some INcorrect database information. Which was correct? Oh yeah, and did I mention I was grabbing the code from the WRONG directory for the first few hours? Oy! Just because you see the name of the app in a directory name does not mean that's the correct directory. Several hours later, it was ready. And then: I couldn't reproduce the problem. Double check-checked jar versions and all was identical. Okay, what's different? Well, the entire operating system for one. JVM versions were the same (though from different vendors). There was the fact that Eclipse debugging is a dog on my desktop, so it could just be a timing issue. My slow machine could be slowing it down just enough to appear like everything was okay.

So, with the code on my machine and the problem observable on the (ahem) Windows server with only FTP access, I set about adding logs here and there and manually uploading changed class files as needed. The fact that the FTP server dropped my connection after about 7 seconds of inactivity was an added "bonus". The added logs told me nothing I didn't already know. Tried running Tomcat in profiling mode on the server. The problem went away! That made me think it's gotta be a race condition buried somewhere. Profiling slowed things down enough that it no longer stepped on itself. Then I thought, "Well, let's prove or disprove that the code being called from this page is actually the problem". I created a load test to repeatedly call the module 50 times each on 5 different threads. Worked great on my machine. Worked great on the server. Why didn't I think of this earlier?!?

It can't be just that code, then. It's got to be a combination of things. I double-checked the application flow and there wasn't much else going on outside of the load-tested method.....just some basic Struts stuff. Now, this particular server is painfully out-of-date as far as Tomcat is concerned. Hmmmm....could be. I downloaded a close-to-latest version and "installed" it on the server. I began configuration, then realized that the port would be blocked and I wouldn't be able to run it. (Even if I had the rights, I'm no network admin.) What I didn't think of (and am kicking myself) was to shut down the other Tomcat instance and use it's port for the test. Anyway, I left 'upgrade Tomcat' as my last words in the effort to fix the problem. Know what? pssst....(that was it)

I guess the lesson here is that if you're using the "next big thing" persistence framework, everyone assumes you're using the nearly latest servlet engine. Otherwise, Google would've helped me out a long time ago. To all you other poor saps out there, I hope the magical 'G' will find this post and it will help you.

0 Comments:

Post a Comment

<< Home