Improved html opinion text

January 26th, 2008

For the HTML view of a case, I previously used the unprocessed text output from the PDF-to-text converter.  I’ve now written a parser that applies some formatting to the opinion text, so you’ll be able to see paragraph breaks, page breaks, and footnotes a little more clearly.  Hopefully it’s something I can keep improving over time.
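As a rough illustration of the kind of parsing involved, here is a minimal Python sketch, not Citegraph’s actual parser. It assumes the converter marks page breaks with form-feed characters (pdftotext does this by default) and that paragraphs are separated by blank lines. Footnote detection is left out because it depends on the layout of the opinions. The function name `format_opinion` and the `page` class and ids are hypothetical.

```python
import html
import re


def format_opinion(raw_text: str) -> str:
    """Turn raw PDF-to-text output into simple HTML (sketch).

    Assumes pages are separated by form feeds ("\f") and paragraphs
    by blank lines; footnotes are not handled here.
    """
    pages_html = []
    for page_number, page in enumerate(raw_text.split("\f"), start=1):
        # Split each page into paragraphs on blank (or whitespace-only) lines.
        paragraphs = [p for p in re.split(r"\n\s*\n", page) if p.strip()]
        if not paragraphs:
            continue  # skip empty pages, e.g. after a trailing form feed
        # Collapse the converter's hard line wraps into single spaces.
        body = "\n".join(
            "<p>" + html.escape(" ".join(p.split())) + "</p>" for p in paragraphs
        )
        pages_html.append(f'<div class="page" id="page-{page_number}">\n{body}\n</div>')
    # Mark page boundaries with a horizontal rule.
    return "\n<hr>\n".join(pages_html)
```

For example, `format_opinion("First para\nline two.\n\nSecond & para.\fPage two.\f")` yields two page divs separated by one `<hr>`, with the wrapped first paragraph rejoined onto one line.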

Cite Rate

January 16th, 2008

Nodes were previously color-coded according to degree: the more connected a node, the closer to the red end of the spectrum it appears.  Nodes also appear larger in the graph as they become more connected.  The color coding and node sizing were therefore redundant, encoding the same information: degree.  Moreover, a node’s degree is also shown explicitly by the number of edges radiating from it.

So I began to think about whether the color coding could be put to use for something else.  In legal opinion citation networks, citations tend to be biased toward older cases; to some extent, how often a case is cited is a function of how long ago it issued.  So I thought it would be worth trying to encode citation rate with color.

Now, for each case appearing in the graph, Citegraph calculates its citation rate: the number of times the case has been cited divided by the number of days since the opinion issued.  This takes the age of the case out of the citation frequency equation.  Cases with high citation rates appear toward the red end of the spectrum; cases with low citation rates appear toward the blue end.
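A minimal Python sketch of the calculation described above, plus one way to map the result onto a blue-to-red scale. The function names, the linear scaling between the lowest and highest rates in the graph, and the use of HSL hues (240 for blue down to 0 for red) are assumptions for illustration, not necessarily how Citegraph does it.

```python
from datetime import date


def citation_rate(times_cited: int, issued: date, today: date) -> float:
    """Citations per day since the opinion issued."""
    # Treat a same-day opinion as one day old to avoid dividing by zero.
    days = max((today - issued).days, 1)
    return times_cited / days


def rate_to_color(rate: float, min_rate: float, max_rate: float) -> str:
    """Map a rate onto a blue (low) to red (high) hue as a CSS HSL string.

    Assumes linear scaling between the lowest and highest rates in the graph.
    """
    if max_rate == min_rate:
        t = 0.5  # all cases have the same rate: use the middle of the scale
    else:
        t = (rate - min_rate) / (max_rate - min_rate)
    hue = round(240 * (1 - t))  # 240 = blue, 0 = red
    return f"hsl({hue}, 100%, 50%)"
```

In use, you would compute `citation_rate` for every case in the graph, take the minimum and maximum, and pass each case’s rate through `rate_to_color`. A case cited 10 times in the 10 days since it issued has a rate of 1.0 and, if that is the highest rate in the graph, comes out pure red.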

All this gives the user a little more information.  The size of a node now indicates its degree (how many cases it is connected to in the graph), and its color indicates its citation rate, a sort of age-independent barometer of how often the case is cited.

Consequently, a relatively recent red case may represent a new, important decision from the Court, while larger, highly connected cases with intermediate citation rates could represent the oldies but goodies.

Being user friendly

January 12th, 2008

I’ve gone back and forth on what the front page needs to do. Maybe it should explain what the site is all about. Or maybe something less than that.

But, right now, I’ve chosen the latter, and you are greeted with just a graph. One that you didn’t even generate.

And maybe that’s not the friendliest thing to do.

What is a brand new user to think? It’s really difficult for me to imagine, because I understand this and have been working on it for a couple of years. But I don’t imagine many new users get it, especially lawyers [who are not used to pictures in their legal research].

“What the heck is this? There’s no explanation of what’s going on here at all. What are these bouncy colorful circles? Why are some connected to others? I give up. I want my boring large corporate legal search engine back.”

But I figure smart patent lawyers will work it out sooner or later.

New Host, Site Makeover

January 11th, 2008

Over the last couple of weeks, I’ve been substantially reworking the site.

First, I’ve tried to make things more web-standards-ish.  Goodbye tables, hello CSS.  While I agree in principle with most of the rationale for CSS, there was something to be said for old-fashioned tables and tags that all browsers understood and handled consistently.  The same cannot be said for CSS.  Yet.

I’ve also significantly changed things on the backend.  I’m still using Kyle Scholz’s terrific jsviz package to do all the visualization in the browser, sans applet.  But I’ve rewritten some of the SQL and the scripts that handle query processing so that query results start rendering a little more quickly.  I’ve also finally figured out how to patch some of the UI irritations I was having with jsviz, in particular keeping track of where the cursor or node is within the graph display, which lets the case cite tooltip and node dragging work as you would expect.

The changes also motivated me to find a new host, and I’ve happily landed at Media Temple with their base-level grid-service hosting.  Sure, it’s expensive, but they’ve got some nice, slick UIs for account access and configuration, and the service seems, so far, to be a step or two quicker than before.  They also make it extremely easy to incorporate a blog into your site, which explains this blog’s sudden appearance.

On the to-do list is figuring out a way to whittle down the search results returned in the graph.  Right now, I just dump every case that is a text match for the query into the graph.  More often than not, that means more nodes than jsviz and your computer can reasonably handle, and a lot of unconnected nodes.  The logical approach would be to select the top N nodes by degree, but ties are common in these graphs, and deciding what to do in the event of a tie is a problem.  For example, after taking the top 23 nodes, there might be 34 nodes with a degree of 1 left; if I only want to return 30 nodes total, how do I choose among those 34 candidates for the final 7 spots?
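Here is a minimal Python sketch of the top-N-by-degree idea. It doesn’t answer the tie question, but it makes the tie explicit: it returns the selected nodes along with the full group of candidates that share the cutoff degree. It also accepts a hypothetical secondary `tiebreak` score (citation rate, say) for ordering within a tie. With no tiebreak, ties fall back to alphabetical order, which is deterministic but arbitrary. The function and parameter names are illustrative only.

```python
from typing import Callable, Dict, List, Tuple


def top_by_degree(
    degree: Dict[str, int],
    n: int,
    tiebreak: Callable[[str], float] = lambda case: 0.0,
) -> Tuple[List[str], List[str]]:
    """Pick up to n nodes by degree (sketch).

    Returns (selected, tied). `tied` lists every candidate sharing the
    degree of the last selected node when the cutoff falls inside that
    group; otherwise it is empty. `tiebreak` is a hypothetical secondary
    score where higher wins; remaining ties are broken by node name.
    """
    ranked = sorted(degree, key=lambda c: (-degree[c], -tiebreak(c), c))
    selected = ranked[:n]
    if not selected or len(ranked) <= n:
        return selected, []  # everything fits: nothing to break
    cutoff = degree[selected[-1]]
    if degree[ranked[n]] != cutoff:
        return selected, []  # clean break: no tie straddles the cutoff
    tied = [c for c in ranked if degree[c] == cutoff]
    return selected, tied
```

With the numbers from the example (23 nodes of higher degree, 34 nodes of degree 1, and n = 30), `selected` holds all 23 plus 7 of the degree-1 nodes, and `tied` lists all 34 degree-1 candidates, which is exactly the set a better tiebreak would have to choose from.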