SecondSpace


Software Engineer

Summary

I had a major impact at SecondSpace, introducing a lot of open source software into this Microsoft shop, including Subversion, Apache Solr, HAProxy, Linux, and Nagios.

My major success at SecondSpace was replacing the entire search backend with Apache Solr. This was a massive undertaking that resulted in a 100x speedup of the search functionality on the website.

Details

I was the first non-Microsoft developer hired at SecondSpace. All of the other developers had come from Microsoft. It was a huge culture shock coming from Amazon and Sparkart where everything was Linux and Open Source. SecondSpace ran some real estate websites including Land Watch and Resort Scape which were built using C#, ASP.NET and SQL Server. I was eager to learn from the more experienced developers who were Microsoft veterans!

“Inference Engine”

There were a lot of buzz terms thrown around, like “inference engine” (which even had a patent), and talk of doing all sorts of magical things with SQL Server stored procedures, all of which sounded very impressive. I was ready to learn! These guys claimed to have all sorts of advanced and sophisticated technology, and they even had several patents.

Unfortunately, it was neither advanced nor sophisticated. In fact the whole setup seemed rather dumb. The big secret was that almost everything was done using SQL Server Stored Procedures. Sometimes these stored procedures were 2,000+ lines of mostly duplicated Transact-SQL code.

Microsoft Visual Source Safe?!

SecondSpace was using Microsoft Visual Source Safe (VSS) for version control. At the University of Washington we were always warned against using VSS since it was easy to corrupt the VSS database and it had locking issues (which kind of defeats the purpose of using source control, right?!). However, since it was the official version control system from Microsoft, that is what SecondSpace used. And sure enough, we ran into database corruption and locking issues while using VSS at SecondSpace.

I had used CVS at Sparkart and at the University of Washington but had switched over to Subversion (the successor to CVS) for personal projects, and I was also playing around with Git by the time I started working at SecondSpace. Git would have been an obvious choice to switch to from VSS, but unfortunately its support for Windows was non-existent at the time. And I knew trying to pitch command line Git via Cygwin was not going to fly.

So I pitched Subversion as a replacement for VSS since it had decent Windows support via TortoiseSVN. Subversion and TortoiseSVN were an easy sell since nobody liked using VSS. So I helped the company switch from VSS to Subversion.

Switching from Visual Source Safe to Subversion was my first open source win at SecondSpace.

One of the downsides of Subversion is that the versioning history is only stored on the Subversion server, so if you wanted to see the history of commits or the history of a file you had to be connected to the server. I actually ended up using git-svn via Cygwin, which allowed me to keep a full copy of the Subversion repository history on my local development machine with support for bi-directional syncing (i.e. I could push and pull from the Subversion repository).
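
For anyone curious, the git-svn workflow boils down to just a few commands (the repository URL here is hypothetical):

    # One-time: clone the Subversion repository into a local Git repository
    git svn clone http://svn.example.com/repo/trunk my-project

    # Pull down new Subversion revisions and rebase any local commits on top of them
    git svn rebase

    # Push local Git commits back to Subversion as individual revisions
    git svn dcommit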

“Scrum”

“Scrum” at SecondSpace was not scrum.

SecondSpace claimed to follow scrum but had their own way of doing it. To me it seemed like the opposite of everything scrum and agile stand for. Here is what the process looked like for a developer:

  • I was handed a list of tasks that were mine to complete during the sprint.
  • The tasks already had estimates (which I had no input on).
  • I was expected to complete all of my tasks during the sprint regardless of whether the estimates were accurate or not.
  • I had no input on the tasks, estimates, prioritization, etc.

Seriously?! WTF?!

I expressed my concerns to the scrum master, other developers, and anybody else who would listen, but nobody seemed interested in changing the process. Maybe this was how they operated at Microsoft, since they seemed set in their ways. It felt like there were two tiers of development team members:

  1. A few senior people who would do all of the planning, discussing, estimating, prioritizing, etc.
  2. The rest of the team who were basically just code monkeys (which I will define as someone who just implements what they are told without asking any questions)

This was not how I liked to work at all.

Over two thousand lines of SQL in one stored procedure?!

Me: All of this duplicated SQL cannot be good. WTF?!

Architect: We need the duplicated code for performance reasons.

Customer: ...<waiting 30+ seconds for a results page to load>...

SecondSpace’s “secret sauce” was that almost all of the business logic was in SQL Server stored procedures. It was not uncommon for a stored procedure to contain 2,000+ lines of SQL, and most of those lines were duplicated multiple times within the same stored procedure as well as across other stored procedures. This was a big red flag for me since it violates the Don’t Repeat Yourself (DRY) principle. If there is a bug in one of those duplicated statements, you also need to remember to apply the fix to every other copy of the statement.

The architect (who had written most of these stored procedures) explained that everything was set up this way for performance. I questioned that assertion and expressed concern over having too much logic in the stored procedures. To me it seemed neither performant nor maintainable. The SQL queries (of which there were many per stored procedure) could have a dozen or more joins with many WHERE conditions across multiple tables. Queries like that just do not perform well and can end up doing full table scans and/or making use of temporary tables.

Things Fall Apart

Sure enough, as more real estate properties were added to the system and more users created accounts, things began to fall apart. Some queries were taking over 30 seconds to run!

The architect and some senior developers threw in a simple caching solution (using SQL Server, of course) that would precompute results for some of the larger pages. The downside was that all personalization, one of the big selling points of the website, had to be turned off.

I think their longer-term plan was to shard data across multiple SQL Servers, or something along those lines, which didn’t seem to solve the actual underlying problem.

Inverted Indexes

I knew there must be a better way. Using complex SQL queries was never going to be fast, so I took it upon myself to figure out a more efficient approach, mostly because that was way more interesting than whatever else I was working on at the time. Each real estate property had attributes associated with it. Attributes could be things like:

  • Location (e.g. State, County, City, ZIP)
  • Type of property (e.g. land, single family home, condo)
  • House Attributes (e.g. square footage, bedrooms, bathrooms, stories, etc.)
  • Land Attributes (e.g. acreage, water access, recreational activities, etc.)

The search results page was basically just filtering on any number of these attributes. So I figured that if each attribute value had a list of matching real estate properties, we could just intersect those lists for whatever attribute values we were interested in. Better yet, if each list of properties was just a sorted set of integers, the intersections could be done very efficiently; I even found some research papers on efficiently intersecting sorted sets. Perfect!

I basically just discovered how an inverted index works.
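
The core trick is easy to sketch. Assuming each attribute value maps to a sorted array of matching property IDs (the IDs below are made up), intersecting two of those arrays is just a linear merge:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class SortedSetIntersection {

        // Intersect two sorted arrays of property IDs with a simple linear merge.
        // Each input array is the "postings list" for one attribute value.
        static int[] intersect(int[] a, int[] b) {
            List<Integer> out = new ArrayList<>();
            int i = 0, j = 0;
            while (i < a.length && j < b.length) {
                if (a[i] == b[j]) {
                    out.add(a[i]);
                    i++;
                    j++;
                } else if (a[i] < b[j]) {
                    i++;
                } else {
                    j++;
                }
            }
            return out.stream().mapToInt(Integer::intValue).toArray();
        }

        public static void main(String[] args) {
            int[] montana = {3, 8, 15, 22, 40, 57};  // hypothetical IDs for state = Montana
            int[] land    = {8, 9, 22, 31, 57, 90};  // hypothetical IDs for type = land
            System.out.println(Arrays.toString(intersect(montana, land))); // [8, 22, 57]
        }
    }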

I built a really simple prototype that would take our real estate data and build up in-memory indexes and then allow you to filter on whatever attribute values you wanted. The prototype was very simplistic and mostly focused on demoing the performance of the algorithms and inverted index structure. I excitedly showed my prototype to the Architect and a senior software developer thinking I had cracked the performance code. They seemed mildly intrigued but overall unimpressed. Maybe it was the simplistic nature of my demo or the fact that everything was in-memory?

I was undeterred. I knew I needed a more complete demo and something that was practical for production use. I started searching through open source projects to see if there was anything off-the-shelf that did what I was looking for. I came across Apache Solr but originally dismissed it because it was advertised as an “enterprise search engine”. The “enterprise” part usually makes me run away because “enterprisey” stuff tends to be overly complicated. I also assumed it was more focused on free-text searching, which was not what I was looking for. However, after some more research, and after seeing that Solr originated out of CNET where it was powering faceted search, I decided to give it a try.

I usually have to get my hands on something to really understand it, and Solr was no different. Once I had a Solr server up and running on my Windows development laptop, I was able to index some sample data and play around with the query API.

Apache Solr

Apache Solr was exactly what I was looking for! It allowed me to index documents (real estate properties in my case) and then filter and facet on fields in those documents. And best of all, it was fast! I set out to build a more complete prototype. I created a Solr schema that matched what we needed for our real estate properties and indexed all of our data. I also threw together a Ruby on Rails front-end that mimicked our existing (slow) search experience.
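
To give a flavor of what the queries look like, here is roughly how filtering and faceting works through a modern SolrJ client (not the version that existed back then); the URL, field names, and values are hypothetical rather than SecondSpace’s actual schema:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class PropertySearch {
        public static void main(String[] args) throws Exception {
            // Hypothetical Solr core holding one document per real estate property
            HttpSolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/properties").build();

            SolrQuery query = new SolrQuery("*:*");
            // Filter queries are cached independently by Solr, which keeps
            // repeated attribute filtering fast
            query.addFilterQuery("state:Montana", "property_type:land");
            // Facet on county so the UI can show counts for the current filters
            query.setFacet(true);
            query.addFacetField("county");
            query.setRows(25);

            QueryResponse response = solr.query(query);
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc.getFieldValue("id") + " " + doc.getFieldValue("title"));
            }
            solr.close();
        }
    }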

I am sure I was supposed to be working on boring website bugs or features as part of the “scrum” process but this seemed more important and way more interesting.

This time I bypassed our Architect and senior developers and went straight to the CTO. I showed it to him and he suggested that I demo it to the entire company (it was a small company). So I did! I demoed my solution side-by-side with the existing website search results. The production website load times were all measured in seconds, with a large search like “all land in Montana” taking 20-30 seconds to load. My Ruby on Rails demo was taking ~300ms to query Solr, process the results, and render the search results page. And that was all from my underpowered developer laptop! I think a good portion of that ~300ms was just Ruby on Rails (which was a little on the slow side at the time).

Implementation Green Light!

After that meeting I got the go ahead to do a full implementation with one major constraint added by the Architect. The Solr solution had to be implemented as a SQL Server C# Stored Procedure that exactly matched the existing search results stored procedure. I could not just query Solr directly from the C#/ASP.NET front-end. Wait… what?!

My non-SQL Solr based solution still needed to go through SQL Server?! WTF?!

I could kind of see an argument for making it easy to switch back to the pure SQL solution, but that could also have been accomplished from the C#/ASP.NET side of things with a feature flag. I debated the issue with the Architect for a bit, insisting that it added additional complexity and made no sense to implement it that way, but finally gave up. This roadblock meant I was going to have to fully understand the 2,000+ line SQL Server stored procedures and replicate all of the existing result sets that were being returned, even if some of those result sets had nothing to do with my Solr implementation. It seemed like a lot more work to me, but it wasn’t going to stop me. Challenge accepted!

I got to work for the next month or two implementing my Solr solution. They might have even let me do my own scrum estimations 😉.

Then I basically re-implemented all browse and search functionality on the site.

The official SolrJ client library was written in Java, so I used that as an excuse to write a simple Java app that would query SQL Server for all of the real estate properties and index them into Solr. Then I basically re-implemented all browse and search functionality on the site. This included all of the “secret sauce” of personalizing the results (which used Solr’s boosting capabilities). It was a lot of work and required several rounds of testing to work out all of the bugs and to make sure the behavior very closely matched the existing SQL solution. But in the end I had a SQL Server C# stored procedure that would query both Solr and SQL Server itself to form the various result sets that needed to be returned!
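
Conceptually the indexer was simple: pull rows out of SQL Server with JDBC and push documents into Solr with SolrJ. A minimal sketch along those lines (again using a modern SolrJ client; the connection strings, table, and field names are hypothetical):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class PropertyIndexer {
        public static void main(String[] args) throws Exception {
            // Hypothetical JDBC URL for the SQL Server holding the property data
            Connection db = DriverManager.getConnection(
                    "jdbc:sqlserver://dbhost;databaseName=realestate", "user", "password");
            HttpSolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/properties").build();

            try (Statement stmt = db.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT PropertyId, State, County, PropertyType, Acreage FROM Properties")) {
                while (rs.next()) {
                    // One Solr document per property, mirroring the attributes we filter on
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", rs.getInt("PropertyId"));
                    doc.addField("state", rs.getString("State"));
                    doc.addField("county", rs.getString("County"));
                    doc.addField("property_type", rs.getString("PropertyType"));
                    doc.addField("acreage", rs.getDouble("Acreage"));
                    solr.add(doc);
                }
            }
            solr.commit(); // make the newly indexed documents visible to searches
            solr.close();
            db.close();
        }
    }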

My Solr based implementation was over 100 times faster than the SQL Server implementation.

I helped successfully push my solution out to production and everything worked great! My Solr based implementation was over 100 times faster than the previous SQL Server implementation. The load on the production SQL Servers dropped to nothing 😉.

Innovation Stock Grant

I received an “Innovation Stock Grant” for my Solr work. I think it was the first Innovation Stock Grant ever given out at SecondSpace.

HAProxy

At some point the operations team bought a few Barracuda Load Balancers that they were having trouble getting to work. I think they had returned them (otherwise I would have offered to help make them work) and I heard they were searching for alternative solutions. So I sent an email to the effect of:

“Give me two spare desktop computers and I will build you a fully redundant load balancing solution.”

I was a little surprised when they took me up on the offer. I think I had gained a lot of credibility after delivering my Apache Solr based backend. I grabbed two spare desktop computers that were lying around the office and installed CentOS, HAProxy, and Keepalived. I knew of HAProxy and Keepalived but this was my first time using them (I had, however, previously used CARP on FreeBSD). After some configuration and testing in the office I presented my solution.
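
To give a flavor of the setup: HAProxy does the actual load balancing, while Keepalived floats a shared virtual IP between the two machines so one can take over if the other dies. A minimal haproxy.cfg sketch (hostnames, addresses, and ports are hypothetical) looks something like this:

    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend www
        bind *:80
        default_backend webservers

    backend webservers
        balance roundrobin
        option httpchk GET /
        server web01 10.0.0.11:80 check
        server web02 10.0.0.12:80 check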

The solution was accepted and we drove those desktop computers down to the Fisher Plaza data center and installed them into one of our cabinets. After some additional testing they were put into production and load balancing traffic! Meanwhile, some proper rackmount servers were ordered to replace the temporary desktop systems.

Once the rackmount servers arrived, the company insisted that I use Red Hat Linux instead of CentOS so they had someone they could pay for support. I thought that was silly, but whatever. So I set up those servers using Red Hat Linux, HAProxy, and Keepalived. We swapped out the temporary desktop computers for the rackmount servers in the cabinet and everything worked great.

Another open source win!

Nagios

SecondSpace had some sort of monitoring system that seemed very limited and was slow to catch problems. I forget if it was homegrown or something off-the-shelf; either way it wasn’t very good. So I took a spare desktop system, installed Linux on it, and set up Nagios. Nagios can be a pain to configure but does a pretty good job once you get it all set up. It was certainly much better than the previous solution SecondSpace was using. My setup had more detailed monitoring and would catch issues much more quickly!
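
Most of the Nagios pain is in writing the host and service definitions. A single check is only a few lines; for example, a hypothetical HTTP check against one of the web servers looks something like this:

    define service {
        use                   generic-service
        host_name             web01
        service_description   HTTP
        check_command         check_http
    }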

Another open source win!

2008 Real Estate Crash

Just about the time I was starting to feel like I fit into the company after making major improvements to the backend (via Apache Solr) and successfully introducing lots of open source software (Subversion, HAProxy, Linux, Nagios, etc.) the 2008 Real Estate Crash hit. SecondSpace’s whole business model was real estate so things did not go well.

I was ultimately laid off from the company along with a bunch of other people. Shortly after, the CEO and CTO also left. The company attempted to pivot by renaming itself DataSphere Technologies and changing business models. It was eventually acquired for a much lower valuation than when I joined the company.

So much for those stock options being worth anything!

A few of us that left SecondSpace went on to form Frugal Mechanic.