Archive for the ‘Architecture’ Category

August 13th, 2010

SQL Server Compact + Full-Text Search?

SQL Server Compact is a great in-process database engine: it’s free, lightweight, and xcopy deployable. However, since one of its selling points is its small footprint, Microsoft excludes from it many features that the full version of SQL Server has. One such feature is full-text search (FTS), which helps you build search engine capabilities into your applications.

So if SQL Compact doesn’t support full-text search, how do you implement it in your SQL Compact application? More specifically, how did we do it in Glo?

Plus, Glo holds textual information not only in relational tables, but also in lots of files external to the database. How does Glo retrieve search results from all of those files?

Glo’s search engine uses Lucene.NET, which is a C# port of the open-source Lucene search engine written by Doug Cutting. Lucene is an information-retrieval API that helps you build and read FTS indexes programmatically. Essentially, it can make any text searchable within your application.

There’s nothing wrong with executing LIKE ‘%term%’ queries on database tables for simple search functionality, although it’s likely to be slow. But in the same way that searching the index in the back of a book is faster than searching every page in the book, searching FTS indexes is faster than searching your entire database. A LIKE ‘%term%’ query that starts with a wildcard will usually force a table scan or at least an index scan, and will take much longer than an index seek.

A few years ago, a man asked me how it’s possible for Google to return so many results so quickly when he searches the Internet. “How can Google search the whole Internet so fast?” I explained to him that executing a search doesn’t search the Internet at all, it instead reads a set of pre-built index files, which, like the index at the back of the book, hold the answers all in one place.

Lucene uses the same principle to get search results. In fact, retrieving search results from FTS indexes is straightforward; getting the proper and latest data into the indexes is the part that takes creativity. I’ll expound on how Glo accomplishes that in a future post.

Props to my co-worker friend Adam Hill, who encouraged me to check Lucene out.

May 4th, 2010

No Clustered Indexes in SQL Server Compact

Contrary to what my copy of “Microsoft SQL Server 2005 Compact Edition” (Dhingra & Swanson, ©2008 SAMS Publishing) says twice on page 513, SQL Server Compact Edition only supports NONCLUSTERED indexes.

Clustered indexes aren’t always absolutely necessary, but for some queries they increase retrieval speed significantly. Having spent several weeks tuning Glo’s content database for performance, I was a little discouraged to read about this lack of support:

http://technet.microsoft.com/en-us/library/ms345331.aspx

The reference above is from 3.5 SP1, and the SP2 docs don’t allude to a change in this regard. Fortunately most (but not all) of Glo’s queries run extremely fast — sub-millisecond in places.

March 1st, 2010

Lesson Learned: WCF vs. NOT

For the first ~18 months of Glo’s development, it self-hosted a fairly large set of WCF services through which we pumped the data coming from SQL Compact over named pipes. Data access & entity-mapping layers sat behind five WCF endpoints used by the Presenters for data retrieval.

The reasoning behind this architecture was simple: it would be nice to leverage the same code for both the desktop and Silverlight apps, so let’s build it as a WCF service, then just point to a different URI to connect to the service based on the platform.

This worked fine until we started hitting WCF limits. Of course it’s simple to adjust MaxReceivedMessageSize, MaxItemsInObjectGraph, MaxArrayLength etc defaults, but we also saw we needed to code around perf issues when returning big data chunks. I dealt with tweaking limits for awhile, but after failing to fix WCF errors saying I had exceeded MaxNameTableCharCount, it made me start thinking about bailing on it altogether. The services appeared to be too big & had caused me one too many headaches.

I did tests to gauge potential benefit of eliminating WCF, and was sort of amazed to see the overhead they had imposed. Serialization & transport issues aside, I had attended Juval Lowy’s lecture on “Every Class a WCF Service” and came away thinking perf would not be terribly impacted.

So I spent a couple days replacing the services with a layer that loads data entities directly. This boosted backend performance by over 30%, and naturally eliminated associated headaches.

Moral of the story: even if I have to replicate Glo’s data layer as online WCF services for a web version, removing them from the desktop version was definitely worth it. LESSON LEARNED.

February 3rd, 2010

Team, Technologies, Tools

Here’s some background info as an introduction to the Glo Development Blog.  We’re an eight-man team, and most of us have been working on Glo since early 2008.  We develop in an Agile environment.  Since we live in a .NET developer-rich community, but cannot find any local WPF developers to join our team, we figure we must be the b[igg]est WPF team around.

Technologies Glo Uses:

  • Microsoft .NET Framework 3.5
  • C#, C++, WPF, CAB (Prism), Unity, LINQ
  • SQL Server Compact Edition 3.5
  • MapDotNet
  • Lucene.NET

Tools We Use To Develop Glo:

  • Visual Studio 2008
  • SQL Server 2008
  • Expression Blend
  • Enterprise Library 4.1
  • WCF, ASMX
  • WinForms, ASP.NET
  • ANTS Profilers
  • SQL Toolbelt
  • Sourcegear Vault