All other things being equal, faster is better than slower. And the need for search speed was a significant motivating factor behind the creation of Mo-Search.
Mo-Search 1.x, 2.x, and 3.x all had a JET backend, and even after 3 years of revisions and optimization it was too slow. It worked fine if you had a limited number of files to index and search, but it simply did not scale. Over time this became a problem, as users had more files and data they needed to search.
Mo-Search 4.x switched the backend (database format) to SQLCE 3.5, and both the Mo-Search Index and Query engines were rewritten. This scaled a lot better than JET, and performance improved further during 18 months of iterations, but it still left a lot to be desired for larger numbers of files.
Mo-Search 5.x switched backends again, to SQLCE 4.0, and also received a major architectural shift to horizontal partitioning: the FullText index was spread over 27 tables instead of 1. Over the course of 3 years, performance improved further, but these iterations (up to 5.6.2) still hit a performance bottleneck for really large sets of files (including an internal 1+ million file test corpus).
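To make the partitioning idea concrete, below is a minimal sketch of how a 27-way horizontal partition could route indexed words to tables. The bucketing scheme (a-z plus one catch-all) and the table names are illustrative assumptions; Mo-Search's actual routing logic is not published.

```python
# Hypothetical sketch of 27-way horizontal partitioning: route each
# indexed word to one of 27 FullText tables. The a-z + catch-all
# bucketing and the table names are assumptions, not Mo-Search internals.

def partition_table(word: str) -> str:
    """Map an indexed word to one of 27 FullText partition tables."""
    first = word[:1].lower()
    if "a" <= first <= "z":
        return f"FullText_{first}"   # 26 per-letter tables
    return "FullText_other"          # digits, symbols, etc. -> bucket 27

print(partition_table("search"))  # FullText_s
print(partition_table("42"))      # FullText_other
```

The payoff is that a query touching one partition scans only a fraction of the FullText data, and each individual table stays smaller.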
Mo-Search 6.x moves the focus further toward higher scalability and breaking the 4GB index barrier, through the following architectural changes. The first two improve performance; the last two build upon that new-found performance.
- Data Sharding - Previously we've always had a single backend database index file (1.x through 5.x). Now it's split into 3 files, each holding different segments of the indexed data (and each of those 27-way horizontally partitioned). This enables breaking the 4GB index barrier (previous versions started having performance challenges around 3GB), and just as importantly, performance scales much more linearly. (A shard-layout sketch follows this list.)
- Data De-duplication - Improves indexing scalability in environments where duplicate files exist, such as software development with multiple branches. This works by de-duplicating index data (in RAM) during indexing, to reduce the amount of data written into the index. Then upon searching, very specific data subsets are re-associated into an N-way horizontal partition cache to facilitate proper ranking, multiple filters, and everything else the search UI provides (a de-duplication sketch follows this list). Initially the goal was to perform search re-association purely in RAM, but the complexity started to spiral; we may revisit this in 7.x.
- AutoSearch - In Simple Search mode (only the Text filter is visible), the results auto-populate as you type (a debounce sketch follows this list). This feature has been on the back burner for years, so it's about time.
- AutoComplete - In Advanced Search mode (all search filters: Text, Filename, Path, ModifiedDate), all relevant indexed data is used to AutoComplete as you type. Prior versions only used recent search terms within AutoComplete, capped at 400 recent items. Now *ALL* indexed words, filenames, and paths are used in AutoComplete for the corresponding search filter. If you have 1+ million files, that's your AutoComplete list within the Filename search filter. What about 400 million words in your index? No worries, that's your Text search filter AutoComplete list (a prefix-lookup sketch follows this list). This feels a lot more productive to me, but... it's a big change. (Like everything, feedback is requested and hugely appreciated!)
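Here is a minimal sketch of a 3-file shard layout. The segment names and file names below are assumptions for illustration (SQLCE databases use the .sdf extension and are capped at 4GB each, which is presumably the barrier being broken); each shard file is itself 27-way horizontally partitioned as described above.

```python
# Hypothetical 3-file shard layout: each file holds a different
# segment of indexed data. Segment and file names are assumptions.

SHARD_FILES = {
    "paths":    "index_paths.sdf",     # file and path metadata
    "words":    "index_words.sdf",     # the word dictionary
    "fulltext": "index_fulltext.sdf",  # word-to-file postings
}

def shard_for(segment: str) -> str:
    """Pick the shard database file that stores a given data segment."""
    return SHARD_FILES[segment]

print(shard_for("fulltext"))  # index_fulltext.sdf
```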
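The de-duplication step could be sketched as follows, assuming duplicates are detected by hashing file contents; the actual detection mechanism in Mo-Search is not documented.

```python
import hashlib

# Hypothetical in-RAM de-duplication during indexing: identical file
# contents (e.g., the same source file across branches) are indexed
# once, and each duplicate path just points at the shared entry.
# Content hashing is an assumed detection mechanism.

seen: dict[str, list[str]] = {}   # content hash -> paths sharing it

def index_file(path: str, content: bytes) -> bool:
    """Return True if the content was indexed, False if de-duplicated."""
    digest = hashlib.sha256(content).hexdigest()
    if digest in seen:
        seen[digest].append(path)   # re-associated at search time
        return False                # skip writing duplicate index data
    seen[digest] = [path]
    return True                     # first copy: write it to the index

index_file(r"branch-a\util.c", b"int main(void) { return 0; }")
index_file(r"branch-b\util.c", b"int main(void) { return 0; }")  # deduped
```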
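AutoSearch amounts to search-as-you-type. A minimal sketch, assuming a debounce so that every keystroke does not launch a full query; the interval is a made-up value:

```python
import threading

# Hypothetical AutoSearch debounce: run the query only after typing
# pauses. The 250ms interval is an assumption, not a Mo-Search value.

class AutoSearch:
    def __init__(self, run_query, delay: float = 0.25):
        self._run_query = run_query
        self._delay = delay
        self._timer = None

    def on_keystroke(self, text: str) -> None:
        """Restart the countdown; the query runs once typing pauses."""
        if self._timer is not None:
            self._timer.cancel()   # a newer keystroke supersedes it
        self._timer = threading.Timer(self._delay, self._run_query, [text])
        self._timer.start()

searcher = AutoSearch(lambda q: print("searching:", q))
searcher.on_keystroke("mo")
searcher.on_keystroke("mo-se")  # cancels the first; only this one fires
```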
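Completing from all indexed words rather than a recent-terms list amounts to a prefix lookup over the word dictionary. Here is a minimal sketch using a sorted list and binary search; the real implementation presumably queries the indexed data directly:

```python
import bisect

# Hypothetical AutoComplete over *all* indexed words: a prefix lookup
# against the sorted word dictionary, instead of a capped recent-terms
# list. A sorted in-memory list stands in for the real index query.

words = sorted(["mode", "model", "modified", "module", "search"])

def autocomplete(prefix: str, limit: int = 10) -> list[str]:
    """Return up to `limit` indexed words starting with `prefix`."""
    lo = bisect.bisect_left(words, prefix)
    hi = bisect.bisect_left(words, prefix + "\uffff")  # end of range
    return words[lo:hi][:limit]

print(autocomplete("mod"))  # ['mode', 'model', 'modified', 'module']
```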
As the corpus size (volume of files) increases, older versions slow down at an increasing rate. But just as critical as performance, much effort has gone into making Mo-Search usable and intuitive. Please try it; we expect you will agree.
Performance matters. Usability matters. And so does the constant work to keep moving forward.