Search software is good but still a long way from perfect. Each new generation improves the average search while losing accuracy for specific searches.
The Google approach
Google used to lead the way, back in the middle of the last decade. The Google-style approach improves some searches, then Web site owners learn how to fake the responses needed to direct Google searches to the wrong pages. Advertisers pay for fake links to create artificial popularity. Google gives preference to its customers who use Google Analytics and other Google products.
Years ago Altavista was the best search engine, then it went commercial and spoiled its own product. Fast replaced Altavista because Fast was both better in the way it searched and quicker than the bloated Altavista Web site. Fast left itself open to replacement by using page interpretation similar to Altavista's, and Fast fell victim to the same fake pages.
Google replaced Fast because Google worked on links, something few people were faking at the time. Google indexed more pages than Fast and was slightly faster than Fast. Plus, Google entered the market before everybody had switched to Fast, and Google was blazingly fast compared to the still widely used Altavista.
Now Google is under attack because it does not have a useful advanced search facility for focused searches. Faceted search is the current wave of search, aimed at helping people refine searches in specific areas.
Apache Solr adds a useful improvement here. The old Web site searches, including the search built into Drupal, need a revamp. Google does not understand why a person visits a specific Web site or what the person wants from a search. A standard Web site search built for a shop might pick up product attributes; almost everything else fails. Solr adds an easy way to provide advanced search options based on the specific information held at the Web site.
Think of a Web site selling framed panoramic photographs from around the world. You select the location "Great Barrier Reef", then select the tag "fish". Hundreds of photographs are listed. You type "coral" into the search box and hit the search button. The search returns every photograph with a description containing the word coral, including thousands of photographs not from the Great Barrier Reef and not tagged as fish.
Faceted search lets you keep your site-specific selections in the search. If the site classifies information by location, you select a location of interest and all subsequent searches look only within that location. The same applies to size, color, all the usual product specifications, plus subjective tags.
If your site has articles tagged by visitors, you could select a visitor, then select from their tags. A movie review site would let you pick a reviewer with a similar taste in movies, then look at the movies tagged by that reviewer.
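The difference between a plain site search and a faceted search can be sketched in a few lines of plain Python. This is a minimal illustration, not how Solr works internally: the `Photo` class, the `keyword_search` and `faceted_search` functions, and the sample photographs are all hypothetical. The faceted version simply applies the visitor's earlier location and tag choices before the keyword match, so those choices stay in force.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Photo:
    # Hypothetical record for a photo shop; field names are illustrative.
    title: str
    description: str
    location: str
    tags: set[str] = field(default_factory=set)


def keyword_search(photos: list[Photo], word: str) -> list[Photo]:
    """Plain site search: any photo whose description contains the word."""
    word = word.lower()
    return [p for p in photos if word in p.description.lower()]


def faceted_search(photos: list[Photo], word: str,
                   location: Optional[str] = None,
                   tag: Optional[str] = None) -> list[Photo]:
    """Keyword search limited to the facets the visitor already selected."""
    selected = [
        p for p in photos
        if (location is None or p.location == location)
        and (tag is None or tag in p.tags)
    ]
    return keyword_search(selected, word)


photos = [
    Photo("Reef shark", "Shark gliding over coral", "Great Barrier Reef", {"fish"}),
    Photo("Coral garden", "Hard coral at low tide", "Great Barrier Reef", {"coral"}),
    Photo("Red Sea wrasse", "Wrasse among coral", "Red Sea", {"fish"}),
    Photo("Harbour at dusk", "City lights on the water", "Sydney", {"city"}),
]

# Plain search ignores the visitor's earlier choices.
print([p.title for p in keyword_search(photos, "coral")])
# Faceted search keeps location and tag in force for the new keyword.
print([p.title for p in faceted_search(photos, "coral", "Great Barrier Reef", "fish")])
# Facet counts let a page show how many photos sit behind each location link.
print(Counter(p.location for p in photos))
```

With this sample data the plain search returns three photographs, including the Red Sea one, while the faceted search returns only the reef shark.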
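On a Solr-backed site, facet selections like these are usually sent to Solr as filter queries. The sketch below builds a request URL for Solr's standard `/select` handler using its `q`, `fq`, `facet` and `facet.field` parameters. The server address, the core name `photos`, the `location` and `tag` fields, and the helper name `build_solr_query` are assumptions about how such a photo shop might be set up; which field `q` actually searches depends on the site's own Solr configuration.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_solr_query(base_url: str, keywords: str,
                     filters: dict[str, str], facet_fields: list[str]) -> str:
    """Build a Solr /select URL that keeps the visitor's facet selections.

    Hypothetical helper: base_url, field names and values are assumptions.
    """
    params = [("q", keywords)]
    # Each selected facet becomes a separate filter query (fq), so it stays
    # in force whatever keyword the visitor types next.
    for field_name, value in filters.items():
        params.append(("fq", f'{field_name}:"{value}"'))
    # Ask Solr for per-value counts on these fields so the page can offer
    # further refinements.
    params.append(("facet", "true"))
    for field_name in facet_fields:
        params.append(("facet.field", field_name))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core named "photos" with "location" and "tag" fields.
url = build_solr_query(
    "http://localhost:8983/solr/photos",
    "coral",
    {"location": "Great Barrier Reef", "tag": "fish"},
    ["location", "tag"],
)
print(url)
# Decode the query string to show the filters Solr would receive.
print(parse_qs(urlparse(url).query)["fq"])
```

Because the selections travel as separate `fq` parameters, the visitor can change the keyword without losing the location and tag choices. Values containing double quotes would need escaping before being placed inside the filter.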
The next wave
Then there is the next wave of search. The next wave has rippled through non-Web searches for years. Today the hardware is cheap enough to let the next wave loose on the Web. One big advantage of the next wave is that it can be added on top of existing searches.
More specifically, the next wave is added into the data interpretation stage to build better information. Then it is merged into the search request software to retain the quality of the request. The original search request is passed to the next wave, then, optionally, to the old search, depending on the results.
If the next wave search is too specific and finds no results, the old search can return a selection of results to show the searcher the type of information on the Web site. This gives the searcher a guide to the value of the information on the site. Perhaps the site uses different terminology, or does not host the required information.
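The request flow described here, next wave first and old search as a fallback, can be sketched as follows. This is a minimal sketch under loud assumptions: `next_wave_search` is a hypothetical placeholder that merely requires every query word to appear, standing in for the smarter interpretation the next wave would provide; `old_search`, `layered_search`, `SearchResult`, the `sample_size` value and the sample index are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    documents: list[str]
    fallback: bool  # True when the results came from the old search


def old_search(index: list[str], query: str) -> list[str]:
    """Old-style search: documents containing any query word (substring match)."""
    words = query.lower().split()
    return [doc for doc in index if any(w in doc.lower() for w in words)]


def next_wave_search(index: list[str], query: str) -> list[str]:
    """Hypothetical placeholder for a more precise search: every word must appear."""
    words = query.lower().split()
    return [doc for doc in index if all(w in doc.lower() for w in words)]


def layered_search(index: list[str], query: str, sample_size: int = 3) -> SearchResult:
    """Pass the request to the next wave first, then optionally to the old search."""
    precise = next_wave_search(index, query)
    if precise:
        return SearchResult(precise, fallback=False)
    # Nothing precise: show a small sample of broad matches so the searcher
    # can see what kind of information the site holds.
    return SearchResult(old_search(index, query)[:sample_size], fallback=True)


index = [
    "Installing Drupal on Ubuntu",
    "Drupal search configuration",
    "Travel tips for Queensland",
    "Solr tuning notes",
]
print(layered_search(index, "drupal search"))
print(layered_search(index, "drupal backup"))
```

The `fallback` flag lets the page tell the searcher that the results are only a guide to what the site covers, rather than a precise answer.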
Solr as a substitute
Solr is a big investment for small Web sites: too big, too time consuming to tune for one site, and, when used to advantage, it chews up more resources than the whole Web site. In the simplest terms, it costs more than it is worth.
The cost of Solr does not go up when you move to a larger site. The CPU processing overhead does go up, but the people cost does not. You spend a similar time configuring Solr, a similar time tuning Solr to make it useful, and a similar time analysing usage. If two sites are equally focused on one topic, their people cost is the same, even though one site has few Web pages and few visitors.
When a large site covers many areas of interest, the site might require separate tuning for each area, and that does drive up the cost. My site has Australia as one area, technology as another, then some aspects of technology as specific areas. If I used Solr to gain the full advantage that Solr offers, I would have to invest a big chunk of time tuning Solr for each area, a cost too great for a Web site that generates no income. My site is a way of answering the questions I am asked. I make some answers public because lots of people ask the same questions and I can answer them once.
Some of the questions are asked tens of thousands of times. If I did not put the answer on a Web page, I would be answering the same question by email hundreds of times per week for years. Multiply that by hundreds of different questions.
Google takes most people to my pages. When Google fails and the question is asked again by email, I refine my articles to help Google. Adding Solr would not help visitors as much as making their first Google search more accurate.
The next wave of search is about making the first search as accurate as the second search you might perform with a Solr-style faceted search. I prefer to invest in the next wave instead of Solr because it reduces the number of searches.
The next wave will cost a million dollars or one hundred million dollars, depending on how it is developed. Big corporate America will invest the $100,000,000 and then try to sell it for $10,000,000,000 as the next Google. The open source community could develop the next wave, but it will take ten years, be fragmented, and most versions will try to reuse Solr, which will make the result too expensive for most Web site owners.
A single focused team, perhaps starting with two people working in a rapid development style, could perform the bulk of the work in a year. They would need lots of resources for short focused periods. One million dollars invested in the project would cover $300,000 per year for the three years of production, refinement, and diffusion into the community.
Unfortunately, nobody is going to donate a million dollars to produce a free, open product. We are unlikely to see a thousand sponsors donating a thousand dollars each. The billion-dollar commercial project is the most likely direction for the next wave.
Solr is a good step forward for medium to large Web sites focused on specific areas of interest, especially shops. The next wave is a better choice for most Web sites; it will take some initial investment to get example Web sites running, and it can drive Solr if you choose to use both. What is needed is a million dollars invested in the next wave, with the result donated for public use.