Search software
Submitted by Peter on Sat, 2011-10-08 12:16
Search software is good but still a long way from perfect. Each new generation improves the average search while losing accuracy for specific searches.
The Google approach
Google led the way back in the middle of the last decade. The Google-style approach improves some searches, then Web site owners learn how to fake the signals needed to direct Google searches to the wrong pages. Advertisers pay for fake links to create artificial popularity. Google gives preference to its customers using Google Analytics and other Google products.
Years ago Altavista was the best search engine, then they went commercial and spoilt their own product. Fast replaced Altavista because Fast was both better in the way it searched and quicker than the bloated Altavista Web site. Fast left itself open to replacement by interpreting pages in a similar way to Altavista, and Fast fell victim to the same fake pages.
Google replaced Fast because Google worked on links, something few people were faking at the time. Google indexed more pages than Fast. Google was slightly faster than Fast. Plus Google moved into the market before everybody switched to Fast. Google was blazingly fast compared to the still widely used Altavista.
Now Google is under attack because Google does not have a useful advanced search facility for focused searches. Faceted search is the current wave of new searches aimed at helping people refine searches in specific areas.
Apache Solr
Apache Solr is adding a useful improvement. The old Web site searches, including the search built into Drupal, need a revamp. Google does not understand why a person visits a specific Web site or what the person wants in a search. The standard Web site search might pick up product attributes if built for a shop. Almost everything else fails. Solr adds an easy way to provide advanced search options based on the specific information at the Web site.
Think of a Web site selling framed panoramic photographs from around the world. You select the location Great Barrier Reef then select the tag fish. Hundreds of photographs are listed. You type coral into the search box then hit the search button. The search returns every photograph with a description containing the word coral, including thousands of photographs not from the Great Barrier Reef and not tagged as fish.
Faceted search lets you keep your site-specific selections in the search. If the site offers information classified by location, you select a location of interest and all subsequent searches will look only in that location. The same applies to size, color, all the usual product specifications, plus subjective tags.
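The difference between the two behaviours can be shown with a minimal Python sketch. It is an in-memory illustration only, not how Solr or Drupal implement search; the `Photo` class, the function names, and the sample photographs are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    # Hypothetical record for one photograph in the shop.
    title: str
    location: str
    tags: set = field(default_factory=set)
    description: str = ""


def keyword_search(photos, word):
    """Plain site search: match the word anywhere in the description."""
    word = word.lower()
    return [p for p in photos if word in p.description.lower()]


def faceted_search(photos, word, location=None, tag=None):
    """Keyword search that keeps the selections the visitor already made."""
    hits = keyword_search(photos, word)
    if location is not None:
        hits = [p for p in hits if p.location == location]
    if tag is not None:
        hits = [p for p in hits if tag in p.tags]
    return hits


photos = [
    Photo("Reef shallows", "Great Barrier Reef", {"fish"}, "Parrotfish over coral"),
    Photo("Reef at dusk", "Great Barrier Reef", {"sunset"}, "Coral heads at low tide"),
    Photo("Red Sea wall", "Red Sea", {"fish"}, "Anthias on a coral wall"),
    Photo("Reef school", "Great Barrier Reef", {"fish"}, "A school of trevally"),
]

# The plain search ignores the earlier selections and matches three photographs.
plain = keyword_search(photos, "coral")

# The faceted search keeps the location and tag and matches only one.
kept = faceted_search(photos, "coral", location="Great Barrier Reef", tag="fish")

print(len(plain), [p.title for p in kept])
```

The plain search throws away the location and tag, which is the failure described in the photograph example. The faceted version applies the same keyword match and then narrows by whatever the visitor already selected.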
If your site has articles tagged by visitors, you could select a visitor then select from their tags. A movie review site would let you pick a reviewer with a similar taste in movies then look at movies tagged by that reviewer.
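In Solr, retained selections are normally sent as filter queries (the `fq` parameter) alongside the main query (`q`), and `facet=true` with `facet.field` asks Solr to return counts for further refinement. The sketch below only builds the request string for the photograph example. The core name `photos` and the field names `description`, `location`, and `tags` are assumptions; a real site would use whatever its own schema defines.

```python
from urllib.parse import urlencode

# Hypothetical field names and core name; a real Solr schema will differ.
params = [
    ("q", "description:coral"),               # what the visitor typed
    ("fq", 'location:"Great Barrier Reef"'),  # selection kept from earlier
    ("fq", "tags:fish"),                      # second retained selection
    ("facet", "true"),                        # ask for facet counts
    ("facet.field", "tags"),                  # count matches per tag
]

query_string = urlencode(params)
print("/solr/photos/select?" + query_string)
```

Each retained selection becomes its own `fq` entry, so adding a reviewer or a size is one more line in the list.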
The next wave
Then there is the next wave of search. The next wave has rippled through searches outside Web sites for years. Today the hardware is cheap enough to let the next wave loose on the Web. One big advantage of the next wave is that it can be added on top of existing searches.
Or, more specifically, the next wave is added into the data interpretation stage to build better information. Then the next wave is merged into the search request software to retain the quality of the request. The original search request is passed to the next wave then, optionally, to the old search based on results.
If the next wave search is too specific and does not find results, the old search can return a selection of results to show the searcher the type of information on the web site. This gives the searcher a guide to the value of the information on the site. Perhaps the site uses different terminology or does not host the required information.
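The fallback arrangement can be sketched in a few lines of Python. This shows only the control flow, under the assumption that both searches are available as functions. The `precise` and `broad` functions here are crude stand-ins (whole phrase match versus any word match) and say nothing about how a real next wave search would interpret data.

```python
def search_with_fallback(query, precise_search, broad_search, sample_size=5):
    """Try the precise search first; if it finds nothing, return a small
    sample from the old search so the visitor can see what the site holds."""
    results = precise_search(query)
    if results:
        return {"source": "precise", "results": results}
    return {"source": "fallback", "results": broad_search(query)[:sample_size]}


# Hypothetical pages standing in for a site's content.
pages = {
    "solar-power": "Solar power panels and inverters for a house",
    "water-efficiency": "Water efficiency and tanks for a house",
}


def precise(query):
    # Stand-in for the more specific search: the whole phrase must appear.
    q = query.lower()
    return [name for name, text in pages.items() if q in text.lower()]


def broad(query):
    # Stand-in for the old search: any single word may match.
    words = query.lower().split()
    return [name for name, text in pages.items()
            if any(w in text.lower() for w in words)]


hit = search_with_fallback("solar power", precise, broad)
miss = search_with_fallback("house batteries", precise, broad)
print(hit)
print(miss)
```

The `source` value lets the page tell the visitor whether they are looking at exact results or a sample of what the site holds.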
Solr as a substitute
Solr is a big investment for small Web sites: too big, too time consuming to tune for a site, and, when used to advantage, it chews up more resources than the whole Web site. In the simplest terms, it costs more than it is worth.
The cost of Solr does not go up when you move to a larger site. The CPU processing overhead does go up but not the people cost. You spend a similar time configuring Solr, a similar time tuning Solr to make it useful, and a similar time analysing usage. If two sites are equally focused on one topic, their cost is the same despite one site having few web pages and few visitors.
When a large site covers many areas of interest, the site might require separate tuning for each area and that does drive up the cost. My site has Australia as one area, technology as another, then some aspects of technology as specific areas. If I used Solr to gain the full advantage that Solr offers, I would have to invest a big chunk of time tuning Solr for each area, a cost too great for a Web site that generates no income. My site is a way of answering the questions I am asked. I make some answers public because lots of people ask the same questions and I can answer them once.
Some of the questions are asked tens of thousands of times. If I do not put the answer up on a web page, I would be answering the same question by email hundreds of times per week for years. Multiply that by hundreds of different questions.
Google takes most people to my pages. When Google fails and the question is asked again by email, I refine my articles to help Google. Adding Solr would not help visitors as much as making their first Google search more accurate.
The next wave of search is about making the first search as accurate as the second search you might perform when using a Solr style faceted search. I prefer to invest in the next wave instead of Solr because it reduces the number of searches.
Million dollars
The next wave will cost a million dollars or one hundred million dollars, depending on how it is developed. Big corporate America will invest the $100,000,000.00 then try to sell it for $10,000,000,000.00 as the next Google. The open source community could develop the next wave but it will take ten years, be fragmented, and most versions will try to reuse Solr, which will make the result too expensive for most Web site owners.
A single focused team, perhaps starting with two people working in a rapid development style, could perform the bulk of the work in a year. They would need lots of resources for short focused periods. One million dollars invested in the project would cover $300,000 per year for the three years of production, refinement, and diffusion into the community.
Unfortunately nobody is going to donate a million dollars to produce a free open product. We are unlikely to see a thousand sponsors donating a thousand dollars each. The billion dollar commercial project is the most likely direction for the next wave.
Conclusion
Solr is a good step forward for medium to large web sites focused on specific areas of interest, especially shops. The next wave is a better choice for most web sites, will take some initial investment to get example Web sites running, and can drive Solr if you choose to use both. What is needed is a million dollars invested in the next wave then the result donated for public use.








