Free Search Engine Software

rewritten by Andrew.

This paper looks at some of the most popular search engine software packages currently available free of charge to end users. We will compare the features offered by the different packages and explain in detail how the search engines actually operate. We will also look at how effective each package's crawler and indexing functions are.

Need for search engines

Computers have become a particularly popular way to organize and store information. The internet has made this information accessible to everyone. The sheer volume of information available on the internet makes it difficult for users to find what they actually want.

The main purpose of a search engine is to help users retrieve the information that they are looking for. There are many different tools that can be used to build your own search engine. Searchtools.com lists a large number of these software packages and reviews their usefulness. What's more, most of these search engine software packages are free as long as they are not used commercially.

Because there are so many search engine software packages available, it can be very difficult to choose the right one. This paper aims to help you decide which search engine would be best suited to your website. We will do this by highlighting the main benefits, features and basic details of the most popular search engine software packages.

We will start with a simple introduction to the world of free search engine creation software. We will then give some basic information about the most popular software packages currently available. At the end of the paper we will compare the different packages.

Introduction

There are many examples of free search engine software, and they can be found in a number of different places. Search engine software packages are available at codebeach.com, searchenginewatch.com, searchtools.com and sourceforge.net. Many of these packages are freeware, while some are open source, which means the source code is distributed with them.

Generally speaking, free search engine software is quite hard to understand because its documentation is poor. This can make it very difficult to work out what features and functionality a package provides.

There are two different types of search engine software, depending on where the search is actually carried out: server-side search engines and remote-site search engines. With a remote-site search engine, all of the indexing and querying is done on a remote server. Server-side search engine software creates an independent search engine that runs on your own computer.

We will only look at server-side search engine software, as this is what constitutes a real search engine.

The types of search engines are further categorized into two groups. There are website based search engines and file system search engines. The file system search engines only catalog files which are stored on the local network. Website search engine software on the other hand can index remote websites by using web crawlers.

Many search engines support both approaches and are capable of indexing remote servers as well as the local file system.
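To make the idea of a web crawler more concrete, the following is a minimal sketch in Python and not the code of any package discussed here. It assumes a hypothetical `fetch` callable that returns the HTML of a page, or None if the page cannot be retrieved, so the example can run against an in-memory site instead of the network. The names `LinkExtractor` and `crawl` and the `example.test` addresses are illustrative only.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    `fetch` is a hypothetical callable: it takes a URL and returns the
    page HTML as a string, or None if the page is unavailable.
    Returns a dict mapping each visited URL to its HTML.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# An in-memory stand-in for a small website (illustrative data only).
site = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="b.html#top">B</a>',
    "http://example.test/a.html": '<a href="/">home</a>',
    "http://example.test/b.html": "no links here",
}
pages = crawl("http://example.test/", site.get)
print(sorted(pages))
```

A production crawler would also need to honour robots.txt, restrict itself to the hosts it is allowed to index, and handle network errors, none of which this sketch attempts.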

If you are interested in a website search engine package, make sure that it is complete. Any effective web search engine needs to include a web crawler, an indexer, a query engine and an interface.

Most of the software packages that we will look at have these main features built in; the comparison table notes the exceptions.

Basic Information

In this section we will look at some of the most popular search engine software packages and provide some basic information about each. The information given will include the website address, whether it is open source, licensing information, documentation, the platforms it can be used on, who built the software, and how complete the package is.

Licensing

The licence entry states whether the software is distributed as freeware for anyone to use or whether there are conditions on its use. It also mentions any upgrade options that are available.

Source code

If the source code is available then the package will be described as being open source. This is particularly useful if you want to customize the system.

Documentation

It’s important that there is enough documentation available to use the package correctly. This will show you where the documentation can be found.

Platforms

This entry tells you which operating systems each package runs on, for example Windows, Mac or Linux.

Functionality

This entry looks at the functionality of the package. Does it provide everything that you need? Does it have a web crawler, indexer, query engine and interface? If so, it is a complete package; if not, you might need other tools to complete it.

This information helps an administrator make an informed decision when choosing a search engine software package. Being able to see in one place whether a search engine will work on your platform makes the choice much easier. Once you have determined which packages will work on your equipment, you can find out more about them and decide which one you want to install.

| Name | Licence | Website | Source code | Language | O/S | Complete | Developer |
|---|---|---|---|---|---|---|---|
| Alkaline | Free for non-commercial use only | www.alkaline.vestris.com | Not open source, although source code can be purchased | C++ | Solaris, IRIX, Linux, FreeBSD, Windows | Complete | Founder is Daniel Doubrovkine, plus various people from Lavtech Corp including Aleksey Botchkov, Kalimullin and Sergei Kartashoff |
| Fluid Dynamics | Freeware version available, or a free trial of the commercial version | www.xav.com/ | Source code available from website | Perl | Unix, Linux, Windows 95, 98, ME, 2000 | Complete | Created by Zoltan Milosevic. Owned by Fluid Dynamics Software Corporation |
| ht://Dig | Free | www.htdig.org/ | Source code available from website | C and C++ | Linux, Mac OS, IRIX, HP/UX, SunOS | Complete | Created by Loic Dachary and Geoff Hutchinson |
| Juggernaut-search | Free for non-commercial use | www.juggernautsearch.com | Available at website | Perl | Linux, Windows NT, 2000. XP also supported with commercial version | Complete | Created by Donald Kasper |
| mnoGoSearch | Free (Unix version is free) | www.mnogosearch.org | Available freely | C | Unix, Linux, FreeBSD | Complete | Alexander Barkov |
| Perlfect | Free for both commercial and non-commercial use | www.perlfect.com | Available at the Perlfect website | Perl | Unix, Linux, Windows NT | Complete | N. Moraitakis |
| SWISH-E | Free for all | www.swish-e.org | Available at the SWISH-E website | C and Perl | FreeBSD, SunOS, NetBSD, Windows NT | Not complete: additional CGI code is needed to start searching | The first version of SWISH was designed and made by Kevin Hughes. In 1996, the Library of UC Berkeley asked for permission to enhance the application, which created SWISH-E |
| Webinator | Free (free version only supports a maximum of 10,000 pages and 10,000 hits each day) | www.thunderstone.com | Source code is not available | Vortex-Tex | Unix, Linux, Windows NT, Windows 2000 | Complete | Thunderstone Inc |
| Webglimpse | Free (free version only suitable for educational and governmental use) | www.webglimpse.net | Available at website | C and Perl | IRIX, OSF, Rhapsody, AIX, SunOS | Complete | University of Arizona |
| LEXST-SEA | Free (supports up to 200,000 pages with a maximum of three nodes). Commercial versions can support billions of pages | www.lexst.com | Not available | Unknown | 64-bit Windows operating system | Complete | Dust Gem Co. Ltd |
| Google GSA | Proprietary | www.google.com/enterprise/gsa | Strictly under lock and key | Unknown | Stand-alone system | Complete | Google |
| Zoom Search Engine | Free version available (limited indexing) | http://www.wrensoft.com/zoom/ | Not available | Unknown | Cross-platform support | Complete | Wrensoft |
| Site Search Pro | Proprietary (cost depends on licence) | http://www.site-search-pro.com/ | Not available | Unknown | Cross-platform support | Incomplete | Shedix |

Search Engine Packages compared

We will now compare the different search engine packages so that you can decide which one will suit your website best. Each package is compared on four qualities.

Method of Searching

The method of searching is the method the search engine uses to rank the search results. There are two different methods, and the choice affects how the server is set up, how quickly results are returned, and how much disc space is required on the server.

Indexing method

Almost every search engine speeds things up by indexing data before it is searched. It is generally much quicker and easier to search through data that has already been indexed than through raw data. It is very important, however, that the index is in a useful format, is up to date and contains useful information.

There are a number of different indexing methods; the one that is normally used is the full text inverted index. This does require a considerable amount of disc space and the indexing process can be very slow. This is because much of the information is stored in the index.
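As an illustration of the full text inverted index described above, here is a minimal Python sketch. It is not taken from any of the packages in this paper; the function names and sample documents are assumptions made for the example. Each word is mapped to the documents that contain it, together with the positions at which it occurs, which is also why such an index can take up a lot of space.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each word to {doc_id: [positions]} for every document it occurs in."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Return the ids of the documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return result


# Illustrative documents only.
documents = {
    "a.html": "Free search engine software",
    "b.html": "Search the local file system",
    "c.html": "Engine repair manual",
}
index = build_inverted_index(documents)
print(search(index, "search engine"))
```

Answering a query now only requires looking up each query word and intersecting the document sets, instead of scanning every document.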

There are also more efficient methods, such as indexing only certain parts of each document. This might mean extracting and indexing only the title, description, keywords and possibly the author. Because far less text is processed, indexing is much quicker and the index is smaller. Many other indexing methods are in use. WebGlimpse, for example, uses two-level indexing, Alkaline uses an undisclosed algorithm when indexing, and many of the other packages have unique features too.
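The following Python sketch shows one way the selected fields could be pulled out of an HTML page before indexing. It is an assumption about how such extraction might be done using the standard library's `html.parser`, not a description of how any of the packages works; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collects the <title> text and selected <meta name=...> values."""

    FIELDS = ("description", "keywords", "author")

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            content = attributes.get("content")
            if name in self.FIELDS and content:
                self.metadata[name] = content.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several pieces, so append each one.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html):
    """Return a dict of the fields found in the page; the body is ignored."""
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    if "title" in parser.metadata:
        parser.metadata["title"] = parser.metadata["title"].strip()
    return parser.metadata


page = """<html><head><title>Free Search Engines</title>
<meta name="Description" content="A comparison of packages">
<meta name="keywords" content="search, index, crawler">
</head><body><p>Body text is ignored.</p></body></html>"""
print(extract_metadata(page))
```

Only the short extracted fields would then be passed to the indexer, at the cost of not being able to match words that appear only in the body of the page.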

Relevance ranking

Relevance ranking is the way a search engine orders documents in relation to a query. There are a number of factors that the search engine software can use to decide how relevant a document is. These factors can include word position, popularity, and word density. Different search engines use different factors when ranking documents.
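As a rough illustration, the Python sketch below scores documents using two of the factors mentioned above, word density and word position. The formula and the `position_weight` parameter are invented for this example and are not the ranking method of any package in this paper. Popularity is left out because it needs data, such as link counts, that a single document does not provide.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(document, query, position_weight=0.5):
    """Score a document for a query from word density and word position."""
    words = tokenize(document)
    if not words:
        return 0.0
    total = 0.0
    for term in set(tokenize(query)):
        positions = [i for i, word in enumerate(words) if word == term]
        if not positions:
            continue
        # Density: the share of the document's words that match the term.
        density = len(positions) / len(words)
        # Earliness: 1.0 if the term is the first word, lower the later it appears.
        earliness = 1.0 - positions[0] / len(words)
        total += density + position_weight * earliness
    return total


def rank(documents, query):
    """Return the ids of matching documents, most relevant first."""
    scored = [(score(text, query), doc_id) for doc_id, text in documents.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for value, doc_id in ordered if value > 0]


# Illustrative documents only.
documents = {
    "a": "search engine software for search",
    "b": "a guide to installing a search tool",
    "c": "cooking recipes",
}
print(rank(documents, "search"))
```

Document "a" ranks above "b" here because the query word appears more often relative to the document's length and appears earlier in the text; "c" does not match and is left out.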

Crawler Features

We will also spend time looking at a number of features of indexers and web crawlers.