Search Engine Appliance


FAQ: How to build a search engine with LEXST-SEA (Search Engine Appliance)

 

I have only 2 servers (PCs). How can I build a search engine?

You can install all 5 nodes on one server and 5 standby nodes on the other server, so that the two servers back each other up. Your search engine will not halt unless both servers go down.

How does Lexst-SEA work?

Lexst-SEA connects many servers and lets them work together; this group of servers forms a "cluster". If each data server (data node) stores 25G of data (about 500,000 web pages), then 10 data nodes can store 250G (5,000,000 web pages).

In addition, the workload of the whole system is distributed across the 10 nodes, which also relieves the performance bottleneck.
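
The capacity arithmetic above can be checked with a few lines of Python. This is only an illustration: the per-node figures are the ones quoted in this FAQ, and the function name cluster_capacity is made up for the example.

```python
# Per-node figures quoted in this FAQ (25G and about 500,000 pages per data node).
GB_PER_NODE = 25
PAGES_PER_NODE = 500_000


def cluster_capacity(data_nodes):
    """Return (total GB, total pages) for the given number of primary data nodes."""
    return data_nodes * GB_PER_NODE, data_nodes * PAGES_PER_NODE


print(cluster_capacity(10))  # (250, 5000000)
```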

What is a Master Node?

The Master Node is the nerve center of the LEXST-SEA system. It controls all the nodes in the cluster and checks the status of each node every few seconds.
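
As a rough illustration of periodic status checking, the sketch below records the time of each node's last successful check and reports nodes that have been silent for too long. It is not LEXST-SEA code: the class name MasterMonitor, the 5-second timeout and the node names are all assumptions made for the example.

```python
import time


class MasterMonitor:
    """Hypothetical status tracker: remembers when each node last answered a check."""

    def __init__(self, timeout_seconds=5.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # assumed value, not from the product
        self.clock = clock
        self.last_ok = {}

    def record_ok(self, node_id):
        """Call this whenever a status check of node_id succeeds."""
        self.last_ok[node_id] = self.clock()

    def failed_nodes(self):
        """Return the nodes whose last successful check is older than the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_ok.items()
            if now - seen > self.timeout_seconds
        )


# Usage with a fake clock so the example does not have to sleep.
fake_time = [0.0]
monitor = MasterMonitor(timeout_seconds=5.0, clock=lambda: fake_time[0])
monitor.record_ok("data-1")
monitor.record_ok("data-2")
fake_time[0] = 4.0
monitor.record_ok("data-2")      # data-2 answers again, data-1 stays silent
fake_time[0] = 6.0
print(monitor.failed_nodes())    # ['data-1']
```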

What is a User Node?

The User Node stores the database table information and the user information.

What is a LexstTomcatCall Node?

LexstTomcatCall is a Tomcat web server. It accepts search requests, dispatches each search to the data nodes, and returns the results to the user.

The LexstTomcatCall node also accepts the data sent from "crawlers" and saves it to the data nodes.
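
The search flow described above (send the query to every data node, then combine the answers) is often called scatter-gather. The sketch below shows the general idea only. Each "data node" is simulated by an in-memory dictionary, the scoring rule is a toy word count, and the names search_node and search_cluster are hypothetical; none of this is taken from the LexstTomcatCall implementation.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_node(node_docs, query):
    """Toy per-node search: score a page by how often the query word occurs in it."""
    word = query.lower()
    hits = []
    for url, text in node_docs.items():
        score = text.lower().split().count(word)
        if score > 0:
            hits.append((score, url))
    return hits


def search_cluster(nodes, query, limit=10):
    """Query all nodes in parallel and merge their partial results, best first."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partial_results = list(pool.map(lambda docs: search_node(docs, query), nodes))
    merged = [hit for part in partial_results for hit in part]
    return heapq.nlargest(limit, merged)


# Three simulated data nodes, each holding its own part of the data.
nodes = [
    {"a.html": "fast search engine"},
    {"b.html": "search search cluster"},
    {"c.html": "nothing here"},
]
print(search_cluster(nodes, "search"))  # [(2, 'b.html'), (1, 'a.html')]
```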

What is a Data Node?

Data nodes are the servers where the data is stored.

What is partitioning?

LEXST-SEA divides the data into many parts, and each node stores one part; this is "partitioning technology". The larger the data volume grows, the more partitions (nodes) are needed. This technology is a must for modern search engine software.
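
This FAQ does not say which rule LEXST-SEA uses to decide where a page is stored. One common approach, shown here purely as an assumption, is to hash the page URL and take the result modulo the number of data nodes. The function name partition_for is made up for the example.

```python
import hashlib


def partition_for(url, node_count):
    """Map a URL to a node index in range(node_count) using a stable hash.

    hashlib is used instead of the built-in hash(), which changes between runs.
    """
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
for url in urls:
    print(url, "-> data node", partition_for(url, 4))
```

A plain modulo rule moves most pages to a different node when the node count changes, so a system that adds nodes over time usually needs a different or more refined scheme.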

What is a Standby Node?

Standby nodes hold data identical to that of their respective primary nodes. They keep checking whether the primary nodes are working properly and keep their data in sync with them. When a primary node stops working, it is removed from the cluster and its standby node is upgraded to a "primary node", so the whole system keeps working without interruption.

There are 3 kinds of standby nodes: Standby Data Nodes, Standby User Nodes, and the Standby Master Node.
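
The promotion step can be pictured with a small sketch. It is only a model of the behaviour described above, not the product's code; the class NodePair and the node names are hypothetical.

```python
class NodePair:
    """Hypothetical model of one primary node and its optional standby."""

    def __init__(self, primary, standby=None):
        self.primary = primary
        self.standby = standby

    def on_primary_failure(self):
        """Remove the failed primary and promote the standby in its place."""
        if self.standby is None:
            raise RuntimeError(self.primary + " failed and has no standby node")
        self.primary, self.standby = self.standby, None
        return self.primary


pair = NodePair("data-1", standby="data-1-standby")
print(pair.on_primary_failure())  # data-1-standby
print(pair.standby)               # None: the promoted node has no standby of its own yet
```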

Why does the LexstTomcatCall node have no standby node?

Lexst-SEA allows multiple LexstTomcatCall nodes in the cluster. Each LexstTomcatCall node is independent, i.e., they work in parallel.

To keep the whole system from halting, you need to deploy more than one LexstTomcatCall node.

How does a standby node synchronize its data?

You do not need to do anything; standby nodes synchronize automatically.

What is an Idle Node?

An idle node is a blank node without any data. The system wakes up an idle node in the following 2 situations:

1. The disk space is nearly full, or the data volume will soon exceed the space limit. The idle node is upgraded to a data node, and the system partitions the data and stores newly generated data on this new data node.

2. A primary node has no standby node. If there is an idle node in the cluster, the system makes it a standby node.

There are 3 kinds of idle nodes, among them Idle Data Nodes and Idle User Nodes. If you start the node as a data node, it is an idle data node. If you start the node as a user node, it becomes an idle user node.
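
The two wake-up situations can be written down as a small decision function. This is a sketch under several assumptions: the 90% "nearly full" threshold, the order in which the two rules are tried, and the name assign_idle_node are all invented for the example, and the sketch ignores the type of the idle node.

```python
def assign_idle_node(idle_nodes, disk_used_fraction, primaries_without_standby,
                     nearly_full=0.9):
    """Decide what to do with the next idle node.

    Returns (idle node, new role, primary it backs up) or None if no idle node
    is available or none is needed.
    """
    if not idle_nodes:
        return None
    # Situation 1: storage is running out, so the idle node becomes a data node.
    if disk_used_fraction >= nearly_full:
        return idle_nodes.pop(0), "data", None
    # Situation 2: a primary node lacks a standby, so the idle node becomes one.
    if primaries_without_standby:
        return idle_nodes.pop(0), "standby", primaries_without_standby[0]
    return None


idle = ["idle-1", "idle-2"]
print(assign_idle_node(idle, 0.95, []))          # ('idle-1', 'data', None)
print(assign_idle_node(idle, 0.40, ["user-1"]))  # ('idle-2', 'standby', 'user-1')
print(assign_idle_node(idle, 0.40, ["user-1"]))  # None, no idle node is left
```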

How many pages should each node store?

It depends on the capacity of your hardware.

Typically each node stores 500,000 pages, presuming the node is a Pentium IV PC with 512 MB of RAM.

If your PCs (servers) are much more powerful, you can store more data (pages) on each node.

You should also take into account how many concurrent searches will take place.

The average size of a web page is 25K. If most of your pages/files are considerably larger, they will need much more space.

How many nodes do I need to build my own search engine?

It depends on how large your data volume is and on the capacity of your hardware. Here is an example:

If each node stores 500,000 pages and the planned data volume is 2,000,000 pages, then you need 4 primary data nodes and 4 standby data nodes.

In addition, you need one user node and one LexstTomcatCall node, and their standby nodes.

In total, you need to buy a license for 8 nodes.

(The license does not limit the number of master nodes, log nodes, or their standby nodes.)
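
The data node count in this example can be reproduced with a short calculation. The sketch below covers data nodes only and assumes one standby per primary, as in the example; the function name plan_data_nodes is made up.

```python
def plan_data_nodes(total_pages, pages_per_node=500_000):
    """Return the number of primary and standby data nodes for a planned page count."""
    # Round up so that a partly filled last node is still counted.
    primary = (total_pages + pages_per_node - 1) // pages_per_node
    return {"primary_data": primary, "standby_data": primary, "data_total": 2 * primary}


print(plan_data_nodes(2_000_000))
# {'primary_data': 4, 'standby_data': 4, 'data_total': 8}
```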