
Sample Essay on How a Search Engine Works

15 scholarly search engines every student should bookmark




iSeek provides students and teachers with current, valid information for school and university academic projects, using an index gathered from research portals, universities and library internet subject guides recommended by teachers and librarians. Simply ask a question or enter search topics or tools, and iSeek will pull from scholastic sources to find exactly what you are looking for. The search engine is safe, intelligent and time-saving, and it draws on trusted resources from universities, government and established non-commercial sites.

ResearchGate is a unique social networking site for scientists and researchers. Over 11 million researchers share their work on the site, totaling millions of publications available for anyone to access.

You can search by publication, data and author, or you can even ask the researchers questions. This site is perfect for those studying anything related to healthcare or science. The National Library of Medicine's database contains more than 3 million full-text journal articles. Lexis Web is your go-to for any law-related inquiries you may have. The results are drawn from legal sites, which can be filtered by criteria such as news, blog, government and commercial.

Users can also filter results by jurisdiction, practice area, source and file format. Pulling up an Internet search might be second nature to you by now. But a little forethought into where you begin your hunt can make your life much easier. Save yourself the time wading through basic Google search results and utilize some of these tools to ensure your results will be up to par with academic standards.


Users can also click buttons to jump to the first page, the previous page, the next page and the last page. Suppose a user wants to search for 'robot' information and has the search engine return 3 results per page.

They input the word 'robot' in the left box and '3' in the right box, and then press the 'search' button. The next page would be as shown in the figure. We can see from the picture that there are six results in total, and the page shows the first five webpages that include the word 'robot', which is highlighted on the current page.

At the bottom of the page, users can see further results by clicking 'Next'; the layout of the next page is almost the same as the one above. In the introduction of the report, three core problems were brought forward. Part of the first problem and the third problem can be solved by 'analysis of webpage content' and 'natural language processing'. The main aim of the reporting facility is to provide 'user-relevant' information so as to solve the other part of the first problem.

The reporting facility is responsible for recording every query the user has made. When a new query is input, the search engine finds similar information among the old queries, so the results take on the characteristics of the user. The other aim of the reporting facility is to let the user know what other users are searching for.

When the user inputs a query, the query is stored in the database. At the same time, the old queries can be loaded from the database and compared with the new query to find similar information.
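As a concrete illustration, here is a minimal sketch of how such a reporting facility might work, using SQLite. The table name query_log, the column layout and the token-overlap matching are my assumptions, not the report's actual implementation.

```python
import sqlite3
import time

# Hypothetical reporting-facility schema: each keyword is stored with how
# often it has been searched and when it was last searched.
conn = sqlite3.connect("reporting.db")
conn.execute("""CREATE TABLE IF NOT EXISTS query_log (
                    keyword        TEXT PRIMARY KEY,
                    times_searched INTEGER NOT NULL DEFAULT 0,
                    last_searched  REAL NOT NULL)""")

def record_query(query: str) -> None:
    """Store every keyword of a new query, updating count and timestamp."""
    for word in query.lower().split():
        conn.execute(
            """INSERT INTO query_log (keyword, times_searched, last_searched)
               VALUES (?, 1, ?)
               ON CONFLICT(keyword) DO UPDATE SET
                   times_searched = times_searched + 1,
                   last_searched  = excluded.last_searched""",
            (word, time.time()))
    conn.commit()

def similar_past_queries(query: str, limit: int = 5):
    """Load old keywords that overlap with the new query's tokens."""
    words = query.lower().split()
    if not words:
        return []
    marks = ",".join("?" for _ in words)
    return conn.execute(
        f"""SELECT keyword, times_searched FROM query_log
            WHERE keyword IN ({marks})
            ORDER BY times_searched DESC LIMIT ?""",
        (*words, limit)).fetchall()
```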

The database stores the keywords, the number of times each keyword has been searched, and the last time each keyword was searched. According to the deliverables and typical criteria for search engines, some experiments have been done to evaluate the search engine I have made. The test case is the webpages of Durham University, which means that all the webpages the crawler crawls start within the university's domain.

The criteria and the results of the experiments are shown below. Finally, the solutions to the problems in the 'problem domain' section and the benefits and defects of the search engine are discussed.

In 1973, Lancaster and Fayen listed six factors for evaluating search engines: coverage, recall ratio, precision ratio, response time, user effort, and format of output (Lancaster and Fayen, 1973). Although these criteria were put forward over 40 years ago and their target was the traditional online information retrieval system, in terms of the essence of an information retrieval system the evaluation factors are still useful even nowadays.

Later, after analyzing and comparing three famous search engines (AltaVista, Excite, and Lycos), Heting Chu and Marilyn Rosenthal came up with five aspects for evaluating search engines: indexing, search power, search effect, output, and users' burden (Chu and Rosenthal, 1996). As the research developed, more and more attention was paid to the user experience. Belkin considers the essential issue in the information retrieval system to be the anomalous state of knowledge (ASK) (Belkin, 1980), which is an important part of the information retrieval model.

Bell considers the complexity of the web on the basis of the former research, and focuses on the user experience when evaluating the search engine (Bell). Considering the standard criteria above and the concrete condition of the current mainstream search engines, I think the following criteria are reasonable for evaluating a search engine in general.

This factor includes the capacity and coverage of the search engine's database, the content and depth of indexing, and the novelty rate of the database. The first affects the 'recall' factor directly: statistically, the more webpages the database holds, the more results can be found.

Usually, the index database is composed of the URL, document name, content, titles, sub-titles and so on. The chosen combination affects the precision of the search engine directly. Meanwhile, webpages are organized in a multilayered hypertext structure, and the depth of indexing refers to which layer the search engine indexes down to.

For example, some search engines only index the home page (the first layer) while others index deeper layers. The other meaning of depth is whether the engine indexes the whole webpage or just some of its content.

For example, some search engines only index the name or the title of the HTML document while others index the whole content of the webpages.

Generally, the more content a search engine indexes, the better the indexing effect and the more resources are occupied. The novelty rate includes two aspects: uniqueness and update frequency. The uniqueness of a search engine is the ratio of unique results (results that other search engines do not return) to all the results the search engine returns.
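To make the definition concrete, here is a small sketch of the uniqueness ratio; the result lists are invented for illustration.

```python
def uniqueness(own_results: list[str], other_engines: list[list[str]]) -> float:
    """Ratio of results no other engine returned to all results returned."""
    seen_elsewhere = set().union(*other_engines) if other_engines else set()
    unique = [url for url in own_results if url not in seen_elsewhere]
    return len(unique) / len(own_results) if own_results else 0.0

# Hypothetical result lists for the same query on three engines.
ours = ["a.com", "b.com", "c.com", "d.com"]
others = [["a.com", "x.com"], ["b.com", "y.com"]]
print(uniqueness(ours, others))  # 0.5 -> c.com and d.com are unique to us
```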

Nowadays, the update frequency is very important because information on the web changes from second to second and is unpredictable. As a result, the search engine has to guarantee the freshness of the information and avoid 'dead hyperlinks'. When it comes to my search engine, since the test case is the webpages of Durham University, the capacity of the database is limited. However, the content and depth of indexing are very comprehensive, which means that the database saves the whole content of each webpage.

However, the database is never updated, because it is not an on-line version at present. The aim of the test is to evaluate performance rather than capacity. This factor includes the construction of the indexing expression and the indexing function (Rousseau et al.). In fact, the search engine has the same problem as the on-line library.

For most search engines, the queries users input are the most important factor in the results. However, the queries differ even when different users search for the same target. To address this, some search engines such as AskJeeves provide other relevant keywords to give users more choices and thus more and better results.

Some user-friendly search engines supply a toolbar and dropdown menu to help users compose more detailed queries and so obtain more satisfying results. The number and effectiveness of the indexing functions a search engine provides seriously affect the indexing results. Many indexing technologies used in traditional on-line indexing systems have developed into important technologies in the search engine field, such as the Boolean index, truncation index, position index, and limited index.

Meanwhile, the graphical user interface is typically separated into a normal search interface and an advanced search interface according to users' needs. Almost all search engines ignore the case of words and apply a list of stop words. In my search engine, I take simplicity as the most important factor in the graphical user interface. As a result, there is no toolbar or dropdown menu.

However, it has its own stop-word list and does not differentiate between lowercase and uppercase. Furthermore, it supports Boolean queries to help users get more satisfying results. This factor is very important for evaluating the search engine I have made, because it can be quantified for comparison with other search engines. A search engine has to deal with far more information than an on-line indexing system, which is the significant difference between the two. If the indexing algorithms were not improved, even a simple search would take a long time.
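As an aside on the normalization just described (case folding plus stop-word removal), here is a minimal sketch; the stop-word list is illustrative only, not the engine's actual list.

```python
# Illustrative stop-word list; a real engine would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "to", "is"}

def normalize(text: str) -> list[str]:
    """Lowercase the text and drop stop words before indexing or querying."""
    return [w for w in text.lower().split() if w and w not in STOP_WORDS]

print(normalize("The Robot and the University"))  # ['robot', 'university']
```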

Most users will not wait long for a search engine to return results. There are no standard test cases for measuring the average response time of search engines, and it is not fair to compare my search engine with a famous search engine like Google because the scales of the databases are not comparable.

As a result, I chose some words at random and recorded a simple list of their response times. The experiments were run on a fixed number of webpages and keywords; the details are listed in Table 1. From Table 1, we can see that the search process takes too much time for the user.

There are two main reasons slowing down the search engine. The first is that the graphical user interface requires the part of the content that includes the keywords to be returned. If it were revised to return just the title or a single sentence, the time would be reduced to less than 1 second. The second factor affecting time is the database query: to support fuzzy queries, SQL's 'LIKE' has to be used, and it takes too much time.
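The contrast can be sketched as follows. The table and column names (postings, keyword, url) are assumptions; the point is only that a wildcard LIKE forces a full table scan, while an equality match on an indexed column is a fast lookup.

```python
import sqlite3

conn = sqlite3.connect("index.db")  # hypothetical index database

def fuzzy_lookup(word: str):
    # LIKE with leading and trailing wildcards cannot use an index,
    # so SQLite scans every row: the slow path described above.
    return conn.execute(
        "SELECT url FROM postings WHERE keyword LIKE ?",
        (f"%{word}%",)).fetchall()

def exact_lookup(word: str):
    # Equality on an indexed column is a cheap B-tree lookup.
    return conn.execute(
        "SELECT url FROM postings WHERE keyword = ?",
        (word,)).fetchall()
```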

If the fuzzy query is not necessary, the time spent accessing the database decreases significantly. Recall and precision were very important in the past, when information was limited. Nowadays, however, users pay more attention to whether the hyperlinks they want appear in the first several pages. Recall is the proportion of all relevant webpages that are retrieved, while precision is the proportion of retrieved webpages that are relevant. As the equations below show, measuring these two factors for a search engine is not easy, especially recall, because it is even harder to obtain the number 'A' of all relevant webpages.
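In standard notation (a reconstruction of the elided equations, with A the set of relevant webpages and R the set of retrieved webpages):

```latex
\mathrm{recall} = \frac{|A \cap R|}{|A|}, \qquad
\mathrm{precision} = \frac{|A \cap R|}{|R|}
```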

What is more, in my search engine it can be argued that, under the PageRank method, no non-relevant webpage is retrieved, because Boolean matching is used. In a word, the values of these factors are very high in my search engine, but it is of little use to compare it with other search engines on these two factors.

The output of results differs among search engines; however, it almost always has three parts: the title, the URL, and a snippet of the content. Some search engines add an introduction, type, webpage size, date, or hyperlinks to enrich the results. Furthermore, the search engine shows how long the search took and how many results it returned.

If the number of results is very large, the keyword is not selective enough for searching, while a very small number demonstrates that the keyword is too specialized.

The graphical user interface of my search engine returns all the basic parts above, with the keywords highlighted. What is more, it returns the elapsed time and the number of results. This part also covers three more functions: post-processing, assistance, and information filtering. Post-processing allows users to type more keywords to search within the previous results.

For assistance, search engines usually provide a preview function or a cached-snapshot function. Information filtering includes two aspects: the search engine may filter unhealthy information for minors, and it may return different results to different users according to their history records. My search engine has no functions such as 'search in the results', preview, or information filtering.

However, the reporting facility saves users' queries for later analysis. As for the ranking algorithms, the differences between them are obvious from their principles and from experiments. PageRank not only considers the number of in-linking webpages but also the importance of those in-linking webpages. The second difference is that the HITS algorithm obtains its initial data from a text search engine, and the importance of webpages passes from hub pages to authority pages.

What is more, in the HITS algorithm the hub scores and authority scores reinforce each other, while the PageRank algorithm is based on the random-surfer model, meaning that the importance of webpages is transferred among authority webpages. Although the HITS algorithm analyzes fewer webpages, it has to extract the initial result set from the search engine and expand it to the base set, which takes a lot of time.
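For reference, here is a minimal power-iteration sketch of PageRank's random-surfer model. The three-page link graph is invented, and the damping factor 0.85 is the conventional choice rather than anything specified in the report.

```python
# Minimal PageRank power iteration over an adjacency list.
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank to everyone
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# Made-up three-page web: a <-> b, and both link to c.
print(pagerank({"a": ["b", "c"], "b": ["a", "c"], "c": []}))
```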

This section contains an evaluation of the work carried out during the course. The first sub-section focuses on evaluating the selected solutions, and the second assesses the implementation and evaluation of the resulting system. To realize the aim, the deliverables were set out in the introduction.

The following part evaluates each of the deliverables. The first of the minimum objectives is to design a web crawler. The details have been introduced, and it actually works. However, there are some problems in the crawler part. The biggest problem is that the crawler fetches webpages so fast that it is sometimes mistaken for malware. For example, when the crawler crawls the test case, the Durham University webpages, the university firewall is regularly triggered and stops the crawler from continuing.

The second problem is that the crawler takes too much time because it has to run online. Each time the crawler saves a document, it saves the hyperlinks in that document in an array. However, not all the hyperlinks can be reached, so the code has to fetch the HTTP status code from the Internet, which takes time. Only when the HTTP status code is 200 can the hyperlink be followed; otherwise the hyperlink cannot be accessed.

Another time-consuming part is removing duplicate URLs from the array when the array is large. The firewall problem, in fact, is easy to solve: as long as the crawler waits for 1 or 2 seconds between requests, the firewall is not triggered, though the crawler then takes even more time because of the second problem.
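Putting these fixes together, here is a sketch of a politer crawler: a delay between requests, a status-code check, and a set rather than an array for cheap de-duplication. The requests/BeautifulSoup usage, the seed URL handling and the domain check are illustrative assumptions, not the project's code.

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl(seed: str, delay: float = 1.5, limit: int = 100) -> set[str]:
    seen: set[str] = {seed}      # set membership is O(1), unlike an array scan
    frontier = [seed]
    while frontier and len(seen) < limit:
        url = frontier.pop()
        time.sleep(delay)        # politeness delay keeps the firewall quiet
        resp = requests.get(url, timeout=10)
        if resp.status_code != 200:   # only follow reachable links
            continue
        for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"])
            if link.startswith(seed) and link not in seen:  # stay in-domain
                seen.add(link)
                frontier.append(link)
    return seen
```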

The second of the minimum objectives is to build and maintain an index. I chose the inverted index as the index model because it is widely used in many famous search engines; even Lucene, the biggest open-source search engine, uses an inverted index. The benefits of the inverted index are obvious and were introduced in the solution part (a small sketch is given below). The third of the minimum objectives, and the second of the intermediate objectives, is to design a graphical user interface.
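Before turning to the GUI, here is a minimal sketch of an inverted index of the kind described; it is not the project's actual code, just the idea of mapping each term to the documents containing it.

```python
from collections import defaultdict

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {1: "robot research at Durham", 2: "Durham campus map", 3: "robot lab"}
index = build_inverted_index(docs)
print(index["robot"])   # {1, 3}: the postings set for 'robot'
print(index["durham"])  # {1, 2}
```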

The principle of the GUI part is that it should be simple and convenient to use: it includes just two input boxes, in which the user enters the query and the number of results per page respectively, a very brief instruction guiding users through the search engine, and several buttons for users to choose the ranking algorithm they prefer. Given more time, there are many things to do; for example, 'search in the results' and cached-snapshot functions could be added to satisfy users.

More Boolean operators could be added if more time were given. The last of the minimum objectives, the third of the intermediate objectives and the first of the advanced objectives are to implement ranking algorithms; the basic ranking algorithm I chose is a simple one, the Boolean model with fuzzy search added, returning the relevant webpages (a sketch of Boolean evaluation over the inverted index follows below).
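Here is a sketch of how such Boolean operators evaluate against the postings sets from the inverted-index sketch above; the helper names are mine.

```python
# Boolean evaluation over postings sets (see the inverted-index sketch above).
def boolean_and(index, a: str, b: str) -> set[int]:
    return index.get(a, set()) & index.get(b, set())

def boolean_or(index, a: str, b: str) -> set[int]:
    return index.get(a, set()) | index.get(b, set())

def boolean_not(index, a: str, all_docs: set[int]) -> set[int]:
    return all_docs - index.get(a, set())

# With the toy index built earlier:
# boolean_and(index, "robot", "durham") -> {1}
# boolean_or(index, "robot", "durham")  -> {1, 2, 3}
```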

The method guarantees that the returned results are definitely relevant to the query, which means that the precision of the search engine would be 100 percent. However, some other models have not been tested and so cannot be compared with each other. Moreover, the effect of the original algorithms might not be good, which would certainly affect the evaluation of the search engine. The last of the intermediate objectives and the last of the advanced objectives are to build and visualize a reporting facility that stores users' queries for statistical analysis.

This part works, but it is only preliminary work towards providing users with their own personalized results even when they search for the same word.

In the end, although all the given objectives were completed, improvements could still be made to make the search engine better and the test data more convincing. The search engine has been evaluated in the 'Results' part.

However, there are still some problems to be solved. First of all, the test case is not good: because of the dynamics of the Internet, many webpages can no longer be accessed after they have been crawled. Furthermore, the weakness of the crawler limits the scale of the test case. As a result, the capability of the search engine appears poor.

But this is not the whole truth: given more time, the details of the crawling algorithm could be improved, the scale of the test case could be enlarged, and the speed would rise significantly.

The second point worth noticing is the indexing method: the Boolean query works, and the quality of the returned results is high. However, a toolbar and dropdown menu would be added given more time. In the third part, we saw that the response time is somewhat intolerable; the reasons have been analyzed, and the experiments prove that reading and analyzing the sentences containing the keywords directly from the raw files downloaded from the webpages takes too much time.

If just one sentence or just the title were returned, the time would be reduced significantly. It also seems that recall and precision are not good factors for evaluating my search engine; however, there are no better-known factors for evaluating a search engine, as the problem domain in the 'Related Work' section explains.

The format and content of my search engine's output keep the same form as other famous search engines such as Google. The last part of the evaluation could not be measured because it is more of an extension and future work of the project, though there are many details that could be improved. The work was separated into several sub-modules, and each part was completed as a deliverable: crawler, index, and graphical user interface.

Eventually, though, over the course of the 2000s, Google rose to prominence as the undisputed premier search engine on the Internet. To a large extent, this was due to the level of integration that Google was able to achieve in its operations: Google provided not only a search service but also, over time, an advertising service, an e-mail service, a video service, a scholarly database, and a virtual library. Google also offered the option of stratifying searches according to genre.

By now, the suggestion can surely be made that although other search engines (most notably Yahoo) still continue to exist, it is safe to conclude that in the minds of most people the technology of the search engine itself has become indissociably connected with Google. Indeed, this is reflected in the fact that "Google" itself has by now become an English verb (i.e. to "google" something is to look it up with a search engine).

In order to access any requested information from the Internet and provide it to the user in a fraction of a second, the search engine must, first of all, develop a comprehensive index of all the material available on the Internet.

As Moz has written, the web can be imagined as a network of stops in a big city: "Each stop is a unique document. The search engines need a way to 'crawl' the entire city and find all stops on the way, so they use the best path available—links" (2). The robots used by the search engine, often called crawlers or spiders, serve the purpose of mapping out all of the relevant interconnections of the Internet through the link structure.

It is this structure that is utilized when a given Internet user posts a search query to the search engine. The main key to navigating the link structure is the algorithm. As Google itself has indicated, the search process is driven by algorithms: the more sophisticated the algorithms involved, the more likely it is that the results provided for a given search query will be salient and meaningfully address the needs of the Internet user (Mostafa).

These days, most people almost take it for granted that Google will provide the actual information they are looking for. This was not, however, always the case. One of the primary metrics used by a modern search engine such as Google when retrieving and sorting information is relevance. This would seem straightforward enough. As Moz has pointed out, though: "Today, hundreds of factors influence relevance" (3).

In general, relevance for search engines has come to mean a broad metric pertaining to the quality of the web pages retrieved; and this is for the simple reason that if a web page is of low quality, then the information it contains likely will not be relevant for the Internet user, even in the event that the words on the web page technically match up with the words of the search query. Therefore, one of the key purposes of the algorithms utilized by a modern search engine is to evaluate the quality of a retrieved webpage.

This is often done by considering the extent to which a given webpage is formatted according to generally accepted quality standards, the extent to which it is linked to by other webpages on the Internet, and so on (Enge).
