Create a Search Engine In PHP, MySQL | Part 3


This is Part 3 of “How To Create A Search Engine In PHP”. In this part, we make a few additional adjustments that round out the search engine: a bot info page, a stats page, spell checking, error pages and visitor tracking.

Bot Page

Our crawler leaves a mark on every page it visits. In other words, a custom User Agent string is sent with each request to a web page. In the previous part, we set the user agent string to:

DingoBot (http://search.subinsb.com/about/bot.php)

We add a URL to it because a site owner who sees this string in their stats will wonder “what’s that?”. The bot info link answers that question, and it also promotes our site.

A site owner can put this in their robots.txt file to keep the Dingo! crawler away from a certain path:

User-agent: DingoBot
Disallow: /dontgohere/
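
For the rule above to mean anything, the crawler has to check robots.txt before it requests a page. Here is a minimal sketch of such a check. The function name isDisallowed is hypothetical and not part of the crawler from Part 2, and the parsing is simplified: it only understands User-agent and Disallow lines, with no Allow rules or wildcards.

```php
<?php
// Hypothetical helper, not part of the original tutorial code.
// Returns true when $path is blocked for $userAgent by the given robots.txt text.
// Simplified: handles only User-agent and Disallow lines, no Allow rules or wildcards.
function isDisallowed(string $robotsTxt, string $userAgent, string $path): bool {
    $applies = false;   // does the current group apply to our bot?
    $inAgents = false;  // are we still reading consecutive User-agent lines?
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            if (!$inAgents) {
                $applies = false; // a new group of rules starts here
            }
            $inAgents = true;
            if ($value === '*' || ($value !== '' && stripos($userAgent, $value) !== false)) {
                $applies = true;
            }
        } else {
            $inAgents = false;
            // A Disallow value blocks every path that starts with it
            if ($field === 'disallow' && $applies && $value !== '' && strpos($path, $value) === 0) {
                return true;
            }
        }
    }
    return false;
}
```

The crawler could fetch /robots.txt once per site, pass its contents to this function together with the DingoBot user agent string, and skip any URL for which it returns true.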

So, what I’m saying is that you should add a “Bot Info” page.

Stats

What could be better than showcasing the crawler’s work? Here’s the stats page of Web Search:

<?php
// functions.php is expected to set up the PDO handle $dbh and the layout helpers
include("../inc/functions.php");
?>
<html>
 <head>
  <?php head("Stats"); ?>
 </head>
 <body>
  <?php headerElem(); ?>
  <div class="container" style="width:100px;">
   <h2>Stats</h2>
   <p>See information about the URLs crawled by DingoBot.</p>
   <h3>Total URLs Crawled</h3>
   <strong>
   <?php
   // Count every row in the search table
   $sql = $dbh->query("SELECT COUNT(id) FROM `search`");
   echo $sql->fetchColumn();
   ?>
   </strong>
   <h3>Last Crawled URLs</h3>
   <ul style="width: 400px;overflow: auto;">
   <?php
   // The five most recently inserted URLs, newest first
   $sql = $dbh->query("SELECT `url` FROM `search` ORDER BY id DESC LIMIT 5");
   while ($r = $sql->fetch()) {
    // Escape the URL so crawled data cannot inject markup into the page
    echo "<li style='margin-bottom:5px;'>" . htmlspecialchars($r['url']) . "</li>";
   }
   ?>
   </ul>
  </div>
  <?php footer(); ?>
 </body>
</html>

Did You Mean? (Spell Check)

Google and other search engines give you a suggestion when there’s a typo in the query you submitted. I have found a way to implement this using Google Translate. It’s a very easy implementation.

Create a file named spellcheck.php in the inc folder and add this code to it:

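<?php
class SpellCheck{
 // Google Translate endpoint; it is not a documented API, so it may change or stop working
 private $url="http://translate.google.com/translate_a/t";

 // Build the request URL: an English to English "translation" (sl=en, tl=en) of $s
 private function makeURL($s){
  $s=urlencode($s);
  $url=$this->url."?client=t&sl=en&tl=en&hl=en&sc=2&ie=UTF-8&oe=UTF-8&uptl=en&oc=1&otf=1&ssel=3&tsel=0";
  $url.="&q=$s";
  return $url;
 }

 // Return the suggested spelling for $s, or an empty string if there is none
 public function check($s){
  $a="";
  $c=@file_get_contents($this->makeURL($s));
  if($c===false){
   return $a; // the request failed, so offer no suggestion
  }
  // Drop the first 41 characters of the response before matching
  $c=substr_replace($c, "", 0, 41);
  preg_match('/u003e","(.*?)",[1]/', $c, $m);
  if(isset($m[1])){
   $a=$m[1];
   $a=str_replace('",', '', $a);
  }
  return $a;
 }
}
?>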

Now we need the code in the search.php file that displays the “Did You Mean” suggestion. Actually, we already added it in Part 2 when we made the search.php file, so no new code is needed.
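
As a reminder of how the suggestion can be turned into a link, here is a rough sketch. It is not the code from Part 2: the function name didYouMeanHtml and the "q" query parameter are assumptions made for illustration.

```php
<?php
// Hypothetical helper; the actual search.php from Part 2 is not shown here.
// Assumes the search page reads the query from a "q" parameter.
function didYouMeanHtml(string $query, string $suggestion): string {
    // Nothing to suggest, or the suggestion only differs by letter case
    if ($suggestion === '' || strcasecmp($query, $suggestion) === 0) {
        return '';
    }
    return '<p>Did you mean <a href="search.php?q=' . urlencode($suggestion) . '">'
        . htmlspecialchars($suggestion) . '</a>?</p>';
}
```

The search page could pass the user's query and the result of the check() method of SpellCheck to this function and echo whatever it returns. An empty string simply means there is nothing to show.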

Starting the Crawler

When you visit the page crawler/runCrawl.php in your browser, the crawler starts. If you keep visiting the page, the crawler won’t start again and again; it only starts on the first visit. Each time it is visited, the page prints whether the crawler is already running or has just started.

The only thing you have to make sure of is that the file crawler/crawlStatus.txt is readable and writable by the web server.
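
To show why the status file needs both permissions, here is a minimal sketch of a "start only once" guard. It is not the runCrawl.php from the earlier part. The function name tryStartCrawler is hypothetical, and the sketch assumes the status file contains "1" while the crawler is running.

```php
<?php
// Hypothetical sketch; the real crawler/runCrawl.php is not reproduced in this part.
// Assumes the status file holds "1" while the crawler runs and anything else otherwise.
function tryStartCrawler(string $statusFile): bool {
    $fp = fopen($statusFile, 'c+'); // create if missing, open for reading and writing
    if ($fp === false) {
        return false; // most likely a permissions problem
    }
    flock($fp, LOCK_EX);            // keep two visitors from starting it at the same time
    $running = trim(stream_get_contents($fp)) === '1';
    if (!$running) {
        ftruncate($fp, 0);
        rewind($fp);
        fwrite($fp, '1');           // mark the crawler as running
    }
    flock($fp, LOCK_UN);
    fclose($fp);
    return !$running;               // true only for the call that started the crawler
}
```

A page like runCrawl.php could call this with the path to crawlStatus.txt and print "started" or "already running" depending on the return value. Without read permission the check fails, and without write permission the mark can never be saved.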

.htaccess

You can point all error pages to a single file, just like Google shows the same page for all errors:

DirectoryIndex index.php /inc/error.php
ErrorDocument 403 /inc/error.php
ErrorDocument 404 /inc/error.php
ErrorDocument 405 /inc/error.php
ErrorDocument 408 /inc/error.php
ErrorDocument 410 /inc/error.php
ErrorDocument 411 /inc/error.php
ErrorDocument 412 /inc/error.php
ErrorDocument 413 /inc/error.php
ErrorDocument 414 /inc/error.php
ErrorDocument 415 /inc/error.php
ErrorDocument 500 /inc/error.php
ErrorDocument 501 /inc/error.php
ErrorDocument 502 /inc/error.php
ErrorDocument 503 /inc/error.php
ErrorDocument 506 /inc/error.php

In Part 1, we already created an error page file. Here we’re just linking its path in .htaccess.
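
Since one file now handles every error, it can be handy to know which error actually happened. When Apache serves an ErrorDocument, it passes the original status code in the REDIRECT_STATUS variable. The sketch below is not the error.php from Part 1, and the function name errorMessage is hypothetical. Depending on how PHP is hooked into Apache, REDIRECT_STATUS may not carry the error code, so treat this as an assumption to verify on your server.

```php
<?php
// Hypothetical helper for a shared error page; not the error.php from Part 1.
function errorMessage(int $status): string {
    $messages = array(
        403 => 'Forbidden',
        404 => 'Page Not Found',
        500 => 'Internal Server Error',
        503 => 'Service Unavailable',
    );
    return isset($messages[$status]) ? $messages[$status] : 'Something Went Wrong';
}

// Apache exposes the status of the failed request as REDIRECT_STATUS
// when it serves an ErrorDocument; fall back to 404 if it is missing.
$status = isset($_SERVER['REDIRECT_STATUS']) ? (int) $_SERVER['REDIRECT_STATUS'] : 404;
echo '<h1>' . $status . ' ' . htmlspecialchars(errorMessage($status)) . '</h1>';
```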

Visitor Stats

If you would like to track visitors and count pageviews using Google Analytics or StatCounter, paste the tracking code into the inc/track.php file. If the file is not present, create it and add your tracking code inside.

If you don’t want tracking, you should remove this line from inc/functions.php:

include("track.php");

That’s about it. The search engine is complete. You can enhance it further with your own coding skills, like I did on search.subinsb.com.

It is possible that I missed some things in this 3 part tutorial. If you find one, I’d be really glad if you pointed out the problem in the comments. Thank you, and I hope it works well for you.