7 Quick Steps to Boost WordPress Search Using Elasticsearch

CICI Solutions - WordPress Search and WooCommerce Search

Table of Contents

WordPress Search and WooCommerce Search are part of the critical points you need to take care of when planning to build an online presence.

In some cases, having an efficient search engine within site can be crucial to increasing user engagement.
Especially e-commerce, or sites with a high volume of content, risk losing the user if the search time for a particular product or subject lasts too long.

The user is not patient by nature and demands precise and immediate answers.

This article will summarize some basic concepts about the WordPress Search (and therefore WooCommerce Search) search engine and a solution to improve search performance significantly using Elasticsearch, a distributed search engine for all types of data based on APACHE LUCENE.

Full-text search on MySQL

The full-text search (FTS) is the classic search on Google or any other search engine when we search for a particular word or phrase.

This type of search is a technique to find documents that may not precisely match the search criteria.

Here’s an example:

We can search for the words “Sun and Sea.”

The FTS will return documents that contain either exclusively each of the two words, “Sun” or “Sea,” or results that have both words either in the exact order or in another order, for example, “Sun and Sea” or “Sea and Sun.”

Technically speaking, MySQL supports partial text searches using the “LIKE” operator and regular expressions.

This type of search has several limits:

  • Increasing the size of the text field where the search is done.
  • Increase in the number of records in the table in which the search is done.

The primary limits are:

  1. Poor performance because MySQL must search for terms in all fields of the table before returning results.
  2. Poor flexibility in the search because it becomes difficult to find results that exclude one of the search terms (for example, “Sea” but not “Sun”).
  3. The impossibility to have a relevance score in the results obtained.

Because of these limitations, MySQL, starting from version 5.6, has introduced a particular index for full-text searches.

WordPress and Woocommerce search

Unfortunately, the MySQL database structure that comes with the wordpress installer does not use the new features introduced in MySQL. In the table ‘wp_posts’, for example, we can see that the indexing used is the standard one, the ‘BTREE’.

Without going into the dynamics of the operation of this type of index, we can say that this index allows you to perform searches with equality operators (= or <=>), operators that give results within a range (>, <, >=, <=, BETWEEN), or the LIKE operator.

Now comes the fun part because we’re going to structure an environment to measure the actual load of WordPress search queries and examine its structure.

Test environment installation

To perform our benchmark and understand what the real cost of search queries on WordPress is and have a term of comparison with a query performed on Elasticsearch, we have to follow three easy steps:

  1. Install an instance of WordPress.
    To perform this step quickly, we took advantage of the ready-to-run instances on DIGITAL OCEAN’s marketplace.
    We opted for a configuration with 2 dedicated CPUs and 8GB of memory to add some shine to the search.
  2. Install an instance of Elasticsearch using a Docker container on Amazon AWS.
    In this case, the power of the instance is minimal: a T2.small with 1CPU and 2GB of ram.
  3. Insert a dummy data set to be able to do full-text searches on a consistent database.
    To use the WordPress API-rest in a fast but not secure way, we installed a plugin that allows Basic authentication.
    This practice is not recommended for production use.
    On the Elasticsearch side, we created an index that would repurpose the full-text fields of the WordPress MySQL database to perform the exact search on both databases in parallel.
    Using the API service 
    Loripsum we inserted 11.000 articles on our WordPress with titles and random content.

This is the source of the file that performed the data upload:

				
					function wp_insert($title,$content){
$username = 'xxxxxxx';
$password = 'yyyyyyyyyyy';
$rest_api_url = "https://my.test.site/wp-json/wp/v2/posts";

$data_string = json_encode([
	'title'    => $title,
	'content'  => $content,
	'status'   => 'publish',
]);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $rest_api_url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_string);
curl_setopt($ch, CURLOPT_HTTPHEADER, [
	'Content-Type: application/json',
	'Content-Length: ' . strlen($data_string),
	'Authorization: Basic ' . base64_encode($username . ':' . $password),
]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
return json_decode($result);
}
function dbg($s){
	echo "\n";
	print_r($s);
	echo "\n";
}
function es_insert($title,$content,$link){
	$topost = array(
		"link" => $link,
		"title" => $title,
		"description" => $content,
		"pubdate" => date("Y-m-d H:i:s"),
		"author" => "Admin"
	);
	$data = json_encode($topost,JSON_PRETTY_PRINT);
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, 'elasticearch.host.ip:9200/post/_doc/?pretty');
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($ch, CURLOPT_POST, 1);
	curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
	$headers = array();
	$headers[] = 'Content-Type: application/json';
	curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
	$result = curl_exec($ch);
	if (curl_errno($ch)) {
	    echo 'Error:' . curl_error($ch);
	}
	curl_close($ch);
	return json_decode($result);
}
function li_get($wich){
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $wich);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	$output = curl_exec($ch);
	curl_close($ch);
	return $output;      
}
for($i=0;$i<10000;$i++){ $title = li_get("https://loripsum.net/api/1/short"); $incipit = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."; $title = str_replace($incipit, "", $title); $title = substr(trim($title),0,100); $title = strip_tags($title); $title = preg_replace('/\[.*\]/', '', $title); dbg($title); if($title == "") die('aaaaargh'); $content = li_get("https://loripsum.net/api/20/verylong/headers"); $wp = wp_insert($title,$content); if(isset($wp->link)){
		dbg($wp->link);
		$uno = strip_tags($content);
		$testo = preg_replace('/\[.*\]/', '', $uno);		
		$es = es_insert($title,$testo,$wp->link);
		dbg($es);
	}
}
				
			

Let’s check the size of the database we created using this SQL code:

				
					mysql> SELECT
    ->   TABLE_NAME AS `Table`,
    ->   ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
    -> FROM
    ->   information_schema.TABLES
    -> WHERE
    ->     TABLE_SCHEMA = "wordpress"
    ->   AND
    ->     TABLE_NAME = "wp_posts"
    -> ORDER BY
    ->   (DATA_LENGTH + INDEX_LENGTH)
    -> DESC;

				
			

This is the result: 366MB

				
					+----------+-----------+
| Table    | Size (MB) |
+----------+-----------+
| wp_posts |       366 |
+----------+-----------+
1 row in set (0.01 sec)

				
			

Enabling slow query log in MySQL

To correctly measure the execution time of WordPress search queries, we still need to instruct MySQL to write in a log file details of queries whose execution exceeds a specific time.

To do this, we add some instructions in the MySql configuration file. 

First, let’s open the configuration file with an editor:

				
					nano /etc/mysql/mysql.conf.d/mysqld.cnf
				
			

And add this code:

				
					slow_query_log         = 1
slow_query_log_file    = /var/log/mysql/slow.log
long_query_time = 1
log-queries-not-using-indexes = ON

				
			

Then we create the log file and make it writable (avoid the last instruction in production, but we have to give the MySQL user the ownership of the file):

				
					touch /var/log/mysql/slow.log
chmod 777 /var/log/mysql/slow.log
				
			

Now let’s restart MySQL with the new configuration:

				
					/etc/init.d/mysql restart
				
			

Testing a WordPress search query

We now have all the elements to be able to test a search query on WordPress.

We have 11,000 posts filled with Lorem Ipsum, and we can choose 2 random terms that we know are present in an article, enter them into the search form and see how WordPress behaves.

The search I chose to do is on the terms “Cicero cognoscere”.
Now I run the search and I go to read in the slow query log what happened.

This is the query that WordPress made:

				
					SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  WHERE 1=1  
AND (((wp_posts.post_title LIKE '%Cicero%') 
OR (wp_posts.post_excerpt LIKE '%Cicero%') 
OR (wp_posts.post_content LIKE '%Cicero%')) 
AND ((wp_posts.post_title LIKE '%cognoscere%') 
OR (wp_posts.post_excerpt LIKE '%cognoscere%') 
OR (wp_posts.post_content LIKE '%cognoscere%')))  
AND wp_posts.post_type IN ('post', 'page', 'attachment') 
AND (wp_posts.post_status = 'publish' OR wp_posts.post_author = 1 AND wp_posts.post_status = 'private')  
ORDER BY (CASE WHEN wp_posts.post_title LIKE '%Cicero cognoscere%' THEN 1 WHEN wp_posts.post_title LIKE '%Cicero%' AND wp_posts.post_title LIKE '%cognoscere%' THEN 2 WHEN wp_posts.post_title LIKE '%Cicero%' OR wp_posts.post_title LIKE '%cognoscere%' THEN 3 WHEN wp_posts.post_excerpt LIKE '%Cicero cognoscere%' THEN 4 WHEN wp_posts.post_content LIKE '%Cicero cognoscere%' THEN 5 ELSE 6 END), wp_posts.post_date DESC LIMIT 0, 10;

# Query_time: 3.922342  Lock_time: 0.000138 Rows_sent: 10  Rows_examined: 12734

				
			

WordPress searched for the two required words in the title, excerpt and content fields of the posts table if the post type was an article, a page or an attachment (i.e. an image).

It then ordered them as importance depending on whether the title had the complete string or one of the two terms and with the same criteria to follow and arrange them according to the date of creation in a descending manner and in number of 10.

The execution of the query took 3.9 seconds to search through 12,734 records.

Considering the power of the machine and the fact that this query was the only activity in progress at that moment, the performance does not seem in fact very brilliant. Other tests made on less performing machines have given, obviously, much worse results.

But above all the criterion is perplexing, because as you can see from the post ids obtained in response to the query only very recent posts have been selected, which is not a guarantee for search result optimization.

It is also necessary to underline the fact that all the plugins that exist to implement different search modes for Worpress or WooCommerce can improve the detail of the result but certainly not the performance which, on the contrary, almost always worsen.

				
					+-------+
| ID    |
+-------+
| 11465 |
| 11459 |
| 11452 |
| 11451 |
| 11443 |
| 11442 |
| 11437 |
| 11434 |
| 11432 |
| 11416 |
+-------+

				
			

The same query executed on Elasticsearch.

Let’s now implement the query towards Elasticsearch, which, as we said, resides on another machine, not exceptionally performing, on another cloud.
The two machines are, however, in the same US-EAST region.
For brevity, we will take a shortcut that generally should be done differently.

Note that there are plugins to integrate WordPress with Elasticsearch, but alas, they don’t work. 😉

Let’s create a WordPress page with slug /elasticsearch-search.

Then create a WordPress page template and call it “Elastic Page Search.”

Inside the template, we need to create a form with search text input and the action to the slug of our page.

To do that, insert this code after the form:

				
					if(isset($_POST['string']) && $_POST['string'] != ""){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'my.elasticsearch.ip:9200/post/_search?pretty');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "\n{\n  \"query\": {\n    \"multi_match\" : {\n      \"query\":    \"".$_POST['string']."\", \n      \"fields\": [ \"title^2\", \"description\" ] \n    }\n  },\n   \"fields\": [\"title\", \"link\"],\n   \"_source\": false\n}\n");
$headers = array();
$headers[] = 'Content-Type: application/json';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$result = curl_exec($ch);
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close($ch);
}
$res = json_decode($result);
print_r($result);
				
			

Analyzing the JSON code with which we go to query Elasticsearch, we discover that we are looking for our phrase in the title and description of the post.

If one of the terms is found in the title it would be double valued in the Elasticsearch ranking attribution.

As a result we get Elasticsearch to return only the title and slug of the article:

				
					{
  "query": {
    "multi_match" : {
      "query":    "Cicero cognoscere", 
      "fields": [ "title^2", "description" ] 
    }
  },
   "fields": ["title", "link"],
   "_source": false
}

				
			

This is what we get:

				
					{
  "took" : 846,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 876,
      "relation" : "eq"
    },
    "max_score" : 3.5842853,
    "hits" : [
      {
        "_index" : "post",
        "_type" : "_doc",
        "_id" : "KIuDyXYBaPVcgdaTjmHU",
        "_score" : 3.5842853,
        "fields" : {
          "link" : [
            "https://my-test.site/quae-diligentissime-contra-aristonem-dicuntur-a-chryippo-scrupulum-inquam-abeunti-contemnit/"
          ],
          "title" : [
            " Quae diligentissime contra Aristonem dicuntur a Chryippo. Scrupulum, inquam, abeunti; Contemnit "
          ]
        }
      },
      {
        "_index" : "post",
        "_type" : "_doc",
        "_id" : "8YuJyXYBaPVcgdaTLmEi",
        "_score" : 3.572124,
        "fields" : {
          "link" : [
            "https://my-test.site/post-enim-chrysippum-eum-non-sane-est-disputatum-duo-reges-constructio-interrete-nihil-illinc/"
          ],
          "title" : [
            " Post enim Chrysippum eum non sane est disputatum. Duo Reges: constructio interrete. Nihil illinc"
          ]
        }
      },
				
			

We analyze the most important messages that Elasticsearch returns to us to know the query execution time, the number of documents found, and the number of results obtained:

  • Query execution time 0.846 seconds (“took” : 846)
  • 876 documents containing our key were found (hits->total->value = 876)
  • Elasticsearch returned 10 results (we could have chosen the number) sorted according to the score (“_score”: 3.572124 ) that he calculated using his sophisticated indexing system.

Conclusions

Implementing an effective search system in a WordPress-based website or a WooCommerce-based e-commerce allows us to improve the site’s performance.
If you have to manage a large volume of data, using a search engine like Elasticsearch becomes essential.

Thanks to this test, we have practically demonstrated what we can expect by integrating Elasticsearch into our site:

  1. Up to 4 times faster speed.
  2. Greater precision in the search.
  3. Greater effectiveness of the results obtained thanks to the Elasticsearch ranking system.

Do you need some additional information or want to start a project with us?
Please send us an inquiry​

You may also be interested in: