How to integrate your Laravel app with Elasticsearch

March 6th, 2017 by Tony Messias

This is a revamp of my article with the same name located here.

Searching is an important part of many applications, yet it is often treated as a simple task: “Just query using LIKE and we’re good to go.” While the LIKE clause can be handy, sometimes we need something better. After researching for a while, I found a few good resources on the subject, and the most attractive option is Elasticsearch. Yes, we can go far with database full-text search and other searching techniques, but Elasticsearch is very powerful and comes with a variety of useful features. I’m going to cover the basics here and link more resources at the bottom, so we can dig further.

What is Elasticsearch?

From the official website:

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows us to store, search, and analyze big volumes of data quickly and in near real time.

In other words, we can use Elasticsearch for logging (see the Elastic Stack) and for searching. This article focuses on searching; maybe I’ll cover logging and analytics in another article.

Basics about Elasticsearch

We used to translate relational concepts to Elasticsearch, but that kind of comparison is now outdated. To fully understand the tool, we had better start from scratch, with no SQL comparison.

First of all, Elasticsearch is document-oriented and talks REST, so it can be used from any language. Now, let’s dive a bit deeper into its basic concepts.

Index and Types

As I said before, Elasticsearch is a document-oriented search engine. That means we search, sort, filter, etc., documents. A document is represented in JSON format and holds information that can be indexed. We usually store (aka index, from “to index”) documents with a similar mapping structure (fields) together, and we call that group an index. There can be one index for users, another for articles and another for products, for example.

Inside the index, we can have one or more types. We usually have one type, but having multiple types in an index can be useful sometimes. Let’s say we have a Contact entity which is the parent (inheritance) of Lead and Vendor entities. Although we could store both entities in the same “contacts” type inside the “contacts” index, it might be interesting to store them in separate types, so we would have “leads” and “vendors” types inside the “contacts” index.

It’s worth saying that Elasticsearch is schema-free but not schema-less (see here). This means that we can index whatever we want and it will figure out the data types, but we can’t have the same field holding different data types. To get better query results and avoid unexpected behavior, we had better define those data types per field and stick to them.
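To make "define those data types per field" a bit more concrete, here is a minimal sketch of an explicit mapping for an articles index, written as the PHP array you would hand to the Elasticsearch client when creating the index. The field names mirror the demo application built later in this article. The helper name articlesIndexParams and the choice of the text type for every field (an Elasticsearch 5.x type name) are my assumptions for illustration, not something the setup in this article requires.

```php
<?php

// Hypothetical helper: builds the parameters for creating an "articles"
// index with an explicit mapping (Elasticsearch 5.x field types).
function articlesIndexParams(string $index = 'articles', string $type = 'articles'): array
{
    return [
        'index' => $index,
        'body' => [
            'mappings' => [
                $type => [
                    'properties' => [
                        // Analyzed full-text fields.
                        'title' => ['type' => 'text'],
                        'body' => ['type' => 'text'],
                        'tags' => ['type' => 'text'],
                    ],
                ],
            ],
        ],
    ];
}

// Show the JSON body that would be sent to Elasticsearch.
echo json_encode(articlesIndexParams()['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

With the official PHP client, an array like this would typically be passed to $client->indices()->create(...) before indexing any document.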

Check the docs for more accurate and well-described documentation.

Local environment

It’s likely that you don’t have Elasticsearch running on your local machine. We are going to use Docker here, but don’t worry: you can run it without Docker by following the official docs. I just wanted to play with Docker a bit more.

Install Docker and docker-compose in your machine. Now that you have it installed, you can run:

$ docker run -d -p 9200:9200 elasticsearch
8dffec05d61d1002d885b48c1dd445541718344e01273f1d035347eb67003a0a

If you don’t have an Elasticsearch Docker image in your machine, it will pull one from the Docker registry, so it might take a little while the first time.

  • the -d flag means we want to run it detached, in other words don’t block the terminal;
  • the -p 9200:9200 param is saying to Docker that we want the port 9200 bound in our localhost:9200 port, so we can access it as if it was on localhost.

If you have any problems with the vm.max_map_count kernel setting, like I did, the docs have you covered here.

If everything went well, you can do a curl request to check if your Elasticsearch server is running:

$ curl localhost:9200
{
  "name" : "TdMgPiU",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "wGbe3F6iRpOtcxGPevh48A",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

A response like this means we are good to go. If the request fails, your Elasticsearch server might still be booting. Check whether your container is running using docker ps (it should be).
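The same check can be done from PHP. The sketch below only parses the JSON document returned by the root endpoint and extracts the version number; the helper name elasticsearchVersion is hypothetical, and the nullable return type assumes PHP 7.1 or newer.

```php
<?php

// Hypothetical helper: extracts the version number from the JSON document
// that the Elasticsearch root endpoint returns. Returns null when the
// payload does not have the expected shape (for example, an error page).
function elasticsearchVersion(string $json): ?string
{
    $info = json_decode($json, true);

    if (!is_array($info) || !isset($info['version']['number'])) {
        return null;
    }

    return (string) $info['version']['number'];
}

// Sample payload trimmed from the curl output above.
$sample = '{"name":"TdMgPiU","version":{"number":"5.1.1"},"tagline":"You Know, for Search"}';

echo elasticsearchVersion($sample), PHP_EOL; // prints 5.1.1
```

In a real check the JSON would come from an HTTP request to localhost:9200 (for instance via file_get_contents('http://localhost:9200'), assuming allow_url_fopen is enabled).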

The demo application

First thing to know is that we have to have DATA to use Elasticsearch, so in this example we have a seed command that populates the database and, while it does that, it indexes all of the data on Elasticsearch (because of the observer, see below). I’ll show it in a while, first let’s see how we can integrate it with our Eloquent models.

You can create a Laravel app with composer or using the Laravel installer, like so:

$ laravel new es-laravel-example

Now you have a Laravel application ready to go. But since we are going to be using Elasticsearch in this app, we need to pull a Composer dependency before we change anything, so run composer require elasticsearch/elasticsearch inside your new es-laravel-example folder. If you don’t have composer, check it out here.

If you run php artisan serve and go to localhost:8000 in your browser, you will see the Laravel welcome page:

Laravel welcome page.

Let’s get started. We are going to use the concept of articles to demonstrate here. So we need to create an Article model and its migration. We can do so by running: php artisan make:model Article -m, where the -m flag tells artisan to also create the migration for this model.

We have to tweak the migration. There should be a new file inside the database/migrations/ folder called create_articles_table, prefixed with a datetime. Open it and set the fields we are going to use, like so:

<?php

Schema::create('articles', function (Blueprint $table) {
    $table->increments('id');
    $table->string('title');
    $table->text('body');
    $table->json('tags');
    $table->timestamps();
});

We can create a seeder for our model so we have data available. Run php artisan make:seeder ArticlesTableSeeder and open it. Inside its run() method, all we have to do is add these two lines:

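<?php

use Illuminate\Database\Seeder;
use Illuminate\Support\Facades\DB;

class ArticlesTableSeeder extends Seeder
{
    public function run()
    {
        // Start from an empty table, then create 50 fake articles.
        DB::table('articles')->truncate();
        factory(App\Article::class, 50)->create();
    }
}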

It’s using Laravel’s Model Factory functionality to create 50 fake articles for us. But we haven’t configured it yet. To do so, open the database/factories/ModelFactory.php file and add a new factory entry, like so:

<?php

use Faker\Generator;

$factory->define(App\Article::class, function (Generator $faker) {
    $tags = collect(['php', 'ruby', 'java', 'javascript', 'bash'])
        ->random(2)
        ->values()
        ->all();

    return [
        'title' => $faker->sentence,
        'body' => $faker->text,
        'tags' => $tags,
    ];
});

Now we can add a simple route and view so that we can list articles. First, the route in routes/web.php:

<?php

Route::get('/', function () {
    return view('articles.index', [
        'articles' => App\Article::all(),
    ]);
});

and the view should be in resources/views/articles/index.blade.php, with content:

<html>
    <head>
        <title>Articles</title>
        <link rel="stylesheet" href="{{ elixir('css/app.css') }}" />
    </head>

    <body>
        <div class="container">
            <div class="row">
                <div class="panel panel-primary">
                    <div class="panel-heading">
                        Articles <small>({{ $articles->count() }})</small>
                    </div>
                    <div class="panel-body">
                        @forelse ($articles as $article)
                            <article>
                                <h2>{{ $article->title }}</h2>

                                <p>{{ $article->body }}</p>

                                <p class="well">{{ implode(', ', $article->tags ?: []) }}</p>
                            </article>
                        @empty
                            <p>No articles found</p>
                        @endforelse
                    </div>
                </div>
            </div>
        </div>
    </body>
</html>

We are using Blade here; the template loops over the given $articles variable and shows each article on the page. It also uses Elixir, just so we can use Twitter Bootstrap. You can run yarn to pull your dependencies, or npm install if you don’t have yarn installed. Then run $ gulp or $ ./node_modules/.bin/gulp to build the assets. After that, it’s time to run our Seeder. However, we haven’t configured our database yet. Let’s use SQLite, just because it’s simpler.

$ touch database/database.sqlite

and edit the .env file to change the DB credentials to:

DB_CONNECTION=sqlite

Remove every other DB_* entry. Now stop php artisan serve (if it’s still running) and run it again so the app reloads the configs. It’s time to run the migrations and seed our database. Run php artisan migrate --seed and then open the app in your browser (at localhost:8000). You should see something like this:

articles app.

Now, let’s implement a search endpoint. At first, we will implement it using plain SQL. We will write a Repository here, it’s useful for fetching data. Our Repository interface would be something like this:

<?php

namespace App\Articles;

use Illuminate\Database\Eloquent\Collection;

interface ArticlesRepository
{
    public function search(string $query = ""): Collection;
}

And the implementation would be like:

<?php

namespace App\Articles;

use App\Article;
use Illuminate\Database\Eloquent\Collection;

class EloquentArticlesRepository implements ArticlesRepository
{
    public function search(string $query = ""): Collection
    {
        return Article::where('body', 'like', "%{$query}%")
            ->orWhere('title', 'like', "%{$query}%")
            ->get();
    }
}

Now, we can bind the interface in the AppServiceProvider, like so:

<?php

namespace App\Providers;

use App\Articles\ArticlesRepository;
use Illuminate\Support\ServiceProvider;
use App\Articles\EloquentArticlesRepository;

class AppServiceProvider extends ServiceProvider
{
    public function register()
    {
        $this->app->bind(ArticlesRepository::class, function () {
            return new EloquentArticlesRepository();
        });
    }
}

Cool. Now, let’s create the search route in our routes/web.php file, like so:

<?php

use App\Articles\ArticlesRepository;

Route::get('/search', function (ArticlesRepository $repository) {
    // Cast to string: the "q" parameter is null when it is missing or empty.
    $articles = $repository->search((string) request('q'));

    return view('articles.index', [
        'articles' => $articles,
    ]);
});

We can then add the search field in our view (resources/views/articles/index.blade.php):

<div class="panel panel-primary">
    <div class="panel-heading">
        Articles <small>({{ $articles->count() }})</small>
    </div>
    <div class="panel-body">
        <div class="row">
            <div class="container">
                <form action="{{ url('search') }}" method="get">
                    <div class="form-group">
                        <input
                                type="text"
                                name="q"
                                class="form-control"
                                placeholder="Search..."
                                value="{{ request('q') }}"
                        />
                    </div>
                </form>
            </div>
        </div>

        <div class="row">
            <div class="container">
                @forelse ($articles as $article)
                    <article>
                        <h2>{{ $article->title }}</h2>

                        <p>{{ $article->body }}</p>

                        <p class="well">
                            {{ implode(', ', $article->tags ?: []) }}
                        </p>
                    </article>
                @empty
                    <p>No articles found</p>
                @endforelse
            </div>
        </div>
    </div>
</div>

Now you can search for something like:

searching 101.

It works. Now, we can finally implement the Elasticsearch version of it.

Integrating Elasticsearch

Since Elasticsearch talks REST, what we’re going to do here is basically hook into the Eloquent models we want to index on it and send some HTTP requests to the Elasticsearch API. The concepts shown here I took from a Laracon Talk linked at the bottom. It is using Laravel, but the concepts can be applied to any language/framework.

We are going to use Model Observers in this example. Say we have a regular Eloquent model, Article. We can write a generic trait and observer pair that handles indexing for all of our models (the ones that use the trait, of course), so we would have something like:

<?php

namespace App\Search;

use Elasticsearch\Client;

class ElasticsearchObserver
{
    private $elasticsearch;

    public function __construct(Client $elasticsearch)
    {
        $this->elasticsearch = $elasticsearch;
    }

    public function saved($model)
    {
        $this->elasticsearch->index([
            'index' => $model->getSearchIndex(),
            'type' => $model->getSearchType(),
            'id' => $model->id,
            'body' => $model->toSearchArray(),
        ]);
    }

    public function deleted($model)
    {
        $this->elasticsearch->delete([
            'index' => $model->getSearchIndex(),
            'type' => $model->getSearchType(),
            'id' => $model->id,
        ]);
    }
}

We need to bind this observer into all of our Models that we want to index in Elasticsearch. We can do that by introducing a new Searchable trait. This trait will also provide the methods the observer uses.

<?php

namespace App\Search;

trait Searchable
{
    public static function bootSearchable()
    {
        // This makes it easy to toggle the search feature flag
        // on and off. This is going to prove useful later on
        // when deploying the new search engine to a live app.
        if (config('services.search.enabled')) {
            static::observe(ElasticsearchObserver::class);
        }
    }

    public function getSearchIndex()
    {
        return $this->getTable();
    }

    public function getSearchType()
    {
        if (property_exists($this, 'useSearchType')) {
            return $this->useSearchType;
        }

        return $this->getTable();
    }

    public function toSearchArray()
    {
        // Having a custom method that transforms the model
        // into a searchable array allows us to customize the
        // data that's going to be searchable per model.
        return $this->toArray();
    }
}

We can register our Observer in our model like this:

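<?php

namespace App;

use App\Search\Searchable;
use Illuminate\Database\Eloquent\Model;

class Article extends Model
{
    use Searchable;

    // The tags column is JSON; casting it lets us read and
    // write it as a PHP array (the views rely on that).
    protected $casts = [
        'tags' => 'json',
    ];
}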

Now whenever we create, update or delete an entity using our Eloquent Article model, it triggers the Elasticsearch Observer to update its data on Elasticsearch. Note that this happens synchronously during the HTTP request. A better way is to use queues and have an Elasticsearch handler that indexes data asynchronously, so we don’t slow down the user’s request.

The Elasticsearch Repository

Now that we can feed Elasticsearch with our data, it’s time to perform some searching. We can still keep the SQL implementation as a backup to fall back on in case our Elasticsearch servers crash. To do so, we can create another implementation of our Repository interface, like so:

<?php

namespace App\Articles;

use App\Article;
use Elasticsearch\Client;
use Illuminate\Database\Eloquent\Collection;

class ElasticsearchArticlesRepository implements ArticlesRepository
{
    private $search;

    public function __construct(Client $client)
    {
        $this->search = $client;
    }

    public function search(string $query = ""): Collection
    {
        $items = $this->searchOnElasticsearch($query);

        return $this->buildCollection($items);
    }

    private function searchOnElasticsearch(string $query): array
    {
        $instance = new Article;

        $items = $this->search->search([
            'index' => $instance->getSearchIndex(),
            'type' => $instance->getSearchType(),
            'body' => [
                'query' => [
                    'multi_match' => [
                        'fields' => ['title', 'body', 'tags'],
                        'query' => $query,
                    ],
                ],
            ],
        ]);

        return $items;
    }

    private function buildCollection(array $items): Collection
    {
        /**
         * The data comes in a structure like this:
         *
         * [
         *      'hits' => [
         *          'hits' => [
         *              [ '_source' => 1 ],
         *              [ '_source' => 2 ],
         *          ]
         *      ]
         * ]
         *
         * And we only care about the _source of the documents.
         */
        $hits = array_pluck($items['hits']['hits'], '_source') ?: [];

        $sources = array_map(function ($source) {
            // The hydrate method will try to decode this
            // field but ES gives us an array already.
            $source['tags'] = json_encode($source['tags']);
            return $source;
        }, $hits);

        // We have to convert the results array into Eloquent Models.
        return Article::hydrate($sources);
    }
}

We opted for hydrating the models with the documents we got from Elasticsearch. This is only possible because we are indexing the whole model as it came from the database (our single source of truth) using $article->toArray(). Although this saves a database round trip, it can be limiting in other scenarios. Another way of doing this is calling Article::find($ids) with the IDs that came in the Elasticsearch documents. Since primary keys are indexed, running the query/filtering on Elasticsearch and then loading the models from the database by ID is likely faster than performing the whole query/filtering in the database itself. Using find($ids) instead of hydrate($sources) would also let you shape the Elasticsearch schema in whatever way makes your searches easier, instead of fighting the database schema to build a more complex search.
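If you go the find($ids) route, keep in mind that loading models by a list of IDs does not guarantee they come back in the relevance order Elasticsearch produced. A minimal sketch of the two pieces involved, using plain arrays instead of Eloquent models so it runs on its own; both helper names (extractIds and sortByRelevance) are hypothetical:

```php
<?php

// Hypothetical helper: pulls the document IDs out of a search response,
// keeping the relevance order Elasticsearch returned them in.
function extractIds(array $response): array
{
    return array_map('intval', array_column($response['hits']['hits'] ?? [], '_id'));
}

// Hypothetical helper: reorders database rows (each with an "id" key) so
// they follow the order of $ids. IDs with no matching row are skipped.
function sortByRelevance(array $rows, array $ids): array
{
    $byId = array_column($rows, null, 'id');

    $sorted = [];
    foreach ($ids as $id) {
        if (isset($byId[$id])) {
            $sorted[] = $byId[$id];
        }
    }

    return $sorted;
}

// A trimmed-down search response: document 7 is the most relevant one.
$response = ['hits' => ['hits' => [
    ['_id' => '7', '_source' => []],
    ['_id' => '3', '_source' => []],
]]];

// Pretend the database returned the rows in primary-key order.
$rows = [['id' => 3, 'title' => 'Third'], ['id' => 7, 'title' => 'Seventh']];

$ids = extractIds($response);
print_r(array_column(sortByRelevance($rows, $ids), 'title'));
```

In the repository, the same idea would amount to fetching the models for the extracted IDs and then sorting the resulting collection by each ID's position in that list.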

The trick to switch the repository is to replace the binding in the ServiceProvider, like so:

<?php

namespace App\Providers;

use Elasticsearch\Client;
use App\Articles\ArticlesRepository;
use Illuminate\Support\ServiceProvider;
use App\Articles\EloquentArticlesRepository;
use App\Articles\ElasticsearchArticlesRepository;

class RepositoriesServiceProvider extends ServiceProvider
{
    public function register()
    {
        $this->app->singleton(ArticlesRepository::class, function($app) {
            // This is useful in case we want to turn-off our
            // search cluster or when deploying the search
            // to a live, running application at first.
            if (!config('services.search.enabled')) {
                return new EloquentArticlesRepository();
            }

            return new ElasticsearchArticlesRepository(
                $app->make(Client::class)
            );
        });
    }
}

Whenever we request an ArticlesRepository from the IoC container, it will give us an ElasticsearchArticlesRepository instance if search is enabled; otherwise it will fall back to the Eloquent version.
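The binding above switches engines based on a config flag, so it does not react to a cluster that goes down at runtime. One possible way to cover that case is a small decorator that tries the search engine first and falls back to SQL when it throws. This is only a sketch: the ArticleSearch interface is a stand-in for ArticlesRepository that returns plain arrays so the example runs without Laravel, all class names are hypothetical, and catching a broad Exception is an assumption you would likely narrow to the client's own exception types.

```php
<?php

// Stand-in for the article's ArticlesRepository; it returns plain arrays
// here so the sketch runs without Laravel.
interface ArticleSearch
{
    public function search(string $query = ''): array;
}

// Hypothetical decorator: tries the primary engine first and falls back to
// the secondary one if the primary throws (for example, cluster unreachable).
class FallbackArticleSearch implements ArticleSearch
{
    private $primary;
    private $fallback;

    public function __construct(ArticleSearch $primary, ArticleSearch $fallback)
    {
        $this->primary = $primary;
        $this->fallback = $fallback;
    }

    public function search(string $query = ''): array
    {
        try {
            return $this->primary->search($query);
        } catch (Exception $e) {
            return $this->fallback->search($query);
        }
    }
}

// Two fake engines to exercise the decorator.
class BrokenSearch implements ArticleSearch
{
    public function search(string $query = ''): array
    {
        throw new RuntimeException('search cluster unavailable');
    }
}

class SqlSearch implements ArticleSearch
{
    public function search(string $query = ''): array
    {
        return ["sql result for {$query}"];
    }
}

$search = new FallbackArticleSearch(new BrokenSearch(), new SqlSearch());
print_r($search->search('php'));
```

In the service provider, the closure could return such a decorator wrapping the Elasticsearch and Eloquent repositories instead of returning the Elasticsearch one directly.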

We need to do some customization to configure the Elasticsearch client. We can bind it in the AppServiceProvider or create a new provider; I’m going to use the existing AppServiceProvider, so something like:

<?php

namespace App\Providers;

use Elasticsearch\Client;
use Elasticsearch\ClientBuilder;
use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    public function register()
    {
        $this->bindSearchClient();
    }

    private function bindSearchClient()
    {
        $this->app->bind(Client::class, function ($app) {
            return ClientBuilder::create()
                ->setHosts(config('services.search.hosts'))
                ->build();
        });
    }
}

Now that we have the code almost ready, we need to finish the configuration. You might have noticed the usages of the config helper method in some places in the implementation. That loads the configuration files data. Here is the configuration I used in the config/services.php:

<?php

return [
    // there are other configs here...
    'search' => [
        'enabled' => env('SEARCH_ENABLED', false),
        'hosts' => explode(',', env('SEARCH_HOSTS')),
    ],
];

We set the configuration here and tell Laravel to check the environment variables to find our configuration. We can set it locally in our .env file, like so:

SEARCH_ENABLED=true
SEARCH_HOSTS=localhost:9200
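The hosts entry in config/services.php splits SEARCH_HOSTS on commas with a plain explode. A slightly more defensive variant, sketched below, also trims whitespace and drops empty entries, so a trailing comma or a missing variable does not produce an empty host. The helper name parseSearchHosts and the second host name in the example are hypothetical, and the nullable parameter assumes PHP 7.1 or newer.

```php
<?php

// Hypothetical helper: turns "host1:9200, host2:9200" into a clean list,
// trimming whitespace and dropping empty entries. A missing variable
// (null) yields an empty list.
function parseSearchHosts(?string $value): array
{
    $hosts = array_map('trim', explode(',', (string) $value));

    return array_values(array_filter($hosts, 'strlen'));
}

print_r(parseSearchHosts('localhost:9200, es2.internal:9200,'));
print_r(parseSearchHosts(null));
```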

We are exploding the hosts here to allow passing multiple hosts using a comma-separated list, but we are not using that at the moment. If you have your php server running, don’t forget to reload it so it fetches the new configs. After that, we need to populate Elasticsearch with our existing data. To do so, we are going to need a custom artisan command. Create one using php artisan make:command ReindexCommand (see code below). This command will also be really useful later on if we come to change the schemas of our Elasticsearch indexes, we could change it to drop the indexes and reindex every piece of data we have (or using aliases for a more zero-downtime approach).

<?php

namespace App\Console\Commands;

use App\Article;
use Elasticsearch\Client;
use Illuminate\Console\Command;

class ReindexCommand extends Command
{
    protected $name = "search:reindex";
    protected $description = "Indexes all articles to elasticsearch";
    private $search;

    public function __construct(Client $search)
    {
        parent::__construct();

        $this->search = $search;
    }

    public function handle()
    {
        $this->info('Indexing all articles. Might take a while...');

        foreach (Article::cursor() as $model) {
            $this->search->index([
                'index' => $model->getSearchIndex(),
                'type' => $model->getSearchType(),
                'id' => $model->id,
                'body' => $model->toSearchArray(),
            ]);

            // PHPUnit-style feedback
            $this->output->write('.');
        }

        $this->info("\nDone!");
    }
}

After registering the command in the Console Kernel at app/Console/Kernel.php, we can run php artisan search:reindex to index existing articles to Elasticsearch.
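The command above sends one HTTP request per article, which is fine for 50 rows but gets slow on larger tables. Elasticsearch also offers a bulk API, where the request body alternates an action entry with the document to index. The sketch below only builds that body from plain arrays; the helper name buildBulkBody and the chunk size of 500 are assumptions for illustration.

```php
<?php

// Hypothetical helper: builds the "body" for a bulk request. The bulk
// format alternates an action entry with the document to index.
function buildBulkBody(array $documents, string $index, string $type): array
{
    $body = [];

    foreach ($documents as $document) {
        $body[] = ['index' => [
            '_index' => $index,
            '_type' => $type,
            '_id' => $document['id'],
        ]];
        $body[] = $document;
    }

    return $body;
}

$articles = [
    ['id' => 1, 'title' => 'First'],
    ['id' => 2, 'title' => 'Second'],
];

// Send documents in chunks so a single request never grows too large.
foreach (array_chunk($articles, 500) as $chunk) {
    $body = buildBulkBody($chunk, 'articles', 'articles');

    // With the official client this would be: $client->bulk(['body' => $body]);
    echo count($body), " bulk entries\n";
}
```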

Now if we type the same search, it should show a similar result:

searching 02

You can see that we are now getting more results than before. I’m not going to try to explain why now, but you might be thinking there’s nothing new here. We could have achieved a similar result with plain SQL. Yes, we could. But Elasticsearch brings other toys to the table. Let’s say, for instance, that you care more about matches in the title field than any other field and you have some tags searching, like so:

searching 03

If you check it out, each of the results has the php tag, the javascript tag, or both. Now, let’s define the relevance rule we have for the title field:

<?php

'query' => [
    'multi_match' => [
        'fields' => ['title^5', 'body', 'tags'],
        'query' => $query,
    ],
],

We are defining here that the matches in title field are 5 times more relevant than the other fields. If you reload, nothing happens. But watch now:

searching 04

The first match doesn’t have the right tags, but the title matches with the last term we used, so it boosts up. Cool, we only had to do a few configuration changes, right?
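Since relevance rules tend to change as the team experiments, it can help to keep the boosts in one place instead of hard-coding strings like title^5 in the query. A minimal sketch, where the helper name buildMultiMatch and the idea of passing boosts as a field-to-weight map are my assumptions:

```php
<?php

// Hypothetical helper: builds a multi_match query where each field can
// carry a boost. A boost of 1 leaves the field name untouched.
function buildMultiMatch(string $query, array $boosts): array
{
    $fields = [];

    foreach ($boosts as $field => $boost) {
        $fields[] = $boost === 1 ? $field : "{$field}^{$boost}";
    }

    return [
        'query' => [
            'multi_match' => [
                'fields' => $fields,
                'query' => $query,
            ],
        ],
    ];
}

$body = buildMultiMatch('php javascript', ['title' => 5, 'body' => 1, 'tags' => 1]);

echo json_encode($body['query']['multi_match']['fields']), PHP_EOL;
```

The returned array has the same shape as the body used in searchOnElasticsearch, so the boosts could come from a config file and be tuned without touching the repository.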

Defining relevance is a very sensitive topic. It might need some meetings and discussions, as well as prototypes before you and your team can decide on what to use.

Wrapping up

We covered the basics here and how to integrate your Laravel app with Elasticsearch. If you want to know more about how to improve your queries, we have you covered: check out the posts “Basic understanding of text search in elasticsearch” and “How to build faceted search with facet counters using Elasticsearch”.

Also, did you know that Laravel has its own full-text search official package? It’s called Laravel Scout, it supports Algolia out-of-the-box and you can write your custom driver. It seems to be only for full-text search, though. In case you need to do some fancy searches with aggregations, for example, you can integrate in your own way. There are packages out there to help you, check out our elasticsearcher package, for example.

You can check this repository for an example of what I’m describing here.

Useful Resources

See you all soon.
