I have already covered how you can easily integrate Elasticsearch with your app, but I haven’t said anything yet about how you can query your data.
I won’t cover the basics of querying or filtering here; instead, I will cover a cool feature called aggregations, a way to perform some analysis over your data. I’m also going to cover a still-experimental feature called scripted metric.
To start, you need to set up Elasticsearch locally or in a VM (Vaprobash has a good script here). You will also need some schema and data. You can use the example given in the documentation:
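A sketch of what the index creation could look like, following the scripted metric example in the Elasticsearch documentation (the index and field names here are assumptions based on that example):

```json
PUT /transactions
{
  "mappings": {
    "transaction": {
      "properties": {
        "type": { "type": "string", "index": "not_analyzed" },
        "amount": { "type": "long" }
      }
    }
  }
}
```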
This will create our structure; now we just need some data:
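Assuming the documentation's sample transactions, the data could be indexed like this (the amounts match the stats reported later in this post: minimum 10, maximum 130, sum 250):

```json
PUT /transactions/transaction/1
{ "type": "sale", "amount": 80 }

PUT /transactions/transaction/2
{ "type": "cost", "amount": 10 }

PUT /transactions/transaction/3
{ "type": "cost", "amount": 30 }

PUT /transactions/transaction/4
{ "type": "sale", "amount": 130 }
```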
Time to analyse
First of all, let’s see how Elasticsearch stores our data, and how we can query them all:
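A simple match_all query returns every document in the index:

```json
GET /transactions/_search
{
  "query": { "match_all": {} }
}
```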
This will result in something like so:
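Trimmed down to a single hit, the response would look roughly like this (assuming the sample documents above):

```json
{
  "hits": {
    "total": 4,
    "hits": [
      {
        "_index": "transactions",
        "_type": "transaction",
        "_id": "1",
        "_score": 1.0,
        "_source": { "type": "sale", "amount": 80 }
      }
    ]
  }
}
```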
_source is our data. Let’s do some analysis using some built-in aggregations.
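A sketch of what the stats aggregation request could look like (the aggregation name amount_stats is an assumption):

```json
GET /transactions/_search
{
  "size": 0,
  "aggs": {
    "amount_stats": {
      "stats": { "field": "amount" }
    }
  }
}
```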
This will result in something like this:
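The aggregations part of the response, given the figures discussed below, would be:

```json
{
  "aggregations": {
    "amount_stats": {
      "count": 4,
      "min": 10,
      "max": 130,
      "avg": 62.5,
      "sum": 250
    }
  }
}
```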
We are using size:0 because we don’t care about the query result, just our aggregation. This is like a group by clause (at least that’s how I see it), for those who are used to SQL. Pretty cool, right?
You can also send a query string param search_type=count and this will return just your aggregation results and counts, not the search hits. If you do so, there is no need to use the size:0 param.
Wow, this just gave us some analysis of our sales. It says that we have 4 documents, the minimum amount is 10, the maximum is 130, the average is 62.5, and the sum is 250. OK, that’s cool, but we can’t use this data. Our sales also store some costs, so all of these fields tell us nothing in this context. Don’t get me wrong, this aggregation is pretty cool, just not useful here. How can we perform some analysis of our profit then? Well, we can build our own aggregation using some Groovy scripts; it’s called the scripted metric aggregation.
Let’s see the scripts and then I’ll try to explain what they actually are:
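A sketch of what the request could look like, based on the scripted metric example from the Elasticsearch documentation (the exact Groovy script bodies are assumptions):

```json
GET /transactions/_search
{
  "size": 0,
  "aggs": {
    "profit": {
      "scripted_metric": {
        "init_script": "_agg['transactions'] = []",
        "map_script": "if (doc['type'].value == 'sale') { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }",
        "combine_script": "return _agg.transactions",
        "reduce_script": "profit = 0; for (shard in _aggs) { for (t in shard) { profit += t } }; return profit"
      }
    }
  }
}
```

With the sample transactions (sales of 80 and 130, costs of 10 and 30), this should return a profit of 170.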
That is our profit. Let’s now analyse what is going on here. We created a new aggregation called profit of type scripted_metric. This requires only the map_script property, but let’s look at all of them:
init_script: it acts at the beginning of the process, prior to the collection of documents, and here we are just creating a transactions array inside the _agg object (our aggregation object);
map_script: executed once per document. This, as I said, is the only required one. If you are not using any other script, you have to assign the resulting state to the _agg object in order to see what it does. Those familiar with map/reduce already got it: we can change the structure of our document for the aggregation result here. In fact, we are checking whether our document type is a sale or a cost, so we can add or subtract its value properly;
combine_script: we can use this to change our aggregation structure. Right now we have an array of objects, each with a transactions array containing each document’s amount. We are using this to transform our array of objects into a single array containing all our amounts;
reduce_script: the combine_script transformed our aggregation value into an array of amounts; we can reduce it to a single value in this script (again, those familiar with the map/reduce concept already knew this);
You can try performing this aggregation removing each script (except for the map_script, which is the only required one) to see what each one does. It clarifies things a little more. :)
What is available for the scripts? Well, you pretty much have doc, which is (of course) your document itself. But you also have a _source object, which corresponds to the source of your document (this one is slower than using doc).
The downside of using scripts in aggregations is that we can end up with a bunch of Java (Groovy) code to take care of. You can store your scripts in Elasticsearch and just reference them, but you need to be careful with it and treat it well.
The nested aggregation is a special bucket aggregation that you can use to perform aggregations on nested documents.
First, you can nest documents inside any document by setting the property’s type to “nested”, as in the example below:
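A sketch of what such a mapping could look like (the index, type, and field names are assumptions based on the match/players example described below):

```json
PUT /games
{
  "mappings": {
    "match": {
      "properties": {
        "players": {
          "type": "nested",
          "properties": {
            "id": { "type": "long" },
            "name": { "type": "string" },
            "score": { "type": "long" }
          }
        }
      }
    }
  }
}
```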
This mapping lets you have a matches document with nested players: you can have an array of players in a single match. Let’s say you want to display the “top players”, and it is just a SUM of all scores of each player; then you could use nested aggregations like so:
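A sketch of what that request could look like, assuming the mapping above (a nested aggregation wrapping a terms aggregation on players.id, ordered by a sum sub-aggregation on players.score):

```json
GET /games/_search
{
  "size": 0,
  "aggs": {
    "players": {
      "nested": { "path": "players" },
      "aggs": {
        "top_players": {
          "terms": {
            "field": "players.id",
            "order": { "total_score": "desc" }
          },
          "aggs": {
            "total_score": {
              "sum": { "field": "players.score" }
            }
          }
        }
      }
    }
  }
}
```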
The key field is the player.id and the doc_count is how many docs this user exists in (which, in this case, can be read as “how many matches this user has played”). It’s possible to limit the aggregation result by specifying a size:X on your aggregation definition, like so:
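The terms part of the previous request would become something like this (fragment; the rest of the request stays the same):

```json
"top_players": {
  "terms": {
    "field": "players.id",
    "size": 10,
    "order": { "total_score": "desc" }
  },
  "aggs": {
    "total_score": {
      "sum": { "field": "players.score" }
    }
  }
}
```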
In the example above we are fetching only the 10 best players.
You can perform, for example, the stats aggregation and get a report about users’ scores, like so:
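A sketch of what that could look like, nesting a stats aggregation on players.score under the terms aggregation (names assumed as before):

```json
GET /games/_search
{
  "size": 0,
  "aggs": {
    "players": {
      "nested": { "path": "players" },
      "aggs": {
        "top_players": {
          "terms": { "field": "players.id" },
          "aggs": {
            "score_stats": {
              "stats": { "field": "players.score" }
            }
          }
        }
      }
    }
  }
}
```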
Which will return something like this:
Pretty cool. The example above shows us how you can have nested documents and how you can nest/chain aggregations (not related to nested documents).
The built-in aggregations are pretty cool and you can perform a lot of analysis on your data with them; with scripted_metric you can build your own aggregation that best fits your context.
It’s also worth saying that your aggregations will use the documents your query returned, so you can filter your documents in the query section and aggregate over them. You can also filter the returned documents inside another aggregation, and so on.
At madewithlove we have been experimenting with a new package called Elasticsearcher, and it is working great so far. We will have a dedicated post about the package covering how to easily set it up, along with some usage examples. Stay tuned!