Okay, so you've set up Elasticsearch. You've indexed your data. Search is super fast. All's good. But suddenly you have a requirement that needs a change to the mapping of your index. Maybe you need to use a different analyzer, or maybe it's as simple as adding a new field to your document, which requires you to add the associated static mapping.
If you find yourself in such a situation, here are a few approaches you can take:
Approach 1
- with downtime; index from external data source.
- This assumes that you have an external data source such as a database from which you can index data all over again, as if you were doing it for the first time.
When to use?
This approach only makes sense for testing purposes in a local or staging environment. It should not be used in production, because search is down from the moment the index is deleted until indexing finishes, and downtime isn't really desirable.
Steps
- Delete the index using the Delete API
- Create the index, and set the new mapping using the PUT Mapping API
- Index documents from external data source. You could do this using the Bulk API
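For the last step, the Bulk API takes newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. Here is a minimal sketch of building that body in Python. The index name `my_index`, the `rows` sample data and the helper `build_bulk_body` are all hypothetical; in practice the rows would come from your database, and the resulting string would be sent to `POST /_bulk` with the `application/x-ndjson` content type.

```python
import json


def build_bulk_body(index_name, rows, id_field="id"):
    """Serialize rows into the newline-delimited JSON that the Bulk API expects."""
    lines = []
    for row in rows:
        # Action line: index this document under its own id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(row[id_field])}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical rows standing in for a database query result.
rows = [{"id": 1, "title": "first post"}, {"id": 2, "title": "second post"}]
bulk_body = build_bulk_body("my_index", rows)
print(bulk_body)
```

For a large table you would probably send the rows in batches rather than as one giant request.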
Approach 2
- without downtime; index from external data source
When to use?
- You could use this approach in production, but if you have a large number of documents, indexing from an external data source like a DB can be a time-consuming process.
Steps
- If not done already, create an alias `index_alias` for your existing index (`old_index`) and change your code to use the alias instead of `old_index` directly.
- Create a new index `new_index`.
- Index documents from external data source. You could do this using the Bulk API.
- Move the alias `index_alias` from `old_index` to `new_index`.
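The alias move in the last step can be done in a single request to the `_aliases` endpoint, where a `remove` and an `add` action are applied together, so searches through the alias never see a moment with no index behind it. A rough sketch of the request body follows; the helper name `build_alias_swap` is made up for illustration, and the body would be sent as `POST /_aliases`.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the body for POST /_aliases that repoints an alias in one request."""
    return {
        "actions": [
            # Detach the alias from the index that holds the old mapping.
            {"remove": {"index": old_index, "alias": alias}},
            # Attach it to the freshly built index.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap_body = build_alias_swap("index_alias", "old_index", "new_index")
print(json.dumps(swap_body, indent=2))
```

Once the swap has gone through and you are happy with the result, `old_index` can be deleted.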
Caveats
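- While the downtime is essentially zero, there could still be consistency issues. For example, documents written through the alias while `new_index` is being built land in `old_index` and may be missing from `new_index` when the alias moves.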
- Indexing from an external data source like a DB can be a time-consuming process if you have a large number of documents.
Approach 3
- without downtime; index from Elasticsearch
When to use?
- Can be used in production when you want to change the mapping of an existing field. If you are merely adding a field mapping, prefer Approach 4.
Steps
- If not done already, create an alias `index_alias` for your existing index (`old_index`) and change your code to use the alias instead of `old_index` directly.
- Create a new index `new_index`.
- Use the Elasticsearch Reindex API to copy docs from `old_index` to `new_index`.
- Move the alias `index_alias` from `old_index` to `new_index`.
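To make the create and copy steps concrete, here is a sketch of the two requests involved, expressed as plain `(method, path, body)` tuples rather than tied to any particular client. The helper `plan_reindex`, the `title` field and the choice of the `english` analyzer are assumptions for illustration only, and the typeless `mappings` body assumes Elasticsearch 7 or later.

```python
import json


def plan_reindex(old_index, new_index, new_mappings):
    """Return ordered (method, path, body) tuples: create new_index, then copy docs into it."""
    return [
        # Create the new index with the changed mapping already in place.
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # Ask Elasticsearch to copy every document from the old index.
        ("POST", "/_reindex", {"source": {"index": old_index}, "dest": {"index": new_index}}),
    ]


# Hypothetical mapping change: analyze `title` with the built-in english analyzer.
new_mappings = {"properties": {"title": {"type": "text", "analyzer": "english"}}}
plan = plan_reindex("old_index", "new_index", new_mappings)
for method, path, body in plan:
    print(method, path, json.dumps(body))
```

The alias move then follows as the final step, exactly as listed in the steps.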
Caveats
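- While the downtime is essentially zero, there could still be consistency issues. For example, documents written through the alias after the reindex has started land in `old_index` and may be missing from `new_index` when the alias moves.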
Approach 4
- without downtime; update existing index
When to use?
- Can be used in production when you merely want to add a new field mapping.
Steps
- Update mappings of index online using PUT mapping API.
- Use `_update_by_query` API with params
  - `conflicts=proceed` - In the context of just picking up an online mapping change, documents which have been updated during the process, and therefore have a version conflict, would have picked up the new mapping anyway. Hence, version conflicts can be ignored.
  - `wait_for_completion=false` so that it runs as a background task
  - `refresh` so that all shards of the index are updated when the request completes.
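Put together, the two steps amount to one `PUT` and one `POST`. The sketch below only builds the paths and bodies, so it runs without a cluster. The index name `my_index`, the `tags` field and both helper names are hypothetical, `refresh` is written out as `refresh=true`, and the `_mapping` body assumes the typeless format of Elasticsearch 7 or later.

```python
import json
from urllib.parse import urlencode


def build_put_mapping(index_name, field_name, field_mapping):
    """Return (method, path, body) for adding one field to an existing mapping."""
    return ("PUT", f"/{index_name}/_mapping", {"properties": {field_name: field_mapping}})


def build_update_by_query_path(index_name):
    """Return the _update_by_query path carrying the three params discussed above."""
    params = {
        "conflicts": "proceed",          # ignore version conflicts
        "wait_for_completion": "false",  # run as a background task
        "refresh": "true",               # refresh affected shards afterwards
    }
    return f"/{index_name}/_update_by_query?{urlencode(params)}"


# Hypothetical new field: a keyword field called `tags`.
put_mapping = build_put_mapping("my_index", "tags", {"type": "keyword"})
update_path = build_update_by_query_path("my_index")
print(put_mapping[0], put_mapping[1], json.dumps(put_mapping[2]))
print("POST", update_path)
```

Because of `wait_for_completion=false`, the response to the `POST` should contain a task id rather than the final result, which you can then look up through the Tasks API.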
Caveats
- Can't be used if you want to change the mapping of an existing field. Use Approach 3 instead.