tech ramblings

Updating the mapping of an Elasticsearch index

Okay, so you've set up Elasticsearch. You've indexed your data. Search is super fast. All's good. But then a new requirement comes in, and you need to change the mapping of your index. Maybe you need to use a different analyser, or maybe it's as simple as adding a new field to your documents, which requires you to add the associated static mapping.

If you find yourself in such a situation, here are a few approaches you can take:

Approach 1

  • with downtime; index from external data source.
  • This assumes that you have an external data source such as a database from which you can index data all over again, as if you were doing it for the first time.

When to use?

This approach only makes sense for testing in a local or staging environment. It should not be used in production, because the index is unavailable from the moment it is deleted until all documents have been indexed again.

Steps

  • Delete the index using the Delete API
  • Create the index, and set the new mapping using the PUT Mapping API
  • Index documents from external data source. You could do this using the Bulk API
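As a rough sketch of the steps above, the requests involved could be laid out like this. It only builds the HTTP method, path and body for each call and does not send anything. The function name `plan_full_reindex`, the index name `articles` and the sample document are hypothetical, and the paths assume a typeless (7.x or later) Elasticsearch.

```python
import json


def plan_full_reindex(index, mappings, docs):
    """Return (method, path, body) tuples for delete -> create -> map -> bulk index.

    `docs` is an iterable of (doc_id, source_dict) pairs read from the
    external data source (hypothetical shape, chosen for this sketch).
    """
    # The Bulk API takes NDJSON: an action line followed by the document
    # source, and the payload has to end with a newline.
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    bulk_body = "\n".join(lines) + "\n"

    return [
        ("DELETE", f"/{index}", None),                         # Delete API
        ("PUT", f"/{index}", None),                            # create the index
        ("PUT", f"/{index}/_mapping", json.dumps(mappings)),   # PUT Mapping API
        ("POST", "/_bulk", bulk_body),                         # Bulk API
    ]


if __name__ == "__main__":
    requests_to_send = plan_full_reindex(
        "articles",
        {"properties": {"title": {"type": "text"}}},
        [("1", {"title": "hello"})],
    )
    for method, path, _body in requests_to_send:
        print(method, path)
```

Searches fail between the DELETE and the end of the bulk request, which is the downtime this approach accepts.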

Approach 2

  • without downtime; index from external data source

When to use?

  • You could use this approach in production, but if you have a large number of documents, indexing from an external data source like a DB can be a time-consuming process.

Steps

  • If not done already, create an alias index_alias for your existing index (old_index) and change your code to use the alias instead of old_index directly.
  • Create a new index new_index
  • Index documents from external data source. You could do this using the Bulk API
  • Move the alias index_alias from old_index to new_index.
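The last step, moving the alias, can be done in a single request to the `_aliases` endpoint. Below is a minimal sketch of the request body, assuming the names `index_alias`, `old_index` and `new_index` used above; the helper `alias_swap_body` is hypothetical and only builds the JSON.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` in one request."""
    # Both actions travel in the same request, so the alias is never left
    # pointing at no index in between.
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    body = alias_swap_body("index_alias", "old_index", "new_index")
    print("POST /_aliases")
    print(json.dumps(body, indent=2))
```

Because the application only ever talks to the alias, it needs no change or restart when the swap happens.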

Caveats

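  • While the downtime is essentially zero, there could still be consistency issues: for example, writes that reach old_index through the alias while new_index is being built may be missing from new_index when the alias is moved.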
  • Indexing from an external data source like a DB can be a time-consuming process if you have a large number of documents.

Approach 3

  • without downtime; index from Elasticsearch

When to use?

Steps

  • If not done already, create an alias index_alias for your existing index (old_index) and change your code to use the alias instead of old_index directly.
  • Create a new index new_index
  • Use the Elasticsearch Reindex API to copy documents from old_index to new_index.
  • Move the alias index_alias from old_index to new_index.
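For the copy step, a Reindex API call only needs to name a source and a destination index. Here is a minimal sketch of that request, assuming the index names used above; `reindex_request` is a hypothetical helper that builds the request without sending it.

```python
import json


def reindex_request(source_index, dest_index):
    """Return (method, path, body) for copying documents between two indices."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return "POST", "/_reindex", json.dumps(body)


if __name__ == "__main__":
    method, path, body = reindex_request("old_index", "new_index")
    print(method, path)
    print(body)
```

Note that new_index has to be created with the new mapping before this runs; otherwise Elasticsearch creates it on the fly with dynamically inferred mappings.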

Caveats

Approach 4

  • without downtime; update existing index

When to use?

  • Can be used in production when you want to merely add a new field mapping.

Steps

  • Update the mapping of the index online using the PUT Mapping API.
  • Run the _update_by_query API on the index so that existing documents are re-indexed and pick up the new mapping. Use these params:
      • conflicts=proceed: documents that were updated while the process was running, and therefore hit a version conflict, have already picked up the new mapping, so version conflicts can be ignored.
      • wait_for_completion=false: so that it runs as a background task.
      • refresh: so that all shards of the index are updated when the request completes.
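Put together, the two calls in the steps above might look like the following sketch. It only builds the paths and body; nothing is sent. The field name `new_field`, its `keyword` type and the helper names are assumptions made for illustration, and `refresh=true` is one way of passing the refresh param.

```python
import json
from urllib.parse import urlencode


def put_mapping_request(index, field, field_type):
    """Return (method, path, body) that adds one field to an existing mapping."""
    body = {"properties": {field: {"type": field_type}}}
    return "PUT", f"/{index}/_mapping", json.dumps(body)


def update_by_query_path(index):
    """Return the _update_by_query path with the three params discussed above."""
    params = {
        "conflicts": "proceed",          # ignore version conflicts
        "wait_for_completion": "false",  # run as a background task
        "refresh": "true",               # refresh affected shards afterwards
    }
    return f"/{index}/_update_by_query?{urlencode(params)}"


if __name__ == "__main__":
    # "new_field" and "keyword" are placeholder values for this sketch.
    print(*put_mapping_request("old_index", "new_field", "keyword"))
    print("POST", update_by_query_path("old_index"))
```

With wait_for_completion=false the response contains a task id instead of the final result, so progress has to be checked separately through the Tasks API.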

Caveats