Heroku SearchBox Elasticsearch doesn't work with Rails Searchkick's default settings

Starting around 2020/2/13, an app I run in production could no longer create new indices, so I investigated.

Error

class Post < ApplicationRecord
  searchkick
end

Sample of the failing setup (GitHub)

Running reindex raises an error:

irb(main):001:0> Post.reindex
Traceback (most recent call last):
        1: from (irb):1
Elasticsearch::Transport::Transport::Errors::BadRequest ([400] {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[oin-1][10.0.24.93:9300][indices:admin/create]"}],"type":"illegal_argument_exception","reason":"The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [49]. This limit can be set by changing the [index.max_ngram_diff] index level setting."},"status":400})

The error seems to come from index creation (create_index, the indices:admin/create action).
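The reason string is the most useful part of this trace, because it states both the allowed limit and the actual value. A minimal plain-Ruby sketch for reading those two numbers out of it. `LIMIT_PATTERN` and `parse_limit_error` are hypothetical names, not part of Searchkick or elasticsearch-ruby, and the regex assumes the exact message wording shown above.

```ruby
# Hypothetical helper (not a Searchkick or elasticsearch-ruby API).
# Assumes the "must be less than or equal to: [N] but was [M]" wording
# that Elasticsearch uses in the ngram and shingle limit errors.
LIMIT_PATTERN = /must be less than or equal to: \[(\d+)\] but was \[(\d+)\]/

def parse_limit_error(message)
  match = LIMIT_PATTERN.match(message)
  return nil unless match

  # The index-level setting named at the end of the message
  setting = message[/\[(index\.max_\w+_diff)\]/, 1]
  { setting: setting, limit: match[1].to_i, actual: match[2].to_i }
end

reason = "The difference between max_gram and min_gram in NGram Tokenizer " \
         "must be less than or equal to: [1] but was [49]. This limit can be " \
         "set by changing the [index.max_ngram_diff] index level setting."

result = parse_limit_error(reason)
puts "#{result[:setting]}: limit #{result[:limit]}, actual #{result[:actual]}"
# prints "index.max_ngram_diff: limit 1, actual 49"
```

The same pattern also matches the shingle error that shows up later in the investigation, since it uses the same "less than or equal to" phrasing.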

Fix

Override Searchkick's index settings:

  • Set max_shingle_size to 4
  • Make the difference between min_gram and max_gram 1

Note: adjust the values to your own environment and requirements.

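class Post < ApplicationRecord
  MIN_GRAM = 1 # Integers are already frozen, so .freeze is not needed
  searchkick settings: {
    analysis: {
      filter: {
        # Searchkick's default of 5 exceeds the shingle limit on this cluster
        searchkick_suggest_shingle: {
          max_shingle_size: 4
        },
        # Searchkick's default of 1..50 exceeds the ngram limit of 1
        searchkick_ngram: {
          min_gram: MIN_GRAM,
          max_gram: MIN_GRAM + 1
        }
      }
    }
  }
end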

Working sample (GitHub)
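If your cluster reports different limits, the values can be worked backwards from the two error messages. A minimal sketch, assuming Elasticsearch's shingle defaults (min_shingle_size 2, output_unigrams true) and that only these two filters need adjusting. `searchkick_filter_overrides` is a hypothetical helper, not a Searchkick API. With the limits seen here (1 for ngram, 3 for shingle) it yields the same max_gram 2 and max_shingle_size 4 as the fix above.

```ruby
# Hypothetical helper (not part of Searchkick): derive the largest filter
# values that stay within the limits reported by the cluster.
# Assumes Elasticsearch's default min_shingle_size of 2 with output_unigrams
# enabled, which adds 1 to the shingle difference.
DEFAULT_MIN_SHINGLE_SIZE = 2

def searchkick_filter_overrides(min_gram:, max_ngram_diff:, max_shingle_diff:)
  {
    analysis: {
      filter: {
        searchkick_suggest_shingle: {
          # (max - min + 1) <= max_shingle_diff  =>  max <= min + diff - 1
          max_shingle_size: DEFAULT_MIN_SHINGLE_SIZE + max_shingle_diff - 1
        },
        searchkick_ngram: {
          # (max_gram - min_gram) <= max_ngram_diff
          min_gram: min_gram,
          max_gram: min_gram + max_ngram_diff
        }
      }
    }
  }
end

settings = searchkick_filter_overrides(min_gram: 1, max_ngram_diff: 1, max_shingle_diff: 3)
filters = settings[:analysis][:filter]
puts filters[:searchkick_suggest_shingle][:max_shingle_size] # prints 4
puts filters[:searchkick_ngram][:max_gram]                   # prints 2
```

The returned hash has the same shape as the `settings:` option, so in principle it could be passed to `searchkick settings:` directly; this has not been tested against SearchBox.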

Investigation

Setting index.max_ngram_diff to 1

The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [49]. This limit can be set by changing the [index.max_ngram_diff] index level setting.

The message seemed to say that index.max_ngram_diff has to be 1, so I set it:

class Post < ApplicationRecord
  searchkick settings: {
    index: { max_ngram_diff: 1 }
  }
end

Result

The error is unchanged.

irb(main):001:0> Post.reindex
Traceback (most recent call last):
        1: from (irb):1
Elasticsearch::Transport::Transport::Errors::BadRequest ([400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [49]. This limit can be set by changing the [index.max_ngram_diff] index level setting."}],"type":"illegal_argument_exception","reason":"The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [49]. This limit can be set by changing the [index.max_ngram_diff] index level setting."},"status":400})

Checking the index settings

irb(main):001:0> Post.searchkick_index.index_options
{:settings=>
  {:analysis=>
    {:analyzer=>{},
     :filter=>
      {:searchkick_index_shingle=>{:type=>"shingle", :token_separator=>""},
       :searchkick_search_shingle=>
        {:type=>"shingle",
         :token_separator=>"",
         :output_unigrams=>false,
         :output_unigrams_if_no_shingles=>true},
       :searchkick_suggest_shingle=>{:type=>"shingle", :max_shingle_size=>5},
       :searchkick_edge_ngram=>
        {:type=>"edge_ngram", :min_gram=>1, :max_gram=>50},
       :searchkick_ngram=>{:type=>"ngram", :min_gram=>1, :max_gram=>50},
       :searchkick_stemmer=>{:type=>"snowball", :language=>"English"}},
     :char_filter=>{:ampersand=>{:type=>"mapping", :mappings=>["&=> and "]}}},
   :index=>{:max_ngram_diff=>1, :max_shingle_diff=>4}},
# ... (omitted)
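The output shows that the `settings:` option was merged into Searchkick's defaults rather than replacing them: `max_shingle_diff: 4` is still there next to the new `max_ngram_diff: 1`, and `searchkick_ngram` still has `max_gram: 50`. A plain-Ruby sketch of that kind of recursive merge, assuming Searchkick behaves like a deep merge (suggested by this output, not confirmed from its source). `deep_merge` here is a hand-written helper, not ActiveSupport's `Hash#deep_merge`, and `defaults` is only an excerpt of the output above.

```ruby
# Hand-written recursive merge: nested hashes are merged key by key,
# any other value in `override` replaces the one in `base`.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Excerpt of the index_options output above (assumed to be the defaults).
defaults = {
  analysis: {
    filter: {
      searchkick_ngram: { type: "ngram", min_gram: 1, max_gram: 50 }
    }
  },
  index: { max_shingle_diff: 4 }
}

user_settings = { index: { max_ngram_diff: 1 } }
merged = deep_merge(defaults, user_settings)

puts merged[:index][:max_ngram_diff]                        # prints 1
puts merged[:index][:max_shingle_diff]                      # prints 4
puts merged[:analysis][:filter][:searchkick_ngram][:max_gram] # prints 50
```

Changing only the index-level limit leaves the filter's own min_gram/max_gram untouched, which matches the unchanged error.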

max_ngram_diff is now 1, but the problem seems to be the actual difference between min_gram and max_gram (1 and 50, a difference of 49):

:searchkick_ngram=>{:type=>"ngram", :min_gram=>1, :max_gram=>50},
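A sketch of the check Elasticsearch appears to apply, in plain Ruby against the filters above. According to the Elasticsearch documentation, `index.max_ngram_diff` limits ngram tokenizers and filters; `edge_ngram` is not covered, which is consistent with reindex later succeeding while `searchkick_edge_ngram` stays at 1 to 50. `ngram_violations` is a hypothetical helper; missing `min_gram`/`max_gram` fall back to the ngram filter defaults of 1 and 2.

```ruby
# Hypothetical helper: list "ngram" filters whose max_gram - min_gram
# exceeds the index.max_ngram_diff limit. "edge_ngram" filters are skipped.
def ngram_violations(filters, max_ngram_diff)
  filters.filter_map do |name, options|
    next unless options[:type] == "ngram"

    # Elasticsearch defaults for the ngram filter: min_gram 1, max_gram 2
    diff = options.fetch(:max_gram, 2) - options.fetch(:min_gram, 1)
    [name, diff] if diff > max_ngram_diff
  end
end

filters = {
  searchkick_edge_ngram: { type: "edge_ngram", min_gram: 1, max_gram: 50 },
  searchkick_ngram: { type: "ngram", min_gram: 1, max_gram: 50 }
}

ngram_violations(filters, 1).each do |name, diff|
  puts "#{name}: max_gram - min_gram = #{diff} (limit 1)"
end
# prints "searchkick_ngram: max_gram - min_gram = 49 (limit 1)"
```

Requires Ruby 2.7+ for `filter_map`.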

Setting analysis.filter.searchkick_ngram.max_gram to 2

class Post < ApplicationRecord
  searchkick settings: {
    analysis: {
      filter: {
        searchkick_ngram: {
          max_gram: 2
        }
      }
    }
  }
end

Result

The error changed:

irb(main):001:0> Post.reindex
Traceback (most recent call last):
        1: from (irb):1
Elasticsearch::Transport::Transport::Errors::BadRequest ([400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"In Shingle TokenFilter the difference between max_shingle_size and min_shingle_size (and +1 if outputting unigrams) must be less than or equal to: [3] but was [4]. This limit can be set by changing the [index.max_shingle_diff] index level setting."}],"type":"illegal_argument_exception","reason":"In Shingle TokenFilter the difference between max_shingle_size and min_shingle_size (and +1 if outputting unigrams) must be less than or equal to: [3] but was [4]. This limit can be set by changing the [index.max_shingle_diff] index level setting."},"status":400})

This time, the message seems to say that max_shingle_diff has to be 3.

Setting index.max_shingle_diff to 3

class Post < ApplicationRecord
  searchkick settings: {
    index: { 
      max_shingle_diff: 3 
    },
    analysis: {
      filter: {
        searchkick_ngram: {
          max_gram: 2
        }
      }
    }
  }
end

Result

The error is unchanged.

irb(main):001:0> Post.reindex
Traceback (most recent call last):
        1: from (irb):1
Elasticsearch::Transport::Transport::Errors::BadRequest ([400] {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"In Shingle TokenFilter the difference between max_shingle_size and min_shingle_size (and +1 if outputting unigrams) must be less than or equal to: [3] but was [4]. This limit can be set by changing the [index.max_shingle_diff] index level setting."}],"type":"illegal_argument_exception","reason":"In Shingle TokenFilter the difference between max_shingle_size and min_shingle_size (and +1 if outputting unigrams) must be less than or equal to: [3] but was [4]. This limit can be set by changing the [index.max_shingle_diff] index level setting."},"status":400})

As with max_ngram_diff, the actual value seems to be the problem. searchkick_suggest_shingle currently has max_shingle_size set to 5:

:searchkick_suggest_shingle=>{:type=>"shingle", :max_shingle_size=>5},
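The error message itself gives the formula: max_shingle_size minus min_shingle_size, plus 1 when unigrams are output. A plain-Ruby sketch applying it to the three Searchkick shingle filters from the index_options output, assuming Elasticsearch's defaults (min_shingle_size 2, max_shingle_size 2, output_unigrams true) for keys that are not set. `shingle_diff` is a hypothetical helper.

```ruby
# Hypothetical helper: shingle difference as described in the error message.
# Missing keys fall back to Elasticsearch's shingle defaults.
def shingle_diff(options)
  min = options.fetch(:min_shingle_size, 2)
  max = options.fetch(:max_shingle_size, 2)
  unigrams = options.fetch(:output_unigrams, true)
  max - min + (unigrams ? 1 : 0)
end

# Shingle filters copied from the index_options output above
filters = {
  searchkick_index_shingle: { type: "shingle", token_separator: "" },
  searchkick_search_shingle: { type: "shingle", token_separator: "",
                               output_unigrams: false,
                               output_unigrams_if_no_shingles: true },
  searchkick_suggest_shingle: { type: "shingle", max_shingle_size: 5 }
}

filters.each do |name, options|
  puts "#{name}: #{shingle_diff(options)}"
end
# searchkick_index_shingle: 1
# searchkick_search_shingle: 0
# searchkick_suggest_shingle: 4
```

Only searchkick_suggest_shingle exceeds the limit of 3 (5 - 2 + 1 = 4, matching "but was [4]"), and lowering its max_shingle_size to 4 gives 4 - 2 + 1 = 3.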

Setting analysis.filter.searchkick_suggest_shingle.max_shingle_size to 4

class Post < ApplicationRecord
  searchkick settings: {
    analysis: {
      filter: {
        searchkick_suggest_shingle: {
          max_shingle_size: 4
        },
        searchkick_ngram: {
          max_gram: 2
        }
      }
    }
  }
end

Result

reindex succeeded:

irb(main):001:0> Post.reindex
D, [2020-05-06T04:53:35.428208 #4] DEBUG -- :   Post Load (632.2ms)  SELECT "posts".* FROM "posts" ORDER BY "posts"."id" ASC LIMIT $1  [["LIMIT", 1000]]
D, [2020-05-06T04:53:35.532908 #4] DEBUG -- :   Post Import (84.4ms)  {"count":3}
=> true