Generating Solr Queries

Post by dendiz » Thu Feb 14, 2019 11:02 pm

An interesting problem that I came across while working with n11 was this:
Some users search by substituting plain letters for their umlaut versions. E.g. instead of "önlük" they will search for "onluk" or "onlük" or even "önluk" or any other combination. Since the product titles usually contained only the correct spelling, these queries led to poor search results. So we needed a way of generating all combinations of the query and joining them with a logical OR before sending it to the search engine. Here is a simple way of doing it:

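import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SolrQueries {
    // Maps each plain letter to its Turkish counterpart.
    private final Map<String, String> charMap = new HashMap<>();

    public SolrQueries() {
        charMap.put("o", "ö");
        charMap.put("u", "ü");
        charMap.put("c", "ç");
        charMap.put("g", "ğ");
        charMap.put("i", "ı");
        charMap.put("s", "ş");
    }

    public static void main(String[] args) {
        List<String> result = new ArrayList<>();
        new SolrQueries().run("mohou", 0, "", result);
        // 8 variants: each of the two o's and the u is either plain or replaced.
        System.out.println(result);
    }

    // Builds every variant recursively; cur is the prefix built so far.
    void run(String query, int idx, String cur, List<String> result) {
        if (idx == query.length()) {
            result.add(cur);
            return;
        }
        String letter = String.valueOf(query.charAt(idx));
        // Branch 1: use the mapped character, if there is one.
        if (charMap.containsKey(letter)) {
            run(query, idx + 1, cur + charMap.get(letter), result);
        }
        // Branch 2: keep the character as typed.
        run(query, idx + 1, cur + letter, result);
    }
}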
There are some cases that this code doesn't take into account:
1. It only goes from the non-umlaut to the umlaut version. Ideally it should go the other way as well.
2. It only supports one mapping per character. There are cases where a character could map to two different characters. This would require a different data structure than a map for lookups (Hint: Union-Find).
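
One possible way to cover both cases is sketched below. It is not the code we used. Rather than a full Union-Find, it starts from a hand-written table of character groups (roughly what a Union-Find would leave you with once all the merges are done) and maps every character to its whole group, so the lookup works in both directions and a group can hold more than two characters. The class name VariantQueries, the GROUPS table and the toOrQuery helper are made up for this example. The OR join assumes the query contains nothing that needs escaping for the search engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and the GROUPS table are assumptions for illustration.
class VariantQueries {
    // Each string is one group of interchangeable characters.
    // A letter with two alternatives would just get a longer group.
    private static final String[] GROUPS = {"oö", "uü", "cç", "gğ", "iı", "sş"};

    // Maps every character to its group, so lookups work in both directions.
    private static final Map<Character, String> GROUP_OF = new HashMap<>();
    static {
        for (String group : GROUPS) {
            for (char c : group.toCharArray()) {
                GROUP_OF.put(c, group);
            }
        }
    }

    // Returns every spelling of query obtained by swapping characters within their groups.
    static List<String> variants(String query) {
        List<String> result = new ArrayList<>();
        expand(query, 0, new StringBuilder(), result);
        return result;
    }

    private static void expand(String query, int idx, StringBuilder cur, List<String> result) {
        if (idx == query.length()) {
            result.add(cur.toString());
            return;
        }
        char c = query.charAt(idx);
        // Characters outside every group only stand for themselves.
        String options = GROUP_OF.getOrDefault(c, String.valueOf(c));
        for (char option : options.toCharArray()) {
            cur.append(option);
            expand(query, idx + 1, cur, result);
            cur.setLength(cur.length() - 1); // backtrack before trying the next option
        }
    }

    // Joins the variants with OR, quoting each one.
    // Assumes the variants contain no characters that need escaping.
    static String toOrQuery(List<String> variants) {
        StringBuilder sb = new StringBuilder();
        for (String v : variants) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append('"').append(v).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toOrQuery(variants("önluk")));
    }
}
```

Keep in mind that the number of variants doubles with every replaceable character, so long queries probably need a cap on how many variants are generated.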
