Saturday 10 February 2018

Adding Field Logic in Microsoft Text Recognition

Use Case

Suppose we have an image of a bill receipt and we want to extract its text line by line, for example "Maestro 11.45" from the image below.

However, the Computer Vision API returns this text as two separate regions rather than as one continuous line.

We want the data as a single line, not as two separate groups. After debugging, we found the Top and Height coordinates of "Maestro" (Region 1) and "11.45" (Region 2).

The Top values of the two words are nearly the same, so we can write some code that checks whether the difference between the Top values of two words is small enough.

In other words, if the Top values differ by 15 pixels or less, we consider the two pieces of text to be on the same line.
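
The check described above can be sketched as a small helper. This is only a minimal sketch and not code from the repository: the names LineHelper, IsSameLine and MaxTopDifference are hypothetical, and the 15-pixel tolerance is simply the value mentioned above.

```csharp
using System;

public static class LineHelper
{
    // Tolerance in pixels, taken from the text above; tune it for your images.
    public const int MaxTopDifference = 15;

    // Returns true when two Top coordinates are close enough to be treated
    // as belonging to the same printed line.
    public static bool IsSameLine(int topA, int topB)
    {
        return Math.Abs(topA - topB) <= MaxTopDifference;
    }
}
```

With made-up coordinates, LineHelper.IsSameLine(100, 110) is true because the Top values differ by 10 pixels, while LineHelper.IsSameLine(100, 120) is false because they differ by 20.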

Source Code

You can view the full source code in my GitHub repo.

Code Explanation

public static void TextExtraction(string fname, bool wrds, bool mlines)
{
    // Run the async extraction and block until it has finished.
    Task.Run(async () =>
    {
        string[] res = await TextExtractionCore(fname, wrds, mlines);

        // Merge lines only when mlines is set and wrds is not.
        if (mlines && !wrds)
            res = MergeLines(res);

        PrintResults(res);

        Console.WriteLine("\nDate: " + GetDate(res));
        Console.WriteLine("Highest amount: " + HighestAmount(res));

    }).Wait();
}
    

In the TextExtraction method above, the third parameter, mlines, tells us whether we want to merge lines from two different regions.

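public static string[] MergeLines(string[] lines)
{
    // Keys are Top coordinates, values are the merged line texts.
    SortedDictionary<int, string> dict = MergeLinesCore(lines);

    // ToArray() requires "using System.Linq;".
    return dict.Values.ToArray();
}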
    

To merge lines, we created the method MergeLines, which calls another method, MergeLinesCore, that returns a sorted dictionary. MergeLines then returns the dictionary's values as an array.

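public static SortedDictionary<int, string> MergeLinesCore(string[] lines)
{
    // Keyed by Top coordinate, so entries stay ordered from top to bottom.
    SortedDictionary<int, string> dict = new SortedDictionary<int, string>();

    foreach (string l in lines)
    {
        // Each line is expected in the form "top|text|region".
        string[] parts = l.Split('|');

        if (parts.Length == 3)
        {
            int top = Convert.ToInt32(parts[0]);
            string str = parts[1];
            int region = Convert.ToInt32(parts[2]);

            if (dict.Count > 0 && region != 1)
            {
                // Look for an existing line at roughly the same height.
                KeyValuePair<int, string> item = FindClosest(dict, top);

                // A key of -1 means no line was close enough.
                if (item.Key != -1)
                    dict[item.Key] = item.Value + " " + str;
                else
                    dict.Add(top, str);
            }
            else
                dict.Add(top, str);
        }
    }

    return dict;
}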
    

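The MergeLinesCore method above loops through all the lines in all the regions. Each line is split on '|' into its Top value, its text, and its region number. For a line outside region 1, it looks for the closest line already in the dictionary. If one is found, the two lines are combined into a single dictionary entry; otherwise the line is added as a new entry.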
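
FindClosest is not shown in this post, so here is a minimal sketch of how such a lookup might work. It is an assumption and not the code from the repository: the class name ClosestLineSketch and the constant MaxTopDifference are hypothetical, the 15-pixel tolerance is the value discussed earlier, and the key of -1 for "nothing close enough" follows the check in MergeLinesCore.

```csharp
using System;
using System.Collections.Generic;

public static class ClosestLineSketch
{
    // Assumed tolerance in pixels (the value discussed earlier in the post).
    private const int MaxTopDifference = 15;

    // Hypothetical stand-in for FindClosest: returns the entry whose key (Top)
    // is nearest to 'top' and within the tolerance, or an entry with key -1
    // when no entry is close enough.
    public static KeyValuePair<int, string> FindClosest(
        SortedDictionary<int, string> dict, int top)
    {
        KeyValuePair<int, string> best =
            new KeyValuePair<int, string>(-1, string.Empty);
        int bestDiff = int.MaxValue;

        foreach (KeyValuePair<int, string> entry in dict)
        {
            int diff = Math.Abs(entry.Key - top);

            // Keep the nearest entry that is still within the tolerance.
            if (diff <= MaxTopDifference && diff < bestDiff)
            {
                best = entry;
                bestDiff = diff;
            }
        }

        return best;
    }
}
```

With made-up coordinates, if the dictionary holds "Maestro" under key 100, a lookup for Top 104 returns that entry, so a region 2 line "104|11.45|2" would be merged into "Maestro 11.45". A lookup for Top 200 returns key -1, and the line would be added as a new entry.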