
How can I use my own corpus to generate images? #2

Closed
damengdameng opened this issue Oct 14, 2020 · 3 comments

@damengdameng

In the code, words are randomly generated and then rendered into images.
How can I use a fixed corpus to generate the data instead?
Thanks.

@hiyali
Owner

hiyali commented Oct 15, 2020

You can provide the words from your own corpus here: L121.
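
The line referenced above (L121) is not reproduced in this thread. As a minimal sketch, assuming the corpus is a UTF-8 text file with one word per line, the randomly generated word could be replaced by a word drawn from that file. `load_corpus`, `pick_word` and `my_corpus.txt` are hypothetical names for illustration, not part of this repo.

```python
import random
from pathlib import Path


def load_corpus(path):
    """Read one word per line from a UTF-8 file; blank lines are skipped.

    The 'utf-8-sig' codec also drops a leading byte-order mark if present.
    """
    text = Path(path).read_text(encoding="utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]


def pick_word(words, rng=random):
    """Pick a corpus word at random, in place of a randomly generated one."""
    return rng.choice(words)
```

Load the file once (`words = load_corpus("my_corpus.txt")`) and call `pick_word(words)` wherever the generator currently creates a random word. Passing a seeded `random.Random` as `rng` makes the sampling reproducible.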

@damengdameng
Author

damengdameng commented Oct 15, 2020

Thank you for the reply.

I tried to use my own corpus like this:

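    from PIL import ImageFont
    word = 'جىنپىڭ'
    put_word = ''.join(reversed(word))  # reversed because PIL's basic layout draws left to right
    # get_rand_font() and get_rand_font_size() are helpers from this repo
    font = ImageFont.truetype(get_rand_font(), get_rand_font_size(len(put_word)))
    size = font.getsize(put_word)  # getsize() was removed in Pillow 10; use font.getbbox() there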

But the letters in the image are drawn separately instead of joined. It seems that Uyghur text should first be converted to Latin letters through uly_char_map:

    uly_char_map = {
        'ﺎﺋ': { 'Type': 'vowel', 'Latin': ['a', 'A'] },
        'ﺏ':  { 'Type':  None  , 'Latin': ['b', 'B'] },
        'ﭺ':  { 'Type':  None  , 'Latin': ['ch', 'Ch'] },
        'ﺩ':  { 'Type':  None  , 'Latin': ['d', 'D'] },
        'ﻪﺋ': { 'Type': 'vowel', 'Latin': ['e', 'E'] },
        'ﯥﺋ': { 'Type': 'vowel', 'Latin': ['é', 'É'] },
        'ﻑ':  { 'Type':  None  , 'Latin': ['f', 'F'] },
        # NB: in ULY, 'g' is gaf (گ) and 'gh' is ghain (غ); the next two entries look swapped
        'ﻍ':  { 'Type':  None  , 'Latin': ['g', 'G'] },
        'ﮒ':  { 'Type':  None  , 'Latin': ['gh', 'Gh'] },
        'ﮪ':  { 'Type':  None  , 'Latin': ['h', 'H'] },
        'ﻰﺋ': { 'Type': 'vowel', 'Latin': ['i', 'I'] },
        'ﺝ':  { 'Type':  None  , 'Latin': ['j', 'J'] },
        'ك':  { 'Type':  None  , 'Latin': ['k', 'K'] },
        'ل':  { 'Type':  None  , 'Latin': ['l', 'L'] },
        'م':  { 'Type':  None  , 'Latin': ['m', 'M'] },
        'ن':  { 'Type':  None  , 'Latin': ['n', 'N'] },
        'ڭ':  { 'Type':  None  , 'Latin': ['ng', 'Ng'] },
        'ﻮﺋ': { 'Type': 'vowel', 'Latin': ['o', 'O'] },
        'ﯚﺋ': { 'Type': 'vowel', 'Latin': ['ö', 'Ö'] },
        'پ':  { 'Type':  None  , 'Latin': ['p', 'P'] },
        'ق':  { 'Type':  None  , 'Latin': ['q', 'Q'] },
        'ر':  { 'Type':  None  , 'Latin': ['r', 'R'] },
        'س':  { 'Type':  None  , 'Latin': ['s', 'S'] },
        'ش':  { 'Type':  None  , 'Latin': ['sh', 'Sh'] },
        'ت':  { 'Type':  None  , 'Latin': ['t', 'T'] },
        'ﯘﺋ': { 'Type': 'vowel', 'Latin': ['u', 'U'] },
        'ﯜﺋ': { 'Type': 'vowel', 'Latin': ['ü', 'Ü'] },
        # v
        'ۋ':  { 'Type':  None  , 'Latin': ['w', 'W'] },
        'خ':  { 'Type':  None  , 'Latin': ['x', 'X'] },
        'ي':  { 'Type':  None  , 'Latin': ['y', 'Y'] },
        'ز':  { 'Type':  None  , 'Latin': ['z', 'Z'] },
        'ژ':  { 'Type':  None  , 'Latin': ['zh', 'Zh'] }
    }
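
A map whose keys may be one or two characters long, like the vowel entries above, is usually applied with greedy longest-match lookup, so a two-character key wins over a one-character key at the same position. The sketch below is a hypothetical helper (`transliterate` is not a function from this repo). The input must use the same code points and character order as the map's keys, which in uly_char_map are presentation-form characters rather than base letters.

```python
def transliterate(text, char_map, upper=False):
    """Greedy longest-match transliteration with a map whose keys may span
    several characters (like the two-character vowel keys in uly_char_map)."""
    if not char_map:
        return text
    max_len = max(len(key) for key in char_map)
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first so a two-character key beats a one-character key.
        for n in range(min(max_len, len(text) - i), 0, -1):
            entry = char_map.get(text[i:i + n])
            if entry is not None:
                out.append(entry["Latin"][1 if upper else 0])
                i += n
                break
        else:
            out.append(text[i])  # no key matches here: copy the character unchanged
            i += 1
    return "".join(out)
```

For example, `transliterate(text, uly_char_map)` would return the lowercase Latin spelling, with any character that has no key copied through unchanged.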

However, the Uyghur characters I got from here [https://github.com/JaidedAI/EasyOCR/blob/master/easyocr/character/ug_char.txt]
are completely different from the ones in uly_char_map, and some keys in uly_char_map seem to be composed of two letters. Can you give some suggestions?
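
The two sets likely differ because the keys in uly_char_map are Arabic Presentation Forms, i.e. precomposed contextual shapes such as U+FB7A ARABIC LETTER TCHEH ISOLATED FORM, while ug_char.txt presumably lists base letters from the Arabic block (U+0600 to U+06FF). Each presentation-form letter has a Unicode compatibility decomposition to its base letter, so NFKC normalization maps one onto the other. A minimal sketch with the standard `unicodedata` module; `block_of` and `to_base_form` are hypothetical helper names.

```python
import unicodedata


def block_of(ch):
    """Name the Unicode block an Arabic-script character comes from."""
    cp = ord(ch)
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    if 0xFB50 <= cp <= 0xFDFF:
        return "Arabic Presentation Forms-A"
    if 0xFE70 <= cp <= 0xFEFF:
        return "Arabic Presentation Forms-B"
    return "other"


def to_base_form(text):
    """Fold presentation forms back to base letters via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)
```

For example, `{to_base_form(key) for key in uly_char_map}` yields base letters that can be compared with the entries of ug_char.txt. If a two-character key comes out in the reverse of the expected order, the map may store it in visual rather than logical order.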

@hiyali
Owner

hiyali commented Oct 22, 2020

Here is your answer.

    from lang.ug.util.convert import br_2_pf  # helper module in this repo
    word = br_2_pf('جىنپىڭ')  # convert basic-region letters into their presentation forms
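
Converting basic-region letters to presentation forms amounts to contextual shaping: each letter is swapped for its isolated, initial, medial or final glyph depending on whether it connects to its neighbours, so the joined shapes survive even when PIL's basic layout draws characters one by one. The following is a simplified, hypothetical sketch of that idea using only the standard library. It is not the repo's br_2_pf, and it ignores the lam-alef ligature, combining marks and joiner characters; letters for which Unicode defines no presentation form are left unchanged.

```python
import unicodedata
from functools import lru_cache

# Uyghur uses initial/medial forms of ALEF MAKSURA (U+0649) that standard Arabic
# lacks; Unicode encodes them as U+FBE8/U+FBE9 under a different name.
_OVERRIDES = {("\u0649", "INITIAL"): "\uFBE8", ("\u0649", "MEDIAL"): "\uFBE9"}
_FORM_BY_LINKS = {(True, True): "MEDIAL", (True, False): "FINAL",
                  (False, True): "INITIAL", (False, False): "ISOLATED"}


@lru_cache(maxsize=None)
def _form(ch, form):
    """Return the presentation form of ch ('ISOLATED', 'INITIAL', 'MEDIAL' or
    'FINAL'), or None if Unicode defines no such form for it."""
    if (ch, form) in _OVERRIDES:
        return _OVERRIDES[(ch, form)]
    try:
        return unicodedata.lookup(f"{unicodedata.name(ch)} {form} FORM")
    except (KeyError, ValueError):
        return None


def _joins_next(ch):
    # Dual-joining letters (those with an initial form) connect to the following letter.
    return _form(ch, "INITIAL") is not None


def _joins_prev(ch):
    # A letter can connect to the preceding one if it has a final form.
    return _form(ch, "FINAL") is not None


def to_presentation_forms(text):
    """Replace each base Arabic-script letter with its contextual form (simplified)."""
    out = []
    for i, ch in enumerate(text):
        prev_link = i > 0 and _joins_next(text[i - 1]) and _joins_prev(ch)
        next_link = i + 1 < len(text) and _joins_next(ch) and _joins_prev(text[i + 1])
        form = _FORM_BY_LINKS[(prev_link, next_link)]
        out.append(_form(ch, form) or ch)  # no presentation form: keep the letter as-is
    return "".join(out)
```

As in the earlier snippet, the shaped string would still be reversed before drawing with PIL's basic layout, e.g. `put_word = ''.join(reversed(to_presentation_forms(word)))`. If your Pillow build includes libraqm, recent versions accept `layout_engine=ImageFont.Layout.RAQM` in `ImageFont.truetype`, which shapes and orders right-to-left text itself and makes both steps unnecessary.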

hiyali closed this as completed Nov 13, 2020