Skip to content

Add a review command to evaluate the quality of translations #31

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

jonkan
Copy link

@jonkan jonkan commented May 17, 2024

(This is a continuation of #30)

I have a lot of failed (and poor) translations in my string catalogs that I want to correct. Basically #14. So I went ahead to try and add this.

  • Add a mark-needs-review command that marks all translated strings as NEEDS REVIEW. To allow for existing translations to be reviewed.
  • Add a review command to evaluate the quality of translations marked as NEEDS REVIEW. Good translations are marked as translated.

Example output:

Evaluating key `Restore Purchases` [Comment: n/a]
ro     ✅ Good:    Restabilire achiziții
nl     ✅ Good:    Aankopen herstellen
el     ✅ Good:    Επαναφορά Αγορών
fi     ✅ Good:    Palauta ostot
it     ✅ Good:    Ripristina Acquisti
es     ✅ Good:    Restaurar Compras
pl     ✅ Good:    Przywróć zakupy
hu     ✅ Good:    Vásárlások visszaállítása
de     ✅ Good:    Käufe wiederherstellen
nb     ✅ Good:    Gjenopprett kjøp
tr     ✅ Good:    Satın Alımları Geri Yükle
fr     ✅ Good:    Restaurer les achats
hr     ⚠️ Poor:    Vrati kupnje
-      🤖 Reason:  The translation correctly reflects the correct language (Croatian) and the general meaning; however, the specific term 'Restore Purchases' often refers to a functionality in apps or software. A more appropriate translation in this context could be 'Obnovi kupnje'. Therefore, the quality is marked as 'poor' rather than 'good'.
sv     ✅ Good:    Återställ köp
sk     ✅ Good:    Obnoviť nákupy
da     ⚠️ Poor:    Gendan Køb
-      🤖 Reason:  The translation 'Gendan Køb' is almost correct but not entirely accurate. A better translation would be 'Gendan køb' using lowercase 'k' for 'køb'.
cs     ✅ Good:    Obnovit nákupy
pt-PT  ✅ Good:    Restaurar Compras

Still work in progress and a couple of todo's etc.:

  • Still some false positives that should be able to be fixed by adjusting the prompt.
  • Fix/decide how the --lang option should work. I think "all" should mean "all in string catalog" and not "some common ones".
  • Also, I've refactored a couple of things to try and get this to fit better into the code. There's probably room for improvement.
  • It's rather (very!) slow. Maybe it would be possible to parallelize...
  • Some breaking changes that might be avoidable (e.g. moving the previous root command into a translate subcommand).
  • ...

Let me know if you would be interested in merging this. If so, I'll happily take feedback and discuss improvements.

Cheers,
Jonas

@andrewtheis
Copy link
Contributor

andrewtheis commented May 22, 2024

Going to take some time to review this. Regarding the language input -- I believe the way I built this originally was if you omit that flag it will look at what is in the String Catalog. The issue is, unless there are existing translations, the target languages are known only at the project file, and not the catalog files

This was referenced May 22, 2024
@andrewtheis
Copy link
Contributor

andrewtheis commented Jun 2, 2024

Seems like there's a lot of overlap between these PRs and there's a lot going on here. I'd like these broken down into more digestible PRs.

The sourceKey property change should be reverted...

sourceKey in the context of localization is a reference to the identifier (key) of the text to translate. This could either be the literal display text in the source language such as Cannot save file, no disk space remaining or something like ALERT_OUT_OF_DISK_SPACE, which then would have an English value in the String Catalog for the text to display.

It's often recommended to use the latter, especially when it comes to multi-line text. This allows you to just use the key in your Text() or String(localized:) definitions, reducing clutter in your source code.

@jonkan
Copy link
Author

jonkan commented Jun 3, 2024

Seems like there's a lot of overlap between these PRs and there's a lot going on here.

Yes, I know. I've added a lot of features I needed/wanted for my use-case. It's all in one straight line of commits. I tried to "package" it the best I could into review:able chunks. Unfortunately, when creating PR:s to a fork it's not possible to create PR dependencies and have each chunk/PR only show the diff to the previous one.

I'd like these broken down into more digestible PRs.

I totally understand. I'll can try and do that. What would you like me to start with? I've basically added these sub-commands:

  • review Review the quality of translations marked as NEEDS REVIEW. Marking good translations as translated/green.
  • mark-needs-review Mark translated strings as NEEDS REVIEW. E.g. if you want to review your translations using the review command.
  • lint Lint translations in String Catalog(s), optionally marking failed as NEEDS REVIEW.
  • remove Remove languages from String Catalog(s).

Then I've also added:

  • Support for fastlane metadata (I refactored the spec format slightly from what I first mentioned in Add support for fastlane metadata or other text files #12, see commit message in 74fc715).
  • Parallelization support to the translate and review commands, to speed them up. Very basic though, this should probably be improved, but it works, which was good enough for me at this point.

The sourceKey property change should be reverted...

sourceKey in the context of localization is a reference to the identifier (key) of the text to translate.
This could either be the literal display text in the source language such as Cannot save file, no disk space remaining or something like ALERT_OUT_OF_DISK_SPACE, which then would have an English value in the String Catalog for the text to display.

It's often recommended to use the latter, especially when it comes to multi-line text. This allows you to just use the key in your Text() or String(localized:) definitions, reducing clutter in your source code.

Well, I see what you're saying. But if the key is "ALERT_OUT_OF_DISK_SPACE" and the source language translation/value is "Cannot save file, no disk space remaining" I very much expect a variable named sourceKey to return "ALERT_OUT_OF_DISK_SPACE" and not "Cannot save file, no disk space remaining" like it currently does.

@andrewtheis
Copy link
Contributor

How break these down is difficult. There's a lot of architectural changes--I had a mental roadmap of how the existing architecture could work with other features planned. So, some of these changes conflict with that. That's on me for not documenting this in the tickets... this unfortunately has taken a backseat to many other projects I'm involved in.

Before moving on the rest of this, tests are probably the priority. Mainly for StringCatalog, as it's the most finicky and really the core of how this all works. The actual translation code is really basic in comparison. I'll try and find time to document in tickets how I planned to implement the other features. There's a lot of usable stuff in your work, so I appreciate that. I just need to figure out how all the pieces fit together.

Some more notes so far:

  1. Regarding sourceKey: after checking again this does appear to be an oversight after I did some refactor work to correctly support variations. It was previously the key, but that has moved to the key of the var localizableStringGroups Dictionary in StringCatalog. So I'm ok with this change, can be done alongside unit tests.
  2. Setting Needs Review: Could be an issue, as this is something that Xcode does when it's not sure how to reconcile the strings found during compilation with what's already in the catalog. You can set it manually, but I don't know how that might change what Xcode operates. If you know or can point to any docs that would be useful.
  3. Evaluating translations: The way I planned this to work wasn't to ask ChatGPT about the quality (as it could just hallucinate the quality of it's own work), but instead to have it translate the value back into the source language and diff the strings to see how well it matches. Or, even better, to use a secondary service like Google Translate to translate back to the source language. This is how I was manually verifying ChatGPT's work in our products. I have a branch open for Google Translate support using their basic v2 API. Their v3 API which is LLM based is more complicated, support can be added at a later time.
  4. Prompts/etc, I still think using Functions is unnecessary now, as the system prompt seems to work 100% of the time to get ChatGPT to only output translations in it's response. Check develop and run against your own catalogs. Or use the ExampleCatalogs/LargeTestBench.xcstrings I added, which is pulled from one our products. It includes every possible scenario of strings/variances/etc in it.
  5. Config: Using YML is def how this feature should be implemented. There are Swift libraries for parsing it. Just want to make a plan on the structure and how it all fits together.

All that being said, I have a hunch that Apple might announce something similar to this library directly integrated into Xcode next week. So may be good to pause on any work until after WWDC keynotes Monday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants