Game Quality Forum 2023
Hey! Here’s all the stuff for the Game Quality Forum talk on the integrations and automation I develop and use at Coffee Stain, and some ideas on how to improve localization workflows in Unreal Engine.
Crowdin issues → Google Sheets @ Github
Presentation Text
1 Intro
Hey! I’m Alex, I take care of localization at Coffee Stain Publishing.
2 My timeline: Now I can finally be a good client! =)
I’ve been a freelance translator for over 10 years. Then I worked at a localization studio: loc management and engineering, lead work, a bit of sales, you name it… It was a small company, and I’m extremely curious =) During all that time, I’ve always seen things that could have been done by the developers early on to make the localization of their games a much better experience, while also boosting quality, and saving time for both us and them. It was always a pain to be on the outside, unable to affect this early enough to fix things.
2 Coffee Stain: Numbers
I joined a couple of years ago, and I was the first person to take care of localization at the company. We’re a small company, again, just like 7 people in the production team now. Our games are usually somewhere between 10k and 100k words. Some of our projects are translated by the community, some by professional translators, some use a combination of both. The number of projects that need localization support has been growing steadily, and now we have over 10 projects in different stages of development, anywhere from early localization checks and talks to post-release patching and support.
Early Loc Checks and Prep Work
3 Push files vs Fun
When I joined the product company, I knew what I had to focus on: early localization talks and prep work, doing things in a way that enables localization later down the line and makes it go well. That meant I had to learn a lot about Unreal (and Unity), and that was quite a journey. It also meant that if I wanted to have any chance to get some meaningful work done, I had to eliminate as much manual work as possible. That’s what led me to develop the tools I’ll talk about. And what started off as a couple of simple automation and integration scripts is evolving into something more.
Sorting PO
4 + 5 Unreal PO entry, sorting
By default, when you create an in-place text (as in, not a string table entry) in Unreal, it creates a GUID key for it. It’s a long-ass hexadecimal string. And then Unreal sorts the PO file by these keys. And since they’re effectively random, this kind of sorting essentially randomizes the file order. It’s a disaster for translation.
In this same PO, there are also source references: paths to the assets where the text is stored. And you can sort the file by those references. And it becomes much better because assets in a project are usually well organized, and this sorting method naturally groups things that belong to the same folder or asset.
So the first thing I did was take care of this sorting with a Python script. And it’s not that hard: there’s a library to work with PO files, and it turns out that just calling po.sort() does the trick. Apparently, sorting by reference is the default.
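To make the idea concrete without any dependencies, here is a minimal sketch that does the same kind of sorting on plain text instead of through the PO library. It assumes entries are separated by blank lines, that the first block is the PO header, and that each entry carries a `#:` reference line. The sample paths and keys are made up for illustration.

```python
import re


def sort_po_by_reference(po_text: str) -> str:
    """Sort PO entries by their first '#:' source reference, header first."""
    # Entries in a PO file are separated by blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", po_text.strip()) if b.strip()]
    header, entries = blocks[0], blocks[1:]

    def first_reference(block: str) -> str:
        for line in block.splitlines():
            if line.startswith("#:"):
                return line[2:].strip()
        return ""  # entries without a reference sort before the rest

    entries.sort(key=first_reference)
    return "\n\n".join([header] + entries) + "\n"


# Illustrative sample only: the asset paths and keys are invented.
sample = '''msgid ""
msgstr ""
"Language: en\\n"

#: /Game/UI/Settings/WBP_Options
msgctxt ",F3A9"
msgid "Apply"
msgstr "Apply"

#: /Game/UI/MainMenu/WBP_Menu
msgctxt ",0B17"
msgid "Play"
msgstr "Play"
'''

print(sort_po_by_reference(sample))
```

After sorting, everything from the MainMenu folder comes before everything from the Settings folder, whatever the keys look like.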
Loc tools
6 How it all started? Manual work
on Satisfactory, time spent numbers?
When I joined, everything was done manually. You’d go into Unreal, gather and export the text via the Loc Dashboard, go into the folder with your source locale, upload the file to Crowdin or any other CAT tool. Then you’d build the project on Crowdin, download the zip, unpack it, rename a bunch of folders to match the Unreal locale names, copy the folders to the Unreal project folder, then go to the Loc Dashboard to import and compile the text.
It’s as tedious as it sounds, and it takes a lot of time for the 40+ languages that we had at the time. And it’s a lot of actions, so you can’t really do other things in the meantime.
The worst part: If something goes wrong, you have to do it again.
(Funny thing: I have another project now where the devs have lost the knowledge of how to work with PO files, and they’d somehow get the translations as an Excel table and copy-paste them into the translation editor in Unreal. Manually. Hundreds of them.)
Integrating with Crowdin
7 Crowdin: CLI vs API
We work with Crowdin. They have a nice and simple API, with clients in a bunch of programming languages. I chose that over the CLI because I wanted to code in Python, plain and simple. You don’t have to do this. I know some people here are using the CLI. It’s documented, and Crowdin support is the best support I’ve seen in the industry, they’ll help you with everything :)
But I coded my own thing, and it was simple as well because they have a Python API client, and a nice API reference.
8 Crowdin workflow
So now I have these scripts that can update source files, download translations, and copy them to the correct folders. So part of this tediousness is now gone.
Automating Unreal
9 Early command line automations
Automating Unreal is the next best thing I could do. Because what happens when you launch these actions from the Loc Dashboard, is that it launches a command line (headless) version of the editor to run the gather text commandlet with whatever commands you’ve selected. And I had this nagging feeling that this could be done from the outside.
10 Hint showing the actual command
It took me a surprisingly long time to find the command to do it, but here it is.
11 Command to run gather/export and import/compile
So now I had this. Now I didn’t have to open the editor, which takes a long while, just so it could launch the command line version of itself. I could launch that right away, and also run 2 or more steps in one go. Because in the Loc Dashboard, you’d have to launch Gather, wait for it to finish, and then launch Export, for example. With that invocation, you just launch both of them.
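The exact command is on the slide, not in this text, so here is only a rough sketch of how such an invocation can be put together from Python. It assumes the editor’s command line executable, a project file, and the per-step config files that the Loc Dashboard generates. The executable name, project name, and target name below are placeholders, and any extra flags you may need are left out.

```python
from pathlib import Path


def build_gather_command(editor_cmd, uproject, config_files):
    """Return an argument list for a headless GatherText commandlet run."""
    # Several localization configs are chained with ';' so that, for example,
    # gather and export run in a single invocation, one after the other.
    configs = ";".join(Path(c).as_posix() for c in config_files)
    return [
        str(editor_cmd),
        str(uproject),
        "-run=GatherText",
        f"-config={configs}",
    ]


# Placeholder names: adjust to your engine version, project, and target.
command = build_gather_command(
    "UnrealEditor-Cmd.exe",
    "MyGame.uproject",
    [
        "Config/Localization/Game_Gather.ini",
        "Config/Localization/Game_Export.ini",
    ],
)
print(" ".join(command))
```

The list can be handed to `subprocess.run(command, check=True)` as is. Building it as a list rather than one long string avoids quoting issues with paths that contain spaces.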
Debug IDs
12 + 13 Debug IDs in DRG, community
Then I had this talk with a couple of guys from the DRG community, and they showed me what they’d made for DRG: simple, numeric debug IDs. And they made the devs add a locale for this, and add a hotkey to switch to it and back to the normal locale. That lets community translators see exactly what string they’re looking at in the game. They could then search for this number on Crowdin, and they’d find the string. And this number is unique, so you can identify the repetitions as well. And they’re easy to remember, and type. You don’t even need to create a screenshot.
To me, that was a really good reminder that people are smart, and they can come up with great ideas in your field even if they’re not professionals themselves.
A bit of a tale about community translators: they tend to work a bit differently from professionals. They do translate the text in a CAT tool, just like the pros, but they also just play the game, a lot, and if they see something that isn’t translated or is translated poorly, they make note of it or just go and fix it right away. So for them, this tool is invaluable, they use it all the time. And in DRG, they have a hotkey to switch to this locale. In Satisfactory, we have a console command but we also let people choose what locales to use for this switching, so they can, for example, use the debug IDs or just any other language, much like in Disco Elysium.
Thanks to Unreal, culture switching just works. Unless you do bad things to your text :)
14 Debug IDs in Satisfactory
Mention culture=keys in 5.2.
So I wanted that for Satisfactory :) Like I said, there are libraries to work with PO files, and I’m already using a script to prepare my source locale, so it’s just a matter of creating these IDs, adding this as a new locale in Unreal, and adding the IDs to the context in the PO file.
Now I have another script that I can use for any Unreal game, to give this opportunity to the community, translators, and LQA testers.
And it not only identifies strings, it also shows you what other strings and values are used in place of the variables you have in this string.
This same script sorts this PO file by reference, and I use it as a source locale, so it all works great together.
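As a rough illustration of the ID part, here is a minimal sketch that numbers entries and prepares both the debug locale text and the translator-facing comment. The entry shape (plain dicts with `context`, `source`, and `comment`) and the exact look of the debug string are assumptions made for this example, not how the actual script stores things.

```python
def add_debug_ids(entries, width=5):
    """Give each entry a short numeric ID and build a debug-locale translation.

    `entries` is a list of dicts with 'context', 'source' and 'comment' keys,
    a hypothetical in-memory shape rather than a PO library's data model.
    """
    result = []
    for number, entry in enumerate(entries, start=1):
        debug_id = f"#{number:0{width}d}"
        result.append({
            **entry,
            # Shown in game when the debug locale is active.
            "debug_translation": f"{debug_id} {entry['source']}",
            # The same ID goes into the comment, so it can be searched in the CAT tool.
            "comment": f"{entry.get('comment', '')}\nDebug ID: {debug_id}".strip(),
        })
    return result


entries = [
    {"context": ",0B17", "source": "Play", "comment": ""},
    {"context": ",F3A9", "source": "Apply", "comment": "Options screen"},
]
for entry in add_debug_ids(entries):
    print(entry["debug_translation"], "|", entry["comment"].replace("\n", " / "))
```

Numbering after sorting by reference keeps neighbouring IDs close to each other in the game, which makes them easier to look up.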
Language completion rates and credits
15 Completion rates
So, Satisfactory is mostly translated by the community. We have over 50 languages now, not all of them are translated up to a threshold, and monitoring that, enabling and disabling languages manually, isn’t a nice process. So I wanted to automate that. Thankfully, it’s not that hard to do that via API: just pull completion stats, put them into a CSV… and then reimport that CSV as a data table in Unreal. And use that data table to show this info in Settings.
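The CSV step can be sketched like this. The input mimics the per-language records that a progress request returns (`languageId`, `translationProgress`, `approvalProgress`), but here it is passed in as plain dicts so the example runs without network access. The column names and the threshold rule are assumptions for illustration.

```python
import csv
import io


def completion_rates_csv(progress, threshold=50):
    """Turn per-language progress records into CSV text for a data table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # The first column doubles as the row name when imported as a data table.
    writer.writerow(["Name", "Translated", "Approved", "Enabled"])
    for item in sorted(progress, key=lambda p: p["languageId"]):
        writer.writerow([
            item["languageId"],
            item["translationProgress"],
            item["approvalProgress"],
            # Hypothetical rule: offer a language in game only above the threshold.
            item["translationProgress"] >= threshold,
        ])
    return buffer.getvalue()


# Made-up numbers, only to show the shape of the data.
stats = [
    {"languageId": "uk", "translationProgress": 96, "approvalProgress": 40},
    {"languageId": "fr", "translationProgress": 30, "approvalProgress": 10},
]
print(completion_rates_csv(stats))
```

Keeping the enable/disable decision in the CSV means the game only has to read one column to decide which languages to show in Settings.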
16 Community credits
I’m also a strong believer in putting credit where credit is due. So I had this in my mind from the day I started, that we need to add community translators to credits. Doing this manually would be an enormous task, maintaining this would be nigh impossible. So it had to be automated as well, and guess what, it’s not that hard to do it via API either. Create per-language top members reports, download them, do some custom filtering, create a CSV with data for credits, reimport that as a data table, and use that data table to show this information in Credits.
It’s working now: every time I update the translations from Crowdin, it also updates completion rates and credits.
And that reimport the data table part? Yeah, that one I didn’t want to do manually either, so I had to figure out how to work with the Unreal Python API to reimport an asset.
So now I have three more scripts: pull completion rates from Crowdin, pull top member reports for credits from Crowdin, reimport those CSVs as data tables in Unreal.
Putting it all together
17 + 18 + 19 One script to rule them all… and a config file, two of them
So now I have a bunch of these scripts, and I have a new Unreal project coming my way, and I realize that I want to deploy this for that project as well. And I also realize that I will want to deploy this for all the other Unreal projects that I’ll have (and yes, I do and I have deployed it), and that means that I need to create a configuration system for this. So that I’d only have to change some of the config values, like locales and targets and directories, to make it work for the new project.
I wrap the Unreal stuff into its own script, add logging, add configs, sane defaults. Learning a ton as I go. At this point it becomes painful to look at the early scripts that I’ve written, the code is so bad, and I start understanding Sam more, the guy behind MGH who was doing it solo for a long time.
And I develop that, and this evolves into a proper tool set that I call UE loc tools. And it’s open source. Here’s the link, feel free to explore. I think, at this point, it’s more of an inspiration and a reference for anyone who wants to get into this. But I do aim to make it much more friendly and easy to deploy. My ambition was to get it done for the conference but I just didn’t make it.
So now all I have to do is just check out the repo into the Python folder of a new Unreal project, change some config values, set things up in Unreal and on Crowdin, and I have a working installation with all the features that I need.
Hash locale
20 Hash locale
As I’m doing it, I’m also investigating the new project, and I realize I want some simple pseudo localization. Just like, start and end markers, maybe some text expansion. I could do it in a CAT tool but at this point, doing it locally, as part of the loc tools, is faster. So I implement that: a hash locale that just adds # and ~ to the strings. So you can play the game as usual but it’s really simple to spot untranslatable lines, since they show up without the markers. And even this tiny expansion breaks a lot of things in the UI, and it shows how much work there is to do for release. And you can spot concatenation, which is the translators’ mortal enemy.
And surprisingly, it also uncovers some odd logical errors in asset packs, where they were using localizable text for their logic… Which would break if the text is translated.
And it’s early, so we have time, and that’s great.
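The hash locale described above is simple enough to sketch in a few lines. This is a minimal version that only adds the two markers; the function name and the choice to skip empty strings are assumptions, and the actual tool may do more.

```python
def hash_locale(text: str, prefix: str = "#", suffix: str = "~") -> str:
    """Wrap a source string in visible start and end markers."""
    if not text.strip():
        return text  # leave empty or whitespace-only strings alone
    return f"{prefix}{text}{suffix}"


# A properly localized sentence gets exactly one pair of markers.
print(hash_locale("You have 3 items"))

# A sentence glued together in code from two localized pieces shows
# markers in the middle, which is how concatenation gives itself away.
print(hash_locale("You have ") + hash_locale("3 items"))
```

Anything on screen with no markers at all never went through the localization system, and anything with markers in the middle of a sentence is concatenated.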
Targets
21 Targets: go ahead and add a language in a project with 15 targets
Next, I have a project with 15 targets: that’s because we wanted to group things logically into files, and that seemed like a nice way to do it: one target, one file, all good.
Then I have to add a language… And it’s a pain: I have to go and do it manually, with not the best UI, in every target.
Then I realize that what they have as Portuguese is in fact Brazilian Portuguese, and there’s no way to change the locale. You can only add and delete. And I can only do it for one target at a time.
Then I need to add another target. And it’s empty, no locales in there, so I have to add all of them manually.
I didn’t do any of it in the UI, of course, I cheated by creating targets and copying the relevant part of the config in the DefaultEditor.ini file, but you get the idea :D
So I’ve thrown together a thing that allows you to copy locales from one target to another, to add or delete locales, and to rename a locale. It still requires you to run gather at least once to make Unreal update some of the binary files, but it’s working, and it lets me skip the manual copy-paste cheating.
MT + Pseudo = Longest Locale
22 Longest locale in Satis or MGH?
Ever since I was working on the hash locale, I had this idea that it’d be nice to use more of pseudo localization. But I always knew that the usual “Extend the text by 30%” approach sucks. Because short strings sometimes grow much more. I thought that using MT for this could give me more realistic results. You’d get a not-so-good translation that is nevertheless much closer to the real thing in how it looks and feels.
I’ve finished working on this just days before the conference, and I’m not sure if it’s compatible with both UE 4 and 5, and Crowdin.com and Enterprise, but it’s working for Satisfactory, and it’s going to work for the new games as well.
And it’s great, showing the devs their game with this locale, how long things tend to become, and how crowded or broken the UI is after localization. Or how empty it looks, with Chinese, where most of the short strings become just one or two symbols. I feel like it’s a really powerful tool, much better than telling them that their UI should be flexible to be ready for other languages. And I used to do this manually, I’d go into the engine, and just translate a widget blueprint on the spot. Now I can hopefully do this automatically, at scale. And it’s not gibberish, so it’s a locale that still allows me to play the game and make some screens or videos for the devs. Or it allows them to play the game themselves if they have the time.
Configs
23 Config
It’s a list of scripts that are available, with the overall configuration, and then it’s a set of task lists, which are what it says they are: lists of tasks to perform. And I have one for everything I need to do regularly, like add a new file to Crowdin, do everything except for updating the source, update the source, do everything including updating the source, recreate reports and reimport assets, create the longest locale, etc. And if I have a new situation that calls for another set of tasks, I just create a new task list and tweak the script parameters in that task list.
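Here is a minimal sketch of that structure: overall per-script defaults plus named task lists that can override parameters. All the names in it (`scripts`, `task_lists`, `params`, the script names) are hypothetical and do not reflect the actual config format of the tool.

```python
def run_task_list(config, list_name, registry):
    """Run every task in the named list, in order, and return their results."""
    results = []
    for task in config["task_lists"][list_name]:
        defaults = config["scripts"].get(task["script"], {})
        # Per-task parameters override the script's overall defaults.
        params = {**defaults, **task.get("params", {})}
        results.append(registry[task["script"]](**params))
    return results


# Hypothetical config: two scripts with defaults, one task list.
config = {
    "scripts": {
        "gather": {"targets": ["Game"]},
        "pull": {"languages": "all"},
    },
    "task_lists": {
        "update_translations": [
            {"script": "pull"},
            {"script": "gather", "params": {"targets": ["Game", "UI"]}},
        ],
    },
}

# Stand-ins for the real scripts, so the example runs on its own.
registry = {
    "gather": lambda targets: f"gather {','.join(targets)}",
    "pull": lambda languages: f"pull {languages}",
}

print(run_task_list(config, "update_translations", registry))
```

A new situation then only needs a new entry under `task_lists`, with no changes to the scripts themselves.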
Other things
24 Add source files
And P4 checkout
25 Add auto comments
26 Crowdin issues to spreadsheet
27 Import screens
And filter strings
28 Import comments from CSV into PO or strings tables
And add source diffs
29 Other things…
30 What’s next for loc tools?
I want to make this way more friendly: make better logging, better guides, add tools to set things up in Unreal and Crowdin. Then maybe add a wizard to deploy and create task lists you want, add UI in Unreal Editor.
I want to continue developing and adding new features that I find useful for my games.
I want to have a discussion about what other people want in their games, and maybe develop that.
I have a bunch of other projects as well.
31 What else?
Unreal PO support on Crowdin
32 Unreal PO vs standard PO identity
Unreal POs are non-standard in a way. Unreal treats msgctxt as the only thing that identifies the string, while the standard says that it’s the combination of the msgid and msgctxt that identifies the string. As a result, when you fix a typo, for a CAT tool, it’s an old string being deleted, and a new one being added. So we’re fixing that by creating a special unreal_gettext format.
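The difference is easy to see in a small sketch that compares two versions of a file under both identity rules. The entries are plain dicts and the key and typo are invented for the example.

```python
def changed_keys(old, new, key):
    """Return (removed, added) identities between two versions of a file."""
    old_ids = {key(entry) for entry in old}
    new_ids = {key(entry) for entry in new}
    return old_ids - new_ids, new_ids - old_ids


def gettext_identity(entry):
    # Standard gettext: msgctxt and msgid together identify the string.
    return (entry["msgctxt"], entry["msgid"])


def unreal_identity(entry):
    # Unreal: msgctxt alone identifies the string.
    return entry["msgctxt"]


# The same string before and after a typo fix in the source text.
old = [{"msgctxt": "Menu,PlayButton", "msgid": "Paly"}]
new = [{"msgctxt": "Menu,PlayButton", "msgid": "Play"}]

print(changed_keys(old, new, gettext_identity))  # one removed, one added
print(changed_keys(old, new, unreal_identity))   # nothing removed or added
```

Under the standard rule the typo fix looks like a deletion plus a new string, so the existing translations are detached from it. Under the Unreal rule it is the same string with updated source text.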
33 Unreal vs ICU
And Unreal is using its own ICU-like syntax that is, of course, not supported by any CAT tool, so we’re working on converting it to ICU and back. So that we could leverage all the nice tools that Crowdin provides for ICU: syntax highlighting, skeleton generation, previews, and QA checks.
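To give a feel for what such a conversion involves, here is a deliberately naive sketch for the simplest case only: a plural argument modifier turned into an ICU plural. It assumes plain word-like argument names and forms without commas, parentheses, or nesting, and it ignores escaping and the other modifiers entirely. It is an illustration, not the converter being built for Crowdin.

```python
import re

# Matches e.g. {Count}|plural(one=item,other=items)
_PLURAL = re.compile(r"\{(\w+)\}\|plural\(([^)]*)\)")


def unreal_plural_to_icu(text: str) -> str:
    """Convert simple Unreal plural modifiers to ICU plural syntax."""
    def convert(match):
        name, forms = match.group(1), match.group(2)
        parts = []
        for form in forms.split(","):
            keyword, _, value = form.partition("=")
            parts.append(f"{keyword.strip()} {{{value}}}")
        return f"{{{name}, plural, {' '.join(parts)}}}"

    return _PLURAL.sub(convert, text)


print(unreal_plural_to_icu("You have {Count} {Count}|plural(one=item,other=items)"))
```

Going back from ICU to the Unreal form needs a real parser rather than a regex, because ICU allows nested braces inside each plural form.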
Improving Unreal
34 Unreal loc system
A bit about Unreal. It has a great localization system overall: it supports a lot of stuff, works great in the game, and the editor tools are reliable. I just can’t not praise it, it’s amazing, a lot of tech is mind-blowing, with how reliable and fast it is.
Yet, it also leaves a lot to wish for. Mostly when it comes to source text authoring and preparation, ensuring consistent and error-free source text, providing context and hints for translators, doing loc prep and management work. All the stuff that is meant for the non-tech side of the localization equation. Things like the string table editor, FText dropdowns, and the localization dashboard lack some crucial features and UI/UX. And being a less fancy topic, localization stuff isn’t documented all that well.
35 What’s next? Improving Unreal
Like I said, Unreal’s localization system is great… and not so great at the same time. The background, under-the-hood part of it is amazing, and I have only gratitude for having that. It’s the loc management, text authoring, and loc preparation tools that are lacking. The non-tech side of localization.
- Don’t sort by GUID keys
- Source references for string table entries
- CSV string tables
- String table editor and context
- FText dropdown and context
- Spellchecker
- Jump to string table entry and back
- Move text to string tables