Unreal Engine POs: variables and formatting
I’m working on better support for Unreal Engine POs and part of it is describing the format. Among other specs, it includes variables and expressions typical for Unreal PO files. We’re working with Crowdin on converting this Unreal PO-specific syntax to ICU and back to take advantage of the existing linters and UI helpers. But this post is for translators that will have to deal with the variables as is.
Here’s the repo with my take on the Unreal PO format spec if you’re curious: https://github.com/xabk/unreal-po-spec.
And here’s a page on Text Formatting in Unreal Engine docs if you’re even more curious: https://docs.unrealengine.com/4.27/en-US/ProductionPipelines/Localization/Formatting/
A word of warning: I’ll only talk about defaults here. Developers can introduce any other types and formats of variables, and come up with their own expressions. So it’s always best to talk to the client to get a good understanding of what’s used in the project and how it all works.
Variables
Unreal uses {variables in curly braces}.
- Variable names can be almost anything and can even contain spaces, so you can ask your client to use descriptive variable names =)
- You can move them around the string.
- Variables can be part of inline plural and other expressions.
Square brackets aren’t used for variables.
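If you want a quick sanity check that a translation still contains every variable from the source, something like the sketch below could work. It assumes only the default format described above (anything between a pair of curly braces is a variable) and ignores any escaping or project-specific syntax; the function names are made up for this example.

```python
import re

# Assumption: a variable is any run of characters between { and }
# that contains no braces itself. Project-specific formats may differ.
VARIABLE_RE = re.compile(r"\{([^{}]+)\}")


def variables(text: str) -> set[str]:
    """Return the set of variable names found in curly braces."""
    return set(VARIABLE_RE.findall(text))


def missing_variables(source: str, translation: str) -> set[str]:
    """Variables present in the source but absent from the translation."""
    return variables(source) - variables(translation)


if __name__ == "__main__":
    source = "Hello, {Player Name}! You have {x} coins."
    translation = "Привет, {Player Name}!"
    print(missing_variables(source, translation))  # {'x'}
```

A variable that is missing from the translation isn't always a mistake, so treat the result as a hint rather than an error.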
Formatting Tags
Unreal Rich Text UI elements have XML-like tags with a few quirks.
Formatting is done using <angled brackets tags> with </> closing tags. Formatting tags usually don’t have attributes.
- There are no standard tags. Instead, developers create their own tables with tag names and corresponding styles. Important: different elements in the project can use different tables.
- Tags usually don’t have translatable attributes.
- Closing tags don’t have names.
- For some tags, attributes provide valuable context (e.g., image names).
- Unreal Rich Text doesn’t support nested tags at all. Tags inside a non-closed tag will be printed as is. E.g., “white <blue>blue <green>green </>blue </>” will produce “white blue <green>green blue </>” (where “blue <green>green ” will be blue, and “white ” and “blue </>” at the end will be of the “default” color). Not what you’d expect from XML or HTML. So if you want the effect of a nested tag, you’d have to duplicate the blue tag: <blue>1</><green>2</><blue>3</>.
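To make the no-nesting behavior a bit more tangible, here’s a rough Python sketch that splits a string into styled runs the way the example above describes. It’s only an approximation of the described behavior, not Unreal’s actual parser: the tag pattern, the “default” style name, and the handling of an unclosed tag (assumed to style the rest of the string) are all my assumptions, and tags with attributes, like images, are simply left in the text.

```python
import re

# Assumption: an opening tag is <name> with no attributes; the closing tag is </>.
OPEN_TAG_RE = re.compile(r"<([A-Za-z][\w.-]*)>")
CLOSE_TAG = "</>"


def split_runs(text):
    """Split rich text into (style, text) runs without any nesting.

    Everything between an opening tag and the first </> belongs to that
    tag's style, including any other tags, which stay as literal text.
    """
    runs = []
    pos = 0
    while pos < len(text):
        match = OPEN_TAG_RE.search(text, pos)
        if match is None:
            # No more opening tags: the rest is default text.
            runs.append(("default", text[pos:]))
            break
        if match.start() > pos:
            runs.append(("default", text[pos:match.start()]))
        end = text.find(CLOSE_TAG, match.end())
        if end == -1:
            # Assumption: an unclosed tag styles the rest of the string.
            runs.append((match.group(1), text[match.end():]))
            break
        runs.append((match.group(1), text[match.end():end]))
        pos = end + len(CLOSE_TAG)
    return runs


if __name__ == "__main__":
    for run in split_runs("white <blue>blue <green>green </>blue </>"):
        print(run)
```

For the nested example, the runs are “white ” in the default style, “blue <green>green ” in blue, and “blue </>” in the default style again, which is exactly the surprising result described above.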
Since rich text UI elements are isolated, it’s not a big deal to omit a closing tag. E.g., whatever <blue>thing is a perfectly valid rich text value even though the tag isn’t closed. It’s sloppy and it can break things if you use that string as a variable in another string, but you can see it in a lot of projects. At the same time, it can be intentional, so check with the devs.
Technically, you can move tags around as much as you like and add new tags as long as you’re sure they exist in the style table (so either get a list from the devs, or limit yourself to the ones that exist in the source) and as long as you’re producing correct format that doesn’t break the parser. A good use case would be when you need to split a highlighted part for readability or grammar: feel free to duplicate the tags in this case, just follow the rules above.
In reality, the UI might expect things to be in a particular order, so it’s always best to talk to your client first.
Inline Images
Unreal Rich Text elements also support inline images out of the box. Images are added using empty tags: <img id="image-name"/>.
Same here: technically, you can move them around as much as you want. In reality, the UI might not be as flexible, so it’s always best to make sure first.
Plurals: Inline, not PO
Unreal isn’t using PO features for plurals. Instead, it’s using its own flavor of ICU-like message formatting. It’s based on ICU under the hood but has its own syntax:
You have {x} {x}|plural(one=apple,other=apples)
The expression itself doesn’t include the variable. I.e., {x}|plural(one=cat,other=cats) will only print cat or cats depending on the number in {x}, but not the number itself. If you want the number included, you have to specify it explicitly: either outside of the expression, or inside the expression within the plural forms.
In languages with more complex plurals, like Russian, it’d be:
У вас {x} {x}|plural(one=яблоко,few=яблока,many=яблок,other=яблока)
To get an idea of what each keyword means in your language, you can use plurals rules from CLDR: https://unicode-org.github.io/cldr-staging/charts/37/supplemental/language_plural_rules.html.
Basic hints:
- Unreal only supports the standard keywords: zero, one, two, few, many, and other.
- It doesn’t support exact numbers: something like =0 from ICU will break the parser and cause the string to be displayed as is.
- Unreal supports variables inside plural forms: You have {x}|plural(one={x} apple,other={x} apples).
- It doesn’t support the ICU # shortcut for referencing the branching variable. You have to use the variable name as in the example above.
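The hints above can be turned into a rough check for translated strings. The sketch below is a simplification and not a real parser: it assumes unquoted plural forms with no commas or parentheses inside them, and it only covers the default keywords listed above. Any # in a form is reported as a possible ICU shortcut, even though it could be a legitimate literal character, so treat that as a warning. The function name is made up for this example.

```python
import re

ALLOWED_KEYWORDS = {"zero", "one", "two", "few", "many", "other"}

# Assumption: forms are unquoted and contain no commas or parentheses.
PLURAL_RE = re.compile(r"\|plural\(([^()]*)\)")
FORM_RE = re.compile(r"(\w+)=(.*)", re.DOTALL)


def lint_plurals(text):
    """Return a list of human-readable problems found in plural expressions."""
    problems = []
    for expression in PLURAL_RE.finditer(text):
        for form in expression.group(1).split(","):
            form = form.strip()
            parsed = FORM_RE.fullmatch(form)
            if parsed is None or parsed.group(1) not in ALLOWED_KEYWORDS:
                # Covers ICU-style exact matches such as =0 and unknown keywords.
                problems.append(f"unsupported plural form: {form!r}")
            elif "#" in parsed.group(2):
                problems.append(f"possible ICU '#' shortcut: {form!r}")
    return problems


if __name__ == "__main__":
    print(lint_plurals("You have {x} {x}|plural(one=apple,other=apples)"))
    print(lint_plurals("{x}|plural(=0=no apples,one=# apple,other=# apples)"))
```

The first call reports nothing. The second one reports the =0 form and both forms that use #.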
Advanced tricks:
- Usually, you can add a plural expression even if there is none in the source: the string is passed through the format node or function anyway, which not only replaces the variable with its value but also processes any of the supported expressions.
- This expression only works if the variable is a number. And that means it should be of a numeric type internally, so even if you see a number in the game it’s not a 100% guarantee that it’s passed as a number to the format node: it can be a string that contains a number. So it’s always best to ask the devs.
Gender Branching
Unreal offers gender branching. It depends on the developers passing gender information as a variable. If there’s no gender variable, this won’t work.
Possible values: masculine, feminine, neuter. Values are position-based, neuter is often omitted but still can be used in translation.
Gender variable names can be anything. E.g., if the gender variable is {g} then it looks like this:
{g}|gender(masculine form,feminine form,neuter form)
{g}|gender(Le guerrier est fort,La guerrière est forte)
Warning: this is not automatic at all, it fully depends on the correct gender information being passed at runtime. Even if you add and use the correct gender forms for your language it might never work because the underlying code just uses genders from some other language. E.g., this might work great for people because, for them, gender doesn’t change with translation, but this will require additional work on the dev and translator side if it’s used for things that change grammatical gender between languages.
More on genders for devs: /blog/gender-info-in-strings/
Quotation and Escaping
In Unreal Editor itself, forms for plurals and other expressions can be quoted at will and must be quoted if they contain a comma. Backslash is used to escape quotes and backslashes in a quoted string.
When the strings are exported to a PO, they’re quoted again and go through another round of escaping with backslashes.
Your CAT tool should deal with the extra quoting coming from the PO format, and you should see the strings as they are in Unreal.
Examples
Plural forms quoted because they contain a comma (first as seen in Unreal, then as exported to the PO):
You have {x} {x}|plural(one="big, big apple",other="big, big apples")
msgid "You have {x} {x}|plural(one=\"big, big apple\",other=\"big, big apples\")"
Plural forms quoted, internal quotes escaped (first as seen in Unreal, then as exported to the PO):
You have {x} {x}|plural(one="big \"quoted\" apple",other="big \"quoted\" apples")
msgid "You have {x} {x}|plural(one=\"big \\\"quoted\\\" apple\",other=\"big \\\"quoted\\\" apples\")"
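If you’re curious how the second round of escaping works, here’s a minimal Python sketch that reproduces the two examples above. It only handles quotes and backslashes on a single-line msgid. Real PO files also escape newlines and can wrap long strings across lines, which this ignores, and the function names are made up for illustration.

```python
import re


def to_po_msgid(unreal_text: str) -> str:
    """Escape backslashes and quotes, then wrap the string in a msgid line."""
    escaped = unreal_text.replace("\\", "\\\\").replace('"', '\\"')
    return f'msgid "{escaped}"'


def from_po_msgid(line: str) -> str:
    """Reverse of to_po_msgid for a single-line msgid."""
    body = line[len('msgid "'):-1]
    # Drop the backslash in front of each escaped quote or backslash.
    return re.sub(r'\\(["\\])', r"\1", body)


if __name__ == "__main__":
    unreal = r'You have {x} {x}|plural(one="big \"quoted\" apple",other="big \"quoted\" apples")'
    po_line = to_po_msgid(unreal)
    print(po_line)
    print(from_po_msgid(po_line) == unreal)  # True
```

Backslashes have to be escaped before quotes. Otherwise the backslashes added for the quotes would get doubled as well.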
Hangul Postposition
{Arg}|hpp(은,는)
This is used for Korean to allow for better endings. I’ve never seen this in real life, but here’s some info in the Unreal Engine docs: https://docs.unrealengine.com/4.27/en-US/ProductionPipelines/Localization/Formatting/#hangulpost-positions
Inline Expressions Extensibility
Unreal allows developers to extend these expressions with new features.
- Add new keywords on top of plural, gender, etc., or change and extend existing expressions, e.g., add support for exact number matching to the plural expression.
- Add new tags or extend functionality or format of existing tags for rich text (on top of </> and images), etc.
- It’s all code so they can do anything, really. Like, use variables in square brackets.
All of this, though, is done by the devs on a specific project so they should be able to brief you on any non-standard stuff they have =)