Overview of localization in Unreal Engine 4

Unreal has a good localization system that works out of the box and offers a lot of nice features: live culture switching, asset and dialogue localization, PO export and import, integrations with a couple of service providers, and the means to add your own. The source is there too, so you can build on top of it or change how it works, provided you know some C++ black magic.

It also has a bunch of seemingly nice features that result in headaches later in development, like in-place authoring and automatic GUID-based keys for your strings, plus some things that aren’t about localization but affect it nonetheless, like subgraphs being purely cosmetic and not providing proper context for your strings.

And then there’s a bunch of things that are just bad: the lack of usage references in string tables, the lack of integration between string tables and the design UI, the fact that asset string tables are binary files, and the lack of a good UI for CSV string tables (and for asset string tables too, really).

The worst part of it is that you don’t know any of this when you start working on your game. So you end up doing things that make it hard or impossible to localize parts of it later on, or things that cost you a lot of time, money, and headaches later in development, or both.

When I first crossed into the game dev side of things, I knew next to nothing about the Unreal Engine localization system. I found the UE help lacking, and information on how things actually work virtually non-existent. I also found no overviews of the different approaches to text authoring and localization setup that would describe the problems that come with each approach.

So here’s my take on it. Hope it helps someone who wants their game to be easier to localize or someone coming into game dev from localization.

Three ways to store text in Unreal

There are three ways to store your text in Unreal, and none of them is perfect. That’s a nice start, isn’t it? You’ll see why in a bit but first, here’s a list of what you can do with the standard localization system.

Author-at-source approach

First and foremost, you can adopt what Epic calls the ‘author-at-source’ approach: you enter the text directly into an FText field, either in Slate UI or in graphs, or use the LOCTEXT/NSLOCTEXT macros in code. That’s it: your text now sits in the asset or the source file itself, and it is gathered by the localization system, so it is localizable (unless you set it to culture-invariant or use a TEXT macro). It just works.

Simple, neh? Sure, but as with most things that ‘just work’, it can result in growing pains, especially if you’re using branches and your team gets bigger.

The good part of it is that when the text is gathered and exported, Unreal adds a source reference to its context, which your translators will see when they work on the text. If it’s a source file, they’ll see its path and a line number. If it’s Slate UI, they’ll see the path to the asset and the path to the widget containing that text inside the asset. If it’s a graph or a function, they’ll see the path to the blueprint and the name of the graph or function. Obviously, the usefulness of this depends on how you structure your files and name your assets and widgets, but if there is order and tidiness to it, then this kind of context can be of great help to translators.

There are a few downsides.

First and foremost, the actual text sits in the assets and files themselves so you’re violating the first rule of product localization: don’t mix text with code. The problem is that even a tiny typo fix now renders the file dirty, and it’s a huge issue for assets since they’re binary, and it’s impossible to merge them quickly. So if there’s a bunch of people working on things, and you’re using branches, you can’t just go around and change your text.

Now, there’s a cheat: you can actually ‘translate’ text from your native culture to your, ehm, native culture. So you can have the original text in your code and assets, and introduce fixes in this ‘translation’. It is then displayed to players, and it is also used as the source for other languages. Seems to work, but I think it’s messy. It seems okay for a typo, but what if you have to change the wording a bit? Or the meaning? I wouldn’t want to have actual differences between the original in the asset and what players see in game. It sounds like a recipe for disaster at some point in the future.

Second, you have no way of performing batch operations on your strings or even working on them in bulk. Say you want to proofread your text. Sure, go ahead. But how do you get it back into the game? There’s no way but to manually copy and paste it into each and every text field. What if you want to make your item descriptions a bit more standardized? Sure, go ahead and open every blueprint and edit it separately. There’s a pattern here: it’s easy to create stuff on the fly, and it’s hard to work on it as a whole later on.

Third, the context for strings in graphs isn’t all that helpful if a graph is big. That’s more of a project organization problem, of course, but it’s a disaster for localization because subgraphs that are often used to group things in a big graph are purely cosmetic, they don’t really separate things. So all strings in one graph get the same context, and they lose all spatial context they have in a graph: strings that are close to each other in a graph can be far apart in the file.

Finally, for strings created in UI, blueprints, and graphs, this approach defaults to empty namespaces and GUID-based keys for the text you create. It seems like an okay thing to do to save time, but it’s a disaster for localization, and for you later on. Unreal sorts strings in the exported PO files based on the namespace and key combination, and since your namespaces are empty by default and your keys are essentially random, it results in a randomized string order. There’s no context, things from vastly different parts of the game sit next to each other, and it’s a nightmare to translate. Source references can save the day, since you can sort the file by source reference, but if it’s a big graph, strings within this graph will still be randomized.
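
To make the ordering problem concrete, here is a minimal standalone sketch in plain C++ (no engine code). It assumes, as described above, that exported strings are ordered by namespace and then by key. The Entry struct, the function names, the sample strings, and the GUID-like keys are all made up for illustration; none of them come from Unreal.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for one exported string; not an Unreal type.
struct Entry {
    std::string ns;      // namespace (empty by default for in-place text)
    std::string key;     // key (a GUID by default for in-place text)
    std::string source;  // the text a translator would see
};

// Orders entries by namespace, then by key (the ordering assumed above),
// and returns the texts in the order a translator would read them.
std::vector<std::string> ExportOrder(std::vector<Entry> entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) {
                  return std::tie(a.ns, a.key) < std::tie(b.ns, b.key);
              });
    std::vector<std::string> texts;
    for (const Entry& e : entries) {
        assert(!e.key.empty());  // every entry needs a key to be identified
        texts.push_back(e.source);
    }
    return texts;
}

// Default pattern: empty namespaces and GUID-like keys (made-up values).
std::vector<Entry> WithDefaultKeys() {
    return {
        {"", "7C1E52A04D3B9F6680C1A4E2B95D0F13", "New Game"},
        {"", "1A9F04C24B7E8D3355AA90B1E2F6C0D7", "Options"},
        {"", "3B77E0914C5A2D88B1F06A3C9E4D7215", "Rusty Sword"},
        {"", "D4026B8F41E93C7A95D0F2B6A8C31E54", "Healing Potion"},
    };
}

// Same strings with hand-written namespaces and keys (made-up names).
std::vector<Entry> WithNamedKeys() {
    return {
        {"MainMenu", "NewGame", "New Game"},
        {"MainMenu", "Options", "Options"},
        {"Items", "RustySword_Name", "Rusty Sword"},
        {"Items", "HealingPotion_Name", "Healing Potion"},
    };
}
```

With the default keys, ExportOrder returns Options, Rusty Sword, New Game, Healing Potion: menu and item strings are interleaved. With named keys it returns Healing Potion, Rusty Sword, New Game, Options, so the item strings and the menu strings each stay together.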

Of course, you can still use proper namespaces and keys, use smaller functions, function libraries and contained graphs—and by all means, you should!

It’s the default pattern I hate.

New developers often don’t know how bad the GUID keys are for localization, so they just leave them there. They don’t know how bad the huge graphs are for localization (and not just for localization). They don’t know they’ll have problems editing text later on if they just type it in place all the time.

Later on, there’s usually no time to fix those things: you have thousands of words to translate, release is coming, and there’s a lot of work besides localization, so it’s not feasible to spend days opening each and every asset, going through each and every text field, and changing all the namespaces and keys to something meaningful… or moving them to string tables, which is our next specimen.

String table assets