FString::Contains / ::Find can’t handle non-ASCII/Unicode characters
TL;DR
The case-insensitive mode of FString::Contains, FString::Find, and probably other search/replace-related functions doesn’t work on non-ASCII characters. To fix it, apply FText::ToUpper or FText::ToLower to both texts before converting them to FString. I figured this out while fixing a bug in Satisfactory.
Unreal and text
Unreal has three types to store text values: FName, FString, and FText. The shortest possible advice would be: Always use FText for text. That’s it, and it’s good advice.
Unfortunately, Unreal doesn’t provide all the basic functionality in FText, so sometimes it feels like you have to convert it to FString and use its functions instead.
FString vs FText
FString seems to be designed to hold basic Latin text in the ASCII range that doesn’t get translated, and it isn’t gathered for localization. That said, it can hold Unicode characters just fine, but its text manipulation functionality isn’t Unicode- or culture-aware.
FString::ToUpper and FString::ToLower only change characters in the ASCII range and leave anything else untouched. So if you have a non-ASCII character, its case won’t change; it’ll just stay as is. See FString #L504 and FChar #L79 in the Unreal Engine repo on GitHub.
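To make the ASCII-only behavior concrete, here is a minimal standalone sketch in plain C++. It does not use the engine; ToUpperAsciiOnly is a hypothetical helper that only models the behavior described above, and std::u32string stands in for FString.

```cpp
#include <string>

// Hypothetical model of ASCII-only upper-casing (not engine code):
// only 'a'..'z' are changed, every other code point is left as is.
std::u32string ToUpperAsciiOnly(std::u32string Str)
{
    for (char32_t& Ch : Str)
    {
        if (Ch >= U'a' && Ch <= U'z')
        {
            Ch = static_cast<char32_t>(Ch - U'a' + U'A');
        }
    }
    return Str;
}
```

With this model, "abc" becomes "ABC", while "été" becomes "éTé": the t is upper-cased and both é characters stay lowercase.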
FText, on the other hand, is designed to store localizable Unicode text. The FText::ToUpper and FText::ToLower functions work well with Unicode characters and are culture-aware (e.g., they’ll capitalize i as İ if the game’s running in the Turkish locale, and as I if it’s in the English locale).
However, FText doesn’t provide any search function, which is kind of a basic thing, if you ask me.
Can I just convert and use FString::Contains?
So you either have to write your own function for this, or just use FString::Contains, which also appears to have a neat case-insensitive mode.
It’s a trap, though. Before the fix, the function that checks whether a text contains the words you type roughly did this: it converted both texts to FString and called FString::Contains in case-insensitive mode. And that didn’t work for any text that contained a non-ASCII character.
The thing with case-insensitivity in FString::Contains is that it just uppercases both strings before running the search, and that doesn’t work for non-ASCII characters. See FString::Find #L424 and KismetStringLibrary #L371, KismetStringLibrary #L366 in the Unreal Engine repo on GitHub.
Fix: Use FText::ToUpper (or FText::ToLower) before you convert to FString for search
It’s that simple: just use FText::ToUpper or FText::ToLower before you do the conversion. In simplified form, the updated function applies FText::ToUpper to both texts, converts the results to FString, and runs the search on those.
And you might want to go for a case-sensitive search (or tick the UseCase checkbox in Blueprints) just to avoid the unnecessary internal capitalization in FString::Contains.
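As a sketch of the same idea outside the engine, the plain C++ below normalizes case on both sides with a routine that also handles non-ASCII letters and then runs an exact, case-sensitive search. Everything here is an assumption made for illustration: UpperChar is a deliberately tiny stand-in for FText::ToUpper that covers only ASCII, Latin-1, and basic Cyrillic letters and is not culture-aware, and UpperString and ContainsIgnoreCase are hypothetical names.

```cpp
#include <string>

// Tiny stand-in for a Unicode-aware upper-casing step (assumption, not engine code).
// Real implementations rely on full Unicode data and the active culture.
char32_t UpperChar(char32_t Ch)
{
    const bool bAscii    = Ch >= U'a' && Ch <= U'z';
    const bool bLatin1   = Ch >= 0x00E0 && Ch <= 0x00FE && Ch != 0x00F7; // à..þ, skipping ÷
    const bool bCyrillic = Ch >= 0x0430 && Ch <= 0x044F;                 // а..я
    return (bAscii || bLatin1 || bCyrillic) ? static_cast<char32_t>(Ch - 0x20) : Ch;
}

std::u32string UpperString(std::u32string Str)
{
    for (char32_t& Ch : Str)
    {
        Ch = UpperChar(Ch);
    }
    return Str;
}

// Normalize case on both sides first, then run an exact (case-sensitive) search.
bool ContainsIgnoreCase(const std::u32string& Text, const std::u32string& Query)
{
    return UpperString(Text).find(UpperString(Query)) != std::u32string::npos;
}
```

In this model, searching "École" for "école" and "Привет" for "при" both succeed. In engine code, the corresponding call chain would look something like Text.ToUpper().ToString().Contains(Query.ToUpper().ToString(), ESearchCase::CaseSensitive), with Text and Query being hypothetical FText variables.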
The same goes for when you have an FString with non-ASCII characters and you want to perform some case-insensitive operation: you need to convert it to FText, apply FText::ToUpper or FText::ToLower, then convert it back to FString and use FString::Contains or FString::Find. Double conversion sucks, but I don’t know any other way to do this. Also, remember to use the same case-conversion function on both strings/texts that you use in your search or comparison.
Replace for FText can’t be fixed that easily
I’m afraid this trick won’t help you get case-insensitive replacement functionality. I’m not sure if there’s a simple way to fix this, apart from writing proper Unicode- and locale-aware search and replace functions for FText.
Though you really shouldn’t replace stuff in your localizable FTexts, just saying =) Use FText::Format if you need to inject things; that one tracks history and keeps live culture switching working. If you need to add other variable formats, consider extending FText::Format instead of writing your own replacement function.