Playful Sparkle: Transliterate Library is a high-precision, extensible C# library targeting .NET Standard 2.0, purpose-built for deterministic Unicode transliteration and normalization. It enables structured transformation of input strings by decomposing composite characters, replacing multi-character graphemes and emoji sequences with pre-defined or user-defined ASCII-compatible mappings, and re-normalizing the result using standardized Unicode normalization forms (NFD, NFC, NFKD, NFKC).
The library is designed to process complex Unicode input in a consistent and idempotent manner. It supports:
- Full control over normalization granularity (canonical vs compatibility, composition vs decomposition),
- Multi-pass mapping resolution for both standard character sets and emoji code points,
- On-the-fly integration of user-defined mapping dictionaries,
- Surrogate pair decoding for code points beyond the Basic Multilingual Plane,
- Post-normalization filtering to remove diacritics and combining marks for ASCII-safe output.
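The normalize, filter, and re-normalize steps listed above can be sketched with the .NET base class library alone. This is a minimal illustration under stated assumptions, not the library's own implementation: the class and method names (DiacriticSketch, StripDiacritics) are hypothetical, and the sketch uses the framework's NormalizationForm enum rather than the library's Normalization enum.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the Transliterate library.
public static class DiacriticSketch
{
    // Decompose to NFD, drop non-spacing marks, then recompose to NFC.
    public static string StripDiacritics(string text)
    {
        // NFD splits a precomposed character such as U+00E8 into
        // a base letter followed by a combining mark.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Combining accents belong to the NonSpacingMark (Mn) category.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        // Recompose whatever is left so the result is in a stable form.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Calling DiacriticSketch.StripDiacritics("Crème brûlée") yields "Creme brulee". Characters that have no canonical decomposition (for example ß or æ) pass through unchanged, which is why a mapping table is needed in addition to this filter.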
The library is well suited to applications that require language-agnostic preprocessing, such as SEO sanitization, search indexing, canonical form comparison, filename/path generation, and legacy system compatibility.
- Full Unicode normalization support, including NFD (Decompose), NFC (Compose), NFKC (Compatibility Compose), and NFKD (Compatibility Decompose) normalization forms
- Transliteration of accented characters, diacritics, and complex Unicode sequences (e.g., ligatures, non-ASCII characters) to simplified ASCII representations
- Emoji sequence replacement based on pre-defined Unicode mappings for consistent transformation
- Efficient handling of multi-codepoint sequences, such as emojis and other combined graphemes, ensuring correct encoding and substitution
- Removal of non-spacing marks (diacritics) after normalization for producing clean, ASCII-compatible strings
- Support for customizable character-to-character mappings via user-defined dictionaries for project-specific transliteration needs
- Preprocessing of Unicode mapping dictionaries from U+XXXX notation, converting them into valid character sequences for processing
- Cross-platform compatibility with all environments supporting .NET Standard 2.0, enabling integration into various .NET applications
- Includes validation for input text (null, empty, and invalid Unicode characters) to ensure robust processing.
- Supports validation of custom mapping entries to ensure keys and values adhere to defined length constraints.
- Uses internal dictionaries (emojiUnicodeMappings and defaultUnicodeMappings) for efficient storage and lookup of pre-defined transliteration rules.
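To make the U+XXXX preprocessing step concrete, here is a small sketch of how such notation could be turned into a .NET string. The class and method names (NotationSketch, FromUPlusNotation) are hypothetical, and the assumption that a multi-code-point key is written as space-separated tokens is mine; the library's internal format may differ.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the Transliterate library.
public static class NotationSketch
{
    // Converts "U+0041" or "U+0065 U+0301" into the string those code points encode.
    // Assumption: tokens are separated by single spaces.
    public static string FromUPlusNotation(string notation)
    {
        var builder = new StringBuilder();
        string[] tokens = notation.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            if (!token.StartsWith("U+", StringComparison.OrdinalIgnoreCase))
            {
                throw new FormatException("Expected U+XXXX, got '" + token + "'.");
            }

            // Parse the hexadecimal digits after the "U+" prefix.
            int codePoint = int.Parse(
                token.Substring(2),
                NumberStyles.AllowHexSpecifier,
                CultureInfo.InvariantCulture);

            // ConvertFromUtf32 emits a surrogate pair for code points above U+FFFF
            // and throws for surrogate code points or values beyond U+10FFFF.
            builder.Append(char.ConvertFromUtf32(codePoint));
        }

        return builder.ToString();
    }
}
```

For example, FromUPlusNotation("U+0041") returns "A", while FromUPlusNotation("U+1F642") returns a two-char string because the code point lies outside the Basic Multilingual Plane.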
The Decompose method is used to transliterate and normalize an input string based on a specified Unicode normalization form. It processes the string by applying custom character mappings (if provided) and default mappings for complex characters (e.g., emoji and Unicode sequences). The method can be used to decompose composed characters into their base forms or apply other normalization forms like composition or compatibility normalization.
public static string Decompose(string text, Normalization normalization, bool useDefaultMapping = true, Dictionary<string, string> customMapping = null)
Parameters:
- text: The input string to be processed.
- normalization: The desired Unicode normalization form to apply. This can be one of the values from the Normalization enum:
  - Normalization.Decompose: Decomposes composed characters into base characters and combining characters (NFD).
  - Normalization.Compose: Combines characters into composed characters (NFC).
  - Normalization.CompatibilityCompose: Applies compatibility decomposition followed by composition (NFKC).
  - Normalization.CompatibilityDecompose: Applies compatibility decomposition without recomposition (NFKD).
- useDefaultMapping (default true): Indicates whether to use the default character mappings.
- customMapping (optional): A dictionary of custom character mappings to apply before the default mappings. Keys should be Unicode character sequences as strings, and values should be their replacement strings.
Returns:
The transliterated and normalized string.
Exceptions:
- ArgumentException: Thrown if the input text is null or empty.
- ArgumentOutOfRangeException: Thrown if the input text contains invalid Unicode characters.
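The two exceptions above suggest input checks along the following lines. This is only a sketch: the names ValidationSketch, HasOnlyWellFormedSurrogates, and Validate are hypothetical, and the sketch covers lone surrogates only, whereas the library's own validation may reject further input.

```csharp
using System;

// Hypothetical helper, not part of the Transliterate library.
public static class ValidationSketch
{
    // Returns false if the string contains a high surrogate without a following
    // low surrogate, or a low surrogate without a preceding high surrogate.
    public static bool HasOnlyWellFormedSurrogates(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            if (char.IsHighSurrogate(c))
            {
                if (i + 1 >= text.Length || !char.IsLowSurrogate(text[i + 1]))
                {
                    return false; // isolated high surrogate
                }

                i++; // skip the low half of a valid pair
            }
            else if (char.IsLowSurrogate(c))
            {
                return false; // isolated low surrogate
            }
        }

        return true;
    }

    // Mirrors the documented exception types; the checks themselves are assumptions.
    public static void Validate(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            throw new ArgumentException("Input text must not be null or empty.", nameof(text));
        }

        if (!HasOnlyWellFormedSurrogates(text))
        {
            throw new ArgumentOutOfRangeException(nameof(text), "Input text contains a lone surrogate.");
        }
    }
}
```

With this sketch, Validate("a\uD800b") throws ArgumentOutOfRangeException because U+D800 appears without its low surrogate, while a properly paired sequence such as "\uD83D\uDE42" passes.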
Example:
string input = "Some text with 🙂 and complex characters!";
string result = Transliterate.Decompose(input, Transliterate.Normalization.Decompose);
Console.WriteLine(result); // Result: Some text with slightly smiling face and complex characters!
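The customMapping parameter is described as being applied before the default mappings. One way to picture that precedence, together with longest-key-first matching for multi-character sequences, is the self-contained sketch below. It does not call the library: MappingSketch and Apply are hypothetical names, and merging the two dictionaries with custom entries overriding defaults is an assumption about behavior, not a description of the actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the Transliterate library.
public static class MappingSketch
{
    // Replaces the longest matching key at each position.
    // Assumption: custom entries override default entries with the same key.
    public static string Apply(
        string text,
        IDictionary<string, string> defaults,
        IDictionary<string, string> custom = null)
    {
        var merged = new Dictionary<string, string>(defaults, StringComparer.Ordinal);
        if (custom != null)
        {
            foreach (KeyValuePair<string, string> entry in custom)
            {
                merged[entry.Key] = entry.Value;
            }
        }

        // The longest key bounds how far ahead each lookup has to read.
        int maxKeyLength = 0;
        foreach (string key in merged.Keys)
        {
            maxKeyLength = Math.Max(maxKeyLength, key.Length);
        }

        var builder = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            bool matched = false;

            // Try the longest candidate first so "ab" wins over "a".
            for (int len = Math.Min(maxKeyLength, text.Length - i); len > 0; len--)
            {
                string replacement;
                if (merged.TryGetValue(text.Substring(i, len), out replacement))
                {
                    builder.Append(replacement);
                    i += len;
                    matched = true;
                    break;
                }
            }

            if (!matched)
            {
                builder.Append(text[i]);
                i++;
            }
        }

        return builder.ToString();
    }
}
```

Given defaults that map "ß" to "ss", Apply("Straße", defaults) returns "Strasse"; passing a custom dictionary that maps "ß" to "sz" returns "Strasze" instead. With the real library, the equivalent call would pass the dictionary as the customMapping argument of Decompose.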
The DecomposeAsync method is an asynchronous version of the Decompose method. It runs the transliteration and normalization process in a separate task to avoid blocking the calling thread, which is useful in scenarios where you need to process large strings or perform the operation without affecting the responsiveness of your application.
public static async Task<string> DecomposeAsync(string text, Normalization normalization, bool useDefaultMapping = true, Dictionary<string, string> customMapping = null)
Parameters:
- text: The input string to be processed.
- normalization: The desired Unicode normalization form to apply. This can be one of the values from the Normalization enum:
  - Normalization.Decompose: Decomposes composed characters into base characters and combining characters (NFD).
  - Normalization.Compose: Combines characters into composed characters (NFC).
  - Normalization.CompatibilityCompose: Applies compatibility decomposition followed by composition (NFKC).
  - Normalization.CompatibilityDecompose: Applies compatibility decomposition without recomposition (NFKD).
- useDefaultMapping (default true): Indicates whether to use the default character mappings.
- customMapping (optional): A dictionary of custom character mappings to apply before the default mappings. Keys should be Unicode character sequences as strings, and values should be their replacement strings.
Returns:
A Task<string> representing the asynchronous operation, containing the transliterated and normalized string.
Exceptions:
- ArgumentException: Thrown if the input text is null or empty.
- ArgumentOutOfRangeException: Thrown if the input text contains invalid Unicode characters.
Example:
string input = "Some text with 🙂 and complex characters!";
string result = await Transliterate.DecomposeAsync(input, Transliterate.Normalization.Decompose);
Console.WriteLine(result); // Result: Some text with slightly smiling face and complex characters!
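Running CPU-bound work "in a separate task" is commonly done by wrapping the synchronous call in Task.Run. The sketch below shows that pattern with a stand-in transformation. AsyncSketch, Transform, and TransformAsync are hypothetical names, and whether DecomposeAsync is implemented this way is an assumption.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Transliterate library.
public static class AsyncSketch
{
    // Stand-in for a CPU-bound synchronous transformation.
    public static string Transform(string text)
    {
        return text.ToUpperInvariant();
    }

    // Validates eagerly on the caller's thread, then offloads the work
    // to the thread pool so the caller is not blocked while it runs.
    public static Task<string> TransformAsync(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        return Task.Run(() => Transform(text));
    }
}
```

A caller would write string result = await AsyncSketch.TransformAsync("hello"); and receive "HELLO". Note that Task.Run moves work to another thread but does not make it faster, so for short strings the synchronous method is usually the simpler choice.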
- Custom mappings reprocessing: Custom mappings are reprocessed on each method call, which may impact performance when dealing with large input strings or multiple consecutive transliterations.
- Mappings Optimization: Reworked default and emoji mappings to use int[] arrays of Unicode code points as dictionary keys instead of string sequences (e.g., "U+XXXX"). This significantly improves the speed and efficiency of lookup operations during transliteration.
- Decompose Performance: Refactored the Decompose method to use a new fast-path sequence matcher and a more efficient fallback for surrogate-aware single-character replacement. This results in considerably faster processing of text with complex mappings and Unicode sequences.
- Added 177 new character mappings.
- Improved Emoji Text Alternatives: To more accurately identify and display the correct text alternatives for emojis, we added an internal method GetMaxKeyLength to calculate the maximum key length for mappings, resolving potential issues with shorter, incorrect matches.
- New Option: useDefaultMapping: Added a useDefaultMapping option (defaults to true) to enable or disable the built-in default mapping.
- Internal Refinement: The PrepareDictionary method is now internal static for improved code maintainability.
- Terminology Update: Renamed "Smiley mapping" to "Emoji mapping."
- Terminology Update: Renamed "Mapping" to "Default mapping."
- Fixed: The IsValidUnicodeString method has been replaced with a more comprehensive implementation that:
  - Properly validates surrogate pairs in Unicode strings
  - Detects isolated high and low surrogates (e.g., character 0xD800)
  - Identifies Unicode noncharacters (ranges 0xFDD0-0xFDEF and code points ending with FFFE/FFFF)
  - Uses UTF-8 encoding validation as an additional verification layer
  - Maintains performance while providing more accurate Unicode validation
- Fixed: The IsValidUnicodeString method now accurately validates user input for valid Unicode sequences by correctly identifying and rejecting strings containing lone high or low surrogate characters.
- Updated API documentation.
- Removed PreprocessDictionary for the user-defined character mapping. Users can now pass the mapping dictionary directly to the Decompose method.
- Added an async version of the Decompose method for non-blocking operations.
- Added a check for user input to ensure valid input.
- Added custom mappings support for user-defined character mappings.
- Updated the Decompose method to handle surrogate pairs and multi-codepoint sequences.
- Tested conversion of default mapping Unicode notation to Latin characters.
- Tested conversion of Unicode notation to Latin characters.
- Implemented core functionality for the Playful Sparkle: Transliterate Library extension for Visual Studio.
- Added Emoji replacement based on pre-defined Unicode mappings.
- Added default mappings for complex characters (e.g., ligatures, non-ASCII characters) to simplified ASCII representations.
- Added core files and project documentation for the Playful Sparkle: Transliterate Library extension for Visual Studio.
- Initial public release of the Playful Sparkle: Transliterate Library extension for Visual Studio.
For any inquiries, bug reports, or feature requests related to the Playful Sparkle: Transliterate Library extension, please feel free to utilize the following channels:
- GitHub Issues: For bug reports, feature suggestions, or technical discussions, please open a new issue on the GitHub repository. This allows for community visibility and tracking of reported issues.
- Email Support: For general questions or private inquiries, you can contact the developer directly via email at support@playfulsparkle.com. Please allow a reasonable timeframe for a response.
We encourage users to use the GitHub Issues page for bug reports and feature requests as it helps in better organization and tracking of the extension's development.
This extension is licensed under the BSD-3-Clause License. See the LICENSE file for complete details.
Hi! We're the team behind Playful Sparkle, a creative agency from Slovakia. We got started way back in 2004 and have been having fun building digital solutions ever since. Whether it's crafting a brand, designing a website, developing an app, or anything in between, we're all about delivering great results with a smile. We hope you enjoy using our Visual Studio extension!