Carve out a function to translate a Unicode code point to an 8bit codepage.
Provide a unit test for the new function.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
The current code uses 'u16_strlen(x) + 1) * sizeof(u16)' in various
places to calculate the number of bytes occupied by a u16 string.
Let's introduce a wrapper around this. This wrapper is used on following
patches
Signed-off-by: Sughosh Ganu <sughosh.ganu@linaro.org>
Reviewed-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
u16 version of strcmp(): u16_strncmp() works like u16_strcmp() but only
at most n characters (in u16) are compared.
This function will be used in my UEFI secure boot patch.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Correct the description of utf8_utf16_strnlen() and utf8_utf16_strlen() to
reflect that they return u16 count and not byte count.
For these functions and utf16_utf8_strnlen() describe the handling of
invalid code sequences.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Add u16_strcpy() and u16_strdup(). The latter function will be
used later in implementing efi HII database protocol.
Signed-off-by: Akashi Takahiro <takahiro.akashi@linaro.org>
Reviewed-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Up to now the EFI_TEXT_INPUT_PROTOCOL only supported ASCII characters.
With the patch it can consume UTF-8 from the console.
Currently only the serial console and the console can deliver UTF-8.
Local consoles are restricted to ASCII.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
Provide functions for upper and lower case conversion.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
utf8_get() - get next UTF-8 code point from buffer
utf8_put() - write UTF-8 code point to buffer
utf8_utf16_strnlen() - length of a utf-8 string after conversion to utf-16
utf8_utf16_strncpy() - copy a utf-8 string to utf-16
utf16_get() - get next UTF-16 code point from buffer
utf16_put() - write UTF-16 code point to buffer
utf16_strnlen() - number of codes points in a utf-16 string
utf16_utf8_strnlen() - length of a utf-16 string after conversion to utf-8
utf16_utf8_strncpy() - copy a utf-16 string to utf-8
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
The function names utf16_strlen() and utf16_strnlen() are misnomers.
The functions do not count utf-16 characters but non-zero words.
So let's rename them to u16_strlen and u16_strnlen().
In utf16_dup() avoid assignment in if clause.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
As part of the main conversion a few files were missed. These files had
additional whitespace after the '*' and before the SPDX tag and my
previous regex was too strict. This time I did a grep for all SPDX tags
and then filtered out anything that matched the correct styles.
Fixes: 83d290c56f ("SPDX: Convert all of our single license tags to Linux Kernel style")
Reported-by: Heinrich Schuchardt <xypron.debian@gmx.de>
Signed-off-by: Tom Rini <trini@konsulko.com>
Provide a conversion function from utf8 to utf16.
Add missing #include <linux/types.h> in include/charset.h.
Remove superfluous #include <common.h> in lib/charset.c.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
The constant MAX_UTF8_PER_UTF16 is used to calculate
required memory when converting from UTF-16 to UTF-8.
If this constant is too big we waste memory.
A code point encoded by one UTF-16 symbol is converted to a
maximum of three UTF-8 symbols, e.g.
0xffff could be encoded as 0xef 0xbf 0xbf.
The first byte carries four bits, the second and third byte
carry six bits each.
A code point encoded by two UTF-16 symbols is converted to four
UTF-8 symbols.
So in this case we need a maximum of two UTF-8 symbols per
UTF-16 symbol.
As the overall maximum is three UTF-8 symobls per UTF-16 symbol
we need MAX_UTF8_PER_UTF16 = 3.
Fixes: 78178bb0c9 lib: add some utf16 handling helpers
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Alexander Graf <agraf@suse.de>