'\" te
.\" Copyright (c) 2014, 2021, Oracle and/or its affiliates.
.TH iconv_unicode 7 "11 May 2021" "Oracle Solaris 11.4" "Standards, Environments, Macros, Character Sets, and miscellany"
.SH NAME
iconv_unicode \- codeset conversion for Unicode
.SH DESCRIPTION
.sp
.LP
The table below lists the names and descriptions of the supported Unicode encodings or encoding schemes (byte serializations of Unicode encoding forms) that can be used as \fIfromcode\fR or \fItocode\fR parameters to \fBiconv\fR(1), \fBiconv_open\fR(3C), and \fBcconv_open\fR(3C). Aliases such as FSS-UTF and UTF8 are also accepted.
.sp
.LP
The \fBiconv\fR and \fBcconv\fR conversions available on the current system, including aliases and optional variant levels, can be listed by running the \fBiconv -l\fR command as described in the \fBiconv\fR(1) manual page.
.sp
.LP
For additional information on the mappings between canonical names and supported aliases with optional variant levels, refer to the \fBalias\fR(5) manual page and the \fB/usr/lib/iconv/alias\fR file.
.sp
.TS
tab(	) box;
lw(0.92i) |lw(4.58i)
lw(0.92i) |lw(4.58i)
.
Encoding Form	Description
_
UTF-8	T{
Multibyte sequences of 1-4 character bytes
T}
_
UTF-16	T{
Represented as one 16-bit entity for \fBU+0000-U+D7FF\fR and \fBU+E000-U+FFFF\fR, and as two 16-bit entities for \fBU+10000-U+10FFFF\fR. Uses the platform's default byte ordering and includes the Byte Order Mark (BOM). See below for a description of the BOM.
T}
_
UTF-16-INTERNAL	UTF-16, without BOM
_
UTF-16BE	T{
UTF-16 in the big-endian byte ordering, without BOM
T}
_
UTF-16-BIG-ENDIAN	T{
UTF-16 in the big-endian byte ordering, including BOM
T}
_
UTF-16LE	T{
UTF-16 in the little-endian byte ordering, without BOM
T}
_
UTF-16-LITTLE-ENDIAN	T{
UTF-16 in the little-endian byte ordering, including BOM
T}
_
UTF-16-SWAPPED	T{
UTF-16 with endianness opposite to that of the local platform, without BOM
T}
_
UTF-32	T{
Represented as a 32-bit entity in the platform's default byte ordering; includes the BOM
T}
_
UTF-32-INTERNAL	UTF-32, without BOM
_
UTF-32BE	T{
UTF-32 in the big-endian byte ordering, without BOM
T}
_
UTF-32-BIG-ENDIAN	T{
UTF-32 in the big-endian byte ordering, including BOM
T}
_
UTF-32-SWAPPED	T{
UTF-32 with endianness opposite to that of the local platform, without BOM
T}
_
UTF-32LE	T{
UTF-32 in the little-endian byte ordering, without BOM
T}
_
UTF-32-LITTLE-ENDIAN	T{
UTF-32 in the little-endian byte ordering, including BOM
T}
_
UCS-2	T{
Represented as a 16-bit entity for \fBU+0000-U+D7FF\fR and \fBU+E000-U+FFFF\fR in the platform's default byte ordering; includes the byte order mark (BOM)
T}
_
UCS-2-INTERNAL	UCS-2, without BOM
_
UCS-2BE	T{
UCS-2 in the big-endian byte ordering, without BOM
T}
_
UCS-2-BIG-ENDIAN	T{
UCS-2 in the big-endian byte ordering, including BOM
T}
_
UCS-2LE	T{
UCS-2 in the little-endian byte ordering, without BOM
T}
_
UCS-2-LITTLE-ENDIAN	T{
UCS-2 in the little-endian byte ordering, including BOM
T}
_
UCS-2-SWAPPED	T{
UCS-2 with endianness opposite to that of the local platform, without BOM
T}
_
UCS-4	T{
Represented as a 32-bit entity in the platform's default byte ordering; includes the byte order mark (BOM)
T}
_
UCS-4-INTERNAL	UCS-4, without BOM
_
UCS-4BE	T{
UCS-4 in the big-endian byte ordering, without BOM
T}
_
UCS-4-BIG-ENDIAN	T{
UCS-4 in the big-endian byte ordering, including BOM
T}
_
UCS-4LE	T{
UCS-4 in the little-endian byte ordering, without BOM
T}
_
UCS-4-LITTLE-ENDIAN	T{
UCS-4 in the little-endian byte ordering, including BOM
T}
_
UCS-4-SWAPPED	T{
UCS-4 with endianness opposite to that of the local platform, without BOM
T}
.TE
.sp
.sp
.LP
\fBUCS\fR, or Universal Character Set, refers to the ISO/IEC 10646 family of standards, whose character set is identical to that of Unicode.
.sp
.LP
The Byte Order Mark, also known as \fBBOM\fR (U+FEFF), is a special character at the beginning of a file or character stream that denotes the byte order of the subsequent characters. UCS-2, UTF-16, UTF-32, and UCS-4 files and character streams usually start with a BOM character to indicate the byte ordering used in the file or character stream.
.sp
.LP
UTF-8 to UTF-8 conversion simply copies bytes from the input buffer to the output buffer without converting them. During the copy, the bytes are checked for illegal characters to screen out any potentially harmful character bytes. Any illegal character causes the conversion to fail.
.sp
.LP
UTF-7, a legacy 7-bit Unicode Transformation Format, is supported only by \fBiconv\fR conversions to and from UTF-8, UCS-2, and UCS-4.
.sp
.LP
\fBUTF-EBCDIC\fR, a legacy EBCDIC-compatible variant of UTF-8, is supported only by \fBiconv\fR conversions to and from UTF-8.
.SH NOTES
.sp
.LP
\fBiconv\fR also supports conversion between Unicode encodings and many other codesets. These include, for example, the ISO 8859 character sets, EBCDIC code pages, EUC (Extended UNIX Code) and ISO 2022 encodings for Chinese, Japanese, Korean, and many others (see \fBiconv_extra\fR(7), \fBiconv_ja\fR(7), \fBiconv_ko\fR(7), \fBiconv_zh\fR(7), \fBiconv_zh_HK\fR(7), and \fBiconv_zh_TW\fR(7)).
.sp
.LP
If a source character code value cannot be mapped to a valid character in the target codeset, it is considered either an illegal or a non-identical character.
In the absence of explicit information about the source character code value, \fBiconv\fR code conversions use the following rules to determine whether a character is illegal or non-identical:
.sp
.LP
If the source character code value is not within a range defined by the source codeset standard, it is considered an illegal character. If the source character code value is within the range(s) defined by the standard, it is considered non-identical, even if it maps to an undefined or reserved location within the valid range. A non-identical character maps to \fB?\fR (\fB0x3f\fR in ASCII-compatible codesets) if the target codeset is a non-Unicode codeset, or to the Unicode replacement character (U+FFFD) if the target codeset is a Unicode codeset.
.sp
.LP
When the BOM is present as the first character in an encoding that supports it, it determines how the following Unicode character sequences are interpreted. If the BOM is not the first character in such an encoding, or if the encoding does not support the BOM, the BOM character (\fBU+FEFF\fR) is interpreted as the Zero Width No-Break Space (\fBZWNBSP\fR) character and does not affect the byte ordering used to interpret the Unicode characters.
.sp
.LP
When the target codeset is one of UCS-2, UTF-16, UTF-32, UCS-4, UCS-2-BIG-ENDIAN, UCS-2-LITTLE-ENDIAN, UTF-16-BIG-ENDIAN, UTF-16-LITTLE-ENDIAN, UCS-4-BIG-ENDIAN, UCS-4-LITTLE-ENDIAN, UTF-32-BIG-ENDIAN, or UTF-32-LITTLE-ENDIAN, expect a BOM character at the beginning of the \fBiconv\fR code conversion output buffer.
.sp
.LP
When the source codeset is UCS-2, UTF-16, UTF-32, or UCS-4 and no BOM is present as the first input character, the byte ordering of the current system is assumed for the input byte stream given to the \fBiconv\fR code conversion.
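.sp
.LP
The BOM rules above can be illustrated with a minimal Python sketch. It is not the \fBiconv\fR implementation: the names BOMS, split_bom, and decode_utf16 are hypothetical, and the decoding step relies on Python's own utf-16-be and utf-16-le codecs rather than on any \fBiconv\fR module. A leading BOM selects the byte order and is consumed; without one, the byte order of the running system is assumed.
.sp
.in +2
.nf
```python
import sys

# U+FEFF serialized per encoding form: (big-endian bytes, little-endian bytes).
BOMS = {
    "UTF-16": (bytes.fromhex("feff"), bytes.fromhex("fffe")),
    "UTF-32": (bytes.fromhex("0000feff"), bytes.fromhex("fffe0000")),
}


def split_bom(data, form):
    """Return (byte_order, payload) for input in the given encoding form.

    A BOM at the very start selects the byte order and is stripped.
    Without a leading BOM, the byte order of the running system is assumed.
    """
    big, little = BOMS[form]
    if data.startswith(big):
        return "big", data[len(big):]
    if data.startswith(little):
        return "little", data[len(little):]
    return sys.byteorder, data


def decode_utf16(data):
    """Decode UTF-16 input, honoring only a BOM in first position."""
    order, payload = split_bom(data, "UTF-16")
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    # Any later U+FEFF is kept in the text as ZWNBSP and does not
    # change the byte order.
    return payload.decode(codec)
```
.fi
.in -2
.sp
.LP
In this sketch, decoding the bytes FE FF 00 41 FE FF yields the letter A followed by U+FEFF: the first BOM is consumed, and the second is kept as an ordinary ZWNBSP character.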
.SH EXAMPLES
.LP
\fBExample 1\fR The \fBiconv\fR Library Module Filename
.sp
.LP
In the conversion library, \fB/usr/lib/iconv\fR (see \fBiconv\fR(3C)), the library module filename is composed of two symbolic elements separated by the percent sign (\fB%\fR). The first symbol specifies the source codeset, i.e. the codeset that is being converted; the second symbol specifies the target codeset, i.e. the codeset to which the first one is being converted.
.sp
.LP
For example, the library module filename to convert from the legacy UTF-7 codeset to the UTF-8 codeset is \fBUTF-7%UTF-8.so\fR.
.LP
\fBExample 2\fR The \fBcconv\fR Library Module Filename
.sp
.LP
For some conversions, \fBiconv\fR(3C) makes a call to the \fBcconv\fR(3C) interfaces to perform the conversion. The \fBcconv\fR conversion modules are binary tables with \fB.bt\fR suffix generated by \fBgeniconvtbl\fR(1) and placed in the same \fB/usr/lib/iconv\fR library. The \fBcconv\fR library module filename is composed of the symbolic elements for source and target codeset separated by the plus sign (\fB+\fR). The \fBcconv\fR conversion is typically performed in two steps, with UTF-32 as the intermediate encoding.
.sp
.LP
For example, the \fBcconv\fR library module filename to convert from the Japanese EUC codeset to the UTF-32 codeset is \fBeucJP+UTF-32.bt\fR.
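.sp
.LP
The two naming rules above can be expressed as a short Python sketch. The function names are hypothetical, and the code only builds filename strings; it does not check that a module exists in \fB/usr/lib/iconv\fR. The two-step helper assumes that the second table follows the same plus-sign rule with UTF-32 as its source, which the text above implies but does not state.
.sp
.in +2
.nf
```python
def iconv_module_name(fromcode, tocode):
    """Filename of an iconv module: source and target joined by '%'."""
    return f"{fromcode}%{tocode}.so"


def cconv_table_name(fromcode, tocode):
    """Filename of a cconv binary table: source and target joined by '+'."""
    return f"{fromcode}+{tocode}.bt"


def cconv_two_step(fromcode, tocode, pivot="UTF-32"):
    """Table names for a conversion routed through an intermediate encoding.

    Assumption: the second step is named by the same rule, with the
    pivot encoding as its source codeset.
    """
    return [cconv_table_name(fromcode, pivot), cconv_table_name(pivot, tocode)]
```
.fi
.in -2
.sp
.LP
With these helpers, the UTF-7 to UTF-8 module name comes out as UTF-7%UTF-8.so, and the Japanese EUC to UTF-32 table name as eucJP+UTF-32.bt, matching the two examples above.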
.SH FILES
.sp
.ne 2
.mk
.na
\fB\fB/usr/lib/iconv/*.so\fR\fR
.ad
.br
.sp .6
.RS 4n
\fBiconv\fR conversion modules
.RE
.sp
.ne 2
.mk
.na
\fB\fB/usr/lib/iconv/*.bt\fR\fR
.ad
.br
.sp .6
.RS 4n
\fBcconv\fR code conversion binary tables for \fBiconv\fR(1), \fBcconv\fR(3C), and \fBiconv\fR(3C)
.RE
.sp
.ne 2
.mk
.na
\fB\fB/usr/lib/iconv/geniconvtbl/binarytables/*.bt\fR\fR
.ad
.br
.sp .6
.RS 4n
\fBgeniconvtbl\fR conversion binary tables
.RE
.sp
.ne 2
.mk
.na
\fB\fB/usr/lib/iconv/alias\fR\fR
.ad
.br
.sp .6
.RS 4n
Alias table file of codeset names
.RE
.SH SEE ALSO
.sp
.LP
\fBgeniconvtbl\fR(1), \fBiconv\fR(1), \fBcconv\fR(3C), \fBcconv_close\fR(3C), \fBcconv_open\fR(3C), \fBcconvctl\fR(3C), \fBiconv\fR(3C), \fBiconv_close\fR(3C), \fBiconv_open\fR(3C), \fBiconvctl\fR(3C), \fBalias\fR(5), \fBgeniconvtbl-cconv\fR(5), \fBiconv_extra\fR(7), \fBiconv_ja\fR(7), \fBiconv_ko\fR(7), \fBiconv_zh\fR(7), \fBiconv_zh_HK\fR(7), \fBiconv_zh_TW\fR(7)
.sp
.LP
The Unicode Consortium. The Unicode Standard, Version 6.2.0. Mountain View, CA: The Unicode Consortium, 2012. ISBN 978-1-936213-07-8.
.sp
.LP
Yergeau, F., UTF-8, a transformation format of Unicode and ISO 10646, RFC 2044, Alis Technologies, October 1996.
.sp
.LP
Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC 1815, Tokyo Institute of Technology, July 1995.
.sp
.LP
Simonson, K., Character Mnemonics & Character Sets, RFC 1345, Rationel Almen Planlaegning, June 1992.
.sp
.LP
Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe Transformation Format of Unicode, RFC 1642, Taligent, Inc., July 1994.