Chapter 9: Strings

Outline

Strings as collections

  • A String is a collection of characters.

let string = "Matt"
for char in string {
  print(char)
}

let stringLength = string.count
// Subscripting with an Int does not compile:
// let fourthChar = string[3]
// error: 'subscript' is unavailable: cannot subscript String with an Int, see the documentation comment for discussion

Grapheme clusters

  • There are two ways to represent some characters. One example is the é in café.

    • The single-character form is code point 233 (U+00E9).

    • The two-character form is a plain e followed by an acute accent combining character (U+0301).

    • The combination of these two code points forms what is known as a grapheme cluster, as defined by the Unicode standard.

      • Grapheme clusters are represented by the Swift type Character.
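The two representations can be compared with a short sketch. The variable names are illustrative only; the scalar values U+00E9 and U+0301 are the precomposed é and the combining acute accent described above.

```swift
// "café" written with the single precomposed é (U+00E9, decimal 233).
let cafeSingle = "caf\u{00E9}"
// "café" written as a plain e followed by U+0301 COMBINING ACUTE ACCENT.
let cafeCombining = "cafe\u{0301}"

// Both strings contain four Characters (grapheme clusters)...
print(cafeSingle.count)      // 4
print(cafeCombining.count)   // 4

// ...but a different number of Unicode code points.
print(cafeSingle.unicodeScalars.count)     // 4
print(cafeCombining.unicodeScalars.count)  // 5

// The last Character of the combining form is one cluster made of two scalars.
let lastCharacter: Character = cafeCombining.last!
print(lastCharacter.unicodeScalars.count)  // 2
```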

Indexing strings
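
Because a String cannot be subscripted with an Int, you work with String.Index values instead. The following is a minimal sketch of reaching the fourth character of "Matt"; the variable names are illustrative.

```swift
let name = "Matt"

// Offset the start index by 3 to reach the fourth Character.
let fourthIndex = name.index(name.startIndex, offsetBy: 3)
let fourthCharacter = name[fourthIndex]
print(fourthCharacter) // t

// endIndex is one past the last Character, so step back once to read the last one.
let lastIndex = name.index(before: name.endIndex)
print(name[lastIndex]) // t
```

Computing an index walks the string one grapheme cluster at a time, which is why there is no constant-time integer subscript.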

Equality with combining characters

  • String comparison in Swift uses a technique known as canonicalization

    • Before checking equality, Swift canonicalizes both strings, which means they’re converted to use the same special character representation.

    • As a result, café written with the single é character and café written with e plus the combining accent compare as equal, and both have the same length of 4.
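
A small sketch of this behaviour, using illustrative variable names. The two strings are built from different code points, yet they compare as equal.

```swift
let precomposed = "caf\u{00E9}"    // single é code point
let decomposed = "cafe\u{0301}"    // e + combining acute accent

// String equality is based on canonical equivalence, not raw code points.
print(precomposed == decomposed)              // true
print(precomposed.count == decomposed.count)  // true

// The underlying scalar sequences are still different.
let sameScalars = precomposed.unicodeScalars.elementsEqual(decomposed.unicodeScalars)
print(sameScalars)                            // false
```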

Strings as bi-directional collections

  • How to reverse a string
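
One way to reverse a string is sketched below, with an illustrative sample value. reversed() returns a lightweight reversed view rather than a new String, so you convert it explicitly when you need a String.

```swift
let greeting = "Hello"

// reversed() gives a ReversedCollection<String> view over the original storage.
let reversedView = greeting.reversed()

// Initialize a String from the view to get an independent reversed copy.
let backwards = String(reversedView)
print(backwards) // olleH
```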

Substrings

  • Slicing a string returns a Substring rather than a String. The reason for this extra type is a cunning optimization.

  • A Substring shares the storage with its parent String that it was sliced from. This means that when you’re in the process of slicing a string, you use no extra memory.
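
A minimal sketch of slicing, using an illustrative sample string. The slices are Substring values, and converting one to a String makes an independent copy.

```swift
let fullName = "Matt Galloway"   // sample value for illustration

// Find the space and slice on either side of it.
let spaceIndex = fullName.firstIndex(of: " ")!
let firstName = fullName[..<spaceIndex]                        // Substring
let lastName = fullName[fullName.index(after: spaceIndex)...]  // Substring

print(type(of: firstName)) // Substring
print(lastName)            // Galloway

// Convert to String when you need to keep the value around long term.
let firstNameString = String(firstName)
print(firstNameString)     // Matt
```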

Encoding

UTF-8
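
As a sketch of the utf8 view mentioned in the key points, the following prints the 8-bit code units of café. The variable names are illustrative.

```swift
let utf8Sample = "caf\u{00E9}"

// Each element of the utf8 view is a UInt8 code unit.
let utf8Units = Array(utf8Sample.utf8)
print(utf8Units)        // [99, 97, 102, 195, 169]

// é (U+00E9) needs two UTF-8 code units, so there are 5 units for 4 Characters.
print(utf8Units.count)  // 5
print(utf8Sample.count) // 4
```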

UTF-16
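
The utf16 view can be explored the same way. This sketch uses illustrative names and an example emoji (U+1F600) chosen only because it lies outside the range a single 16-bit code unit can hold.

```swift
let utf16Sample = "caf\u{00E9}"

// Each element of the utf16 view is a UInt16 code unit.
let utf16Units = Array(utf16Sample.utf16)
print(utf16Units) // [99, 97, 102, 233]

// A code point above U+FFFF is stored as a surrogate pair: two 16-bit code units.
let emojiSample = "\u{1F600}"
print(emojiSample.utf16.count) // 2
print(emojiSample.utf8.count)  // 4
print(emojiSample.count)       // 1
```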

Converting indexes between encoding views
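
One possible way to convert an index between views is String.Index's samePosition(in:) method. The sketch below, with illustrative names, finds a character in the string and measures its offset in each encoding view.

```swift
let indexSample = "caf\u{00E9}!"

// Locate the "!" using the Character view of the string.
let bangIndex = indexSample.firstIndex(of: "!")!

// Convert that index to the matching position in each encoding view.
// samePosition(in:) returns nil if the index has no exact equivalent in the view.
guard let utf8Index = bangIndex.samePosition(in: indexSample.utf8),
      let utf16Index = bangIndex.samePosition(in: indexSample.utf16) else {
    fatalError("index does not fall on a code unit boundary")
}

// The same position sits at a different offset depending on the encoding.
let characterOffset = indexSample.distance(from: indexSample.startIndex, to: bangIndex)
let utf8Offset = indexSample.utf8.distance(from: indexSample.utf8.startIndex, to: utf8Index)
let utf16Offset = indexSample.utf16.distance(from: indexSample.utf16.startIndex, to: utf16Index)
print(characterOffset) // 4
print(utf8Offset)      // 5
print(utf16Offset)     // 4
```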

Key points

  • Strings are collections of Character values.

  • A Character is a grapheme cluster and is made up of one or more code points.

  • A combining character is a character that alters the previous character in some way.

  • You use special (non-integer) indexes to subscript into the string to a certain grapheme cluster.

  • Swift’s use of canonicalization ensures that the comparison of strings accounts for combining characters.

  • Slicing a string yields a substring with type Substring, which shares storage with its parent String.

  • You can convert from a Substring to a String by initializing a new String and passing the Substring.

  • Swift String has a view called unicodeScalars, which is itself a collection of the individual Unicode code points that make up the string.

  • There are multiple ways to encode a string. UTF-8 and UTF-16 are the most popular.

  • The individual parts of an encoding are called code units. UTF-8 uses 8-bit code units, and UTF-16 uses 16-bit code units.

  • Swift’s String has views called utf8 and utf16, which are collections that let you obtain the individual code units in the given encoding.
