Str

Working with Unicode strings in Roc

Unicode can represent text values which span multiple languages, symbols, and emoji. Here are some valid Roc strings:

"Roc!"
"้น"
"๐Ÿ•Š"

Every Unicode string is a sequence of extended grapheme clusters. An extended grapheme cluster represents what a person reading a string might call a "character" - like "A" or "รถ" or "๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘ฆโ€๐Ÿ‘ฆ". Because the term "character" means different things in different areas of programming, and "extended grapheme cluster" is a mouthful, in Roc we use the term "grapheme" as a shorthand for the more precise "extended grapheme cluster."

You can get the number of graphemes in a string by calling Str.countGraphemes on it:

Str.countGraphemes "Roc!"
Str.countGraphemes "ๆŠ˜ใ‚Š็ด™"
Str.countGraphemes "๐Ÿ•Š"

The countGraphemes function walks through the entire string to get its answer, so if you want to check whether a string is empty, you'll get much better performance by calling Str.isEmpty myStr instead of Str.countGraphemes myStr == 0.

Escape sequences

If you put a \ in a Roc string literal, it begins an escape sequence. An escape sequence is a convenient way to insert certain strings into other strings. For example, suppose you write this Roc string:

"I took the one less traveled by,\nAnd that has made all the difference."

The "\n" in the middle will insert a line break into this string. There are other ways of getting a line break in there, but "\n" is the most common.

Another way you could insert a newlines is by writing \u(0A) instead of \n. That would result in the same string, because the \u escape sequence inserts Unicode code points directly into the string. The Unicode code point 10 is a newline, and 10 is 0A in hexadecimal. \u escape sequences are always followed by a hexadecimal number inside ( and ) like this.

As another example, "R\u(6F)c" is the same string as "Roc", because "\u(6F)" corresponds to the Unicode code point for lowercase o. If you want to spice things up a bit, you can write "R\u(F6)c" as an alternative way to get the string `"Rรถc".

Roc strings also support these escape sequences:

You can also use escape sequences to insert named strings into other strings, like so:

name = "Lee"
city = "Roctown"
greeting = "Hello there, \(name)! Welcome to \(city)."

Here, greeting will become the string "Hello there, Lee! Welcome to Roctown.". This is known as string interpolation, and you can use it as many times as you like inside a string. The name between the parentheses must refer to a Str value that is currently in scope, and it must be a name - it can't be an arbitrary expression like a function call.

Utf8ByteProblem : [ InvalidStartByte, UnexpectedEndOfSequence, ExpectedContinuation, OverlongEncoding, CodepointTooLarge, EncodesSurrogateHalf ]

Utf8Problem

isEmpty : Str -> Bool

Returns Bool.true if the string is empty, and Bool.false otherwise.

expect Str.isEmpty "hi!" == Bool.false
expect Str.isEmpty "" == Bool.true

concat : Str, Str -> Str

Concatenates two strings together.

expect Str.concat "ab" "cd" == "abcd"
expect Str.concat "hello" "" == "hello"
expect Str.concat "" "" == ""

withCapacity : Nat -> Str

Returns a string of the specified capacity without any content.

joinWith : List Str, Str -> Str

Combines a List of strings into a single string, with a separator string in between each.

expect Str.joinWith ["one", "two", "three"] ", " == "one, two, three"
expect Str.joinWith ["1", "2", "3", "4"] "." == "1.2.3.4"

split : Str, Str -> List Str

Split a string around a separator.

Passing "" for the separator is not useful; it returns the original string wrapped in a List. To split a string into its individual graphemes, use Str.graphemes

expect Str.split "1,2,3" "," == ["1","2","3"]
expect Str.split "1,2,3" "" == ["1,2,3"]

repeat : Str, Nat -> Str

Repeats a string the given number of times.

expect Str.repeat "z" 3 == "zzz"
expect Str.repeat "na" 8 == "nananananananana"

Returns "" when given "" for the string or 0 for the count.

expect Str.repeat "" 10 == ""
expect Str.repeat "anything" 0 == ""

countGraphemes : Str -> Nat

Counts the number of extended grapheme clusters in the string.

Note that the number of extended grapheme clusters can be different from the number of visual glyphs rendered! Consider the following examples:

expect Str.countGraphemes "Roc" == 3
expect Str.countGraphemes "๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘ฆโ€๐Ÿ‘ฆ"  == 4
expect Str.countGraphemes "๐Ÿ•Š"  == 1

Note that "๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘ฆโ€๐Ÿ‘ฆ" takes up 4 graphemes (even though visually it appears as a single glyph) because under the hood it's represented using an emoji modifier sequence. In contrast, "๐Ÿ•Š" only takes up 1 grapheme because under the hood it's represented using a single Unicode code point.

graphemes : Str -> List Str

Split a string into its constituent grapheme clusters

startsWithScalar : Str, U32 -> Bool

If the string begins with a Unicode code point equal to the given U32, returns Bool.true. Otherwise returns Bool.false.

If the given string is empty, or if the given U32 is not a valid code point, returns Bool.false.

expect Str.startsWithScalar "้น means 'roc'" 40527 # "้น" is Unicode scalar 40527
expect !Str.startsWithScalar "9" 9 # the Unicode scalar for "9" is 57, not 9
expect !Str.startsWithScalar "" 40527

Performance Details

This runs slightly faster than Str.startsWith, so if you want to check whether a string begins with something that's representable in a single code point, you can use (for example) Str.startsWithScalar '้น' instead of Str.startsWith "้น". ('้น' evaluates to the U32 value 40527.) This will not work for graphemes which take up multiple code points, however; Str.startsWithScalar '๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘ฆโ€๐Ÿ‘ฆ' would be a compiler error because ๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘ฆโ€๐Ÿ‘ฆ takes up multiple code points and cannot be represented as a single U32. You'd need to use Str.startsWithScalar "๐Ÿ•Š" instead.

toScalars : Str -> List U32

Returns a List of the Unicode scalar values in the given string.

(Roc strings contain only scalar values, not surrogate code points, so this is equivalent to returning a list of the string's code points.)

expect Str.toScalars "Roc" == [82, 111, 99]
expect Str.toScalars "้น" == [40527]
expect Str.toScalars "เฎšเฎฟ" == [2970, 3007]
expect Str.toScalars "๐Ÿฆ" == [128038]
expect Str.toScalars "๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘ฆโ€๐Ÿ‘ฆ" == [128105, 8205, 128105, 8205, 128102, 8205, 128102]
expect Str.toScalars "I โ™ฅ Roc" == [73, 32, 9829, 32, 82, 111, 99]
expect Str.toScalars "" == []

toUtf8 : Str -> List U8

Returns a List of the string's U8 UTF-8 code units. (To split the string into a List of smaller Str values instead of U8 values, see Str.split.)

expect Str.toUtf8 "Roc" == [82, 111, 99]
expect Str.toUtf8 "้น" == [233, 185, 143]
expect Str.toUtf8 "เฎšเฎฟ" == [224, 174, 154, 224, 174, 191]
expect Str.toUtf8 "๐Ÿฆ" == [240, 159, 144, 166]

fromUtf8 : List U8 -> Result Str [BadUtf8 Utf8ByteProblem Nat]

Converts a List of U8 UTF-8 code units to a string.

Returns Err if the given bytes are invalid UTF-8, and returns Ok "" when given [].

expect Str.fromUtf8 [82, 111, 99] == Ok "Roc"
expect Str.fromUtf8 [233, 185, 143] == Ok "้น"
expect Str.fromUtf8 [224, 174, 154, 224, 174, 191] == Ok "เฎšเฎฟ"
expect Str.fromUtf8 [240, 159, 144, 166] == Ok "๐Ÿฆ"
expect Str.fromUtf8 [] == Ok ""
expect Str.fromUtf8 [255] |> Result.isErr

fromUtf8Range : List U8, { start : Nat, count : Nat } -> Result Str [ BadUtf8 Utf8ByteProblem Nat, OutOfBounds ]

Encode part of a List of U8 UTF-8 code units into a Str

expect Str.fromUtf8Range [72, 105, 80, 103] { start : 0, count : 2 } == Ok "Hi"

startsWith : Str, Str -> Bool

Check if the given Str starts with a value.

expect Str.startsWith "ABC" "A" == Bool.true
expect Str.startsWith "ABC" "X" == Bool.false

endsWith : Str, Str -> Bool

Check if the given Str ends with a value.

expect Str.endsWith "ABC" "C" == Bool.true
expect Str.endsWith "ABC" "X" == Bool.false

trim : Str -> Str

Return the Str with all whitespace removed from both the beginning as well as the end.

expect Str.trim "   Hello      \n\n" == "Hello"

trimLeft : Str -> Str

Return the Str with all whitespace removed from the beginning.

expect Str.trimLeft "   Hello      \n\n" == "Hello      \n\n"

trimRight : Str -> Str

Return the Str with all whitespace removed from the end.

expect Str.trimRight "   Hello      \n\n" == "   Hello"

toDec : Str -> Result Dec [InvalidNumStr]

Encode a Str to a Dec. A Dec value is a 128-bit decimal fixed-point number.

expect Str.toDec "10" == Ok 10dec
expect Str.toDec "-0.25" == Ok -0.25dec
expect Str.toDec "not a number" == Err InvalidNumStr

toF64 : Str -> Result F64 [InvalidNumStr]

Encode a Str to a F64. A F64 value is a 64-bit floating-point number and can be specified with a f64 suffix.

expect Str.toF64 "0.10" == Ok 0.10f64
expect Str.toF64 "not a number" == Err InvalidNumStr

toF32 : Str -> Result F32 [InvalidNumStr]

Encode a Str to a F32.A F32 value is a 32-bit floating-point number and can be specified with a f32 suffix.

expect Str.toF32 "0.10" == Ok 0.10f32
expect Str.toF32 "not a number" == Err InvalidNumStr

toNat : Str -> Result Nat [InvalidNumStr]

Convert a Str to a Nat. If the given number doesn't fit in Nat, it will be truncated. Nat has a different maximum number depending on the system you're building for, so this may give a different answer on different systems.

For example, on a 32-bit system, Num.maxNat will return the same answer as Num.maxU32. This means that calling Str.toNat "9_000_000_000" on a 32-bit system will return Num.maxU32 instead of 9 billion, because 9 billion is larger than Num.maxU32 and will not fit in a Nat on a 32-bit system.

Calling Str.toNat "9_000_000_000" on a 64-bit system will return the Nat value of 9_000_000_000. This is because on a 64-bit system, Nat can hold up to Num.maxU64, and 9_000_000_000 is smaller than Num.maxU64.

expect Str.toNat "9_000_000_000" == Ok 9000000000
expect Str.toNat "not a number" == Err InvalidNumStr

toU128 : Str -> Result U128 [InvalidNumStr]

Encode a Str to an unsigned U128 integer. A U128 value can hold numbers from 0 to 340_282_366_920_938_463_463_374_607_431_768_211_455 (over 340 undecillion). It can be specified with a u128 suffix.

expect Str.toU128 "1500" == Ok 1500u128
expect Str.toU128 "0.1" == Err InvalidNumStr
expect Str.toU128 "-1" == Err InvalidNumStr
expect Str.toU128 "not a number" == Err InvalidNumStr

toI128 : Str -> Result I128 [InvalidNumStr]

Encode a Str to a signed I128 integer. A I128 value can hold numbers from -170_141_183_460_469_231_731_687_303_715_884_105_728 to 170_141_183_460_469_231_731_687_303_715_884_105_727. It can be specified with a i128 suffix.

expect Str.toI128 "1500" == Ok 1500i128
expect Str.toI128 "-1" == Ok -1i128
expect Str.toI128 "0.1" == Err InvalidNumStr
expect Str.toI128 "not a number" == Err InvalidNumStr

toU64 : Str -> Result U64 [InvalidNumStr]

Encode a Str to an unsigned U64 integer. A U64 value can hold numbers from 0 to 18_446_744_073_709_551_615 (over 18 quintillion). It can be specified with a u64 suffix.

expect Str.toU64 "1500" == Ok 1500u64
expect Str.toU64 "0.1" == Err InvalidNumStr
expect Str.toU64 "-1" == Err InvalidNumStr
expect Str.toU64 "not a number" == Err InvalidNumStr

toI64 : Str -> Result I64 [InvalidNumStr]

Encode a Str to a signed I64 integer. A I64 value can hold numbers from -9_223_372_036_854_775_808 to 9_223_372_036_854_775_807. It can be specified with a i64 suffix.

expect Str.toI64 "1500" == Ok 1500i64
expect Str.toI64 "-1" == Ok -1i64
expect Str.toI64 "0.1" == Err InvalidNumStr
expect Str.toI64 "not a number" == Err InvalidNumStr

toU32 : Str -> Result U32 [InvalidNumStr]

Encode a Str to an unsigned U32 integer. A U32 value can hold numbers from 0 to 4_294_967_295 (over 4 billion). It can be specified with a u32 suffix.

expect Str.toU32 "1500" == Ok 1500u32
expect Str.toU32 "0.1" == Err InvalidNumStr
expect Str.toU32 "-1" == Err InvalidNumStr
expect Str.toU32 "not a number" == Err InvalidNumStr

toI32 : Str -> Result I32 [InvalidNumStr]

Encode a Str to a signed I32 integer. A I32 value can hold numbers from -2_147_483_648 to 2_147_483_647. It can be specified with a i32 suffix.

expect Str.toI32 "1500" == Ok 1500i32
expect Str.toI32 "-1" == Ok -1i32
expect Str.toI32 "0.1" == Err InvalidNumStr
expect Str.toI32 "not a number" == Err InvalidNumStr

toU16 : Str -> Result U16 [InvalidNumStr]

Encode a Str to an unsigned U16 integer. A U16 value can hold numbers from 0 to 65_535. It can be specified with a u16 suffix.

expect Str.toU16 "1500" == Ok 1500u16
expect Str.toU16 "0.1" == Err InvalidNumStr
expect Str.toU16 "-1" == Err InvalidNumStr
expect Str.toU16 "not a number" == Err InvalidNumStr

toI16 : Str -> Result I16 [InvalidNumStr]

Encode a Str to a signed I16 integer. A I16 value can hold numbers from -32_768 to 32_767. It can be specified with a i16 suffix.

expect Str.toI16 "1500" == Ok 1500i16
expect Str.toI16 "-1" == Ok -1i16
expect Str.toI16 "0.1" == Err InvalidNumStr
expect Str.toI16 "not a number" == Err InvalidNumStr

toU8 : Str -> Result U8 [InvalidNumStr]

Encode a Str to an unsigned U8 integer. A U8 value can hold numbers from 0 to 255. It can be specified with a u8 suffix.

expect Str.toU8 "250" == Ok 250u8
expect Str.toU8 "-0.1" == Err InvalidNumStr
expect Str.toU8 "not a number" == Err InvalidNumStr
expect Str.toU8 "1500" == Err InvalidNumStr

toI8 : Str -> Result I8 [InvalidNumStr]

Encode a Str to a signed I8 integer. A I8 value can hold numbers from -128 to 127. It can be specified with a i8 suffix.

expect Str.toI8 "-15" == Ok -15i8
expect Str.toI8 "150.00" == Err InvalidNumStr
expect Str.toI8 "not a number" == Err InvalidNumStr

countUtf8Bytes : Str -> Nat

Gives the number of bytes in a Str value.

expect Str.countUtf8Bytes "Hello World" == 11

replaceEach : Str, Str, Str -> Result Str [NotFound]

Returns the given Str with each occurrence of a substring replaced. Returns Err NotFound if the substring is not found.

expect Str.replaceEach "foo/bar/baz" "/" "_" == Ok "foo_bar_baz"
expect Str.replaceEach "not here" "/" "_" == Err NotFound

replaceFirst : Str, Str, Str -> Result Str [NotFound]

Returns the given Str with the first occurrence of a substring replaced. Returns Err NotFound if the substring is not found.

expect Str.replaceFirst "foo/bar/baz" "/" "_" == Ok "foo_bar/baz"
expect Str.replaceFirst "no slashes here" "/" "_" == Err NotFound

replaceLast : Str, Str, Str -> Result Str [NotFound]

Returns the given Str with the last occurrence of a substring replaced. Returns Err NotFound if the substring is not found.

expect Str.replaceLast "foo/bar/baz" "/" "_" == Ok "foo/bar_baz"
expect Str.replaceLast "no slashes here" "/" "_" == Err NotFound

splitFirst : Str, Str -> Result { before : Str, after : Str } [NotFound]

Returns the given Str before the first occurrence of a delimiter, as well as the rest of the string after that occurrence. Returns [ Err NotFound] if the delimiter is not found.

expect Str.splitFirst "foo/bar/baz" "/" == Ok { before: "foo", after: "bar/baz" }
expect Str.splitFirst "no slashes here" "/" == Err NotFound

splitLast : Str, Str -> Result { before : Str, after : Str } [NotFound]

Returns the given Str before the last occurrence of a delimiter, as well as the rest of the string after that occurrence. Returns Err NotFound if the delimiter is not found.

expect Str.splitLast "foo/bar/baz" "/" == Ok { before: "foo/bar", after: "baz" }
expect Str.splitLast "no slashes here" "/" == Err NotFound

walkUtf8WithIndex : Str, state, (state, U8, Nat -> state) -> state

Walks over the UTF-8 bytes of the given Str and calls a function to update state for each byte. The index for that byte in the string is provided to the update function.

f : List U8, U8, Nat -> List U8
f = \state, byte, _ -> List.append state byte
expect Str.walkUtf8WithIndex "ABC" [] f == [65, 66, 67]

reserve : Str, Nat -> Str

Enlarge a string for at least the given number additional bytes.

releaseExcessCapacity : Str -> Str

Shrink the memory footprint of a str such that it's capacity and length are equal. Note: This will also convert seamless slices to regular lists.

appendScalar : Str, U32 -> Result Str [InvalidScalar]

Append a U32 scalar to the given string. If the given scalar is not a valid unicode value, it returns Err InvalidScalar.

expect Str.appendScalar "H" 105 == Ok "Hi"
expect Str.appendScalar "๐Ÿ˜ข" 0xabcdef == Err InvalidScalar

walkScalars : Str, state, (state, U32 -> state) -> state

Walks over the unicode U32 values for the given Str and calls a function to update state for each.

f : List U32, U32 -> List U32
f = \state, scalar -> List.append state scalar
expect Str.walkScalars "ABC" [] f == [65, 66, 67]

walkScalarsUntil : Str, state, (state, U32 -> [ Break state, Continue state ]) -> state

Walks over the unicode U32 values for the given Str and calls a function to update state for each.

f : List U32, U32 -> [Break (List U32), Continue (List U32)]
f = \state, scalar ->
    check = 66
    if scalar == check then
        Break [check]
    else
        Continue (List.append state scalar)
expect Str.walkScalarsUntil "ABC" [] f == [66]
expect Str.walkScalarsUntil "AxC" [] f == [65, 120, 67]

withPrefix : Str, Str -> Str

Adds a prefix to the given Str.

expect Str.withPrefix "Awesome" "Roc" == "RocAwesome"