What is a string in Golang
string is the set of all strings of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text. A string may be empty, but not nil. Values of string type are immutable.
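Immutability means an assignment such as s[0] = 'h' does not compile. To change the contents, you convert the string to a []byte (which copies the bytes), modify the copy, and convert it back (another copy). The original string is left untouched. A minimal sketch, where capitalizeFirst is a hypothetical helper that only handles an ASCII first letter:

```go
package main

import "fmt"

// capitalizeFirst is a hypothetical helper: it upper-cases the first byte
// of s, assuming that byte is an ASCII lowercase letter.
func capitalizeFirst(s string) string {
	b := []byte(s) // copies the string's bytes into a new, mutable slice
	if len(b) > 0 && b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b) // copies the bytes again into a new string
}

func main() {
	s := "hello"
	t := capitalizeFirst(s)
	fmt.Println(s, t) // hello Hello
}
```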
A string is not a slice, but conceptually it is a read-only sequence of bytes; byte is an alias for uint8.
A single byte is enough for an ASCII character, but UTF-8 encodes a character in 1 to 4 bytes, so characters such as Chinese characters or emoji cannot be held in one byte. For these, Go provides the rune type, an alias for int32 that holds one Unicode code point.
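To see the difference, the following sketch compares the byte length of a string with its rune count. counts is a hypothetical helper; unicode/utf8 is the standard package for working with UTF-8:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the length of s in bytes and in runes (code points).
func counts(s string) (byteLen, runeLen int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "Go語言" // 2 ASCII letters + 2 CJK characters (3 bytes each in UTF-8)

	bl, rl := counts(s)
	fmt.Println(bl, rl) // 8 4

	var b byte = s[0] // indexing a string yields a byte (uint8)
	r := []rune(s)[2] // converting to []rune yields code points (int32)
	fmt.Printf("%T %c\n", b, b) // uint8 G
	fmt.Printf("%T %c\n", r, r) // int32 語
}
```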
In runtime/string.go, the Go runtime defines a string as a pointer to the first byte of the underlying byte array plus an int holding the length in bytes:
type stringStruct struct {
str unsafe.Pointer // underlying bytes
len int // number of bytes
}
// Variant with *byte pointer type for DWARF debugging.
type stringStructDWARF struct {
str *byte
len int
}
String Handling
nil
Golang has no null string: a string can never be nil, and the zero value of an unassigned string variable is the empty string "".
str := "hello"
fmt.Println(str == nil)
The compiler reports an error:
invalid operation: str == nil (mismatched types string and untyped nil)
Although you can check for an empty string with str == "", in some situations an empty string and a missing (null) value mean different things and must not be treated as equivalent. In those cases, use *string (a pointer to a string), which can be nil.
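A common case where the distinction matters is decoding JSON: with a *string field, encoding/json leaves the pointer nil when the key is missing or null, and points it at "" when the value is an empty string. A sketch, where the Profile type, its Nickname field, and the describe helper are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Profile is a hypothetical type; Nickname is optional.
type Profile struct {
	Nickname *string `json:"nickname"`
}

// describe distinguishes "not provided" (nil) from "provided but empty".
func describe(data string) string {
	var p Profile
	if err := json.Unmarshal([]byte(data), &p); err != nil {
		return "invalid JSON"
	}
	switch {
	case p.Nickname == nil:
		return "not set"
	case *p.Nickname == "":
		return "empty"
	default:
		return "set to " + *p.Nickname
	}
}

func main() {
	fmt.Println(describe(`{}`))                  // not set
	fmt.Println(describe(`{"nickname": null}`))  // not set
	fmt.Println(describe(`{"nickname": ""}`))    // empty
	fmt.Println(describe(`{"nickname": "Tom"}`)) // set to Tom
}
```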
var strp *string
fmt.Println(strp == nil) // true
// fmt.Println(*strp)
// runtime error: invalid memory address or nil pointer dereference
var str = ""
strp = &str
fmt.Println(strp == nil) // false
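fmt.Println(*strp) // prints an empty line
len()
len() returns the number of bytes in a string, not the number of characters:
var str = "Hello\n"
fmt.Println(len(str)) // 6 (the \n escape is a single byte)
String Literals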
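There are two ways to write a multiline string: an interpreted string literal in double quotes ("") with \n escapes, or a raw string literal in backquotes (``), in which line breaks are kept as written and escape sequences are not processed:
var str = "Hello1\n2World3\n4!"
is equivalent to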
var str = `Hello1
2World3
4!`
String concatenation
operator +
also known as the concatenation operator
s := "Hello" + "World!"
fmt.Println(s) // HelloWorld!
fmt.Sprint, fmt.Sprintln, fmt.Sprintf
s := fmt.Sprint("HelloWorld!")
fmt.Println(s) // HelloWorld!
s = fmt.Sprintln("Bye", "World", "~")
fmt.Print(s) // Bye World ~ (Sprintln adds spaces between operands and a trailing newline)
s = fmt.Sprintf("%s", "NiceWorld!")
fmt.Println(s) // NiceWorld!
Sprint can also convert values of other types into a string:
sli := []int{1,2,3}
str := fmt.Sprint(sli)
fmt.Println(str) // [1 2 3]
strings.Join()
strings.Join is implemented with strings.Builder.
ss := []string{"Hello", "World", "~"}
s := strings.Join(ss, "")
fmt.Println(s) // HelloWorld~
bytes.Buffer
var b bytes.Buffer
b.WriteString("Hello")
b.WriteString("World!")
fmt.Println(b.String()) // HelloWorld!
Note
From Go String Concat Performance:
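Avoid memory allocations as much as possible.
Because strings are immutable, the + operator and the fmt functions create a new string, and therefore allocate memory, on every call; this becomes expensive when building long strings piece by piece.
In contrast, strings.Builder (like bytes.Buffer) keeps its content in a growable byte slice. Its struct definition is: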
type Builder struct {
addr *Builder
buf []byte
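}
The WriteXxx methods of strings.Builder (and of bytes.Buffer) use append on b.buf, so the buffer grows geometrically and the existing content is only copied when the slice has to be reallocated, which reduces allocations and improves performance. However, because all writes go to the same underlying memory, take care when reusing a buffer: a slice returned by bytes.Buffer.Bytes() is only valid until the next modification, and a strings.Builder must not be copied after it has been written to (the addr field exists to detect such copies).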
String conversion
Using the strconv package
str := "1234"
v, _ := strconv.Atoi(str)
fmt.Printf("%T\n", v) // int
// Or, use "reflect", fmt.Println(reflect.TypeOf(v))
s := strconv.Itoa(v)
fmt.Printf("%T\n", s) // string
u, _ := strconv.ParseUint(str, 10, 32)
fmt.Printf("%T\n", u) // uint64 (ParseUint always returns uint64; bitSize only limits the range)
Convert int slice into string
a := []int{1,2,3,4}
str := strings.Trim(strings.Replace(fmt.Sprint(a), " ", ",", -1), "[]")
fmt.Print(str) // 1,2,3,4
Special case
Remove the last character from a string
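When the last character may be multi-byte (such as a Chinese character or an emoji), slicing off one byte with s[:len(s)-1] would leave an invalid, partial UTF-8 sequence. A rune-aware sketch, where trimLastRune is a hypothetical name:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// trimLastRune removes the final rune of s, however many bytes it uses.
func trimLastRune(s string) string {
	_, size := utf8.DecodeLastRuneInString(s) // size is 0 for an empty string
	return s[:len(s)-size]
}

func main() {
	fmt.Println(trimLastRune("Hello!")) // Hello
	fmt.Println(trimLastRune("你好"))     // 你
}
```

For a known suffix such as ", ", strings.TrimSuffix(s, ", ") removes it in one call.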
ss := []string{"Hello", "World", "Peter", "Tom"}
var s string
for _, v := range ss {
	s = s + v + ", "
}
// remove the trailing ", "
// s == "Hello, World, Peter, Tom, "
i := strings.LastIndex(s, ", ") // index of the last ", "
s = s[:i] + strings.Replace(s[i:], ", ", "", 1)
fmt.Println(s) // Hello, World, Peter, Tom
If you know the length of the suffix to delete, you can also build the string in a bytes.Buffer and call Truncate:
var b bytes.Buffer
b.WriteString("Hello")
b.WriteString("World!")
// keep only the first b.Len()-4 bytes, dropping "rld!"
b.Truncate(b.Len() - len("rld!"))
fmt.Println(b.String()) // HelloWo
Create a random string
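math/rand is fine for test data or non-secret IDs, but its output is predictable, so it should not be used for passwords, tokens, or invite codes. A sketch using crypto/rand instead, assuming the same alphabet as letterRunes (randString and alphabet are hypothetical names):

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// alphabet is the same character set as letterRunes (ASCII only, so bytes suffice).
const alphabet = "3456789ABCEFGHJKLMNPQRSTXY"

// randString returns n characters drawn uniformly from alphabet using the
// operating system's cryptographically secure random source.
func randString(n int) (string, error) {
	limit := big.NewInt(int64(len(alphabet)))
	b := make([]byte, n)
	for i := range b {
		idx, err := rand.Int(rand.Reader, limit) // uniform in [0, limit)
		if err != nil {
			return "", err
		}
		b[i] = alphabet[idx.Int64()]
	}
	return string(b), nil
}

func main() {
	s, err := randString(8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```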
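import "math/rand"

// letterRunes is the alphabet to draw from. A package-level variable
// needs var; := only works inside functions.
var letterRunes = []rune("3456789ABCEFGHJKLMNPQRSTXY")

// RandStringRunes returns n runes picked at random from letterRunes.
func RandStringRunes(n int) string {
	b := make([]rune, n)
	for i := range b {
		b[i] = letterRunes[rand.Intn(len(letterRunes))]
	}
	return string(b)
}
Handle full-width characters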
A rune holds a whole Unicode code point, so working with runes lets you handle any UTF-8 character, including Chinese characters, emoji, and other full-width characters.
When a string containing Chinese characters is converted to a byte slice, you can see that Golang uses more than one byte (three, in this case) for each Chinese character, for example:
fmt.Println([]byte("Hello, 世界"))
// [72 101 108 108 111 44 32 228 184 150 231 149 140]
// 228 184 150 世
// 231 149 140 界
For example, the validateComment(string) function replaces specific words in a comment with *, where the comment may contain full-width characters.
Using strings.Builder.WriteRune() allows us to directly write full-width characters.
(This example also shows how to restore the original letter case after matching against strings.ToLower().)
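func validateComment(comment string) string {
	// Match case-insensitively: sensitive words in the lower-cased text are
	// replaced with '*', one per rune. sensitive.Filter is assumed to be a
	// ready-to-use filter instance.
	lowerComment := sensitive.Filter.Replace(strings.ToLower(comment), '*')
	// Convert once, outside the loop, so the original runes can be indexed.
	original := []rune(comment)
	var sb strings.Builder
	var runeCount int
	for _, runeValue := range lowerComment {
		if runeValue != '*' {
			// Not masked: restore the original (non-lower-cased) rune.
			sb.WriteRune(original[runeCount])
		} else {
			sb.WriteRune(runeValue)
		}
		runeCount++
	}
	// strings.Builder.WriteRune always returns a nil error, so it is not checked.
	return sb.String()
}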
// sensitive package: https://github.com/importcjj/sensitive
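The same restore-after-ToLower technique can be tried without the external package. In this sketch, maskWords is a hypothetical stand-in for the sensitive filter: it replaces every occurrence of the given lower-case words with one '*' per rune, so the masked text keeps the rune count of its input (strings.ToLower also maps rune by rune). censor is likewise a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// maskWords is a hypothetical stand-in for a sensitive-word filter: each
// occurrence of a word in words is replaced by one '*' per rune.
func maskWords(text string, words []string) string {
	for _, w := range words {
		stars := strings.Repeat("*", utf8.RuneCountInString(w))
		text = strings.ReplaceAll(text, w, stars)
	}
	return text
}

// censor matches case-insensitively on the lower-cased text, then rebuilds
// the result from the original runes wherever no '*' was written.
func censor(comment string, words []string) string {
	masked := []rune(maskWords(strings.ToLower(comment), words))
	original := []rune(comment)
	var sb strings.Builder
	for i, r := range masked {
		if r == '*' {
			sb.WriteRune('*')
		} else {
			sb.WriteRune(original[i])
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(censor("Hello BadWord 世界", []string{"badword"})) // Hello ******* 世界
}
```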