Swift's Pointy Bits: Unsafe Swift & Pointer Types

Swift offers remarkable performance while still providing safety through strong types, value semantics, and automatic memory management. For those times when you need to step outside those boundaries, however, Swift also offers tools to directly allocate and manipulate memory. This talk will explore the ins and outs of Swift’s take on pointers: typed and raw pointers and buffers, implicit bridging and casting, and some tips on how to stay safe while using unsafe APIs.


Introduction

I’ll talk with you today about Swift’s pointy bits. These are the unsafe pointer types in the Swift standard library. We’re gonna look at how the different pointer types work, why they work the way they do, and are designed the way they’re designed, and how we can use them in ways that are still safe.

What Makes Swift Safe Overall

Before we get to the pointer types themselves, and what makes them unsafe, I’d like to start out talking about what makes Swift safe overall. Swift is often described as a language that prioritizes safety when given the choice, but I’m not sure it’s always clear what safety means in Swift. When Swift first came out, I thought it sounded really great: Swift is safe. I can write a program that will never crash. This is amazing.

And in some ways, Swift delivers on that promise. Optionals are obviously a big part of that safety, since they make it really easy to work around things like null pointer issues.

As an example, I have an array of ages, that I’m going to use for some calculations, and we’ll see how well Swift’s safety protects me.


let ages = [13.3, 17.5, 18.9, 21.2]
let firstPlusOne = ages.first + 1
// Value of optional type 'Int?' not unwrapped

Here’s a case where optionals help prevent an error on my part. The first property of an array is an optional. So if I try just to add one to it, the compiler lets me know there’s a problem. Thanks, Swift compiler!

How do I fix this? Easy. Fix all in scope, and I’m done. Now that optional property gets unwrapped, and we get a value for first plus one. Awesome.

Does anyone see a problem here? No problems, right. The fixit Xcode gave me was to add a force unwrapping operator to first. So, if my array is empty, what happens? A crash.

This one’s my fault, right? We know a lot of ways of safely unwrapping optionals. We can use optional binding, we can use the nil-coalescing operator. We really can gracefully handle the absence of a value without crashing.

On the other hand, it’s really easy to cause a crash using the force unwrapping operator. Why is it even part of the language?


let ages: [Double] = []
let firstPlusOne = ages.first! + 1
// error: EXC_BAD_INSTRUCTION

Next up, my task is to compute the average of the values in my list. I’ve got this great functional way to compute an average. I don’t need a loop to add up the values in an array. I can just use reduce and then divide by the array’s count.


let ages: [Double] = []
let average = ages.reduce(0, +) /
    Double(ages.count)

This is what I love about Swift. It’s super clean, readable code, no optionals here to worry about.

Again, though, if my array is empty, I’m really dividing by zero. And what happens if we divide by zero? Our program crashes, even in Swift.


let ages: [Double] = []
let average = ages.reduce(0, +) / 0
// error: EXC_BAD_INSTRUCTION

One last example. I wanted to get the last age in my list. There are one, two, three, four, there, so I grab ages[4], and… wait a minute. Arrays are zero-indexed, right? So that’s an “off by one” error.


let ages = [13.3, 17.5, 18.9, 21.2]
let last = ages[4] // off by one
// error: EXC_BAD_INSTRUCTION

Crash. What’s going on here? I keep writing code that looks okay, I’m following the rules of Swift, but my program keeps crashing.

This is the language that’s supposed to be safe? How can a safe language be so easy to crash? Or if I want to put it another way, if safety in Swift doesn’t mean safety from crashing, what does it mean? What is Swift keeping us safe from?

Let’s go back one step, to the list of ages and my “off by one” error.

I definitely messed up here. There’s no element at index 4, and my program crashed because I tried to access that non-existent element. So what is Swift keeping us safe from? What could be worse than crashing?

Get more development news like this


let ages = [13.3, 17.5, 18.9, 21.2]
let last = ages[4] // off by one
// sure, fine, whatever

You know what’s worse than my program crashing? Is my program not crashing. If my program doesn’t crash, if it just keeps right on trucking, then who knows what data I’ll have in my last constant.

It could be something that makes sense in the simulator but goes haywire on a device. Or maybe it works when I compiled in debug mode, but not in release. Or maybe it doesn’t show up immediately, but causes problems later on in my program making it really hard to debug and track down where the problem really is.

Unexpected Behavior

What Swift is keeping us safe from is unexpected behavior. By making guarantees about things like type safety, the semantics of the values that we work with, the boundaries of collections and numeric types, and by providing automatic memory management, Swift is safe in ways that other languages can’t always promise.

There are still times, however, when we need to escape those boundaries. When we need more control or performance than Swift’s safety guarantees can provide. And that’s when we reach for unsafe APIs.

Swift’s Unsafe APIs

Swift does it’s best to make it very clear when we’re stepping outside those boundaries. That unsafe label on each of the pointer type names is deliberately a little intimidating. When you use Swift’s unsafe APIs, you give up some of the safety that the language provides, and take on ensuring those guarantees yourself. With pointers, you have direct access to read from and write to memory.

Memory Layout and Pointers in Swift

Now I’m going to give a very quick primer on how memory and memory layout, and pointers work in Swift, so that we’re on the same page later on when we’re working with actual Swift code.

First of all, let’s look at memory itself. When we run a program that’s written in Swift, a chunk of the computer’s memory is set aside for the types and functions that we’ve written in our program, and whenever we create a variable or constant, that variable has a value that’s stored using it’s binary presentation in memory. There’s our memory.

Binary, of course, is just ones and zeros, a computer’s memory is filled with billions of these. We normally group these into groups of eight called bytes that look like this, and if we write them using hexadecimal notation instead of binary, they look like this. If we collapse these a little bit, we get this nice compact view of the data in memory. Each of these rows represents eight bytes, or 64 bits, which we often call a word when we’re using a 64-bit processor, which is what’s running all of our devices here today.

Every position in memory has an address so that we can store and retrieve values in memory. We can add those on the left here. The addresses are also written using hexadecimal notation. And if you look closely, you’ll see that the address of each row is eight more than the previous row. Again, each row is eight bytes. So, we’re addressing the memory at the byte level. Even though we started by looking at bits, by looking at ones and zeros, as far as memory addresses are concerned, the byte is the smallest unit.

Any value that we create in the program needs to be stored in memory, which you can see here.


var age = 5

withUnsafePointer(to: &age) {
    // ...
}

age is an Int, which is a word-sized signed integer. So in 64-bit devices, this value is stored using all 64 bits of a word. In this table, the age variable uses one whole row of memory. When I use Swift’s withUnsafePointer function, I’m able to temporarily access a pointer to a value instead of the value itself.

Here I am passing my age variable using inout notation, which in this case is pretty close to the traditional C meaning of that ampersand. Inside the trailing closure that gets executed by withUnsafePointer(to:_:), I get a pointer argument, which is essentially a way of using the address of age instead of the value.


var age = 5

withUnsafePointer(to: &age) { agePointer in
    print(agePointer.pointee)
}
> 5

My agePointer argument here has the address of the age variable as its value. agePointer represents that position in memory, not the value itself. If I need to access the value that the pointer points to, I use the pointer’s pointee property.

To sum up, pointers are a way of using or manipulating the locations of values in memory, as opposed to the values themselves.

Unsafe Pointer Types

Which brings us at last to Swift’s pointer types. There are eight unsafe pointer types in the standard library. Four pointers, and four of what Swift calls buffer pointers.

Let’s start by looking at the four pointer types. Here they are. We have UnsafePointer<T>, UnsafeRawPointer, UnsafeMutablePointer<T>, and UnsafeMutableRawPointer. Each of these is essentially just a wrapper around a memory address. They’re just an unsigned integer under the hood. So why do we have four different types?

Even though these are unsafe APIs—they’ll let you skip pass the normal bounds checking, typed safety, and memory management—Swift still wants to help you do the right thing. So, it brings some of these checks right into the type system. From this perspective, we can look at the four types on two different axes.

Typed Pointer Versus Raw Pointer

First, we have “typed” versus “raw”. The two generic pointers on the left are typed pointers. That means that the memory addresses that these pointers represent hold a value of a specific type. If you have a pointer to an Int, accessing that pointer’s pointee gives you back an Int. Swift requires strict aliasing, meaning that you can’t safely access the same memory as two different types. Accessing the same memory as both the signed integer and as something else is undefined behavior. So typed pointers help prevent that kind of access. When you need them, there are methods on typed pointers for temporarily or permanently rebinding memory to a different underlying type.

A typed pointer knows the size and alignment of the type it’s referencing. So you don’t need to think about strides or alignments when you’re working with typed pointers. Looking back at the UnsafePointer to my age variable, because a typed pointer knows both the location and type, you can think of it as referencing that whole value highlighted there.

And when you advance a typed pointer through memory, for example, if you’re accessing the contiguous contents of an array, which we’ll look at later, a typed pointer steps one instance at a time. So you don’t accidentally land in the middle of an instance.

Raw pointers, on the other hand, don’t store any information about what type their underlying memory holds.

When you access the data that a raw pointer references, you do so either as raw bytes, or by specifically naming the type of the data that you want to load from that memory. If I were to get a raw pointer to the age property, instead of a typed pointer, it would be referencing the same memory location, but without any type information at all. It doesn’t know, or care, that there’s a 64-bit signed integer at that address. A raw pointer is just the address itself. You can access the memory at that location as just a byte, you can load a value of any type starting at that position, or you can convert a raw pointer to a typed pointer by binding the memory at that location.

Mutable Pointer Versus Immutable Pointer

The second access is mutability. Pointers are a literal embodiment of reference semantics, so even if we declare a pointer with let, we’d still technically be able to mutate the memory referenced by that pointer. Instead of controlling mutability at the instance level, like we do with value types, Swift gives us control over this at the type level, by only allowing changes to a value referenced by a mutable pointer.

Both typed and raw pointers have immutable versions, which allow only read-only access to the referenced memory, and mutable versions as well. You can initialize, de-initialize, assign to, or move from, contents of a mutable pointer. So, those are our four pointer types.

Buffer Pointers

What about buffer pointers? A buffer pointer is essentially a pointer coupled with a count. So, instead of storing one specific address, it describes a range of memory. There’s a buffer pointer to go with each pointer type. So, buffers can be typed or raw, and mutable or immutable.

What’s neat about this is that buffer pointers can act as collections. You can iterate over the contents of that region of memory, or perform other operations using most of the same tools that you use when you’re working with an array.

In this code, I’m passing age to the withUnsafeBytes function, which calls its closure with an UnsafeRawBuffer pointer that references the bytes of the variable you pass.


var age = 5

withUnsafeBytes(of: &age) { ageBytes in
    ageBytes.count // 8
    ageBytes.first // Optional(5)
    ageBytes[0]    // 5
}

I can use that buffer like a standard collection. Here, I’m getting the count and accessing the first value using it’s first property and the subscript.

Remember that these are the raw bytes of the value. That’s why the count is eight—an Int is eight bytes long, so the count of the unsafe bytes is eight—and that 5 that I’m retrieving is just the first byte of the age value.


var age = 2000

withUnsafeBytes(of: &age) { ageBytes in
    ageBytes.count // 8
    ageBytes.first // Optional(208)
    ageBytes[0]    // 208
}

If the value of age didn’t fit into a single byte—let’s say you’re recording the age of Gandalf, who lived to be 2,000 years old—the first byte of the memory for age would be d0 in hexadecimal, or 208. So, again, because we’re using a raw pointer here, we’re really reading this value at the byte level instead of the instance level.

Imported C APIs

Just when do we want to reach for these unsafe pointer types? For most developers, there will be two cases. First, these pointer types provide interoperability with a lot of C APIs, which are written to operate on typed or Void pointers. And second, there are certain kinds of optimizations that are only possible using unsafe pointer types. We’ll run through a couple of examples of this next.


func SKSearchFindMatches(
    _ inMaximumCount: CFIndex,
    _ outIDsArray: UnsafeMutablePointer<SKDocumentID>!,
    _ outFoundCount: UnsafeMutablePointer<CFIndex>!
) -> Bool

This is an abbreviated version of SKSearchFindMatches, a C function from the iOS 10 SearchKit framework. This is on the more complicated end of the spectrum, but still fairly typical for a C function that has to handle in and out parameters. It has three parameters, one of which is the input to the function, and two of which are the output.

The way this function works is that you call it repeatedly after you’ve initiated a search. Each time it fills the out parameters with a batch of results, and returns false when you’ve finally retrieved all the matches. The number you pass to inMaximumCount caps the number of results you get. This is really important because the first out parameter’s expecting a C array that has room to store the maximum number of document IDs. The second out parameter will hold the number of actual results that were returned.

The two out parameters are both typed mutable pointers with different pointee types. Here’s the great part: After all that preamble about pointer types, we don’t even need to touch unsafe pointers to call this function.

Swift can perform implicit conversions from a variable or array to a typed pointer or a raw pointer just by using inout syntax.


let limit = 100
var foundCount = 0 as CFIndex
var documentIDs = Array(repeating: 0 as SKDocumentID,
		count: limit)

_ = SKSearchFindMatches(CFIndex(limit),
        &documentIDs, &foundCount)

for i in 0..<Int(foundCount) {
	loadDocument(id: documentIDs[i])
}

Here we create an array of documentIDs, and a variable to hold the foundCount. Then, you can see that we passed those using inout syntax into the SKSearchFindMatches function. &documentIDs is converted to a pointer to the contents of the documentIDs array, while &foundCount is converted to a pointer to the foundCount variable.

It’s important to understand the limits of this implicit conversion. When we pass the documentIDs array using implicit pointer conversion, the only thing that’s passed is a pointer to the first element of that array’s contents. The function that receives this pointer doesn’t have any information about the size of the array, or any ability to change the array’s count. That’s why it’s important that we pass an array that has enough space to hold the number of elements that was passed as the maximum count.

Additionally, pointers that are created using this kind of implicit conversion are only valid for the lifetime of the function that’s called. Escaping one of these pointers, using it after the function completes execution, is undefined behavior.

After we’ve retrieved the documentIDs, we can loop over the array, up to the number of found documents, and call a handler to process each result.

So, this is working the way it’s supposed to, and it’s passing our tests, but after we do some performance testing, we identify this bit of code as one that we can make faster, and to make it faster, we’re going to explicitly use unsafe pointers.

There’s a trade-off here. We’re giving up some of the safety that the language provides for a little bit of speed. Please only do this kind of optimization if you’ve tested and are sure that a little bit of optimization will provide the benefit you need.

There are two places in just this little bit of code where there’s extra work going on to enforce Swift’s safety guarantees. First, let’s look at the way we’ve declared documentIDs. We’re using the array repeating count initializer. This initializer allocates the right amount of space for 100 documentIDs, and then initializes each of those entries to zero.

Normally, this is great. The array type provides safety by not letting us access an element unless it’s been properly initialized. In this case though, before we try to access any of these elements, we’re passing the array to SKSearchFindMatches, which will write its findings into the array. So, that initialization step is kind of unnecessary.

What about a different approach? Could we just create documentIDs as an empty array and reserve the right amount of capacity? Unfortunately no. While this seems like a solution, remember that the function that we’re passing documentIDs to only gets a single pointer. It can’t see the count, and it can’t change the count, of the array that we pass. So there’d be no way to access the elements that it puts into the array after it comes back. So that won’t work.

The second place where we’re doing extra work is down here, when we access the ith element of documentIDs. Every time you use a subscript in an array, it performs a bounds check to make sure you aren’t looking for an element outside the array’s defined boundaries.

Again, this is an important safety feature. But we’re trying to squeeze out every ounce of performance. As long as we’re careful to stay in the bounds of what we created while iterating, we really don’t need to perform a bounds check with every single access.

The solution for both of these is to convert documentIDs from an array to a typed mutable pointer. The same kind that our array is being implicitly converted to. When I call the static allocate method, space for the number of elements that I pass is allocated, and a pointer to the beginning of that block is returned.

And of course, we always have to remember to de-allocate any memory that we allocate. This is another kind of safety that we’re opting out of by switching from an array to an unsafe pointer. Swift makes it as easy as possible to do the right thing. Put your de-allocation in a defer block immediately after your allocation and you can’t go wrong.

The only other change we need to make is to take the ampersand off the parameter. Since we’re now passing an actual pointer instead of relying on implicit conversion, we don’t need that extra bit of syntax.

You can subscript a pointer the same way you subscript an array. So that part of our code is unaffected. This is our whole optimization. We’ve gotten rid of some unnecessary initialization and bounds checks, and we’ve taken on the responsibility for guaranteeing three things that Swift was doing for us before.

  • We need to make sure we’re initializing elements of documentIDs before we read them,
  • we need to remember to de-allocate any memory that we allocate,
  • and we need to stay within the boundaries of our allocation.

Next Up: New Features in Realm Obj-C & Swift

General link arrow white

Now for our final example. This is my favorite sorting algorithm, the bubble sort, partly because it has an adorable name, and partly because it fits on a single slide.


func bubbleSort<T: Comparable>(_ array: inout [T]) {
    guard !array.isEmpty else { return }

    for n in 1..<array.count {
        for i in 1...(array.count - n) {
            if array[i - 1] > array[i] {
                swap(&array[i - 1], &array[i])
            }
        }
    }
}

Seriously, if you try putting quick sort up there, it’ll be really tiny type. That wouldn’t work.

This bubbleSort takes an array of any Comparable elements and sorts them in ascending order. You’re tasked with speeding up the sorting routine, and we’ll imagine for a moment that you can’t make other obvious optimizations, like not using a bubble sort.

Swift’s array type has a group of methods that let you drop down from the array, which always performs bounds checking, to an unsafe buffer pointer over either the elements, or the raw bytes of the array’s elements, which only perform bounds checking in debug mode, and not in release mode.

We’re going to add a call to our array’s withUnsafeMutableBufferPointer method as a wrapper around basically the same code as we had before. Only the names have been updated, and we get a significant increase in our performance.


func bubbleSort<T: Comparable>(_ array: inout [T]) {
	guard !array.isEmpty else { return }

    array.withUnsafeMutableBufferPointer { buffer in
        for n in 1..<buffer.count {
            for i in 1...(buffer.count - n) {
                if buffer[i - 1] > buffer[i] {
                    swap(&buffer[i - 1], &buffer[i])
                }
            }
        }
    }
}

Notice that I get the benefit of using unsafe pointers, better performance through skipped bounds checks, while still having guarantees about the type of elements in the buffer, and this totally consistent array-like interface.

Now that we’ve covered the unsafe pointer types, and how the unsafe pointer types help us write safe code in their own way, let’s take a look at one easy way to misuse pointers that is really not safe at all.

Really Not Safe At All

So, the biggest and easiest way to get it wrong when working with Unsafe pointer types is to escape a pointer that you’ve gotten through one of the withUnsafePointer, or withUnsafeBytes functions, or through an implicit pointer conversion.

For example, take a look at this code.


var age = 2000
let agePointer = UnsafeMutablePointer(&age)
agePointer.pointee = 10
// age == 10

I’ve got my age variable again, and I’ve created a pointer to its value using the UnsafeMutablePointer initializer. Then I’m updating the values stored at the pointer’s address. Since that’s still the same memory used by my age variable, its value gets updated too.

Pretty straightforward. However, there’s a big problem here. Let me make it more explicit by writing this without the implicit conversion.


var age = 2000
let agePointer =
	    withUnsafeMutablePointer(to: &age) { p in
            return p
        }
agePointer.pointee = 10
// age == 10

Calling the UnsafeMutablePointer initializer using implicit pointer conversion, is essentially this: directly escaping the pointer back out of the withUnsafeMutablePointer closure. The compiler will have no problem with this whatsoever, but this is undefined behavior. Which means that even though it might work now, the compiler’s allowed to optimize away anything I do with that pointer after it escapes.

This can be tricky to recognize. There’s no explicit escape when I write it this way, and that inout notation, using an ampersand to pass a variable, is a standard way of calling functions. So the two rules to remember are, 1) never escape the pointer you get in a withUnsafe- function, and 2) never, ever get the pointer to a variable through implicit conversion.

Thank you all so much. I hope this helps the next time you need to reach for Swift’s Unsafe Pointer Types.

Q & A

Q: What is the difference between the method bindMemory and the assumingMemoryBound?

A: Okay, so, binding memory versus assuming memory bound. Unsafe raw pointers, even though they don’t know anything about the memory that they’re addressing, the memory itself can still be bound. So you can have a raw pointer to memory that’s bound to an integer type, or you can have raw memory, a raw pointer that’s bound to another type, like the instances of a class. When you want to convert that raw pointer to a typed pointer so that you can access instances more easily, what happens is that you can either bind the memory, which tells the compiler I’m binding this memory to this type, and I’m not going to change it unless I rebind it later on. That’s what the bind is for that returns an unsafe typed pointer.

Calling assumingMemoryBound bypasses a check. It doesn’t actually perform any binding on the compiler side, but it basically just assumes that it’s already been bound and that you have static knowledge that the memory is already bound as that type. If you do that but haven’t bound that memory, if you just allocate raw memory and then use that assumingMemoryBound(to:) method, then that ends up, I think becoming unsafe, undefined behavior, since you’re accessing memory as typed when it hasn’t actually been bound to that type. Hopefully that answers your question.

Q: I was hoping to clarify the very last example, you said that you shouldn’t escape that pointer, but in your case, you use it in the very next line in the same function block. Is that okay, or are you saying even that’s escaping?

A: Even that’s escaping. When you use implicit pointer conversion, when you’re using a variable that’s passed using an ampersand to something that’s expecting a mutable pointer, which is how I was able to call the UnsafePointer initializer, that is escaping. If you use it after the execution of that very thing that you pass it to. When you pass it to a C function, C functions generally don’t return… There’s no way to get it back out of a C function, but passing it to the initializer is one easy way to accidentally escape a pointer. Also, returning a pointer out of one of the withUnsafe things.

The tricky part is that it’ll work most of the time. That example actually works in a playground. You can run it, it updates the value of the age variable, but if you do it on a computed property, the compiler won’t complain at you, but it won’t work the way you’re expecting. There are all sorts of ways that it doesn’t work in weird cases, and then because it’s undefined behavior, the compiler’s allowed to optimize out anything you do later on with that pointer. If I ran it, if I compiled it with optimizations turned on, it could just delete that line, where I say hPointer.pointee = 10, and then my age value doesn’t actually update. Sure. Anyone else?

Alright, well thank you so much for your time.

Nate Cook

An independent web and application developer, he works on projects of all sizes, from websites and blogs for nonprofits to customized enterprise applications. He is also the managing editor of NSHipster, where he writes weekly about obscure topics in Objective-C, Swift, and Cocoa.