
Beginner’s Guide to Solidity Yul Assembly | Storage

You don’t need to learn assembly to become a good Solidity Smart Contract developer, but you do need it to become a great one.

If you stay in the industry long enough, sooner rather than later you will come across assembly code in a Smart Contract, and if you’re completely clueless about it, you might be in trouble.

Solidity’s assembly language is called Yul, and it is incredibly powerful. The power comes from the fact that there are no safeguards in assembly: we bypass Solidity’s checks completely and get direct access to storage and memory. This can be used for gas savings, for example, but it also makes it easy to write dangerous and unpredictable logic.

In the next few articles, I will teach you just enough Yul assembly to get comfortable reading and writing it for those rare use cases.

How to use assembly

To use assembly, just insert an assembly block inside your Solidity function:

function addOne(uint256 x) external pure returns (uint256) {
    assembly {
        // Yul code goes here
    }
}

Ok, but what do we write inside? First, a small intro to Yul types.

Types in Yul

Yul only works with 32-byte words; it doesn’t have Solidity types such as bool, uint16, uint32, string, etc.
However, it can read from and write to Solidity variables, and it understands literals of these types, for example:

assembly {
    let a := 0x1234
    let b := true
    let b1 := false
    let c := "hello world"
    let d := 0xabcdef
}
Here I’m assigning Solidity literals to assembly variables. As far as Yul assembly is concerned, these are the values it is working with internally.
a = 0x0000000000000000000000000000000000000000000000000000000000001234
b = 0x0000000000000000000000000000000000000000000000000000000000000001
b1 = 0x0000000000000000000000000000000000000000000000000000000000000000
c = 0x68656c6c6f20776f726c64000000000000000000000000000000000000000000
d = 0x0000000000000000000000000000000000000000000000000000000000abcdef

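These are all 32-byte words, but we see 64 characters above because each byte is written as two hexadecimal digits. Notice also that numbers are padded with zeros on the left, while the string is left-aligned and padded with zeros on the right.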
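To see these words for yourself, you can return assembly values as raw bytes32 words. This is a minimal sketch assuming Solidity 0.8.x; the contract and function names are made up. It also shows that a literal without the 0x prefix is decimal, so 1234 becomes 0x4d2 in the word:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract: exposes Yul literals as raw 32-byte words
contract YulLiterals {
    function literals() external pure returns (bytes32 a, bytes32 b, bytes32 c) {
        assembly {
            a := 1234          // decimal literal: 0x...04d2 as a word
            b := true          // 0x...01
            c := "hello world" // left-aligned bytes, zero-padded on the right
        }
    }
}
```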

Basic operations

Yul’s syntax is quite small (see the docs), and you’ll notice right away that everything is a function call, even arithmetic and logical operations.
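For example, the addOne skeleton from earlier could be completed like this. It is a minimal sketch where x + 1 becomes the call add(x, 1); note that, unlike checked Solidity 0.8 arithmetic, add silently wraps around on overflow:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract AddOne {
    function addOne(uint256 x) external pure returns (uint256 result) {
        assembly {
            // arithmetic is a function call in Yul, with no overflow check
            result := add(x, 1)
        }
    }
}
```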

Let’s see a full example of checking whether a number is a prime number:
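A minimal sketch of such a function, assuming trial division by every candidate divisor from 2 up to halfX = x / 2 + 1, could look like this. The contract name is made up, and the explicit check that rejects 0 and 1 is an addition of this sketch:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract PrimeCheck {
    function isPrime(uint256 x) public pure returns (bool p) {
        p = true;
        assembly {
            // 0 and 1 are not prime
            if lt(x, 2) { p := 0 }

            // halfX = x / 2 + 1, written with the add and div functions
            let halfX := add(div(x, 2), 1)

            // try every divisor i in [2, halfX)
            for { let i := 2 } lt(i, halfX) { i := add(i, 1) } {
                // remainder 0 means i divides x, so x is not prime
                if iszero(mod(x, i)) {
                    p := 0
                    break
                }
            }
        }
    }
}
```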

It is different from Solidity, but it is still readable. Notice the assembly way of calculating the halfX value: addition and division have to be written as the functions add and div.
Then we have a for loop, which looks fairly normal and is readable enough.
Finally, we use iszero to check whether the modulo operation returns 0. The iszero function obviously checks if a value is 0, but it is far more useful in assembly than in high-level languages.

Truthy values

Because there is no boolean type in Yul assembly, all operations return full 32-byte words, including comparisons such as lt, gt and eq, and the bitwise operations and, or and not.
For example, lt(x, 2) checks whether x is less than 2. It returns a word of all 0s for false, and a word that is all 0s except for a 1 in the last bit (the value 1) for true.
So with the iszero function from above, we are actually checking whether something is false. To check if the condition above is false, we write iszero(lt(x, 2)).

This is very useful in combination with the if statement, because if statements have no else in assembly. Instead of an else branch, we write a second if whose condition is wrapped in iszero.
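For example, a function returning the larger of two numbers can pair each if with its iszero counterpart. This is a minimal sketch with a made-up contract name:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract MaxCheck {
    function max(uint256 x, uint256 y) external pure returns (uint256 maximum) {
        assembly {
            // "if" branch: x < y, so y is the maximum
            if lt(x, y) { maximum := y }
            // "else" branch: the same condition negated with iszero
            if iszero(lt(x, y)) { maximum := x }
        }
    }
}
```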

The ‘not’ operator also exists in Yul, but it performs negation on the bit level (it flips all 0s to 1s and vice versa). This means that not(1) is still truthy, because the result, 0xfff...ffe, has every bit set except the last one. To negate a condition, use iszero instead.
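A small sketch (hypothetical contract name) that makes the difference visible by returning both results side by side:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract NotVsIszero {
    function negations() external pure returns (uint256 bitwiseNot, uint256 logicalNot) {
        assembly {
            bitwiseNot := not(1)    // every bit flipped: 0xff...fe, still non-zero (truthy)
            logicalNot := iszero(1) // 0: the negation you usually want for conditions
        }
    }
}
```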

Storage variables

Yul can access local function variables directly by name, but it cannot read or write the values of storage variables directly. This is not a scoping issue; it is simply how assembly works, since storage has its own dedicated way of being accessed.
To get and set storage variables, we use three main tools:
.slot - gives the storage slot (location) of a variable, for example x.slot
sload - reads the 32-byte value stored in a given slot: sload(slot)
sstore - writes a 32-byte value to a given slot: sstore(slot, value)

This is what this looks like in Solidity:
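A minimal sketch, assuming a single uint256 state variable x, which lands in slot 0. getXYul is the name used in the text; setXYul and getSlotYul are hypothetical names for this illustration:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract StorageBasics {
    uint256 x = 2; // first state variable, so it lives in slot 0

    // x.slot gives the slot number, sload reads the word stored there
    function getXYul() external view returns (uint256 r) {
        assembly {
            r := sload(x.slot)
        }
    }

    // sstore overwrites the whole 32-byte word in x's slot
    function setXYul(uint256 newVal) external {
        assembly {
            sstore(x.slot, newVal)
        }
    }

    // Reading any slot is harmless (storage is publicly readable anyway);
    // never expose the matching write of an arbitrary slot
    function getSlotYul(uint256 slot) external view returns (uint256 r) {
        assembly {
            r := sload(slot)
        }
    }
}
```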

In getXYul, we use x.slot to get the storage slot location and pass it into sload to read the value. The same works with any slot number, not just the slot of a named variable.
But keep in mind: never let a function write an arbitrary value to an arbitrary slot. Imagine someone discovering the slot of the owner variable of your Smart Contract and being able to change it!!!

Accessing packed storage variables

Storage is split up into 32-byte words (slots), and one slot can contain one uint256, two uint128s or 32 uint8s. So if you have the following variables:

uint128 public C = 4;
uint8 public D = 6;
uint24 public E = 8;

Together they take 128 + 8 + 24 = 160 bits, which is less than 256, so all of them will be packed into one single 32-byte slot. This means that C.slot, D.slot and E.slot will all return the same value.

One new tool that is super useful here is .offset. As the name suggests, it returns the position (offset) of the variable, in bytes, from the start of the slot, counting from the rightmost (lowest-order) end.
I can run the following function to get all 3 offsets:

function getOffsets() external pure returns (uint256 offsetC, uint256 offsetD, uint256 offsetE) {
    assembly {
        offsetC := C.offset
        offsetD := D.offset
        offsetE := E.offset
    }
}

I will get the response:

offsetC: 0
offsetD: 16
offsetE: 17

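C is the first defined variable, so it occupies the start of the word (the rightmost, lowest-order bytes) with offset 0. D is second, and its offset is equal to the size of C, which is 128 / 8 = 16 bytes. The 128 is divided by 8 because the offset is measured in bytes, while the size of uint128 is given in bits. E comes right after the single byte of D, so its offset is 16 + 1 = 17.
Because the shift operations work in bits, we must multiply the offset by 8 later when we read and write.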

How to read storage variables in Yul assembly

Let’s see how we can read and write the variable D in the middle.
This might seem like too much work for something as simple as reading, but we won’t have to do this often.
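The sketch below shows one way to write both the read and the write, using the C, D and E variables declared above. The names readD and writeD are made up, and instead of hardcoding the bitmask it builds it with not(shl(...)), which yields a word of all f’s with 00 at D’s position:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract PackedStorage {
    uint128 public C = 4;
    uint8 public D = 6;
    uint24 public E = 8;

    // Read D: load the shared word, shift D down, mask off E
    function readD() external view returns (uint256 d) {
        assembly {
            let fullWord := sload(D.slot)                       // C, D and E together
            let shifted := shr(mul(D.offset, 8), fullWord)      // offset is bytes, shr needs bits
            d := and(0xff, shifted)                             // keep only D's 8 bits
        }
    }

    // Write D: clear its byte, shift the new value into place, merge, store
    function writeD(uint8 newD) external {
        assembly {
            let c := sload(D.slot)
            let mask := not(shl(mul(D.offset, 8), 0xff))        // all f's except 00 at D
            let clearedD := and(c, mask)                        // C and E survive
            let shiftedNewD := shl(mul(D.offset, 8), newD)      // move newD to D's position
            sstore(D.slot, or(clearedD, shiftedNewD))
        }
    }
}
```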

The read works like this. First, we load the whole word, containing the C, D and E values, into the variable fullWord.
We know D is in the middle, 128 bits from the start when looking from right to left. So in the second step, we shift the whole word to the right until D reaches the lowest byte. The right shift simply pushes C out on the right and fills in zeros from the left.
Finally, we read the value of D by performing an “and” operation with 0xff.
This works because and(value, 0xff) keeps only the lowest 8 bits of value and clears everything above them. For example, after the shift our word is 0x0806 (E = 8, D = 6), and and(0x0806, 0xff) = 0x06. We must do this to extract only the D value, because E is still in the word.
We use 0xff with two “f”s because that is the size of D, uint8. A uint8 has 8 bits, and 0xff is 11111111 in binary.

How to write Solidity variables in Yul assembly

The challenge with writing is that we need to preserve all other variables in the same word, C and E, and only update the value of D, in the middle of the word. For this, we use so-called bitmasks.
A bitmask is a 32-byte hexadecimal number that we combine with a full 32-byte word using bitwise operations, usually “and”, “or” and “xor”. Why are they useful?

  • and(0xab, 0x00) = 0x00 – for clearing bits
  • or(0xa0, 0x0b) = 0xab – for setting bits
  • xor(0x0F, 0xFF) = 0xF0 – for flipping bits
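The three operations above can be checked with a tiny sketch (hypothetical contract and return names):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract BitmaskDemo {
    function ops() external pure returns (uint256 cleared, uint256 setBits, uint256 flipped) {
        assembly {
            cleared := and(0xab, 0x00) // and with 0s clears bits: 0x00
            setBits := or(0xa0, 0x0b)  // or with 1s sets bits: 0xab
            flipped := xor(0x0f, 0xff) // xor with 1s flips bits: 0xf0
        }
    }
}
```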

These examples don’t mean much out of context, so let’s see how we can use them for our variable writing problem by examining the write function above.
The first step is to clear the old value of D in c, the full word loaded from the slot (not to be confused with the variable C):
let clearedD := and(c, 0xffffffffffffffffffffffffffffff00ffffffffffffffffffffffffffffffff)

Above we have a bitmask of all “f”s except for two “0”s at the position of D. We know the position of D from its offset value, and since it won’t change, it’s fine to hardcode it.
With the “and” operation we are clearing the old value of D while keeping all the rest, C and E.
0xffffffffffffffffffffffffffffff00ffffffffffffffffffffffffffffffff
AND
0x0000000000000000000000000000080600000000000000000000000000000004
=
0x0000000000000000000000000000080000000000000000000000000000000004

The second step is to write the new value of D to that exact spot that we just cleared. If the newD value is 7, in the assembly world of 32-byte words that is represented like this:

0x0000000000000000000000000000000000000000000000000000000000000007
The 7 sits at the end of the 64-digit word, but we need it in the middle of the word, at the position of D. That’s why we apply “shl”, the shift left operation, to it, moving it left all the way to the position of D:
let shiftedNewD := shl(mul(D.offset, 8), newD)
Again we must multiply the D offset by 8 to convert it from bytes to bits.

Finally, we perform a bitwise “or” to place the shifted newD at the position of D:
clearedD => 0x0000000000000000000000000000080000000000000000000000000000000004
OR
shiftedNewD => 0x0000000000000000000000000000000700000000000000000000000000000000
=
result => 0x0000000000000000000000000000080700000000000000000000000000000004

In the result we have E as 8, the new D as 7, and C at the end as 4.

Then all that’s left is storing the full word back in its slot with sstore.

Accessing storage mappings and arrays

We have no idea how much storage a mapping or dynamic array will take, because they can store an unlimited number of values. So what do we do?

Accessing fixed arrays

For fixed-size arrays, we know the length in advance, so it’s easy. The array gets a starting slot, and an array of length N occupies N consecutive slots beginning there (assuming 32-byte elements such as uint256; smaller types are packed several per slot). This is how we access the individual members of a fixed array.

If we want to access the element at index 3, we simply take the array slot (the starting point) and add 3 to it. That is the slot of the element at index 3 (the fourth element, since indexes start at 0), so we just read that slot.
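A minimal sketch of that lookup, assuming a uint256[5] state variable in slot 0; the contract and function names are hypothetical:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract FixedArray {
    // occupies slots 0..4, one uint256 per slot
    uint256[5] arr = [uint256(10), 20, 30, 40, 50];

    function readFixed(uint256 index) external view returns (uint256 r) {
        assembly {
            // starting slot + index; no bounds check, so index >= 5
            // would silently read whatever lies in the following slots
            r := sload(add(arr.slot, index))
        }
    }
}
```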

Accessing dynamic arrays

For dynamic arrays, we can’t reserve slots for the values up front because we have no idea how many there will be. So the idea is to move the starting location to a place in storage where it is very unlikely to collide with another array or mapping.
First, we read the slot of the array in assembly. Then, outside of assembly, we calculate the keccak256 hash of the slot number encoded as a 32-byte value, keccak256(abi.encode(slot)). This gives us a place in storage that is far away from everything else and has enough room to store all the dynamic array values.

That is our starting point and the slot for index 0. To access other elements, we simply add the index to the starting location.

It is important to mention that the array’s own slot (array.slot) does hold a value: the length of the dynamic array. This value is updated whenever elements are added to or removed from the array.
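A minimal sketch of these steps, assuming a uint256[] state variable named arr in slot 0 that is filled in the constructor; the contract and function names are hypothetical. It reads arr.slot in assembly, hashes it outside of assembly, then adds the index:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract DynamicArray {
    uint256[] arr; // slot 0 holds the length

    constructor() {
        arr.push(10);
        arr.push(20);
        arr.push(30);
    }

    // The array's own slot stores its length
    function lengthYul() external view returns (uint256 len) {
        assembly {
            len := sload(arr.slot)
        }
    }

    function readDynamic(uint256 index) external view returns (uint256 r) {
        uint256 slot;
        assembly {
            slot := arr.slot
        }
        // element 0 lives at keccak256 of the 32-byte encoded slot number
        bytes32 start = keccak256(abi.encode(slot));
        assembly {
            // no bounds check against the length here
            r := sload(add(start, index))
        }
    }
}
```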

Accessing mappings

Mappings add their own little twist to the story because their indexes can be arbitrary keys, whereas with arrays they’re always numbers.

The logic is similar to accessing dynamic arrays: we take a keccak256 hash of something in order to place the mapping’s values in a region of storage with enough room. This might seem a bit hacky or unreliable, but it works: ordinary state variables occupy small slot numbers starting from 0, whereas a keccak256 hash is a pseudo-random 256-bit number, so the chance of it landing on an occupied slot is negligible.
But back to our mapping: here we need to hash the key of the mapping together with the mapping’s own slot, keccak256(abi.encode(key, slot)) for value-type keys. This stores even seemingly close keys really far away from each other, but that is by design: under a key of a mapping we can nest a whole other mapping, a dynamic array or something else huge, so this logic works for all types of mappings.
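A minimal sketch with hypothetical names, assuming uint256 keys and values. getMapping hashes the key together with the mapping slot; getNested applies the same rule twice, because the inner mapping myNested[key1] behaves like a mapping whose slot is the first hash:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract MappingAccess {
    mapping(uint256 => uint256) public myMap;
    mapping(uint256 => mapping(uint256 => uint256)) public myNested;

    constructor() {
        myMap[7] = 700;
        myNested[1][2] = 12;
    }

    // myMap[key] lives at keccak256(abi.encode(key, myMap.slot))
    function getMapping(uint256 key) external view returns (uint256 r) {
        uint256 slot;
        assembly {
            slot := myMap.slot
        }
        bytes32 location = keccak256(abi.encode(key, slot));
        assembly {
            r := sload(location)
        }
    }

    // Hash once for the outer key, then again for the inner key
    function getNested(uint256 key1, uint256 key2) external view returns (uint256 r) {
        uint256 slot;
        assembly {
            slot := myNested.slot
        }
        bytes32 inner = keccak256(abi.encode(key1, slot));
        bytes32 location = keccak256(abi.encode(key2, inner));
        assembly {
            r := sload(location)
        }
    }
}
```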

Summary

I will wrap things up here for this first article on Yul assembly.
To summarize, we learned how to perform basic operations by checking whether a number is prime, and then spent more time reading and writing storage variables.
Finally, we discovered how mappings and arrays are laid out in storage and how they can be accessed.
In the next one, we’ll focus on working with memory.
