Aarch64 Loop Task

Task Objective

This task introduced us to a 64-bit architecture family and covers the basics of its instruction set that we need to write a program. We will use the Aarch64 architecture to create a loop that runs 31 times. Each iteration prints a value beside it that indicates how many times the loop has run. I will go through each step we took to reach the final solution.

Display the word “Loop:” 31 times

The first goal of our group was to display the word “Loop:” 31 times, for counts 0 to 30. In the previous posts, we discussed that to loop in a program we need to create a label, run some code after the label, compare the accumulator with an immediate value or a value stored in a memory location (cmp, CoMPare accumulator), and use a branch instruction (bne, branch if not equal, or beq, branch if equal). As it turns out, the pattern is the same in Aarch64, except that cmp can compare any general-purpose register rather than a single accumulator.

.text
.globl _start
_start:
 	mov	x25,0	/*used to store loop counter*/
	adr	x1,msg		/*used to print out msg*/
	mov	x2,len		/*store length of msg in bytes*/

loop:	
	mov	x0, 1           /* file descriptor 1 is stdout */
	mov	x8, 64          /*write is syscall #64 */
	svc     0          	/* invoke syscall */		
	add	x25, x25, 1
	cmp	x25, 31
	b.ne	loop 	
	mov     x0, 0     	/* status -> 0 */
	mov     x8, 93    	/* exit is syscall #93 */
	svc     0          	/* invoke syscall */
.data
msg: 	.ascii      "Loop:\n"
len= 	. - msg
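
For readers who find C easier to follow, the control flow of the listing above can be sketched roughly as follows. This is only a model of the assembly, not part of the original solution: the function name print_loop and its parameters are made up for illustration, and the comments map each statement to the instruction it stands in for.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical C model of the assembly loop; print_loop is not from the post. */
int print_loop(int fd, const char *msg, size_t len)
{
    assert(fd >= 0 && msg != NULL);

    int counter = 0;                         /* mov x25, 0 */

    do {                                     /* loop: */
        ssize_t written = write(fd, msg, len); /* mov x0,1 ; mov x8,64 ; svc 0 */
        (void)written;                       /* the listing ignores the result too */
        counter++;                           /* add x25, x25, 1 */
    } while (counter != 31);                 /* cmp x25, 31 ; b.ne loop */

    return counter;                          /* number of times the body ran */
}
```

Calling print_loop(1, "Loop:\n", 6) writes the line to standard output 31 times. Because the comparison happens after the increment, the body runs for counter values 0 through 30.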

Registers in Aarch64

With the above code, we have successfully run our loop 31 times, from 0 to 30. The first three lines of the code tell the assembler and linker where our program starts: .text selects the code section, .globl _start exports the entry symbol, and _start: marks the entry point. After that we set up some registers to hold values: a loop counter in x25, a pointer to our message in x1, and the length of our string in x2. An x followed by a number names a register, and the x means that we are accessing the full 64-bit width of that register. Later in the post we will see register names with a different prefix. An important note: the general-purpose registers are numbered 0 to 30, and x20 and w20 refer to the same register. The prefix only tells the assembler whether we are accessing all 64 bits (x) or just the lower 32 bits (w).
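
The relationship between x20 and w20 can be modelled in C with a 64-bit and a 32-bit integer. This is a sketch rather than real register access, and read_w, write_w, and register_view_demo are hypothetical helper names. The model also shows one detail worth knowing when mixing the two forms: on Aarch64, writing to a w register clears the upper 32 bits of the corresponding x register.

```c
#include <assert.h>
#include <stdint.h>

/* Reading w20: only the low 32 bits of x20 are visible. */
uint32_t read_w(uint64_t x)
{
    return (uint32_t)x;
}

/* Writing w20: the new x20 is the 32-bit value zero-extended to 64 bits. */
uint64_t write_w(uint32_t w_new)
{
    return (uint64_t)w_new;
}

/* Small self-check showing both directions. */
void register_view_demo(void)
{
    uint64_t x20 = 0x1122334455667788ULL;

    assert(read_w(x20) == 0x55667788u);      /* upper half is ignored */

    x20 = write_w(0xAABBCCDDu);              /* like writing to w20 */
    assert(x20 == 0x00000000AABBCCDDULL);    /* upper half is now zero */
}
```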

Instructions in Aarch64

There are a couple of things I want to discuss regarding the Aarch64 and 6502 instruction sets. There are some differences between the two architectures, such as the names of instructions (mov with an immediate plays the role that lda did, and str or strb the role of sta), but the biggest difference I want to mention is that Aarch64 gives us more freedom in where we put our values. In 6502 we were limited to the accumulator and the x and y registers, and could only store those into a memory location. In Aarch64, we simply name the destination register and the immediate or source register (mov x20,10 places the immediate value 10 into register x20). It is also important to note how system calls work on Aarch64 Linux. We place the syscall number (64 for write and 93 for exit) in register x8 and the arguments in x0, x1, x2, and so on. After setting up these registers, we invoke the kernel with svc 0.
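
The register convention for system calls can be mirrored in C with the syscall() wrapper that the C library provides on Linux. This is an illustrative sketch: raw_write is a made-up name, and the symbolic constant SYS_write is used instead of the literal 64 because syscall numbers differ between architectures (64 is the Aarch64 Linux value).

```c
#define _DEFAULT_SOURCE          /* expose syscall() in <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: each argument is annotated with the register
 * that carries it in the assembly version. The kernel's result comes
 * back in x0, which is what syscall() returns here. */
long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return syscall(SYS_write,   /* x8: syscall number (64 on Aarch64 Linux) */
                   fd,          /* x0: file descriptor, 1 is stdout */
                   buf,         /* x1: address of the bytes to write */
                   len);        /* x2: number of bytes */
}
```

On success the return value is the number of bytes written, so raw_write(1, "Loop:\n", 6) should return 6.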

Defining Strings

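The last three lines of the first listing show how to define a string with a directive. Similar to how we used the dcb directive, here the .ascii directive places the ASCII characters of the string Loop:\n at the label msg, and len is computed as the distance from msg to the current location (.), which is the length of the string in bytes. At the top of the code, we load the address of msg into register x1 with adr.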
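
As a rough C analogy for the msg and len definitions (msg mirrors the label in the listing, while msg_len and string_length_demo are hypothetical helpers): .ascii emits exactly the characters given and no terminating zero byte, whereas a C string literal appends one, so the equivalent length is sizeof minus one.

```c
#include <assert.h>
#include <stddef.h>

/* msg: .ascii "Loop:\n"  -> six bytes: 'L' 'o' 'o' 'p' ':' '\n' */
static const char msg[] = "Loop:\n";

/* len = . - msg  -> bytes between msg and the current location.
 * sizeof also counts C's trailing '\0', so subtract it. */
size_t msg_len(void)
{
    return sizeof msg - 1;
}

void string_length_demo(void)
{
    assert(msg_len() == 6);
    assert(msg[msg_len() - 1] == '\n');   /* last byte the write call sends */
}
```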

The Next Step: Loop with index (single digit)

.text
.globl _start
_start:
 	mov	x25,0	/*used to store loop counter*/
	mov 	x26,0	/*used to store integer */
	mov	x22,0	/*used to store integer*/
	mov	x23,9	/*used to check if require 2 digits or just 1 digit*/
	adr	x1,msg		/*used to print out msg*/
	mov	x2,len		/*store length of msg in bytes*/

loop:	
	cmp	x25,x23
	b.le	display1Digit	/*counter <= 9: patch in a single digit*/
next:	
	mov	x0, 1            /* file descriptor 1 is stdout */
	mov	x8, 64          /*write is syscall #64 */
	svc     0          	/* invoke syscall */		
	add	x25, x25, 1
	add	x26, x26, 1	
	cmp	x25, 31
	b.ne	loop 	
	mov     x0, 0     	/* status -> 0 */
	mov     x8, 93    	/* exit is syscall #93 */
	svc     0          	/* invoke syscall */
 
display1Digit:
/*adding a space using ASCII*/
	add 	x27,x22,' '	
	adr	x20,msg+6
	strb	w27,[x20]
/*converting an integer to character*/
	add	x28,x26, '0'
	adr	x20, msg+7
	strb	w28,[x20]			
	b	next		/*plain branch; bl would also overwrite the link register x30*/

.data
msg: 	.ascii      "Loop: ##\n"
len= 	. - msg

Register Sizes

The display1Digit label converts the values in registers x22 and x26 into ASCII characters, stores them in registers x27 and x28, and writes those characters over the two # placeholders at the end of the string. The most important instructions here are the two stores, strb w27,[x20] and strb w28,[x20]. strb (store byte) writes the lowest byte of the first register to the memory address held in the register in brackets (it stores the low byte of w28 at the address in x20). A register name prefixed with w refers to only the lower 32 bits of the register. strb requires the w form for its source operand, and since a single character fits in one byte we do not need the full 64-bit register, so we use the prefix w rather than x for the register holding our integer converted to a character.
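
What display1Digit does to the message buffer can be sketched in C as below. The function names patch_single_digit and patch_demo are hypothetical, and the local variables are named after the registers only to make the mapping visible. The casts to uint8_t stand in for strb keeping just the lowest byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Overwrite the two '#' placeholders in "Loop: ##\n" for a value 0..9. */
void patch_single_digit(char *msg, uint64_t n)
{
    assert(n <= 9);

    uint64_t x27 = 0 + ' ';          /* add x27, x22, ' '  (x22 holds 0) */
    uint64_t x28 = n + '0';          /* add x28, x26, '0' */

    msg[6] = (char)(uint8_t)x27;     /* adr x20, msg+6 ; strb w27, [x20] */
    msg[7] = (char)(uint8_t)x28;     /* adr x20, msg+7 ; strb w28, [x20] */
}

void patch_demo(void)
{
    char buf[] = "Loop: ##\n";

    patch_single_digit(buf, 7);
    assert(strcmp(buf, "Loop:  7\n") == 0);   /* space, then the digit */
}
```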

The Final Step: Loop with Index (single and double digits)

The last step is to find a way to display a double-digit number. To do this we need to divide the current number by 10 to get the quotient, which is the first digit. Then we need the remainder, which is the second digit, and we have to place both at the end of our loop string. Here’s how I did it.

.text
.globl _start
_start:
 	mov	x25,0	/*used to store loop counter*/
	mov 	x26,0	/*used to store integer */
	mov	x22,0	/*used to store integer*/
	mov	x23,9	/*used to check if require 2 digits or just 1 digit*/
	mov	x21,10	/*used to store divisor*/
	mov	x24,0	/*hold quotient number*/
	adr	x1,msg		/*used to print out msg*/
	mov	x2,len		/*store length of msg in bytes*/

loop:	
	cmp	x25,x23
	b.gt 	display2Digit	/*counter > 9: needs two digits*/
	b	display1Digit	/*otherwise a single digit*/

next:	
	mov	x0, 1           /* file descriptor 1 is stdout */
	mov	x8, 64          /*write is syscall #64 */
	svc     0          	/* invoke syscall */		
	add	x25, x25, 1
	add	x26, x26, 1	
	cmp	x25, 31
	b.ne	loop 	
	mov     x0, 0     	/* status -> 0 */
	mov     x8, 93    	/* exit is syscall #93 */
	svc     0          	/* invoke syscall */
 
display1Digit:
/*adding a space using ASCII*/
	add 	x27,x22,' '	
	adr	x20,msg+6
	strb	w27,[x20]
/*converting an integer to character to stdout*/
	add	x28,x26, '0'
	adr	x20, msg+7
	strb	w28,[x20]			
	b	next		/*plain branch; bl would also overwrite the link register x30*/
display2Digit:
/*adding first digit*/
	udiv	x22,x26,x21
	add	x27,x22,'0'
	adr	x20,msg+6
	strb	w27,[x20]
/*adding second digit*/
	msub	x24,x22,x21,x26
	add 	x28,x24, '0'
	adr	x20, msg+7
	strb	w28,[x20]	
	b	next		/*plain branch; bl would also overwrite the link register x30*/
.data
msg: 	.ascii      "Loop: ##\n"
len= 	. - msg

Using Math to Obtain Double Digits

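The display2Digit block uses math to get the quotient and remainder for numbers greater than 9. udiv x22,x26,x21 divides the counter by 10 and keeps the quotient, which is the first digit. msub x24,x22,x21,x26 then computes x26 - (x22 * x21), the counter minus ten times the quotient, which is the remainder and therefore the second digit. Adding '0' to each turns it into an ASCII character, and strb writes both characters at the end of our loop string.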
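
The same arithmetic can be written out in C as a check on the reasoning. This is a sketch with hypothetical names (split_digits, split_demo), and the remainder is deliberately computed as n - q * 10 rather than with the % operator so that it mirrors what msub does.

```c
#include <assert.h>
#include <stdint.h>

/* Split a value 0..99 into two ASCII digit characters. */
void split_digits(uint64_t n, char *tens, char *ones)
{
    assert(n <= 99);

    uint64_t divisor   = 10;                        /* x21 */
    uint64_t quotient  = n / divisor;               /* udiv x22, x26, x21 */
    uint64_t remainder = n - quotient * divisor;    /* msub x24, x22, x21, x26 */

    *tens = (char)('0' + quotient);                 /* add x27, x22, '0' */
    *ones = (char)('0' + remainder);                /* add x28, x24, '0' */
}

void split_demo(void)
{
    char tens, ones;

    split_digits(27, &tens, &ones);
    assert(tens == '2' && ones == '7');

    split_digits(30, &tens, &ones);                 /* last value the loop prints */
    assert(tens == '3' && ones == '0');
}
```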

Conclusion

In conclusion, this task has shown me that there are a lot of similarities between the Aarch64 and 6502 architectures. The instructions may be named differently, the 64-bit architecture may offer more flexibility, such as b.lt (branch if less than), and it has more registers than the 6502, but the logic is the same.

Published by Danny Nguyen
