Which assembler to use? This question gets answered in many different ways, for example:
"try them all and use the one you prefer"
“there are two assembler formats available for AVR. 1. The Atmel official syntax. 2. The syntax used by the gnu assembler avr-as, as part of winavr, and also the Atmel Studio C compiler.”
"Tutorials, classes, and books are likely to use Atmel syntax."
The GNU assembler produces relocatable object files (.o) that can be linked with code written in other languages, but they carry no absolute addresses and are normally linked with startup code.
The perennial pitfall here is flash addressing: Atmel chose word addressing, while GCC, being generic across many processors, always uses byte addressing (as the lowest common denominator).
The Atmel-syntax assemblers produce binary output that cannot be linked with other code (at least, not easily), but it IS easy to put pieces of code at desired absolute addresses (handy if you're writing a bootloader).
Different assemblers use different syntax as mentioned above. Here is the same command on three different assemblers.
1. rjmp PC -1 ….Avra
2. rjmp -1 ….Gavrasm
3. rjmp .-1 ….Gas
A very simple Arduino program to turn the light emitting diode (led) on.
Assemble with command: avra -l ledon ledon.S
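.DEVICE ATMEGA328p
        LDI R17, 0b11111111   ; all eight bits set
        OUT 0x04, R17         ; 0x04 is DDRB: make every port B pin an output
        OUT 0x05, R17         ; 0x05 is PORTB: drive every port B pin high
stop:   rjmp stop             ; loop here forever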
Assembler listing (ledon) which was produced with the -l ledon switch in the avra command above.
.DEVICE ATMEGA328p
C:000000 ef1f   LDI R17, 0b11111111
C:000001 b914   OUT 0x04, R17
C:000002 b915   OUT 0x05, R17
C:000003 cfff   stop: rjmp stop

Segment usage:
   Code      :         4 words (8 bytes)
   Data      :         0 bytes
   EEPROM    :         0 bytes

Assembly completed with no errors.
:020000020000FC
:080000001FEF14B915B9FFCF81
:00000001FF
ledon.S.hex is the Intel HEX file, which is plain text. It is passed to the programming software, avrdude. Avrdude processes the Intel HEX file and sends the extracted binary code to the Arduino. The first line is an extended segment address record (type 02), which is not needed here; the middle line contains the program bytes; the third line is the end-of-file marker.
Notice that the two bytes of each word are swapped between the assembler listing and the hex file: the listing shows EF1F at address 0, while the hex file has 1FEF.
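The byte swap can be reproduced with a few lines of Python. This is only a sketch: the helper name words_to_hex_bytes is made up for illustration, and it assumes the opcodes are stored low byte first, which is what the listing and the hex record above show.

```python
import struct

def words_to_hex_bytes(words):
    """Pack 16-bit opcodes low byte first and return them as uppercase hex text."""
    return b"".join(struct.pack("<H", w) for w in words).hex().upper()

# The four opcodes exactly as they appear in the assembler listing
listing_words = [0xEF1F, 0xB914, 0xB915, 0xCFFF]

# Prints the data field of the middle line of the hex file
print(words_to_hex_bytes(listing_words))  # 1FEF14B915B9FFCF
```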
Data record types:
00 A record containing data and the 2-byte address at which the data is to reside.
01 A termination record for a file of Hex-records. Only one termination record is allowed per file and it must be the last line of the file. There is no data field.
02 A segment base address record.
My present understanding: the compiler generates a HEX file, which is transferred to the MCU's flash exactly as it was generated, untouched, and it works there as desired.
No, that's wrong. The .hex file is a textual representation of the binary it represents, looking like:
:100000000C9461000C947E000C947E000C947E0095
:100010000C947E000C947E000C947E000C947E0068
:100020000C947E000C947E000C947E000C947E0058
:100030000C947E000C947E000C947E000C947E0048
:100040000C949A000C947E000C947E000C947E001C
Each line includes the start delimiter, a length byte, an address word, a record type, and a checksum in addition to the data (which is in printable hex). It is NOT AT ALL a “binary” file.
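To make the fields concrete, here is a minimal Python sketch that splits one of the lines above into its parts. HexRecord and parse_record are hypothetical names chosen for this example, not part of avrdude or any other tool. The address field is read high byte first, as the Intel HEX format defines it.

```python
from typing import NamedTuple

class HexRecord(NamedTuple):
    length: int     # number of data bytes
    address: int    # 16-bit address field
    rtype: int      # record type (00 data, 01 EOF, ...)
    data: bytes     # payload
    checksum: int   # last byte of the line

def parse_record(line: str) -> HexRecord:
    """Split one Intel HEX text line into its fields (no checksum verification)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])          # printable hex -> real bytes
    length = raw[0]
    address = int.from_bytes(raw[1:3], "big")
    rtype = raw[3]
    data = raw[4:4 + length]
    checksum = raw[4 + length]
    return HexRecord(length, address, rtype, data, checksum)

rec = parse_record(":100010000C947E000C947E000C947E000C947E0068")
print(rec.length, hex(rec.address), rec.rtype, rec.data.hex(), hex(rec.checksum))
```

The 43 text characters of that line carry only 16 bytes that end up in flash, which is the point being made: the .hex file is a description of the binary, not the binary itself.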
In the case of Arduino, the .hex file is converted to binary during the “upload” process, by the avrdude program used for uploading. Other programming schemes may upload the .hex format directly, and have it converted to binary by the bootloader in the destination processor.
AFAIK, Arduino doesn't use .BIN files anywhere. The compiler produces .o files, the linker combines those into .elf files, objcopy converts/extracts to .hex files, and avrdude uploads from the .hex file. The only reason to create a .bin file would be if you wanted to look at the instructions in binary/hex, all by themselves. All of the other formats include “extra” values that are not actually programmed into the AVR chip.
Convert back to code form with objdump:
avr-objdump -SC Blink.cpp.elf
It's best to use the .elf file, which contains type and symbol information that objdump uses. You can run objdump on a .hex or .bin file, but you have to tell it which sub-type of AVR you are using, and you won't get symbolic names.
(Something like avr-objdump -D -bbinary -mavr *.bin )
First off, understand that this .hex file is in Intel Hex32 format. This means it can support 32 bit wide address memory devices. But the format is broken up into an upper 16 bits and a lower 16 bits. The upper 16 bits are known as the extended address.
Every line begins with a colon. The characters that follow are hexadecimal digits that encode control fields, an address, or data bytes. Each line in the file has this format: :BBAAAATTDD...CC
BB contains the number of data bytes on the line.
AAAA is the address in memory where the bytes will be stored, written high byte first. It is a byte address, so it is double the word address that the Atmel tools use, which I’ll cover in a bit.
TT is the record type. 00 means program data. 01 means End of File (EOF). 04 means extended address: the data value is the upper 16 bits of the address.
DD is actual data bytes which contain the machine code that your program created. There can be numerous bytes in one line. The BB value indicates how many bytes are included in the line.
CC is the checksum used for error detection. It is the two's complement of the low byte of the sum of all the preceding bytes on the line: BB + AAAA + TT + DD.
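The checksum rule is easy to check by hand or in code. The following Python sketch uses two hypothetical helpers, record_checksum and is_valid; it sums every byte before the checksum, keeps the low 8 bits, and takes the two's complement.

```python
def record_checksum(body: bytes) -> int:
    """Two's complement of the low byte of the sum of all bytes before the checksum."""
    return (-sum(body)) & 0xFF

def is_valid(line: str) -> bool:
    """True if the last byte of the record matches the computed checksum."""
    raw = bytes.fromhex(line.strip()[1:])   # drop the leading ':'
    return record_checksum(raw[:-1]) == raw[-1]

# 02 + 00 + 00 + 04 + 00 + 00 = 0x06, and the two's complement of 0x06 is 0xFA
print(hex(record_checksum(bytes.fromhex("020000040000"))))  # 0xfa
print(is_valid(":080000001FEF14B915B9FFCF81"))              # True
```

An equivalent check is that all bytes of a valid record, checksum included, add up to a multiple of 256.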
So let’s look at the first line. :020000040000FA
02 indicates the number of bytes on the line. In this case there are two bytes.
0000 is the address field. In a data record this is the byte address, which is double the word address, so normally we would divide it by 2. In an 04 record the address field is not used and is always 0000.
04 means this is the extended address. So the data contains the upper 16 bits of the address. Any lower bit addresses that follow this line will use these upper bits until a new 04 line changes the upper bits.
0000 this is the upper 16 bits of the address indicated by the 04 data type.
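How the upper 16 bits carry over to the lines that follow can be sketched in Python. The generator walk_addresses is a hypothetical helper written for this note; it handles only record types 00, 01 and 04 and does not verify checksums. The second example uses a made-up 04 record with an upper value of 0001 purely to show the effect.

```python
def walk_addresses(lines):
    """Yield (absolute_byte_address, data) for each type 00 record."""
    upper = 0                                   # upper 16 bits, set by type 04 records
    for line in lines:
        raw = bytes.fromhex(line.strip()[1:])
        length, rtype = raw[0], raw[3]
        offset = int.from_bytes(raw[1:3], "big")  # lower 16 bits from the line itself
        data = raw[4:4 + length]
        if rtype == 0x04:
            upper = int.from_bytes(data, "big")   # stays in force until the next 04 record
        elif rtype == 0x00:
            yield (upper << 16) | offset, data
        elif rtype == 0x01:
            break                                 # end of file

for addr, data in walk_addresses([":020000040000FA",
                                  ":080000001FEF14B915B9FFCF81",
                                  ":00000001FF"]):
    print(hex(addr), data.hex())

# Hypothetical file whose data lands above the first 64K bytes
for addr, data in walk_addresses([":020000040001F9",
                                  ":080000001FEF14B915B9FFCF81",
                                  ":00000001FF"]):
    print(hex(addr), data.hex())
```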
Anyway, your fault is the perennial fault when dealing with flash addressing. Atmel chose to use word addressing, while GCC (being generic across many processors) always uses byte addressing as the lowest common denominator.
While programmers always work in hex on this occasion think in decimal for a moment. Just consider what the number 0x3800 actually is. In decimal it is 14,336. Yet this is supposed to be an address near the “top” of a 32K (32,768) byte micro! 14K is a whole lot less than 32K so the number simply is not right for GCC. In fact to convert Atmel's 0x3800 number from word to byte you just double it to 0x7000. In decimal that is 28,672 which sounds a much more plausible number for 4K short of 32,768.
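The same arithmetic as a small Python sketch. The function names word_to_byte and byte_to_word are invented for this example; the only assumption is the one stated above, that one flash word is two bytes.

```python
def word_to_byte(word_addr: int) -> int:
    """Atmel-style word address -> GCC-style byte address."""
    return word_addr * 2

def byte_to_word(byte_addr: int) -> int:
    """GCC-style byte address -> Atmel-style word address (must be even)."""
    if byte_addr % 2:
        raise ValueError("an odd byte address does not start a flash word")
    return byte_addr // 2

addr = word_to_byte(0x3800)
print(hex(addr), addr)            # 0x7000 28672
print(32768 - addr)               # 4096, i.e. 4K short of the top of a 32K part
print(hex(byte_to_word(0x7000)))  # 0x3800
```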
Is this why the address in the Intel HEX output is double the requested address, i.e. .org 200 becomes address 400 in the Intel HEX file?
This is the second time today I've explained this, but:
: 10 0000 00 0C9434000C944F000C944F000C944F00 4F
which says write as follows:
0x0000: 0C  0x0001: 94  0x0002: 34  0x0003: 00
0x0004: 0C  0x0005: 94  0x0006: 4F  0x0007: 00
0x0008: 0C  0x0009: 94  0x000A: 4F  0x000B: 00
0x000C: 0C  0x000D: 94  0x000E: 4F  0x000F: 00
those bytes in flash will hold those values - without a doubt. However, when you apply power to an AVR it makes a 16bit opcode fetch which is little-endian. It therefore reads:
0x0000: 0C 0x0001: 94
as 0x940C and passes it to the instruction decoder which identifies it as a JMP opcode. This further triggers logic to make it make a second fetch as part of this same opcode and this time it picks up:
0x0002: 34 0x0003: 00
which again it reads as little endian and therefore treats as WORD address 0x0034 (which is BYTE address 0x0068). So after reading the first 4 bytes in two 16bit chunks it has determined it has “JMP 0x0068” and sets PC to be that address. (really 0x0034 in fact - it's just GCC that interprets things byte wise to be common across all architectures).
So the bytes are programmed in exactly the order they appear in the .hex at increasing byte addresses but when the AVR core makes fetches (two bytes, 16bits at a time) it treats them as little endian using the lower addressed of the two bytes as the lower part of each 16bit fetch. So it's not when programming hex that little endianness comes into play - it's when the data/code is later fetched back from the flash that the endianness issue applies - hence my reply above - sorry if it was confusing.
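The fetch sequence described above can be mimicked in Python. This is a rough sketch, not a model of the real core: fetch_word and decode_jmp are hypothetical helpers, and the bit layout used for JMP (1001 010k kkkk 110k followed by a 16-bit word) is taken from the AVR instruction set manual.

```python
import struct

# The 16 data bytes of the record, at increasing byte addresses
flash = bytes.fromhex("0C9434000C944F000C944F000C944F00")

def fetch_word(flash: bytes, word_addr: int) -> int:
    """Read one 16-bit word; the lower-addressed byte is the low half."""
    return struct.unpack_from("<H", flash, word_addr * 2)[0]

def decode_jmp(flash: bytes, word_addr: int):
    """Return the JMP target as a WORD address, or None if it is not a JMP."""
    op = fetch_word(flash, word_addr)
    if op & 0xFE0E != 0x940C:                    # mask out the k bits, compare the pattern
        return None
    high = (((op >> 4) & 0x1F) << 1) | (op & 1)  # upper 6 bits of the target
    low = fetch_word(flash, word_addr + 1)       # second fetch: lower 16 bits
    return (high << 16) | low

target = decode_jmp(flash, 0)
print(hex(fetch_word(flash, 0)))  # 0x940c
print(hex(target))                # 0x34, the word address held in PC
print(hex(target * 2))            # 0x68, the byte address GCC tools display
```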
On AVRs, the PC register contains just enough bits to address the flash of the device. In a 32K AVR like this there are 14 active bits in the PC register and they can address from 0x0000 to 0x3FFF (word) or 0x0000 to 0x7FFE (bytes). When the PC holds 0x3FFF (word) and executes the 0xFFFF (NOP) opcode from that location, it increments and wraps from 0x3FFF to 0x0000. Lo and behold! That just happens to be where prog.c is located, so it now starts to execute the prog.c program just as if it had started opcode fetching at 0x0000, not 0x3800.
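The wrap-around is plain modular arithmetic on a 14-bit counter, as this Python sketch shows. PC_BITS, next_pc and words_until_wrap are names invented here, and the sketch assumes every location from the start address to the top of flash holds a single-word opcode that simply falls through.

```python
PC_BITS = 14                      # 32K bytes = 16K words = 2**14
PC_MASK = (1 << PC_BITS) - 1      # 0x3FFF

def next_pc(pc: int) -> int:
    """Increment the program counter, keeping only the active bits."""
    return (pc + 1) & PC_MASK

def words_until_wrap(start: int) -> int:
    """Count fall-through instructions executed from start until PC is back at 0."""
    pc, steps = start, 0
    while True:
        pc = next_pc(pc)
        steps += 1
        if pc == 0:
            return steps

print(hex(next_pc(0x3FFF)))       # 0x0
print(words_until_wrap(0x3800))   # 2048 words, i.e. 4K bytes of fall-through
```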
Without the C runtime support (crt*.o), you’ll need to tell the linker that the entry point is inside your program instead of the C initialization routines. The command-line option -e for avr-ld is what you’re looking for: https://quanttype.net/posts/2014-01-27-avr-gcc-assembler.html
avr-gcc -c -mmcu=atmega328p -o foo.o foo.S
avr-ld -o foo.elf foo.o
avr-objcopy -O ihex foo.elf foo.hex

avr-gcc -c -mmcu=atmega328p -o foo.o foo.S
avr-ld -e init -o foo.elf foo.o
avr-objcopy -O ihex foo.elf foo.hex