Back when I was a kid in the 80s with my Commodore VIC-20 and later C64, I had become a whiz
at writing code in BASIC, much of which I learned by typing in other BASIC programs
from books and magazines. But as I began to acquire more commercial software, when I would
attempt to “list” the program, all I would see was a single line of BASIC with a SYS
command and some numbers. I was mystified at how this worked. It was the equivalent
of magic to me. Later on, I was told that these programs were written in machine language,
or as we later called it, ML. This wasn’t just a Commodore thing. In fact, with most 8-bit
systems, 99% of commercial software and games were written in machine language. This would
include your favorite games on the Atari 2600, the Nintendo Entertainment System, Apple II
series, and so forth.
So, machine language was not the exception, in fact it was the norm. And in this episode,
I hope to shed a little bit of light on what machine language actually is. I’m not going
to teach you how to code in machine language as that would probably take like 10 hours
and be pretty dry and boring. But I do hope to shed a little bit of light on what machine
language actually is as compared to other programming languages. And before I tell you
what machine language is, I want to start by telling you what it isn’t.
I’ll start by showing you this little demonstration program I wrote in BASIC. I actually used
this in a previous episode as a benchmark, and I think it works well here too. This is
about as optimized as I can make it running here in BASIC on a 1 MHz machine. Now, back
in the day we had this program for the Commodore 64 called Blitz! Basic. And I think it was
also known as Austrospeed.
At the time, us common folk were under the impression this would compile your BASIC program
into machine language. But, that’s not actually true. Don’t get me wrong, the software never
claimed to do this, rather it claimed to convert the code into something called P-Code. And
P-Code is just another interpreted language but it happens to be much more optimized for
speed. If you try to list a program after compiling it, it will just say Blitz! Like
this. So the code is no longer editable as BASIC, and I think that’s why the common
belief was that it was machine language.
But anyway, as you can see, if you compare the original program written in BASIC to the
one that was Blitzed, clearly the Blitz version runs much faster. It’s still nowhere near
as fast as machine language. With Blitz!, typical results can be anywhere from twice
as fast to maybe 8 times as fast depending on what instructions you are using.
So, how does this work? Well, for illustration, let’s take a look at this sample BASIC program,
and let’s see how Blitz would optimize it. Well, for starters, any REM statements can
be eliminated. I mean, these are only here as comments in the code, but they take up
space and slightly slow down execution. And since the program can’t be edited anymore,
we might as well just delete them.
The next thing we can do is have a look at where all of the BASIC lines are actually
stored in RAM and make note of their actual memory addresses. And then we can look at
anything that points to a line number, such as a GOTO, GOSUB, or even a NEXT statement,
and figure out what it is they are pointing at. Normally, this is done during execution,
which takes time to look up the line number and find its memory address. So, to speed
things up, you can actually replace the line numbers with actual memory addresses, so no
lookups will be done during execution. And, of course, we no longer need the line numbers,
so they can actually be removed from the code completely.
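To make that idea concrete, here is a minimal sketch in Python. It is not Blitz's actual P-Code format, which isn't shown here. The toy program, the `resolve_jumps` name, and the "EXEC"/"JUMP" tags are all made up for illustration, and list positions stand in for real memory addresses.

```python
# Hypothetical toy program: (line_number, statement) pairs.
program = [
    (10, "REM FLASH THE BORDER"),
    (20, "POKE 53280,0"),
    (30, "POKE 53280,1"),
    (40, "GOTO 20"),
]

def resolve_jumps(program):
    """Drop REM lines and replace GOTO line numbers with direct positions.

    Assumes no GOTO targets a REM line; a real tool would have to handle that.
    """
    kept = [(num, text) for num, text in program if not text.startswith("REM")]
    # Look up every line number once, ahead of time, instead of on every jump.
    position_of = {num: index for index, (num, _) in enumerate(kept)}
    resolved = []
    for _, text in kept:
        if text.startswith("GOTO "):
            target_line = int(text[5:])
            resolved.append(("JUMP", position_of[target_line]))
        else:
            resolved.append(("EXEC", text))
    return resolved

resolved = resolve_jumps(program)
```

After this step the jump holds a position rather than a line number, so nothing has to be searched for while the program runs, and the line numbers themselves are gone.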
OK, so here’s another thing that could use optimization. Numbers like this stored in
BASIC are actually stored as raw text characters, a string if you will. So, a number like 53280
is actually stored as 5 separate bytes. You can actually see this if you look at a BASIC
program in a machine language monitor. Well, the trouble is, these numbers have to be converted
from strings to numbers during execution. So again, we can go ahead and convert them
now, that way it saves time when the program is executing.
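As a rough illustration of that saving, here is a small Python sketch. The `parse_digits` helper is hypothetical, and packing the result into a two-byte integer is an assumption made to keep the example simple. It is not a claim about how Blitz or Commodore BASIC store numbers internally.

```python
def parse_digits(text):
    """Convert digit characters to a number, one character at a time.

    This is the kind of work an interpreter repeats every time it meets
    a constant that is stored as text.
    """
    value = 0
    for ch in text:
        value = value * 10 + (ord(ch) - ord("0"))
    return value

raw = b"53280"                        # five bytes of text in the program
value = parse_digits(raw.decode("ascii"))

# Assumption for illustration: store the converted value as a two-byte
# little-endian integer so no conversion is needed at run time.
packed = value.to_bytes(2, "little")
```

Here the five text bytes shrink to two, and the digit-by-digit loop runs once at conversion time instead of every time the line executes.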
And so, these are the types of enhancements that Blitz or similar programs would do to
speed up your BASIC programs, but they are still essentially BASIC programs, they are
still being interpreted at time of execution, definitely not machine language. But there
isn’t necessarily anything wrong with this, if you were good at programming in BASIC,
then this certainly gave you an option for some extra speed.
So, then, what is machine language? Well, this is the language that is native to your
microprocessor. This is the only language it knows. And every single piece of software
you run on your computer, whether it is a vintage computer, or a modern computer, has
to be in machine language or your CPU won’t know how to execute it. Which means, if you
are writing code in some other language it will need to be converted into machine language.
There are two ways this can happen.
Languages like BASIC, Java, Python, Ruby, and Perl are all interpreted languages. That
means they need a separate piece of software called an interpreter, or sometimes called
a run-time-environment, which will look at your code one instruction at a time, and then
convert that into machine language instructions for your CPU. This interpreter must always
be present for your code to work. Interpreted code also tends to be the slowest type of
code in terms of performance, but also tends to have the easiest learning curve.
On the flip side, languages like C++, Pascal, C#, Cobol, and Assembly language are run through
a compiler, which actually transforms the code into machine language permanently. Once
this process is finished, you no longer need the compiler for the program to run, although
you might need it if you ever want to make changes to the code. These types of programs
tend to have much better performance than interpreted languages, but as mentioned before
might have a bit higher learning curve.
Now, before you race to the comment section to tell me how I’m wrong about Java or C#,
please understand that I’m making a generalization here and I do understand it isn’t quite
this black and white. The point is, it has to be converted to machine language somewhere
along the way.
So, what does machine language look like? Well, at its core it’s just a bunch of
numbers. This small program is an example program that flashes the border on a Commodore
128. The first number here is a CPU command, or instruction, or Op-code, if you will. It
commands the CPU to increment a number somewhere in memory. And the next two numbers are the
memory address that needs to be incremented. The next number is another instruction, this
time telling the CPU to Jump, or goto some other place in memory to continue execution.
And of course, the two numbers following that instruction represent the memory address it’s
going to jump to.
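Here is a small Python sketch of reading those six numbers the way the CPU does. The exact bytes aren't written out above, so this assumes the standard 6502 encodings for INC absolute ($EE) and JMP absolute ($4C), the border register at $D020, and a program loaded at $1300. The `disassemble` function is a toy that only knows these two op-codes.

```python
# Assumed machine code for the border-flashing loop, loaded at $1300.
code = bytes([0xEE, 0x20, 0xD0, 0x4C, 0x00, 0x13])

# Only the two op-codes this program uses; a real 6502 has many more.
OPCODES = {0xEE: "INC", 0x4C: "JMP"}

def disassemble(code):
    """Turn opcode + two address bytes into readable text."""
    lines = []
    i = 0
    while i < len(code):
        mnemonic = OPCODES[code[i]]
        # The 6502 stores addresses low byte first, then high byte.
        address = code[i + 1] | (code[i + 2] << 8)
        lines.append(f"{mnemonic} ${address:04X}")
        i += 3  # both instructions here are exactly three bytes long
    return lines

listing = disassemble(code)
```

Note that the address bytes appear "backwards" in memory: 20 then D0 for $D020, and 00 then 13 for $1300.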
This is actually how early coding was done on computers like the Altair with its row
of switches on the front panel, or computers like the KIM-1.
Now, if that seems confusing to you and you’re thinking “how on earth would anyone write
complex programs in machine language?” Well, the reality is, nobody did. Or at least not
since the 1970s. And I gave you this example, not to scare you away, but rather to help
differentiate between machine language and assembly language. These two terms are often
used interchangeably, and while they are closely related, they’re not exactly the same thing.
It’s sort of like the difference between music, and musical score. They are very much
related, and there is a one-to-one ratio between the notes you hear and what is written on
the paper. But they aren’t exactly the same thing.
And since I can’t think of a really good way to explain it, I figure the best thing
to do is just show you. But first, I want to write a small program in BASIC. This program
also flashes the border. And I think this will help illustrate a few things. It’s
just going to count from 0 to 255 and then write that number to the screen border register.
And then we’re just going to repeat this indefinitely. Now, this looks like it’s
going pretty fast. In fact, the border is changing faster than the screen is drawn,
that’s why the border appears more than one color in places. So, that’s pretty fast
right?
Well, now let’s write the same program in assembly language using the built in monitor.
So, we’re going to start our program at memory location 1300 hex. We’re going to
increase the color value in the screen border. So, this D020 here is the exact same number
as this 53280 we used in BASIC. Just one is represented in decimal and the other one in
hexadecimal. Also, see these 3 numbers here, these are the actual machine code that is
being stored in RAM. The stuff to the right is just the human-readable form of that, which
we call assembly.
Anyway, our next instruction is JMP, which is similar to the GOTO statement in BASIC.
And we’re going to jump to 1300, which takes us right back here in an infinite loop. So,
this program is actually simpler and smaller than the BASIC code we wrote. OK, so let’s
exit the monitor back to BASIC and I’ll start the code by typing SYS, which in Commodore
BASIC calls a machine language program, and the address is 4864, which is the same as
1300 hex. And so, there it is. So, this is actually running so fast that the border is
changing every few pixels of drawing the screen, making the thing look like this weird pattern.
So, I’m going to give you one more example program here. Again, we’ll start in BASIC.
And what we’re going to do is count from 0 to 255 again, and we’re going to display
all 256 characters on the screen. Screen memory starts at memory address 1024. And that should
do it. And let’s see how long it takes to execute this code. OK, about 3 or 4 seconds.
So, that’s not too bad, right?
OK, so now let’s do the same code in the monitor. This is just a few lines so I’ll walk you
through the code. Now, don’t worry if you don’t understand all of this, I think you’ll
get the gist of it. LDX means Load X, here with zero. This just sets the value of the X register
to zero. Next, TXA means Transfer X to the accumulator. Now STA means to store the accumulator
at 400, which is the start of screen RAM in hexadecimal. The comma-X at the end means
we’ll add the value of X to the address before the save. INX means to increase X by
one. Then we’re going to compare X to zero. The reason we do zero is that if the value
is 255 and you increase by one, it will wrap around to zero again since this is an 8-bit
system. Now that we’ve compared, we’ll tell the CPU to branch if not equal back to
1302. If it is equal, then RTS will tell it to return back to BASIC.
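If it helps, here is the same routine written out as a Python sketch, with each line tagged by the instruction it imitates. This is only a model of the logic: the `fill_screen` name and the bytearray standing in for memory are assumptions, and real hardware obviously doesn't run Python.

```python
def fill_screen():
    memory = bytearray(65536)       # stand-in for the 64K address space
    x = 0                           # LDX #$00
    while True:
        a = x                       # TXA
        memory[0x0400 + x] = a      # STA $0400,X
        x = (x + 1) & 0xFF          # INX (an 8-bit register wraps 255 -> 0)
        if x == 0:                  # CPX #$00
            break                   # BNE $1302 falls through when equal
    return memory                   # RTS

memory = fill_screen()
```

The `& 0xFF` is what makes the wrap-around trick work: after writing character 255, X becomes zero again and the loop ends, having written exactly 256 bytes starting at $0400.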
So, there we go, pretty simple. So, let’s execute the code. And as you can see, it is
so fast that it’s actually finished before I even lift my finger off of the return key.
So, I’ll show you that again, one more time.
So now that you’ve seen assembly language, I’d love to show you this little scene from
the Terminator. What you are seeing here is assembly language, and more specifically 6502
assembly language from an Apple II. Not many people today would recognize this.
So, why is Assembly so much faster than BASIC? Well, as mentioned before, BASIC is interpreted.
So for every BASIC command you see here, hundreds of machine language instructions are happening
to carry it out. As such, a program in machine language will run at least a hundred times
faster than anything you can write in BASIC.
Now, it is at this point I think I should stop and mention to you that there is more
than one kind of machine language. Now, when you write code in like C or Python, there’s
a good chance that code is going to run on any computer with any CPU. But, machine language
is very dependent on the CPU that you are coding for. So, code written on a 6502, for
example, will not run on a Z80. They are very different languages. But for the purposes
of this video, I’m going to stick with the 6502 because I think it’s one of the simplest
to understand.
The 6502 CPU has 151 op-codes. But, it’s really not that many in practice. In fact,
you only need to remember 56 instructions, because many op-codes are just variations of
the same instruction. For example, take the command LDA, which means to load a number into
the accumulator. You can tell it to load a specific
number like 42, or you can tell it to load the number from a specific memory address
like this. As you can see, looking over to the left here, the actual CPU op-codes are
different, but as a human, you don’t really need to know the difference as your assembler
will take care of figuring out which op-code to use.
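A tiny Python sketch of that choice is below. The op-code values $A9 (LDA immediate) and $AD (LDA absolute) are the standard 6502 ones, but `assemble_lda` and `parse_number` are hypothetical helpers. This toy also ignores the other LDA addressing modes, such as zero page, that a real assembler would pick from.

```python
LDA_IMMEDIATE = 0xA9  # LDA #value  : load the number itself
LDA_ABSOLUTE = 0xAD   # LDA address : load whatever is stored at the address

def parse_number(text):
    """Accept "$D020" style hex or plain decimal like "42"."""
    if text.startswith("$"):
        return int(text[1:], 16)
    return int(text)

def assemble_lda(operand):
    """Pick the op-code from the way the operand is written."""
    if operand.startswith("#"):
        value = parse_number(operand[1:])
        return bytes([LDA_IMMEDIATE, value])
    address = parse_number(operand)
    # Addresses are emitted low byte first, then high byte.
    return bytes([LDA_ABSOLUTE, address & 0xFF, address >> 8])

load_number = assemble_lda("#42")       # one mnemonic...
load_from_memory = assemble_lda("$D020")  # ...two different op-codes
```

Same three letters in the source, but the first comes out as two bytes starting with A9 and the second as three bytes starting with AD.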
Now, what I’ve shown so far has been using a machine language monitor, which allows you
to look at and modify machine code. But you wouldn’t want to really write any sort of
code in this that was longer than just a few lines. Because you’d have to remember a
ton of memory addresses for every sub-routine. But the worst part is there is no easy way
to insert a line of code. Generally you can only overwrite a line.
That’s where using an assembler comes in handy. So, here’s some of my source code for my
recent game, Attack of the PETSCII Robots. As you can see, this is written in a text
editor, and looks far more friendly, with subroutines that have friendly names like
“search object” or “display player sprite.” And this is where an assembler makes life
so much easier. So, how does this work?
Well, here’s a very short little example program written in a notepad. Now, after assembling
the program I’m going to open it up in the machine language monitor and see what it looks
like. So, the first thing I want to mention is that the actual instructions are preserved
one to one. There’s absolutely no difference here. However, our friendly label such as
“screen RAM” has been converted to the actual memory address for us. Likewise, the
labels I used for the loop, have also been converted into the actual memory address,
which of course corresponds to the exact place in the code where it is supposed to point
to. And, of course, if you just wanted to see the resulting machine code itself, you
can look at that as well. And so I hope this better explains the difference between this,
which is machine code or machine language, and this, which is assembly language. They
are really just two different ways of looking at the same thing. And while back in the early
days, some coders did have to write software in machine language, the vast majority used
an assembler to write code that is more human-readable.
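The label handling can be sketched in a few lines of Python. The example program on screen isn't reproduced here, so this reuses the screen-fill routine from earlier as a stand-in, and the origin of $1300, the `SCREEN` and `LOOP` names, and the `resolve_labels` function are all assumptions. It only resolves labels and produces a listing; it doesn't emit the op-code bytes.

```python
# Size in bytes of each instruction as used below (assumed addressing modes).
SIZES = {"LDX": 2, "TXA": 1, "STA": 3, "INX": 1, "CPX": 2, "BNE": 2, "RTS": 1}

# (label, mnemonic, operand)
source = [
    (None,   "LDX", "#$00"),
    ("LOOP", "TXA", ""),
    (None,   "STA", "SCREEN,X"),
    (None,   "INX", ""),
    (None,   "CPX", "#$00"),
    (None,   "BNE", "LOOP"),
    (None,   "RTS", ""),
]

def resolve_labels(source, origin, constants):
    # Pass 1: walk the code and record the address of every label.
    symbols = dict(constants)
    address = origin
    for label, mnemonic, _ in source:
        if label:
            symbols[label] = address
        address += SIZES[mnemonic]
    # Pass 2: swap each friendly name for its actual address.
    listing = []
    address = origin
    for _, mnemonic, operand in source:
        for name, value in symbols.items():
            operand = operand.replace(name, f"${value:04X}")
        listing.append(f"{address:04X}  {mnemonic} {operand}".rstrip())
        address += SIZES[mnemonic]
    return listing

listing = resolve_labels(source, 0x1300, {"SCREEN": 0x0400})
```

The instructions come through unchanged, one for one. Only the names are replaced, which is why inserting a line is painless in an assembler: the addresses are simply recalculated on the next run.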
So, one thing you might be wondering about, is why everything in assembly language uses
hexadecimal? So, I want to explain a few things about it. For those that don’t know, it
is a numbering system that is based on 16 digits instead of 10, and it just adds A,B,C,D,E,
and F for the extra 6 digits. The reason this is handy is because when dealing with byte
sized numbers, a single hex digit can represent 4 bits of your byte, which is actually called
a nybble by the way. So, 9A is much easier to read than the decimal form of 154.
A possibly better example would be looking at memory addresses, which is something you
do a lot in assembly language. In decimal, the 64K memory range of a CPU would be represented
from 0 to 65535. But, in hex it is a much cleaner 0 to FFFF. But where things get really
interesting is when looking at a memory address in terms of bytes. So here’s the same address
in Hex and Decimal. You might recognize this as the border color address on a Commodore
64, which you can use a POKE statement to change the border color. Well, in assembly
language this address has to be represented by exactly two bytes. And in hexadecimal,
those two bytes are D0, and 20. Which, makes total sense. But in decimal, those two bytes
are 208 and 32. Which doesn’t appear to make any sense at all. In fact, you have to
do some math to actually see that this is right. You’ll need to multiply 208 by 256,
which gives you 53248, and then add in your 32 from the lower byte and then you get your
final number. So, as you can see, programmers find it much easier just to deal with things
like this in hexadecimal.
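The arithmetic above is easy to check with a few lines of Python. Nothing here is specific to any assembler; it just splits the border color address into its two bytes and prints them both ways.

```python
address = 53280                      # border color address in decimal

# Split into high and low bytes: address = high * 256 + low.
high, low = divmod(address, 256)

decimal_bytes = (high, low)                        # hard to relate to 53280
hex_bytes = (f"{high:02X}", f"{low:02X}")          # visibly the two halves
hex_address = f"{address:04X}"

# Putting it back together the long way, as described above.
rebuilt = high * 256 + low
```

In hex the two bytes are literally the first and second half of the written address, which is the whole appeal.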
And of course, if they use hexadecimal on Logopolis, that’s all the proof you need
that it’s the best system for computers!
9A, E7, E9, sorry, E9, 23, wait wait, did you say E9? Look it says E7 there! You’re
right!
One last thing about hexadecimal I wanted to mention is how it is represented. I’ve
been showing it with a dollar sign in front of it like this, which is the normal way to
identify it in 6502 assembly language, PASCAL, Delphi, and Forth. But, another way to represent
it is to put a zero X in front of it like this, which is common in 8088 assembly language,
C++, C#, Java, Python, and many others. The numbers work exactly the same, it’s just
a different way of designating it as hex in the compiler.
Assembly language in and of itself is not that difficult. In fact, there are certain
things that are actually easier to do in assembly language such as graphics programming. But
there are some tradeoffs. And there are certain things that are kind of a pain. For example,
working with strings. That’s a nightmare to work with in assembly language. However,
I think the thing that often makes assembly language more difficult is the fact that you
have to know how the hardware works that you’re coding for.
With assembly language, and in the example of the 6502 processor, you have your 56 instructions,
and all of these more or less have to do with moving numbers around in memory. So, how do
you actually do things like modify things on the screen or make music, for example?
Well, every chip in the computer presents itself as memory of some sort to the CPU.
So, in the C64 for example, the video chip presents itself as 47 different memory addresses
starting from $D000 to $D02E. When it is not memory, we tend to call them registers. But
from the perspective of the CPU, it can’t tell the difference. So, when we write a number
to the border color register, for example, the CPU just thinks it is just updating a
number in memory, but what is really happening is that it’s updating a register in the video chip.
So, as a programmer, you’d have to make yourself familiar with all of the registers
in every chip on the computer and how they work.
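Here is a minimal Python sketch of that idea. The `Bus` and `VideoChip` classes are hypothetical, and the model is heavily simplified: it only routes the 47 addresses from $D000 to $D02E mentioned above and ignores everything else a real C64 does in that region.

```python
class VideoChip:
    BASE = 0xD000
    LAST = 0xD02E

    def __init__(self):
        # One slot per register: $D000 through $D02E inclusive.
        self.registers = [0] * (self.LAST - self.BASE + 1)


class Bus:
    """What the CPU sees: one flat range of addresses."""

    def __init__(self):
        self.ram = bytearray(65536)
        self.video = VideoChip()

    def write(self, address, value):
        if VideoChip.BASE <= address <= VideoChip.LAST:
            # Same store instruction, but the byte lands in the chip.
            self.video.registers[address - VideoChip.BASE] = value & 0xFF
        else:
            self.ram[address] = value & 0xFF

    def read(self, address):
        if VideoChip.BASE <= address <= VideoChip.LAST:
            return self.video.registers[address - VideoChip.BASE]
        return self.ram[address]


bus = Bus()
bus.write(0xD020, 2)    # "POKE 53280,2": ends up in the video chip
bus.write(0x0400, 1)    # ordinary write: ends up in RAM
```

From the caller's side both writes look identical. Only the address decides whether the number goes to memory or changes what the hardware does.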
And of course, when you move from one computer, like the C64 over to another computer such
as the Apple II, even though they use the same processor, everything else is completely
different. That’s why it can be incredibly difficult to port a game from one machine
to another.
I think that’s why people are often confused when they see that I’ve written these games
like Planet X3, or Attack of the PETSCII Robots, and they want me to make a version for their
favorite computer. Now even going between systems that share the same CPU, it’s still
a pretty big undertaking.
In fact, there’s a small team of Apple II guys that have been working to convert PETSCII
Robots over to the Apple II and as you can see, it is mostly working now. Keep in mind
the Apple II has the same CPU as Commodore systems, so a huge amount of the code is re-usable
between these systems. But it’s still a massive undertaking to re-write all of the
audio and video sub-routines.
But then when people suggest that I make a version of my game for a system that uses a
different CPU, for example asking me to make an Amiga or ZX Spectrum version or something
like that, at that point it becomes a complete rewrite.
So, it’s no small task. I mean, if it took me a year to write the game in the first place,
then it would take me another year to basically write the whole game again from scratch. And
I think this is where a lot of people are spoiled by modern programming languages because
if you write something in C or Python, fundamentally you can expect that same code is probably
going to run across multiple different CPUs on multiple different systems, on multiple
different memory configurations.
Machine language basically goes one direction. You can compile a program from a higher level
language into machine language. Or you can write in machine language to begin with, if
you want. But you can’t go back the other direction. Much like you can convert notes
on a sheet of music into a symphony. It’s much more difficult to convert the symphony
back into notes. Another example would be scrambling an egg. It’s easy to do it one
direction, but unscrambling is a different story.
Today, Assembly language is more or less a lost art. Modern compilers are pretty good
at creating optimized machine code from higher level languages. And computers are also just
so darned fast that any speed loss you might have is pretty much negligible these days.
I mean, even operating systems like the Linux kernel are written in C. So, assembly language
today doesn’t really offer that much benefit for the modern coder. But if you want to write
games for computers that were made in the 1970s or 80s, learning machine language is pretty
much a must. And if you do learn it, you’ll probably wind up finding out a whole lot more
information about how computers actually work.
OK, one last thing I want to mention, if you’re looking around at the new studio and you’re
thinking “well gee, David, this studio looks pretty boring compared to your old one!” Well,
it’s actually not done yet. The problem is the old studio has already been dismantled
so I didn’t have anywhere to film, so it had to be done in here. But yeah, I’m going
to do a whole other follow up video in another couple of weeks here hopefully, showing the
finishing up of the interior of the studio and it will look hopefully quite a bit different.
So anyway, thanks for watching!