How Machine Language Works

3/22/2023 01:21:00 PM

Back when I was a kid in the 80s with my Commodore VIC-20 and later C64, I had become a whiz

at writing code in BASIC, much of which I learned by typing in other BASIC programs

from books and magazines. But as I began to acquire more commercial software, when I would

attempt to “list” the program, all I would see is a single line of BASIC with a SYS

command and some numbers. I was mystified at how this worked. It was the equivalent

of magic to me. Later on, I was told that these programs were written in machine language,

or as we later called it, ML. This wasn’t just a Commodore thing. In fact, with most 8-bit

systems, 99% of commercial software and games were written in machine language. This would

include your favorite games on the Atari 2600, the Nintendo Entertainment System, Apple II

series, and so forth.

So, machine language was not the exception, in fact it was the norm. And in this episode,

I hope to shed a little bit of light on what machine language actually is. I’m not going

to teach you how to code in machine language as that would probably take like 10 hours

and be pretty dry and boring. But I do hope to shed a little bit of light on what machine

language actually is as compared to other programming languages. And before I tell you

what machine language is, I want to start by telling you what it isn’t.

I’ll start by showing you this little demonstration program I wrote in BASIC. I actually used

this in a previous episode as a benchmark, and I think it works well here too. This is

about as optimized as I can make it running here in BASIC on a 1 MHz machine. Now, back

in the day we had this program for the Commodore 64 called Blitz! Basic. And I think it was

also known as Austrospeed.

At the time, us common folk were under the impression this would compile your BASIC program

into machine language. But, that’s not actually true. Don’t get me wrong, the software never

claimed to do this, rather it claimed to convert the code into something called P-Code. And

P-Code is just another interpreted language but it happens to be much more optimized for

speed. If you try to list a program after compiling it, it will just say Blitz! Like

this. So the code is no longer editable as BASIC, and I think that’s why the common

belief was that it was machine language.

But anyway, as you can see, if you compare the original program written in BASIC to the

one that was Blitzed, clearly the Blitz version runs much faster. It’s still nowhere near

as fast as machine language. With Blitz!, typical results can be anywhere from twice

as fast to maybe 8 times as fast depending on what instructions you are using.

So, how does this work? Well, for illustration, let’s take a look at this sample BASIC program,

and let’s see how Blitz would optimize it. Well, for starters, any REM statements can

be eliminated. I mean, these are only here as comments in the code, but they take up

space and slightly slow down execution. And since the program can’t be edited anymore,

might as well just delete them.

The next thing we can do is have a look at where all of the BASIC lines are actually

stored in RAM and make note of their actual memory addresses. And then we can look at

anything that points to a line number, such as a GOTO, GOSUB, or even a NEXT statement,

and figure out what it is they are pointing at. Normally, this is done during execution,

which takes time to look up the line number and find its memory address. So, to speed

things up, you can actually replace the line numbers with actual memory addresses, so no

lookups will be done during execution. And, of course, we no longer need the line numbers,

so they can actually be removed from the code completely.
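
The REM removal and line-number lookup described above can be sketched in Python. This is only an illustration of the idea, not how Blitz! actually lays out its P-Code: the `program` tuples, the `resolve_jumps` helper, and the made-up `base` and `size` values are all assumptions.

```python
# Hypothetical toy program: (line_number, keyword, argument) tuples.
program = [
    (10, "POKE", "53280,0"),
    (20, "REM", "flash the border"),
    (30, "POKE", "53280,1"),
    (40, "GOTO", "10"),
]


def resolve_jumps(lines, base=0x0801, size=8):
    """Drop REM lines, then swap GOTO/GOSUB line numbers for addresses.

    base and size are invented for illustration: every kept statement is
    pretended to occupy `size` bytes, starting at `base`.
    A jump that targets a removed REM line is not handled in this sketch.
    """
    kept = [line for line in lines if line[1] != "REM"]

    # Pass 1: note where each surviving line lives in memory.
    address_of = {num: base + i * size for i, (num, _, _) in enumerate(kept)}

    # Pass 2: replace line-number targets with those addresses,
    # and leave the line numbers themselves out of the result.
    out = []
    for _, keyword, arg in kept:
        if keyword in ("GOTO", "GOSUB"):
            arg = address_of[int(arg)]
        out.append((keyword, arg))
    return out


compiled = resolve_jumps(program)
for keyword, arg in compiled:
    print(keyword, arg)
```

The point of doing this ahead of time is that the lookup in `address_of` happens once, rather than every time the GOTO is reached while the program runs.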

OK, so here’s another thing that could use optimization. Numbers like this stored in

BASIC are actually stored as raw text characters, a string if you will. So, a number like 53280

is actually stored as 5 separate bytes. You can actually see this if you look at a BASIC

program in a machine language monitor. Well, the trouble is, these numbers have to be converted

from strings to numbers during execution. So again, we can go ahead and convert them

now, that way it saves time when the program is executing.
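
Here is a rough Python sketch of that text-versus-number difference. It assumes a plain 16-bit integer to keep things short (Commodore BASIC itself works with floating-point values), and `parse_at_runtime` is a hypothetical stand-in for the work the interpreter repeats each time the line runs.

```python
import struct

# How the literal sits inside a BASIC line: five separate character bytes.
text_form = b"53280"

# The same value converted once, ahead of time: two bytes, low byte first.
binary_form = struct.pack("<H", int(text_form.decode("ascii")))


def parse_at_runtime(digits: bytes) -> int:
    """Digit-by-digit conversion an interpreter has to redo on every pass."""
    value = 0
    for ch in digits:
        value = value * 10 + (ch - ord("0"))
    return value


print(len(text_form), "bytes as text,", len(binary_form), "bytes as a number")
print(parse_at_runtime(text_form))
```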

And so, these are the types of enhancements that Blitz or similar programs would do to

speed up your BASIC programs, but they are still essentially BASIC programs, they are

still being interpreted at time of execution, definitely not machine language. But there

isn’t necessarily anything wrong with this, if you were good at programming in BASIC,

then this certainly gave you an option for some extra speed.

So, then, what is machine language? Well, this is the language that is native to your

microprocessor. This is the only language it knows. And every single piece of software

you run on your computer, whether it is a vintage computer, or a modern computer, has

to be in machine language or your CPU won’t know how to execute it. Which means, if you

are writing code in some other language it will need to be converted into machine language.

There are two ways this can happen.

Languages like BASIC, Java, Python, Ruby, and Perl are all interpreted languages. That

means they need a separate piece of software called an interpreter, or sometimes called

a run-time-environment, which will look at your code one instruction at a time, and then

convert that into machine language instructions for your CPU. This interpreter must always

be present for your code to work. Interpreted code also tends to be the slowest type of

code in terms of performance, but also tends to have the easiest learning curve.

On the flip side, languages like C++, Pascal, C#, Cobol, and Assembly language are run through

a compiler, which actually transforms the code into machine language permanently. Once

this process is finished, you no longer need the compiler for the program to run, although

you might need it if you ever want to make changes to the code. These types of programs

tend to have much better performance than interpreted languages, but as mentioned before

might have a bit higher learning curve.

Now, before you race to the comment section to tell me how I’m wrong about Java or C#,

please understand that I’m making a generalization here and I do understand it isn’t quite

this black and white. The point is, it has to be converted to machine language somewhere

along the way.

So, what does machine language look like? Well, at its core it’s just a bunch of

numbers. This small program is an example program that flashes the border on a Commodore

128. The first number here is a CPU command, or instruction, or Op-code, if you will. It

commands the CPU to increment a number somewhere in memory. And the next two numbers are the

memory address that needs to be incremented. The next number is another instruction, this

time telling the CPU to Jump, or goto some other place in memory to continue execution.

And of course, the two numbers following that instruction represent the memory address it’s

going to jump to.
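
To make the "opcode, then two address bytes" pattern concrete, here is a small Python sketch that decodes such a program. The byte values are an assumption, since the numbers on screen are not in the transcript: $EE and $4C are the 6502 opcodes for INC (absolute) and JMP (absolute), and the $1300 start address is borrowed from the monitor example later on.

```python
# Assumed byte values for the border-flash program described above.
code = bytes([0xEE, 0x20, 0xD0,   # INC $D020
              0x4C, 0x00, 0x13])  # JMP $1300

# Only the two opcodes this program uses; both are followed by a 2-byte address.
MNEMONICS = {0xEE: "INC", 0x4C: "JMP"}


def disassemble(data):
    """Turn the raw numbers back into human-readable instructions."""
    lines = []
    i = 0
    while i < len(data):
        opcode, low, high = data[i], data[i + 1], data[i + 2]
        address = high * 256 + low  # the 6502 stores the low byte first
        lines.append(f"{MNEMONICS[opcode]} ${address:04X}")
        i += 3
    return lines


for line in disassemble(code):
    print(line)
```

Notice that the address bytes are stored low byte first, so $D020 shows up in memory as 20 followed by D0.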

This is actually how early coding was done on computers like the Altair with its panel

of switches on the front, or computers like the KIM-1.

Now, if that seems confusing to you and you’re thinking “how on earth would anyone write

complex programs in machine language?” Well, the reality is, nobody did. Or at least not

since the 1970s. And I gave you this example, not to scare you away, but rather to help

differentiate between machine language and assembly language. These two terms are often

used interchangeably, and while they are closely related, they’re not exactly the same thing.

It’s sort of like the difference between music, and musical score. They are very much

related, and there is a one-to-one ratio between the notes you hear and what is written on

the paper. But they aren’t exactly the same thing.

And since I can’t think of a really good way to explain it, I figure the best thing

to do is just show you. But first, I want to write a small program in BASIC. This program

also flashes the border. And I think this will help illustrate a few things. It’s

just going to count from 0 to 255 and then write that number to the screen border register.

And then we’re just going to repeat this indefinitely. Now, this looks like it’s

going pretty fast. In fact, the border is changing faster than the screen is drawn,

that’s why the border appears more than one color in places. So, that’s pretty fast

right?

Well, now let’s write the same program in assembly language using the built in monitor.

So, we’re going to start our program at memory location 1300 hex. We’re going to

increase the color value in the screen border. So, this D020 here is the exact same number

as this 53280 we used in BASIC. Just one is represented in decimal and the other one in

hexadecimal. Also, see these 3 numbers here, these are the actual machine code that is

being stored in RAM. The stuff to the right is just the human-readable form of that, which

we call assembly.

Anyway, our next instruction is JMP, which is similar to the GOTO statement in BASIC.

And we’re going to jump to 1300, which takes us right back here in an infinite loop. So,

this program is actually simpler and smaller than the BASIC code we wrote. OK, so let’s

exit the monitor back to BASIC and I’ll start the code by typing SYS, which in Commodore

BASIC calls a machine language program, and the address is 4864, which is the same as

1300 hex. And so, there it is. So, this is actually running so fast that the border is

changing every few pixels of drawing the screen, making the thing look like this weird pattern.

So, I’m going to give you one more example program here. Again, we’ll start in BASIC.

And what we’re going to do is count from 0 to 255 again, and we’re going to display

all 256 characters on the screen. Screen memory starts at memory address 1024. And that should

do it. And let’s see how long it takes to execute this code. OK, about 3 or 4 seconds.

So, that’s not too bad, right?

OK, so now let’s do the same code in the monitor. This is just a few lines so I’ll walk you

through the code. Now, don’t worry if you don’t understand all of this, I think you’ll

get the gist of it. LDX means load X, here with zero. This just sets the value of the X register

to zero. Next, TXA means Transfer X to the accumulator. Now STA means to store the accumulator

at 400, which is the start of screen RAM in hexadecimal. The comma-X at the end means

we’ll add the value of X to the address before the store. INX means to increase X by

one. Then we’re going to compare X to zero. The reason we do zero is that if the value

is 255 and you increase by one, it will wrap around to zero again since this is an 8-bit

system. Now that we’ve compared, we’ll tell the CPU to branch if not equal back to

1302. If it is equal, then RTS will tell it to return back to BASIC.
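
If you want to trace that routine step by step, here is a toy Python sketch that executes it. The byte encoding in `ROUTINE` is my reconstruction from the walkthrough (the opcodes are standard 6502 values), and the `run` function is a hypothetical mini-interpreter that only knows these seven instructions and only tracks the zero flag.

```python
# Assumed encoding of the routine, loaded at $1300.
ROUTINE = bytes([
    0xA2, 0x00,        # 1300 LDX #$00
    0x8A,              # 1302 TXA
    0x9D, 0x00, 0x04,  # 1303 STA $0400,X
    0xE8,              # 1306 INX
    0xE0, 0x00,        # 1307 CPX #$00
    0xD0, 0xF7,        # 1309 BNE $1302
    0x60,              # 130B RTS
])


def run(code, origin=0x1300):
    """Execute the routine; return (memory, number of instructions run)."""
    mem = bytearray(65536)
    mem[origin:origin + len(code)] = code
    pc, a, x, zero, steps = origin, 0, 0, False, 0
    while True:
        op = mem[pc]
        steps += 1
        if op == 0xA2:            # LDX #imm
            x = mem[pc + 1]
            zero = x == 0
            pc += 2
        elif op == 0x8A:          # TXA
            a = x
            zero = a == 0
            pc += 1
        elif op == 0x9D:          # STA abs,X: address plus X
            base = mem[pc + 1] + 256 * mem[pc + 2]
            mem[(base + x) & 0xFFFF] = a
            pc += 3
        elif op == 0xE8:          # INX: 8-bit, so 255 wraps to 0
            x = (x + 1) & 0xFF
            zero = x == 0
            pc += 1
        elif op == 0xE0:          # CPX #imm
            zero = x == mem[pc + 1]
            pc += 2
        elif op == 0xD0:          # BNE: signed 8-bit displacement
            offset = mem[pc + 1]
            if offset >= 0x80:
                offset -= 256
            pc += 2
            if not zero:
                pc += offset
        elif op == 0x60:          # RTS: hand control back to the caller
            return mem, steps
        else:
            raise ValueError(f"unsupported opcode ${op:02X}")


memory, steps = run(ROUTINE)
print(list(memory[0x0400:0x0408]), steps)
```

After it returns, the 256 bytes starting at $0400 hold the values 0 through 255, which on the real machine is what puts every character on the screen.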

So, there we go, pretty simple. So, let’s execute the code. And as you can see, it is

so fast that it’s actually finished before I even lift my finger off of the return key.

So, I’ll show you that again, one more time.

So now that you’ve seen assembly language, I’d love to show you this little scene from

the Terminator. What you are seeing here is assembly language, and more specifically 6502

assembly language from an Apple II. Not many people today would recognize this.

So, why is Assembly so much faster than BASIC? Well, as mentioned before, BASIC is interpreted.

So for every BASIC command you see here, hundreds of machine language instructions are happening

to carry it out. As such, a program in machine language will run at least a hundred times

faster than anything you can write in BASIC.

Now, it is at this point I think I should stop and mention to you that there is more

than one kind of machine language. Now, when you write code in like C or Python, there’s

a good chance that code is going to run on any computer with any CPU. But, machine language

is very dependent on the CPU that you are coding for. So, code written on a 6502, for

example, will not run on a Z80. They are very different languages. But for the purposes

of this video, I’m going to stick with the 6502 because I think it’s one of the simplest

to understand.

The 6502 CPU has 151 op-codes. But, it’s really not that many in practice. In fact,

you only need to remember 56 instructions. For example, if you are using the command

LDA, which means to load a number into the accumulator. You can tell it to load a specific

number like 42, or you can tell it to load the number from a specific memory address

like this. As you can see, looking over to the left here, the actual CPU op-codes are

different, but as a human, you don’t really need to know the difference as your assembler

will take care of figuring out which op-code to use.
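
As a rough Python sketch of that choice, here is how an assembler might pick the op-code for LDA from the way the operand is written. The three op-code values are the standard 6502 ones; `parse_number` and `assemble_lda` are hypothetical helpers that only handle these three forms.

```python
LDA_IMMEDIATE = 0xA9  # load a specific number, e.g. LDA #42
LDA_ZERO_PAGE = 0xA5  # load from an address below $0100
LDA_ABSOLUTE = 0xAD   # load from any 16-bit address, e.g. LDA $D020


def parse_number(text: str) -> int:
    """Accept $-prefixed hex or plain decimal."""
    return int(text[1:], 16) if text.startswith("$") else int(text)


def assemble_lda(operand: str) -> bytes:
    """Return the machine code bytes for one LDA instruction."""
    if operand.startswith("#"):
        value = parse_number(operand[1:])
        if not 0 <= value <= 0xFF:
            raise ValueError("immediate value must fit in one byte")
        return bytes([LDA_IMMEDIATE, value])
    address = parse_number(operand)
    if address < 0x100:
        return bytes([LDA_ZERO_PAGE, address])
    # Two address bytes, low byte first.
    return bytes([LDA_ABSOLUTE, address & 0xFF, address >> 8])


print(assemble_lda("#42").hex(" "))
print(assemble_lda("$D020").hex(" "))
```

Same mnemonic in both cases, but a different first byte, which is why there are more op-codes than instructions to remember.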

Now, what I’ve shown so far has been using a machine language monitor, which allows you

to look at and modify machine code. But you wouldn’t want to really write any sort of

code in this that was longer than just a few lines. Because you’d have to remember a

ton of memory addresses for every sub-routine. But the worst part is there is no easy way

to insert a line of code. Generally you can only overwrite a line.

That’s where using an assembler comes in handy. So, here’s some of my source code for my

recent game, Attack of the PETSCII Robots. As you can see, this is written in a text

editor, and looks far more friendly, with subroutines that have friendly names like

“search object” or “display player sprite.” And this is where an assembler makes life

so much easier. So, how does this work?

Well, here’s a very short little example program written in a notepad. Now, after assembling

the program I’m going to open it up in the machine language monitor and see what it looks

like. So, the first thing I want to mention is that the actual instructions are preserved

one to one. There’s absolutely no difference here. However, our friendly label such as

“screen RAM” has been converted to the actual memory address for us. Likewise, the

labels I used for the loop, have also been converted into the actual memory address,

which of course corresponds to the exact place in the code where it is supposed to point

to. And, of course, if you just wanted to see the resulting machine code itself, you

can look at that as well. And so I hope this better explains the difference between this,

which is machine code or machine language, and this, which is assembly language. They

are really just two different ways of looking at the same thing. And while back in the early

days, some coders did have to write software in machine language, the vast majority used

an assembler to write code that is more human-readable.
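
The label handling can be sketched as two passes over the source, here in Python. Everything in this block is made up for illustration: the `source` listing, the `SIZES` table with its shorthand keys like "LDX#" and "STA,X", and the `resolve_labels` helper. It only resolves names to addresses and does not emit machine code bytes.

```python
# Size in bytes of each instruction form this sketch knows about.
SIZES = {"LDX#": 2, "TXA": 1, "STA,X": 3, "INX": 1, "BNE": 2, "RTS": 1}

# Hypothetical source: (label, instruction, operand).
source = [
    ("",     "LDX#",  "0"),
    ("LOOP", "TXA",   ""),
    ("",     "STA,X", "SCREEN_RAM"),
    ("",     "INX",   ""),
    ("",     "BNE",   "LOOP"),
    ("",     "RTS",   ""),
]


def resolve_labels(lines, constants, origin=0x1300):
    """Replace friendly names with the addresses they stand for."""
    symbols = dict(constants)

    # Pass 1: walk the code and record the address of every label.
    pc = origin
    for label, op, _ in lines:
        if label:
            symbols[label] = pc
        pc += SIZES[op]

    # Pass 2: substitute addresses wherever a known name is used.
    listing = []
    pc = origin
    for _, op, operand in lines:
        listing.append((pc, op, symbols.get(operand, operand)))
        pc += SIZES[op]
    return listing


listing = resolve_labels(source, {"SCREEN_RAM": 0x0400})
for address, op, operand in listing:
    shown = f"${operand:04X}" if isinstance(operand, int) else operand
    print(f"{address:04X}  {op} {shown}")
```

The first pass is what makes inserting a line painless: add an instruction anywhere and every label after it simply gets a new address the next time you assemble.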

So, one thing you might be wondering about, is why everything in assembly language uses

hexadecimal? So, I want to explain a few things about it. For those that don’t know, it

is a numbering system that is based on 16 digits instead of 10, and it just adds A,B,C,D,E,

and F for the extra 6 digits. The reason this is handy is because when dealing with byte

sized numbers, a single hex digit can represent 4 bits of your byte, which is actually called

a nybble by the way. So, 9A is much easier to read than the decimal form of 154.
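
A quick Python illustration of the nybble idea, using the same 9A value (the variable names are just for this example):

```python
value = 0x9A               # the same number as decimal 154

high_nybble = value >> 4   # top 4 bits of the byte
low_nybble = value & 0x0F  # bottom 4 bits of the byte

# Each hex digit lines up with exactly one group of 4 bits.
print(f"{value} decimal = ${value:02X} = {value:08b} binary")
print(f"high nybble {high_nybble:X} = {high_nybble:04b}")
print(f"low nybble  {low_nybble:X} = {low_nybble:04b}")
```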

A possibly better example would be looking at memory addresses, which is something you

do a lot in assembly language. In decimal, the 64K memory range of a CPU would be represented

from 0 to 65535. But, in hex it is a much cleaner 0 to FFFF. But where things get really

interesting is when looking at a memory address in terms of bytes. So here’s the same address

in Hex and Decimal. You might recognize this as the border color address on a Commodore

64, which you can use a POKE statement to change the border color. Well, in assembly

language this address has to be represented by exactly two bytes. And in hexadecimal,

those two bytes are D0, and 20. Which, makes total sense. But in decimal, those two bytes

are 208 and 32. Which doesn’t appear to make any sense at all. In fact, you have to

do some math to actually see that this is right. You’ll need to multiply 208 by 256,

which gives you 53248, and then add in your 32 from the lower byte and then you get your

final number. So, as you can see, programmers find it much easier just to deal with things

like this in hexadecimal.
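
The same arithmetic in a few lines of Python, with `split_address` as a hypothetical helper name:

```python
def split_address(address):
    """Return (high_byte, low_byte) of a 16-bit address."""
    return address >> 8, address & 0xFF


high, low = split_address(53280)

# In hex the two bytes are just the two halves of the address.
print(f"hex:     ${53280:04X} -> {high:02X} and {low:02X}")

# In decimal you have to do the math to see they belong together.
print(f"decimal: {high} * 256 + {low} = {high * 256 + low}")
```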

And of course, if they use hexadecimal on Logopolis, that’s all the proof you need

that it’s the best system for computers!

9A, E7, E9, sorry, E9, 23, wait wait, did you say E9? Look it says E7 there! You’re

right!

One last thing about hexadecimal I wanted to mention is how it is represented. I’ve

been showing it with a dollar sign in front of it like this, which is the normal way to

identify it in 6502 assembly language, Pascal, Delphi, and Forth. But, another way to represent

it is to put a 0x in front of it like this, which is common in 8088 assembly language,

C++, C#, Java, Python, and many others. The numbers work exactly the same, it’s just

a different way of designating it as hex in the compiler.

Assembly language in and of itself is not that difficult. In fact, there are certain

things that are actually easier to do in assembly language such as graphics programming. But

there are some tradeoffs. And there are certain things that are kind of a pain. For example,

working with strings. That’s a nightmare to work with in assembly language. However,

I think the thing that often makes assembly language more difficult is the fact that you

have to know how the hardware works that you’re coding for.

With assembly language, and in the example of the 6502 processor, you have your 56 instructions,

and all of these more or less have to do with moving numbers around in memory. So, how do

you actually do things like modify things on the screen or make music, for example?

Well, every chip in the computer presents itself as memory of some sort to the CPU.

So, in the C64 for example, the video chip presents itself as 47 different memory addresses

starting from $D000 to $D02E. When it is not memory, we tend to call them registers. But

from the perspective of the CPU, it can’t tell the difference. So, when we write a number

to the border color register, for example, the CPU just thinks it is just updating a

number in memory, but what is really happening is it’s updating a register in the video chip.

So, as a programmer, you’d have to make yourself familiar with all of the registers

in every chip on the computer and how they work.
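
Here is a simplified Python model of that idea. The `Bus` and `VideoChip` classes are hypothetical, and the sketch ignores things a real C64 does, such as register mirroring and memory banking. It only shows that the same kind of write ends up in different hardware depending on the address.

```python
class VideoChip:
    """Stand-in for the video chip: 47 registers at $D000 to $D02E."""

    BASE = 0xD000
    COUNT = 47

    def __init__(self):
        self.registers = [0] * self.COUNT

    def write(self, offset, value):
        self.registers[offset] = value


class Bus:
    """Routes each CPU write to RAM or to the chip that owns the address."""

    def __init__(self):
        self.ram = bytearray(65536)
        self.video = VideoChip()

    def write(self, address, value):
        offset = address - VideoChip.BASE
        if 0 <= offset < VideoChip.COUNT:
            self.video.write(offset, value & 0xFF)
        else:
            self.ram[address] = value & 0xFF


bus = Bus()
bus.write(0xD020, 2)  # lands in the video chip's border color register
bus.write(0x0400, 1)  # lands in ordinary RAM (start of screen memory)
print(bus.video.registers[0x20], bus.ram[0x0400])
```

From the CPU's side both calls look identical, which is exactly why you need to know the memory map of the machine you are coding for.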

And of course, when you move from one computer, like the C64 over to another computer such

as the Apple II, even though they use the same processor, everything else is completely

different. That’s why it can be incredibly difficult to port a game from one machine

to another.

I think that’s why people are often confused when they see that I’ve written these games

like Planet X3, or Attack of the Petscii Robots, and they want me to make a version for their

favorite computer. Now even going between systems that share the same CPU, it’s still

a pretty big undertaking.

In fact, there’s a small team of Apple II guys that have been working to convert Petscii

Robots over to the Apple II and as you can see, it is mostly working now. Keep in mind

the Apple II has the same CPU as Commodore systems, so a huge amount of the code is re-usable

between these systems. But it’s still a massive undertaking to re-write all of the

audio and video sub-routines.

But then when people suggest that I make a version of my game for a system that uses a

different CPU, for example asking me to make an Amiga or ZX Spectrum version or something

like that, at that point it becomes a complete rewrite.

So, it’s no small task. I mean, if it took me a year to write the game in the first place,

then it would take me another year to basically write the whole game again from scratch. And

I think this is where a lot of people are spoiled by modern programming languages because

if you write something in C or Python, fundamentally you can expect that same code is probably

going to run across multiple different CPUs on multiple different systems, on multiple

different memory configurations.

Machine language basically goes one direction. You can compile a program from a higher level

language into machine language. Or you can write in machine language to begin with, if

you want. But you can’t go back the other direction. Much like you can convert notes

on a sheet of music into a symphony. It’s much more difficult to convert the symphony

back into notes. Another example would be scrambling an egg. It’s easy to do it one

direction, but unscrambling is a different story.

Today, Assembly language is more or less a lost art. Modern compilers are pretty good

at creating optimized machine code from higher level languages. And computers are also just

so darned fast that any speed loss you might have is pretty much negligible these days.

I mean, even operating systems like the Linux kernel are written in C. So, assembly language

today doesn’t really offer that much benefit for the modern coder. But if you want to write

games for computers that were made in the 1970s or 80s, learning machine language is pretty

much a must. And if you do learn it, you’ll probably wind up finding out a whole lot more

information about how computers actually work.

OK, one last thing I want to mention, if you’re looking around at the new studio and you’re

thinking “well gee, David, this studio looks pretty boring compared to your old one!” Well,

it’s actually not done yet. The problem is the old studio has already been dismantled

so I didn’t have anywhere to film, so it had to be done in here. But yeah, I’m going

to do a whole other follow up video in another couple of weeks here hopefully, showing the

finishing up of the interior of the studio and it will look hopefully quite a bit different.

So anyway, thanks for watching.

The 8-Bit Guy

https://youtu.be/HWpi9n2H3kE
