What is machine language?

Last Updated on May 2, 2020 by Dave Farquhar

From time to time you may find a reference to something mysterious and computer-related called machine language. Most people just assume you know what it means, if you care. But what is machine language in computer terminology, and why is it important?

What is machine language in computer terminology?

Machine language, or machine code, is the native programming language of a computer CPU. As you may be aware, computers don’t understand anything natively other than on and off, or 1 and 0. That’s right. Your computer really is nothing inside other than millions, or these days billions, of microscopic on/off switches. Because of this, sometimes we call machine language “binary code.”

Very few people set out to write a computer program in that native language anymore, but it’s possible to do. Today we typically write in high-level languages that are easier for humans to understand, then rely on a compiler to translate into machine language.

Every CPU family has its own machine language. CPUs within a family are usually compatible with one another, but CPUs from different families have distinct machine languages. Conceptually they tend to be very similar, but code written for one CPU family will not run on another, at least not directly.

What does machine language look like?

There are any number of ways to render machine language in a human-readable format. But generally speaking, it’s pretty cryptic. It’s a series of numbers, and they tend to be weird-looking numbers to us, since we naturally work in base 10 and computers work in base 2. A byte is 8 bits (binary digits), so it holds a value from 0 to 255, and those values don’t line up neatly with the underlying bits when written in base 10. So traditionally we render machine language in base 16, using the letters A-F to represent the numbers 10-15. Every byte then fits in exactly two digits. That’s why machine language looks weird until you get used to it.
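To make the base 10, base 2, and base 16 comparison concrete, here is a minimal Python sketch. It is only an illustration, and the function name render is my own invention. It prints a few byte values in all three bases so you can see that each hexadecimal digit stands for exactly four bits.

```python
def render(value: int) -> str:
    """Format one byte as decimal, 8-bit binary, and 2-digit hexadecimal."""
    if not 0 <= value <= 255:
        raise ValueError("a byte holds a value from 0 through 255")
    return f"{value:3d}  {value:08b}  {value:02X}"


# A few sample bytes, including 169, which is A9 in hexadecimal.
for v in (0, 9, 10, 15, 169, 255):
    print(render(v))
```

Reading the output for 169, the left four bits (1010) are the A and the right four bits (1001) are the 9.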

Here’s a snippet of 6502 machine language, from a Commodore computer of several decades ago. The basic concept still applies to an Intel x86 CPU of today even though it’s a different CPU family.

A9 00 8D 20 D0

As you can see, the instruction code is just a series of numbers in hexadecimal. We could render it in binary too, but A9 takes a lot less space than 10101001. Hexadecimal is a compromise, but it’s a good one. That number format is more convenient to read and type than pure binary, while being easy to convert to and from binary if needed.

What this sequence does is set the CPU’s accumulator (a built-in variable) to a value of 0, then store the accumulator in the memory address 0xd020 (traditionally rendered $d020 on the 6502), which translates to memory address 53280 in decimal.
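The address arithmetic above is easy to check in a few lines of Python. This is a minimal sketch, and the function name absolute_address is my own. It relies on the 6502's convention of storing a 16-bit address low byte first, which is why the bytes 20 D0 in a store instruction mean $D020 and not $20D0.

```python
def absolute_address(low: int, high: int) -> int:
    """Combine two bytes, stored low byte first, into a 16-bit address."""
    return high * 256 + low


# The last two bytes of the snippet: 20 is the low byte, D0 the high byte.
address = absolute_address(0x20, 0xD0)
print(f"${address:04X} = {address} decimal")
```

The high byte is worth 256 times as much as the low byte, so D0 (208) times 256 is 53248, plus 20 (32) gives 53280.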

Introducing assembly language

Numbers like A9 above are significant, because each one stands for a specific instruction. But who wants to memorize hundreds of numbers just so they can program in machine language? Nobody. So instead, we use a slightly higher-level language called assembly language that uses mnemonics.

Here’s the code snippet from above rendered in assembly language.

LDA #$00
STA $D020

This is still cryptic, but at least I can explain it. LDA stands for LoaD the Accumulator. STA stands for STore the Accumulator.

What we’re doing is storing a value of 0 in memory address $d020. That’s not the clearest thing in the world, but it’s a lot easier to make sense of than the numeric sequence I presented first.
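Going from the numbers back to the mnemonics is mechanical, which is why a program called a disassembler can do it. Here is a toy sketch in Python. It is not a real disassembler: the table and the function name disassemble are my own, and it knows only two 6502 opcodes, A9 (LDA with an immediate value) and 8D (STA with a full 16-bit address).

```python
# opcode -> (mnemonic, number of operand bytes, operand format)
# Only two of the 6502's opcodes are listed; a real table has far more.
OPCODES = {
    0xA9: ("LDA", 1, "#${:02X}"),  # load accumulator, immediate value
    0x8D: ("STA", 2, "${:04X}"),   # store accumulator, absolute address
}


def disassemble(code: bytes) -> list:
    """Turn raw machine code into a list of assembly-language lines."""
    lines = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode not in OPCODES:
            raise ValueError(f"unknown opcode ${opcode:02X}")
        mnemonic, size, fmt = OPCODES[opcode]
        operand_bytes = code[i + 1:i + 1 + size]
        if len(operand_bytes) < size:
            raise ValueError("code ends in the middle of an instruction")
        # Multi-byte operands are stored low byte first on the 6502.
        operand = int.from_bytes(operand_bytes, "little")
        lines.append(f"{mnemonic} {fmt.format(operand)}")
        i += 1 + size
    return lines


for line in disassemble(bytes.fromhex("A9 00 8D 20 D0")):
    print(line)
```

An assembler does the same job in the other direction, looking up each mnemonic and writing out the matching number.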

Even the most basic CPUs, like the MOS 6502, have dozens of instructions to do simple operations like these. The operations include loading and storing numbers, incrementing or decrementing values, bit-shifting, and branching to form loops. A modern CPU like the one in the computer you’re using to read this probably has hundreds of instructions.
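To show how simple each individual operation is, here is a toy interpreter in Python for three 6502 instructions: A9 (load the accumulator with a value), 8D (store the accumulator at an address), and EE (increment the value at an address). It is a sketch under heavy assumptions and nothing like a full emulator. The function name run is my own, it ignores the CPU's status flags, it keeps the program separate from memory, which a real 6502 does not, and it trusts the program to be well formed.

```python
def run(program: bytes):
    """Execute a tiny subset of 6502 machine code; return (accumulator, memory)."""
    memory = bytearray(65536)  # 64 KB of address space, all zeros
    a = 0                      # the accumulator
    pc = 0                     # position of the next instruction in program
    while pc < len(program):
        opcode = program[pc]
        if opcode == 0xA9:     # LDA #value
            a = program[pc + 1]
            pc += 2
        elif opcode == 0x8D:   # STA address (stored low byte first)
            address = int.from_bytes(program[pc + 1:pc + 3], "little")
            memory[address] = a
            pc += 3
        elif opcode == 0xEE:   # INC address, wrapping from 255 back to 0
            address = int.from_bytes(program[pc + 1:pc + 3], "little")
            memory[address] = (memory[address] + 1) & 0xFF
            pc += 3
        else:
            raise ValueError(f"unsupported opcode ${opcode:02X}")
    return a, memory


# LDA #$00, STA $D020, INC $D020
a, memory = run(bytes.fromhex("A9 00 8D 20 D0 EE 20 D0"))
print(a, memory[0xD020])
```

After the three instructions run, the accumulator still holds 0 and address $D020 holds 1.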

Why use machine language or assembly language?

The reason we used machine language or assembly language years ago was speed. Back when a fast computer ran at six or eight megahertz, high-level languages that are easy for humans to understand weren’t fast enough to be practical for a lot of tasks, especially graphics-oriented tasks. The classic NES game Super Mario Bros. had just 32 kilobytes of program code, plus 8 kilobytes of graphics data. That was the whole game. It would have been larger and slower if it had been written in a high-level language.

Why did we stop using machine language and assembly language?

Coding in assembly language is a lost art today. The upside to coding in assembly language and slamming bits directly onto the hardware was speed. The problem with it was compatibility. Imagine having to completely rewrite your computer program depending on whether a computer has an AMD or Nvidia video card inside it.

As computers got faster, we gained the luxury of abstracting away the hardware and writing in higher-level languages that are easier to understand. This comes with a great deal of overhead, but the tradeoff is worth it. By 1988 standards, the Raspberry Pi 3 is a supercomputer. Today it’s a toy we buy for $35 or so to use for tinkering so we don’t tie up a bulkier computer.

In specialized applications, circumstances sometimes still call for assembly language. But that’s pretty rare today. Someone who learned to program this decade who tries to program a Commodore 64 finds it more than a little bit strange. The computer doesn’t give you a whole lot of help.

That’s why assembly language is a lost art. Sometimes I get curmudgeon-like and think about the old days when almost any expert knew computers at that kind of an intimate level. But then again, lots of things we take for granted today weren’t practical then. The good old days may seem better, but in most regards, they probably weren’t.

So what is machine language? In some ways it’s a good thing we don’t have to know anymore.

Dave Farquhar

David Farquhar is a computer security professional, entrepreneur, and author. He started his career as a part-time computer technician in 1994, worked his way up to system administrator by 1997, and has specialized in vulnerability management since 2013. He invests in real estate on the side and his hobbies include O gauge trains, baseball cards, and retro computers and video games. A University of Missouri graduate, he holds CISSP and Security+ certifications. He lives in St. Louis with his family.

The Silicon Underground