Introduction
Modern computers are complex, high-speed electronic devices that process, store, and manipulate data. However, every program, operating system, or malware sample, no matter how complex, is ultimately reduced to numbers that the computer's CPU can process.
This lab covers the following points:
- What is the CPU
- How does a CPU work
- Numbering Systems (Binary, Decimal, Hexadecimal)
- How binary numbers relate to the physical operation of computer hardware and CPUs
- Why humans commonly use decimal representation and its limitations in low-level computing
- Why is hexadecimal notation widely used in debugging, reverse engineering, and memory analysis
- How CPU instructions are ultimately stored and executed as binary values in memory
- The relationship between machine code, hexadecimal instruction bytes, and assembly language instructions
1. What is the CPU
To understand the function of the CPU (Central Processing Unit), we consulted IBM's explanation at this link
(https://www.ibm.com/think/topics/central-processing-unit#:~:text=A%20central%20processing%20unit%20(CPU)%20is%20the%20primary%20functional%20component,in%20a%20highly%20orchestrated%20way.).
The CPU is the primary functional component of a computer. It is a group of electronic circuits that runs the computer's operating system and applications and manages a variety of other computer operations. Put simply, the CPU is the active brain of the computer, where data input is transformed into information output: it stores and executes program instructions through its vast network of circuits.
3 main components in the CPU
The Control Unit
The control unit contains the circuits that direct the computer system through electrical pulses and signal the other components to carry out computer instructions. Think of it as a human manager assigning particular tasks to different workers.
Arithmetic/Logic Unit (ALU)
This part takes care of all arithmetic and logical operations, as the name suggests. Its arithmetic operations fall into 4 main types: addition, subtraction, multiplication, and division. Its logical operations include comparisons and Boolean operations such as AND, OR, and NOT.
Memory Unit
This part has several main functions, from handling the data flow between the RAM and the CPU to managing the cache memory. It also holds the data and instructions needed for data processing and provides memory protection.
Other important components of the CPU
Cache – a small, ultra-fast type of volatile memory located directly on or very near the processor. It acts as a high-speed buffer between the CPU and the RAM and is used to reduce processor idle time.
Registers – a small amount of volatile memory built into the CPU itself, which can be accessed almost immediately (typically within a clock cycle) when it is needed.
Clock – the part that issues electrical pulses at regular intervals, which coordinate the complicated circuitry within the CPU in a highly synchronized manner.
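Instruction register and pointer – the instruction register holds the instruction currently being executed, while the instruction pointer (also called the program counter) holds the memory address of the next instruction to be executed by the CPU.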
Buses – they carry data between the various components of the computer system. A bus has a “width”, which is the number of bits it can transfer in parallel.
2. How does a CPU work
The operation of the CPU is directed by the control unit, with synchronization assistance from the computer clock. The CPU works through a fixed instruction cycle, which has 3 main stages:
Fetch – First, the
CPU retrieves a specific binary instruction from the computer's RAM based on an address
provided by the program counter.
Decode – The decoder
within the CPU then translates the binary instructions into electrical signals
that engage other parts of the CPU.
Execute – The CPU performs the required action, such as a mathematical calculation or a data transfer.
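The three stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-byte instruction format and the opcodes LOAD, ADD, and HALT are made up for this example and do not belong to any real CPU.

```python
# Toy instruction set (hypothetical, not a real CPU): each instruction is
# two bytes, an opcode followed by one operand.
LOAD, ADD, HALT = 0x01, 0x02, 0xFF

def run(memory):
    """Run the toy program in `memory` and return the accumulator value."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator: a single working register
    while True:
        # Fetch: read the instruction bytes at the address held in pc.
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2
        # Decode: work out which operation the opcode stands for.
        # Execute: carry out that operation.
        if opcode == LOAD:
            acc = operand
        elif opcode == ADD:
            acc = (acc + operand) & 0xFF  # keep the result within 8 bits
        elif opcode == HALT:
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")

program = bytes([LOAD, 5, ADD, 7, HALT, 0])
print(run(program))  # prints 12
```

A real CPU does the same loop in hardware, with the decode step implemented as circuitry rather than an if/elif chain.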
Now that we’ve understood the
basics of how a CPU works, we also need to understand numbering systems and how
they map information directly to CPU instructions.
3. Numbering Systems (Binary, Decimal, Hexadecimal)
Numbering systems are systematic methods for representing, writing, and interpreting numbers using a specific set of digits and rules, where the value of each digit depends on its position. Each system is defined by its base: every position is worth a successive power of that base. For this lab, we're going to focus on 3 main ones: Binary (base 2), Decimal (base 10), and Hexadecimal (base 16).
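To make the idea of positional value concrete, here is a small Python sketch that computes the value of a number from its digits and its base. The function name positional_value is our own choice for illustration.

```python
def positional_value(digits, base):
    """Return the value of a list of digit values (most significant first)."""
    total = 0
    for digit in digits:
        if not 0 <= digit < base:
            raise ValueError(f"digit {digit} is not valid in base {base}")
        total = total * base + digit   # shift left one position, add digit
    return total

print(positional_value([1, 1, 0, 1], 2))   # 1*8 + 1*4 + 0*2 + 1*1 = 13
print(positional_value([1, 3], 10))        # 1*10 + 3*1 = 13
print(positional_value([13], 16))          # 13*1 = 13 (the hex digit D)
```

The same quantity, thirteen, comes out of all three calls; only the way it is written differs.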
Binary
The binary numbering system is a base-2 positional numeral system that uses only two digits, 0 and 1, to represent data, instructions, and numerical values. A single binary digit is called a bit, which is the smallest unit of data in computers. A byte is a group of 8 bits. Binary is essentially the “language” of computers, and everything gets converted into binary. At the most fundamental level, computers are simple machines that have no concept of what numbers really are. To a computer, a number is simply a pattern of on and off states across some of its billions of microscopic switches, which we call transistors.
Decimal
The decimal number system is a base-10 numeral system that uses the digits 0 to 9. It is the most intuitive numeral system for us to understand, but computers do not process decimal digits directly: memory and registers hold values as bits, so decimal numbers are converted to binary before they are stored or processed.
Hexadecimal
The hexadecimal numbering system is a base-16 numeral system that uses the digits 0-9 and the letters A-F, where the letters A-F represent the values 10-15 (the name combines “hexa”, six, and “decimal”, ten). Even though it seems counterintuitive at first, hexadecimal is an important way of representing data. Since raw binary can be tiring to read and understand, hex is used as a more compact way of showing the same data. Hexadecimal numbers are commonly written with the prefix 0x, so, for example, the decimal number 13 is written as “0xD”.
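As a quick check of the three notations, the following Python snippet converts the value 13 between binary, decimal, and hexadecimal using only built-in functions. The variable name value is ours.

```python
value = 13

print(bin(value))        # 0b1101
print(hex(value))        # 0xd
print(0xD == 13)         # True
print(0b1101 == 13)      # True

# int() with an explicit base parses a string written in that base.
print(int("1101", 2))    # 13
print(int("D", 16))      # 13

# Format specifiers give fixed-width output: 8 binary digits, 2 hex digits.
print(format(value, "08b"))  # 00001101
print(format(value, "02X"))  # 0D
```

Python writes hex digits in lowercase by default; the "X" format specifier produces the uppercase form used in this lab.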
4. How binary numbers relate to the physical operation of computer hardware and CPUs
As mentioned, binary is essentially the mathematical language we use to describe the physical state of the transistors: either a high voltage (the exact level depends on the hardware, ranging from about 5 V in older logic families down to around 1 V in modern processors) or a low voltage (close to 0 V). Each such state is one bit, bits are grouped into bytes, and bytes represent data, instructions, and even memory addresses. Additionally, when we combine transistors, we get logic gates, which perform logical operations such as AND, NOT, OR, NAND, NOR, XOR, and XNOR (the basic Boolean operations).
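The way gates are combined into other gates can be imitated in software. The sketch below models a single NAND gate as a Python function and builds NOT, AND, OR, and XOR out of it. It is a logical model only, not a simulation of real transistors, and the function names are ours.

```python
def nand(a, b):
    """NAND: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor(a, b):
    return and_(or_(a, b), nand(a, b))

# Print the truth table for each two-input gate.
print("a b | AND OR XOR NAND")
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} {b} |  {and_(a, b)}   {or_(a, b)}   {xor(a, b)}    {nand(a, b)}")
```

NAND is used as the starting point here because every other Boolean operation can be expressed with it alone.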
5. Why humans commonly use decimal representation and its limitations in low-level computing
Even though binary is practical for computers, it is highly non-intuitive for us humans. A common explanation is that, because we have 10 fingers and started counting on them, we came to use a base-10 numbering system for everything, from everyday calculations to more complex tasks such as financial transactions.
Even though the decimal system is easier for us to use, this is not the case in low-level computing. Since transistors are simply electrical switches that work with 1s and 0s, or high and low voltage, binary is a perfect match for them: they only have to distinguish between 2 physical states. If we wanted hardware that used the decimal system at the lowest level, we would have to build circuits that could distinguish between 10 distinct voltage levels, which would be far more prone to errors caused by electrical noise or other disruptions.
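The effect of noise on 2 levels versus 10 levels can be illustrated with a small calculation. The sketch below assumes, purely for illustration, a signal range of 0 V to 1 V with evenly spaced levels and a disturbance of 0.2 V; real hardware uses different values and more careful thresholds.

```python
def read_level(voltage, levels, v_max=1.0):
    """Return the index of the nominal level closest to `voltage`.

    `levels` evenly spaced nominal voltages cover 0..v_max, so two levels
    means 0 V and 1 V, and ten levels means steps of about 0.11 V.
    """
    step = v_max / (levels - 1)
    index = round(voltage / step)
    return min(max(index, 0), levels - 1)  # clamp to the valid range

noise = 0.2  # assumed disturbance in volts, chosen for illustration

# Binary: the symbol 1 is sent as 1.0 V and arrives as 0.8 V.
print(read_level(1.0 - noise, levels=2))    # 1, still read correctly

# Ten levels: the symbol 9 is also sent as 1.0 V and arrives as 0.8 V.
print(read_level(1.0 - noise, levels=10))   # 7, misread
```

With only 2 levels the same disturbance is harmless, because the received voltage is still much closer to the correct level than to the other one.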
6. Why is hexadecimal notation widely used in debugging, reverse engineering, and memory analysis
As we said before, hexadecimal notation is essentially a way to make binary data more compact and readable for us during analysis, whether that is debugging, reverse engineering, or memory analysis. Hex is easier to use because each hex digit represents exactly four binary bits, which means a byte can be written with just 2 hexadecimal digits. It is worth mentioning that even though computers store information in binary, most analysis tools display the data in hex. In our previous labs, we saw various tools that represent data in hex, such as PE Bear, PE Studio, x64dbg, etc.
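The mapping of one byte to two hex digits can be shown directly. The sketch below splits a byte into its two 4-bit halves (nibbles) and then formats a few bytes the way a hex viewer would. The function names are ours, and the sample bytes are chosen for illustration (4D 5A is the “MZ” signature at the start of a PE file).

```python
def byte_to_hex(byte):
    """Convert one byte (0-255) to two hex digits, one per 4-bit nibble."""
    digits = "0123456789ABCDEF"
    high = (byte >> 4) & 0xF   # upper four bits
    low = byte & 0xF           # lower four bits
    return digits[high] + digits[low]

def hex_dump(data):
    """Render bytes as space-separated hex pairs, like a hex viewer column."""
    return " ".join(byte_to_hex(b) for b in data)

print(byte_to_hex(0b10110000))   # B0
print(hex_dump(b"MZ\x90\x00"))   # 4D 5A 90 00
```

Because each nibble maps to exactly one hex digit, the conversion never needs to look at more than four bits at a time, which is not true for decimal.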
7. How CPU instructions are ultimately stored and executed as binary values in memory
Inside the CPU, each bit is represented by an electrical signal, and the hardware is designed to distinguish between the 2 voltage levels mentioned above. This is what makes binary so reliable: telling only 2 levels apart leaves a wide margin for error.
The CPU uses transistors as electronic switches. When a voltage is applied to the gate of a transistor, it either allows electrical current to flow (a 1 bit) or blocks it (a 0 bit). By connecting transistors in specific arrangements, we can then create logic gates (AND, OR, NOT, XOR).
The CPU is essentially a massive network of these logic gates organized into functional units. The ALU uses arrangements of these logic gates (called adders) to perform arithmetic. For example, to add two binary numbers, the ALU sends electrical signals through a series of XOR and AND gates: the XOR gates produce the sum bit of each column, and the AND gates detect when a carry must be passed on to the next column. Registers, on the other hand, physically trap a 1 or a 0 in a feedback loop of gates, which allows the CPU to remember a value temporarily.
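The adder described above can be modelled with Python's bitwise operators standing in for the gates. This is a minimal sketch of a ripple-carry adder, assuming 8-bit values; note that a full adder also needs an OR gate to merge the two possible carries. The function names are ours.

```python
def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out) using XOR, AND, and OR."""
    partial = a ^ b
    sum_bit = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return sum_bit, carry_out

def add_8bit(x, y):
    """Add two 8-bit values one column at a time, as a ripple-carry adder."""
    result, carry = 0, 0
    for position in range(8):              # least significant bit first
        a = (x >> position) & 1
        b = (y >> position) & 1
        sum_bit, carry = full_adder(a, b, carry)
        result |= sum_bit << position
    return result, carry                   # carry is 1 if the sum overflowed

print(add_8bit(0b0101, 0b0011))   # (8, 0)
print(add_8bit(200, 100))         # (44, 1), since 300 - 256 = 44
```

The second call shows what happens when the result does not fit in 8 bits: the low 8 bits are kept and the final carry signals the overflow.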
The CU then acts as the conductor. It receives binary instructions from memory and uses logic gates to route signals to the correct part of the CPU to execute each specific command. This is done through the fetch, decode, and execute cycle, which we previously mentioned.
Additionally, the CPU clock plays an important role: with every “tick” of the clock, the circuits update their states together, which keeps signals in different parts of the CPU from getting out of step with each other.
8. The relationship between machine code, hexadecimal instruction bytes, and assembly language instructions
Machine code, hexadecimal instruction bytes, and assembly form a simple hierarchy of representations at the heart of every computer, telling the CPU what to do.
As previously mentioned, machine code, which is stored as binary, is the only language the CPU actually understands, consisting of only 1s and 0s. Since it is really difficult for us to read, we use other formats, such as hex.
Hexadecimal allows us to write or read machine code in a more compact way. Instead of writing out the 8 bits of each byte, we write the byte as two hex characters (for example, 1011 0000 in binary is 0xB0 in hex, which is 176 in decimal). This doesn’t change the data or what the computer does in any way; it just makes the data easier to look at without getting lost in the 1s and 0s.
Assembly, on the other hand, is the human-readable version of machine code. For example, if we want to tell the computer to put the number 100 in a specific slot (a register), we would otherwise have to remember the exact byte values in hex or binary (B0 64 in hex, or 10110000 01100100 in binary). In assembly, we can instead use a short mnemonic such as MOV to express the same thing (on x86, these bytes correspond to MOV AL, 100).
We can think of them simply as 3 ways of writing the exact same thing. It's worth mentioning that we can turn assembly into machine code with a tool called an assembler, and do the opposite with a tool called a disassembler.
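To show the three representations side by side, here is a deliberately tiny sketch of an assembler and disassembler in Python. It only understands one instruction, the x86 form MOV AL, imm8 (opcode byte B0, matching the B0 64 example above), and the function names assemble and disassemble are our own. Real assemblers and disassemblers handle thousands of encodings.

```python
MOV_AL_IMM8 = 0xB0   # x86 opcode byte for "MOV AL, imm8", as in the example above

def assemble(line):
    """Assemble 'MOV AL, <number>' into its two machine-code bytes."""
    mnemonic, operands = line.strip().split(maxsplit=1)
    register, immediate = [part.strip() for part in operands.split(",")]
    if mnemonic.upper() != "MOV" or register.upper() != "AL":
        raise ValueError("this sketch only supports 'MOV AL, <number>'")
    value = int(immediate, 0)            # base 0 accepts 100, 0x64, 0b1100100
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in one byte")
    return bytes([MOV_AL_IMM8, value])

def disassemble(code):
    """Turn the two bytes back into assembly text."""
    if len(code) != 2 or code[0] != MOV_AL_IMM8:
        raise ValueError("this sketch only decodes B0 xx")
    return f"MOV AL, {code[1]}"

machine_code = assemble("MOV AL, 100")
print(machine_code.hex(" ").upper())                     # B0 64
print(" ".join(format(b, "08b") for b in machine_code))  # 10110000 01100100
print(disassemble(machine_code))                         # MOV AL, 100
```

All three printed lines describe the same two bytes in memory; only the notation changes.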