Reverse Engineering
Practical Reverse Engineering
The reverse engineering learning process is similar to that of foreign language acquisition for adults. The first phase of learning a foreign language begins with an introduction to the letters of the alphabet, which are used to construct words with well-defined semantics. The next phase involves understanding the grammatical rules governing how words are glued together to produce a proper sentence. After becoming accustomed to these rules, one then learns how to stitch multiple sentences together to articulate complex thoughts. Eventually the learner reaches the point of being able to read large books written in different styles and still understand the thoughts therein. At this point, one can read reference books on the more esoteric aspects of the language, such as historical syntax and phonology.
In reverse engineering, the language is the architecture and assembly language. A word is an assembly instruction. Paragraphs are sequences of assembly instructions. A book is a program. However, to fully understand a book, the reader needs to know more than just vocabulary and grammar. These additional elements include structure and style of prose, unwritten rules of writing, and others. Understanding computer programs also requires a mastery of concepts beyond assembly instructions.
It can be somewhat intimidating to start learning an entirely new technical subject from a book, and we would be misleading you if we were to claim that reverse engineering is simple to learn or that it can be completely mastered by reading this book. The learning process is quite involved because it requires knowledge from several disparate domains. For example, an effective reverse engineer needs to be knowledgeable in computer architecture, systems programming, operating systems, compilers, and so on; for certain areas, a strong mathematical background is necessary. So how do you know where to start? The answer depends on your experience and skills. Because we cannot accommodate everyone’s background, this introduction outlines the learning and reading methods for those without any programming background. You should find your “position” in the spectrum and start from there.
For the sake of discussion, we loosely define reverse engineering as the process of understanding a system. It is a problem-solving process. A system can be a hardware device, a software program, a physical or chemical process, and so on. For the purposes of the book, the system is a software program. To understand a program, you must first understand how software is written. Hence, the first requirement is knowing how to program a computer in a language such as C, C++, or Java. We suggest first learning C due to its simplicity, effectiveness, and ubiquity. Some excellent references to consider are The C Programming Language, by Brian Kernighan and Dennis Ritchie (Prentice Hall, 1988) and C: A Reference Manual, by Samuel Harbison (Prentice Hall, 2002). After becoming comfortable with writing, compiling, and debugging basic programs, consider reading Expert C Programming: Deep C Secrets, by Peter van der Linden (Prentice Hall, 1994). At this point, you should be familiar with high-level concepts such as variables, scopes, functions, pointers, conditionals, loops, call stacks, and libraries. Knowledge of data structures such as stacks, queues, linked lists, and trees might be useful, but it is not entirely necessary for now. To top it off, you might skim through Compilers: Principles, Techniques, and Tools, by Alfred Aho, Ravi Sethi, and Jeffrey Ullman (Prentice Hall, 1994) and Linkers and Loaders, by John Levine (Morgan Kaufmann, 1999), to get a better understanding of how a program is really put together. The key purpose of reading these books is to gain exposure to basic concepts; you do not have to understand every page for now (there will be time for that later). Overachievers should consider Advanced Compiler Design and Implementation, by Steven Muchnick (Morgan Kaufmann, 1997).
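As a rough self-check on the concepts listed above, the following minimal C sketch touches variables, scopes, functions, pointers, conditionals, loops, and a simple linked list. It is only an illustration: the names `struct node`, `list_push`, `list_sum`, `list_free`, and `list_demo` are hypothetical and do not come from any of the books mentioned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A singly linked list node: one value plus a pointer to the next node. */
struct node {
    int value;
    struct node *next;
};

/* Push a value onto the front of the list and return the new head.
 * If allocation fails, the list is returned unchanged. */
struct node *list_push(struct node *head, int value) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return head;
    }
    n->value = value;
    n->next = head;
    return n;
}

/* Walk the list with a loop and a pointer, adding up every value. */
int list_sum(const struct node *head) {
    int total = 0;                      /* local variable, function scope */
    for (const struct node *p = head; p != NULL; p = p->next) {
        total += p->value;
    }
    return total;
}

/* Release every node; the next pointer is saved before each free. */
void list_free(struct node *head) {
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}

/* Tie the pieces together: build the list 3 -> 2 -> 1 and sum it.
 * The assertion assumes all three allocations succeeded. */
int list_demo(void) {
    struct node *head = NULL;
    for (int i = 1; i <= 3; i++) {
        head = list_push(head, i);
    }
    int total = list_sum(head);
    assert(total == 6);
    list_free(head);
    return total;
}
```

If every line here reads naturally, including why `list_free` saves `next` before calling `free`, you are probably ready for the material that follows. Compiling it and stepping through `list_demo` in a debugger is a reasonable first exercise.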
Once you have a good understanding of how programs are generally written, executed, and debugged, you should begin to explore the program’s execution environment, which includes the processor and operating system. We suggest first learning about the Intel processor by skimming through Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture, by Intel, with special attention to Chapters 2–7. These chapters explain the basic elements of a modern computer. Readers interested in ARM should consider Cortex-A Series Programmer’s Guide and ARM Architecture Reference Manual ARMv7-A and ARMv7-R Edition, by ARM. While our book covers x86/x64/ARM, we do not discuss every architectural detail. (We assume that the reader will refer to these manuals, as necessary.) After skimming through these manuals, you should have a basic appreciation of the technical building blocks of a computing system. For a more conceptual understanding, consider Structured Computer Organization, by Andrew Tanenbaum (Prentice Hall, 1998). All readers should also consult the Microsoft PE and COFF Specification. At this point, you will have all the necessary background to read and understand Chapter 1, “x86 and x64”, and Chapter 2, “ARM”.
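To give a flavor of what the Microsoft PE and COFF Specification describes, here is a minimal C sketch that locates the PE header in a memory buffer and reads the COFF Machine field. The offsets and signatures used (the "MZ" magic, the 4-byte header offset stored at 0x3C, and the "PE\0\0" signature followed by the 2-byte Machine field) are those documented in the specification; the function name `pe_machine` and the helper names are our own assumptions for illustration, and the sketch ignores everything past the first COFF field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers byte by byte, so the code does not
 * depend on the host's endianness or alignment rules. */
static uint16_t read_u16le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the COFF Machine value of a PE image held in buf, or 0
 * (IMAGE_FILE_MACHINE_UNKNOWN) if buf does not look like a PE image. */
uint16_t pe_machine(const unsigned char *buf, size_t len) {
    assert(buf != NULL);

    /* The MS-DOS header is at least 0x40 bytes and starts with "MZ". */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') {
        return 0;
    }

    /* The file offset of the PE signature is stored at 0x3C. */
    uint32_t pe_off = read_u32le(buf + 0x3C);

    /* Need 4 signature bytes plus the 2-byte Machine field. */
    if (pe_off > len || len - pe_off < 6) {
        return 0;
    }
    if (memcmp(buf + pe_off, "PE\0\0", 4) != 0) {
        return 0;
    }

    /* Machine is the first field of the COFF file header. */
    return read_u16le(buf + pe_off + 4);
}
```

Given the leading bytes of a 64-bit Windows executable, `pe_machine` would be expected to return 0x8664 (IMAGE_FILE_MACHINE_AMD64); a 32-bit x86 image would yield 0x014c. A real parser would go on to validate the optional header and section table, which this sketch deliberately leaves out.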
Next, you should explore the operating system. There are many different operating systems, but they share many common concepts including processes, threads, virtual memory, privilege separation, multi-tasking, and so on. The best way to understand these concepts is to read Modern Operating Systems, by Andrew Tanenbaum (Prentice Hall, 2005). Although Tanenbaum’s text is excellent for concepts, it does not discuss important technical details for real-life operating systems. For Windows, you should consider skimming through Windows NT Device Driver Development, by Peter Viscarola and Anthony Mason (New Riders Press, 1998); although it is a book on driver development, the background chapters provide an excellent and concrete introduction to Windows. (It is also excellent supplementary material for the Windows kernel chapter in this book.) For additional inspiration (and an excellent treatment of the Windows memory manager), you should also read What Makes It Page? The Windows 7 (x64) Virtual Memory Manager, by Enrico Martignetti (CreateSpace Independent Publishing Platform, 2012).
At this point, you will have all the necessary background to read and understand Chapter 3, “The Windows Kernel”. You should also consider learning Win32 programming. Windows System Programming, by Johnson Hart (Addison-Wesley Professional, 2010), and Windows via C/C++, by Jeffrey Richter and Christophe Nasarre (Microsoft Press, 2007), are excellent references.
For Chapter 4, “Debugging and Automation”, consider Inside Windows Debugging: A Practical Guide to Debugging and Tracing Strategies in Windows, by Tarik Soulami (Microsoft Press, 2012), and Advanced Windows Debugging, by Mario Hewardt and Daniel Pravat (Addison-Wesley Professional, 2007).
Chapter 5, “Obfuscation”, requires a good understanding of assembly language and should be read after the x86/x64/ARM chapters. For background knowledge, consider Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection, by Christian Collberg and Jasvir Nagra (Addison-Wesley Professional, 2009).
Reference Books
The C Programming Language, by Brian Kernighan and Dennis Ritchie (Prentice Hall, 1988)
C: A Reference Manual, by Samuel Harbison (Prentice Hall, 2002)
Expert C Programming: Deep C Secrets, by Peter van der Linden (Prentice Hall, 1994)
Compilers: Principles, Techniques, and Tools, by Alfred Aho, Ravi Sethi, and Jeffrey Ullman (Prentice Hall, 1994)
Linkers and Loaders, by John Levine (Morgan Kaufmann, 1999)
Advanced Compiler Design and Implementation, by Steven Muchnick (Morgan Kaufmann, 1997)
Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture, by Intel
Cortex-A Series Programmer’s Guide, by ARM
ARM Architecture Reference Manual ARMv7-A and ARMv7-R Edition, by ARM
Structured Computer Organization, by Andrew Tanenbaum (Prentice Hall, 1998)
Microsoft PE and COFF Specification
Modern Operating Systems, by Andrew Tanenbaum (Prentice Hall, 2005)
Windows NT Device Driver Development, by Peter Viscarola and Anthony Mason (New Riders Press, 1998)
What Makes It Page? The Windows 7 (x64) Virtual Memory Manager, by Enrico Martignetti (CreateSpace Independent Publishing Platform, 2012)
Windows System Programming, by Johnson Hart (Addison-Wesley Professional, 2010)
Windows via C/C++, by Jeffrey Richter and Christophe Nasarre (Microsoft Press, 2007)
Inside Windows Debugging: A Practical Guide to Debugging and Tracing Strategies in Windows, by Tarik Soulami (Microsoft Press, 2012)
Advanced Windows Debugging, by Mario Hewardt and Daniel Pravat (Addison-Wesley Professional, 2007)
Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection, by Christian Collberg and Jasvir Nagra (Addison-Wesley Professional, 2009)