The Sony PlayStation 3 hack deciphered: what consumer-electronics designers can learn from the failure to protect a billion-dollar product ecosystem
The Sony lawsuit against George Hotz (aka "GeoHot"), one of the hackers responsible for opening up the PS3 (PlayStation 3) gaming and media console, has been settled, but controversy continues to swirl around the incident and the hackers involved. The settlement brings to a close the latest chapter in a lengthy process of breaking down the console's security system, a process that culminated in the announcement at 27C3 (the 27th Chaos Communication Congress) in December 2010 that the "fail0verflow" hacking team had obtained the platform's root code-signing keys. This conquest, in turn, allowed the hackers to install any software of their choice on PS3 consoles, in effect obtaining total control of the platform.
The PS3 hack is similar to many attacks on security systems: It is not really one hack, but rather an incremental series of attacks made over a period of time, which successively defeat various security subsystem features via a variety of techniques. These kinds of attacks often take place over many days or weeks and use knowledge gained in each successful stage to advance to the next stage. For example, one of the earliest initiatives was a physical attack that induced glitches on the memory bus, enabling the hackers to take control of the operating system and perform additional investigations. The PS3 hack can teach designers much about how to approach and plan platform security.
The PS3 security system
At first glance, the PS3 seems to be a formidable platform to attack. The system controller IC is based on a licensed implementation of IBM's Cell processor, combining a 64-bit PowerPC general-purpose core with an array of vector coprocessors. It incorporates a number of features designed to provide a trusted platform architecture that can boot and run authenticated, encrypted code, along with capabilities that appear intended to isolate programs from each other. PS3 platform security features include the following:
- An on-chip boot ROM to prevent easy compromise or replacement of the bootstrap firmware
- On-chip key storage memory (According to fail0verflow, while it is present, "it doesn't work very well." This feature is further discussed later in this article.)
- Public key cryptography with a chain of trust for code signing and authentication (Two aspects of this feature exist; the first is the use of public key cryptographic signing schemes, intended to enable proof of the code's integrity and source. The second aspect is the use of a chain of trust, intended to make the system more robust against compromise of part of the signing hierarchy.)
- Unique per-console keys, used for code and data encryption, to bind these objects to the console in which they are installed
- The use of signed executables, verified at run-time (The purpose of this verification is to prevent untrusted code from successfully running on the console.)
- A security coprocessor designed to isolate access to keys and to limit the ability of compromised software to influence the security subsystem
- Encrypted storage for files stored on hard disk
- A hypervisor that separates applications in a virtualized processor environment
- Processor-supported user and kernel modes to enforce separation of application code and the operating system
According to fail0verflow, as long as other operating systems (notably Linux) could be loaded and run on the PS3 through a feature called "OtherOS," there was no reason for hackers to be interested in breaking the system. Only after Sony removed the OtherOS feature with the introduction of the PS3 Slim (a removal later extended across the entire PS3 product line via an automatically distributed firmware update) did hackers get serious about breaking the PS3's security system.
Anatomy of the hack
As earlier mentioned, most major hacks are implemented via a series of small hacks that successively break down various aspects of the security system until the attacker achieves an objective. In some cases, this objective will simply be the ability to gain control of the system at will, the ability to learn and use a secret, or some similar outcome. In this case, the hackers set an objective of restoring an original capability of the PS3: the option to install and run an operating system of choice.
When the PS3 was first released, it garnered a lot of interest in the academic and scientific computing communities (not to mention the hacker, aka computer-enthusiast, communities), which wanted to use the console with user-installed operating systems to run custom programs, thanks to the platform's novelty and the substantial, cost-effective computational capacity it delivered. Sony even cooperated with and otherwise supported some early efforts. Many users claimed that OtherOS was the primary, if not the exclusive, reason for them to buy a PS3. The loss of this capability triggered the all-out effort to defeat the PS3 security system. Without that trigger, the PS3's fatal security flaws might not be known today.

The initial crack in the PS3's armor was the discovery that a well-timed signal glitch on the memory bus could cause a software malfunction that allowed hackers to run unauthorized software. This basic approach of probing system behavior under adverse operating conditions has been successfully applied in a number of attacks on similar devices. Typical techniques include raising or lowering power-supply voltages beyond their specified ranges, slowly ramping power-supply voltages during power-up, raising or lowering the system clock frequency or voltage swing, slowing the system clock edge rate, and loading or driving data and address busses. More determined attacks may, for example, modify the circuit board to see what happens when unconnected pins are driven with signals. The basic idea is to induce faults or expose normally inaccessible operating modes that can transition the system into an unintended state in which user control is possible.
In spite of the seemingly random nature of fault-induction attacks, they have been remarkably successful. Hackers have learned through these successes that there are events or time windows in a system's operation that are vulnerable to such attacks. After that initial success, a significant vulnerability in the security architecture was discovered. One of the vector processors acts as a dedicated security processor for the PS3, holding and using keys on behalf of the main processor. But access to it is not well controlled, so it is possible to simply pass an encrypted object (for example, a program executable) to the security processor, which will decrypt it and return the result.
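This design failure is easy to sketch. In the hypothetical Python fragment below (the names and the toy cipher are illustrative, not actual PS3 code), the key never leaves the "coprocessor," yet because the decryption service accepts requests from any caller, the whole unit functions as a decryption oracle. The checked variant shows the kind of access control that was missing:

```python
# Hypothetical sketch: an isolated key store is not enough if any caller
# can submit decryption requests -- the unit becomes a decryption oracle.
import hashlib

SECRET_KEY = b"per-console key"   # never leaves the "coprocessor"

def _keystream(length: int) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode (for illustration only)
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(SECRET_KEY + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def coprocessor_decrypt(blob: bytes) -> bytes:
    # Flawed service: decrypts whatever it is handed, for whoever asks
    return bytes(a ^ b for a, b in zip(blob, _keystream(len(blob))))

def coprocessor_decrypt_checked(blob: bytes, caller: str) -> bytes:
    # Sketch of a fix: tie the service to an authenticated calling context
    if caller != "trusted_loader":
        raise PermissionError("caller not authorized to request decryption")
    return coprocessor_decrypt(blob)
```

The point is that key isolation is a property of the whole service interface, not just of the key storage: an attacker who can ask for decryptions never needs the key itself.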
Next, a USB device appeared on the market that could reliably take control of the system immediately after boot. These devices, generically called jailbreak dongles, take advantage of quirks in configuration methods for devices on the USB bus, along with the fact that many systems are designed to allow remote control, debugging, and/or firmware (re)installations via USB peripherals. In the case of the PS3, the jailbreak dongle behaves as a USB hub connected to multiple downstream devices. One of these devices reserves a very large configuration memory space, thereby implementing a memory-overrun attack to gain access to part of the PS3 system firmware. The attack succeeds in part because the PS3 controller does not enforce separation of code and data spaces, allowing the same memory to be treated as data in one context and executable code in another.
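The bug class at work can be sketched as follows. This is a hypothetical illustration, not the actual PS3 exploit: a host stack allocates a fixed buffer for a USB configuration descriptor but trusts the device-supplied wTotalLength field when copying. Python's bytearray grows rather than overruns, so the sketch shows more bytes being accepted than were allocated; in C, the equivalent memcpy would overwrite adjacent memory:

```python
# Hypothetical sketch of a descriptor-length overflow, not real PS3 code.
import struct

BUF_SIZE = 64   # space the host sets aside for one configuration descriptor

def parse_config_descriptor(packet: bytes) -> bytearray:
    buf = bytearray(BUF_SIZE)
    # Descriptor header: bLength, bDescriptorType, wTotalLength (little-endian)
    _b_length, _b_type, w_total = struct.unpack_from("<BBH", packet, 0)
    # FLAW: w_total is attacker-controlled and never checked against BUF_SIZE
    buf[:w_total] = packet[:w_total]   # in C, this memcpy overruns the buffer
    return buf

def parse_config_descriptor_safe(packet: bytes) -> bytearray:
    buf = bytearray(BUF_SIZE)
    _b_length, _b_type, w_total = struct.unpack_from("<BBH", packet, 0)
    if w_total > BUF_SIZE:
        raise ValueError("descriptor exceeds allocated buffer")
    buf[:w_total] = packet[:w_total]
    return buf
```

The fix is a one-line bounds check, which is exactly why this bug class is so persistent: it is trivial to prevent and equally trivial to overlook.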
The success of this attack highlights a number of PS3 security architecture weaknesses. According to fail0verflow, the combination of OS kernel and hypervisor prevents errant applications from accidentally crashing the operating system or applications running on it, but the combo is not sufficient to prevent malicious code from attacking the rest of the system. The inability to prevent data from being executed as code is one example. Monitoring of the integrity of executing code after it has passed its initial authentication check during start-up—often called RTIC (run-time integrity checking)—also does not exist. Executable code changes to running applications are simply undetected.
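A minimal sketch of what RTIC provides, assuming a simple hash-and-compare design (the function names are illustrative): the code region is measured once after its start-up signature check, then re-measured periodically so that any patch to running code is detected:

```python
# Minimal sketch of run-time integrity checking (RTIC), a feature the PS3
# lacked: hash the code region once after it passes its start-up signature
# check, then re-measure periodically to catch patches to running code.
import hashlib

def measure(code_region: bytes) -> str:
    return hashlib.sha256(code_region).hexdigest()

def rtic_ok(code_region: bytes, baseline: str) -> bool:
    # Re-measure; any modification to executable bytes changes the digest
    return measure(code_region) == baseline
```

Real implementations run the re-measurement from hardware or a higher privilege level so that the patched code cannot simply disable its own checker, but the principle is this simple.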
One of the main uses for PS3 jailbreak devices was to allow users to continue to run their OS of choice. Without the dongle installed, the PS3 would boot normally from Sony firmware; with it installed, users could choose what firmware they wanted to run. Recent PS3 firmware versions from Sony prevent the jailbreak from running successfully, and users who wish to access PSN (the PlayStation Network) or play the latest Blu-ray titles must be running that latest firmware.
This situation forces PS3 owners to choose between access to movies on disc and subscription games on PSN and the ability to jailbreak their consoles. It is a somewhat reasonable compromise: Users who choose to run Linux on their PS3s can do so, while those who want to use their consoles for gaming and media playback can do the same. Given the current interest in online gaming, Sony could long ago have considered the hole created by jailbreak devices effectively closed: Tying PSN access to current firmware limits the value of pirated games by preventing them from being played online.
The next stage in breaking down the PS3's security system was likely facilitated by a social-engineering attack or a corrupt insider. A leaked copy of a field maintenance program allows installation of authorized code at any revision level, regardless of the firmware version currently running on the console. A user equipped with the field maintenance program could now re-flash his system to switch between firmware versions that can run on PSN (and play current Blu-ray titles) and those that are jailbreak-compatible. This situation highlights another critical, and nontechnical, aspect of platform security: the human factor. It is often possible to devise technical countermeasures to these kinds of vulnerabilities, but doing so requires careful forethought about the issues involved.
The final major system breach came about through what can only be described as an incompetent implementation, and it highlights how easy it is for a single careless error to undermine every other aspect of a security system. In essence, a mathematical error existed in the cryptographic code implementing the public key signing scheme on the PS3. This error allowed recovery of the root signing key, the most serious breach possible in a public key cryptography scheme. Understanding the hack requires a little mathematics.
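The error, as fail0verflow described at 27C3, was reuse of the per-signature random value (the nonce k) that ECDSA requires to be unique for every signature. The key-recovery algebra runs entirely modulo the group order, so it can be demonstrated without any elliptic-curve arithmetic; the sketch below uses small toy numbers in place of the real curve order and the r value that would be derived from k:

```python
# Toy demonstration of ECDSA private-key recovery under nonce reuse.
# Curve-point arithmetic is omitted: r is treated as a fixed value derived
# from k, which is all the algebra below needs. Numbers are illustrative.
n = 2**31 - 1          # stand-in for the (prime) group order

def sign(z: int, d: int, k: int, r: int) -> int:
    # ECDSA signature equation: s = k^-1 * (z + r*d) mod n
    return (pow(k, -1, n) * (z + r * d)) % n

# A careless signer: one private key d, the same "random" k every time
d, k, r = 123456789, 987654321, 555555555
z1, z2 = 1111, 2222                     # hashes of two different messages
s1, s2 = sign(z1, d, k, r), sign(z2, d, k, r)

# Attacker, holding only (z1, s1) and (z2, s2) with matching r values:
# s1 - s2 = k^-1 * (z1 - z2), so k and then d fall out directly
k_rec = ((z1 - z2) * pow(s1 - s2, -1, n)) % n
d_rec = ((s1 * k_rec - z1) * pow(r, -1, n)) % n
assert (k_rec, d_rec) == (k, d)
```

With a unique random k per signature, the s1 - s2 difference mixes two unknowns and reveals nothing; with a repeated k, two signatures are enough to solve for the private key with grade-school algebra.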
Aside from the seriousness of this fundamental shortcoming, the PS3 used a very simple signing hierarchy, making the system "brittle" (that is, unable to recover from such problems by revoking and reissuing signing credentials). This mistake is one of the most common ones Elliptic Technologies encounters when working with companies that plan to develop platform trust models for their product ecosystems. The signing authority used by Sony's PS3 is essentially equivalent to a root certification authority, such as those that sign the SSL credentials used by e-commerce sites. All too commonly, however, the signing tool is a simple program developed by a harried software developer late in the rush to release a product.
Despite its importance, this signing tool often does not receive the review and scrutiny it deserves as a guarantor of the ongoing security of the system. A properly designed signing system and key hierarchy will not only ensure that the cryptographic essentials are correct but also allow for recovery from future breaches, such as a lost or stolen signing tool. During the process of hacking the PS3, other flaws emerged. For example, some system code verifies its own signature, making it possible to bypass the signature verification altogether, or to supply code with a known-good signature to the verification routine in place of the code actually running.
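The self-verification flaw is easy to illustrate. In this hypothetical sketch (not the actual PS3 routine), the checker hashes whatever buffer the caller hands it, so nothing binds the verified bytes to the code that is actually executing:

```python
# Hypothetical sketch of the self-verification bug class.
import hashlib

KNOWN_GOOD_IMAGE = b"original firmware image"
GOOD_DIGEST = hashlib.sha256(KNOWN_GOOD_IMAGE).hexdigest()

def verify_flawed(claimed_image: bytes) -> bool:
    # FLAW: trusts the caller to point at the code being verified
    return hashlib.sha256(claimed_image).hexdigest() == GOOD_DIGEST

running_code = b"patched firmware image"   # what is actually in memory
```

An attacker running `running_code` simply presents `KNOWN_GOOD_IMAGE` to the checker and passes. A sound design has a higher-privilege component measure the memory the code actually occupies, rather than letting the code nominate what gets measured.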
Identifying the threat
One of the most important steps in defining a security system is the establishment of a set of security objectives. A clear statement of these objectives allows architects and developers to define the requirements that ultimately drive the design and implementation of the system. In the case of the PS3, however, this process seems to have gone awry.
Systems such as the PS3 are designed for mass-market appeal, and a well-entrenched company like Sony will almost certainly achieve it: millions of consoles sold, along with even more substantial sales of content in the form of games, movies, and network subscriptions, plus a vast array of other goods and services. This situation represents a market worth billions of dollars. Sony has a legitimate interest in protecting this ecosystem against piracy and unauthorized service access. Appropriately defining the security system objectives, therefore, makes solid sense.
The original market definition included a niche submarket, the community of potential customers who wanted to use their PS3s to accomplish alternative functions via alternative OSs. Systems were, as it turns out, sold to that market niche, and those particular users were seemingly happy with what was offered. As previously noted, it was only after Sony unilaterally moved to exit that niche and took technical steps to disable the feature that the effort to break the security system began. It is difficult to understand the rationale for Sony's decision. The company was in the business of selling computer hardware (that is, the PS3 console) and should have had no real interest in controlling the actions of people who bought that hardware. The fundamental system security objective was to protect access to content and other parts of the console ecosystem.
The unintended side effect of trying to control and limit how console owners used their hardware was to trigger a skillful and determined security system attack. It seems likely, as fail0verflow claimed was the case, that most users of the OtherOS feature were interested in running their choice of OS for purposes outside of mainstream gaming and home entertainment. In that context, there should have been little need or interest in attacking security features intended to disable piracy or unauthorized access to other parts of the PS3 ecosystem. The fact that there was no significant progress toward breaking the system in the first 2.5 years or so of PS3 sales would seem to support this assertion.
This situation also suggests that individuals intent on piracy were at first incapable of penetrating the system, or at least were not making significant progress to that end. It would be naïve to believe that no one involved in breaking the system was motivated by a wish to pirate content, yet there is little evidence of such aspirations at this point. However, the PS3 security system is now so completely breached that we are likely to see "script kiddie" tools emerge that enable piracy on a massive scale. Protection against this scenario should from the outset have been the main objective of the PS3 security system.
One common approach to assessing how much effort and cost should go into the design and implementation of a security system is to perform a cost-benefit analysis, taking into account factors such as the following:
- The value of the data and services being protected
- The potential losses if an attack succeeds
- The cost of launching an attack
- The benefits an attacker might gain by succeeding in an attack
For systems that have the potential to generate millions or billions of dollars of initial sales and recurring revenue through service subscriptions, even moderately successful attacks have the potential to cost tens of millions of dollars. It is therefore both feasible and desirable to put considerable resources toward a careful design and implementation of the security system.
In terms of benefits to attackers, the story is less clear. No clear revenue source exists for a successful attack, beyond the modest sales one might expect jailbreak dongles to generate (and even there, public-domain implementations exert downward price pressure). In a consumer market, people interested in piracy or theft of services tend to choose soft targets with a low cost of entry, so little money can be made in servicing them. Teams such as fail0verflow, however, are ideologically motivated: They are fighting for a principle or to assert a right in which they believe. That ideology, combined with a diverse team of skilled and intelligent members, makes them a formidable adversary.
Let's estimate the value of this particular attack. First, the actual attack was taken on by "volunteers" who, as far as we know, received no compensation for their time or for any tools they needed. By that measure, the cost of the attack would be approximately zero, or at most perhaps a couple thousand dollars if one includes the cost of consoles, jailbreak dongles, and readily available test equipment. But now ask the question in a different way: How much would it cost to mount an attack like this if one were to pay for it? The fail0verflow team that assembled itself appears to have consisted of young people with solid skills and (perhaps) university training in computer hardware and software design, as well as a firm grounding in the mathematics of cryptography.
During fail0verflow's 27C3 presentation, four team members appeared on stage, with more in the audience. The core team appears to have numbered approximately 10 people, roughly equivalent to a small, talented, young engineering team whose time would cost, at North American loaded labor rates, about $50/hr. If it took five hours a week of effort from each of them over a 15-month period, that roughly 3000 hours of effort would translate to about $150,000. The true cost could easily be half that amount, or double it. What is certain is that it represents a substantial amount of resources brought to bear on the attack—by current standards a serious, committed adversary. This estimate is probably a very reasonable one for the kind of adversary planners should have in mind when building products aimed at potentially lucrative, popular mass-market consumer applications.
It would seem that Sony put tens of millions of dollars of ecosystem revenue at risk of piracy and unauthorized service access by inviting a formidable attack from an adversary that represented little actual risk to the ecosystem and was benign until antagonized. Ultimately, the adversary came out on top, at least as far as managing to take control of the security system. That outcome represents a serious loss of perspective on the security objectives for the system.
What lessons can be drawn from the PS3 hack? Certainly, many detailed flaws allowed the attackers to peel successive layers off the PS3 security onion, and these are important in their own right. But the bigger issue is that a system's security architecture and design are frequently undertaken at the very end of development, sometimes as an afterthought and sometimes simply as a result of scheduling pressure as development of primary system functionality moves toward the product-release date. This reality leaves security-system development insufficiently resourced and reviewed relative to the value it is intended to protect. That is a mistake: If a system is worth protecting, it is worth protecting well. Some ideas to consider:
- Protect those things that need protection. This statement is not meant to be trite; identification of the assets to be protected is an important first step in developing a plan to protect them. From the outset, the PS3 was marketed as much more than a game console; part of what was sold to people was the ability to run another operating system. This decision required a security subsystem that met the needs of the gaming and media ecosystems, restricting access to content and services when the PS3 was running its factory OS. Limiting protection to this objective likely would have been much easier to achieve than retroactively trying to limit what versions of system code could run on the platform. In effect, the security objectives were changed after the fact to protect against what in reality was a non-threat. The result was an attack that succeeded beyond anyone's wildest expectations.
- Be realistic in assessing the threats you face. The more interesting or valuable your platform or the market for it, the greater the interest will be in trying to compromise it. This reality may translate into large-scale efforts to break the security system. Ensure that security analysis and subsequent design receive a level of attention in proportion to the value of the information you are trying to protect. And don't underestimate the level of effort that an eventual attack may bring.
- Plan for failure. This statement sounds defeatist, but it's not. If your system is valuable enough to attract a serious attack, your adversary may discover a vulnerability that enables a break for a period of time. Allowing for the possibility that you may need to repair and upgrade the security system over time is prudent planning. Designing in controlled amounts of flexibility to allow for later changes is a good future-proofing strategy, provided it is carefully implemented. In retrospect, the PS3's shallow signing-key hierarchy, whose misuse divulged the root signing key, was a bad idea. Implementing it so that the compromised key could not be replaced was an even worse one.
- Know what you know, and know what you don't know. Many products use their security systems to protect valuable assets that are generally unrelated to security. The PS3 is fundamentally a special-purpose multimedia entertainment system (and a very good one at that) at the center of a much larger ecosystem of content and programming suppliers. The security system was supposed to protect that ecosystem, but an apparent lack of competence in its design and implementation left it vulnerable and, ultimately, broken. Designing security systems is a specialized undertaking, and many companies do not have the necessary in-house expertise. In such situations, it is well worth engaging a security specialist for this particular task. If the same care and thought had gone into the PS3's security system as went into its consumer features, the console likely would not have been breached so completely. Independent scrutiny and review by experts might have helped avoid this outcome.
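The points about planning for failure can be made concrete with a sketch of a recoverable signing hierarchy. The fragment below uses textbook RSA with deliberately tiny, insecure parameters (illustrative only; nothing here is production cryptography): an offline root certifies an intermediate signing key, and if the intermediate leaks, it can be revoked and replaced without exposing the root:

```python
# Toy two-level signing hierarchy with revocation (textbook RSA, tiny
# illustrative primes -- not a production scheme).
import hashlib

def make_key(p: int, q: int, e: int = 65537):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), d                       # (public key, private exponent)

def sign(msg: bytes, priv: int, pub) -> int:
    h = int.from_bytes(hashlib.sha256(msg).digest(), "big") % pub[0]
    return pow(h, priv, pub[0])

def verify(msg: bytes, sig: int, pub) -> bool:
    h = int.from_bytes(hashlib.sha256(msg).digest(), "big") % pub[0]
    return pow(sig, pub[1], pub[0]) == h

def key_id(pub) -> bytes:
    return repr(pub).encode()

# Offline root certifies an intermediate key that does the day-to-day signing
root_pub, root_priv = make_key(1000003, 1000033)
int_pub, int_priv = make_key(1000037, 1000039)
int_cert = sign(key_id(int_pub), root_priv, root_pub)
revoked = set()

def verify_chain(msg: bytes, sig: int, signer_pub, signer_cert: int) -> bool:
    if key_id(signer_pub) in revoked:
        return False                       # key was compromised and withdrawn
    if not verify(key_id(signer_pub), signer_cert, root_pub):
        return False                       # root never vouched for this key
    return verify(msg, sig, signer_pub)
```

Because the root key only ever signs intermediate certificates, a leaked intermediate costs one revocation entry and one replacement certificate; a flat hierarchy with a single, irreplaceable signing key, as on the PS3, has no equivalent recovery path.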