News in English

Can the U.S. and China Cooperate on AI?

In 2012, DQN, an AI system developed by then-start-up DeepMind, discovered how to play classic Atari computer games with human-level skill, a major breakthrough at the time. By 2023, GPT-4 had become the most powerful and general AI model to date, showcasing the ability to ace several standardized tests, including the SAT and LSAT, pulling ahead of human doctors on several medical tasks, constructing full-scale business plans for startups, translating natural language to computer code, and producing poetry in the style of famous poets.

We haven’t seen anything yet. There are several reasons to believe that AI systems will continue to become more powerful, more general, and more ubiquitous.

First, there is the recent development and rapid improvement of machine learning algorithms known as foundation models, which take the knowledge learned from one task and apply it to other seemingly unrelated tasks. This ability makes them incredibly versatile and, thus, incredibly powerful. Large language models like GPT-4 are infant technologies that seem likely to undergo many more rounds of improvement as private and public money continue to pour into AI research. They require large amounts of data on which to train, which in turn require the appropriate hardware on which to be processed.

This brings us to our second and third reasons, which are the increasing availability of training data and the increasing sophistication of Graphic Processing Units (GPUs), advanced electronic circuits typically used to train artificial neural networks. GPU throughput, a measure of the rate at which a GPU can process data, has increased tenfold in recent years, a trend that seems likely to continue apace.

Future foundation models are likely to continue increasing in size, leading to the fourth and perhaps most fundamental reason to expect increasing power, generality, and ubiquitousness from our AI models: an arcane theory known as the scaling hypothesis. Machine learning algorithms turn inputs into outputs using scores of numerical values known as parameters, which are adjusted for accuracy as the model trains on reams of data. Under the scaling hypothesis, AI systems will continue to improve given more parameters, as well as more data and more computation, even in the absence of improvements to the algorithms themselves. In other words, bigger is better. 

DeepMind cofounder Mustafa Suleyman argues that it is likely that “orders of magnitude more compute will be used to train the largest AI models” in the near future. Therefore, if the scaling hypothesis is true, rapid AI progress is set to continue and even accelerate for the foreseeable future.

All available evidence suggests that AI systems will continue to gain more sophisticated and more general capabilities. Both economic and national security incentives will push toward the widespread adoption of these systems by private citizens, businesses, governments, and militaries. Ignoring their potential dangers would put the United States at risk of incurring the costs of powerful AI systems pursuing unintended and destructive behaviors.

The Alignment Problem

One of these risks is the so-called alignment problem, defined by Brian Christian as “ensuring that AI models capture our norms and values, understand what we mean or intend, and, above all, do what we want.” To accomplish this goal, policymakers should view the problem of aligning AI as encompassing both technical and policy aspects. The technical aspect of AI alignment is the problem of programming AI systems to align their behavior with the intentions of their programmers. The policy aspect is the problem of writing regulations, creating incentives, and fostering international cooperation to ensure the implementation of best practices in safe AI development.

There are two broad ways in which AI systems have already demonstrated their susceptibility to misalignment. The first is specification gaming, which Victoria Krakovna and her co-authors define as “a behavior that satisfies the literal specification of an objective without achieving the intended outcome.” In such cases, the programmer misspecifies the reward function used to determine the AI system’s actions, causing the system to engage in unintended behaviors. Numerous and well-documented, albeit small-scale, examples of specification gaming highlight a central difficulty in AI research. It is very difficult to specify what we do not want an AI system to do because unintended behaviors are often the product of unforeseen environmental factors. Researchers have thus far failed to find a solution to this problem.

The second way for an AI system to be misaligned is goal misgeneralization. Here, as Rohan Shah and his co-authors explain, “the system may coherently pursue an unintended goal that agrees with the specification during training, but differs from the specification at deployment.” In these cases, the programmer correctly specifies the goal, and the AI system successfully pursues that goal in the training environment. However, when the agent moves outside that environment, the goal fails to generalize, leading to pathological behavior. Given the unpredictability of real-world operational environments, researchers have yet to find a robust solution to this problem as well.

Policy Implications

The combination of increasingly powerful AI systems and our failure thus far to solve the alignment problem poses an unacceptable risk to humanity. This risk has been analyzed in detail elsewhere, and I will not rehash it here. However, it should be fairly obvious that finding ourselves in the presence of an incredibly intelligent and powerful system whose goals are not aligned with our own is not a desirable state of affairs. Given this risk, there are three general policies that the United States should pursue to solve both the technical and policy aspects of the AI alignment problem.

On the technical side, alignment research should be massively scaled up. This research should involve the development of so-called sandboxes and secure simulations, virtual environments in which robust AI systems can be tested before being given access to the real world. This policy requires increasing research funding through both the National Science Foundation and the Department of Defense. The increased spending would allow existing AI safety researchers to scale up their projects, would help build a talent pipeline as demand for research assistants, laboratory assistants, and graduate students in the field increases, and would increase the prestige of the field, which would help attract top talent.

In 2022, there were approximately three or four hundred full-time AI safety researchers worldwide, out of approximately forty thousand AI researchers in total. Given the importance of the problem, this number is unacceptably low. Though it has likely increased in recent years with private AI labs’ growing focus on safety, the problem has not yet been solved, not least because trusting those who have a commercial stake in rapidly releasing the most advanced models to the world is risky. A recent leak revealed that the safety team at OpenAI, the creator of ChatGPT, is already cutting corners.

On the policy side, the United States should require rigorous testing of advanced AI models prior to their release, which should be in line with the latest research on how to do this most effectively. This would ensure that developers use the sandboxes and secure simulations discussed above when they are developed. Even before such techniques are discovered, developers should be required to red-team their models before their release. Red-teaming is the process by which engineers attempt to bypass the safety mechanisms of an AI system to expose its weaknesses, thereby allowing the system’s designers to improve its safety. The White House’s announcement of an AI Safety Institute, which would be responsible for, among other things, “creating guidelines, tools, benchmarks, and best practices for evaluating and mitigating dangerous capabilities and conducting evaluations including red-teaming to identify and mitigate AI risk,” was a good start. The next vital step is writing these guidelines into law, which will require congressional action.

Secondly, the United States should enforce these requirements through regular audits of the most advanced models in development. This may require the creation of a federal registry of advanced AI systems, similar to the one for high-risk systems in the EU AI Act. These audits should focus on the most computationally intensive models since these are the ones that are likely to be the most powerful.

The Strategic Landscape

Even if the United States implemented these policies and enforced them perfectly, it would be unable to ensure the development of safe and beneficial AI on its own. This is because there are two AI superpowers in the world: the United States and China. American policymakers will, therefore, have to work with their Chinese counterparts to tackle this problem properly. Despite their current state of geopolitical competition, there are two reasons to be optimistic about the prospect of Sino-American cooperation on this issue.

First, both the Chinese and American governments have recognized their shared interest in developing safe, aligned AI. President Biden’s executive order on AI and Senator Chuck Schumer’s SAFE Innovation Framework for Artificial Intelligence both recognize AI alignment as a top priority. During their November 2023 bilateral summit, Presidents Biden and Xi both expressed concern over AI safety and reaffirmed their commitment to developing safe AI at their most recent summit in May. In the most telling sign of a recognition of a shared interest in AI alignment, both the United States and China signed the Bletchley Declaration on AI Safety, negotiated among twenty-nine countries in November 2023. The declaration explicitly identifies misalignment as a substantial risk from advanced AI, calls for further research into the problem, explicitly endorses safety testing, and commits the parties to international cooperation.

The second reason for optimism is the historical precedent set by Cold War diplomacy, especially the balance between competition and cooperation with the Soviet Union pursued by President Richard Nixon and Henry Kissinger. The Nixon administration accomplished significant diplomatic breakthroughs with the Soviet Union during the détente of the 1970s, even while it continued to compete with it. For instance, at the Moscow Summit in 1972, the United States and the Soviet Union signed the historic Anti-Ballistic Missile Treaty and the first Strategic Arms Limitation Treaty (SALT I). That same year, after years of talks between the Johnson and Nixon administrations and Soviet Premier Alexei Kosygin, the United States and the Soviet Union founded the International Institute for Applied Systems Analysis (IIASA), a scientific cooperation initiative meant to build bridges between the Western and Soviet spheres through scientific and policy research. The IIASA continues to conduct pioneering research on complex systems, governance, biodiversity, sustainability, migration, demography, and a host of other topics to this day.

While the two superpowers were increasing cooperation, Nixon and Kissinger were prosecuting a war against Soviet-aligned North Vietnam. The historic diplomatic breakthroughs in Moscow were thus not the result of a complete confluence of interests between the rivals but rather the recognition of a pool of shared interests amid an ocean of differences.

Toward an International Policy Framework

Given this historical precedent, there is no reason the United States and China cannot cooperate on AI safety research for the benefit of both countries (not to mention the rest of humanity). The two superpowers can model their joint research efforts on the IIASA. In its strongest form, this joint venture would be an International AI Safety Institute, jointly funded by the United States and China and staffed by top researchers from around the world. An alternative, if weaker, form of cooperation would be a joint commitment to funding AI safety research grants in their respective countries, with annual bilateral conferences and informal cooperation among researchers and government officials.

Because this approach is narrowly focused on technical alignment research, it requires minimal trust between the two countries. One obvious concern about this approach might be the risks of espionage and theft of intellectual property. Some officials had similar concerns about the IIASA. However, because it only conducted non-secret research, espionage, and theft never materialized as major problems. A similar dynamic would be at play in a joint AI safety institute. Most alignment research is discussed in scientific journals and is therefore available to the public. Solutions to safety concerns, therefore, tend not to be classified or proprietary. The Chinese and Americans could continue competing in the AI development arena while cooperating on alignment research. The groundwork for such an initiative has arguably already been set by the recent bilateral summit, though the discussions did not lead to substantive commitments.

As trust builds between the two countries, the United States should push for the signing of a treaty to formalize the principles of the Bletchley Declaration. The end goal would be a series of formal agreements between the two countries to implement the policies discussed above. Taken as a whole, this approach would pool Chinese and American resources to solve the technical problems of AI misalignment while setting international standards for safety testing, red-teaming, and auditing.

Fleshing out the details of this framework will require further research. For instance, the scientists who eventually receive the grants should have broad latitude to pursue the research avenues they view as most promising. However, policymakers, in consultation with AI safety experts, should design a set of standards to ensure that they prioritize the most pressing problems in AI alignment. Similarly, policymakers will need to define the exact safety standards and red-teaming requirements alongside experts with extensive knowledge of and experience in the field.

This proposed framework is meant to get the ball rolling on tackling the AI alignment problem. While it leaves out some key details, the direction it proposes is sound. Policymakers in both the United States and China need to begin prioritizing this problem now in their spending, regulatory, and diplomatic strategies. The cat is out of the bag, and the clock is ticking.

Anthony De Luca-Baratta is an intern at the Center for the National Interest, where his research centers on technology and defense policy. He is a Public Service Fellow at the Johns Hopkins School of Advanced International Studies (SAIS). When he is back home in Montreal, he is a proud member of JC’s morning hockey league.

Image: KaimDH / Shutterstock.com.

Читайте на 123ru.net