Research areas
Research areas in AI safety include robustness, monitoring, and alignment.[26][28] Robustness seeks to make systems highly reliable, monitoring aims to anticipate failures and detect misuse, and alignment focuses on ensuring that systems pursue beneficial goals.
Robustness
The study of robustness focuses on ensuring that AI systems behave as intended across a wide range of situations, and includes the following subproblems:
• Black swan robustness: creating systems that behave as expected in highly unusual situations.
• Adversarial robustness: designing systems that are resistant to inputs intentionally chosen to make them fail.
Unusual data inputs can cause AI systems to fail catastrophically. For example, in the 2010 "Flash Crash", automated trading systems reacted unexpectedly and excessively to market aberrations, destroying a trillion dollars in stock value in a matter of minutes.[30]
Note that a distribution shift does not need to occur for this to happen. Black swan failures can arise when the input data is long-tailed, as is often the case in real-world settings.[31] Autonomous vehicles still struggle with "corner cases" that may not have appeared during training; for example, a vehicle might ignore a stop sign that is lit up as an LED grid.[32]
Although these problems may diminish as machine learning (ML) systems develop a better understanding of the real world, some researchers point out that even humans often fail to respond appropriately to unprecedented events (such as the COVID-19 pandemic), and argue that black swan robustness will be a persistent safety issue.[28]
AI systems are often vulnerable to adversarial examples, or "inputs to machine learning models that an attacker has intentionally designed to cause the model to make an error."[33] For example, in 2013, Szegedy and colleagues found that adding specific imperceptible distortions to an image could cause it to be misclassified with high confidence.[34] This remains an issue for neural networks, although in recent work the distortions are generally large enough to be perceptible.[35][36]
All of the images on the right were classified as ostriches after a distortion was applied. (Left) a correctly classified sample; (center) the applied distortion, magnified 10×; (right) the resulting adversarial example.[34]
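The kind of distortion involved can be illustrated with the fast gradient sign method (FGSM), a standard attack from later work rather than the specific method of [34]; the sketch below uses a hypothetical, untrained PyTorch classifier purely for illustration.

```python
# Minimal sketch of the fast gradient sign method (FGSM) for crafting an
# adversarial example against an image classifier. The model and data are
# placeholders; any differentiable PyTorch classifier would work.
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, image: torch.Tensor, label: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Return a perturbed copy of `image` intended to be misclassified."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, bounded by epsilon.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Example usage with a toy model and a random "image":
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)   # stand-in for a correctly classified input
y = torch.tensor([3])          # its true class
x_adv = fgsm_attack(model, x, y)
print(model(x).argmax(), model(x_adv).argmax())  # predictions may now differ
```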
Adversarial robustness is often associated with security.[37] Several researchers demonstrated that an audio signal could be imperceptibly modified so that speech-to-text systems would transcribe it into any message the attacker chose.[38] Network intrusion[39] and malware detection systems[40] must also exhibit adversarial robustness, as attackers could design attacks capable of fooling these detectors.
Models that represent objectives (reward models) must also be adversarially robust. For example, a reward model might estimate how helpful a text response is, and a language model might be trained to maximize this score.[41] Researchers have shown that if a language model is trained for long enough, it will exploit the vulnerabilities of the reward model to achieve a better score even while performing worse on the intended task.[42] This problem can be addressed by improving the adversarial robustness of the reward model.[43] More generally, any AI system used to evaluate another AI system must be adversarially robust. This could include monitoring tools, since they too could be manipulated to yield a higher reward.[44]
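As a toy illustration (the reward functions below are invented for this sketch, not taken from any real reward model), selecting responses with a flawed proxy reward can favor an answer that scores high on the proxy yet poorly on the intended objective:

```python
# Toy illustration of reward hacking: a response chosen to maximize a flawed
# reward model scores higher on the proxy while doing worse on the real task.
def true_quality(answer: str) -> float:
    """Hypothetical 'real' objective: concise answers containing the fact."""
    return (1.0 if "paris" in answer.lower() else 0.0) - 0.01 * len(answer.split())

def learned_reward_model(answer: str) -> float:
    """Flawed proxy: rewards the keyword but also rewards sheer length."""
    return (1.0 if "paris" in answer.lower() else 0.0) + 0.05 * len(answer.split())

candidates = [
    "Paris.",
    "The capital of France is Paris.",
    "The capital of France is Paris. " + "Furthermore, " * 50,  # padded answer
]

# Best-of-n selection against the proxy picks the padded answer,
# even though its true quality is the lowest of the three.
best = max(candidates, key=learned_reward_model)
for c in candidates:
    print(f"proxy={learned_reward_model(c):6.2f}  true={true_quality(c):6.2f}  words={len(c.split())}")
print("selected by proxy:", best[:40], "...")
```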
Monitoring
Monitoring focuses on anticipating failures of AI systems so that they can be prevented or managed. Subproblems of monitoring include detecting when systems should not be trusted, detecting malicious uses, understanding the inner workings of black-box AI systems, and identifying hidden functionality planted by a malicious actor.
It is often important for human operators to gauge how much they should trust an AI system, especially in high-stakes settings such as medical diagnosis.[45] ML models typically express confidence by outputting probabilities; however, they are often overconfident,[46] especially in situations that differ from those they were trained on.[47] Calibration research aims to make model probabilities correspond as closely as possible to the true proportion of cases in which the model is correct.
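One common way to quantify miscalibration is the expected calibration error (ECE); the sketch below, using made-up numbers, shows how an overconfident model produces a large gap between reported confidence and observed accuracy:

```python
# Minimal sketch of expected calibration error (ECE): predicted confidences
# are grouped into bins and compared with the observed accuracy in each bin.
import numpy as np

def expected_calibration_error(confidences: np.ndarray, correct: np.ndarray,
                               n_bins: int = 10) -> float:
    """Weighted average gap between confidence and accuracy across bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# A model that reports 90% confidence but is right only ~60% of the time
# is overconfident, and the ECE reflects that.
conf = np.full(1000, 0.9)
hits = (np.random.default_rng(0).random(1000) < 0.6).astype(float)
print(round(expected_calibration_error(conf, hits), 3))  # roughly 0.3
```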
Similarly, anomaly detection, or out-of-distribution (OOD) detection, aims to identify when an AI system is in an unusual situation. For example, if a sensor on an autonomous vehicle malfunctions, or the vehicle encounters challenging terrain, it should alert the driver to take control or stop.[48] Anomaly detection is often implemented by simply training a classifier to distinguish anomalous from non-anomalous inputs,[49] although a range of other techniques is also used.[50][51]
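A minimal sketch of the classifier-based approach just described, using synthetic data and scikit-learn purely for illustration (real systems would use sensor readings or learned features):

```python
# Train a binary classifier to separate in-distribution data from examples
# of anomalous inputs, then use it to flag unusual inputs at run time.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))     # in-distribution
anomalous = rng.normal(loc=4.0, scale=2.0, size=(500, 4))  # unusual inputs

X = np.vstack([normal, anomalous])
y = np.array([0] * 500 + [1] * 500)                        # 1 = anomaly

detector = LogisticRegression().fit(X, y)

# At run time the detector flags inputs that look unlike the training data,
# e.g. so an autonomous system can hand control back to a human.
new_input = rng.normal(loc=4.0, scale=2.0, size=(1, 4))
print("anomaly probability:", detector.predict_proba(new_input)[0, 1])
```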
Academics[9] and public bodies have expressed concern that AI systems could be used to help malicious actors build weapons,[52] manipulate public opinion,[53][54] or automate cyberattacks.[55] These worries are a practical concern for companies like OpenAI that host powerful AI tools online.[56] To prevent misuse, OpenAI has built detection systems that flag or restrict users based on their activity.[57]
Neural networks are often described as black boxes,[58] meaning that it is difficult to understand why they make the decisions they do as a result of the enormous number of computations they perform.[59] This makes it challenging to anticipate failures. In 2018, a self-driving car killed a pedestrian after failing to identify them. Because of the black-box nature of the AI software, the reason for the failure remains uncertain.[60]
One of the benefits of transparency is explainability.[61] Sometimes it is a legal requirement to provide an explanation of why a decision has been made to ensure fairness, for example for the automatic filtering of job applications or the assignment of credit scores.[61].
Another benefit of transparency is that it can reveal the cause of failures.[58] At the beginning of the 2020 COVID-19 pandemic, researchers used transparency tools to show that medical image classifiers were "paying attention" to irrelevant hospital labels.[62]
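One simple transparency tool of this kind is a gradient-based saliency map. The sketch below uses an untrained placeholder classifier and a random image, so the resulting values are meaningless; applied to a real medical classifier, the same recipe is what can reveal attention to irrelevant regions such as hospital labels.

```python
# Sketch of a gradient-based saliency map: highlight which input pixels most
# affect the classifier's predicted score.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(1 * 64 * 64, 2))  # toy classifier
image = torch.rand(1, 1, 64, 64, requires_grad=True)            # stand-in X-ray

score = model(image)[0].max()      # score of the predicted class
score.backward()

saliency = image.grad.abs().squeeze()  # per-pixel influence on the prediction
print(saliency.shape)                  # (64, 64) map to overlay on the image
```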
Transparency techniques can also be used to correct errors. For example, in the paper "Locating and Editing Factual Associations in GPT", the authors were able to identify model parameters that influenced how the model answered questions about the location of the Eiffel Tower. They were then able to "edit" this knowledge so that the model answered such questions as if it believed the tower was in Rome rather than France.[63] Although in this case the authors induced an error, these methods could potentially be used to fix errors efficiently. Model editing techniques also exist in computer vision.[64]
Systemic safety and sociotechnical factors
It is common for AI risks (and technological risks more generally) to be categorized as misuse or accidents.[103] Some scholars have suggested that this framing falls short.[103] For example, the Cuban Missile Crisis was not clearly an accident or a misuse of technology.[103] Political analysts Zwetsloot and Dafoe observed that the misuse and accident perspectives tend to focus only on the last step in the causal chain leading up to a harm, while the relevant causal chain is often much longer.[103]
Risk factors are typically "structural" or "systemic" in nature, such as competitive pressure, diffusion of harm, accelerated development, high levels of uncertainty, and inadequate safety culture.[103] In a broader safety engineering context, structural factors such as "organizational safety culture" play a central role in the popular STAMP risk analysis framework.[104]
Inspired by the structural perspective, some researchers have emphasized the importance of using machine learning to improve sociotechnical safety factors, for example by using ML for cyber defense, improving institutional decision-making, and facilitating cooperation.[28]
Some specialists are concerned that AI could exacerbate the already imbalanced landscape between cyber attackers and cyber defenders.[105] This would increase the incentives for a "first strike" and could lead to more aggressive and destabilizing attacks. To reduce this risk, some recommend placing more emphasis on cyber defense. Likewise, software security is essential to prevent the theft and misuse of powerful AI models.[9].
Advances in AI in economic and military domains could trigger unprecedented political challenges.[106] Some experts have compared the development of artificial intelligence to the Cold War, in which decisions made by a small number of people often spelled the difference between stability and catastrophe.[107] AI researchers have argued that AI technologies could also be used to assist decision-making.[28] For example, AI-based forecasting[108] and advisory systems[109] are beginning to be developed.
Many of the largest global threats (nuclear war,[110] climate change,[111] etc.) have been framed as cooperation problems. As in the well-known prisoner's dilemma, some dynamics can lead to bad outcomes for all players, even when each acts optimally in its own interest. For example, no single actor has strong incentives to address climate change, even though the consequences may be severe if no one intervenes.[111]
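The payoff structure behind this reasoning can be made concrete with the standard prisoner's dilemma matrix (the values below are the usual textbook ones, not taken from any of the references above):

```python
# Toy payoff matrix for the prisoner's dilemma: defecting is individually
# rational for each player, yet mutual defection leaves both worse off than
# mutual cooperation.
payoffs = {  # (my action, other's action) -> (my payoff, other's payoff)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

# Whatever the other player does, "defect" pays more for me ...
for other in ("cooperate", "defect"):
    mine_if_cooperate = payoffs[("cooperate", other)][0]
    mine_if_defect = payoffs[("defect", other)][0]
    print(f"other {other}: cooperate -> {mine_if_cooperate}, defect -> {mine_if_defect}")

# ... so both players defect and each receives 1, even though mutual
# cooperation would have given each of them 3.
```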
One of the main challenges of AI cooperation is to avoid a "race to the bottom".[112] In this context, countries or companies would compete to build more capable artificial intelligence systems and neglect safety, leading to a catastrophic accident that would harm everyone involved. Concern about this type of situation has motivated political[113] and technical[114] efforts to facilitate cooperation between human beings and, potentially, between AI systems. Most AI research focuses on designing individual agents to perform isolated functions (often in "single-player games").[115] Several experts have suggested that as AI systems become more autonomous, it may be essential to study and shape the way they interact.[115]