Unfortunately, human errors are bound to happen. Checklists allow one to verify that all the required actions are done correctly and in the correct order. The military has them, the health care sector has them, professional diving has them, the aviation and space industries have them, and software engineering has them. Why not artificial intelligence practitioners?
In October 1935, the two pilots of the new Boeing B-17 warplane were killed when the aircraft crashed. The crash was caused by an oversight: the pilots forgot to release a lock during the takeoff procedure. Since then, following a checklist during flight operations has become mandatory, and the practice has reduced the number of accidents. During the Apollo 13 mission of 1970, carefully written checklists helped the crew mitigate the consequences of the oxygen tank explosion.
In healthcare, checklists are widespread too. For example, the World Health Organization released a checklist outlining the required steps before, during, and after surgery. A meta-analysis suggested that using the checklist was associated with reduced mortality and complication rates.
Because artificial intelligence is used for increasingly important matters, accidents can have serious consequences. During a test, a chatbot suggested harmful behaviors to fake patients. The data scientists explained that the AI had no scientific or medical expertise. AI in law enforcement can also cause serious harm. For example, facial recognition software misidentified a suspect, resulting in the arrest of an innocent person, and an algorithm used to estimate the likelihood of recidivism was judged unfair towards black defendants. AI is also used in healthcare, where a simulation of the Covid-19 outbreak in the United Kingdom shaped policy and led to a nationwide lockdown. However, the simulation was poorly programmed, causing serious issues: root cause analysis determined that it was not deterministic and was poorly tested. The lack of a checklist could have played a role.
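The non-determinism mentioned above is the kind of issue a checklist item such as "the same inputs and seed give the same outputs" could catch early. The following is only a minimal sketch of such a check on a toy model: the function names, the growth rule, and the parameters are hypothetical and have nothing to do with the actual Covid-19 simulation.

```python
import random


def simulate_outbreak(seed: int, days: int = 30, initial_cases: int = 10) -> list[int]:
    """Toy, purely illustrative outbreak model (hypothetical growth rule)."""
    # A local, explicitly seeded generator avoids hidden global random state.
    rng = random.Random(seed)
    cases = [initial_cases]
    for _ in range(days):
        growth = rng.uniform(0.9, 1.3)  # arbitrary daily growth factor
        cases.append(round(cases[-1] * growth))
    return cases


def is_reproducible(seed: int) -> bool:
    """Checklist item: two runs with the same seed must give identical results."""
    return simulate_outbreak(seed) == simulate_outbreak(seed)


if __name__ == "__main__":
    print("reproducible:", is_reproducible(seed=42))
```

A check like this is cheap to run as part of an automated test suite, which is one way a checklist item can be enforced rather than merely remembered.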
Just as in the aforementioned complex and critical industries, checklists should be leveraged to make sure that the building and reporting of AI models include everything required to reproduce results and make an informed judgement, which fosters trust in the AI's accuracy. However, checklists for building prediction models, such as TRIPOD, are not often used by data scientists, even though they might help. Possible reasons include ignorance of the existence of such checklists, a perceived lack of usefulness, or misunderstandings caused by the use of different vocabularies among AI developers.
Enforcing the use of standardized checklists would lead to better idioms and practices, thereby fostering fair and accurate AI with robust evaluation and making its adoption easier for sensitive tasks. In particular, a checklist on AI should include points about how the training dataset was constructed and preprocessed, the model specification and architecture, and how the model's efficiency, accuracy, and fairness were assessed. A list of all intended purposes of the AI should also be disclosed, as well as known risks and limitations.
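To make these points concrete, such a checklist could be kept in machine-readable form next to the model so that missing items are flagged automatically. The sketch below is only one possible shape for it: the class name, field names, and example entries are hypothetical and do not follow TRIPOD or any other published standard.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelChecklist:
    """Hypothetical reporting checklist; each field holds a short free-text answer."""
    dataset_construction: str = ""
    dataset_preprocessing: str = ""
    model_specification_and_architecture: str = ""
    efficiency_assessment: str = ""
    accuracy_assessment: str = ""
    fairness_assessment: str = ""
    intended_purposes: str = ""
    known_risks_and_limitations: str = ""

    def missing_items(self) -> list[str]:
        """Return the names of all items that have been left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_complete(self) -> bool:
        """The checklist passes only when every item has been filled in."""
        return not self.missing_items()


if __name__ == "__main__":
    # Placeholder answers, for illustration only.
    report = ModelChecklist(
        dataset_construction="Records collected from source A, 2015 to 2019.",
        accuracy_assessment="Evaluated on a held-out test set.",
    )
    print("complete:", report.is_complete())
    print("missing:", report.missing_items())
```

Keeping the items as explicit fields means that a release script could refuse to publish a model whose checklist is incomplete, in the same way a pilot does not take off before the last item is ticked.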
Because AI is a novel field, one can understand why checklists are not yet widely used in it. However, other fields adopted them for well-known reasons, and this time it would be wise to learn from their past mistakes and ideas.