Transparency Evaluation Reveals Low Scores for Leading AI Models from OpenAI, Meta, and Google

A new assessment from Stanford reveals alarmingly low transparency scores for leading AI models from OpenAI, Meta, and Google. The evaluation, conducted in collaboration with researchers from MIT and Princeton, introduces the Foundation Model Transparency Index (FMTI) to shed light on the inner workings of AI models and urges companies to be more forthcoming about their AI systems.
The FMTI scrutinizes 10 leading AI models across 100 transparency-related indicators, covering how the models are built, what data they are trained on, how much compute they require, and the developers’ policies on model usage, data protection, and risk mitigation. For an in-depth look at the metrics and methodology, see the accompanying 110-page paper.
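To make the scoring concrete, here is a minimal sketch of how an index built on pass/fail indicators can be tallied. This is an illustration, not the FMTI’s actual code: the indicator names, domain groupings, and the `Assessment` and `fmti_score` helpers below are hypothetical, and the sketch assumes each of the 100 indicators is assessed as simply satisfied or not.

```python
# Minimal sketch of tallying an FMTI-style transparency score.
# Assumptions (not from the article): each of the 100 indicators is
# a simple pass/fail, grouped into domains; all names below are
# hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class Assessment:
    indicator: str   # e.g., "training data sources disclosed"
    domain: str      # e.g., "data", "compute", "usage policy"
    satisfied: bool  # did the developer disclose this item?

def fmti_score(assessments: list[Assessment]) -> int:
    """Total score = number of satisfied indicators (out of 100)."""
    return sum(a.satisfied for a in assessments)

def domain_breakdown(assessments: list[Assessment]) -> dict[str, float]:
    """Fraction of indicators satisfied within each domain."""
    totals: dict[str, list[int]] = {}
    for a in assessments:
        satisfied, count = totals.setdefault(a.domain, [0, 0])
        totals[a.domain] = [satisfied + a.satisfied, count + 1]
    return {d: s / n for d, (s, n) in totals.items()}

# Hypothetical example: three of the 100 indicators for one model.
example = [
    Assessment("model architecture disclosed", "model basics", True),
    Assessment("training data sources disclosed", "data", False),
    Assessment("compute usage disclosed", "compute", False),
]
print(fmti_score(example))        # 1 (of the indicators assessed here)
print(domain_breakdown(example))  # per-domain fractions satisfied
```

Averaging such per-model totals across all ten models is what yields the overall figure reported next.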
The average score across all 10 models was a disappointing 37 out of 100, signaling a stark transparency deficit across the board. None of the models came close to a satisfactory level of transparency.
Meta’s Llama 2 model secured the highest score at 54 out of 100 points, although the researchers emphasize that Meta’s result should not be treated as the ceiling: the goal is for all AI developers to reach scores of 80, 90, or even 100.
Bloomz, a model hosted on Hugging Face, took second place with a score of 53, followed by OpenAI’s GPT-4 at 48. The study criticizes OpenAI for the lack of transparency around its flagship model, despite the “Open” in the company’s name.
Stability AI’s Stable Diffusion 2 came in fourth with a score of 47, and Google’s PaLM 2, the model behind Bard, rounded out the top five with 40.
Notably, Stanford gave each company’s leadership a chance to review the findings and contest scores before publication, a process that itself fostered transparency and dialogue.
The evaluation found that open models outperformed closed ones. A model is considered “open” when its developer releases it publicly, for example by making the weights available for download, rather than gating access behind an API. Llama 2 and Bloomz, two of the highest-ranking models, are both open, while GPT-4 is a closed model.
The report notes that whether models should be open or closed is one of the most contentious policy debates in AI. Stanford hopes the Foundation Model Transparency Index will drive positive policy change and plans to release the FMTI annually, with 2023 marking its inaugural year.
Nine of the 10 evaluated companies have committed to the White House’s responsible AI initiatives, and the FMTI is expected to encourage them to honor those commitments. It should also provide valuable input for the European Union as it finalizes the AI Act, guiding legislative and regulatory decisions.