Student sample for assessment
Written by a Year 10 student in Cairns, Queensland, Australia.
This submission argues in favour of requiring AI companies to publicly disclose the data sources used to train their models. Transparency about training data is a precondition for meaningful accountability, informed consent and accurate public understanding of AI capabilities and limitations.
The case for mandatory disclosure rests on three related arguments. First, AI systems trained on undisclosed data cannot be adequately audited for bias, errors or harmful content: if the training data is unknown, the reliability and fairness of the system’s outputs cannot be properly evaluated. Second, much of the data used to train AI systems has been taken from creators (writers, artists, musicians and coders) without their knowledge or consent, and disclosure is a precondition for any framework of consent or compensation for the use of creative work. Third, public understanding of AI capabilities and limitations is significantly distorted when the provenance of training data is unknown: people cannot accurately assess what an AI system can and cannot do if they do not know what it has learned from. Disclosure does not solve these problems, but they cannot be addressed without it.
The most significant objection to mandatory disclosure is that training datasets contain commercially sensitive information, including proprietary data curation strategies and partnerships, that would be exposed to competitors through public disclosure. This objection has merit, but the proposal does not require disclosure of training methodology or proprietary filtering processes. What it requires is disclosure of the categories and sources of training data at a level of specificity sufficient to allow independent audit and public understanding. A requirement of this kind is not novel: pharmaceutical companies are required to disclose the ingredients of their products without disclosing the manufacturing processes that combine them.
The inquiry is invited to recommend mandatory training data disclosure at the category-and-source level, with a defined exemption process for genuinely commercially sensitive specifics, and with independent audit rights to verify the completeness of disclosure.