CALIF: A Pedagogical Framework for Integrating AI Literacy into University Curricula

1. Introduction: AI Literacy as an Epistemological Emergency in Higher Education

The pervasive penetration of generative artificial intelligence systems into academic practice has opened an epistemological fracture that higher education institutions have not yet managed to close. The problem is not simply technological, a matter of students' familiarity with tools like ChatGPT, Gemini, or Claude. It is foundational: higher education is training professionals who use probabilistic language generation systems daily without the conceptual tools to understand their nature, assess their reliability, and govern their impact on their own professional decisions.

The empirical data are striking and rapidly evolving. The UNESCO 2023 report documents that over 70% of university students in OECD countries use generative AI tools for academic activities, a percentage that, according to updated 2025 estimates, now exceeds 85% in the Anglophone world. Yet fewer than 15% of these students can formulate a structured critical evaluation of these systems' output. The systematic review by Zawacki-Richter et al. (2019), published in the International Journal of Educational Technology in Higher Education, had already documented how AI adoption in higher education was growing exponentially while faculty preparedness to manage its pedagogical impact remained essentially unchanged. Six years later, this asymmetry has not narrowed; it has widened dramatically with the arrival of conversational Large Language Models.

Kasneci et al. (2023), in their analysis published in Learning and Individual Differences, further highlighted how ChatGPT and large-scale language models are transforming the educational landscape at a speed that structurally exceeds academic institutions' adaptive capacity. The problem is not speed per se — it is the absence of a shared pedagogical framework enabling institutions to respond in a coordinated rather than fragmented manner.

It is in this context that I developed the CALIF framework (Comprehensive AI Literacy Integration Framework), presented in a peer-reviewed article published in Computers and Education: Artificial Intelligence, Volume 7 (2026). The framework's objective is not to add yet another "introduction to AI" course to already overloaded curricula, but to propose an integration architecture that permeates existing courses, transforming AI literacy from a sectoral competency into a transversal one on par with academic writing or statistical reasoning.

2. Positioning in the Literature and Gap Identification

The literature on AI in education is vast but presents a structural gap that the CALIF framework aims to fill.

The first research strand, represented by Long and Magerko (2020) with their foundational work "What is AI Literacy?", established definitional foundations identifying competencies an informed citizen should possess for critical interaction with AI systems. However, their framework focuses on general population literacy without addressing higher education specificities — where the question is not so much understanding what AI is, but critically integrating it into disciplinary professional practices.

The second strand, represented by Ng et al. (2021) in their review published in Computers and Education: Artificial Intelligence, proposed an AI literacy conceptualization articulated across four dimensions: AI knowledge and understanding, AI use and application, AI evaluation and creation, and AI ethics. This taxonomy is valuable as an analytical framework but provides no operational guidance on how to integrate it into existing curricula, a step requiring not only pedagogical theory but also curricular architecture and validated assessment instruments.

The third, more recent strand addresses the specific impact of generative AI on academic integrity. The data emerging from this corpus are alarming: a 2025 UK national survey found that 92% of students use AI in some form, and 88% have used generative AI specifically for coursework. The University of Reading study found that 94% of AI-generated exam submissions went undetected by the markers grading them. These data confirm that the problem cannot be contained through repressive measures, because detection structurally lags behind generation; it requires a fundamental rethinking of the relationship between student, knowledge, and AI tools.

The gap the CALIF framework aims to fill lies precisely at the intersection of these three strands: how to move from the theoretical definition of AI literacy (Long and Magerko), through a multidimensional taxonomy (Ng et al.), to an operational curricular architecture that institutions can adopt without restructuring their degree programs.

3. Theoretical Foundations: Bloom's Taxonomy as Shared Grammar

The choice to anchor the CALIF framework to Bloom's taxonomy as revised by Anderson and Krathwohl (2001) is neither decorative nor conventional. It is an architectural decision motivated by three specific reasons.

The first reason is pragmatic: Bloom's taxonomy is the most widely adopted instructional design framework in higher education globally. Faculty know it, curriculum committees use it to define learning outcomes, accreditation bodies require it in quality documentation. Anchoring AI literacy to this taxonomy means speaking the language institutions already speak, drastically reducing the adoption barrier.

The second reason is epistemological: the cognitive progression Bloom's taxonomy describes — from remembering to creating, through understanding, applying, analyzing, and evaluating — corresponds to the natural progression of competency with AI systems. A student beginning to interact with an LLM traverses the same cognitive phases: first understanding what it is (remembering/understanding), then using it within their domain (applying), then critically evaluating its output (analyzing/evaluating), and finally contributing to governance of its use in their professional field (creating). This isomorphism is not coincidental but a consequence of the fact that Bloom's taxonomy describes a universal learning structure, not a domain-specific one.

The third reason is political, in the academic sense: frameworks proposing new proprietary taxonomies, however intellectually stimulating, encounter adoption resistance because they require faculty to learn a new language and curriculum committees to restructure their processes. A framework grafting onto an already consolidated taxonomy transforms the faculty task from "learning a new system" to "integrating a new dimension into a system I already command." The difference in real-world adoption probability is significant.

4. Framework Architecture: Four Levels of Progressive Integration

The CALIF framework is articulated across four levels corresponding to progressive degrees of competency and curricular integration. The progression is not rigidly sequential — a course can operate simultaneously on multiple levels — but is conceptually ordered: each successive level presupposes the competencies of the previous one.

Level 1 — Foundational Awareness

The student understands what AI systems are, how they function at a conceptual (not implementational) level, and what their capabilities and limitations are. This level corresponds to Bloom's remembering and understanding cognitive processes.

The objective is not to train AI engineers but informed professionals who can distinguish a deterministic system from a probabilistic one, and a model trained on data from a model programmed with explicit rules. The law student at this level understands that an LLM generates statistically plausible text, not legally correct text. The medical student understands that a diagnostic support system operates by statistical correlation, not clinical reasoning. This distinction, apparently elementary, is absent from the training of the vast majority of professionals who use these tools today.

Level 1 requires minimal curricular investment: 4-6 hours distributed in the first semester of any degree program, integrated into an existing course rather than offered as a standalone module. The key is that content must be disciplinary, not generic: an architecture student receives examples from architecture, not from programming.

Level 2 — Disciplinary Critical Evaluation

The student can analyze an AI system's output within the specific context of their discipline. Here applying and analyzing processes intervene: the student not only understands what AI is but uses it and critically evaluates its results in light of disciplinary knowledge.

A jurist at Level 2 can interrogate an LLM on a civil law question, identify legal hallucinations in the output (citations of nonexistent rulings, erroneous statutory references, plausible but unfounded legal reasoning), and document the output's limitations in a structured critical note. A physician at Level 2 can use an AI diagnostic support system, compare the algorithmic suggestion with available clinical evidence, and identify training biases potentially influencing the recommendation (for example, a system trained predominantly on adult patient data producing inadequate recommendations for pediatric patients, a problem I have extensively documented in my research on AI in pediatric medicine).

Level 2 is the framework's operational core, because it is the level where the disciplinary professor becomes the protagonist. Teaching an economics student to critically evaluate an LLM-generated financial analysis does not require an AI expert; it requires an economics professor who knows what to look for. Faculty training at this level focuses not on AI itself but on AI's failure modes in the professor's specific domain.

Level 3 — Conscious Operational Integration

The student uses AI tools as amplifiers of their disciplinary work, understanding their boundaries and documenting their use. The dominant cognitive process is evaluating: the student actively chooses when to use AI and when not to, basing the decision on a contextual cost-benefit assessment.

A social researcher at Level 3 uses NLP models for sentiment analysis on large text corpora, but knows the model was trained predominantly on English-language text and that its application to Italian or dialectal corpora requires specific methodological cautions. An architect at Level 3 uses generative AI to explore design variants, but knows the model tends to produce solutions that converge toward the dominant styles in the training dataset, and actively compensates for this tendency in their design practice.

At this level, the academic integrity question assumes a different connotation from the purely punitive one. The student is not trained to "not use AI" (a now unsustainable position) but to transparently document how and why they used it, in a process logic rather than a product logic. The analogy is with the calculator in mathematics: no engineering professor bans calculators, but every professor requires the student to know what they are calculating and why.

Level 4 — Ethical Reflection and Governance Contribution

The student actively participates in the debate on AI governance in their profession. The cognitive process is creating: the student produces normative artifacts, not merely technical ones. They draft usage policy proposals for their professional context, contribute to disciplinary ethical guidelines, identify biases in datasets relevant to their area, and propose mitigation strategies.

This level is reserved for advanced study cycles (master's degrees, doctorates, continuing professional education) and represents the point where AI literacy transforms from individual competency to civic competency. A magistrate at Level 4 can not only evaluate the output of an AI risk assessment system in juvenile justice, but also contribute to the normative debate on whether and how such systems should be used, what transparency constraints they should respect, and what audits they should pass before deployment.
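The four levels and their Bloom anchors, as described above, can be summarized in a small lookup structure. This is an illustrative sketch only: the class name CalifLevel and the helper functions are hypothetical and are not part of the published framework; only the level names and the associated cognitive processes come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalifLevel:
    """One CALIF level and the Bloom processes the text associates with it."""
    number: int
    name: str
    bloom_processes: tuple


# Level names and Bloom processes are taken from the level descriptions above.
LEVELS = (
    CalifLevel(1, "Foundational Awareness", ("remembering", "understanding")),
    CalifLevel(2, "Disciplinary Critical Evaluation", ("applying", "analyzing")),
    CalifLevel(3, "Conscious Operational Integration", ("evaluating",)),
    CalifLevel(4, "Ethical Reflection and Governance Contribution", ("creating",)),
)


def prerequisites(level_number):
    """Names of the levels whose competencies the given level presupposes."""
    return [lvl.name for lvl in LEVELS if lvl.number < level_number]


def levels_for_process(process):
    """Numbers of the levels anchored to a given Bloom cognitive process."""
    return [lvl.number for lvl in LEVELS if process in lvl.bloom_processes]


if __name__ == "__main__":
    print(prerequisites(3))
    print(levels_for_process("analyzing"))
```

A structure of this kind could let a curriculum committee tag existing learning outcomes, already written in Bloom's vocabulary, with the CALIF level they most plausibly support.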

5. Development and Validation Methodology

The CALIF framework's development and validation followed a mixed-methods research protocol structured in four complementary phases, designed to ensure result triangulation and replicability.

The first phase consisted of a systematic literature review analyzing 247 academic sources from five principal databases: IEEE Xplore, ACM Digital Library, Scopus, Web of Science, and ERIC. Inclusion criteria required peer-reviewed publications in English, Italian, French, and Spanish, from 2018 to 2025, with explicit focus on AI literacy, AI in higher education, or AI pedagogical frameworks. The objective was twofold: mapping the state of the art and systematically identifying gaps in the existing literature justifying development of a new framework.

The second phase involved 48 semi-structured interviews with university faculty from 12 different disciplines, selected through stratified sampling by disciplinary area (hard sciences, social sciences, humanities, professional disciplines including medicine, law, and engineering). Interviews, with a mean duration of 55 minutes, were guided by a 14-question protocol organized across three thematic areas: perception of AI in their teaching, resistance and barriers to integration, and resources perceived as necessary. Thematic analysis was conducted following the Braun and Clarke (2006) protocol, with independent coding by two reviewers and discrepancy resolution by consensus.

The third phase involved a three-round Delphi study with 23 international experts from 11 countries: researchers in AI education, pedagogy scholars, curriculum design experts, and educational technology professionals. The Delphi study aimed at iterative validation of the four framework levels, calibration of assessment metrics, and identification of institutional prerequisites for adoption. The consensus level required for an item to be considered stable was 75% (median ≥ 4 on a 5-point Likert scale).
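The consensus rule can be made concrete with a short sketch. The text does not specify exactly how the 75% threshold and the median criterion combine, so the code below assumes one common reading: an item is stable when at least 75% of panelists rate it 4 or 5 and the median rating is at least 4. The function name and the sample ratings are hypothetical.

```python
from statistics import median


def delphi_item_status(ratings, agreement_threshold=0.75, min_median=4):
    """Summarize one Delphi item rated on a 1-5 Likert scale.

    ASSUMPTION: "agreement" means the share of ratings that are 4 or 5.
    With the default values, the median condition is implied by the
    agreement condition; it is still computed so both can be reported.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    share_agree = sum(1 for r in ratings if r >= 4) / len(ratings)
    item_median = median(ratings)
    return {
        "share_agree": share_agree,
        "median": item_median,
        "consensus": share_agree >= agreement_threshold
        and item_median >= min_median,
    }


# Hypothetical ratings from a 23-member panel (not data from the study).
item_a = [5] * 10 + [4] * 8 + [3] * 3 + [2] * 2
item_b = [5] * 8 + [4] * 8 + [3] * 7

if __name__ == "__main__":
    print(delphi_item_status(item_a))
    print(delphi_item_status(item_b))
```

The second hypothetical item shows why the agreement share matters: its median reaches 4, yet fewer than 75% of panelists rate it 4 or 5, so under this reading it would return to the panel for another round.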

The fourth phase involved a pilot implementation with 342 students distributed across 6 courses in 4 different faculties, with a single-group pre-post design. During the pilot, the AILAS scale (AI Literacy Assessment Scale) was developed and validated, comprising 28 items across 4 subscales corresponding to the four framework levels. The scale's psychometric properties proved robust: overall Cronbach's alpha was .91, with subscale values ranging between .83 and .89. Confirmatory factor analysis supported the four-factor structure with acceptable fit indices (CFI > .95, RMSEA < .06). Pilot results showed statistically significant increases in critical evaluation competencies for AI output, with moderate effect sizes (Cohen's d) for Levels 1 and 2 and small but significant effect sizes for Level 3.
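For readers less familiar with the two statistics reported here, the following sketch shows how Cronbach's alpha and a pre-post Cohen's d can be computed. It is not the analysis code used in the study. In particular, the text does not say which variant of Cohen's d was used for the single-group design; the sketch assumes the paired variant (mean of the differences divided by their standard deviation). All function names and data are hypothetical.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer all items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohens_d_paired(pre, post):
    """Cohen's d for paired scores.

    ASSUMPTION: uses the mean of the pre-post differences divided by the
    standard deviation of those differences; other variants exist.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / stdev(diffs)


# Hypothetical toy data, not data from the pilot.
toy_responses = [[1, 2], [2, 3], [3, 4]]
toy_pre = [1, 2, 3, 4]
toy_post = [2, 4, 4, 6]

if __name__ == "__main__":
    print(cronbach_alpha(toy_responses))
    print(cohens_d_paired(toy_pre, toy_post))
```

Because the choice of standardizer changes the magnitude of d in pre-post designs, a replication would need to report which variant it uses before comparing its effect sizes with those of the pilot.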

6. Design Principle: Lateral Integration

The CALIF framework's fundamental architectural principle is lateral integration: AI literacy is not added as a standalone course — a choice that would be perceived as yet another didactic burden and would encounter resistance both in curriculum committees and among students — but is incorporated into existing courses through thematic modules calibrated to the four levels.

A Private Law course, for example, can integrate a Level 2 module (8 hours in the semester) on critical evaluation of AI-generated legal opinions, with practical exercises where students receive an AI opinion on a real case and must identify legal hallucinations, statutory inaccuracies, and argumentative gaps. A Diagnostic Imaging course can integrate a Level 3 module (12 hours) on using AI systems as reporting support, with specific attention to training biases and to documenting AI use in clinical reports.

This architecture resolves the disciplinary scalability problem: the framework is invariant across faculties, but contents, exercises, and assessment criteria are discipline-specific. The Sociology professor does not need to become an expert in neural networks to teach AI literacy to their students — they need to understand the four levels and apply them to their domain. Estimated necessary faculty training is 16-24 hours of disciplinary workshop, not a specialized master's degree.

7. Limitations, Future Directions, and Connections to Ongoing Research

The CALIF framework presents limitations that must be declared with full transparency, consistent with the epistemological principle guiding all my research activity: truth is law.

The pilot study sample (342 students, 6 courses, 4 faculties) was recruited in a single Western European university context. Generalizability of results to different cultural contexts — particularly to Global South university systems, where access to AI technologies is uneven and pedagogical traditions differ — requires specific validation with representative samples.

The AILAS scale, though presenting robust psychometric properties in its current version, was validated in a single language (Italian) and requires translation, cultural adaptation, and cross-linguistic revalidation before it can serve as an international comparative instrument. The four-factor structure, though supported by confirmatory factor analysis, should be verified through exploratory factor analysis on independent samples to exclude overfitting effects on the development sample.

The framework does not explicitly address pre-university AI literacy, a research area I have treated separately and extensively in my work on AI's impact on minors, where I proposed the ABDI framework (Attachment-Based Digital Interaction) for understanding how children form emotional bonds with AI systems and what implications this has for cognitive and emotional development. The connection between ABDI (focus 0-18 years) and CALIF (focus 18+ years) outlines an AI literacy continuum covering the entire education span, from early childhood to continuing professional development — a continuum that future research must explore through dedicated longitudinal studies.

Future research must focus on three principal directions. The first is longitudinal validation of the framework's effects on post-graduation professional competencies: do CALIF-trained students make better professional decisions in AI-involving contexts? The second is adapting the framework to continuing professional education contexts (practicing physicians, lawyers, engineers), where resistance to technological updating is typically stronger than in university settings. The third is developing automated assessment instruments integrating AI into the evaluation of AI literacy itself — closing the epistemological circle so that the measurement instrument is consistent with the measured object.

Methodological Note

The complete paper was published as a peer-reviewed article in Computers and Education: Artificial Intelligence, Volume 7, February 2026. The mixed-methods methodology was designed to ensure result triangulation and protocol replicability. All sources cited in the paper and in this analysis are real and verifiable in international academic databases. The paper is single-authored.

Giuseppe Siciliani
Independent Cybersecurity Researcher & AI Consultant, Milan
Media Lives Cybersecurity Research Lab (MLCSL), Media Lives S.r.l.