Mira is an integrated research environment for scholars working with primary and secondary sources across heterogeneous collections. It combines resource acquisition, archival management, full-text reading, model-assisted analysis, and structured composition within a single application — replacing the fragmented workflow of separate PDF viewers, browser tabs, reference managers, and word processors that characterises most contemporary research practice.
Large language models are prone to confabulation — producing plausible text that has no basis in any source document. When researchers rely on general-purpose LLM tools for textual work, they often lack the means to determine whether a quoted passage is genuine, whether a cited reference exists, or whether an argument has been faithfully represented.
Mira addresses this problem architecturally. When a document enters the system, it is segmented into discrete excerpts that preserve the original text verbatim. These excerpts constitute the evidence base for all subsequent operations: they are what the language model receives as input, what appears in search results, and what transfers into drafts. At no stage does the model generate or modify source text. The passages that reach the Writing Desk are identical to those extracted during initial acquisition, and each can be traced to its point of origin in the source document.
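The segmentation and traceability described above can be illustrated with a minimal sketch. It is not Mira's actual implementation: the `Excerpt` record, the `segment` and `verify` functions, the blank-line splitting rule, and the use of SHA-256 digests are all assumptions made for illustration. The sketch shows only the core idea, that each excerpt stores the source text verbatim together with its character offsets, so any passage can later be checked against the original document.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Excerpt:
    """A verbatim slice of a source document (hypothetical record layout)."""
    doc_id: str
    start: int   # character offset in the source, inclusive
    end: int     # character offset in the source, exclusive
    text: str    # exact text of source[start:end], never altered
    digest: str  # SHA-256 of the text, for later integrity checks


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def segment(doc_id: str, source: str) -> list[Excerpt]:
    """Split a document on blank lines, recording offsets for each block.

    Splitting on blank lines is an assumption; any rule works as long as
    the stored text is an exact slice of the source.
    """
    excerpts = []
    pos = 0
    for block in source.split("\n\n"):
        start, end = pos, pos + len(block)
        if block.strip():  # skip empty or whitespace-only blocks
            excerpts.append(Excerpt(doc_id, start, end, block, _digest(block)))
        pos = end + 2      # advance past the two-character separator
    return excerpts


def verify(excerpt: Excerpt, source: str) -> bool:
    """Check that an excerpt still matches its point of origin."""
    return (
        source[excerpt.start:excerpt.end] == excerpt.text
        and _digest(excerpt.text) == excerpt.digest
    )
```

Under these assumptions, a passage that reaches a draft can be re-verified at any time by calling `verify` with the stored source; a mismatch indicates that either the excerpt or the source has changed since acquisition.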
The integrity of digital texts is likely to become a defining methodological concern of the coming decades. States, corporations, and automated systems all possess both the motive and the increasingly accessible means to alter or selectively present historical documents. Mira operates on the assumption that verifiability is not a convenience but a precondition of rigorous scholarship.
Mira is not coupled to any single language model or commercial provider. The application supports whatever models the researcher has access to: small open-source models running on local hardware, cost-effective services such as DeepSeek, providers operating under EU data-protection regulations, or frontier models accessed through individual API credentials.
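One way to picture this decoupling is a small backend interface that every model, local or remote, is expected to satisfy. The sketch below is illustrative only: `ModelBackend`, `EchoBackend`, and `BackendRegistry` are hypothetical names, not part of Mira's published code, and the stand-in backend makes no network calls.

```python
from typing import Protocol


class ModelBackend(Protocol):
    """Hypothetical interface any model provider would need to satisfy."""
    name: str

    def complete(self, prompt: str) -> str:
        ...


class EchoBackend:
    """Stand-in for a locally hosted model; returns a deterministic reply."""
    name = "local-echo"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class BackendRegistry:
    """Keeps the rest of the application unaware of which provider is in use."""

    def __init__(self) -> None:
        self._backends: dict[str, ModelBackend] = {}

    def register(self, backend: ModelBackend) -> None:
        self._backends[backend.name] = backend

    def get(self, name: str) -> ModelBackend:
        return self._backends[name]
```

With a design of this kind, switching from a local model to an institutional or commercial one would mean registering a different backend, with no changes to the code that calls `complete`.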
A growing number of universities now host open-source language models on institutional infrastructure. Virginia Tech, for instance, operates over forty models through its Advanced Research Computing programme, available to students and faculty at no cost and with no data transmitted to external parties. Mira is designed to integrate with such services as institutional hosting becomes more prevalent.
The application also employs machine learning to improve the effectiveness of smaller models — optimising not the model itself, but the architecture of its interaction with the researcher’s materials: context selection, query formulation, and result evaluation. A well-configured affordable model can, in many cases, approximate the performance of a considerably more expensive one.
Mira is free to use, distribute, and modify. The source code is publicly available and the project is not operated for profit.
Where a commercial language model is accessed through the application, costs are disclosed in full. API charges are passed through from the provider; any markup is stated and allocated to hosting and infrastructure. Pre-configured model access is offered as an option for researchers who prefer not to manage their own API credentials, but the application is designed to function without it — through institutional resources, existing subscriptions, or locally hosted open-source models.
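The pass-through arrangement can be made concrete with a short worked calculation. The function and field names below are hypothetical, and the prices and markup rate in the usage note are invented for the example; they are not Mira's actual rates or any provider's. The point is only that the provider charge and the markup are computed and reported as separate line items.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CostBreakdown:
    """Itemised cost of one request (hypothetical structure)."""
    provider_charge: Decimal  # passed through from the provider unchanged
    markup: Decimal           # stated separately; covers hosting and infrastructure

    @property
    def total(self) -> Decimal:
        return self.provider_charge + self.markup


def itemise(
    input_tokens: int,
    output_tokens: int,
    price_in_per_million: Decimal,
    price_out_per_million: Decimal,
    markup_rate: Decimal,
) -> CostBreakdown:
    """Compute the provider charge and the markup as separate amounts."""
    million = Decimal(1_000_000)
    provider_charge = (
        Decimal(input_tokens) * price_in_per_million
        + Decimal(output_tokens) * price_out_per_million
    ) / million
    return CostBreakdown(provider_charge, provider_charge * markup_rate)
```

For instance, with illustrative prices of 2.00 and 4.00 per million input and output tokens and an assumed 10% markup, a request using 1,000,000 input tokens and 500,000 output tokens would be itemised as a provider charge of 4.00 plus a markup of 0.40, for a total of 4.40.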
The following describes a long-term aspiration for the project, not infrastructure that exists today.
The concentration of computational resources within a small number of corporate entities represents a structural constraint on independent scholarship. Researchers increasingly depend on proprietary systems for the tools of their work — systems whose terms of access, pricing, and data practices are subject to change without consultation or consent.
We hope to orient Mira’s future development toward an alternative model: federated infrastructure owned and governed by the communities that use it. Such a model would entail collectively maintained hardware, shared processing — so that the computational work of preparing a text need not be duplicated across every individual machine — and, in time, open-source language models trained on materials curated by the academic community itself. None of this exists yet; it is a direction we would like to pursue.
The objective would be research infrastructure that operates in the service of scholarship rather than of shareholders — infrastructure over which researchers exercise material, not merely nominal, control.