AI Can't Accelerate What Data Doesn't Support

ETEN Innovation Lab
May 18

The Data Commons - An Emerging Solution to Data Challenges

“At the ETEN Innovation Lab, we have, of course, been naturally looking at AI and its application in Bible translation over the last five or so years. And we realized pretty early on, like many others in the space, the technology really wasn’t the problem. The bigger challenge for all of us was data availability and data quality.” – Peter Huang, Executive Director of the Innovation Lab, at Missional AI 2026 Global Summit

Advances in AI are expanding the possibilities of accelerated, quality Bible translation. Tasks that once required significant time and effort can be supported by AI-assisted tools that equip teams in a variety of contexts. Adoption of technology by local communities is playing a major role in ETEN's progress toward the All Access Goals.

Yet progress is not driven by technology alone, as artificial intelligence is only as good as the data it relies on. And in the case of the Bible translation movement, as Peter says in the quote above, lack of quality data can be a challenge. As the role of AI continues to grow, the question we face is not just how to build better tools or how to scale their use around the world, but also how to ensure the right data exists and is being leveraged by these tools.

The Data Gap in Bible Translation

Not all AI-assisted translation approaches carry the same data requirements. General-purpose large language models (LLMs) are trained on vast, diverse data sets and can engage with low-resource languages without requiring a large corpus of language-specific data in advance. These models offer translation teams the opportunity to get work started and build language data as they go.

But purpose-built translation models, which are systems trained specifically for Bible translation, such as Serval and VachanEngine (both currently in the utilization stage of exploration), need curated, in-domain data. If we are using this kind of AI-assisted technology in the translation process, data is necessary at each stage. For drafting, training AI systems to generate initial text requires Scripture data or curated sentence sets. For refining, improving the quality of drafts depends on language data to catch spelling errors, missing words, and basic formatting issues. For work in modalities beyond text, such as oral and sign language translation, audio and visual data are needed in addition to written data.

This presents a significant challenge given the global language landscape. There are more than 7,000 languages spoken around the world today, and only a fraction of them are well represented in the data that powers mainstream AI technology. Because many of the languages with unmet All Access Goals are in low-resource contexts, little or no data is available to work with.

In many cases, translation work itself becomes the primary source of data. When teams translate, they are simultaneously building the very datasets that future tools will depend on. While this creates long-term opportunity, it also means that there is a current gap that can slow down progress.

The Need for Collaboration

Although some data does not yet exist, other needed data exists in fragmented forms. The global Bible translation community has spent generations developing language resources in the form of Scripture texts, glossaries, audio recordings, and more. This work, carried out by many partners even beyond Bible translation, represents a valuable foundation. But the resources are difficult to access, combine, and reuse effectively because they are spread across a variety of organizations and housed in different systems and formats.

“That means acceleration is not available to all of us because the data is fragmented and living in silos,” says Daniel Wilson, CEO at XRI, a partner in supporting the development of datasets for low-resource languages in Bible translation. No single organization holds enough of this data to fully address the need on its own, and until now there has been no shared way to build on the data that already exists.

At its core, ETEN is a collaborative alliance. We recognize that the vision of Scripture access requires more than a single organization. As the role of data becomes more central to the use of AI, the collaborative approach we take in translation work is also needed in how we steward language data and make it available for future use.

An Emerging Response: The Data Commons

To address the growing need for accessible, high-quality language data, new efforts are beginning to take shape. One response is the development of the Data Commons.

The Data Commons is an emerging, collaborative approach to sharing language data across the Bible translation and broader ministry ecosystem. It will serve as a centralized platform where partners can contribute to and responsibly access language data and resources. The goal is to collect and distribute data in more coordinated ways that support AI-assisted tools and translation workflows.

While initial discussions took place last year, the Data Commons is still in its early stages of development. It is being explored and shaped by ETEN partners, alongside the Innovation Lab and others.

Get Involved

Exploring new ways to share and use data across Bible translation efforts makes continued acceleration toward the All Access Goals attainable. As the Data Commons takes shape in the coming months, partners can engage by contributing existing resources, sharing insights from their context, shaping standards, and following along as it develops.

If you’re interested in learning more about how you can contribute to the Data Commons effort, please contact us at lab@eten.bible.
