Image Credit: Nubelson Fernandes on Unsplash

Welcome to the first post in the new #AskScholastica blog series, where Scholastica’s team answers questions from editors and publishers about digital journal publishing standards and best practices!

We’re starting with a query we frequently hear from small journal publishers: What are the benefits of producing journal articles in full-text XML, and should we start?

Let’s get to it!

Answer: Full-text XML benefits and when you need it

This question is a two-parter, so we’ll break it down, starting with a primer on full-text XML and its benefits:

What is full-text XML, and what are the benefits?

At the highest level, XML, which stands for Extensible Markup Language, is a text-encoding system that defines rules for structuring documents in a format that is readable by humans and machines (though it’s primarily intended for machines — since most humans don’t like to read computer code). The term “full-text” XML means that the entire textual content of a document is in the XML file, along with any other relevant metadata or structural information. In contrast, some XML documents may only contain specific elements or portions of a piece of content, leaving out the complete text (e.g., XML article-level metadata files).

There are specific types of XML for different use cases, each typically defined by a DTD. As explained by W3Schools, “DTD stands for Document Type Definition. A DTD defines the structure and the legal elements and attributes of an XML document.”

In scholarly journal publishing, the predominant XML DTD is the Journal Article Tag Suite (a.k.a. JATS) maintained by the National Information Standards Organization. JATS in turn comes in several “flavors” for specific scenarios, including the Archiving and Interchange Tag Set (JATS-AI) and the Journal Publishing Tag Set (JATS-JPub). These variations cater to different stages in the publishing process, such as archiving, interchange, and final publication. Some services also have custom XML specifications in addition to JATS, like PubMed Central (NLM DTD) and Silverchair (SCJATS).
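To make the difference between a metadata-only file and a full-text file more concrete, here is a minimal Python sketch. The two XML fragments are simplified, hypothetical examples that borrow common JATS element names (article, front, article-meta, body, sec, p); they are not validated against any DTD, and the helper functions has_full_text and article_title are illustrative names, not part of any publishing tool.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical JATS-style fragments (not validated against a DTD).
METADATA_ONLY = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Sample Article</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""

FULL_TEXT = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Sample Article</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>The complete article text lives in the body element.</p>
    </sec>
  </body>
</article>"""


def has_full_text(xml_string: str) -> bool:
    """Return True if the article has a <body> with at least one paragraph."""
    root = ET.fromstring(xml_string)
    body = root.find("body")
    return body is not None and body.find(".//p") is not None


def article_title(xml_string: str) -> str:
    """Read the article title from the metadata in <front>."""
    root = ET.fromstring(xml_string)
    return root.findtext("front/article-meta/title-group/article-title", default="")


if __name__ == "__main__":
    for label, doc in (("metadata only", METADATA_ONLY), ("full text", FULL_TEXT)):
        print(label, "|", article_title(doc), "| full text:", has_full_text(doc))
```

Both files carry the same article title in their metadata, but only the second one includes the body of the article, which is what makes it “full-text” XML.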

Full-text XML is often necessary in scenarios where the entire content of a document needs to be preserved or processed in a structured way. For example, PubMed Central requires full-text XML article file deposits.

Among the primary benefits of full-text XML are:

  • Discoverability: Search engines and databases can better index and retrieve relevant information from full-text XML files.
  • Interoperability: Scholarly content encoded in full-text XML can be exchanged and integrated into various publishing workflows, databases, and archives more easily than DOCX or other files (though it often requires formatting the XML into different DTDs or “flavors” before it’s ready!).
  • Accessibility: Structured content enhances accessibility for users, including those with specific needs, such as individuals using assistive technologies.
  • Adaptability: Full-text XML is adaptable to different publishing platforms and presentation formats. It allows publishers to deliver content in multiple ways, catering to diverse user preferences and devices.

For the above reasons, many scholarly publishing organizations/initiatives recommend producing full-text XML, including Plan S. We cover answers to common questions about Plan S requirements and recommendations here.

When do journals need full-text XML (and when is it a “nice to have”)?

Many archives and indexes, such as Medline, require XML article-level metadata deposits to process content. And even in cases where an index can accept manual metadata inputs, depositing XML metadata files (e.g., via an FTP server or an API integration) is more efficient and less prone to error. For that reason, producing at least basic XML article-level metadata files is a best practice all journals should work towards.
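As a rough illustration of what a basic article-level metadata file involves, the Python sketch below builds one with the standard library. It assumes a simplified JATS-style structure; the function name build_article_metadata and the sample journal title, article title, and DOI are hypothetical. Real archives and indexes publish their own schemas and required fields, so treat this as a starting point rather than a deposit-ready file.

```python
import xml.etree.ElementTree as ET


def build_article_metadata(journal_title: str, article_title: str, doi: str) -> str:
    """Build a minimal JATS-style metadata record and return it as an XML string."""
    article = ET.Element("article")
    front = ET.SubElement(article, "front")

    # Journal-level metadata
    journal_meta = ET.SubElement(front, "journal-meta")
    title_group = ET.SubElement(journal_meta, "journal-title-group")
    ET.SubElement(title_group, "journal-title").text = journal_title

    # Article-level metadata
    article_meta = ET.SubElement(front, "article-meta")
    ET.SubElement(article_meta, "article-id", {"pub-id-type": "doi"}).text = doi
    article_title_group = ET.SubElement(article_meta, "title-group")
    ET.SubElement(article_title_group, "article-title").text = article_title

    # ElementTree escapes special characters such as "&" automatically.
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample values for illustration only.
    print(build_article_metadata("Journal of Examples", "Cats & Dogs", "10.1234/example.5678"))
```

Generating the file programmatically, rather than typing metadata into a web form, is one reason XML deposits tend to be less prone to error: special characters are escaped consistently and every record follows the same structure.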

However, not all journals necessarily need full-text XML, though all could benefit from it as discussed above.

In the world of scholarly journal publishing, the primary use cases for full-text XML are content archiving, indexing, interchange, and retrieval. So the short answer to the question, “When do journals need full-text XML?” is:

  • If they want to submit content to a database or platform that requires it (e.g., PubMed Central)
  • If they have a high concentration of authors who would like to extract and analyze information from articles programmatically via text and data mining
  • If they need full-text XML to make their hosted articles comply with accessibility standards (something all should strive for regardless of requirements!)
  • If formatting articles in full-text XML makes it easier for the journal to produce rich metadata that improves the preservation and discoverability of its content without extra work (e.g., the journal is using a production software/service that produces full-text XML by default with robust metadata elements)
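To give a sense of what text and data mining can look like in practice, here is a small Python sketch that counts the words in each top-level section of an article. The sample XML is a simplified, hypothetical JATS-style fragment, and section_word_counts is an illustrative helper name; real mining workflows are usually far more involved.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical JATS-style article body for illustration.
SAMPLE_ARTICLE = """<article>
  <body>
    <sec>
      <title>Introduction</title>
      <p>Full-text XML is machine readable.</p>
    </sec>
    <sec>
      <title>Methods</title>
      <p>We parsed <italic>ten</italic> articles.</p>
    </sec>
  </body>
</article>"""


def section_word_counts(xml_string: str) -> dict:
    """Map each top-level section title to the word count of its paragraphs."""
    root = ET.fromstring(xml_string)
    counts = {}
    for sec in root.iterfind("body/sec"):
        title = sec.findtext("title", default="(untitled)")
        words = 0
        # iter("p") also picks up paragraphs inside nested subsections.
        for paragraph in sec.iter("p"):
            # itertext() includes text inside inline tags such as <italic>.
            words += len("".join(paragraph.itertext()).split())
        counts[title] = words
    return counts


if __name__ == "__main__":
    print(section_word_counts(SAMPLE_ARTICLE))
```

Because the section boundaries are explicit in the XML, a script can pull out the methods section of every article in a corpus without guessing at the layout, which is hard to do reliably with PDFs.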

Putting it all together

Ultimately, the decision of whether a journal requires full-text XML will depend on the specific archiving, indexing, discovery, and accessibility needs and preferences of that title and its publisher.

For more information on XML use cases in scholarly publishing, we encourage you to check out our recent webinar with the University of Oregon Libraries and GW College of Professional Studies, “When XML Marks the Spot: Machine-readable Journal Articles for Discovery and Preservation” — now available to watch on-demand here.

Have a question? #AskScholastica: We hope you found this first #AskScholastica blog post helpful! Do you have a question about a technical journal publishing standard or recommendation? Submit it via this form or post your query on LinkedIn or X (formerly Twitter) with #AskScholastica, and we’ll do our best to answer!

You can read the next post in the #AskScholastica blog series answering the question, “Should we import past and pending manuscripts from our old peer review system into our new one?” here.
