Introduction to texor
tools for converting LaTeX source files into RJ-web-articles
Source:vignettes/introduction-to-texor.Rmd
introduction-to-texor.Rmd
Introduction
texor is a proposed package that can help in converting old LaTeX based documents, research papers to HTML through intermediate conversions. This is particularly a problem for R Research papers where HTML export was not available and hence modern compatibility to export a HTML file was missed out. To bring these documents online HTML based webpages are a much better alternative as opposed to PDFs and to solve the lack of web versions texor will provide a mechanism.
Conversion Workflow:
Stage 0 : preparing to convert
In this stage, we will check the basics like using correct path, normalizing the path,extracting the file_name/ wrapper_name etc..
# normalizing path using xfun package
dir <- xfun::normalize_path(dir)
# getting wrapper file name
wrapper_file <- texor::get_wrapper_type(dir)
# getting the main LaTeX file name
file_name <- texor::get_texfile_name(dir)
Stage 1 : Copying/Removing Style files
Pandoc does not need, all of the style files as it is not trying to compile, but rather convert. Hence, to workaround certain limitations, we have to remove the RJournal.sty file and include a new style file which redefines certain commands.
# This function will remove RJournal.sty file,
# Copy the Metafix.sty file and link it in wrapper.
texor::include_style_file(dir)
Stage 2 : Handling Bibliographies
As we do not desire the embedded bibliography to be included as a div element in the article itself, we need to convert it to Bibtex format.
For removing the bibliography div elements from the article we use a Lua filter later on.
For converting the embedded bibliography we use rebib package. By default I have set up the bibliography aggregation function, which will logically create/update the bibtex file and include it in the article_tex_file as well (if not linked).
# bibliography aggregation when both bibtex and embedded bibliography available,
# Using Bibtex file for bibliography if no embedded bibliography available,
# Create a new bibtex file using the embedded bibliography in the document.
rebib::aggregate_bibliography(dir)
Stage 3 : Handing Figures
Texor package creates a yaml report about the figure environments, including tikz, algorithm2e images. There is also a logical function which uses pandoc’s Image data for converting PDF images to PNG.
data <- texor::handle_figures(dir, file_name)
Stage 4 : Patching Environments
Pandoc does not support certain environments, like:
in figures : figure*, algorithmic, algorithm. in table : table*.
in code : example, example*, Sin, Sout, Scode, Sinput, Soutput,
smallverbatim, boxedverbatim.
Here, texor will use the stream editor to patch these environments to
the default types figure
,table
and
verbatim
.
There is also a function to patch equations (especially eqnarray environment).
texor::patch_code_env(dir)
texor::patch_table_env(dir)
texor::patch_figure_env(dir)
texor::patch_equations(dir)
Stage 5 : Converting the LaTeX document to Markdown
Here we will convert the document to Markdown, with a lot of Lua filters modifying the document.
pre_conversion statistics will read through the article and find statistical elements like figure, table, code blocks, citations etc.
meta <- texor::pre_conversion_statistics(dir)
texor::convert_to_markdown(dir)
Stage 6 : Copying over dependency files
This function will copy the files such as figures of all kinds, bibtex file, pdfs etc. to the /web folder.
texor::copy_other_files(dir)
Stage 7 : Converting the Markdown to Rmarkdown
In this stage we convert the markdown to Rmarkdown by reading and adding metadata information like ctv,CRANpkgs,BIOpkgs,slug,author metadata, title, abstract,etc..
We also add important parameters for
rjtools::rjournal_web_article
like:
- self_contained = TRUE
- toc = FALSE
- legacy_pdf = TRUE
- web_only = TRUE
- mathjax = “https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js”
texor::generate_rmd(dir)
Stage 8 : Creating RJ-web-article from the Rmarkdown file
texor::produce_html(dir)
Note on Dependencies
This package is involved with solving conversion problems on multiple fronts, thus has to rely on multiple software tools. A list with reasoning is included here:
- Pandoc (min: V3.1) 1 : To convert the basic LaTeX to rmarkdown.
- poppler pdfutils : To convert embedded PDF to PNG.
- LaTeX/TinyTex2 : For Tikz and PDF article generation.
- rebib : For bibliography conversions/aggregations.
- rjtools : For the rjournal_web_article template.
- stringr : for string manipulation functions.
- Lua filters 3 4 : Filters for manipulation of ingested files.