
Overall popularity of Java-dependent R packages

Although not as popular as the widely known ggplot2 and data.table packages, rJava-dependent packages (104 on CRAN and another 14 on Bioconductor), and consequently rJava itself, are widely used in the R community (see Figure 1). rJava (Urbanek 2024) alone was downloaded 107,725 times in September 2024. Downloads of rJava-dependent packages totaled 178,185 on CRAN and 3,565 on Bioconductor. To put this into context, ggplot2 was downloaded 1,329,676 times and data.table 729,796 times in the same month. So rJava-dependent packages collectively were downloaded about 7.5 times less often than ggplot2 (based on the CRAN figures), but they still have a noticeable number of users.

Figure 1: CRAN Downloads Over Time for rJava and rJava-dependent Packages, compared to popular ggplot2 and data.table for context
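The comparison above is simple arithmetic on the quoted download counts. As a minimal sketch, the ratios can be reproduced in R; the vector and variable names are chosen for this illustration only, and the numbers are copied from the text rather than fetched from any download log.

```r
# Download counts for September 2024, copied from the text above.
downloads <- c(
  ggplot2               = 1329676,
  data.table            = 729796,
  rJava                 = 107725,
  rjava_dependents_cran = 178185,
  rjava_dependents_bioc = 3565
)

# How many times more often ggplot2 and data.table were downloaded
# than all rJava-dependent CRAN packages combined.
ratio_vs_cran_dependents <-
  downloads[c("ggplot2", "data.table")] / downloads[["rjava_dependents_cran"]]

round(ratio_vs_cran_dependents, 1)
```

Rounded to one decimal place, the ggplot2 ratio comes out at 7.5, which matches the figure quoted in the text.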

Note that the analysis above covers only packages that are available on CRAN and Bioconductor and that explicitly depend on the rJava package. Other packages use Java without depending on rJava. For example, the opentripplanner package relies on underlying Java-based software but calls it from the command line. This, however, also requires system environment variables to be set up correctly.

Identifying packages such as opentripplanner is more complicated, as they do not have a direct dependency on the rJava package. We assume that there are fewer of them than packages that depend on rJava.
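The "explicit dependency" criterion used in the analysis can be illustrated with a reverse-dependency lookup. The sketch below is one possible way to do it, not necessarily how the counts in this article were obtained. It uses `tools::package_dependencies()` from base R; `find_rjava_dependents()` and the toy metadata table are hypothetical and exist only for this example.

```r
# Assumption: `db` is a package metadata matrix shaped like the result of
# utils::available.packages(), with at least the columns
# Package, Depends, Imports and LinkingTo.
find_rjava_dependents <- function(db) {
  rev_deps <- tools::package_dependencies(
    packages = "rJava",
    db       = db,
    which    = c("Depends", "Imports", "LinkingTo"),
    reverse  = TRUE
  )
  # Return a plain, sorted character vector of package names.
  sort(unique(unname(rev_deps[["rJava"]])))
}

# Tiny made-up metadata table for illustration only (not real CRAN data).
toy_db <- cbind(
  Package   = c("rJava", "pkgA", "pkgB", "pkgC"),
  Depends   = c(NA, "rJava", "R (>= 3.5.0)", NA),
  Imports   = c(NA, NA, "rJava (>= 0.9-0), utils", "stats"),
  LinkingTo = c(NA, NA, NA, NA)
)

find_rjava_dependents(toy_db)
```

Against a real repository, the same function could be called as `find_rjava_dependents(utils::available.packages())`, which needs a configured CRAN mirror and network access. A package that only shells out to Java, as opentripplanner does, declares no such dependency and would not appear in the result.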

Individual Java-dependent packages

If we zoom in on the individual rJava-dependent packages, Figure 2 shows that most downloads are generated by xlsx and its “companion” xlsxjars.

Figure 2: CRAN Downloads Over Time for all rJava-dependent Packages

If we remove xlsx (and xlsxjars) as outliers, Figure 3 shows that the top packages are:

  • r5r for “rapid realistic routing on multimodal transport networks (walk, bike, public transport and car)” (Pereira et al. 2021). Users of the package experience multiple issues with Java and report them on GitHub; a few examples include 1, 2, 3, and many more.

  • RJDBC that “[p]rovides Access to Databases Through the JDBC Interface” (Urbanek 2022). I was not able to find a bug tracker for this package, but a simple web search reveals multiple issues such as this one on StackOverflow.

  • mailR for “send[ing] emails from R” (Premraj 2021), which has a Java-related issue on GitHub. A web search also reveals StackOverflow discussions related to Java version issues.

  • RWeka, an R interface to Weka. Weka itself “is a collection of machine learning algorithms for data mining tasks written in Java” (Hornik, Buchta, and Zeileis 2009). There are StackOverflow discussions related to Java version issues.

Figure 3: CRAN Downloads Over Time for top 20 rJava-dependent Packages

Some other packages:

To summarize, regardless of the Java-dependent R package being used, users consistently encounter issues with having the correct Java runtime installed on their system. Additionally, they may be using various R packages that depend on different Java versions, complicating the management of Java environment variables. This task is particularly challenging for ordinary users who simply want to get their analysis running smoothly and efficiently.
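Many of these problems start with not knowing which Java version the system actually exposes. The following is a minimal sketch, in base R only, of how the major version could be read from the output of `java -version`. Both `parse_java_major()` and `detect_java_major()` are hypothetical helpers written for this example. The parser assumes the common output format in which the version appears in double quotes; the sample strings below follow that format.

```r
# Extract the major Java version from `java -version` output.
# Handles both the legacy "1.8.0_292" scheme and the modern "21.0.2" scheme.
parse_java_major <- function(version_output) {
  quoted <- regmatches(version_output, regexpr('"[^"]+"', version_output))
  if (length(quoted) == 0) return(NA_integer_)
  version <- gsub('"', "", quoted[1], fixed = TRUE)
  parts <- strsplit(version, "[._-]")[[1]]
  # Legacy versions start with "1." and carry the major version second.
  major <- if (parts[1] == "1" && length(parts) > 1) parts[2] else parts[1]
  suppressWarnings(as.integer(major))
}

# Ask the `java` executable found on PATH, if any, for its version.
detect_java_major <- function() {
  java <- unname(Sys.which("java"))
  if (!nzchar(java)) return(NA_integer_)
  out <- tryCatch(
    suppressWarnings(system2(java, "-version", stdout = TRUE, stderr = TRUE)),
    error = function(e) character()
  )
  parse_java_major(out)
}

parse_java_major('openjdk version "21.0.2" 2024-01-16')
parse_java_major('java version "1.8.0_292"')
```

`detect_java_major()` returns `NA` when no usable `java` is found on the `PATH`, which is itself a useful diagnostic before loading a Java-dependent package.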

rJavaEnv R package as a solution

rJavaEnv aims to assist users of all Java/rJava-dependent packages by providing functions to quickly install the required Java version and set environment variables. This ensures that the packages the user plans to use pick up the correct Java version with minimal changes to the user’s system. In contrast to manually downloading Java from Oracle, Amazon, or another vendor and running its installer, rJavaEnv downloads non-installer archives of Java, extracts them to a cache folder, and links them in the current project or working directory. This way, rJavaEnv does not contaminate the user’s machine with unnecessary installations and configurations.
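The last step of that workflow, pointing the session at an extracted JDK, comes down to setting environment variables. The sketch below illustrates only that step in base R. `use_java_home()` is a hypothetical helper written for this example, not a function from rJavaEnv, and the package itself also handles downloading, caching, and linking. The temporary directory merely stands in for an extracted JDK.

```r
# Hypothetical helper: point the current R session at an extracted JDK
# by setting JAVA_HOME and putting its bin directory first on PATH.
use_java_home <- function(java_home) {
  java_home <- normalizePath(java_home, winslash = "/", mustWork = TRUE)
  java_bin <- file.path(java_home, "bin")

  # Remember the previous values so the caller can restore them later.
  old <- list(
    JAVA_HOME = Sys.getenv("JAVA_HOME", unset = NA),
    PATH      = Sys.getenv("PATH")
  )

  Sys.setenv(JAVA_HOME = java_home)
  path_entries <- strsplit(old$PATH, .Platform$path.sep, fixed = TRUE)[[1]]
  path_entries <- unique(c(java_bin, path_entries))
  Sys.setenv(PATH = paste(path_entries, collapse = .Platform$path.sep))

  invisible(old)
}

# Demonstration with an empty placeholder directory instead of a real JDK.
fake_jdk <- file.path(tempdir(), "fake-jdk")
dir.create(file.path(fake_jdk, "bin"), recursive = TRUE, showWarnings = FALSE)
old_env <- use_java_home(fake_jdk)
Sys.getenv("JAVA_HOME")
```

These changes affect only the running R session and its child processes, which is in the same spirit as the minimal-intervention approach described above: nothing is written to system-wide configuration.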

Furthermore, rJavaEnv streamlines the process, allowing users to focus on their analysis without worrying about complex Java setup issues. By automating these tasks, rJavaEnv reduces the potential for errors and ensures a smoother experience for users who need to manage multiple Java-dependent R packages.

References

Dragulescu, Adrian, and Cole Arendt. 2020. xlsx: Read, Write, Format Excel 2007 and Excel 97/2000/XP/2003 Files. https://CRAN.R-project.org/package=xlsx.
Hornik, Kurt. 2019. openNLP: Apache OpenNLP Tools Interface. https://CRAN.R-project.org/package=openNLP.
Hornik, Kurt, Christian Buchta, and Achim Zeileis. 2009. “Open-Source Machine Learning: R Meets Weka.” Computational Statistics 24 (2): 225–32. https://doi.org/10.1007/s00180-008-0119-7.
Pereira, Rafael H. M., Marcus Saraiva, Daniel Herszenhut, Carlos Kaue Vieira Braga, and Matthew Wigginton Conway. 2021. “r5r: Rapid Realistic Routing on Multimodal Transport Networks with R⁵ in R.” Findings, March. https://doi.org/10.32866/001c.21262.
Premraj, Rahul. 2021. mailR: A Utility to Send Emails from R. https://CRAN.R-project.org/package=mailR.
Urbanek, Simon. 2022. RJDBC: Provides Access to Databases Through the JDBC Interface. https://CRAN.R-project.org/package=RJDBC.
Urbanek, Simon. 2024. rJava: Low-Level R to Java Interface. https://CRAN.R-project.org/package=rJava.