Beyond the Chatbox

Beyond the Chatbox: LLM Coding and Research Agents for Academics

Introduction

Warning

THIS IS STILL WORK IN PROGRESS - EXPECT BROKEN LINKS AND MISSING CONTENT

© Image generated by Google Gemini | Nanobanana

Overview

This interactive tutorial and workshop provides a practical introduction to using large language models (LLMs) in coding agents, featuring applied examples in population research. It is oragnized by Egor Kotov with support from the Max Planck Institute for Demographic Research.

Workshop Details

Abstract The software industry has been enjoying the benefits of coding agents for over a year now, with even top software engineers now having AI assistants generate a vast majority of their code. Now, the tools for agentic coding are mature enough for anyone to try.

Unlike many educational materials in this field, this workshop will also address the new vectors of attack that novice users might expose themselves to by using coding agents blindly. We will cover key security risks associated with agentic systems, including data leakage, unsafe code execution, and unintended modifications to analysis code and research materials. To mitigate these risks, we will introduce accessible strategies such as containerization, sandboxing, and workflow isolation.

Participants will also see how the agents themselves can assist in setting up and using these protective tools to improve transparency and reproducibility. The workshop will combine short demonstrations with applied examples relevant to population research. Participants are encouraged to bring their own projects or ideas, which we will use to explore how LLM agents can be integrated into real research workflows. For those without a project, guided exercises and example tasks will be provided.

Before the Workshop: Accounts & Setup

To make the most of our hands-on sessions, please complete the following steps before the workshop on June 3, 2026.

Important🔒 Data Privacy & Security Warning

Free tiers for these LLM services typically use your prompts and uploaded code for model training. Do not use these tools on private, sensitive, or proprietary research data! Only use public or non-sensitive datasets during the workshop and for general testing.

Coding Agents We Will Use (and Why)

The coding agents are already pre-installed and configured inside the workshop’s Docker container / Codespaces environment. We will focus on:

  • Google Antigravity CLI (and its predecessor Gemini CLI)
    • Why: It provides a free daily quota to test cutting-edge models (like Gemini 3.5 Flash, Gemini 3.1 Pro, and Claude models) directly from your Google account.
  • OpenCode CLI
    • Why: A flexible, open-source-friendly agent that can connect to multiple backends. We will use it with free models from OpenCode Zen, OpenRouter, and NVIDIA NIM to show how easy it is to switch between different model providers without vendor lock-in.

What is CLI? It stands for “Command Line Interface” and means that you will interact with the agent by typing commands in a terminal (similar to your R console), rather than through a graphical user interface (GUI). This allows for more flexibility and control over the agent’s behavior, as well as easier integration into coding workflows. This also gives you better insight into how the agent works under the hood, which is important for understanding its capabilities and limitations. You are of course free to use the same agents through their graphical interfaces too. We are also limited by the current tutorial environment and tools that are free to use. In our tutorial environment Gemini and Antigravity are only available as CLI, OpenCode is aslo CLI, but you will learn how to run it in graphical mode as well.


To get started with the workshop materials, please navigate through the sections in the navigation bar on the left or with pagination links in the bottom.