Python 3.9+ | License: Apache 2.0

QueryGym

A lightweight, reproducible toolkit for LLM-based query reformulation

📚 Documentation | 📊 Leaderboard | 📦 PyPI | 📄 Paper


Features

Installation

Option 1: Install from PyPI

pip install querygym

Option 2: Run with Docker

# GPU version (default)
docker pull ghcr.io/ls3-lab/querygym:latest
docker run -it --gpus all ghcr.io/ls3-lab/querygym:latest

# CPU version (lightweight)
docker pull ghcr.io/ls3-lab/querygym:cpu
docker run -it ghcr.io/ls3-lab/querygym:cpu

# Or use Docker Compose
docker compose run --rm querygym

📖 Docker Setup: See DOCKER_SETUP.md for a quick start, or the full Docker guide for detailed usage.

Quickstart

import querygym as qg

# Load data
queries = qg.load_queries("queries.tsv")
qrels = qg.load_qrels("qrels.txt")
contexts = qg.load_contexts("contexts.jsonl")

# Create reformulator
reformulator = qg.create_reformulator("genqr_ensemble", model="gpt-4")

# Reformulate
results = reformulator.reformulate_batch(queries)

# Save
qg.DataLoader.save_queries(
    [qg.QueryItem(r.qid, r.reformulated) for r in results],
    "reformulated.tsv"
)
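
The Quickstart reads `queries.tsv` from disk. As a minimal sketch, assuming the file is headerless and tab-separated with a query ID followed by the query text (the layout MS MARCO query files use; check the documentation for the exact format QueryGym expects), a small test file can be produced with the standard library. The helper name `write_queries_tsv` and the sample queries are illustrative and not part of QueryGym.

```python
import tempfile
from pathlib import Path


def write_queries_tsv(path, queries):
    """Write (qid, text) pairs as headerless `qid<TAB>text` rows.

    Assumption: one query per line, ID first, text second.
    """
    with Path(path).open("w", encoding="utf-8") as f:
        for qid, text in queries:
            # Collapse tabs, newlines, and repeated spaces so each query stays on one row.
            clean = " ".join(str(text).split())
            f.write(f"{qid}\t{clean}\n")


# Illustrative sample data, not taken from any benchmark.
sample_queries = [
    ("q1", "what causes high blood pressure"),
    ("q2", "python  read\ttsv file"),
]

# Written to a temporary directory so the sketch has no side effects;
# pass "queries.tsv" instead to produce the file the Quickstart loads.
out_path = Path(tempfile.mkdtemp()) / "queries.tsv"
write_queries_tsv(out_path, sample_queries)
print(out_path.read_text(encoding="utf-8"))
```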

CLI

pip install -e ".[hf,beir,dev]"
export OPENAI_API_KEY=sk-...

# Run a method (e.g., genqr_ensemble)
querygym run --method genqr_ensemble \
  --queries-tsv queries.tsv \
  --output-tsv reformulated.tsv \
  --cfg-path querygym/config/defaults.yaml
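
When the CLI is driven from a script (for example, to run the same command over several query files), it can help to assemble the argument list in Python. The sketch below uses only the flags shown above; the helper name `build_run_command` is hypothetical and not part of QueryGym.

```python
import shlex


def build_run_command(method, queries_tsv, output_tsv, cfg_path):
    """Assemble the `querygym run` argument list from the flags shown above."""
    return [
        "querygym", "run",
        "--method", method,
        "--queries-tsv", str(queries_tsv),
        "--output-tsv", str(output_tsv),
        "--cfg-path", str(cfg_path),
    ]


cmd = build_run_command(
    "genqr_ensemble",
    "queries.tsv",
    "reformulated.tsv",
    "querygym/config/defaults.yaml",
)
# Print a copy-pasteable shell line for inspection.
print(shlex.join(cmd))
```

To execute the command rather than print it, pass the list to `subprocess.run(cmd, check=True)` with `OPENAI_API_KEY` set in the environment.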

Loading Datasets

BEIR:

import querygym as qg

# Download with the BEIR library
from beir import util
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/nfcorpus.zip"
data_path = util.download_and_unzip(url, "datasets")

# Load with querygym
queries = qg.loaders.beir.load_queries(data_path)
qrels = qg.loaders.beir.load_qrels(data_path)

MS MARCO:

import querygym as qg

# Load from local files (download with ir_datasets)
queries = qg.loaders.msmarco.load_queries("queries.tsv")
qrels = qg.loaders.msmarco.load_qrels("qrels.tsv")
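
The comment above mentions downloading with `ir_datasets`. A minimal sketch of that step follows, assuming the loaders accept headerless `query_id<TAB>text` query files and TREC-style `query_id<TAB>0<TAB>doc_id<TAB>relevance` qrels files; the expected layout is not stated here, so verify it against the documentation. The helper names and the choice of the `msmarco-passage/dev/small` split are illustrative.

```python
from pathlib import Path


def write_queries_tsv(queries, path):
    """Write `query_id<TAB>text` rows from objects exposing .query_id and .text."""
    with Path(path).open("w", encoding="utf-8") as f:
        for q in queries:
            f.write(f"{q.query_id}\t{q.text}\n")


def write_qrels_tsv(qrels, path):
    """Write TREC-style rows: query_id, a constant 0, doc_id, relevance."""
    with Path(path).open("w", encoding="utf-8") as f:
        for r in qrels:
            f.write(f"{r.query_id}\t0\t{r.doc_id}\t{r.relevance}\n")


def export_msmarco(dataset_id="msmarco-passage/dev/small", out_dir="."):
    """Fetch a split with ir_datasets and write queries.tsv and qrels.tsv."""
    import ir_datasets  # third-party dependency: pip install ir_datasets

    dataset = ir_datasets.load(dataset_id)
    out = Path(out_dir)
    write_queries_tsv(dataset.queries_iter(), out / "queries.tsv")
    write_qrels_tsv(dataset.qrels_iter(), out / "qrels.tsv")
```

Calling `export_msmarco()` downloads the split on first use and writes both files to the current directory, where the loader calls above can pick them up.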

Examples

See the examples directory for usage examples, and check examples/README.md for the full guide.

Contributing

We welcome contributions! Here’s how you can help:

Adding a New Prompt

  1. Edit querygym/prompt_bank.yaml
  2. Add an entry with fields: id, method_family, version, introduced_by, license, authors, tags, template:{system,user}, notes
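
Before opening a pull request, it may be useful to confirm that a new entry carries every field listed above. The sketch below checks a prompt entry represented as a Python dict (as it would look after parsing the YAML). The helper `missing_prompt_fields` is hypothetical, every value in the example entry is a placeholder, and the `{query}` placeholder syntax in the template is an assumption rather than the documented format.

```python
REQUIRED_FIELDS = (
    "id", "method_family", "version", "introduced_by",
    "license", "authors", "tags", "template", "notes",
)
REQUIRED_TEMPLATE_KEYS = ("system", "user")


def missing_prompt_fields(entry):
    """Return the names of required fields that are absent from `entry`."""
    missing = [name for name in REQUIRED_FIELDS if name not in entry]
    template = entry.get("template")
    if isinstance(template, dict):
        # Report missing template parts with a dotted name, e.g. "template.user".
        missing += [f"template.{key}" for key in REQUIRED_TEMPLATE_KEYS if key not in template]
    return missing


# Hypothetical entry: all values are placeholders for illustration.
example_entry = {
    "id": "my_method.v1",
    "method_family": "my_method",
    "version": 1,
    "introduced_by": "Your Name, 2025",
    "license": "Apache-2.0",
    "authors": ["Your Name"],
    "tags": ["expansion"],
    "template": {
        "system": "You rewrite search queries.",
        "user": "Rewrite this query: {query}",  # placeholder syntax is assumed
    },
    "notes": "Illustrative placeholder entry.",
}

print(missing_prompt_fields(example_entry))  # → []
```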

Adding a New Method

  1. Create a class under querygym/methods/*.py
  2. Subclass BaseReformulator, annotate VERSION, and register with @register_method("name")
  3. Pull templates via PromptBank.render(prompt_id, query=...)

Reporting Issues

For detailed development guidelines, see the Contributing Guide in our documentation.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Citation

If you use QueryGym in your research, please cite:

@misc{bigdeli2025querygymtoolkitreproduciblellmbased,
      title={QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation}, 
      author={Amin Bigdeli and Radin Hamidi Rad and Mert Incesu and Negar Arabzadeh and Charles L. A. Clarke and Ebrahim Bagheri},
      year={2025},
      eprint={2511.15996},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2511.15996}, 
}