RegionE:
Adaptive Region-Aware Generation for Efficient Image Editing

1 Fudan University 2 StepFun 3 Imperial College London 4 Shanghai Innovation Institute
Accepted by ICLR 2026

TL;DR

Announcing RegionE, a training-free method that losslessly accelerates SOTA instruction-based image editing models, including Qwen-Image-Edit, FLUX.1-Kontext, and Step1X-Edit, by factors of 2–3×. The key is exploiting the spatial and timestep-wise redundancy in the image editing process. After a simple pip install, acceleration takes just four lines of code.

RegionE preserves high pixel fidelity despite a 2–3× speedup.

Motivation

💡Most instruction-based image edits affect only local regions, yet existing models regenerate the entire image, wasting computation on unedited areas. This made us wonder: can full-image editing models be adapted to generate only the regions that matter, boosting editing efficiency?

Figure: Some examples of instruction-based image editing.

Deep Insights into Edited and Unedited Region

Instruction-based image editing involves two distinct regions: the Edited Region (changes) and the Unedited Region (consistency). Our research reveals a fundamental divergence in the underlying generation processes of these two regions:

  • Unedited Region: These follow a straight trajectory, so early velocity estimates accurately predict the final result in a single step.
  • Edited Region: These exhibit a curved trajectory, rendering early estimates unreliable and necessitating iterative denoising for precision.
  • Temporal Consistency: In both regions, velocity remains highly consistent between adjacent timesteps.
Figure: The generation trajectories of edited and unedited regions exhibit noticeable differences.
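The trajectory argument can be sketched numerically. Assuming the standard rectified-flow convention (latent moves linearly from noise at t = 0 to the image at t = 1, so the one-step estimate is x̂₁ = x_t + (1 − t)·v_t), the estimate is exact on a straight trajectory at any timestep but misses the endpoint on a curved one; the specific curved path below is a toy illustration, not the model's actual trajectory:

```python
import numpy as np

def one_step_estimate(x_t, v_t, t):
    # Assumed flow-matching convention: the latent moves from noise at
    # t = 0 to the image at t = 1, so x1_hat = x_t + (1 - t) * v_t.
    return x_t + (1.0 - t) * v_t

# Straight trajectory (unedited region): velocity is constant, so the
# one-step estimate recovers the endpoint exactly at every timestep.
x0, x1 = np.zeros(4), np.ones(4)
for t in [0.1, 0.5, 0.9]:
    x_t = (1 - t) * x0 + t * x1
    v_t = x1 - x0
    assert np.allclose(one_step_estimate(x_t, v_t, t), x1)

# Curved trajectory (edited region): velocity changes over time, so an
# early one-step estimate misses the true endpoint.
t = 0.1
x_t = np.sin(0.5 * np.pi * t) * x1                  # toy curved path
v_t = 0.5 * np.pi * np.cos(0.5 * np.pi * t) * x1    # its velocity
print(np.abs(one_step_estimate(x_t, v_t, t) - x1).max())
```

This mirrors the figure: the straight path is solved in one step, while the curved path needs iterative refinement.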

The video below demonstrates one-step predictions of the final edited image using the velocity from different denoising timesteps. Consistent with our trajectory analysis, unedited regions are predicted accurately early on thanks to their straight trajectories, whereas edited regions require more iterations to resolve, confirming that their complex, curved generative trajectories demand distinct processing.

Video: One-step predictions of the final edited image using velocity from different denoising timesteps.

Method

RegionE is a plug-and-play acceleration framework that exploits internal mechanisms of pretrained models to distinguish edited from unedited image regions, and applies region-specific acceleration strategies accordingly. Its workflow consists of three stages:

Figure: Overview of RegionE.
  1. Stabilization Stage (STS): Regional Identification

    The goal of this stage is to differentiate between edited and unedited regions.

    • Adaptive Region Partition (ARP): We propose ARP, which generates an estimated edited image through a one-step denoising process. Cosine similarity between this estimate and the reference image is then used to classify regions with high similarity as unedited and those with low similarity as edited.
    • RIKV-Cache Preservation: Because the latent fed into the DiT has a low signal-to-noise ratio at early timesteps, early estimates may be inaccurate, so we apply no region-specific operations at these timesteps to ensure stability. At the final timestep of this stage, ARP is applied to finalize the regional mask, and the Key–Value states of the unedited regions and the reference image in the DiT attention layers are cached.
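A minimal NumPy sketch of the ARP partitioning idea: compute a per-patch cosine similarity between the one-step estimate and the reference, then threshold it. The per-patch granularity and the threshold value are illustrative assumptions; the paper's exact criterion may differ.

```python
import numpy as np

def adaptive_region_partition(estimate, reference, threshold=0.9):
    """Classify each patch token as edited (True) or unedited (False).

    `estimate` / `reference`: (num_patches, dim) latent patch features.
    The threshold is an illustrative assumption, not the paper's value.
    """
    num = (estimate * reference).sum(axis=-1)
    denom = (np.linalg.norm(estimate, axis=-1) *
             np.linalg.norm(reference, axis=-1) + 1e-8)
    cos_sim = num / denom
    return cos_sim < threshold   # low similarity -> edited region

rng = np.random.default_rng(0)
ref = rng.standard_normal((6, 16))        # 6 patches, 16-dim features
est = ref.copy()
est[2] += 5.0 * rng.standard_normal(16)   # simulate one edited patch
mask = adaptive_region_partition(est, ref)
print(mask)   # only the perturbed patch should be flagged as edited
```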

  2. Region-Aware Generation Stage (RAGS): Differentiated Acceleration

    In this stage, different acceleration strategies are applied based on the identified regions:

    • Unedited Region: These areas utilize one-step estimation to generate results.
    • Edited Region: These regions follow the iterative denoising process. To maintain global context, information from the unedited regions and the reference image is injected into the attention mechanism via the Region-Instruction KV Cache (RIKVCache). To further boost speed, we exploit the similarity between consecutive timesteps using the Adaptive Velocity Decay Cache (AVDCache).
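Since velocity is highly consistent between adjacent timesteps, some DiT forward passes can be replaced by a decayed copy of the cached velocity. The sketch below illustrates this caching idea in a toy Euler loop; the decay factor, the reuse-every-other-step policy, and the stand-in model are all illustrative assumptions, not RegionE's actual schedule.

```python
import numpy as np

class VelocityDecayCache:
    """Toy sketch of reusing velocity across adjacent timesteps."""

    def __init__(self, decay=0.98, reuse_every=2):
        self.decay = decay            # illustrative decay factor
        self.reuse_every = reuse_every
        self.cached_v = None
        self.step = 0
        self.model_calls = 0

    def velocity(self, model_fn, x_t, t):
        self.step += 1
        reuse = (self.cached_v is not None and
                 self.step % self.reuse_every == 0)
        if reuse:
            self.cached_v = self.decay * self.cached_v  # cheap reuse
        else:
            self.cached_v = model_fn(x_t, t)            # full forward
            self.model_calls += 1
        return self.cached_v

fake_model = lambda x, t: -x          # stand-in for the DiT forward
cache = VelocityDecayCache()
x = np.ones(4)
for t in np.linspace(1.0, 0.1, 10):
    v = cache.velocity(fake_model, x, t)
    x = x + 0.1 * v                   # one Euler step
print(cache.model_calls)              # only 5 of 10 steps hit the model
```

Halving the number of model calls in the edited region, on top of one-step estimation for the unedited region, is where the overall 2–3× speedup comes from.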

  3. Smoothing Stage (SMS): Boundary Refinement

    To ensure visual consistency, this final stage eliminates any visible boundaries between the edited and unedited regions. Specifically, no modifications are applied to the denoising process in the last few timesteps.
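The three stages partition the denoising schedule into a full-denoising warm-up, an accelerated middle, and a full-denoising tail. A toy scheduler makes the split concrete; the stage lengths here are illustrative placeholders, not the paper's default hyperparameters.

```python
def stage_schedule(num_steps=28, sts_steps=4, sms_steps=3):
    """Assign each denoising step to a RegionE stage.

    Stage lengths are illustrative assumptions; in RegionE they are
    tunable hyperparameters.
    """
    rags_steps = num_steps - sts_steps - sms_steps
    return (["STS"] * sts_steps       # full denoising + region ID
            + ["RAGS"] * rags_steps   # region-aware acceleration
            + ["SMS"] * sms_steps)    # full denoising for smoothing

schedule = stage_schedule()
print(schedule[0], schedule[10], schedule[-1])   # STS RAGS SMS
```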

Quantitative Results

RegionE is compatible with models such as Qwen-Image-Edit, FLUX.1-Kontext, and Step1X-Edit. Evaluated on GEdit-Bench and Kontext-Bench, it achieves an overall 2–3× lossless acceleration.

Figure: Quantitative results on GEdit-Bench and Kontext-Bench.

Qualitative Results

The following showcases demonstrate RegionE's broad applicability. The examples are randomly sampled from GEdit-Bench and Kontext-Bench, with editing speedups of 2–3×.


🚀 Quick Start

Integrate RegionE in seconds for lossless acceleration.

1 Install via PyPI
Terminal
# Install RegionE module
$ pip install RegionE
2 Integrate with 4 Lines of Code
example_inference.py
import torch
from diffusers import Step1XEditPipeline
from diffusers.utils import load_image

# (1) --- Import RegionE ---
from RegionE import RegionEHelper
# Loading the original pipeline
pipeline = Step1XEditPipeline.from_pretrained(
    "stepfun-ai/Step1X-Edit-v1p1-diffusers",
    torch_dtype=torch.bfloat16
)
pipeline.to("cuda")

# (2-4) --- Enable RegionE Acceleration ---
regionehelper = RegionEHelper(pipeline)
regionehelper.set_params()  # default hyperparameters
regionehelper.enable()

# Generate Image
image = load_image("demo_0.png").convert("RGB")
prompt = "Replace the text 'SUMMER' with 'WINTER'"
image = pipeline(
    image=image,
    prompt=prompt,
    num_inference_steps=28,
    true_cfg_scale=6.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("output_edit.jpg")
regionehelper.disable()

* RegionE seamlessly integrates with Step1X-Edit, Qwen-Image-Edit, FLUX.1-Kontext, and more.

Full Usage Guide →

Get in Touch

If you have any questions or would like to discuss further, please feel free to contact me at:
Pengt.Chen@gmail.com

BibTeX

@article{chen2025regione,
  title={RegionE: Adaptive Region-Aware Generation for Efficient Image Editing},
  author={Chen, Pengtao and Zeng, Xianfang and Zhao, Maosen and Shen, Mingzhu and Ye, Peng and Xiang, Bangyin and Wang, Zhibo and Cheng, Wei and Yu, Gang and Chen, Tao},
  journal={arXiv preprint arXiv:2510.25590},
  year={2025}
}