Ingredient
PlantVillage dataset
Also known as: PlantVillage, PV dataset
Open-access labeled dataset of ~54,000 leaf images covering 14 crop species and 38 disease-or-healthy classes; the de facto training corpus for crop-disease computer-vision models. Released by Penn State and EPFL (2016, CC0); downloadable from Kaggle and Hugging Face. The right ingredient when a project needs to train or fine-tune a plant-disease classifier without collecting and labeling images from scratch. Limitations: the images are laboratory-style (a single leaf on a uniform background), so models trained only on PlantVillage often fail in field conditions and need further transfer learning on field images. Despite these limitations, it remains the foundational starting dataset for this domain.
What it is
- ~54,000 images across 14 crop species (apple, [[blueberry|blueberry]], cherry, corn, grape, orange, peach, pepper, potato, [[raspberry|raspberry]], soybean, squash, strawberry, tomato)
- 38 classes — combinations of healthy and various diseased leaves (e.g., “Tomato___Late_blight”, “Apple___Cedar_apple_rust”)
- Format: JPEG images, ~256×256 px each, organized in folders per class
- License: CC0 (public domain)
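The folder-per-class layout and the “Crop___Condition” naming shown above can be read with the standard library alone. The following is a minimal sketch, assuming the dataset has been unpacked so that each class is a direct subfolder of one root directory; the function names `parse_class_name` and `count_images_per_class` are hypothetical helpers, not part of any PlantVillage tooling.

```python
from collections import Counter
from pathlib import Path

# Assumption: images are stored as JPEG (per the format note above);
# PNG is accepted as well in case a mirror re-encoded the files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def parse_class_name(folder_name: str) -> tuple[str, str]:
    """Split a class folder name such as 'Tomato___Late_blight' into (crop, condition)."""
    crop, sep, condition = folder_name.partition("___")
    if not sep:
        raise ValueError(f"expected '<crop>___<condition>', got {folder_name!r}")
    # Single underscores inside each half stand in for spaces.
    return crop.replace("_", " "), condition.replace("_", " ")


def count_images_per_class(root: Path) -> Counter:
    """Count image files in each class subfolder directly under `root`."""
    counts = Counter()
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_class(Path("path/to/plantvillage"))` (a placeholder path) returns one count per class folder, which is a quick way to confirm that a download contains the expected 38 classes.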
Solves / unlocks
- Training disease-classifier CNNs (ResNet, EfficientNet, MobileNet) for these 14 crops
- Benchmark dataset for new architectures
- Transfer-learning starting point for custom field datasets
- Educational ML projects in agricultural computer vision
- Mobile / edge-device disease screening apps
Constraints
- Lab backgrounds: most images show a single leaf on a neutral, uniform background, whereas field images include soil, multiple leaves, varied lighting, dew, and occlusion. Models that reach 99% accuracy on PlantVillage often drop to 50–70% on field images.
- Class imbalance: some classes have ~5,000 images, others fewer than 500.
- Limited geography: images were collected primarily in the U.S. and East Africa, so the dataset doesn’t cover all regional disease variants.
- Single-leaf assumption: real field images contain many leaves, so field use needs a YOLO-style detection-then-classification pipeline.
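One common way to soften the class imbalance noted above is to weight the training loss by inverse class frequency. The sketch below uses the "balanced" heuristic, weight = N / (K × n_c), where N is the total image count, K the number of classes, and n_c the count for class c. The function name and the example counts are illustrative assumptions, not measured PlantVillage figures.

```python
def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Return a per-class loss weight of N / (K * n_c) for each class."""
    total = sum(counts.values())  # N: total number of images
    num_classes = len(counts)     # K: number of classes
    weights = {}
    for name, n in counts.items():
        if n <= 0:
            raise ValueError(f"class {name!r} has no images; cannot compute a weight")
        weights[name] = total / (num_classes * n)
    return weights


# Hypothetical counts chosen to mirror the ~5,000 vs. <500 spread described above.
example_counts = {"common_class": 5000, "rare_class": 500}
example_weights = inverse_frequency_weights(example_counts)
```

With these example counts the rare class receives ten times the weight of the common one. The resulting values can be passed, in class-index order, to a weighted loss in whichever training framework the project uses.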
Source
- Original paper: Hughes & Salathé, 2016 (https://arxiv.org/abs/1511.08060)
- Kaggle (most common access): https://www.kaggle.com/datasets/abdallahalidev/plantvillage-dataset
- GitHub: https://github.com/spMohanty/PlantVillage-Dataset
See also
- Member of: [[ingredient]]
- Combines with: [[opencv]] · [[nvidia-jetson]] · [[raspberry-pi]] · [[yolo]]
What links here, and how
Practical
contains
- Farm-tech toolkit: dataset / labeled crop-disease leaf images
combines with
- YOLO (object detection): YOLO trained on PlantVillage for combined leaf detection and disease classification