About our Dataset

A Comprehensive Dataset

The dataset that underlies ParasiteID is one of the most comprehensive image sets of parasitic worm eggs found in stool and contains nine different species. It includes the eggs of three Schistosome species (S. mansoni, S. haematobium, S. japonicum) and the eggs of Hookworm, Roundworm (Ascaris lumbricoides), Whipworm (Trichuris trichiura), Tapeworm (Taenia spp.), Dwarf Tapeworm (Hymenolepis nana), and Liver Fluke (Fasciola hepatica).

We've been able to assemble our dataset because of the generosity of several research groups.

How we assembled the dataset:

1) We captured images of Schistosomes ourselves.

We captured some of the images in the dataset ourselves because we did not identify any published datasets that included Schistosome eggs, and none of the experts we contacted knew of any.

We were able to include eggs of Schistosome species in our dataset only because the eggs were shared with us by Dr. Danielle Skinner and Dr. Conor Caffrey (UC San Diego) and by Dr. Margaret Mentink-Kane (NIH Schistosome Research Center). We thank them for their generosity and their expertise. We captured images of these Schistosome species on both conventional microscopes and a Foldscope.

2) Research groups from around the world shared their image datasets.

Our dataset contains images of parasite eggs from patient stool samples prepared at the Mulago National Referral Hospital in Uganda. We thank Dr. John Quinn (Makerere University, Uganda) for sharing these images of Hookworm, Tapeworm, and Dwarf Tapeworm.

Our dataset also contains images of parasite eggs from patient stool samples collected at the Hospital Universiti Sains Malaysia. We thank Dr. Kamarul Hawari Ghazali (Universiti Malaysia Pahang) for sharing these images of Whipworm and Roundworm.

Finally, our dataset contains images of parasite eggs from patient stool samples collected at Universidad Peruana Cayetano Heredia in Peru. We thank Dr. Mirko Zimic and Dr. Alicia Alva (Universidad Peruana Cayetano Heredia, Lima, Peru) for sharing images of Tapeworm, Whipworm, and Liver Fluke.
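
For readers who want to check how the sources above add up to nine species, the contributions can be written out as a small lookup table. This is only an illustrative sketch: the short source labels ("In-house", "Uganda", "Malaysia", "Peru") and the variable names are our shorthand for this example and are not part of ParasiteID itself.

```python
# Illustrative summary of which source contributed which egg species,
# transcribed from the descriptions above. The labels and names here are
# hypothetical shorthand, not identifiers used by ParasiteID.
SOURCES = {
    "In-house": {"S. mansoni", "S. haematobium", "S. japonicum"},
    "Uganda": {"Hookworm", "Tapeworm", "Dwarf Tapeworm"},
    "Malaysia": {"Whipworm", "Roundworm"},
    "Peru": {"Tapeworm", "Whipworm", "Liver Fluke"},
}

# Union of every species across all sources.
all_species = set().union(*SOURCES.values())

# Reverse lookup: for each species, which sources contributed images of it.
species_to_sources = {
    species: sorted(name for name, eggs in SOURCES.items() if species in eggs)
    for species in sorted(all_species)
}

print(len(all_species))
for species, names in species_to_sources.items():
    print(f"{species}: {', '.join(names)}")
```

The reverse lookup also shows that Tapeworm and Whipworm images each come from two different sources, while every other species comes from one.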

A Global Dataset for a Global Impact

We have assembled our dataset to make ParasiteID useful across geographic regions. Our dataset contains images of parasites that are endemic to different regions of the world, so not every parasite in it occurs in every region where some of the others are common. For example, regions of Asia and Sub-Saharan Africa both have high incidences of Roundworm, whereas Schistosoma japonicum is much more common in Asia than in other parts of the world and does not occur in Sub-Saharan Africa. We designed our dataset to include all the parasite eggs found in feces that we could obtain images of, and we’ve done so in order to make ParasiteID useful globally.