Animal Adoption: How Can Data Science Help Animals in Shelters?
This is a data science project I completed during the Metis Data Science Bootcamp.
According to reports from the National Council on Pet Population Study and Policy (NCPPSP) and the American Society for the Prevention of Cruelty to Animals (ASPCA), approximately 6.5 million companion animals enter U.S. animal shelters every year, and 1.5 million shelter animals are euthanized per year.
If homes can be found for these animals, we may be able to reduce the euthanasia rate for these precious lives. I believe this can be achieved by showcasing the types of animals that face a higher risk of euthanasia because they tend to be overlooked. I was motivated to complete and share this project to help increase the adoption rate and provide more assistance to shelters.
To reduce the euthanasia rate, I created a classification model that predicts whether a given shelter animal will be adopted or euthanized, so that shelters can focus their energy on the animals that need extra help finding a new home. I also created an image-based recommender system to help potential owners find their ideal pet efficiently.
The classification model uses data from the Austin Animal Center covering October 2013 to March 2021. For the recommender system, I used Beautiful Soup and Selenium to scrape images of adoptable animals from the ASPCA website. Below is a screenshot of what the ASPCA pet finder looks like.
Classification Model: Variables
The feature variables were: animal type (dog/cat), intake condition, and animal profile (breed, color, sex, neutered/spayed).
The target variable, the value we are trying to predict, is the outcome. There are two possible outcomes: adoption or euthanasia.
The pie chart below shows that this is an imbalanced dataset. To handle the imbalance, I used stratified splitting to preserve the class proportions when splitting the dataset, then applied under-sampling to balance the sizes of the two classes.
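To make the two steps concrete, here is a minimal plain-Python sketch of a stratified split followed by random under-sampling. The function names, the 90/10 toy labels, and the 20% test fraction are all made up for illustration and are not the project's actual code or data. In practice, scikit-learn's `train_test_split` with its `stratify` argument handles the split. The sketch also assumes under-sampling is applied to the training portion only.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx

def undersample(indices, labels, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    n_min = min(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(rng.sample(idxs, n_min))
    return balanced

# Toy labels (hypothetical): 90 adoptions and 10 euthanasia cases.
labels = ["adopted"] * 90 + ["euthanized"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
balanced_train = undersample(train_idx, labels)

test_counts = Counter(labels[i] for i in test_idx)
balanced_counts = Counter(labels[i] for i in balanced_train)
print(test_counts)      # test set keeps the original 9:1 ratio
print(balanced_counts)  # training set now has equal class sizes
```

Keeping the test set at its original class ratio means the evaluation still reflects the imbalance a shelter would see in practice.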
- Model Selection
To help as many as possible of the animals that actually need additional assistance to be adopted, I optimized for recall. In other words, I wanted to minimize the number of euthanized animals misclassified as adopted (false negatives).
I ran cross-validation on logistic regression, KNN, decision tree, random forest, and gradient boosting (XGBoost) models to predict the outcome for each animal. XGBoost gave the best performance.
- Model Result: XGBoost
The recall score is 0.83 (127 false negatives out of 740 actual euthanasia cases), which means the model correctly identifies 83% of the animals in the euthanasia class.
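The recall figure follows directly from the two counts quoted above. A short sketch of the arithmetic, assuming the 740 actual euthanasia cases include the 127 false negatives (the variable names are my own):

```python
actual_euthanasia = 740  # animals whose true outcome was euthanasia
false_negatives = 127    # of those, the ones the model predicted as adopted

# Recall = TP / (TP + FN), where TP + FN is every actual euthanasia case.
true_positives = actual_euthanasia - false_negatives
recall = true_positives / (true_positives + false_negatives)

print(f"recall = {true_positives}/{actual_euthanasia} = {recall:.2f}")
# prints "recall = 613/740 = 0.83"
```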
The Influence of Each Input Variable
I used SHAP to visualize the XGBoost model and make it interpretable. The y-axis lists the variables in order of importance from top to bottom. The x-axis is the SHAP value, which measures the contribution of each feature to the prediction. Points on the left represent observations that shift the prediction in the negative direction, towards the adoption class. Points on the right shift it the opposite way, towards euthanasia.
- What Did We Learn From the Plot Above?
1. People tend to adopt younger animals.
A higher age is associated with positive SHAP values, which push the prediction towards euthanasia.
2. Neutered or spayed animals are more likely to get adopted.
When the neutered-or-spayed variable is true, the SHAP value is low, which pushes the prediction towards adoption.
3. Influence of Breed
Pitbulls and domestic cats are less popular. (*Domestic cat: short-haired mixed-breed cat)
Chihuahuas and miniature pets are more popular.
Sometimes people have an ideal pet in mind, but they might not know its breed or where to find a similar-looking pet. Therefore, I created an image-based recommender system to help them find their ideal pet efficiently.
I also incorporated what we learned from the classification model: older dogs in particular are less likely to be adopted, so this recommender system preferentially showcases older dogs.
I used a pre-trained deep learning model, VGG-16, as a feature extractor. Since we don't need the model as a classifier, I removed VGG-16's final output layer so that it returns features instead of predictions. First, I loaded 21,000 images of adoptable dogs. Then, I extracted feature vectors for the input image and for the loaded images. Finally, I computed cosine similarity and returned the top 10 most similar-looking dogs.
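The similarity step can be sketched in plain Python as follows. This is an illustration rather than the project's code: it assumes the feature vectors have already been extracted, and the tiny 3-dimensional vectors, dog IDs, and function names are hypothetical stand-ins for the much longer VGG-16 features.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_similar(query, catalog, k=10):
    """Return the IDs of the k catalog entries most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), dog_id) for dog_id, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [dog_id for _, dog_id in scored[:k]]

# Hypothetical feature vectors for four adoptable dogs.
catalog = {
    "dog_a": [0.9, 0.1, 0.0],
    "dog_b": [0.1, 0.8, 0.3],
    "dog_c": [0.7, 0.3, 0.1],
    "dog_d": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # feature vector of the user's input image

top_two = top_k_similar(query, catalog, k=2)
print(top_two)  # prints ['dog_a', 'dog_c']
```

Cosine similarity compares the direction of two vectors and ignores their length, so two images with a similar mix of features score highly even if one produces larger activations overall.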
Here is an example of the recommender system. We input the image on the left, a yellowish-brown dog, and the system returned the top 10 similar dogs on the right. The recommender works well here: the returned dogs have the same color, and their eyes, ears, and noses show similar facial features.
After getting the top 10 similar-looking dogs, I rerank them by age. There are four age levels: Baby, Young, Adult, and Senior. Senior and Adult dogs are preferentially recommended to potential owners.
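One way to sketch the reranking step is a stable sort on an age priority. The priority order below (Senior first, then Adult, Young, Baby) and the dog records are assumptions for illustration. The only requirement stated above is that Senior and Adult dogs come first.

```python
# Lower number = shown earlier. The exact ordering is an assumption.
AGE_PRIORITY = {"Senior": 0, "Adult": 1, "Young": 2, "Baby": 3}

def rerank_by_age(similar_dogs):
    """Reorder dogs so older age levels come first.

    Python's sort is stable, so dogs in the same age level keep
    their original (similarity-based) order.
    """
    return sorted(similar_dogs, key=lambda dog: AGE_PRIORITY[dog["age"]])

# Hypothetical results, already ordered from most to least similar.
top_similar = [
    {"name": "Biscuit", "age": "Young"},
    {"name": "Maple", "age": "Senior"},
    {"name": "Rex", "age": "Baby"},
    {"name": "Honey", "age": "Adult"},
    {"name": "Duke", "age": "Senior"},
]

reranked = rerank_by_age(top_similar)
print([dog["name"] for dog in reranked])
# prints ['Maple', 'Duke', 'Honey', 'Biscuit', 'Rex']
```

Because the reranking only reorders dogs that already passed the similarity cut, every recommendation still resembles the input image.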
- XGBoost classification model: captures 83% of euthanized animals
- Which types of animals are less likely to be adopted?
  - Older dogs/cats
  - Dogs and cats that are not spayed or neutered
  - Breed: Pitbulls and domestic cats
  - Large dogs/cats
- Recommender system for adoptable dogs
  - Helps potential pet owners save time finding their ideal dog
  - Showcases older dogs in order to lower their risk of euthanasia