Thursday, April 16, 2015

Staring at Goats II: Gender distribution

This is part 2 of a series of posts about goat breeding data.
[part 1]

In a previous post ("Staring at Goats") I showed how you can build and query a graph database of goats, using the Neo4j graph database and the Cypher query language. It is now time to explore the data in more detail.

The extended model

We will now extend the model from the previous post with gender data and the year in which each goat was born. Here is the extended model. (Lesson 2 in the Dutch language: Geslacht = Gender, Jaar = Year; see the previous post for lesson 1.)

We loaded these data from a CSV file, using the same technique as described in the previous post.

Just to check that the model works, let's find my own goats again and see which nodes they are linked to. We now see that gender and year have been added.



We can now ask questions like: how many bucks and goats were born in a specific year? The Cypher query for this is as follows. In this example, we choose 2015 as the year, which appears in the query as 15.
match (geslacht:GESLACHT)<-[:GESLACHT]-(g:GEIT)-[:JAAR]->(jaar:JAAR {jaar:15}) return geslacht.geslacht,count(g)
The outcome of this query is 178 bucks and 156 goats.

We could run this for each individual year, but we can also get the result for all years in one go with this Cypher query:
match (:GESLACHT{geslacht:"Geit"})<-[:GESLACHT]-(geit:GEIT)-[:JAAR]->(jaar:JAAR)<-[:JAAR]-(bok:GEIT)-[:GESLACHT]->(:GESLACHT{geslacht:"Bok"}) return jaar.jaar,count(distinct(bok)),count(distinct(geit)) order by jaar.jaar descending
We now get a list of buck and goat counts per year:

Cool! There is an interesting difference between the numbers of bucks and goats being born. Where we would expect a 50/50 distribution, there are clearly more goats than bucks.

Hmm. This is probably a systematic effect: it looks like not all bucks are entered into the database! This might skew any statistics that depend on gender.
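
A side note on the all-years query above: because its pattern needs both a goat and a buck in the same year, a year with only one gender on record would drop out of the list. Here is a minimal sketch of an alternative that counts both genders in a single pass. It assumes the same labels and properties as the queries above, and that every GEIT node has exactly one GESLACHT and one JAAR relationship. The aliases bokken and geiten are names made up for this example.

```cypher
// One row per goat, grouped by year; each row adds 1 to the matching gender column.
// Assumes exactly one GESLACHT and one JAAR relationship per GEIT node.
match (s:GESLACHT)<-[:GESLACHT]-(g:GEIT)-[:JAAR]->(j:JAAR)
return j.jaar as jaar,
       sum(case when s.geslacht = "Bok" then 1 else 0 end) as bokken,
       sum(case when s.geslacht = "Geit" then 1 else 0 end) as geiten
order by jaar descending
```

A year with no bucks on record would then show up with 0 in the bokken column instead of disappearing.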

Gender ratios

Let's plot the percentages of bucks and goats since 1975. For the visualisation we use D3.js (see https://github.com/mbostock/d3/wiki).
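
The percentages for such a plot could be computed in Cypher before handing the rows to D3.js. This is only a sketch, not necessarily how the plots in this post were made. It assumes the same labels and properties as the earlier queries, one gender and one year per goat, and only the two gender values "Bok" and "Geit". The aliases bokken, totaal and pct_bokken are made up for this example.

```cypher
// Count bucks and all animals per year, then derive the percentage of bucks.
match (s:GESLACHT)<-[:GESLACHT]-(g:GEIT)-[:JAAR]->(j:JAAR)
with j.jaar as jaar,
     sum(case when s.geslacht = "Bok" then 1 else 0 end) as bokken,
     count(g) as totaal
// 100.0 forces floating-point division; the goat percentage is 100 minus pct_bokken.
return jaar, bokken, totaal, 100.0 * bokken / totaal as pct_bokken
order by jaar
```

Doing the division in the query keeps the plotting code simple: each returned row is already one data point.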

Our first plot simply shows the percentage of bucks versus goats being born, per year, since 1975.


Interestingly, in 2009 the percentage of bucks entered into the database went up from around 30% to about 50%. Since 2009, the gender statistics make sense. Before that, bucks may have ended up at the butcher's before ever being recorded. In the 1970s the numbers are small, so the distribution varies more from year to year.

If we plot the total number of goats born per year, we see a steady rise from small numbers in 1975 to a record of 1200 goats in 2009. From then on, the number of bucks and goats being born shows a decreasing trend. That's less good news for the total population.


Read part 3: fertility
