An example of Proportional Sampling

Saurabh Raj
4 min readSep 19, 2020

A walk through the concept of proportional sampling by an example explanation with python codes to perform the same.

Table of contents:

  • What is proportional sampling?
  • Example problem
  • Algorithm
  • Python Code
  • Summary

What is proportional sampling?

In most simple words, proportional sampling is a sampling of a population in which the probability of finding an element is proportional to some common shared attribute or property of all the elements in the population. For example, suppose you have a set of numbers, say {2,5,8,15,46,90}, and you want to randomly pick a number but you don’t want the probability to be uniform. Instead, you want the probability of finding a number to be proportional to the face values of the number which precisely means that 90 should have the highest probability and 2 should have the lowest probability to be picked up.

An example

In Europe a new football tournament was announced. Many big business men came forward to start their own clubs. There was one owner, Mr. Robert, who had no knowledge about how to choose right player for the team. But he was quite certain that more number of goals a player has scored, the better is the player. With this much of knowledge is arranged the player vs number of goals data set.

Table 1

Each team has to select 18 players. Now the rule of player choosing was that Mr. Robert could take 4 players of his choice and has to select the other 14 randomly. So basically Mr. Robert has to select 14 random players from table 1.

Now the problem is how to randomly select the player such that the probability of selecting the player is more if the number of goals is more.

Algorithm:

  • Compute the total sum of goals.
eq. 1
  • Normalize goals of each player with respect to S.
eq. 2
  • Compute Cumulative Normalized Sum of goals for each player. Note: The cumulative normalized sum for the last player will be equal to 1.0.
eq. 3
  • Pick a value randomly, r, from the range (0,1).
  • For each cumulative normalized G in the list of G” , if r ≤ G’’_i ,then return player corresponding to (G’’_i)*S.

Python Code

First import all the necessary libraries:

Then load the data set consisting the unique players verses total goals scored by them:

Check for any duplicates or NaN values:

Now calculate the total sum of goals:

Compute the normalize value of goals for each player:

Compute the Cumulative normalized sum value of goals for each player:

Finally find all the players by using simple for loops:

Summary

So, proportional sampling can be a very handy technique when it comes to data sampling in machine learning and data science. There are some naive ways also, like create duplicates of each data point proportional to the number of goals and then randomly select any value with uniform probability. Proportional sampling is also knows as stratified sampling in machine learning.

--

--