Sampler

Overview:

class epios.Sampler(data=None, data_store_path='./input/', pre_process=True, num_age_group=None, age_group_width=None, mode='Base')

Bases: DataProcess

The base sampling class.

This class performs a single round of completely random (unstratified) sampling.

Parameters:

To input new data, pass it through the data argument and set pre_process to True.

To reuse previously processed data, point data_store_path to the directory containing the processed data files and set pre_process to False.

num_age_group : int

Used when age stratification is enabled; the number of age groups.

The last group is open-ended and contains everyone at or above a threshold age.

age_group_width : int

Used when age stratification is enabled; the width of each age group except the last.

mode : str

The mode used to process the data. It must be one of the mode names the class recognises.

To sample as this class was designed to, leave this value at its default.
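As an illustration of how num_age_group and age_group_width could map an age onto a group, here is a minimal sketch. It assumes groups start at age 0 and that the last group is open-ended; the helper age_group_index is hypothetical and not part of epios. Under that assumption, the SamplerAge defaults (17 groups of width 5) would put everyone aged 80 or over into the last group.

```python
def age_group_index(age: int, num_age_group: int, age_group_width: int) -> int:
    """Hypothetical helper: map an age to a 0-based age-group index.

    Assumes groups are [0, w), [w, 2w), ... and that the last group
    collects everyone aged (num_age_group - 1) * w or older.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    # Integer division gives the width-based bin; clamp it into the last group.
    return min(age // age_group_width, num_age_group - 1)


# With 17 groups of width 5, the open-ended last group starts at 16 * 5 = 80.
print(age_group_index(3, 17, 5))   # 0
print(age_group_index(42, 17, 5))  # 8
print(age_group_index(95, 17, 5))  # 16
```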

person_allowed(sample: list, choice: str, threshold: int = 3)

Check whether the chosen person can be added to the sample without exceeding the cap on the number of people sampled per household.

Parameters:

sample : list

List of people who have already been chosen

choice : str

String ID of the person being considered

threshold : int

The cap on the number of people sampled per household
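A minimal sketch of such a per-household check, assuming person IDs are dot-separated strings whose last field identifies the person within the household (for example "0.0.1.2"). Both this ID format and the helper names household_of and household_cap_ok are assumptions for illustration, not the library's implementation.

```python
def household_of(person_id: str) -> str:
    """Assumed ID format: everything before the last '.' identifies the household."""
    return person_id.rsplit(".", 1)[0]


def household_cap_ok(sample: list, choice: str, threshold: int = 3) -> bool:
    """Hypothetical helper: True if adding `choice` keeps its household
    at or below `threshold` sampled members."""
    target = household_of(choice)
    # Count people already sampled from the same household as `choice`.
    count = sum(1 for person in sample if household_of(person) == target)
    return count < threshold


chosen = ["0.0.1.0", "0.0.1.1", "0.0.2.0"]
print(household_cap_ok(chosen, "0.0.1.2", threshold=2))  # False: household 0.0.1 already has 2
print(household_cap_ok(chosen, "0.0.2.1", threshold=2))  # True
```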

sample(sample_size: int)

Randomly sample the given number of people from the population.

Parameters:

sample_size : int

The number of people to sample

Output:

res : list

A list of IDs of the sampled people
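Simple random sampling without replacement can be sketched with the standard library's random.sample. The toy population list and the function name simple_random_sample are illustrative assumptions; how epios stores IDs internally is not shown here.

```python
import random


def simple_random_sample(population_ids: list, sample_size: int, seed=None) -> list:
    """Draw `sample_size` distinct IDs uniformly at random (no replacement)."""
    if sample_size > len(population_ids):
        raise ValueError("sample_size exceeds population size")
    rng = random.Random(seed)  # local RNG so results are reproducible with a seed
    return rng.sample(population_ids, sample_size)


ids = [f"0.0.{h}.{p}" for h in range(5) for p in range(3)]  # 15 toy IDs
res = simple_random_sample(ids, 4, seed=1)
print(len(res), len(set(res)))  # 4 4
```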

class epios.SamplerAge(data=None, data_store_path='./input/', pre_process=True, num_age_group=17, age_group_width=5, mode='Age')

Bases: Sampler

The sampling class with age stratification.

Parameters:

To input new data, pass it through the data argument and set pre_process to True.

To reuse previously processed data, point data_store_path to the directory containing the processed data files and set pre_process to False.

num_age_group : int

The number of age groups.

The last group is open-ended and contains everyone at or above a threshold age.

age_group_width : int

The width of each age group except the last.

mode : str

The mode used to process the data. It must be one of the mode names the class recognises.

To sample as this class was designed to, leave this value at its default.

get_age_dist()

Read the age distribution from pop_dist.json, which is produced by the DataProcess class.

Output:

config : list

A list of floats summing to 1, with one entry per age group
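The normalisation behind a distribution like this can be sketched as follows. The layout of pop_dist.json is not documented here, so the sketch assumes, hypothetically, that it holds a JSON list of per-age-group population counts; the helper normalise_counts is also illustrative.

```python
import json


def normalise_counts(counts: list) -> list:
    """Turn non-negative counts into proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return [c / total for c in counts]


# Assumed file content: one count per age group (here 4 groups).
raw = json.loads("[10, 30, 40, 20]")
config = normalise_counts(raw)
print(config)  # [0.1, 0.3, 0.4, 0.2]
```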

multinomial_draw(n: int, prob: list)

Perform a multinomial draw with caps. It returns a tuple of lists; the first output is the number of people to draw from each age group.

Parameters:

n : int

The sample size

prob : list

A list of floats summing to 1, with one entry per age group

Output:

res : list

A list of integers giving the number of people to sample from each age group
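One plausible reading of "a multinomial draw with caps" is sketched below: people are assigned to groups one at a time according to prob, and once a group reaches its cap (for example, the number of people actually in that group) it is dropped and the remaining probabilities are renormalised. The caps argument and the function name capped_multinomial_draw are assumptions; the library may resolve caps differently.

```python
import random


def capped_multinomial_draw(n: int, prob: list, caps: list, seed=None) -> list:
    """Return per-group counts that sum to n, with counts[i] <= caps[i].

    Hypothetical sketch: people are assigned one at a time, and a group
    stops being eligible once it reaches its cap.
    """
    capacity = sum(cap for p, cap in zip(prob, caps) if p > 0)
    if n > capacity:
        raise ValueError("n exceeds the capacity of groups with positive probability")
    rng = random.Random(seed)
    counts = [0] * len(prob)
    for _ in range(n):
        # Eligible groups: positive probability and still below their cap.
        open_groups = [i for i, p in enumerate(prob) if p > 0 and counts[i] < caps[i]]
        # random.choices normalises the weights, which renormalises over open groups.
        group = rng.choices(open_groups, weights=[prob[i] for i in open_groups], k=1)[0]
        counts[group] += 1
    return counts


res = capped_multinomial_draw(10, [0.5, 0.3, 0.2], caps=[2, 10, 10], seed=0)
print(res, sum(res))  # group 0 never exceeds its cap of 2
```

When no cap binds, assigning people one at a time with fixed probabilities is equivalent to an ordinary multinomial draw.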

sample(sample_size: int)

Given a sample size, return a list of IDs of people drawn from the population with age stratification.

Parameters:

sample_size : int

The number of people to sample

Output:

res : list

A list of IDs of the sampled people
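Once the number of people per group is known, stratified sampling reduces to a simple random sample within each group. A minimal sketch, assuming the population has already been split into a dict mapping each age-group index to the IDs in that group (this data layout and the function name stratified_sample are hypothetical):

```python
import random


def stratified_sample(groups: dict, counts: list, seed=None) -> list:
    """Draw counts[g] distinct IDs from groups[g] for every group index g."""
    rng = random.Random(seed)
    res = []
    for g, k in enumerate(counts):
        members = groups.get(g, [])
        if k > len(members):
            raise ValueError(f"group {g} has only {len(members)} people, {k} requested")
        res.extend(rng.sample(members, k))  # without replacement within the group
    return res


groups = {0: ["a1", "a2", "a3"], 1: ["b1", "b2"], 2: ["c1", "c2", "c3", "c4"]}
res = stratified_sample(groups, [1, 2, 0], seed=42)
print(len(res))  # 3
```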

class epios.SamplerRegion(data=None, data_store_path='./input/', pre_process=True, mode='Region')

Bases: Sampler

The sampling class with region stratification.

Parameters:

To input new data, pass it through the data argument and set pre_process to True.

To reuse previously processed data, point data_store_path to the directory containing the processed data files and set pre_process to False.

mode : str

The mode used to process the data. It must be one of the mode names the class recognises.

To sample as this class was designed to, leave this value at its default.

additional_nonresponder(non_resp_id: list, sampling_percentage=0.1, proportion=0.01, threshold=None)

Generate the additional samples based on the non-responder IDs.

Parameters:

non_resp_id : list

A list of the non-responder IDs

sampling_percentage : float, between 0 and 1

The proportion of additional samples taken from a given region group

proportion : float, between 0 and 1

The proportion of all region groups that receive additional samples

threshold : None or int

The minimum number of region groups that receive additional samples

Note: proportion and threshold together determine the number of groups that receive additional samples, but both are limited by how many groups are eligible for additional sampling.

Output:

additional_sample : list of length num_region_group

The number of additional people to draw from each region group
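The exact rule is not spelled out above, so the following is only one hypothetical interpretation of how the parameters could interact. It assumes the first dot-separated field of an ID is the region index, rounds proportion * num_region_group to the nearest integer (at least one group), raises that to threshold, caps it at the number of regions that contain non-responders, favours regions with the most non-responders, and replaces a sampling_percentage share of each chosen region's non-responders. The ID format, rounding, priority order, and the names region_of and plan_additional are all assumptions, not documented behaviour.

```python
from collections import Counter


def region_of(person_id: str) -> int:
    """Assumed ID format: the first dot-separated field is the region (cell) index."""
    return int(person_id.split(".")[0])


def plan_additional(non_resp_id: list, num_region_group: int, sampling_percentage=0.1,
                    proportion=0.01, threshold=None) -> list:
    """Hypothetical reading of additional_nonresponder: one extra count per region group."""
    # Tally non-responders in each region group.
    tally = Counter(region_of(pid) for pid in non_resp_id)
    eligible = [g for g in range(num_region_group) if tally[g] > 0]
    # Groups to top up: at least one (assumed), raised to threshold, capped by eligibility.
    n_groups = max(1, round(proportion * num_region_group))
    if threshold is not None:
        n_groups = max(n_groups, threshold)
    n_groups = min(n_groups, len(eligible))
    # Assumed priority: regions with the most non-responders come first.
    chosen = sorted(eligible, key=lambda g: tally[g], reverse=True)[:n_groups]
    additional_sample = [0] * num_region_group
    for g in chosen:
        # Assumed: replace a fraction of each chosen region's non-responders, at least one.
        additional_sample[g] = max(1, round(sampling_percentage * tally[g]))
    return additional_sample


ids = ["0.0.0.0", "0.1.2.0", "0.0.3.1", "2.0.0.0", "2.0.1.0", "1.0.0.0"]
print(plan_additional(ids, 4, sampling_percentage=1.0, threshold=3))  # [3, 1, 2, 0]
```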

get_region_dist()

Extract the geographical distribution from microcells.csv, which is generated by the DataProcess class.

Output:

dist : list

A list of floats summing to 1, with one entry per cell
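A sketch of how per-cell proportions could be derived from a CSV like this. The real column layout of microcells.csv is not documented here, so the sketch assumes, hypothetically, a cell column and a count column giving the number of people in each row's microcell; the function name region_distribution is illustrative.

```python
import csv
import io
from collections import defaultdict


def region_distribution(csv_text: str) -> list:
    """Hypothetical layout: columns 'cell' (int) and 'count' (people in that row's microcell)."""
    per_cell = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        per_cell[int(row["cell"])] += int(row["count"])
    total = sum(per_cell.values())
    # One proportion per cell, ordered by cell index.
    return [per_cell[c] / total for c in sorted(per_cell)]


example = "cell,microcell,count\n0,0,30\n0,1,20\n1,0,50\n"
print(region_distribution(example))  # [0.5, 0.5]
```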

multinomial_draw(n: int, prob: list)

Perform a multinomial draw with caps. It returns a tuple of lists: the first output is the number of people to draw from each region group, and the second is provided for the convenience of the subsequent sampling step.

Parameters:

n : int

The sample size

prob : list

A list of floats summing to 1, with one entry per region group

Output:

res : list

A list of integers giving the number of people to sample from each region group

res_cap_block : list

A list of caps, one for each region group

sample(sample_size: int, additional_sample: list | None = None, household_criterion=False, household_threshold: int = 3)

Given a sample size and, optionally, the additional samples, return a list of IDs of people drawn from the population.

Parameters:

sample_size : int

The number of people to sample

additional_sample : list

A list of integers giving the number of additional people to draw from each region group

household_criterion : bool

Whether to apply the household criterion (a cap on the number of people sampled from one household)

household_threshold : int

The maximum number of people sampled from one household

Output:

res : list

A list of IDs of the sampled people
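The pieces above can be combined into a region-stratified draw that adds the extra samples to each region's quota and, optionally, enforces the household cap. A minimal sketch, assuming a dict from region index to IDs and dot-separated IDs whose prefix before the last '.' is the household; the names sample_by_region and household_of are hypothetical, and rejecting candidates from full households is one simple way to apply the criterion, not necessarily the library's.

```python
import random


def household_of(person_id: str) -> str:
    """Assumed ID format: everything before the last '.' identifies the household."""
    return person_id.rsplit(".", 1)[0]


def sample_by_region(groups: dict, counts: list, additional_sample=None,
                     household_criterion=False, household_threshold=3, seed=None) -> list:
    """Hypothetical sketch: draw counts[g] + additional_sample[g] people from region g."""
    rng = random.Random(seed)
    if additional_sample is None:
        additional_sample = [0] * len(counts)
    res = []
    sampled_per_household = {}
    for g, base in enumerate(counts):
        target = base + additional_sample[g]
        candidates = list(groups.get(g, []))
        rng.shuffle(candidates)  # random order gives a uniform draw without replacement
        taken = 0
        for pid in candidates:
            if taken == target:
                break
            hh = household_of(pid)
            if household_criterion and sampled_per_household.get(hh, 0) >= household_threshold:
                continue  # household already at its cap; try the next candidate
            res.append(pid)
            sampled_per_household[hh] = sampled_per_household.get(hh, 0) + 1
            taken += 1
        if taken < target:
            raise ValueError(f"region {g}: only {taken} of {target} people could be drawn")
    return res


groups = {0: ["0.0.0.0", "0.0.0.1", "0.0.0.2", "0.0.1.0"], 1: ["1.0.0.0", "1.0.0.1"]}
res = sample_by_region(groups, [2, 0], household_criterion=True, household_threshold=1, seed=0)
print(len(res))  # 2: at most one person from household 0.0.0
```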

class epios.SamplerAgeRegion(data=None, data_store_path='./input/', pre_process=True, num_age_group=17, age_group_width=5, mode='AgeRegion')

Bases: Sampler

The sampling class with age and region stratification.

Parameters:

To input new data, pass it through the data argument and set pre_process to True.

To reuse previously processed data, point data_store_path to the directory containing the processed data files and set pre_process to False.

num_age_group : int

The number of age groups.

The last group is open-ended and contains everyone at or above a threshold age.

age_group_width : int

The width of each age group except the last.

mode : str

The mode used to process the data. It must be one of the mode names the class recognises.

To sample as this class was designed to, leave this value at its default.

additional_nonresponder(non_resp_id: list, sampling_percentage=0.1, proportion=0.01, threshold=None)

Generate the additional samples based on the non-responder IDs.

Parameters:

non_resp_id : list

A list of the non-responder IDs

sampling_percentage : float, between 0 and 1

The proportion of additional samples taken from a given age-region group

proportion : float, between 0 and 1

The proportion of all age-region groups that receive additional samples

threshold : None or int

The minimum number of age-region groups that receive additional samples

Note: proportion and threshold together determine the number of groups that receive additional samples, but both are limited by how many groups are eligible for additional sampling.

Output:

additional_sample : 2D list with shape (num_region_group, num_age_group)

The number of additional people to draw from each age-region group

get_age_dist()

Read the age distribution from pop_dist.json, which is produced by the DataProcess class.

Output:

config : list

A list of floats summing to 1, with one entry per age group

get_region_dist()

Extract the geographical distribution from microcells.csv, which is generated by the DataProcess class.

Output:

dist : list

A list of floats summing to 1, with one entry per cell

multinomial_draw(n: int, prob: list)

Perform a multinomial draw with caps. It returns a tuple of lists: the first output is the number of people to draw from each group, defined by age and region, and the second is provided for the convenience of the subsequent sampling step.

Parameters:

n : int

The sample size

prob : list

A list of floats summing to 1, whose length is the number of age groups times the number of region groups

Output:

res : list

A list of integers giving the number of people to sample from each age-region group

res_cap_block : list

A list of caps, one for each age-region group
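The prob argument here is a flat vector over all age-region groups. One simple way to build such a vector is sketched below, assuming age and region are independent so that each joint probability is the product of the two marginals (the library may instead use the observed joint distribution). The sketch also shows how a flat, region-major list of per-group counts can be reshaped into rows of num_age_group entries, matching the (num_region_group, num_age_group) shape used for additional_sample. The function names are hypothetical.

```python
def joint_age_region_prob(region_dist: list, age_dist: list) -> list:
    """Flattened joint distribution, region-major: index = r * num_age_group + a.

    Assumes age and region are independent, so P(r, a) = P(r) * P(a).
    """
    return [pr * pa for pr in region_dist for pa in age_dist]


def to_region_by_age(flat: list, num_age_group: int) -> list:
    """Reshape a flat per-group list into one row of num_age_group entries per region."""
    return [flat[i:i + num_age_group] for i in range(0, len(flat), num_age_group)]


prob = joint_age_region_prob([0.25, 0.75], [0.5, 0.5])  # 2 regions x 2 age groups
print(prob)  # [0.125, 0.125, 0.375, 0.375]
print(to_region_by_age([1, 2, 3, 4], 2))  # [[1, 2], [3, 4]]
```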

sample(sample_size: int, additional_sample: list | None = None, household_criterion=False, household_threshold: int = 3)

Given a sample size and, optionally, the additional samples, return a list of IDs of people drawn from the population.

Parameters:

sample_size : int

The number of people to sample

additional_sample : list

A list of integers giving the number of additional people to draw from each age-region group

household_criterion : bool

Whether to apply the household criterion (a cap on the number of people sampled from one household)

household_threshold : int

The maximum number of people sampled from one household

Output:

res : list

A list of IDs of the sampled people