Decotools API

File I/O

There are two functions used to retrieve DECO data: get_iOS_files and get_android_files.

decotools.fileio_iOS.get_iOS_files(start_date=None, end_date=None, data_dir='/net/deco/iOSdata', include_events=True, include_min_bias=False, phone_model=None, device_id=None, return_metadata=False, n_jobs=1, verbose=0)

Function to retrieve DECO iOS image files

Parameters:

start_date : str, optional

Starting date for the iOS files to retrieve. Use any common date format (e.g. ‘2017-01-01’, ‘20170101’, ‘Jan 1, 2017’, etc). Default starting date is ‘2016.01.01’.

end_date : str, optional

Ending date for the iOS files to retrieve. Use any common date format (e.g. ‘2017-01-01’, ‘20170101’, ‘Jan 1, 2017’, etc). Default is the current date.

data_dir : str, optional

Base directory to search for iOS image files.

include_events : bool, optional

Option to include image files flagged as events. Default is True.

include_min_bias : bool, optional

Option to include minimum bias image files. Default is False.

phone_model : str or array-like, optional

Option to specify which phone models you would like to look at. Can be either a string, e.g. ‘iPhone 7’, or a list of models, e.g. [‘iPhone 5’, ‘iPhone 5s’]. Default is to include all phone models.

device_id : str or array-like, optional

Option to specify which devices you want to look at. Can either be a string, e.g. ‘EFD5764E-7209-4579-B0A8-EAF80C950147’, or a list of device IDs, e.g. [‘EFD5764E-7209-4579-B0A8-EAF80C950147’, ‘F216114B-8710-4790-A05D-D645C9C79C27’]. Default is to include all device IDs.

return_metadata : bool, optional

Return a DataFrame with metadata information for each image file (default is False).

n_jobs : int, optional

The number of jobs to run in parallel (default is 1).

verbose : int {0, 1, or 2}

Verbosity level when getting files: 0 is the least verbose and 2 is the most verbose (default is 0).

Returns:

numpy.ndarray

Numpy array containing files that match specified criteria
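The date strings accepted by start_date and end_date can be illustrated with a small standard-library sketch. The helper normalize_date and its format list are hypothetical and are not part of decotools; how decotools itself parses dates is not stated here, so this only shows that the example formats above all denote the same day.

```python
from datetime import date, datetime

# Hypothetical helper (not part of decotools): try a few explicit formats in turn.
_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%b %d, %Y", "%Y.%m.%d")


def normalize_date(text):
    """Return a datetime.date for a string written in one of _FORMATS."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError("unrecognized date format: {!r}".format(text))


# The three example spellings from the parameter description.
examples = ["2017-01-01", "20170101", "Jan 1, 2017"]
parsed = [normalize_date(s) for s in examples]
same_day = all(d == date(2017, 1, 1) for d in parsed)
```

Normalizing dates before building a query makes it easy to check that start_date does not fall after end_date.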

decotools.fileio_android.get_android_files(start_date=None, end_date=None, data_dir='/net/deco/deco_data', db_file='/net/deco/db_hourly_safe.csv', include_events=True, include_min_bias=False, device_id=None, return_metadata=False, verbose=0)

Function to retrieve DECO Android image files

Parameters:

start_date : str, optional

Starting date for the files to retrieve. Use any common date format (e.g. ‘2017-01-01’, ‘20170101’, ‘Jan 1, 2017’, etc). Default starting date is ‘2010.01.01’.

end_date : str, optional

Ending date for the files to retrieve. Use any common date format (e.g. ‘2017-01-01’, ‘20170101’, ‘Jan 1, 2017’, etc). Default is the current date.

data_dir : str, optional

Base directory to search for Android image files.

db_file : str, optional

File path to the Android database.

include_events : bool, optional

Option to include image files flagged as events. Default is True.

include_min_bias : bool, optional

Option to include minimum bias image files. Default is False.

device_id : str or array-like, optional

Option to specify which devices you want to look at. Can either be a string, e.g. ‘DECO-00000000-450a-7561-433f-0516209b4922’, or a list of device IDs, e.g. [‘DECO-00000000-450a-7561-433f-0516209b4922’, ‘DECO-ffffffff-bd6f-e5fb-842b-56b10033c587’]. Default is to include all device IDs.

return_metadata : bool, optional

Return a DataFrame with metadata information for each image file (default is False).

verbose : int {0, 1, or 2}

Verbosity level when getting files: 0 is the least verbose and 2 is the most verbose (default is 0).

Returns:

numpy.ndarray

Numpy array containing files that match specified criteria
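A possible way to call get_android_files is sketched below. Only keyword arguments from the documented signature are used; the wrapper fetch_android_files is a hypothetical convenience, and the import is deferred so that the snippet loads on machines where decotools or the DECO data directories are unavailable.

```python
# Keyword arguments taken from the documented signature of get_android_files.
query = dict(
    start_date="2017-01-01",
    end_date="2017-02-01",
    include_events=True,
    include_min_bias=False,
    device_id=["DECO-00000000-450a-7561-433f-0516209b4922"],
    verbose=1,
)


def fetch_android_files(**kwargs):
    """Hypothetical wrapper: import decotools only when the call is made."""
    from decotools.fileio_android import get_android_files
    return get_android_files(**kwargs)


# Uncomment on a machine with decotools installed and access to the data:
# files = fetch_android_files(**query)
```

Keeping the query in a dictionary makes it simple to rerun the same selection with a different date range or device list.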

Blob Extraction

Finding interesting structure in images is done using the extract_blobs function.

decotools.blob_extraction.extract_blobs(image_file, threshold=20.0, rgb_sum=False, min_area=10.0, max_area=1000.0, max_dist=5.0, group_max_area=None, size=None)

Function to perform blob detection on an input image

Blobs are found using the marching squares algorithm implemented in scikit-image.

Parameters:

image_file : str

Path to image file.

threshold : float, optional

Threshold for blob detection. Only pixels with an intensity above this threshold will be used in blob detection (default: 20).

rgb_sum : bool, optional

Whether to convert the image to grayscale using a simple RGB sum (True) or a weighted RGB sum (False) (default: False).

min_area : float, optional

Minimum area for a blob to be kept. This helps get rid of noise in an image (default: 10).

max_area : float, optional

Maximum area for a blob to be kept. This helps get rid of pathological events in an image (default: 1000).

max_dist : float, optional

Distance scale for grouping nearby blobs. If two blobs are separated by less than max_dist, they are grouped together as a single blob (default: 5).

group_max_area : float, optional

Maximum area for a blob group to be kept. This helps get rid of pathological events in an image (default: None).

size : {None, int, array-like of shape=(2,)}, optional

Size of zoomed image of extracted blobs. If an integer is provided, the zoomed image will be a square box of size 2*size in each dimension. If an array-like object (of shape=(2,)) is provided, then the zoomed image will be of size 2*size[0] by 2*size[1]. Otherwise, the default behavior is to return a square image of size twice the equivalent diameter of the blob.

Returns:

pandas.DataFrame

A DataFrame containing information about the found blobs is returned. Each row in the DataFrame corresponds to a blob group, while each column corresponds to a pertinent quantity (area, eccentricity, zoomed image array, etc.).
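The roles of min_area, max_area and max_dist can be sketched in plain Python. This is not the decotools implementation: it assumes each blob is summarized as a hypothetical (x, y, area) tuple, that distances are measured between blob centroids, and that the area bounds are inclusive. Grouping is done with a small union-find so that chains of nearby blobs end up in one group.

```python
import math


def group_blobs(blobs, min_area=10.0, max_area=1000.0, max_dist=5.0):
    """Filter blobs by area, then group blobs closer together than max_dist.

    blobs is a list of (x, y, area) tuples. Returns (kept, groups), where
    kept is the filtered list and groups is a list of index lists into kept.
    """
    kept = [b for b in blobs if min_area <= b[2] <= max_area]
    parent = list(range(len(kept)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            dist = math.hypot(kept[i][0] - kept[j][0], kept[i][1] - kept[j][1])
            if dist < max_dist:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(kept)):
        groups.setdefault(find(i), []).append(i)
    return kept, sorted(groups.values())


# Three nearby blobs, one isolated blob, and one blob too small to keep.
blobs = [(10, 10, 50), (12, 11, 30), (100, 100, 40), (50, 50, 2), (14, 12, 20)]
kept, groups = group_blobs(blobs)
```

In this sketch the small blob is discarded before grouping, so it cannot bridge two otherwise separate groups.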

Metrics

Functions to calculate intensity metrics and histogram images are in the metrics module.

decotools.metrics.get_rgb_hists(files, cumulative=False, n_jobs=1)

Calculates histograms of the pixel RGB sum distributions

Parameters:

files : str or sequence

Image file path (or sequence of file paths) to be analyzed.

cumulative : bool, optional

Option to calculate cumulative histograms. If True, each histogrammed quantity is the number of pixels with an RGB sum greater than the corresponding threshold value (default is False).

n_jobs : int, optional

The number of jobs to run in parallel (default is 1).

Returns:

hists : pandas.DataFrame

DataFrame containing histograms of pixel RGB sums. Each row of the DataFrame corresponds to a single image and each column corresponds to an RGB sum value.
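The difference between the plain and cumulative histograms can be sketched for a single image in plain Python. This is an illustration only, assuming 8-bit channels (so RGB sums range from 0 to 765) and a strict "greater than" comparison; the function name rgb_sum_hist is hypothetical and is not the decotools API.

```python
def rgb_sum_hist(pixels, cumulative=False):
    """Histogram the R+G+B sum of each pixel.

    pixels is an iterable of (r, g, b) tuples with 8-bit channels.
    With cumulative=True, entry t holds the number of pixels whose
    RGB sum is strictly greater than t.
    """
    counts = [0] * 766  # possible sums are 0..765
    for r, g, b in pixels:
        counts[r + g + b] += 1
    if not cumulative:
        return counts

    above = [0] * 766
    running = 0
    for t in range(765, -1, -1):
        above[t] = running   # pixels with sum > t
        running += counts[t]
    return above


# Two black pixels, one dim pixel (sum 60) and one saturated pixel (sum 765).
pixels = [(0, 0, 0), (0, 0, 0), (10, 20, 30), (255, 255, 255)]
hist = rgb_sum_hist(pixels)
cum = rgb_sum_hist(pixels, cumulative=True)
```

The cumulative form answers "how many pixels are brighter than t" directly, which is convenient when choosing a threshold.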

decotools.metrics.get_intensity_metrics(files, rgb_sum=False, n_jobs=1)

Calculates various metrics related to the image intensity

Parameters:

files : str or sequence

Image file path (or sequence of file paths) to be analyzed.

rgb_sum : bool, optional

Option to use simple RGB sum for grayscale conversion (default is to use weighted RGB sum).

n_jobs : int, optional

The number of jobs to run in parallel (default is 1).

Returns:

image_intensities : pandas.DataFrame

DataFrame with intensity metrics
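The effect of rgb_sum on grayscale conversion can be sketched as follows. The weights are the common ITU-R BT.601 luma coefficients and are an assumption: the weights decotools actually uses are not stated here. The metrics computed (mean and maximum) are likewise hypothetical stand-ins for whatever columns the returned DataFrame contains.

```python
# Assumed weights (ITU-R BT.601 luma); decotools may use different ones.
WEIGHTS = (0.299, 0.587, 0.114)


def to_gray(pixel, rgb_sum=False):
    """Convert one (r, g, b) pixel to a single intensity value."""
    r, g, b = pixel
    if rgb_sum:
        return float(r + g + b)  # simple sum, range 0..765 for 8-bit input
    return WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b  # range 0..255


def intensity_summary(pixels, rgb_sum=False):
    """Hypothetical metrics: mean and maximum grayscale intensity."""
    gray = [to_gray(p, rgb_sum) for p in pixels]
    return {"mean": sum(gray) / len(gray), "max": max(gray)}


pixels = [(0, 0, 0), (10, 20, 30), (255, 255, 255)]
weighted = intensity_summary(pixels)
summed = intensity_summary(pixels, rgb_sum=True)
```

Because the two conversions use different scales, thresholds tuned for one are not directly comparable with the other.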

Event classification

CNN event classification is done using the CNN object.

class decotools.convnet.CNN(weights_file=None, model_file=None, custom_model=None, training=False, n_classes=4)

CNN class for event classification

Parameters:

weights_file : str, optional

Path and file name of an hdf5 file containing the trained model weights to be used by the CNN (default is None).

model_file : str, optional

Path and file name of an hdf5 file containing a trained model. Typically, this should only be used when continuing an existing training session (default is None).

custom_model : keras model, optional

User-defined, compiled keras model to be used in place of the default (default is None).

training : bool, optional

If True, initializes the model structure used for training. If False, initializes the model structure used for predictions (default is False).

n_classes : int, optional

Number of classes to be used by the CNN (default is 4).

evaluate(images, labels, batch_size=32, verbose=0)

Evaluate accuracy and loss of model predictions

Parameters:

images : numpy.ndarray

Array of grayscale, normalized images to be used for evaluation. Input shape = (n_images, n_rows, n_cols, 1).

labels : numpy.ndarray

Array of labels to be used for evaluation, shape=(n_images,).

batch_size : int, optional

Batch size to use for predictions (default is 32).

verbose : int, optional

Verbosity mode to use, 0 or 1 (default is 0).

Returns:

list

List containing [test_loss, test_accuracy].

fit(train_images, train_labels, test_images=None, test_labels=None, cv=None, batch_size=64, seed=None, epochs=800, initial_epoch=0, smooth_factor=0.004, horizontal_flip=True, vertical_flip=True, width_shift_range=0.08, height_shift_range=0.08, rotation_range=180.0, zoom_range=(0.9, 1.1), fill_mode='constant', cval=0, shuffle=True, save_model=None, save_weights=None, save_history=None, check_point=None, check_point_weights_only=True, verbose=False)

Fit CNN

Parameters:

train_images : numpy.ndarray

Array of grayscale, normalized images to be used for training the CNN. Input shape = (n_images, n_rows, n_cols, 1).

train_labels : numpy.ndarray

Array of training labels, shape=(n_images,).

test_images : numpy.ndarray, optional

Array of grayscale, normalized images to be used for testing the CNN. Input shape = (n_images, n_rows, n_cols, 1).

test_labels : numpy.ndarray, optional

Array of testing labels, shape=(n_images,).

cv : int, scikit-learn cross validator, None, optional

Option for cross-validation fitting. If cv is an integer, sklearn.model_selection.StratifiedKFold will be used with cv folds. Other cross validators from sklearn.model_selection can be passed to cv as well (default is None).

batch_size : int, optional

Number of samples per gradient update (default is 64).

seed : int, optional

Random seed to be used for reproducibility (default is None).

epochs : int, optional

Number of epochs to train the model. Note that in conjunction with initial_epoch, the parameter epochs is to be understood as the “final epoch” (default is 800).

initial_epoch : int, optional

Epoch at which to start training. Useful for resuming a previous training run (default is 0).

smooth_factor : float in range (0, 1), optional

Level of smoothing to apply to the one-hot label vector. For example, a smooth_factor of 0.004 applied to [0, 1, 0, 0] results in [0.001, 0.997, 0.001, 0.001] (default is 0.004).

horizontal_flip : bool, optional

Randomly flip inputs horizontally (default is True).

vertical_flip : bool, optional

Randomly flip inputs vertically (default is True).

width_shift_range : float, optional

Range for random horizontal shifts (default is 0.08).

height_shift_range : float, optional

Range for random vertical shifts (default is 0.08).

rotation_range : int, optional

Degree range for random rotations (default is 180).

zoom_range : float or (lower, upper), optional

Range for random zoom. If a float, (lower, upper) = (1-zoom_range, 1+zoom_range) (default is (0.9, 1.1)).

fill_mode : {“constant”, “nearest”, “reflect” or “wrap”}, optional

Points outside the boundaries of the input are filled according to the given mode (default is “constant”).

cval : float or int, optional

Value used for interpolated pixels when fill_mode="constant" (default is 0).

shuffle : bool, optional

Whether to shuffle the order of the batches at the beginning of each epoch (default is True).

save_model : str, optional

If specified, a copy of the model from the final training epoch will be saved. For example, save_model='my_model.h5'. Typically used for continued training (default is None).

save_weights : str, optional

If specified, a copy of the model weights from the final training epoch will be saved. For example, save_weights='my_weights.h5' (default is None).

save_history : str, optional

If specified, the training history (accuracy and loss for training and testing) from each epoch will be saved to a CSV file. For example, save_history='my_history.csv' (default is None).

check_point : str, optional

If specified, saves a running copy of the model corresponding to the lowest validation loss epoch. Each time a new low is reached, the previous best model is over-written by the new one. For example, check_point='my_checkpoint.h5' (default is None).

check_point_weights_only : bool, optional

If True, only the model’s weights will be saved in the check point. Otherwise the full model is saved. Ignored if check_point is not specified (default is True).

verbose : bool, optional

Option for verbose output (default is False).

Returns:

self : CNN

Trained CNN.
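The smooth_factor example above is consistent with standard label smoothing, in which each one-hot entry y becomes y * (1 - s) + s / n for n classes. The sketch below reproduces the documented numbers under that assumption; smooth_labels is a hypothetical helper, and whether fit applies exactly this formula internally is not confirmed here.

```python
def smooth_labels(one_hot, smooth_factor=0.004):
    """Apply standard label smoothing to a one-hot label vector.

    Each entry y becomes y * (1 - smooth_factor) + smooth_factor / n,
    where n is the number of classes, so the result still sums to 1.
    """
    n = len(one_hot)
    return [y * (1.0 - smooth_factor) + smooth_factor / n for y in one_hot]


# The example from the smooth_factor description: four classes, true class at index 1.
smoothed = smooth_labels([0, 1, 0, 0], smooth_factor=0.004)
expected = [0.001, 0.997, 0.001, 0.001]
matches_doc = all(abs(a - b) < 1e-12 for a, b in zip(smoothed, expected))
```

Smoothing keeps the target for the true class slightly below 1, which discourages the network from producing overconfident predictions.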

model_summary()

Print summary of currently loaded model

predict(images, batch_size=32, verbose=0)

Predict classifications for an input image array

Parameters:

images : numpy.ndarray

Array of grayscale, normalized images to be used for class predictions. Input shape = (n_images, n_rows, n_cols, 1).

batch_size : int, optional

Batch size to use for predictions (default is 32).

verbose : int, optional

Verbosity mode to use, 0 or 1 (default is 0).

Returns:

numpy.ndarray

Array containing class probabilities for each image. The array output is ordered as follows: [n_images, p(worm), p(spot), p(track), p(noise)]
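Turning predicted probabilities into class labels can be sketched as follows. The sketch assumes that each row of the returned array holds four probabilities in the documented order (worm, spot, track, noise); top_class and the sample rows are hypothetical and are not decotools output.

```python
# Class order taken from the documented output of predict.
CLASS_NAMES = ("worm", "spot", "track", "noise")


def top_class(probabilities):
    """Return (label, probability) for one row of class probabilities."""
    idx = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASS_NAMES[idx], probabilities[idx]


# Made-up probability rows standing in for the output of predict.
rows = [
    [0.05, 0.10, 0.80, 0.05],
    [0.70, 0.10, 0.10, 0.10],
]
labels = [top_class(row)[0] for row in rows]
```

Keeping the winning probability alongside the label makes it possible to discard low-confidence classifications later.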