You have implemented some AI solutions, seen the value of machine learning, and want to go deeper. Perhaps the out-of-the-box models don’t deliver the functionality you need, or you believe their performance can be improved, so you want to develop a custom algorithm to raise your performance metrics.
Step 1: Create your development environment
If you prefer to code in your own environment or workspace, you may do that; you will need to package the files according to the documentation here when complete. If you have chosen to develop outside of the platform, make sure you understand your requirements in Step 2, then proceed to Step ##. Otherwise, open the Custom Algorithms section of Coleman, add a new custom algorithm from the home screen, and select “Open Notebook.” Give the algorithm a name and save to enable the notebook.
Step 2: Understand your requirements
In creating a custom algorithm, you will need to create two essential scripts and a set of hyperparameters. These essential items are treated differently based on your choice in Step 1. The scripts are:
- train: The program that is invoked to train the model.
- predictor.py: The program that implements the Flask web server that is called to get predictions from the trained model.
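As an illustration, a minimal predictor.py might look like the sketch below. The /ping and /invocations routes, the JSON request format, and the model.pkl location are assumptions based on common model-hosting conventions, not the Coleman specification; check the platform documentation for the exact contract.

```python
# Hypothetical minimal predictor.py sketch -- the route names, request
# format, and model location are illustrative assumptions, not the
# platform's documented contract.
import io
import os
import pickle

import pandas as pd
from flask import Flask, Response, request

app = Flask(__name__)

MODEL_PATH = "model.pkl"  # assumed location of the trained artifact


def load_model():
    # Load the pickled model produced by the train script, if present.
    if os.path.exists(MODEL_PATH):
        with open(MODEL_PATH, "rb") as f:
            return pickle.load(f)
    return None


model = load_model()


@app.route("/ping", methods=["GET"])
def ping():
    # Health check: report healthy only if the model loaded successfully.
    return Response(status=200 if model is not None else 404)


@app.route("/invocations", methods=["POST"])
def invocations():
    # Parse the JSON request body into a DataFrame and return predictions.
    if model is None:
        return Response("no model loaded", status=404)
    payload = pd.read_json(io.StringIO(request.data.decode("utf-8")))
    preds = model.predict(payload)
    return Response(",".join(str(p) for p in preds), mimetype="text/csv")
```

In a serving container this app would typically be launched by a WSGI server such as gunicorn rather than Flask’s development server.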
Step 3: Understand the interface
There are several areas of this interface in which your development will take place.
The Playbook tab is an optional coding space where a Jupyter Notebook interface lets developers code and experiment in a sandbox-style workspace. Creating a new notebook gives you the familiar modular interface of a Jupyter notebook. You can also list the available Python packages and their versions with a “!pip list” command.
Train and Predictor Tabs
The Train and Predictor tabs are spaces ready to receive the final train and predictor.py scripts for deployment. These spaces have no testing capabilities, so it is best practice to develop and test your scripts in the Playbook tab and copy them to the Train and Predictor tabs when complete.
The Hyperparameters tab lets you add hyperparameters and their properties, such as default values and types, directly to the grid. You may optionally import a CSV with this information, which will then be displayed in the table and can be edited there.
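For illustration, the snippet below builds a small hyperparameters CSV with pandas. The column names and example rows are assumptions for demonstration only; match the format shown in the platform’s sample files.

```python
# Build an illustrative hyperparameters CSV -- column names and rows are
# assumptions for demonstration; follow the platform's sample file format.
import pandas as pd

hyperparams = pd.DataFrame(
    [
        {"name": "learning_rate", "type": "float", "default": 0.1},
        {"name": "n_estimators", "type": "int", "default": 100},
    ]
)
hyperparams.to_csv("hyperparameters.csv", index=False)
print(hyperparams.to_string(index=False))
```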
The Datasets section expands from the left bar, allowing you to bring datasets already staged in Coleman into the Jupyter notebook environment.
More instructions and sample files are available for easy access.
Step 4: Prepare your data
If you haven’t already, stage your data in Coleman.
::: Link data import video :::
Expand the Datasets panel on the left side of the JupyterHub interface. Type the name of your staged dataset and find it in the list. Select Load, and it will appear in the list of loaded datasets.
Once you have loaded a dataset, a new folder named “datasets” will appear in the JupyterHub directory.
Use the folder structure to locate your dataset and any subfolders it lives in, then write an import command to bring it into the kernel for use, such as read_csv in pandas:
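For instance, assuming your file landed under the datasets folder, a read might look like the following; the subfolder and file name are placeholders for your own (the snippet writes a tiny stand-in file first so it runs on its own).

```python
import os

import pandas as pd

# The path below is illustrative -- substitute the subfolder and file
# name you see in your own datasets directory.
DATASET_PATH = "datasets/my_dataset/my_data.csv"

# For demonstration only: create a small stand-in file so the snippet runs.
os.makedirs(os.path.dirname(DATASET_PATH), exist_ok=True)
pd.DataFrame({"feature": [1, 2, 3], "target": [0, 1, 0]}).to_csv(
    DATASET_PATH, index=False
)

# Bring the dataset into the kernel as a DataFrame.
df = pd.read_csv(DATASET_PATH)
print(df.shape)
```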
Step 5: Write your code!
Unleash your creative juices. Code, test, recode, and compare models. Shape your code into the train and predictor scripts and copy the appropriate code to their respective tabs.
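As a rough sketch, a train script might read the staged data and a hyperparameter, fit a model, and pickle the result for predictor.py to serve. The data path, target-column convention, and scikit-learn estimator here are assumptions for illustration; adapt them to your own algorithm.

```python
# Hypothetical train script sketch -- the data path, target-column
# convention, and estimator choice are illustrative assumptions.
import os
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed locations -- adjust to your environment.
DATA_PATH = "datasets/my_dataset/my_data.csv"
MODEL_PATH = "model.pkl"


def train(n_estimators: int = 100) -> None:
    # Load the training data; the last column is assumed to be the target.
    data = pd.read_csv(DATA_PATH)
    X, y = data.iloc[:, :-1], data.iloc[:, -1]

    model = RandomForestClassifier(n_estimators=n_estimators)
    model.fit(X, y)

    # Persist the trained model so the predictor script can load it.
    with open(MODEL_PATH, "wb") as f:
        pickle.dump(model, f)


if __name__ == "__main__":
    # Guarded so the sketch is safe to run anywhere; in the container the
    # script would simply call train() with the supplied hyperparameters.
    if os.path.exists(DATA_PATH):
        train()
```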
Step 6: Import hyperparameters
Add your hyperparameters to the grid in the Hyperparameters tab and set their attributes (default value and type).