Wednesday, February 20, 2019

Using TensorFlow with Keras - Introduction

Tensorflow + Keras


Yeah, I have heard of 'em... Yay!! I'm a techie!!


Well, good for you.. we are not going to talk in detail about what TensorFlow and Keras are...
Although I'm sure many of us at least want a crash course! So, as a super quick intro..

TensorFlow: An open source machine learning framework backed by Google (kind of an SDK for machine learning). They have done the mathematical implementations so you don't have to reinvent the wheel.

Keras: Also an open source library (kind of an SDK), BUT it is an interface. What it does is further simplify frameworks like TensorFlow so that even people like me can code for machine learning!

For the sake of simplicity, I will stick to imitating the linear regression that we used in our previous example.. with this example you will see that using Keras with TensorFlow simplifies our lives so much more!!

Let's take a sample dataset.

Nice, huh!!! The dataset above contains a randomized sample of values in X and Y columns. The goal of our app is to build the simplest neural network that can learn from these values and help us predict Y for a given input X.

Now, since we are going to imitate a linear regression, the outputs will obviously always fall on a straight line.

Alright, let's begin! I will try to keep the code in separate code sections so it's easy for us to go through.

Let's prepare our dataset (i.e. pre-processing of the data)

#We will use Pandas to read the csv file
import pandas as pd

file1 = "../data/input_rand1.csv"

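# In case you have more than 1 csv, you may want to use this piece of code to
# combine them (ignore_index gives every row a unique index, which the
# train/test split further down relies on)
all_files = [file1]
dataset = pd.concat((pd.read_csv(f, delimiter=',') for f in all_files), ignore_index=True)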

# We don't need empty values
dataset = dataset.dropna(how="any", axis=0)

#replace spaces with _ for headers
dataset.rename(columns=lambda x: x.replace(' ', '_'), inplace=True)

Let's divide the data into a training set and a testing set

train = dataset.sample(frac=0.8,random_state=200)
test = dataset.drop(train.index)
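
To see what those two lines are doing without any pandas magic, here is a rough pure-Python sketch of the same idea: pick 80% of the row indices at random (with a fixed seed so the split is repeatable), and treat whatever is left as the test set. The names `rows`, `train_idx` and `test_idx` are made up for illustration, and the exact rows picked will differ from what `dataset.sample` picks.

```python
import random

# Hypothetical stand-in for the dataset: 10 rows of (X, Y) values
rows = [(x, 2 * x + 1) for x in range(10)]

# Fixed seed so the same rows are picked on every run (like random_state=200)
rng = random.Random(200)

# Pick 80% of the row indices for training...
train_idx = set(rng.sample(range(len(rows)), k=int(0.8 * len(rows))))
# ...and everything that was not picked becomes the test set
test_idx = set(range(len(rows))) - train_idx

train_rows = [rows[i] for i in sorted(train_idx)]
test_rows = [rows[i] for i in sorted(test_idx)]

print(len(train_rows), len(test_rows))
```

The important bit is that no row ends up in both sets, so the test set really is data the model has never seen.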

Once the dataset has been divided, let's identify our features and labels (in this case, X -> features and Y -> label)

X_train = train.drop("Y", axis=1)
Y_train = train["Y"]
# Also for Test set
X_test = test.drop("Y", axis=1)
Y_test = test["Y"]

So now our data is ready, let's get the big shots in the game

from tensorflow import keras
# We would like to see how the training went on TensorBoard too!
from tensorflow.keras.callbacks import TensorBoard
import time

Initialize TensorBoard and give it a location where it can store its files

NAME = "Linear_{}".format(int(time.time()))
tb = TensorBoard(log_dir='logs/{}'.format(NAME))

Okay.. now comes the most complicated part! We need to build the model...
If you recall, in our previous code we had to create an input_fn and all the other fancy stuff so that we could convert our datasets into tensors and then pass them to the estimator.

Well, in the case of Keras with TensorFlow, you don't need that..

So building the model is simply..

model = keras.Sequential([
    keras.layers.Dense(10, input_shape=[len(list(X_test))]),
    keras.layers.Dense(1)
])

The above code basically means that you are creating a model with 2 layers.. the 1st layer has 10 nodes and an input shape equal to the number of feature columns.

And since we want only 1 output for every row of input that we give, the output layer has 1 node.
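
One thing worth spelling out: neither Dense layer above is given an activation function, so each node only multiplies and adds. Stack two such layers and you still get a plain straight line, which is why this little network behaves like a linear regression. Below is a small sketch of that idea in plain Python, assuming a single input column (X); the weights are made-up numbers, not values from the trained model.

```python
# Made-up weights for a 1 -> 10 -> 1 network with no activation functions
w1 = [0.1 * (i + 1) for i in range(10)]   # hidden layer weights (10 nodes)
b1 = [0.5] * 10                            # hidden layer biases
w2 = [0.2] * 10                            # output layer weights
b2 = 1.0                                   # output layer bias


def network(x):
    """Run x through both layers, the way Dense(10) -> Dense(1) would."""
    hidden = [w * x + b for w, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# The same two layers folded into one straight line: y = slope * x + intercept
slope = sum(a * b for a, b in zip(w2, w1))
intercept = sum(a * b for a, b in zip(w2, b1)) + b2

for x in [0.0, 1.0, 2.5]:
    print(x, network(x), slope * x + intercept)

# Trainable numbers in this model: (1*10 + 10) + (10*1 + 1)
param_count = (1 * 10 + 10) + (10 * 1 + 1)
print(param_count)
```

Both columns printed by the loop match (up to floating point noise), so no matter how the weights get trained, the prediction stays a straight line in X.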

model.compile(loss='mse',
              optimizer='adam',
              metrics=['mae', 'mse'])

For this example we are going to use the Adam optimizer; in case you want more details on Mr. Adam, go here
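
In case 'mse' and 'mae' look cryptic: the compile call above uses mean squared error as the loss and also reports mean absolute error. Here is a tiny hand-rolled sketch of both, with made-up numbers, just to show what is being measured. Keras computes these for you; the helper names below are mine.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences (punishes big misses harder)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences (same units as Y)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up labels and predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]

print(mean_squared_error(y_true, y_pred))   # (1 + 0 + 1 + 9) / 4 = 2.75
print(mean_absolute_error(y_true, y_pred))  # (1 + 0 + 1 + 3) / 4 = 1.25
```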

Now let the training begin..!!!!
As mentioned, there is no need to change the pandas dataset into anything special; you can pass it as is to the Keras system.

model.fit(X_train, Y_train, epochs=1000, validation_split = 0.2, callbacks=[tb])

Now the only thing remaining is to test it out for predictions.
For this example I am just going to run all the inputs (i.e. X) through the predictor again and see what the result will be (even though I know it should be a straight line somewhere in the middle of the graph.. still, it's fun!)

input_dict = train
input_dict_x = input_dict.drop("Y", axis=1)
input_dict_y = input_dict["Y"]
predict_results = model.predict(input_dict_x)

Let's check our updated graph!!

Behold The Graph!


As you can see, the graph has a straight line right through the middle, which we expected anyway..

In case you just want to download the code and run it.. feel free to swing by my GitHub: https://github.com/abhinavasr/machinelearning  I'll try to share most of my learnings there!

Till next time!!


Saturday, February 16, 2019

Learning Machine Learning







Learning The Machine Learning


Hello good people!!

It has been awfully long since we had our last chat!


Sorry, had been busy :(


Anyways, moving on.. I thought of sharing some of the learnings I have picked up over the past few months of working on various platforms and tools.

This is the 1st of hopefully many blogs that will be published on machine learning.


LEARNING MATHSS!!!!!

Naahh.. not going to bore you guys with loadz of maths and stuff; there are far better websites and blogs that talk about the theory of how and why machine learning works. While we work through the code we will talk about some of the concepts behind these elements, which will hopefully give you good insight into how things work!

And please note!!! I myself am a student of this discipline so.. in case your system blows up.. I am not responsible!

So where do we start?

Well, I thought of going directly to Keras for deep learning, but we cannot make buildings without foundations!
So we will start with some very, very basic stuff and then build toward more complex stuff!

In order to start doing anything with machine learning, we need to know what exactly the machine needs to learn.

Do we need our machines to predict things based on history, for example what the weather will be like tomorrow (I wish it was so easy)? Or do we want our machines to categorize data into already known blocks, for example which color is closest to a given set of RGB values (assuming you have only trained the system with the main color codes)?

Apparently these 2 methods have very complex names for themselves

Prediction and Classification


I know most of you want to see into the future, so we will start with just that!

We are going to build a simple application which will read past currency values and try to predict what's gonna happen on a given day in the future! For the entire exercise we will be using Python 3 as our coding tool and TensorFlow for machine learning operations!

For this first example we are going to use linear regression (a fancy way of saying it's a method that finds the straight line closest to most of the data points in our historical data).
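
To make "the straight line closest to the data points" a bit more concrete, here is a small sketch of the classic least-squares formulas in plain Python. The numbers are made up, and this is the textbook closed-form solution, not what TensorFlow does internally (it gets there by iterating).

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that roughly follow y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Once you have the slope and intercept, predicting is just `slope * x + intercept` for whatever x you care about.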



[Figure: Linear regression.svg]


Okay, so let's get the code out!




You can download the historical data for currencies here; we will be using Pandas to import our csv, which will be stored in ./files/data.csv


import pandas as pd

file = "./files/data.csv"
dataset = pd.read_csv(file, delimiter=',')
print(dataset.head())

Cleaning up the data for our use will require dropping the columns we don't need (e.g. SEC Filing), converting dates to a date format and then to int for our calculations, and dropping any NaN values



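# Drop rows with NaN values first, so the date conversion below
# never runs into a missing date
dataset = dataset.dropna(how="any", axis=0)

# If Date was stored as a string, parse it and turn it into a YYYYMMDD integer
dataset['Date'] = pd.to_datetime(dataset['Date'])
dataset['Date'] = dataset.Date.apply(lambda x: x.strftime('%Y%m%d')).astype(int)
print(dataset.head())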
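
For a single date, the conversion above boils down to the following (standard library only, with a made-up date). The sketch also shows one thing to keep in mind, and this is my own observation rather than something the code above handles: YYYYMMDD integers are not evenly spaced, so consecutive days across a month boundary look far apart to the model. `date.toordinal()` is one possible alternative if that matters to you.

```python
from datetime import date


def date_to_int(d):
    """Turn a date into a YYYYMMDD integer, e.g. 2019-02-16 -> 20190216."""
    return int(d.strftime('%Y%m%d'))


print(date_to_int(date(2019, 2, 16)))  # 20190216

# Two consecutive days across a month boundary
jan31 = date(2019, 1, 31)
feb01 = date(2019, 2, 1)
print(date_to_int(feb01) - date_to_int(jan31))  # 70, not 1
print(feb01.toordinal() - jan31.toordinal())    # 1
```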

Divide your data into train and test sets, and also identify X (the features) and Y (the label). In this example the label is the Value column.

train = dataset.sample(frac=0.8,random_state=200)
test = dataset.drop(train.index)
X_train = train.drop("Value", axis=1)
Y_train = train["Value"]
X_test = test.drop("Value", axis=1)
Y_test = test["Value"]


The dataset will look something like this (you can plot using matplotlib)

Import TensorFlow and convert the dataset into tensors


import tensorflow as tf



# Convert the pandas dataset into a TensorFlow input function;
# this is passed directly to the tf.estimator as input_fn
# (tf.estimator.inputs is the TensorFlow 1.x API)

num_epochs = 1000
estimator_input_fn = tf.estimator.inputs.pandas_input_fn(
      x=X_train,
      y=Y_train,
      batch_size=100,
      num_epochs=num_epochs,
      shuffle=True,
      num_threads=5)

Now we have all the data we need, so let's start building the model. For this example we will use a TensorFlow estimator and build a simple linear model. A common question: why do we need to identify the features when we already provide the dataset itself? The feature columns tell the estimator how to read each column, e.g. Date as a number and Country as a category. You may want to see the diagram below to understand how TensorFlow relates the data.


# Build feature columns and set up the regressor
dates = tf.feature_column.numeric_column(key="Date")
countries = tf.feature_column.categorical_column_with_vocabulary_list(
    key='Country', vocabulary_list=X_train.Country.unique(), default_value=0)
values = tf.feature_column.numeric_column(key="Value")

feature_columns = [dates,countries]
model = tf.estimator.LinearRegressor(feature_columns=feature_columns)
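
If the categorical column feels like a black box, here is a rough plain-Python picture of the idea as I understand it: every country name in the vocabulary gets an integer ID, and anything not in the list falls back to the default ID (0 in the code above). The country names here are made up, and this is only a sketch of the concept, not TensorFlow's actual implementation.

```python
# Hypothetical vocabulary, standing in for X_train.Country.unique()
vocabulary = ["India", "Japan", "Brazil"]
DEFAULT_ID = 0  # same role as default_value=0 above

# Map each known country to its position in the vocabulary list
country_to_id = {name: idx for idx, name in enumerate(vocabulary)}


def lookup(country):
    """Return the ID for a country, or the default for unknown names."""
    return country_to_id.get(country, DEFAULT_ID)


def one_hot(country):
    """Spread the ID into a 0/1 vector, one slot per vocabulary entry."""
    vec = [0] * len(vocabulary)
    vec[lookup(country)] = 1
    return vec


print(lookup("Japan"), one_hot("Japan"))      # 1 [0, 1, 0]
print(lookup("Narnia"), one_hot("Narnia"))    # 0 [1, 0, 0]
```

Notice that with a default of 0, an unknown country ends up looking exactly like the first entry in the vocabulary, which may or may not be what you want.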

Now that we have our model ready, let's train this dude!


STEPS = 1000

model.train(input_fn=estimator_input_fn, steps=STEPS)



Let's also build the input function for testing the model

# One ordered pass over the test set is enough for evaluation
estimator_input_test_fn = tf.estimator.inputs.pandas_input_fn(
      x=X_test,
      y=Y_test,
      batch_size=100,
      num_epochs=1,
      shuffle=False,
      num_threads=1)

Evaluate how the model performs on data it has not yet seen.

eval_result = model.evaluate(input_fn=estimator_input_test_fn)
print("Evaluation result: ", eval_result)

Now let's make some price predictions and see how our model is doing

input_dict = {
      "Date": [20190216],
      "Country": ["India"]
  }
# For the sake of simplicity, we will stick to pandas DataFrames;
# numpy is an alternative, but I don't wanna confuse you with too much stuff
df = pd.DataFrame(data=input_dict)
predict_input_fn = tf.estimator.inputs.pandas_input_fn(df, shuffle=False)


Now we are ready with the data, so let's predict

predict_results = model.predict(input_fn=predict_input_fn)
print("\nPrediction results:")
for i, prediction in enumerate(predict_results):
    print(prediction)


There! All done!!


Now don't panic if your predictions suck!

Not your fault! We will need to work on lots of factors that influence the predictions. But hey!! you did loadz today... 

Hope this was useful!

Until next time..