Saturday, February 16, 2019

Learning Machine Learning







Learning The Machine Learning


Hello good people!!

It has been awfully long since we had our last chat!


Sorry, I have been busy :(


Anyways, moving on... I thought of sharing some of what I have learned over the past few months of working on various platforms and tools.

This is the 1st of hopefully many blog posts that will be published on machine learning.


LEARNING MATHSS!!!!!

Naahh.. not going to bore you guys with loadz of maths and stuff; there are far better websites and blogs which talk about the theory of how and why machine learning works. As we work through the code, we will talk about some of the concepts behind these elements, which will hopefully give you a good insight into how things work!

And please note!!! I am a student of this discipline myself, so.. in case your system blows up.. I am not responsible!

So where do we start?

Well, I thought of going directly to Keras for deep learning, but we cannot make buildings without foundations!
So we will start with some very, very basic stuff and then build toward more complex stuff!

In order to start doing anything with machine learning, we need to know what exactly the machine needs to learn.

Do we need our machines to start predicting things based on history, for example what the weather will be like tomorrow (I wish it was so easy)? Or do we want our machines to sort data into already known categories, for example finding the closest color for a set of RGB values (assuming you have only trained the system with the main color codes)?

Apparently these 2 methods have very complex names for themselves:

Prediction and Classification


I know most of you want to see into the future, so we will start with just that!

We are going to build a simple application which will read past currency values and try to predict what's gonna happen on a given day in the future! For the entire exercise we will be using Python 3 as our coding tool and TensorFlow for the machine learning operations!

For this first example we are going to use Linear Regression (a fancy way of saying it's a method that tries to find the straight line which is closest to most of the data points in our historical data).
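To make the "closest straight line" idea concrete, here is a minimal sketch of a least-squares fit done by hand for a single feature. The helper name fit_line and the numbers are made up for illustration; this is not the code used later in the post, just the same idea in plain Python.

```python
# Ordinary least squares for a single feature, written out by hand.
# The numbers below are hypothetical; they are not real currency data.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y move together, and how much x varies on its own
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


days = [1, 2, 3, 4, 5]
values = [70.0, 70.5, 71.0, 71.5, 72.0]  # hypothetical values

slope, intercept = fit_line(days, values)
prediction_day_6 = slope * 6 + intercept  # extend the line one day forward
print(slope, intercept, prediction_day_6)
```

Once we have the slope and the intercept, "predicting" is nothing more than reading a point off the line.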



[Figure: Linear regression.svg]


Okay, so let's get the code out of this!




You can download the historical data for currencies here. We will be using pandas to import our CSV, which will be stored in ./files/data.csv


import pandas as pd

file = "./files/data.csv"
dataset = pd.read_csv(file, delimiter=',')
print(dataset.head())

Cleaning up the data for our usage will require dropping the non-required columns (e.g. SEC Filing), converting dates to a date format and then to int for our calculations, and dropping any NaN values.



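# If Date is stored as a string, parse it into a datetime first
dataset['Date'] = pd.to_datetime(dataset['Date'])
# Then turn each date into an integer such as 20190216
dataset['Date'] = dataset.Date.apply(lambda x: x.strftime('%Y%m%d')).astype(int)
# Drop every row that has a NaN in any column
dataset = dataset.dropna(how="any", axis=0)
print(dataset.head())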
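One thing worth knowing about the YYYYMMDD integers: they are not evenly spaced, because the number jumps at every month boundary. The small sketch below shows the jump and, purely as an assumption on my side (it is not used in the rest of this post), compares it with Python's date.toordinal(), which counts days one by one. The helper as_yyyymmdd is a made-up name.

```python
from datetime import date


def as_yyyymmdd(d):
    """Same encoding as in the cleanup step: 2019-02-16 becomes 20190216."""
    return int(d.strftime("%Y%m%d"))


jan_31 = date(2019, 1, 31)
feb_01 = date(2019, 2, 1)

# Two consecutive days, measured in both encodings
gap_yyyymmdd = as_yyyymmdd(feb_01) - as_yyyymmdd(jan_31)
gap_ordinal = feb_01.toordinal() - jan_31.toordinal()
print(gap_yyyymmdd, gap_ordinal)
```

A straight line treats the first gap as a much bigger step in time than the second, even though both describe a single day. That may be one of the reasons predictions drift, so it is something to experiment with.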

Divide your data into train and test sets, and identify the X (features) and Y (labels). In this example the label we try to predict is the "Value" column, and the remaining columns are the features.

train = dataset.sample(frac=0.8,random_state=200)
test = dataset.drop(train.index)
X_train = train.drop("Value", axis=1)
Y_train = train["Value"]
X_test = test.drop("Value", axis=1)
Y_test = test["Value"]
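If the sample-then-drop trick above looks like magic, here is a rough sketch of the same idea in plain Python: pick 80% of the row numbers at random with a fixed seed, and whatever is left becomes the test set. The ten row numbers are a stand-in for the real dataset index, and the random draw will not match what pandas picks for the same seed.

```python
import random

rows = list(range(10))  # stand-in for the row index of the dataset

rng = random.Random(200)  # fixed seed, so the split is the same on every run
train_idx = sorted(rng.sample(rows, k=int(len(rows) * 0.8)))
# Everything that was not sampled for training goes to the test set
test_idx = [i for i in rows if i not in train_idx]

print(train_idx, test_idx)
```

The important property is that no row ends up in both sets, so the test data really is data the model has not seen.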


The dataset will look something like this (you can plot using matplotlib)
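A minimal plotting sketch, assuming matplotlib is installed. The two lists are hypothetical stand-ins for dataset["Date"] and dataset["Value"], and the output file name dataset.png is also just an example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line to get a window
import matplotlib.pyplot as plt

# Hypothetical stand-ins for dataset["Date"] and dataset["Value"]
dates = [20190101, 20190102, 20190103, 20190104]
values = [69.8, 70.1, 69.9, 70.4]

fig, ax = plt.subplots()
ax.scatter(dates, values)          # one dot per historical data point
ax.set_xlabel("Date (YYYYMMDD)")
ax.set_ylabel("Value")
ax.set_title("Historical values")
fig.savefig("dataset.png")         # write the chart to an image file
```

With the real data you would pass the DataFrame columns to ax.scatter instead of the lists.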

Import TensorFlow and convert the dataset into tensors


import tensorflow as tf



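# Convert the pandas dataset into a TensorFlow input function;
# this is passed directly to the tf.estimator as input_fn.
# (This is the TensorFlow 1.x API; in TensorFlow 2.x it lives under tf.compat.v1.estimator.inputs)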

num_epochs = 1000
estimator_input_fn = tf.estimator.inputs.pandas_input_fn(
      x=X_train,
      y=Y_train,
      batch_size=100,
      num_epochs=num_epochs,
      shuffle=True,
      num_threads=5)

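Now we have all the data we need, so let's start building the model. For this example we will use the tf.estimator API and build a simple linear regressor. A common question: why do we need to identify the features when we already provide the dataset itself? The feature columns tell the estimator which columns to use and how to read each one (a plain number, or a category from a fixed vocabulary). You may want to see the diagram below to understand how TensorFlow relates the data.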


# Build features and set up the regressor
dates = tf.feature_column.numeric_column(key="Date")
countries = tf.feature_column.categorical_column_with_vocabulary_list(
    key='Country', vocabulary_list=X_train.Country.unique(), default_value=0)
# "Value" is the label we predict, so it is not listed as a feature column

feature_columns = [dates, countries]
model = tf.estimator.LinearRegressor(feature_columns=feature_columns)
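To get a feel for what the categorical column above does with the country names, here is a conceptual sketch in plain Python. It is not TensorFlow's implementation; the vocabulary and the helper names to_id and to_one_hot are made up. The idea is that each name is mapped to its position in the vocabulary, and the linear model then learns one weight per position.

```python
vocabulary = ["India", "Japan", "Brazil"]  # hypothetical country names


def to_id(country, default_value=0):
    """Map a country name to its position in the vocabulary.

    Names that are not in the vocabulary fall back to default_value.
    """
    return vocabulary.index(country) if country in vocabulary else default_value


def to_one_hot(country):
    """Turn the id into a vector with a single 1, one slot per country."""
    vector = [0] * len(vocabulary)
    vector[to_id(country)] = 1
    return vector


print(to_id("Japan"), to_one_hot("Brazil"), to_id("Mars"))
```

Note the side effect of default_value=0: a country the model has never seen is treated exactly like the first country in the vocabulary.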

Now that our model is ready, let's train this dude!


STEPS = 1000

model.train(input_fn=estimator_input_fn, steps=STEPS)
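A quick note on how steps, batch_size and num_epochs play together. As far as I understand, training stops at whichever limit is hit first: the requested number of steps, or the input function running out of data after num_epochs passes. The sketch below is only back-of-the-envelope arithmetic; the function name and the row counts are hypothetical, and the count is rough because the last batch can be smaller than batch_size.

```python
def examples_seen(num_rows, batch_size, num_epochs, steps):
    """Rough count of examples consumed before training stops."""
    available = num_rows * num_epochs  # what the input function can supply
    requested = batch_size * steps     # what train(steps=...) asks for
    return min(available, requested)


# Settings from this post, with a hypothetical 5000-row training set
large = examples_seen(num_rows=5000, batch_size=100, num_epochs=1000, steps=1000)
# A tiny dataset with a single epoch runs out of data long before 1000 steps
small = examples_seen(num_rows=50, batch_size=100, num_epochs=1, steps=1000)
print(large, small)
```

With the values used here, the step count is the limit that matters for any training set of 100 rows or more.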



Let's also build the input function for testing the model

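estimator_input_test_fn = tf.estimator.inputs.pandas_input_fn(
      x=X_test,
      y=Y_test,
      batch_size=100,
      num_epochs=1,     # one pass over the test data is enough for evaluation
      shuffle=False,    # order does not matter when we are only scoring
      num_threads=1)    # a single reader thread keeps the order repeatable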

Evaluate how the model performs on data it has not yet seen.

eval_result = model.evaluate(input_fn=estimator_input_test_fn)
print("Evaluation result: ", eval_result)
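The printed dictionary holds a few loss numbers. As far as I understand, the "average_loss" entry for a linear regressor is the mean squared error over the test examples, but treat that as my reading and check it against your TensorFlow version. Here is what that number means, computed by hand on made-up labels and predictions:

```python
def mean_squared_error(labels, predictions):
    """Average of the squared differences between prediction and label."""
    errors = [(p - y) ** 2 for y, p in zip(labels, predictions)]
    return sum(errors) / len(errors)


labels = [70.0, 71.0, 72.0]        # hypothetical true values
predictions = [70.5, 71.0, 71.0]   # hypothetical model outputs

mse = mean_squared_error(labels, predictions)
print(mse)
```

The smaller this number, the closer the line sits to the test points. Because the errors are squared, one badly wrong prediction hurts much more than several slightly wrong ones.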

Now let's make some price predictions and see how our model is working

input_dict = {
    "Date": [20190216],
    "Country": ["India"]
}
# For the sake of simplicity, we will stick to pandas DataFrames.
# numpy is an alternative, but I don't want to confuse you with too much stuff.
df = pd.DataFrame(data=input_dict)
predict_input_fn = tf.estimator.inputs.pandas_input_fn(df, shuffle=False)


Now our data is ready, so let's predict

predict_results = model.predict(input_fn=predict_input_fn)
print("\nPrediction results:")
for i, prediction in enumerate(predict_results):
    print(prediction)


There! All done!!


Now don't panic if your predictions suck!

Not your fault! We will need to work on lots of factors that influence the predictions. But hey!! You did loadz today...

Hope this was useful!

Until next time..