In artificial neural networks, the activation function of a node defines the output of that node, or “neuron”, given a
set of inputs. This output is then used as input for the next node. A neural network without activation functions is
essentially just a linear regression model. The activation function applies a non-linear transformation to the input,
making the network capable of learning and performing more complex tasks.

```
Y = activation_function(sum(weights * inputs) + bias)
```
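As a concrete illustration, here is a minimal NumPy sketch of that formula for a single neuron, using sigmoid as the activation; the input, weight, and bias values are arbitrary placeholders.

```
import numpy as np

# Hypothetical example values for a single neuron with three inputs.
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.4, 0.7, -0.2])
bias = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weighted sum of the inputs plus the bias, passed through the activation.
y = sigmoid(np.dot(weights, inputs) + bias)
print(y)  # a single value squashed into (0, 1)
```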

There are many types of activation functions, and the appropriate one depends on the application. Common choices include the following, each of which is sketched in code after the list.

- **Binary Step Function**
- **Linear Function**
- **Sigmoid Function**
- **Hyperbolic Tangent (Tanh) Function**
- **Rectified Linear Unit (ReLU) Function**
- **Leaky Rectified Linear Unit (Leaky ReLU) Function**
- **Softmax Function**
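The sketch below gives one straightforward NumPy implementation of each of these; the slope `a` for the linear function and the leak factor `alpha` for Leaky ReLU are illustrative defaults, not prescribed values.

```
import numpy as np

def binary_step(z):
    # Outputs 1 if the input crosses the threshold 0, else 0.
    return np.where(z >= 0, 1, 0)

def linear(z, a=1.0):
    # Identity-style activation: output is proportional to the input.
    return a * z

def sigmoid(z):
    # Squashes the input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes the input into the range (-1, 1), zero-centred.
    return np.tanh(z)

def relu(z):
    # Passes positive inputs through, zeroes out negative ones.
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but keeps a small slope (alpha) for negative inputs,
    # which helps avoid "dead" neurons.
    return np.where(z >= 0, z, alpha * z)

def softmax(z):
    # Converts a vector of scores into probabilities that sum to 1.
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()
```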

Depending upon the properties of the problem, we can make a better choice of activation function for easier and quicker
convergence of the network.

- The sigmoid function and its combinations generally work better in the case of classifiers.
- Sigmoid and tanh functions are sometimes avoided due to the vanishing gradient problem.
- The ReLU function is a general-purpose activation function and is used in most cases these days.
- If we encounter a case of dead neurons in our network, the leaky ReLU function is the best choice.
- Always keep in mind that the ReLU function should only be used in hidden layers.
- As a rule of thumb, we can begin with the ReLU function and move over to other activation functions in cases where
ReLU doesn’t provide optimal results (see the sketch after this list).
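Putting those guidelines together, here is a minimal sketch of a two-layer forward pass that follows the rule of thumb: ReLU in the hidden layer and softmax at the output. The layer sizes and random weights are hypothetical placeholders for illustration only.

```
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, 8 hidden units, 3 output classes.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def forward(x):
    # ReLU only in the hidden layer; softmax at the output layer.
    h = relu(W1 @ x + b1)
    return softmax(W2 @ h + b2)

probs = forward(rng.normal(size=4))
print(probs, probs.sum())  # class probabilities that sum to 1
```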
