# Deep Learning book notes

Mostly picking up things as and when I needed them, but I have always felt the need to fix my foundations, so here's an attempt at that. As it turns out, transposing the result of a matrix multiplication is the same as multiplying the transposes of the original matrices in reverse order: (AB)ᵀ = BᵀAᵀ. If validation error starts to rise during training, a better idea is to return to the point where the validation error was lowest. Regularization can solve underdetermined problems. Note that with a penalty on the overall norm alone, we might have large weights being compensated for by extremely small weights to keep the overall norm small. Indeed, you don't need the actual count of people in either case; the proportion of the world population will suffice.
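A quick numpy check of the transpose identity (AB)ᵀ = BᵀAᵀ — the specific matrices below are arbitrary examples, chosen only so the shapes line up:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])   # 3x2
B = np.array([[7, 8, 9], [10, 11, 12]])  # 2x3

# Transposing the product equals multiplying the transposes in reverse order.
lhs = (A @ B).T
rhs = B.T @ A.T
print(np.array_equal(lhs, rhs))  # True
```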

Suppose you know the probability that someone is German given that they speak German, and also the probability that someone anywhere in the world speaks German, and you want to know the probability that someone is both German and speaks German. It seems like you need to simply multiply the conditional and the marginal probability: P(German, speaks German) = P(German | speaks German) · P(speaks German). Just like with PMFs, a PDF can relate two different random variables.
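As a sketch with made-up numbers (both probabilities below are assumptions for illustration, not real statistics):

```python
# P(German and speaks German) = P(German | speaks German) * P(speaks German)
p_german_given_speaks = 0.75   # assumed value, for illustration only
p_speaks_german = 0.017        # assumed fraction of the world population
p_joint = p_german_given_speaks * p_speaks_german
print(p_joint)
```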

These notes aim to provide intuitions, drawings, and Python code for the mathematical theories, and are constructed from my understanding of these concepts. Next we will see why. This means that the length of every row of …. In fact, you could even say that the vector dot product is the same as the multiplication of a matrix with only one row by a matrix with only one column, and indeed this is why the notation used by the book for the vector dot product is not a⋅b but aᵀb. Eigen-stuff is accessible in numpy through …. Most (but not all) square matrices have a special matrix called an "inverse", represented with a −1 superscript, like so: A⁻¹.
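A minimal numpy sketch of the tools mentioned above — the dot product as a 1×n by n×1 matrix product, eigendecomposition, and the inverse. The specific arrays are just examples:

```python
import numpy as np

# The dot product a.b is the same as a 1x3 matrix times a 3x1 matrix (a^T b).
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
as_matrices = (a.reshape(1, 3) @ b.reshape(3, 1))[0, 0]
print(np.dot(a, b) == as_matrices)  # True

# Eigenvalues/eigenvectors and the inverse of a (simple diagonal) matrix.
A = np.array([[2.0, 0.0], [0.0, 3.0]])
values, vectors = np.linalg.eig(A)   # eigen-stuff
A_inv = np.linalg.inv(A)             # the inverse A^{-1}
print(A_inv @ A)                     # approximately the identity matrix
```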

For a classification task, we want the model to be invariant to certain types of transformations, and we can generate the corresponding training examples by applying those transformations to the data we already have. Thus, the hyperparameters can be such that the constraint is on the number of non-zero entries. A final suggestion made by Hinton was to restrict the individual column norms of the weight matrix rather than the Frobenius norm of the entire weight matrix, so as to prevent any hidden unit from having a large weight. The prediction then becomes ∑ p(μ)p(y|x, μ).

It is a cool example of a derivation, but I won't go through it here since it doesn't really introduce new material; it just shows how to use what we saw above. Secondly, since we've already talked about transposes, it is interesting to ask what the transpose of a multiplication is.

From this we can infer that there's a 72% chance that something that flies is delicious, which we couldn't have done using logic. As usual, this post is based on a Jupyter notebook that can be found …. Indeed, I think it doesn't, but it is still important for our model to return a …. To give a specific example, suppose you want to know whether someone comes from Germany, and you know that they speak fluent German.

This leads to representational sparsity, where many of the activation values of the units are zero. As it turns out, matrices can only ever span linear spaces such as points, lines, planes, and hyperplanes (a plane in more than 2 dimensions). These are my notes on the Deep Learning book.
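A hedged sketch of what a per-column norm constraint could look like in numpy; `clip_column_norms` and the cap `c` are my own names, not the book's or Hinton's interface:

```python
import numpy as np

def clip_column_norms(W, c):
    """After a gradient step, rescale any column of W whose L2 norm
    exceeds c, so no single hidden unit accumulates large weights."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)   # one norm per column
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale

W = np.array([[3.0, 0.1],
              [4.0, 0.2]])            # first column has norm 5
W_clipped = clip_column_norms(W, c=1.0)
print(np.linalg.norm(W_clipped, axis=0))  # every column norm is now <= 1
```

Columns already inside the cap are left untouched; only offending columns are rescaled, which is what distinguishes this from penalizing the whole matrix at once.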
This took a lot longer to write than I expected, and I think we have reached a good natural break in the material (the rest is more of a grab-bag of various fun techniques), so I will finish my notes for chapter 3 in a subsequent post. According to the book, there are two reasons: …. The discrete distributions supported by scipy are found …. Specifically, a PDF is defined so that the probability of the value falling within an interval is the area under the curve of the PDF over that interval.
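To make the interval/area definition concrete with scipy — using a standard normal, which is my choice of distribution for illustration:

```python
from scipy.stats import norm

# The probability that a continuous variable falls in [a, b] is the area
# under its PDF there, which equals the difference of the CDF at the ends.
a, b = -1.0, 1.0
p = norm.cdf(b) - norm.cdf(a)  # standard normal, P(-1 <= X <= 1)
print(round(p, 4))             # ~0.6827, the familiar "one sigma" mass
```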

We won't prove this here. So how are these vectors and values special? In the figure below (not reproduced here), we get the optimal θ by solving the Lagrangian. Well, it seems like you need to know how many people in the world speak German, as well as how many people there are in Germany in total. These notes cover about half of the chapter (the part on introductory probability); a follow-up post will cover the rest (some more advanced probability and information theory). There is a difference between the "frequentist" and "Bayesian" interpretations of probability… I won't go too much into that. They can also serve as a quick intro to probability. Upon termination of training, we return the parameters from the point with the lowest validation error, rather than the most recent ones. Thus, L¹ regularization has the property of sparsity, which is its fundamental distinguishing feature from L².
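The early-stopping idea can be sketched as follows; `train_step`, `validation_error`, and `patience` are hypothetical stand-ins for illustration, not the book's interface:

```python
def early_stopping(train_step, validation_error, params, patience=5):
    """Train until validation error has not improved for `patience` steps,
    then return the parameters from the best point, not the latest ones."""
    best_params, best_err, waited = params, validation_error(params), 0
    while waited < patience:
        params = train_step(params)
        err = validation_error(params)
        if err < best_err:
            best_params, best_err, waited = params, err, 0  # new best point
        else:
            waited += 1  # no improvement this step
    return best_params

# Toy usage: "training" increments a scalar, validation error is minimized
# at params == 3, so that is what early stopping should hand back.
best = early_stopping(lambda p: p + 1, lambda p: (p - 3) ** 2, 0)
print(best)  # 3
```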