New Cookery Recipes


Inverse Cooking: Recipe Generation from Food Images – Part 2



Hi friends! The paper we are going to discuss is "Inverse Cooking: Recipe Generation from Food Images". Links to the paper and code can be found in the description section.

We have divided the whole paper into five segments: Abstract, Introduction, Approaches, Experiments, and Conclusion. We already covered the Abstract and Introduction sections in the last part; the link is provided in the description. Let's start with Approaches.

The idea behind this paper: generating a recipe (title, ingredients, and instructions) from an image is a challenging task, which requires a simultaneous understanding of the ingredients composing the dish as well as the transformations they went through, e.g. slicing, blending, or mixing with other ingredients. Instead of obtaining the recipe from an image directly, the paper argues that a recipe generation pipeline would benefit from an intermediate step that predicts the ingredient list.

The sequence of instructions is then generated conditioned on both the image and its corresponding ingredient list, where the interplay between image and ingredients can provide additional insight into how the latter were processed to produce the resulting dish.

Method
The recipe generation system takes a food image as input and outputs a sequence of cooking instructions, which are generated by means of an instruction decoder that takes two embeddings as input. The first represents visual features extracted from the image, while the second encodes the ingredients extracted from the image.

Recipe Generation Method
The paper proposes a recipe generation method.

It contains mainly four parts: an image encoder, an ingredient decoder, an ingredient encoder, and an instruction decoder. We will talk about each part separately. The overall idea is:
Step 1 – Extract image features with the image encoder.
Step 2 – Use the ingredient decoder to predict the ingredients, and encode the predicted ingredients into embeddings with the ingredient encoder.
Step 3 – Generate a recipe title and a sequence of cooking steps with the cooking instruction decoder.
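The three steps above can be sketched as a simple pipeline. All function bodies below are illustrative stand-ins with hard-coded outputs, not the authors' actual models or API; only the data flow matches the description.

```python
# Hypothetical sketch of the four-part pipeline (illustrative stubs only).

def image_encoder(image):
    # Step 1: extract visual features (in the paper, a ResNet-50 backbone).
    return "image-features"

def ingredient_decoder(image_features):
    # Step 2a: predict the ingredients from the image features.
    return ["tomato", "basil", "mozzarella"]

def ingredient_encoder(ingredients):
    # Step 2b: map each predicted ingredient to a fixed-size embedding.
    return ["emb:" + ing for ing in ingredients]

def instruction_decoder(image_features, ingredient_embeddings):
    # Step 3: generate the title (emitted as the first "instruction")
    # and the cooking steps, conditioned jointly on both inputs.
    return ["Caprese salad",
            "Slice the tomatoes.",
            "Layer with mozzarella and basil."]

def generate_recipe(image):
    features = image_encoder(image)
    ingredients = ingredient_decoder(features)
    embeddings = ingredient_encoder(ingredients)
    instructions = instruction_decoder(features, embeddings)
    return ingredients, instructions
```

Note how the instruction decoder sees both the image features and the ingredient embeddings, which is exactly the conditioning described above.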

Image Encoder
Given an input image with associated ingredients, the aim is to produce a sequence of instructions R = (r_1, ..., r_T), where r_t denotes a word in the sequence, by means of an instruction transformer. Note that the title is predicted as the first instruction.

This transformer is conditioned jointly on two inputs: the image representation and the ingredient embedding. A ResNet-50 encoder is used as the image encoder to extract visual features from the image, and the ingredient embedding is obtained by means of a decoder architecture that predicts the ingredients, followed by a single embedding layer mapping each ingredient into a fixed-size vector.

Ingredient Decoder
Which is the best structure to represent ingredients? On the one hand, it seems clear that ingredients are a set, since permuting them does not alter the outcome of the cooking recipe. On the other hand, ingredients can be considered as a list (e.g. a

list of ingredients), implying some order. Moreover, it is reasonable to think that there is some information in the order in which humans write down the ingredients in a recipe. Therefore, the paper considers both scenarios and introduces models that work either with a list of ingredients or with a set of ingredients. A list of ingredients is a variable-sized, ordered collection of unique meal constituents; a set of ingredients is a variable-sized, unordered collection of unique meal constituents.
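The distinction between the two representations can be shown with plain Python data structures (illustrative only, not the paper's encoding): permuting a list changes it, while permuting a set does not.

```python
# Same meal constituents as an ordered list versus an unordered set.
list_a = ["flour", "eggs", "milk"]
list_b = ["milk", "flour", "eggs"]   # same constituents, different order

assert list_a != list_b                        # lists encode order
assert frozenset(list_a) == frozenset(list_b)  # sets ignore order
```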

The drawback of this approach is that such a model design penalizes the order in which ingredients are predicted. To fix this, the outputs across different time-steps are aggregated by means of a max pooling operation.

Instruction Decoder
The instruction decoder is composed of transformer blocks, each of them containing two attention layers followed by a linear layer. The first attention layer applies self-attention over previously generated outputs, whereas the second attends to the model conditioning in order to refine the self-attention output. The transformer model is composed of multiple transformer blocks followed by a linear layer and a softmax nonlinearity that provides a distribution over recipe words.
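The max pooling aggregation mentioned above can be sketched with NumPy. The shapes here are illustrative assumptions; the point is that an element-wise max over the time dimension makes the pooled vector invariant to the order of the time-steps.

```python
import numpy as np

T, d = 4, 6                        # time-steps, embedding size (illustrative)
rng = np.random.default_rng(0)
outputs = rng.normal(size=(T, d))  # one d-dim decoder output per time-step

# Aggregate across time-steps with an element-wise max pool.
pooled = outputs.max(axis=0)       # shape (d,): one value per feature

# Permuting the time-steps leaves the pooled vector unchanged,
# which removes the order sensitivity of the raw outputs.
perm = rng.permutation(T)
assert np.allclose(outputs[perm].max(axis=0), pooled)
```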

Summary
The whole training process of the recipe transformer happens in two stages. In the first stage, the image encoder and ingredient decoder are pre-trained. Then, in the second stage, the ingredient encoder and instruction decoder are trained by minimizing the negative log-likelihood, adjusting the weights of the ingredient encoder and instruction decoder.

Optimization
Note that during training, the instruction decoder takes the ground-truth ingredients as input. Thanks for watching this video!
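The second-stage objective can be sketched as follows. This is a minimal NumPy illustration of a negative log-likelihood over the decoder's softmax distribution, not the authors' code; the vocabulary size, sequence length, and function names are assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax along the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll_loss(logits, target_ids):
    # logits: (T, vocab) — one distribution per recipe word.
    # target_ids: (T,) — ground-truth word indices.
    probs = softmax(logits)
    picked = probs[np.arange(len(target_ids)), target_ids]
    return -np.log(picked).mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))    # 5 words, vocabulary of 10 (toy sizes)
targets = np.array([3, 1, 4, 1, 5])  # ground-truth word indices
loss = nll_loss(logits, targets)
assert loss > 0
```

During training the decoder would be fed the ground-truth ingredients, as noted above, so this loss only measures the quality of the generated instruction words.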

The links to the paper and code can be found in the description section. Don't forget to subscribe to this channel to see new videos. Bye!

Source: Youtube
