There are 40 human attributes, listed below, that serve as input conditions.
For each image, the paired condition is a vector of length 40. Each value in the vector corresponds to one attribute and is set to 1 if the person in the image has that attribute, and to 0 otherwise.
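As a minimal sketch of this encoding, the snippet below builds the 0/1 condition vector from a set of attribute names. The names `attribute_0` to `attribute_39` and the helper `encode_attributes` are hypothetical placeholders, not the actual attribute list.

```python
# Hypothetical placeholder names; substitute the real list of 40 attributes.
ATTRIBUTES = [f"attribute_{i}" for i in range(40)]


def encode_attributes(present):
    """Return a list of 40 values: 1 where the attribute is present, else 0."""
    present = set(present)
    unknown = present - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return [1 if name in present else 0 for name in ATTRIBUTES]


# A person who has only attributes 3 and 17.
vector = encode_attributes({"attribute_3", "attribute_17"})
```

The order of `ATTRIBUTES` fixes which position in the vector each attribute occupies, so it has to stay the same across the whole dataset.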
The attribute vector of length 40 is concatenated with
a noise vector of length 100.
---Apply a linear layer to scale it to a larger size.
---Reshape it so that it has spatial dimensions.
---Pass it through 4 layers of deconvolution, pooling, and ReLU.
---The final output has size [256,256,3],
which is the generated image.
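The generator steps above can be traced with a small shape calculation. This is only a sketch under assumptions: the starting spatial size of 16, the channel counts, the Gaussian noise, and the rule that each of the 4 stages doubles height and width are illustrative choices. Only the lengths 40 and 100 and the output size [256,256,3] come from the description.

```python
import random

ATTR_LEN, NOISE_LEN = 40, 100


def generator_input(attributes, rng=random):
    """Concatenate the attribute vector with a freshly sampled noise vector."""
    if len(attributes) != ATTR_LEN:
        raise ValueError("expected 40 attribute values")
    # Assumption: standard normal noise; the source does not name a distribution.
    noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_LEN)]
    return list(attributes) + noise


def generator_shapes(base=16, channels=(512, 256, 128, 64, 3)):
    """Trace tensor shapes, assuming each of the 4 stages doubles height and width."""
    shapes = [(ATTR_LEN + NOISE_LEN,)]             # concatenated input
    shapes.append((base * base * channels[0],))    # after linear scaling
    shapes.append((base, base, channels[0]))       # after reshape
    size = base
    for out_channels in channels[1:]:              # 4 deconvolution stages
        size *= 2
        shapes.append((size, size, out_channels))
    return shapes


z = generator_input([0] * ATTR_LEN)
shapes = generator_shapes()
```

With these assumed settings the spatial size grows 16, 32, 64, 128, 256, so the last entry of `shapes` matches the [256,256,3] output.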
The input images first go through 4
layers of convolution, pooling, and ReLU.
---Tile the values of the attribute
vector so that it has spatial dimensions.
---Concatenate these two tensors.
---Reshape and apply a linear layer to reduce the result to size 1,
which represents the score of the image.
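The discriminator steps above can be sketched in the same way. The helpers `tile_attributes`, `concat_channels`, and `discriminator_shapes` are hypothetical, and the channel counts and the rule that each of the 4 stages halves height and width are assumptions. Tensors are plain nested lists laid out as [height][width][channels].

```python
def tile_attributes(attributes, height, width):
    """Copy the attribute vector to every spatial position: [H][W][len(attributes)]."""
    return [[list(attributes) for _ in range(width)] for _ in range(height)]


def concat_channels(features, tiled):
    """Concatenate two [H][W][C] nested lists along the channel axis."""
    return [[f + t for f, t in zip(f_row, t_row)]
            for f_row, t_row in zip(features, tiled)]


def discriminator_shapes(image_size=256, channels=(64, 128, 256, 512), attr_len=40):
    """Trace shapes, assuming each of the 4 conv stages halves height and width."""
    shapes = [(image_size, image_size, 3)]
    size = image_size
    for out_channels in channels:
        size //= 2
        shapes.append((size, size, out_channels))
    merged_channels = channels[-1] + attr_len
    shapes.append((size, size, merged_channels))     # after concatenation
    shapes.append((size * size * merged_channels,))  # after reshape
    shapes.append((1,))                              # linear layer to one score
    return shapes


# Tiny demonstration: a 2x2 feature map with 3 channels and 2 attributes.
features = [[[0.0, 0.0, 0.0] for _ in range(2)] for _ in range(2)]
merged = concat_channels(features, tile_attributes([1, 0], 2, 2))
shapes = discriminator_shapes()
```

Tiling gives the attribute vector the same height and width as the feature map, which is what allows the two tensors to be concatenated along the channel axis.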
Finally, the original structure is modified:
an Identity Block from Residual Net is added before each deconv layer in the discriminator.
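For reference, an identity block adds its input back onto the output of its inner layers and then applies ReLU. The sketch below shows only that shortcut on flat lists. The `transform` argument is a hypothetical stand-in for the block's convolutional path, which the description does not specify.

```python
def relu(values):
    """Elementwise max(0, v)."""
    return [max(0.0, v) for v in values]


def identity_block(x, transform):
    """Return relu(transform(x) + x); transform must preserve the length of x."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must keep the shape for an identity shortcut")
    return relu([a + b for a, b in zip(fx, x)])


def negate_half(values):
    """Hypothetical stand-in for the block's convolutional path."""
    return [-0.5 * v for v in values]


out = identity_block([2.0, -4.0], negate_half)
```

Because the shortcut is an identity mapping, the block's output has the same shape as its input, so it can be placed in front of an existing layer without changing that layer's expected input size.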