torchvision source walkthrough: transforms

from __future__ import division
import torch
import math
import random
from PIL import Image, ImageOps, ImageEnhance
try:
    import accimage
except ImportError:
    accimage = None
import numpy as np
import numbers
import types
import collections
import warnings

from . import functional as F

__all__ = ["Compose", "ToTensor", "ToPILImage", "Normalize", "Resize", "Scale", "CenterCrop", "Pad",
           "Lambda", "RandomCrop", "RandomHorizontalFlip", "RandomVerticalFlip", "RandomResizedCrop",
           "RandomSizedCrop", "FiveCrop", "TenCrop", "LinearTransformation", "ColorJitter", "RandomRotation",
           "Grayscale", "RandomGrayscale"]


# The Compose class manages a list of transforms. As its __call__ method shows,
# it simply loops over all the transforms and applies each one to the input image img.
class Compose(object):
    """Composes several transforms together.

    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.

    Example:
        >>> transforms.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.ToTensor(),
        >>> ])
    """

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img


# The ToTensor class implements "Convert a PIL Image or numpy.ndarray to tensor".
# In PyTorch, image data is usually read with the PIL library, so this class acts as
# the bridge between PIL Image and Tensor.
# A PIL Image must be converted to a Tensor before normalization; other operations
# such as resize or crop do not need this conversion.
class ToTensor(object):
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.

    Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
    """

    def __call__(self, pic):
        """
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

        Returns:
            Tensor: Converted image.
        """
        return F.to_tensor(pic)


# ToPILImage, as the name suggests, converts a Tensor to a PIL Image;
# it is the inverse of the ToTensor class above.
class ToPILImage(object):
    """Convert a tensor or an ndarray to PIL Image.

    Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape
    H x W x C to a PIL Image while preserving the value range.

    Args:
        mode (`PIL.Image mode`_): color space and pixel depth of input data (optional).
            If ``mode`` is ``None`` (default) there are some assumptions made about the input data:
            1. If the input has 3 channels, the ``mode`` is assumed to be ``RGB``.
            2. If the input has 4 channels, the ``mode`` is assumed to be ``RGBA``.
            3. If the input has 1 channel, the ``mode`` is determined by the data type (i.e. ``int``, ``float``,
            ``short``).

    .. _PIL.Image mode: http:/
    """

    def __init__(self, mode=None):
        self.mode = mode

    def __call__(self, pic):
        """
        Args:
            pic (Tensor or numpy.ndarray): Image to be converted to PIL Image.

        Returns:
            PIL Image: Image converted to PIL Image.
        """
        return F.to_pil_image(pic, self.mode)


# The Normalize class normalizes the data, a step applied to almost all input data.
# The formula is given in the docstring and is easy to follow.
# As mentioned above, the input to Normalize must be a Tensor, which is also visible
# from the argument of the __call__ method.
class Normalize(object):
    """Normalize a tensor image with mean and standard deviation.

    Given mean: ``(M1,...,Mn)`` and std: ``(S1,...,Sn)`` for ``n`` channels, this transform
    will normalize each channel of the input ``torch.*Tensor`` i.e.
    ``input[channel] = (input[channel] - mean[channel]) / std[channel]``

    Args:
        mean (sequence): Sequence of means for each channel.
        std (sequence): Sequence of standard deviations for each channel.
    """

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def __call__(self, tensor):
        """
        Args:
            tensor (Tensor): Tensor image of size (C, H, W) to be normalized.

        Returns:
            Tensor: Normalized Tensor image.
        """
        return F.normalize(tensor, self.mean, self.std)


# The Resize class resizes a PIL Image and is used in almost every pipeline.
# The size argument can be an int, in which case the shorter edge of the input image is
# resized to that value and the longer edge is scaled proportionally, so the aspect
# ratio is preserved.
# If size is a sequence (h, w) of two ints, the input image is resized to exactly (h, w).
# This is a forced resize, so the aspect ratio usually changes and the image content
# is stretched or squeezed.
# Note that __call__ delegates to the resize function in functional.py.
# Because the input is a PIL Image, that resize function mostly calls methods of Image.
# When the input is a Tensor, the corresponding functions mostly call methods of Tensor;
# this is the bulk of what functional.py contains.
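The two size conventions described above can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python, not the library code: the function name `resize_output_size` is hypothetical, and the rounding (truncation with `int`) is an assumption about how the longer edge is computed.

```python
def resize_output_size(width, height, size):
    """Return the (height, width) a Resize-style transform would produce.

    Hypothetical helper for illustration; it follows the behaviour described
    in the commentary, not the actual functional.py implementation.
    """
    if isinstance(size, int):
        # int: the shorter edge becomes `size`; the longer edge is scaled by
        # the same factor, so the aspect ratio is (approximately) preserved.
        if width <= height:
            return int(size * height / width), size
        return size, int(size * width / height)
    # (h, w) sequence: force the exact output size; the aspect ratio may change.
    h, w = size
    return h, w


if __name__ == "__main__":
    print(resize_output_size(640, 480, 256))         # landscape input, int size
    print(resize_output_size(640, 480, (224, 224)))  # forced (h, w)
```

With an int the output keeps the 4:3 shape of the input, while the (h, w) form returns a square regardless of the input shape.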

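To make the Compose loop and the Normalize formula concrete without depending on PyTorch, here is a small sketch that works on nested Python lists. All names (`ComposeSketch`, `to_unit_range`, `make_normalize`) are hypothetical stand-ins, and an "image" is assumed to be a list of channels, each a flat list of 0..255 values.

```python
class ComposeSketch(object):
    """Apply a list of callables in order, like the Compose class."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img


def to_unit_range(channels):
    # Stand-in for the scaling part of ToTensor: map [0, 255] to [0.0, 1.0].
    return [[v / 255.0 for v in channel] for channel in channels]


def make_normalize(mean, std):
    # Stand-in for Normalize: (value - mean[c]) / std[c] for each channel c.
    def apply(channels):
        return [[(v - m) / s for v in channel]
                for channel, m, s in zip(channels, mean, std)]
    return apply


if __name__ == "__main__":
    pipeline = ComposeSketch([to_unit_range, make_normalize([0.5] * 3, [0.5] * 3)])
    print(pipeline([[0, 255], [255, 0], [51, 51]]))
```

The scaling step has to come before the normalization step here, for the same reason ToTensor has to come before Normalize in a real pipeline: the mean and std values are expressed in the [0.0, 1.0] range.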
class Resize(object):
    """Resize the input PIL Image to the given size.

    Args:
        size (sequence or int): Desired output size. If size is a sequence like
            (h, w), output size will be matched to this. If size is an int,
            smaller edge of the image will be matched to this number.
            i.e, if height > width, then image will be rescaled to
            (size * height / width, size)
        interpolation (int, optional): Desired interpolation. Default is
            ``PIL.Image.BILINEAR``
    """

    def __init__(self, size, interpolation=Image.BILINEAR):
        assert isinstance(size, int) or (isinstance(size, collections.Iterable) and len(size) == 2)
        self.size = size
        self.interpolation = interpolation

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be scaled.

        Returns:
            PIL Image: Rescaled image.
        """
        return F.resize(img, self.size, self.interpolation)


class Scale(Resize):
    """
    Note: This transform is deprecated in favor of Resize.
    """
    def __init__(self, *args, **kwargs):
        warnings.warn("The use of the transforms.Scale transform is deprecated, " +
                      "please use transforms.Resize instead.")
        super(Scale, self).__init__(*args, **kwargs)


# CenterCrop crops a region of the given size centered on the center of the input image.
# It is rarely used for data augmentation, because with a fixed size and the same input
# image, N calls to CenterCrop all produce the same result.
# The docstring explains how size is interpreted when it is an int and when it is a sequence.
class CenterCrop(object):
    """Crops the given PIL Image at the center.

    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made.
    """

    def __init__(self, size):
        if isinstance(size, numbers.Number):
            self.size = (int(size), int(size))
        else:
            self.size = size

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be cropped.

        Returns:
            PIL Image: Cropped image.
        """
        return F.center_crop(img, self.size)


class Pad(object):
    """Pad the given PIL Image on all sides with the given pad value.

    Args:
        padding (int or tuple): Padding on each border. If a single int is provided this
            is used to pad all borders. If tuple of length 2 is provided this is the padding
            on left/right and top/bottom respectively. If a tuple of length 4 is provided
            this is the padding for the left, top, right and bottom borders
            respectively.
        fill: Pixel fill value. Default is 0. If a tuple of
            length 3, it is used to fill R, G, B channels respectively.
    """

    def __init__(self, padding, fill=0):
        assert isinstance(padding, (numbers.Number, tuple))
        assert isinstance(fill, (numbers.Number, str, tuple))
        if isinstance(padding, collections.Sequence) and len(padding) not in [2, 4]:
            raise ValueError("Padding must be an int or a 2, or 4 element tuple, not a " +
                             "{} element tuple".format(len(padding)))

        self.padding = padding
        self.fill = fill

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be padded.

        Returns:
            PIL Image: Padded image.
        """
        return F.pad(img, self.padding, self.fill)


class Lambda(object):
    """Apply a user-defined lambda as a transform.

    Args:
        lambd (function): Lambda/function to be used for transform.
    """

    def __init__(self, lambd):
        assert isinstance(lambd, types.LambdaType)
        self.lambd = lambd

    def __call__(self, img):
        return self.lambd(img)


# Compared with CenterCrop above, RandomCrop is used more often. The difference is that
# the crop position is random rather than fixed at the center of the input image,
# so nearly every crop produces a different image.
# The two lines i = random.randint(0, h - th) and j = random.randint(0, w - tw) generate
# the random position: i is the top row and j is the left column of the crop window.
# Note that __call__ finally calls F.crop(img, i, j, h, w) to do the actual cropping.
# CenterCrop calls F.center_crop(img, self.size) instead, but F.center_crop() only
# computes the offsets of the centered window first and then also calls
# F.crop(img, i, j, h, w) to do the crop.
class RandomCrop(object):
    """Crop the given PIL Image at a random location.

    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made.
        padding (int or sequence, optional): Optional padding on each border
            of the image. Default is 0, i.e no padding. If a sequence of length
            4 is provided, it is used to pad left, top, right, bottom borders
            respectively.
    """

    def __init__(self, size, padding=0):
        if isinstance(size, numbers.Number):
            self.size = (int(size), int(size))
        else:
            self.size = size
        self.padding = padding

    @staticmethod
    def get_params(img, output_size):
        """Get parameters for ``crop`` for a random crop.

        Args:
            img (PIL Image): Image to be cropped.
            output_size (tuple): Expected output size of the crop.

        Returns:
            tuple: params (i, j, h, w) to be passed to ``crop`` for random crop.
        """
        w, h = img.size
        th, tw = output_size
        if w == tw and h == th:
            return 0, 0, h, w

        i = random.randint(0, h - th)
        j = random.randint(0, w - tw)
        return i, j, th, tw

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be cropped.

        Returns:
            PIL Image: Cropped image.
        """
        if self.padding > 0:
            img = F.pad(img, self.padding)

        i, j, h, w = self.get_params(img, self.size)

        return F.crop(img, i, j, h, w)


class RandomHorizontalFlip(object):
    """Horizontally flip the given PIL Image randomly with a probability of 0.5."""

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be flipped.

        Returns:
            PIL Image: Randomly flipped image.
        """
        if random.random() < 0.5:
            return F.hflip(img)
        return img


class RandomVerticalFlip(object):
    """Vertically flip the given PIL Image randomly with a probability of 0.5."""

    def __call__(self, img):
        """
        Args:
            img (PIL Image): Image to be flipped.

        Returns:
            PIL Image: Randomly flipped image.
        """
        if random.random() < 0.5:
            return F.vflip(img)
        return img


# The RandomResizedCrop class is also used quite often, and it is a personal favorite.
# With both CenterCrop and RandomCrop the crop size is fixed, whereas this class
# crops a region of random size.
# It takes three main arguments: size, scale and ratio. In short, it first crops
# (using scale and ratio) and then resizes the crop to the requested size (using size).
# The position and the height and width of the crop come from the get_params method.
# get_params mainly uses two arguments, scale and ratio. It first draws a random number
# within the range given by scale and multiplies it by the area of the input image to get
# the area of the crop;
# it then draws a random number within the range given by ratio, which is the aspect
# ratio of the crop. From these two values one can then obtain the c
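The area-and-ratio sampling described for RandomResizedCrop can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's get_params: the function name `sample_crop_box`, the default ranges for `scale` and `ratio`, the number of attempts, and the central-square fallback are all choices made for this sketch.

```python
import math
import random


def sample_crop_box(width, height, scale=(0.08, 1.0), ratio=(3.0 / 4.0, 4.0 / 3.0), attempts=10):
    """Sample (i, j, h, w): top row, left column, height and width of a crop.

    Hypothetical helper; the default ranges and the fallback are assumptions.
    """
    area = width * height
    for _ in range(attempts):
        # Fraction of the image area drawn from `scale`, times the image area.
        target_area = random.uniform(*scale) * area
        # Aspect ratio (width / height) drawn from `ratio`.
        aspect_ratio = random.uniform(*ratio)
        # Solve w * h = target_area and w / h = aspect_ratio for w and h.
        w = int(round(math.sqrt(target_area * aspect_ratio)))
        h = int(round(math.sqrt(target_area / aspect_ratio)))
        if 0 < w <= width and 0 < h <= height:
            i = random.randint(0, height - h)
            j = random.randint(0, width - w)
            return i, j, h, w
    # Fallback when no sampled box fits: a central square on the shorter edge.
    side = min(width, height)
    return (height - side) // 2, (width - side) // 2, side, side


if __name__ == "__main__":
    random.seed(0)
    print(sample_crop_box(640, 480))
```

The returned box would then be cropped and resized to `size`, which is why the final output shape is fixed even though the crop itself varies from call to call.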

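CenterCrop and RandomCrop differ only in how the top-left corner (i, j) of the crop window is chosen; both then hand (i, j, h, w) to the same crop call. The sketch below shows the two choices side by side in plain Python. The function names are hypothetical, and the centered-offset formula is an assumption about what a center crop computes.

```python
import random


def center_crop_params(width, height, output_size):
    """Top-left corner of a window centered in the image (assumed formula)."""
    th, tw = output_size
    i = int(round((height - th) / 2.0))
    j = int(round((width - tw) / 2.0))
    return i, j, th, tw


def random_crop_params(width, height, output_size):
    """Top-left corner drawn uniformly, as in RandomCrop.get_params."""
    th, tw = output_size
    if width == tw and height == th:
        return 0, 0, height, width
    i = random.randint(0, height - th)  # top row of the window
    j = random.randint(0, width - tw)   # left column of the window
    return i, j, th, tw


if __name__ == "__main__":
    print(center_crop_params(100, 80, (40, 60)))  # same result on every call
    print(random_crop_params(100, 80, (40, 60)))  # varies from call to call
```

Because the centered version is deterministic, it suits evaluation pipelines, while the random version is the one that actually adds variety during training.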
