ICLR 2015
https://arxiv.org/pdf/1412.6622.pdf
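Comment: In the image domain, image embeddings for transfer learning are usually learned with a classification (softmax) loss. This paper instead tests a triplet objective, which learns from the relative similarity of three images, on classification datasets. Simply attaching an SVM classifier to the learned image embeddings already gives quite good performance. The evaluation protocol did not fully convince me, so it would be worth evaluating on a similar-image benchmark dataset as well.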

DEEP METRIC LEARNING USING TRIPLET NETWORK

Deep learning has proven itself as a successful set of models for learning useful semantic representations of data.
딥 학습은 유용한 의미론적 데이터 표현을 학습하기 위한 성공적인 모델 집합으로 입증되었습니다.

These, however, are mostly implicitly learned as part of a classification task.
그러나 이들은 분류 작업의 일부로 대부분 암묵적으로 학습됩니다.

In this paper we propose the triplet network model, which aims to learn useful representations by distance comparisons.
본 논문에서는 거리 비교에 의한 유용한 표현을 배우는 것을 목표로 triplet 모델을 제안한다.

A similar model was defined by Wang et al, (2014), tailor-made for learning a ranking for image information retrieval. Here we demonstrate, using various datasets, that our model learns a better representation than that of its immediate competitor, the Siamese network.
유사한 모델이 Wang et al, (2014)에 의해 정의되었는데, 이미지 정보 검색 순위를 배우기 위해 맞춤형으로 제작되었습니다. 여기에서는 우리 모델이 바로 경쟁자 인 샴 네트워크보다 더 나은 표현을 배우는 다양한 데이터 세트를 사용하여 시연합니다.

We also discuss future possible usage as a framework for unsupervised learning.
우리는 또한 미래의 가능한 사용법을 unsupervised 학습을위한 틀로서 논의합니다.

1 INTRODUCTION

For the past few years, deep learning models have been used extensively to solve various machine learning tasks.
지난 몇 년 동안 심층 학습 모델은 다양한 기계 학습 과제를 해결하기 위해 광범위하게 사용되었습니다.

One of the underlying assumptions is that deep, hierarchical models such as convolutional networks create useful representation of data (Bengio (2009); Hinton (2007)), which can then be used to distinguish between available classes.
기본 가정 중 하나는 콘볼루션 네트워크와 같은 심층적인 계층적 모델이 유용한 데이터 표현 (Bengio (2009); Hinton (2007))을 만들어 사용 가능한 클래스를 구별하는 데 사용할 수 있다는 것입니다.

This quality is in contrast with traditional approaches requiring engineered features extracted from data and then used in separate learning schemes.
이러한 품질은 데이터에서 추출된 엔지니어링된 피쳐를 필요로 하는 기존의 접근 방식과는 대조적으로 별도의 학습 방식으로 사용됩니다.

Features extracted by deep networks were also shown to provide useful representation (Zeiler & Fergus (2013a); Sermanet et al, (2013)) which can be, in turn, successfully used for other tasks (Razavian et al, (2014)).
깊은 네트워크에 의해 추출된 특징은 다른 작업 (Razavian et al, (2014))에서 성공적으로 사용될 수있는 유용한 표현 (Zeiler & Fergus (2013a), Sermanet 외, (2013))을 제공하는 것으로 나타났다.

Despite their importance, these representations and their corresponding induced metrics are often treated as side effects of the classification task, rather than being explicitly sought.
그들의 중요성에도 불구하고, 이러한 표현과 이에 상응하는 유도된 측정 지표는 명시적으로 추구되기보다는 분류 작업의 부수효과로 간주되는 경우가 많습니다.

There are also many interesting open questions regarding the intermediate representations and their role in disentangling and explaining the data (Bengio (2013)).
중간 표상에 대한 많은 흥미로운 질문과 데이터의 풀림과 설명에 대한 역할 (Bengio (2013))이 있습니다.

Notable exceptions where explicit metric learning is performed are the Siamese Network variants (Bromley et al, (1993); Chopra et al, (2005); Hadsell et al, (2006)), in which a contrastive loss over the metric induced by the representation is used to train the network to distinguish between similar and dissimilar pairs of examples.
명백한 메트릭 학습이 수행되는 주목할만한 예외는 Siamese Network 변종 (Bromley 외 (1993), Chopra 외 (2005), Hadsell 외 (2006))입니다. 표현은 유사하고 다른 쌍의 예제를 구별하기 위해 네트워크를 교육하는 데 사용됩니다.

A contrastive loss favours a small distance between pairs of examples labeled as similar, and large distances for pairs labeled dissimilar.
대조 손실은 비슷한 것으로 표시된 쌍의 예제 사이의 작은 거리와 비슷하지 않은 쌍의 큰 거리를 선호합니다.
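To make the behaviour of a contrastive loss concrete, here is a minimal sketch in plain Python. It assumes a margin-based form in the spirit of Hadsell et al, (2006); the function names and the `margin` hyperparameter are illustrative choices, not something specified in this text.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, similar, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    similar=True  -> penalize any distance (pull the pair together).
    similar=False -> penalize only distances below `margin` (push apart).
    `margin` is an assumed, hand-chosen hyperparameter.
    """
    d = l2_distance(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


loss_similar_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=True)
loss_dissimilar_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=False)
loss_dissimilar_near = contrastive_loss([0.0, 0.0], [0.5, 0.0], similar=False)
```

In this sketch a dissimilar pair stops contributing once it is farther apart than the margin, so the margin fixes an absolute scale for what counts as "far enough".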

However, the representations learned by these models provide sub-par results when used as features for classification, compared with other deep learning models including ours.
그러나 이 모델을 통해 얻은 표현은 우리를 비롯한 다른 심층 학습 모델과 비교할 때 분류의 피쳐로 사용될 때 좋지 않은 결과를 제공합니다.

Siamese networks are also sensitive to calibration in the sense that the notion of similarity vs dissimilarity requires context.
Siamese 네트워크는 또한 유사성과 비 유사성이라는 개념이 문맥을 필요로 한다는 점에서 보정에 민감합니다.

For example, a person might be deemed similar to another person when a dataset of random objects is provided, but might be deemed dissimilar with respect to the same other person when we wish to distinguish between two individuals in a set of individuals only.
예를 들어, 무작위 객체의 데이터 세트가 제공 될 때 사람은 다른 사람과 비슷하다고 간주 될 수 있지만, 개인의 집합에 속한 두 사람을 구별하기를 원할 때 동일한 다른 사람과는 다른 것으로 간주 될 수 있습니다.

In our model, such a calibration is not required.
우리 모델에서는 이러한 교정이 필요하지 않습니다.

In fact, in our experiments here, we experienced first-hand the difficulty of using Siamese networks.
실제로, 우리의 실험에서 우리는 샴 네트워크를 사용하는 데 어려움을 겪었습니다.

We follow a similar task to that of Chechik et al, (2010).
Chechik 외 (2010)와 비슷한 작업을 수행합니다.

For a set of samples $P$ and a chosen rough similarity measure $r(x, x')$ given through a training oracle (e.g., how semantically close two images of objects are), we wish to learn a similarity function $S(x, x')$ induced by a normed metric.
트레이닝 오라클을 통해 주어지는 샘플 집합 $P$와 선택된 대략적인 유사성 측정치 $r(x, x')$ (예를 들어, 객체의 두 이미지가 의미론적으로 얼마나 가까운가)에 대해, 우리는 노름 메트릭에 의해 유도된 유사 함수 $S(x, x')$를 배우기를 원한다.

Unlike Chechik et al, (2010)’s work, our labels are of the form $r(x, x_1) > r(x, x_2)$ for triplets $x, x_1, x_2$ of objects.
Chechik 외 (2010)의 연구와는 달리, 우리의 레이블은 객체의 삼중항 $x, x_1, x_2$에 대해 $r(x, x_1) > r(x, x_2)$의 형태입니다.

Accordingly, we try to fit a metric embedding and a corresponding similarity function satisfying:

$$S(x, x_1) > S(x, x_2), \quad \forall x, x_1, x_2 \in P \ \text{for which} \ r(x, x_1) > r(x, x_2).$$

따라서 우리는 위 조건을 만족하는 metric embedding과 이에 상응하는 similarity function을 찾으려고 노력한다.

In our experiment, we try to find a metric embedding of a multi-class labeled dataset.
실험에서 다중 클래스 레이블이 지정된 데이터 세트의 메트릭 임베딩을 찾으려고합니다.

We will always take $x_1$ to be of the same class as $x$ and $x_2$ of a different class, although in general more complicated choices could be made.

Accordingly, we will use the notation $x^+$ and $x^-$ instead of $x_1$, $x_2$.

We focus on finding an $L_2$ embedding, by learning a function $F(x)$ for which the similarity is $S(x, x') = \|F(x) - F(x')\|_2$.

Inspired from the recent success of deep learning, we will use a deep network as our embedding function F(x).
최근의 깊은 학습의 성공에서 영감을 얻어 우리는 우리의 임베딩 함수 F(x)로서 깊은 네트워크를 사용할 것입니다.

We call our approach a triplet network.
우리는 우리의 접근법을 트리플렛 네트워크라고 부릅니다.

A similar approach was proposed in Wang et al, (2014) for the purpose of learning a ranking function for image retrieval. Compared with the single application proposed in Wang et al, (2014), we make a comprehensive study of the triplet architecture which is, as we shall argue below, interesting in and of itself.
Wang 등 (2014)은 이미지 검색을 위한 순위 함수를 학습하기 위해 유사한 접근법을 제안했다. Wang 외 (2014)에서 제안된 단일 응용 프로그램과 비교할 때, 우리는 다음과 같은 삼중 아키텍처를 포괄적으로 연구한다. 우리가 아래에서 논 하겠지만, 그 자체로 흥미 롭습니다.

In fact, we shall demonstrate below that the triplet approach is a strong competitor to the Siamese approach, its most obvious competitor.
사실, 우리는 triplet 접근 방식이 가장 명백한 경쟁자인 Siamese 접근 방식에 대한 강력한 경쟁자라는 점을 아래에서 입증해야합니다.

2 THE TRIPLET NETWORK

A Triplet network (inspired by "Siamese network") is comprised of 3 instances of the same feedforward network (with shared parameters).
Triplet 네트워크 ( “Siamese 네트워크"에서 영감을 얻음)는 동일한 피드 포워드 네트워크 (공유 매개 변수 포함)의 3 가지 인스턴스로 구성됩니다.

When fed with 3 samples, the network outputs 2 intermediate values: the L2 distances of the embedded representations of two of its inputs from the representation of the third.
3 개의 샘플이 공급되면 네트워크는 2 개의 중간 값을 출력합니다. 즉, 두 번째 입력의 임베딩 표현과 세 번째 표현의 임베딩 표현 간의 L2 거리입니다.

If we denote the 3 inputs as $x$, $x^+$ and $x^-$, and the embedded representation of the network as $Net(x)$, the layer before the last will be the vector:
3 개의 입력을 $x$, $x^+$ 및 $x^-$로 표시하고 네트워크의 임베딩 표현을 $Net(x)$로 표시하면, 마지막 레이어 바로 앞의 레이어는 다음 벡터가 됩니다.

$$TripletNet(x, x^-, x^+) = \begin{bmatrix} \|Net(x) - Net(x^-)\|_2 \\ \|Net(x) - Net(x^+)\|_2 \end{bmatrix} \in \mathbb{R}_+^2$$

In words, this encodes the pair of distances between each of $x^+$ and $x^-$ against the reference $x$.
즉, 이것은 $x^+$와 $x^-$ 각각과 참조 $x$ 사이의 거리 쌍을 인코딩합니다.
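The two-distance output described above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' Torch7 code: `toy_net` is a hypothetical stand-in for the convolutional embedding network, and the argument order of `triplet_net` is an assumption.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_net(net, x, x_minus, x_plus):
    """Return [||net(x) - net(x_minus)||, ||net(x) - net(x_plus)||].

    The same callable `net` embeds all three inputs, which plays the
    role of the three network instances with shared parameters.
    """
    e, e_minus, e_plus = net(x), net(x_minus), net(x_plus)
    return [l2_distance(e, e_minus), l2_distance(e, e_plus)]


def toy_net(x):
    """Hypothetical embedding: a fixed linear map from R^2 to R^2."""
    return [2.0 * x[0], 0.5 * x[1]]


distances = triplet_net(toy_net, [0.0, 0.0], [3.0, 0.0], [0.0, 2.0])
```

Here the reference is embedded at the origin, so the two outputs are simply the norms of the embedded negative and positive samples.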

2.1 TRAINING

Training is performed by feeding the network with samples where, as explained above, x and x + are of the same class, and x − is of a different class.
위에서 설명한대로 x와 x +가 같은 클래스이고 x −가 다른 클래스인 샘플을 네트워크에 공급하여 교육을 수행합니다.

The network architecture allows the task to be expressed as a 2-class classification problem, where the objective is to correctly classify which of x + and x − is of the same class as x.
네트워크 아키텍처는 태스크를 2 클래스 분류 문제로 표현할 수 있게 해줍니다. 여기서 목표는 x +와 x − 중 어느 것이 x와 같은 클래스인지 정확하게 분류하는 것입니다.

We stress that in a more general setting, where the objective might be to learn a metric embedding, the label determines which example is closer to x.
보다 일반적인 설정에서 목표가 메트릭 삽입을 학습하는 것이면 레이블은 어떤 예제가 x에 더 가까운 지 결정합니다.

Here we simply interpret “closeness" as “sharing the same label".
여기서 우리는 단순히 “친밀감"을 “같은 레이블 공유"로 해석합니다.

In order to output a comparison operator from the model, a SoftMax function is applied on both outputs - effectively creating a ratio measure.
모델로부터 비교 연산자를 출력하기 위해 SoftMax 함수가 두 출력에 적용되므로 효과적으로 비율 측정을 생성합니다.

Similarly to traditional convolutional networks, training is done by simple SGD on a negative log-likelihood loss with regard to the 2-class problem.
전통적인 convolutional 네트워크와 마찬가지로 training은 2-class 문제와 관련하여 negative-loglikelihood loss에 대한 간단한 SGD로 수행됩니다.

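We later found that better results are achieved when the loss function is replaced by a simple MSE on the soft-max result, compared to the (0, 1) vector, so that the loss is:
우리는 이후 손실 함수를 soft-max 결과와 (0, 1) 벡터 사이의 간단한 MSE로 대체하면 더 나은 결과가 얻어짐을 확인했으며, 이때 손실은 다음과 같다.

$$Loss(d_+, d_-) = \|(d_+, d_- - 1)\|_2^2 = const \cdot d_+^2$$

where

$$d_+ = \frac{e^{\|Net(x) - Net(x^+)\|_2}}{e^{\|Net(x) - Net(x^+)\|_2} + e^{\|Net(x) - Net(x^-)\|_2}}, \qquad d_- = \frac{e^{\|Net(x) - Net(x^-)\|_2}}{e^{\|Net(x) - Net(x^+)\|_2} + e^{\|Net(x) - Net(x^-)\|_2}}$$

We note that $Loss(d_+, d_-) \to 0$ iff $\frac{\|Net(x) - Net(x^+)\|}{\|Net(x) - Net(x^-)\|} \to 0$, which is the required objective.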
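The two training losses described in this section can be sketched as follows. The sketch assumes the soft-max is taken over the two L2 distances (to the same-class sample and to the different-class sample) and that the target vector (0, 1) assigns all mass to the different-class distance; the function names are illustrative.

```python
import math


def softmax_pair(dist_plus, dist_minus):
    """Soft-max over the two distances; returns (d_plus, d_minus)."""
    m = max(dist_plus, dist_minus)  # subtract the max for numerical stability
    e_plus = math.exp(dist_plus - m)
    e_minus = math.exp(dist_minus - m)
    total = e_plus + e_minus
    return e_plus / total, e_minus / total


def nll_loss(dist_plus, dist_minus):
    """Negative log-likelihood of 'the different-class sample is farther'."""
    _, d_minus = softmax_pair(dist_plus, dist_minus)
    return -math.log(d_minus)


def mse_loss(dist_plus, dist_minus):
    """Squared error between (d_plus, d_minus) and the target (0, 1)."""
    d_plus, d_minus = softmax_pair(dist_plus, dist_minus)
    return d_plus ** 2 + (d_minus - 1.0) ** 2


d_plus, d_minus = softmax_pair(1.0, 3.0)
loss_good = mse_loss(1.0, 3.0)  # same-class sample is closer
loss_bad = mse_loss(3.0, 1.0)   # same-class sample is farther
```

Because the two soft-max outputs sum to one, the squared error in this sketch reduces to twice the square of `d_plus`, so both losses shrink as the same-class sample moves closer than the different-class one.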

By using the same shared parameters network, we allow the back-propagation algorithm to update the model with regard to all three samples simultaneously.
동일한 공유 매개 변수 네트워크를 사용하여 역 전파 알고리즘이 세 가지 샘플 모두에 대해 모델을 동시에 업데이트 할 수 있도록 합니다.

3 TESTS AND RESULTS

The Triplet network was implemented and trained using the Torch7 environment (Collobert et al, (2011)).
Triplet 네트워크는 Torch7 환경을 사용하여 구현 및 교육되었습니다 (Collobert 외, (2011)).

3.1 DATASETS

We experimented with 4 datasets.
우리는 4 개의 데이터 세트를 실험했습니다.

The first is Cifar10 (Krizhevsky & Hinton (2009)), consisting of 60000 32x32 color images of 10 classes (of which 50000 are used for training only, and 10000 for test only).
첫 번째는 Cifar10 (Krizhevsky & Hinton (2009))으로, 10 개의 클래스로 이루어진 60000 개의 32x32 컬러 이미지로 구성됩니다 (이 중 50000 개는 학습에만 사용되며 10000은 테스트 용).

The second dataset is the original MNIST (LeCun et al, (1998)) consisting of 60000 28x28 gray-scale images of handwritten digits 0-9, and a corresponding set of 10000 test images.
두 번째 데이터 세트는 원래의 MNIST (LeCun et al (1998))로, 손으로 쓴 자릿수 0-9의 60000 28x28 그레이 스케일 이미지와 10000 테스트 이미지의 해당 세트로 구성됩니다.

The third is the Street-View-House-Numbers (SVHN) of Netzer et al, consisting of 600000 32x32 color images of house-number digits 0-9.
세 번째는 Netzer 외의 Street-View-House-Numbers (SVHN)로 집 번호 자릿수 0-9의 600000 32x32 컬러 이미지로 구성됩니다.

The fourth dataset is STL10 of Coates et al, (2011), similar to Cifar10 and consisting of 10 object classes, only with 5000 training images (instead of 50000 in Cifar) and a bigger 96x96 image size.
네 번째 데이터 세트는 Cifar10과 비슷한 Coates 외 (2011)의 STL10이며 10 개의 개체 클래스로 구성되며 5000 개의 교육 이미지 (Cifar에서는 50000 대신)와 더 큰 96x96 이미지 크기로 구성됩니다.

It is important to note that no data augmentation or whitening was applied, and the only preprocessing was a global normalization to zero mean and unit variance.
데이터 오그먼테이션 또는 화이트닝은 적용되지 않았으며 유일한 전처리는 제로 평균 및 단위 분산에 대한 전역 정규화였습니다.
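A global normalization of this kind can be sketched as below. The sketch assumes each image is given as a flat list of pixel values and that a single mean and standard deviation are computed over all pixels of all images; whether the statistics were computed per channel or reused from the training set at test time is not stated here.

```python
import math


def global_normalize(images):
    """Shift and scale all pixels to zero mean and unit variance.

    `images` is a list of flat pixel lists (an assumed layout).
    Returns the normalized images together with the mean and std used.
    """
    values = [p for img in images for p in img]
    mean = sum(values) / len(values)
    variance = sum((p - mean) ** 2 for p in values) / len(values)
    std = math.sqrt(variance)
    normalized = [[(p - mean) / std for p in img] for img in images]
    return normalized, mean, std


normalized, mean, std = global_normalize([[0.0, 2.0], [4.0, 6.0]])
```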

Each training instance (for all four datasets) was a uniformly sampled set of 3 images, 2 of which are of the same class (x and x +), and the third (x −) of a different class.
각 교육 인스턴스 (4 개 데이터 세트 모두)는 균등하게 샘플링 된 3 개의 이미지 집합으로 2 개는 동일한 클래스 (x 및 x +)이고 다른 하나는 다른 클래스 (x -)입니다.
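One way to draw such training instances from a labeled dataset is sketched below. The exact sampling scheme is not specified beyond "uniformly sampled", so this version is an assumption: pick the reference uniformly, then a different sample of the same class, then any sample of another class. It requires every class to have at least two samples.

```python
import random
from collections import defaultdict


def index_by_class(labels):
    """Map each class label to the list of sample indices carrying it."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    return by_class


def sample_triplet(labels, by_class, rng):
    """Return indices (x, x_plus, x_minus) for one training instance."""
    x = rng.randrange(len(labels))
    cls = labels[x]
    positives = [i for i in by_class[cls] if i != x]
    negatives = [i for c, idxs in by_class.items() if c != cls for i in idxs]
    return x, rng.choice(positives), rng.choice(negatives)


labels = [0, 0, 1, 1, 2, 2]
by_class = index_by_class(labels)
rng = random.Random(0)
triplets = [sample_triplet(labels, by_class, rng) for _ in range(200)]
```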

Each training epoch consisted of 640000 such instances (randomly chosen each epoch), and a fixed set of 64000 instances used for test.
각 교육 기간은 640000 개의 인스턴스 (무작위로 각 epoch로 선택됨)와 테스트에 사용된 64000 인스턴스의 고정 된 세트로 구성됩니다.

We emphasize that each test instance involves 3 images from the set of test images which was excluded from training.
우리는 각 테스트 인스턴스가 교육에서 제외된 테스트 이미지 세트의 3 개의 이미지를 포함한다고 강조합니다.

3.2 THE EMBEDDING NET

For Cifar10 and SVHN we used a convolutional network, consisting of 3 convolutional and 2x2 max-pooling layers, followed by a fourth convolutional layer.
Cifar10 및 SVHN의 경우 convolutional 네트워크를 사용했으며, 3 개의 컨볼 루션 및 2x2 최대 풀링 레이어와 4 번째 convolutional 레이어로 구성되었습니다.

A ReLU non-linearity is applied between two consecutive layers.
ReLU 비선형성은 두 개의 연속된 레이어간에 적용됩니다.

Network configuration (ordered from input to output) consists of filter sizes {5,3,3,2}, and feature map dimensions {3,64,128,256,128} where a 128 vector is the final embedded representation of the network.
네트워크 구성 (입력에서 출력 순서)은 필터 크기 {5,3,3,2}와 피쳐 맵 크기 {3,64,128,256,128}로 구성되며, 여기서 128 벡터는 네트워크의 최종 embedded 표현입니다.

Usually in convolutional networks, a subsequent fully-connected layer is used for classification.
일반적으로 convolutional 네트워크에서는 후속 완전 연결 계층이 분류에 사용됩니다.

In our net this layer is removed, as we are interested in a feature embedding only.
우리의 네트에서이 레이어는 제거되었습니다. 우리는 오직 feature embedding에만 관심이 있습니다.

The network for STL10 is identical, only with stride=3 for the first layer, to allow the bigger input size.
STL10의 네트워크는 첫 번째 계층에 대해 stride = 3으로만 더 큰 입력 크기를 허용하면서 동일합니다.
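The configuration above can be sanity-checked by tracing the spatial size through the layers. The sketch below assumes unpadded convolutions, stride 1 except where stated, non-overlapping 2x2 pooling with flooring, and one bias per output feature map; none of these details are given in the text. Under those assumptions both a 32x32 input and a 96x96 input with stride 3 in the first layer end at a 1x1 map with 128 features, matching the 128-vector embedding.

```python
FILTER_SIZES = [5, 3, 3, 2]
FEATURE_MAPS = [3, 64, 128, 256, 128]


def conv_out(size, kernel, stride=1):
    """Output width of an unpadded convolution."""
    return (size - kernel) // stride + 1


def embedding_spatial_size(input_size, first_stride=1):
    """Spatial width after 3 conv + 2x2 max-pool stages and a final conv."""
    size = input_size
    for i, kernel in enumerate(FILTER_SIZES):
        stride = first_stride if i == 0 else 1
        size = conv_out(size, kernel, stride)
        if i < 3:  # the first three convolutions are followed by pooling
            size //= 2
    return size


def count_parameters():
    """Weights plus one bias per output map, summed over the four convs."""
    total = 0
    for kernel, (n_in, n_out) in zip(FILTER_SIZES, zip(FEATURE_MAPS, FEATURE_MAPS[1:])):
        total += kernel * kernel * n_in * n_out + n_out
    return total


cifar_size = embedding_spatial_size(32)
stl_size = embedding_spatial_size(96, first_stride=3)
n_params = count_parameters()
```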

The network used for MNIST was a smaller version consisting of smaller feature map sizes {1,32,64,128}.
MNIST에 사용 된 네트워크는 더 작은 feature 맵 크기 {1,32,64,128}로 구성된 더 작은 버전이었습니다.

3.3 RESULTS

Training on all datasets was done by SGD, with initial learning-rate of 0.5 and a learning rate decay regime.
모든 데이터 세트에 대한 교육은 초기 학습률 0.5와 learning rate decay regime으로 SGD에서 수행했습니다.

We used a momentum value of 0.9.
우리는 0.9의 momentum 값을 사용했습니다.

We also used the dropout regularization technique with p = 0.5 to avoid over-fitting.
또한 over-fitting을 피하기 위해 p = 0.5 인 드롭 아웃 regularization 기법을 사용했습니다.
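For reference, a single SGD step with the learning rate and momentum quoted above can be sketched as follows. The update rule shown is the common "velocity accumulates gradients" form and is an assumption about the implementation; the learning-rate decay schedule is not specified in the text and is left out.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.5, momentum=0.9):
    """One SGD step with momentum on flat lists of floats.

    velocity <- momentum * velocity + grad
    params   <- params - lr * velocity
    Returns the updated (params, velocity) without modifying the inputs.
    """
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Toy objective f(w) = 0.5 * w^2, whose gradient is simply w.
params, velocity = [1.0], [0.0]
params, velocity = sgd_momentum_step(params, [params[0]], velocity)
first_step = list(params)
for _ in range(300):
    params, velocity = sgd_momentum_step(params, [params[0]], velocity)
```

On this toy objective the iterate overshoots and oscillates around the minimum before settling, which is typical of a large learning rate combined with high momentum.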

After training on each dataset for 10-30 epochs, the network reached a fixed error over the triplet comparisons.
10-30 epochs에 대한 각 데이터 세트에 대해 교육을 한 후에 네트워크는 triplet 비교에 대한 고정 오류에 도달했습니다.

We then used the embedding network to extract features from the full dataset, and trained a simple 1-layer network model on the full 10-class classification task (using only training set representations).
그런 다음 embedding 네트워크를 사용하여 전체 데이터 세트에서 features을 추출하고 교육 세트 표현 만 사용하여 전체 10 클래스 분류 작업에서 간단한 1 계층 네트워크 모델을 교육했습니다.

The test set was then measured for accuracy.
그런 다음 테스트 세트의 정확성을 측정했습니다.

These results (Figure 2) are comparable to state-of-the-art results with deep learning models, without using any artificial data augmentation (Zeiler & Fergus (2013b); Goodfellow et al, (2013); Lin et al, (2013)).
이러한 결과 (그림 2)는 인공적인 데이터 augmentation (Zeiler & Fergus (2013b), Goodfellow 외 (2013), Lin 외 (2013)를 사용하지 않고 심층 학습 모델을 사용한 최첨단 결과와 유사합니다 )).

[Figure 2: classification accuracy obtained from the embedded representations on each dataset]

Noteworthy is the STL10 dataset, in which the TripletNet achieved the best known result for non-augmented data.
TripletNet이 비 확장 데이터에 대해 가장 잘 알려진 결과를 얻은 STL10 데이터 세트가 주목할만한 점입니다.

We conjecture that data augmentation techniques (such as translations, mirroring and noising) may provide similar benefits to those described in previous works.
우리는 데이터 augmentation 기법 (예 : 번역, 미러링 및 노이즈)이 이전 연구에서 설명한 것과 유사한 이점을 제공 할 것이라고 추측합니다.

We also note that similar results are achieved when the embedded representations are classified using a linear SVM model or KNN classification, with up to 0.5% deviation from the results in Figure 2.
임베디드 표현이 선형 SVM 모델 또는 KNN 분류를 사용하여 그림 2의 결과와 최대 0.5 %의 편차로 유사한 결과가 달성된다는 점에 주목합니다.
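KNN classification on top of the embeddings needs nothing more than distances between vectors. The sketch below is a minimal majority-vote version; the value of `k`, the toy 2-D embeddings, and the labels are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_embeddings, train_labels, query, k=3):
    """Label a query embedding by majority vote of its k nearest neighbours."""
    order = sorted(
        range(len(train_embeddings)),
        key=lambda i: math.dist(train_embeddings[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


train_embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
    [5.0, 5.0], [5.1, 5.0], [5.0, 4.8],
]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
prediction_a = knn_predict(train_embeddings, train_labels, [0.2, 0.1])
prediction_b = knn_predict(train_embeddings, train_labels, [4.9, 5.2])
```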

Another side effect we noticed is that the representation seems to be sparse: about 25% non-zero values.
또 다른 부수효과는 표현이 약 25 %의 0이 아닌 값으로 희소하게 된다는 것입니다.

This is very helpful when the representation is later used as features for classification, both computationally and with respect to accuracy, as each class is characterised by only a few non-zero elements.
이것은 나중에 각 클래스가 소수의 0이 아닌 요소로 특징 지어지기 때문에 계산과 정밀도를 구분하는 기능으로 나중에 사용될 때 매우 유용합니다.

3.4 2D VISUALIZATION OF FEATURES

In order to examine our main premise, which is that the network embeds the images into a representation with meaningful properties, we use PCA to project the embedding into 2d euclidean space which can be easily visualized (Figures 3, 4 and 5).
네트워크가 의미있는 특성을 가진 표현으로 이미지를 embeds한다는 우리의 주요 전제를 조사하기 위해, 우리는 PCA를 사용하여 쉽게 시각화 할 수있는 2차원 유클리드 공간으로 임베딩을 투영합니다 (그림 3, 4, 5).
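The PCA projection itself is straightforward to reproduce. The sketch below uses only the standard library and finds the top two principal directions by power iteration with deflation; this is one possible way to compute PCA, chosen for self-containment, and not necessarily how the figures were produced. The toy 3-D points are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def power_iteration(matrix, iters=200):
    """Approximate the dominant eigenvector of a symmetric PSD matrix."""
    v = [float(i + 1) for i in range(len(matrix))]  # fixed, non-zero start
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(points, iters=200):
    """Project points onto their first two principal components."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
           for a in range(dim)]
    components = []
    for _ in range(2):
        v = power_iteration(cov, iters)
        eigenvalue = dot(v, [dot(row, v) for row in cov])
        components.append(v)
        # Deflate: remove the direction just found from the covariance.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(dim)]
               for a in range(dim)]
    return [[dot(c, comp) for comp in components] for c in centered]


points = [[-2.0, 0.1, 0.0], [-1.0, -0.1, 0.0], [0.0, 0.0, 0.0],
          [1.0, -0.1, 0.0], [2.0, 0.1, 0.0]]
projected = pca_2d(points)
```

In the toy data almost all variance lies along the first axis, so the first projected coordinate recovers it and the second recovers the small spread along the second axis.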

[Figure: 2D PCA projection of the learned embeddings]

We can see a significant clustering by semantic meaning, confirming that the network is useful in embedding images into the euclidean space according to their content.
우리는 의미론적 의미로 중요한 클러스터링을 볼 수 있으며, 네트워크가 내용에 따라 유클리드 공간에 이미지를 삽입하는 데 유용하다는 것을 확인할 수 있습니다.

Similarity between objects can be easily found by measuring the distance between their embeddings and, as shown in the results, the embeddings can reach high classification accuracy using a simple subsequent linear classifier.
물체 간의 유사성은 그 embedding 사이의 거리를 측정함으로써 쉽게 발견 될 수 있으며 결과에 나타난 바와 같이 간단한 후속 선형 분류기를 사용하여 높은 분류 정확도에 도달 할 수 있습니다.

3.5 COMPARISON WITH PERFORMANCE OF THE SIAMESE NETWORK

The Siamese network is the most obvious competitor for our approach.
Siamese 네트워크는 우리의 접근 방식에서 가장 명백한 경쟁자입니다.

Our implementation of the Siamese network consisted of the same embedding network, but with the use of a contrastive loss between a pair of samples, instead of three (as explained in Chopra et al, (2005)).
Siamese 네트워크의 구현은 동일한 임베딩 네트워크로 구성되었지만 세 개의 샘플 대신 한 쌍의 샘플간에 대조적 인 손실을 사용했습니다 (Chopra 외 (2005)에서 설명 됨).

The generated features were then used for classification using a similar linear model as was used for the TripletNet method.
생성 된 피쳐는 TripletNet 방법에 사용 된 것과 유사한 선형 모델을 사용하여 분류에 사용되었습니다.

We measured lower accuracy on the MNIST dataset compared to results gained using the TripletNet representations (Figure 2).
우리는 TripletNet 표현을 사용하여 얻은 결과와 비교하여 MNIST 데이터 세트에서 낮은 정확도를 측정했습니다 (그림 2).

We have tried a similar comparison for the other three datasets, but unfortunately could not obtain any meaningful result using a Siamese network.
우리는 다른 세 가지 데이터 세트와 비슷한 비교를 시도했지만, 불행히도 샴 네트워크를 사용하여 의미있는 결과를 얻을 수 없었습니다.

We conjecture that this might be related to the problem of context described above, and leave the resolution of this conjecture to future work.
우리는 이것이 위에서 설명한 문맥의 문제와 관련이 있다고 추측하고 이 추측의 해결책을 미래의 작업으로 남겨 둡니다.

4 FUTURE WORK

As the Triplet net model allows learning by comparisons of samples instead of direct data labels, usage as an unsupervised learning model is possible.
Triplet 네트 모델은 직접 데이터 레이블 대신 샘플을 비교하여 학습 할 수 있으므로 감독되지 않은 학습 모델로 사용할 수 있습니다.

Future investigations can be performed in several scenarios:
향후 조사는 여러 시나리오에서 수행 할 수 있습니다.

• Using spatial information. Objects and image patches that are spatially near are also expected to be similar from a semantic perspective. Therefore, we could use geometric distance between patches of the same image as a rough similarity oracle $r(x, x')$, in an unsupervised setting.

• Using temporal information. The same is applicable to the time domain, where two consecutive video frames are expected to describe the same object, while a frame taken 10 minutes later is less likely to do so. Our Triplet net may provide a better embedding and improve on past attempts in solving classification tasks in an unsupervised environment, such as that of Mobahi et al, (2009).
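As an illustration of the temporal scenario, triplets could be formed from frame indices of a single video without any class labels. The sketch below is hypothetical: the `near` and `far` thresholds (in frames) are made-up parameters, and the same idea would apply to patch coordinates in the spatial scenario.

```python
import random


def temporal_triplet(num_frames, rng, near=1, far=100):
    """Pick frame indices (reference, similar, dissimilar) from one video.

    The similar frame is `near` frames after the reference; the dissimilar
    frame is at least `far` frames away from it.
    """
    if num_frames < 2 * far:
        raise ValueError("video too short for the chosen `far` threshold")
    reference = rng.randrange(num_frames - near)
    similar = reference + near
    candidates = [t for t in range(num_frames) if abs(t - reference) >= far]
    return reference, similar, rng.choice(candidates)


rng = random.Random(0)
frame_triplets = [temporal_triplet(1000, rng) for _ in range(100)]
```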

It is also well known that humans tend to be better at accurately providing comparative labels.
인간이 비교 라벨을 정확하게 제공하는 경향이 있다는 것도 잘 알려져 있습니다.

Our framework can be used in a crowd sourcing learning environment.
우리의 프레임 워크는 군중 소싱 학습 환경에서 사용될 수 있습니다.

This can be compared with Tamuz et al, (2011), who used a different approach.
이것은 다른 접근법을 사용한 Tamuz 외 (2011)와 비교 될 수 있습니다.

Furthermore, it may be easier to collect data trainable on a Triplet network, as comparisons over similarity measures are much easier to attain (pictures taken at the same location, shared annotations, etc).
또한 유사성 측정에 대한 비교가 훨씬 쉽기 때문에 (같은 위치에서 찍은 사진, 공유 된 주석 등) Triplet 네트워크에서 학습 가능한 데이터를 수집하는 것이 더 쉽습니다.

5 CONCLUSIONS

In this work we introduced the Triplet network model, a tool that uses a deep network to learn useful representation explicitly.
이 작업에서 우리는 Triplet 네트워크 모델을 소개했습니다.이 모델은 딥 네트워크를 사용하여 유용한 표현을 명시적으로 학습하는 도구입니다.

The results shown on various datasets provide evidence that the representations that were learned are useful to classification in a way that is comparable with a network that was trained explicitly to classify samples.
다양한 데이터 세트에 표시된 결과는 배운 표현이 명시적으로 샘플을 분류하도록 훈련된 네트워크와 비교 가능한 방식으로 분류하는 데 유용하다는 증거를 제공합니다.

We believe that enhancements to the embedding network, such as the Network-in-Network model (Lin et al, (2013)), Inception models (Szegedy et al, (2014)) and others, can benefit the Triplet net similarly to the way they benefited other classification tasks.
Network-in-Network 모델 (Lin et al (2013)), Inception 모델 (Szegedy 외, (2014)) 같은 임베딩 네트워크로 확장을 믿습니다. 그리고 다른 분류 태스크에 이익을 주는 방법과 비슷하게 트리플렛에도 이익을 줄 수 있을 것입니다.

Considering that this method requires knowing only that two out of three images are sampled from the same class, rather than knowing what that class is, we think it should be investigated further, and that it may provide insights into the way deep networks learn in general.
이 방법은 3 개의 이미지 중 2 개의 이미지가 같은 클래스에서 샘플링된다는 것을 알 필요가 있다는 사실을 고려하면 이 클래스가 무엇인지 알기보다는 이를 깊이 조사해야하며 깊은 네트워크를 학습하는 방법에 대한 통찰력을 제공 할 수 있습니다

We have also shown how this model learns using only comparative measures instead of labels, which we can use in the future to leverage new data sources for which clear-cut labels are not known or do not make sense (e.g., hierarchical labels).
일반적으로 우리는 이 모델이 레이블 대신 비교 측정 만 사용하여 학습하는 방법을 보여주었습니다.이 모델은 장래에 명확한 레이블이 알려져 있지 않거나 이해가되지 않는 새로운 데이터 원본 (예 : 계층 레이블)을 활용하기 위해 사용할 수 있습니다.
