F3F4

Summary

Splicing paper scraps is a complex problem that plays an important role in judicial evidence recovery, the restoration of historical documents, and the acquisition of military intelligence. This paper focuses on the splicing of paper scraps and establishes a shredding distance model and a restoration TSP model. We also design one-dimensional and multi-dimensional shred restoration algorithms and solve them using MATLAB.

For question one, we extract information from Appendices 1 and 2 and design a one-dimensional shredding restoration algorithm characterized by text character size and line-spacing structure. We then transform the shredding problem into a restoration TSP problem, thus obtaining the correct restored images and sequences.

For question two, we first standardize the shredded pictures from Appendices 1 and 2 and then extract the normalized level features of the images. For pictures that cannot be classified by machine, we use a GUI program we developed to improve the efficiency of manual work.

For question three, we design a three-dimensional shredding restoration algorithm. We first merge the pictures of side a and side b of Annex 5 to get 416 shredded pictures. The pictures then go through the same standardization, level feature extraction, classification, and other operations, which reduces the problem to the one-dimensional problem. Solving it yields the correct restored images and sequences of the front and back sides of Annex 5.

To evaluate the algorithm quantitatively, this paper presents a minimal intervention model to improve the algorithm: the strengths and weaknesses of an algorithm are characterized by the minimum number of manual interventions needed to recover the correct sequence from the order recognized by the computer.

Keywords: Reconstruct documents; TSP; Shredding Distance Model; Shredding Restoration Algorithm

Contents
I Introduction
II Symbol Definitions
III Assumptions and Notations
IV For question one
  4.1 Image Preprocessing
  4.2 Shredding Feature Extraction
  4.3 Recognition Sequence Based on Text Features
  4.4 The Definition of Shredding Distance
  4.5 Recovery of TSP
  4.6 Simulated Annealing (SA) Algorithm
  4.7 One-Dimensional Shredding Restoration Algorithm
  4.8 The Solution of Model
V For question two
  5.1 Shredding Standardization and Level Feature Extraction
  5.2 The Classification of Level Feature
  5.3 Two-Dimensional Shredding Restoration Algorithm
  5.4 The Solution of Model
VI For question three
  6.1 Dimensionality Reduction
  6.2 Three-Dimensional Shredding Restoration Algorithm
  6.3 The Solution of Model
VII Strengths and Weaknesses
  7.1 Strengths
  7.2 Weaknesses
VIII The Refinement of our Model
  8.1 Improved Application to Color Images
  8.2 Minimal Intervention Degree Algorithm
Reference
Appendices

I Introduction

Traditionally, shredded documents are reconstructed by hand, which is more accurate but inefficient, especially when a huge amount of complicated work must be completed in a short time. With the development of computer technology, people are trying to develop automatic splicing techniques for reconstructing documents, so as to improve the efficiency of splicing recovery. In addition, this is a kind of task that is related to our daily life. The factors to be considered in reality go far beyond the problem itself, and how to make the model more realistic and provide effective splicing information is a major concern of this article. Given the information provided and reasonable assumptions for
shredding recovery, we are able to carry out this research.

II Symbol Definitions

Symbol    Definition
q_ij      Pixel values before binarization
          The distance between shred A and shred B
          Left recognition sequence
          Right recognition sequence
C         Width of characters
D         Total distance of TSP
          The length of recognition sequence

III Assumptions and Notations

For the convenience of the following discussion, we first assume that:
(1) The text direction is horizontal.
(2) The front and back print margins are in the same format.
(3) The efficiency of manual labor is ignored.

IV For question one

4.1 Image preprocessing

According to the relevant knowledge, we need to process the picture pixels. Generally, image pixel values lie within [0, 255], and blank positions are distinguished from characters by setting a threshold. For non-color pictures, we only need to distinguish blank from non-blank. So that the picture clearly describes the blank space and the character positions, we import the image into MATLAB to obtain the corresponding pixel matrix. Finally, we binarize the pixel matrix and have

$$P_{ij} = \begin{cases} 1, & q_{ij} = 255 \\ 255, & \text{others} \end{cases}$$

4.2 Shredding feature extraction

Generally speaking, shredding feature extraction falls into two categories. One extracts features from the shape of the spliced edges, and the other extracts features from the text on the shreds. According to the problem, the shreds in this paper belong to
the second category.

Figure 1. One-dimensional shredding
Figure 2. Character features

In summary, the text features are extracted as follows:
Step 1: Binarize the pictures; text is white and blank is black.
Step 2: Find all line spacing and empty areas of the pictures and mark them as gray.
Step 3: Find all the kerning and mark it as gray.
Step 4: Calculate the character width from the spacing, empty areas, kerning, and other features.

According to the problem, this paper extracts the text features by importing the image pixels into a MATLAB program.

4.3 Recognition sequence based on text features

Through the analysis of
Chinese characters and English letters, we denote the character width by C. When a character is cut, its width is divided into two parts, C - R and R; a character that is not cut retains the width C.

Figure 3. Character segmentation

According to the definition of character-based segmentation, we construct recognition sequences based on the text features.

Figure 4. Recognition sequences (left and right)

For the recognition sequences in Figure 4, a position with no character is 0, and the other nodes represent the corresponding character length (C for a complete character, C - R or R for an incomplete one).

4.4 The definition of shredding distance

According to the definition of the recognition sequence, we define the distance between shreds A and B, with X = 0 or 1. From these equations, we know that the greater the degree of agreement between the two recognition sequences, the smaller the distance between the two
recognition sequences. In particular, when the two recognition sequences are fully consistent, the distance is 0.

4.5 Recovery of TSP

The TSP is one of the most famous problems in graph theory. If we regard each shred as a point, there is a distance between every pair of points. In essence, we need to find the path with the smallest total distance, that is, an optimal TSP path. So the recovery of the shreds can be abstracted into a restoration TSP. Therefore, we have the objective

$$\min D = \sum_{i} d_{i,i+1}$$

subject to each shred appearing exactly once in the path, where $D$ is the total distance of the TSP and $d_{i,i+1}$ is the distance from point $i$ to point $i+1$ along the path. By solving the TSP, we obtain the position of each
point in the sequence, and finally use MATLAB to reassemble the original paper.

4.6 Simulated annealing (SA) algorithm

Simulated annealing (SA) is an iterative solution strategy built on random search. It is based on the similarity between the physical annealing process of solid materials and general combinatorial optimization problems. The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. The heat causes the atoms to become unstuck from their initial positions and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one. The SA can be described as follows:

Step 1. Initialization. Given the range of each model parameter, randomly select an initial solution $x_0$ and calculate the corresponding target value $E(x_0)$. Set the initial temperature $T_0$ and the final temperature $T_f$, draw a random number $\xi \in (0,1)$ as a probability threshold, and set the cooling function $T(k+1) = \alpha T(k)$, in which $\alpha$ is the annealing coefficient and $k$ is the number of iterations.
Step 2. At a certain temperature $T$,
make a perturbation $\Delta x$, so that a new solution $x' = x + \Delta x$ is produced, and calculate the difference $\Delta E(x) = E(x') - E(x)$.
Step 3. If $\Delta E(x) \le 0$, $x'$ is accepted directly. If $\Delta E(x) > 0$, $x'$ is accepted according to the probability $p = \exp(-\Delta E / (KT))$, where $K$ is a constant usually taken as 1; that is, $x'$ is accepted if $p$ exceeds the random threshold $\xi \in (0,1)$. When $x'$ is accepted, set $x = x'$.
Step 4. At a given temperature, repeat Steps 2 and 3.
Step 5. Reduce the temperature $T$ by the slow cooling function.
Step 6. Repeat Steps 2 to 5 until the termination condition is met.

By using SA to solve the TSP, we regard each ordering of the shreds as a solution, so as to find the optimal ordering.

4.7 One-dimensional shredding restoration algorithm

In summary, the one-dimensional shredding restoration algorithm can automatically recover one-dimensional shreds. The algorithm steps are as follows: Extract the image pixel matrix
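
As an illustration of how the simulated annealing procedure of Section 4.6 can be applied to the restoration TSP of Section 4.5, the following is a minimal Python sketch, not the MATLAB program used in this paper. It assumes that the pairwise shredding distances have already been computed and stored in a matrix `dist`, where `dist[a][b]` is the distance from the right edge of shred `a` to the left edge of shred `b`. The function names, the segment-reversal perturbation, the parameter values, and the choice K = 1 are all assumptions made for the example.

```python
import math
import random


def path_cost(order, dist):
    """Total distance D: sum of dist[a][b] over consecutive shreds in the path."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def anneal_order(dist, t_start=10.0, t_end=1e-3, alpha=0.95,
                 moves_per_temp=200, seed=0):
    """Search for a left-to-right shred order with small total distance.

    All parameters are illustrative assumptions; dist needs at least 2 shreds.
    """
    rng = random.Random(seed)
    n = len(dist)

    # Step 1: random initial solution and initial temperature.
    current = list(range(n))
    rng.shuffle(current)
    current_cost = path_cost(current, dist)
    best, best_cost = current[:], current_cost
    t = t_start

    while t > t_end:                      # Step 6: stop at the final temperature
        for _ in range(moves_per_temp):   # Step 4: repeat at this temperature
            # Step 2: perturb the solution by reversing a random segment.
            i, j = sorted(rng.sample(range(n), 2))
            candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
            delta = path_cost(candidate, dist) - current_cost

            # Step 3: accept improvements directly, and worse solutions
            # with probability exp(-delta / t) (K taken as 1).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current[:], current_cost
        t *= alpha                        # Step 5: cooling T(k+1) = alpha * T(k)

    return best, best_cost
```

Calling `anneal_order(dist)` returns the best order found and its total distance. The cost is recomputed over the whole path after each perturbation because the distance matrix is generally asymmetric (matching right edge to left edge), so reversing a segment changes the cost inside the segment as well as at its ends. Since SA is a random search, the returned order is not guaranteed to be optimal, and any doubtful joins would still need manual checking.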